You learnt about some of the issues with raw data and understood the need for data cleaning. Now, let’s listen to Anand to understand the different cases in fixing the columns and rows of a given dataset.
Now let’s summarise what you learnt with the help of the checklists below. Make sure you correctly identify these issues and resolve them before moving on to the next stage of data cleaning.
Checklist for fixing rows:
- Delete summary rows: Total and Subtotal rows.
- Delete incorrect rows: Header row and footer row.
- Delete extra rows: Column numbers, indicators, blank rows, page numbers.
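The row checklist above can be sketched in pandas roughly as follows. This is a minimal illustration, not the course's own code: the DataFrame `raw`, its column names and all of its values are invented, and the rules for spotting summary, header and page-number rows are assumptions that would need adjusting for a real file.

```python
import pandas as pd

# Hypothetical raw extract: column names and values are invented for illustration.
raw = pd.DataFrame(
    {
        "city": ["Pune", "Mumbai", "Subtotal", None, "city", "Delhi", "Total", "Page 1"],
        "sales": ["100", "250", "350", None, "sales", "400", "750", None],
    }
)

# Delete blank rows (rows where every cell is missing).
rows = raw.dropna(how="all")

# Flag summary rows, a repeated header row and a page-number row
# by inspecting the first column (an assumed convention for this toy data).
first_col = rows["city"].astype(str).str.strip()
is_summary = first_col.str.lower().isin(["total", "subtotal"])
is_repeated_header = first_col == "city"
is_page_marker = first_col.str.match(r"page\s*\d+$", case=False)

# Keep only the genuine data rows and convert sales to numbers.
clean = rows[~(is_summary | is_repeated_header | is_page_marker)].reset_index(drop=True)
clean["sales"] = pd.to_numeric(clean["sales"])
```

Only the three genuine city rows remain in `clean`. In a real dataset it is usually worth printing the flagged rows before dropping them, so that no valid record is removed by accident.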
Checklist for fixing columns:
- Merge columns to create unique identifiers, if needed: For example, merge the state and city columns into a full address column.
- Split columns to get more data: Split the address column to get state and city columns to analyse each separately.
- Add column names: Add column names if they are missing.
- Rename columns consistently: For example, abbreviations and encoded column names.
- Delete columns: Delete unnecessary columns.
- Align misaligned columns: The dataset may have shifted columns, which you need to align correctly.
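Most items in the column checklist map onto one-line pandas operations. The sketch below is illustrative only: the data, the column names (`id`, `st`, `period`, `unused`) and the `-` separator are all hypothetical. It does not cover realigning shifted columns, which depends on how the shift occurred in a particular file.

```python
import pandas as pd

# Hypothetical data loaded without column names; all values are invented.
df = pd.DataFrame(
    [
        [1, "Maharashtra", "Pune", "2021-03", "x"],
        [2, "Karnataka", "Bengaluru", "2021-04", "y"],
    ]
)

# Add column names if they are missing.
df.columns = ["id", "st", "city", "period", "unused"]

# Rename columns consistently (here, expand an abbreviation).
df = df.rename(columns={"st": "state"})

# Merge columns: combine city and state into a full address.
df["full_address"] = df["city"] + ", " + df["state"]

# Split a column to get more data: year and month can be analysed separately.
df[["year", "month"]] = df["period"].str.split("-", expand=True)

# Delete columns that are no longer needed.
df = df.drop(columns=["period", "unused"])
```

After these steps `df` holds `id`, `state`, `city`, `full_address`, `year` and `month`. The split values are still strings, so convert them with `astype(int)` if they are needed as numbers.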
Now, let’s listen to Rahim to learn how to fix the rows in the bank marketing dataset.
As you saw in the video above, both heading rows were deleted because they have no use in our analysis. Note that if you spot anything irregular at first glance of the dataset, it is essential to remove it at the very start of the cleaning process.
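One common way to discard heading rows like these is to skip them while reading the file. The sketch below assumes that two unwanted lines sit above the real header. The file contents are invented, and apart from `customerid` the column names are hypothetical. The video may well use a different approach.

```python
import io

import pandas as pd

# Hypothetical file contents: two heading rows sit above the real header line.
raw_text = (
    "Bank marketing campaign data\n"
    "Customer details,,\n"
    "customerid,age,salary\n"
    "1,58,100000\n"
    "2,44,60000\n"
)

# skiprows=2 discards the first two lines, so the third line becomes the header.
df = pd.read_csv(io.StringIO(raw_text), skiprows=2)
```

With a file on disk, the same `skiprows` argument can be passed along with the file path in place of the `io.StringIO` object.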
In the next video, Rahim will explain how to fix the columns in our bank marketing dataset.
Now you have learnt to fix the following columns:
- Customerid: It has been dropped, as it has no specific use in the analysis.
- Jobedu: It has been split to extract job and education, which have to be analysed separately. You will see in later sessions how education and job play a very important role in determining the customer segment that will respond positively to term deposits.
- Month: The month name will be extracted in a later segment, based on the missing value imputation analysis.
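The first two column fixes listed above could look roughly like this in pandas. This is a sketch under stated assumptions rather than the exact code from the video: the sample rows are invented, the comma separator inside `jobedu` is assumed, and the `month` column is left untouched because its treatment is deferred to a later segment.

```python
import pandas as pd

# Hypothetical rows; the comma separator inside jobedu is an assumption.
df = pd.DataFrame(
    {
        "customerid": [1, 2, 3],
        "jobedu": ["management,tertiary", "technician,secondary", "services,primary"],
        "month": ["may, 2017", "may, 2017", "jun, 2017"],
    }
)

# customerid has no specific use in the analysis, so drop it.
df = df.drop(columns=["customerid"])

# Split jobedu into separate job and education columns, then drop the original.
df[["job", "education"]] = df["jobedu"].str.split(",", n=1, expand=True)
df = df.drop(columns=["jobedu"])
```

Passing `n=1` limits the split to the first comma, so any further commas in a value would stay inside the education part instead of creating a third column.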
In the next segment, you will learn how to treat the issue of missing values.