Missing data is one of the most common problems encountered during data cleaning and exploratory analysis. Before looking at ways to handle it, it is important to understand the different types of missing data.
Missing Completely at Random (MCAR): In this case, the missing values are randomly distributed across all observations. The probability that a value is missing is unrelated to any variable in the dataset, observed or unobserved, so the missing data shows no pattern.
Missing at Random (MAR): In MAR, the data is not missing randomly across all observations, but the missingness can be explained by the observed data. The probability that a value is missing depends on other, observed variables and not on the missing value itself.
Missing not at Random (MNAR): Not missing at random is when the missing data has a structure that the observed data cannot explain. The two possible scenarios are that the missingness depends on the hypothetical (unobserved) value itself, or that it depends on some other variable that was not recorded.
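To make the three mechanisms concrete, here is a minimal simulation sketch in Python using NumPy and pandas. The column names (age, income), the thresholds, and the missingness probabilities are all hypothetical values chosen for illustration; they do not come from any real dataset.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000

# Hypothetical complete dataset: 'age' is always observed, 'income' may go missing.
df = pd.DataFrame({
    "age": rng.integers(20, 70, size=n),
    "income": rng.normal(50_000, 15_000, size=n),
})

# MCAR: every row has the same 20% chance of losing its income value.
mcar_mask = rng.random(n) < 0.2

# MAR: the chance depends only on the observed 'age' column.
mar_mask = rng.random(n) < np.where(df["age"] > 50, 0.4, 0.05)

# MNAR: the chance depends on the income value itself (assumed: high earners skip the question).
mnar_mask = rng.random(n) < np.where(df["income"] > 60_000, 0.5, 0.05)


def with_missing(frame, mask):
    """Return a copy of frame with 'income' set to NaN wherever mask is True."""
    out = frame.copy()
    out.loc[mask, "income"] = np.nan
    return out


mcar = with_missing(df, mcar_mask)
mar = with_missing(df, mar_mask)
mnar = with_missing(df, mnar_mask)

# Compare the mean of the remaining income values with the true mean.
print("true mean:", df["income"].mean())
print("MCAR mean:", mcar["income"].mean())
print("MNAR mean:", mnar["income"].mean())
```

In this sketch the MNAR version removes high incomes more often, so the mean of what remains is pulled below the true mean, and nothing in the observed columns reveals why. That is the reason MNAR is the hardest of the three cases to correct for.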
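Here are a few things you can do to handle the missing data:
Listwise deletion: When you’re sure that the data is missing completely at random, you can delete all the rows with missing values without biasing your results. The sample does get smaller, so this works best when only a small share of rows is affected and the loss of statistical power is not substantial.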
Recover the values: This is possible only when the participants are available or can be contacted to fill out the missing values.
Average imputation: For this method, you take the average value of the responses received and use it to fill in the missing values. This method is not always recommended because it artificially reduces the variability of the data: every imputed value sits exactly on the mean.
Educated guessing: As the name suggests, this method is purely based on assumptions. When a respondent has answered several related questions, you can infer the missing value from those answers, for example by using their most common response.
Almost every dataset has missing values, and there is no single way to handle them. Many methods exist, and more will continue to emerge. Which method works best for you?
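Two of the methods above, listwise deletion and average imputation, can be sketched in a few lines of pandas. The tiny dataset and the column names (respondent, score) are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical survey responses; two scores are missing.
df = pd.DataFrame({
    "respondent": [1, 2, 3, 4, 5],
    "score": [4.0, np.nan, 2.0, np.nan, 6.0],
})

# Listwise deletion: drop every row that has at least one missing value.
deleted = df.dropna()

# Average imputation: fill each gap with the mean of the observed scores.
observed_mean = df["score"].mean()  # NaN values are skipped by default
imputed = df.assign(score=df["score"].fillna(observed_mean))

# Imputed values sit exactly on the mean, so the spread shrinks.
var_observed = df["score"].var()
var_imputed = imputed["score"].var()

print(len(deleted), "rows left after listwise deletion")
print("variance before imputation:", var_observed)
print("variance after imputation:", var_imputed)
```

Here the mean stays the same after imputation, but the sample variance drops, which is the change in variability mentioned under average imputation.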