Four methods to deal with missing data, applying method with pictures

 Drag

In MS Excel, drag and drop is a substitute to Copy and Paste. It is a way to move or repeat data. The option of Drag and Drop in Excel is also name as “Fill Handle”. The icon of plus (+) occurs when the arrow is moved to the right side bottom of a selected cell. By this option we also drag the missing values in data sheet to solve the problem of some missing values. The picture below shows this option.

Average

We’ve been working with example survey data that has missing values. One way to deal with these missing values is just to eliminate any observations that have missing data. But because of the negatives related to that, we might want to keep all of the observations and just try to fill in the blanks, so to speak. The important thing here is to provide the fake data so that our model can get closer to what's actually true. One common method is to replace the missing values with a predefined statistic, the mean, median, or mode. As a reminder to all of us, the mean is the average of the values. Technically, it's where we add up all the existing values and divide them by the number of values you have. The median is the midpoint in the data values. To determine this, you basically put all the values in order, and then select the value that's halfway between the first and the last values. The mode is the most common of the data values. This method is applying on minimum of missing values in data.


Ipolate method

In this article, I'll show you how we can easily fill missing value in our time series or panel data. We have different types of methods or tricks, so we can input all the missing cells on the missing values suppose the first one is mean imputation but researchers or good mean repetition is a not good idea. In this article, you will learn that how with a single command you can easily fill all your missing values in your one variable. Let’s start do it with Stata and I am using Stata 14. Importing my data file import Excel browse and select file- first row as variable name-ok done by all that is here first of all look at data you can see I have 113 observations with fully data in these observations four are missing. in the first column I put provinces names and in second column I put IDs, in third column I put years and in fourth column I put my variable.  Actually IDs are not necessary I copied this file from my panel data so that's why I put here IDs, so you should also put IDs. Because I will paste command of a missing value importation, it's good for you to keep same sequence in the other excel file. My variable name is tax revenue. So you can change according to your variable. Now you have seen there are four missing values. Now I would like to show how you can easily fill all the missing values. Actually you see mean imputation is not a good idea so how we can do it, you should do it with the linear imputation. This is our command:

 ipolate variable name year, gen(new variable) epolate by(id)

ipolate tax revenue year, gen(tax) epolate by(id)

Copy it and go to stutter and command paste here. ipolate variable name then here actually have a yearly data comma then space then gen and your new variable name then space epolate by and then bracket id. So you should keep same sequence of your variables in your excel sheet. Put a command in your starter so I am integer. Now we are seen again dots, you can see I have generate a new variable. If we see in the command look here again I directed the strata that generate a new variable with the name of new variable. We have a new variable and you can see there is a linear imputation and all the missing values are covered. This method is applying on less than five missing values in between filled values.




Compound growth rate method

With compound growth rate method we filled the missing values in data set. The formula of compound growth rate already has over in column E, but I'm going to break this formula out for you then I'll do it as one complex formula in Excel. Multiplying by beginning value then ending value divided by a beginning value 224.1 divided by B for the 224.1, the next is 1 divided by the number of periods. The base year does not count so one two three four five periods point in the formula. I'm going to take the beginning divided by the ending divided by the beginning but I'm taking it to the power of so that's going to be equals that was shift in the number four on my keyboard. This method tackle the problem of large number of missing data.

Formula:

Beginning value*(ending value/beginning value) ^1/4



 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Post a Comment (0)
Previous Post Next Post