free lunch

the blog about nothing and everything

Wednesday, February 23, 2005

Dude where's my data?

I'm building an econometric model using time series of a set of price variables. The problem is that, of the 174 dates I have, one or two observations are sometimes missing for certain countries in my dataset. Now, this statistical issue of missing data has many solutions for cross-section data, but unfortunately not for time series data.

Ignoring the problem (a good solution to many problems) is not a solution here. If you delete the line with a missing observation for one country, you take the rest of the countries on that same date with it. Now that's really wasting information. Not to mention that if it screws up the original distributional properties of your data, it puts all your inferences and parameter estimates in doubt. Putting in the average value of the entire series for that missing observation won't solve it either, because that would bias the variance of the entire series downward. This might result in the destruction of the universe... just kidding.
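
To make those two complaints concrete, here is a small Python sketch. Everything in it is made up for illustration: the three "countries" A, B and C, their numbers, and the six dates (my real dataset has 174). It only assumes that a missing observation is stored as None.

```python
import statistics

# Hypothetical panel: three countries, six dates, None marks a missing value.
panel = {
    "A": [10.0, 12.0, None, 11.0, 15.0, 9.0],
    "B": [20.0, 21.0, 19.0, 22.0, None, 18.0],
    "C": [5.0, 6.0, 7.0, 6.5, 5.5, 6.0],
}
n_dates = 6

# Deleting the whole line: keep only dates where every country is observed.
complete_dates = [
    t for t in range(n_dates)
    if all(panel[c][t] is not None for c in panel)
]

# Count the perfectly good observations thrown away along with the gaps.
total_observed = sum(x is not None for s in panel.values() for x in s)
discarded = total_observed - len(complete_dates) * len(panel)

# Mean imputation on country A: fill the gap with the series average.
observed_a = [x for x in panel["A"] if x is not None]
mean_a = statistics.mean(observed_a)
filled_a = [mean_a if x is None else x for x in panel["A"]]

# Sample variance before and after filling the gap.
var_observed = statistics.variance(observed_a)
var_filled = statistics.variance(filled_a)

print("dates kept:", complete_dates, "valid observations discarded:", discarded)
print("variance, observed only:", var_observed)
print("variance, mean-filled:  ", var_filled)
```

The filled-in point sits exactly on the mean, so it adds nothing to the sum of squared deviations while still raising the count, which is why the second variance comes out smaller than the first.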

Linear interpolation, using the observations before and after to come up with a value for the gap, is also not sound; that's no different from guessing (what if that gap was the one time the series went nuts and took on a weird number?). A quick search of statistical journals has not yielded any easy solution either. A possible solution is to try each of the suggested techniques, but that would multiply the number of estimations I have to do by the number of techniques I try. All that work would just earn a footnote in the paper, since you're basically going to report only the most reasonable estimates. All these problems when I'm supposed to be living in the age of information!
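
For the record, the interpolation worry looks like this in a rough Python sketch. The function name interpolate_gaps and both series are hypothetical, invented just to show the idea; the "true" series has a deliberate spike in exactly the spot that goes missing.

```python
def interpolate_gaps(series):
    """Fill interior None gaps with a straight line between the nearest
    observed neighbours. Gaps at the very start or end are left as None."""
    filled = list(series)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is None:
            start = i - 1  # index of the last observed value before the gap
            j = i
            while j < n and filled[j] is None:
                j += 1     # j ends at the first observed value after the gap
            if start >= 0 and j < n:
                step = (filled[j] - filled[start]) / (j - start)
                for k in range(i, j):
                    filled[k] = filled[start] + step * (k - start)
            i = j
        else:
            i += 1
    return filled


# Made-up example: the series "went nuts" at index 2, and that is the gap.
true_series = [10.0, 11.0, 30.0, 13.0, 14.0]
observed = [10.0, 11.0, None, 13.0, 14.0]

guess = interpolate_gaps(observed)
error = true_series[2] - guess[2]

print("interpolated:", guess)
print("miss at the gap:", error)
```

The interpolated value lands neatly between its neighbours, which is exactly the problem: whatever happened on the missing date, the filled series will always say nothing unusual happened.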
