An introduction to panel data analysis - Part of our Econometrics using Stata course delivered by Christopher Baum
What is panel data?
Panel data is a combination of cross-section and time series data. In the simplest terms, a panel dataset follows the same individuals (people, firms, countries) over several time periods. One example in macroeconomics is the GDP of many countries observed over many time periods. Because of this flexibility, panel data analysis has applications in many different fields of social science.
Panel data can be divided into categories depending on the relationship between the number of individuals and the number of time periods. We have micro-panel data when the number of individuals (N) is much greater than the number of time periods (T). Macro-panel data, on the other hand, is when the number of individuals (N) is approximately equal to the number of time periods (T):
Micro-Panel Data: T<<N
Macro-Panel Data: T≈N
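As a rough illustration of these two shapes, the sketch below stores a panel in long format (one row per individual and period) and counts N and T. The data values and the `describe_panel` helper are hypothetical, and the cut-off used to label a panel "micro" is an arbitrary assumption made for this example, not a formal rule.

```python
def describe_panel(rows, ratio=5):
    """Count individuals (N) and periods (T) in a long-format panel.

    rows  : iterable of (individual_id, period, value) tuples
    ratio : hypothetical cut-off; N >= ratio * T is labelled "micro"
    """
    individuals = {r[0] for r in rows}
    periods = {r[1] for r in rows}
    n, t = len(individuals), len(periods)
    kind = "micro" if n >= ratio * t else "macro"
    return n, t, kind

# Made-up example: 3 countries observed for 3 years (T roughly equal to N)
macro_rows = [(c, y, 0.0) for c in ("A", "B", "C") for y in (2001, 2002, 2003)]
# Made-up example: 50 households observed for 2 years (T much smaller than N)
micro_rows = [(h, y, 0.0) for h in range(50) for y in (2001, 2002)]

print(describe_panel(macro_rows))  # (3, 3, 'macro')
print(describe_panel(micro_rows))  # (50, 2, 'micro')
```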
The advantages of panel data
Panel data has several advantages compared to cross-section and time series data:
- With panel data, the researcher can increase the total number of observations (N*T). A greater number of observations increases the degrees of freedom (N*T-N-K) and reduces the collinearity among the independent variables. Both factors improve the efficiency of the estimates (Hsiao et al., 1995).
- Panel data makes it possible to formulate and test behavioural hypotheses, and to evaluate the effectiveness of social programs. In cross-section data, omitted variables make the estimators biased. A panel model can control for omitted variables that are constant over time for each individual.
- Panel models also have a dynamic version, in which lagged values of the dependent variable appear as regressors. This version can help to detect dynamic relationships.
- In time series analysis, it matters whether our data are stationary or not. Panel data analysis eases this problem. Even if our data are not stationary, i.e. the mean and the variance are not constant over time, panel models can in many settings produce estimators that remain asymptotically normal.
- The structure of panel data lets researchers identify models that would be unidentified in a pure time-series or cross-section analysis. This helps, for example, to deal with the measurement error problem.
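To make the degrees-of-freedom arithmetic in the first point concrete, here is a small sketch. The function name and the numbers are illustrative assumptions; the formula N*T-N-K is the one quoted above, which corresponds to estimating one intercept per individual plus K slope coefficients on a balanced panel.

```python
def panel_degrees_of_freedom(n_individuals, n_periods, n_regressors):
    """Residual degrees of freedom N*T - N - K for a balanced panel."""
    total_obs = n_individuals * n_periods
    return total_obs - n_individuals - n_regressors

# Single cross-section: 100 individuals, one intercept, 3 regressors
cross_section_df = 100 - 1 - 3
# The same 100 individuals followed for 10 periods
panel_df = panel_degrees_of_freedom(100, 10, 3)

print(cross_section_df)  # 96
print(panel_df)          # 897
```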
To estimate the parameters of a panel model, we can use one of the pooled OLS, fixed effects or random effects techniques. Across panel individuals (and possibly across time) we can detect heterogeneity. In the case of heterogeneity, pooled OLS is only appropriate if the regression includes specific factors for every individual in our dataset. Even if we include a huge number of such factors, it is entirely possible that unobserved heterogeneity is still present. If that unobserved heterogeneity is correlated with the regressors, pooled OLS will generate biased and inconsistent estimators.
The fixed effects and random effects models, however, allow for heterogeneity in our datasets. Let us denote by u_i an intercept that varies over the individuals but is constant over time (the effect of unobserved heterogeneity between the panel units). In its standard form, the panel regression becomes:
y_it = x_it'β + u_i + ε_it,  i = 1, ..., N,  t = 1, ..., T
where y_it is the dependent variable for individual i in period t, x_it is the vector of K regressors, β is the coefficient vector and ε_it is the idiosyncratic error term.
The interpretation of u_i determines whether we use the fixed or the random effects technique. If we treat u_i as a parameter to be estimated for each individual, we use fixed effects. If we interpret u_i as a random variable that is uncorrelated with the regressors, we use random effects.
The next step in panel data analysis is to choose between the fixed and the random effects technique so that our estimators remain consistent. We need a test that indicates, at a given significance level, whether the random effects estimator is consistent or whether only the fixed effects estimator is. This test is the Durbin–Wu–Hausman test or, as it is commonly known, the Hausman test.
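The difference between ignoring and removing the individual effect can be sketched with a small simulation. This is a minimal illustration under stated assumptions, not a reference implementation: the data-generating process, the function names and the true slope of 2.0 are all invented for the example. The regressor is built to be correlated with the individual effect, so pooled OLS should overstate the slope (its probability limit here is 2.5), while the within (fixed effects) estimator, which subtracts each individual's time average from x and y, should land close to 2.0.

```python
import random

def simulate_panel(n=500, t=5, beta=2.0, seed=0):
    """Simulate a balanced panel in which x is correlated with the individual effect."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        u_i = rng.gauss(0.0, 1.0)              # unobserved effect, constant over time
        for _ in range(t):
            x = u_i + rng.gauss(0.0, 1.0)      # regressor correlated with u_i
            y = beta * x + u_i + rng.gauss(0.0, 1.0)
            rows.append((i, x, y))
    return rows

def ols_slope(xs, ys):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def pooled_slope(rows):
    """Pooled OLS: ignore the panel structure entirely."""
    return ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])

def within_slope(rows):
    """Within estimator: demean x and y by individual, then run OLS."""
    groups = {}
    for i, x, y in rows:
        groups.setdefault(i, []).append((x, y))
    xd, yd = [], []
    for obs in groups.values():
        mx = sum(x for x, _ in obs) / len(obs)
        my = sum(y for _, y in obs) / len(obs)
        xd.extend(x - mx for x, _ in obs)
        yd.extend(y - my for _, y in obs)
    return ols_slope(xd, yd)

rows = simulate_panel()
print("pooled OLS slope:", round(pooled_slope(rows), 3))
print("within (FE) slope:", round(within_slope(rows), 3))
```

Demeaning removes anything that is constant over time for an individual, which is why the within estimator is unaffected by the correlation between the regressor and the individual effect.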
Null Hypothesis H0: both the RE and the FE estimators are consistent, but only the RE estimator is efficient.
Alternative Hypothesis H1: the FE estimator is consistent and the RE estimator is inconsistent.
If we are unable to reject the null hypothesis, the random effects technique is preferred. If we reject the null, we use the fixed effects technique.
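The decision rule can be sketched for the simplest case of a single coefficient. In its usual form the Hausman statistic is H = (b_FE - b_RE)' [Var(b_FE) - Var(b_RE)]^(-1) (b_FE - b_RE), which under the null hypothesis follows a chi-squared distribution with K degrees of freedom. The code below is a minimal sketch for K = 1 only; the function name, the 5% significance level and the numerical estimates are assumptions made up for illustration, not results from any dataset.

```python
import math

def hausman_single(b_fe, b_re, var_fe, var_re, alpha=0.05):
    """Hausman test for one coefficient (chi-squared with 1 degree of freedom)."""
    var_diff = var_fe - var_re
    if var_diff <= 0:
        raise ValueError("Var(FE) - Var(RE) must be positive for the test to be defined")
    stat = (b_fe - b_re) ** 2 / var_diff
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    choice = "fixed effects" if p_value < alpha else "random effects"
    return stat, p_value, choice

# Made-up estimates: a large gap between FE and RE leads to rejecting H0
stat, p_value, choice = hausman_single(b_fe=2.00, b_re=2.30, var_fe=0.010, var_re=0.004)
print(round(stat, 2), choice)

# Made-up estimates: a small gap means H0 is not rejected
stat2, p_value2, choice2 = hausman_single(b_fe=2.00, b_re=2.05, var_fe=0.010, var_re=0.004)
print(round(stat2, 2), choice2)
```

In Stata the same comparison is typically carried out by estimating both models with `xtreg` (options `fe` and `re`), storing the results and running `hausman`.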
More information about panel data analysis
The topics above will be covered in more detail in Econometrics using Stata, an interactive three-day seminar at Cass Business School, delivered by Professor Christopher F. Baum. The seminar will cover the most important topics of econometrics, including an in-depth look at panel data analysis, as well as how to estimate models in Stata and the programming capabilities of Stata. The seminar will be split evenly between theory and practice in Stata.