Working with Panel Data
GVPT722
Today’s class
What are panel data?
Introduce intuition behind panel data
Introduce random effects
Worked example of fitting and interpreting a multi-level model
Looking at the same observations over time
Countries’ voting behavior in the UN in the post-Cold War period;
Countries’ propensity to go to war with one another;
Democratic backsliding across regions or countries;
A politician’s voting behavior;
An individual’s turnout behavior over all elections in which they are eligible to vote.
Looking at the same observations over time
More generally…
Actor 1’s behaviour at times \(t\), \(t+1\), …, \(t+n\);
Actor 2’s behaviour at times \(t\), \(t+1\), …, \(t+n\);
Actor N’s behaviour at times \(t\), \(t+1\), …, \(t+n\).
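A minimal sketch of what this structure looks like as a data frame, with one row per actor per time period (often called "long" format). The actor names and the five time periods are hypothetical and only illustrate the layout:

```r
# Hypothetical actors and time periods, for illustration only
panel <- expand.grid(time = 0:4, actor = c("Actor 1", "Actor 2", "Actor 3"))
panel <- panel[, c("actor", "time")]

nrow(panel)         # 3 actors x 5 periods = 15 rows
table(panel$actor)  # each actor is observed once per period
```

Each actor appears repeatedly, which is exactly why observations in panel data are not independent of one another.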
Relationship between health and wealth
What is the relationship between people’s health and wealth?
Countries over time
# A tibble: 4,557 × 7
   iso3c country  region              year gdp_per_cap log_gdp life_exp
   <chr> <chr>    <ord>              <dbl>       <dbl>   <dbl>    <dbl>
 1 ZWE   Zimbabwe Sub-Saharan Africa     0        565.    6.34     44.7
 2 ZWE   Zimbabwe Sub-Saharan Africa     1        569.    6.34     42.0
 3 ZWE   Zimbabwe Sub-Saharan Africa     2        529.    6.27     44.6
 4 ZWE   Zimbabwe Sub-Saharan Africa     3        474.    6.16     43.4
 5 ZWE   Zimbabwe Sub-Saharan Africa     4        477.    6.17     44.5
 6 ZWE   Zimbabwe Sub-Saharan Africa     5        471.    6.15     44.8
 7 ZWE   Zimbabwe Sub-Saharan Africa     6        441.    6.09     45.4
 8 ZWE   Zimbabwe Sub-Saharan Africa     7        425.    6.05     45.6
 9 ZWE   Zimbabwe Sub-Saharan Africa     8        352.    5.86     46.7
10 ZWE   Zimbabwe Sub-Saharan Africa     9        762.    6.64     48.1
# ℹ 4,547 more rows
Relationship between health and wealth
Modelling life expectancy over time
\[
Average\ life\ expectancy = \beta_0 + \beta_1Year + \epsilon
\]
Term          Estimate (SE)
(Intercept)   67.516*** (0.253)
Year          0.289*** (0.022)
Num.Obs.      4409
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001
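A model of this form could be fitted with base R's `lm()`. The sketch below uses simulated data, since the lecture's data set is not reproduced here: `sim_df`, the country labels, and every number used to generate it are hypothetical stand-ins.

```r
# Simulated stand-in for the lecture's data (all values are hypothetical)
set.seed(42)
sim_df <- expand.grid(year = 0:19, country = paste0("country_", 1:30))
country_shift <- rnorm(30, mean = 0, sd = 8)  # each country's own baseline
sim_df$life_exp <- 67 + country_shift[as.integer(sim_df$country)] +
  0.3 * sim_df$year + rnorm(nrow(sim_df), mean = 0, sd = 1.5)

# Pooled linear regression: one intercept and one slope shared by all countries
m_linear <- lm(life_exp ~ year, data = sim_df)
coef(m_linear)
```

Note that this model ignores which country each observation comes from, so all of the between-country differences end up in the error term.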
Accounting for the structure of our data
Region-level intercepts
\[
Average\ life\ expectancy = \beta_0 + \beta_1Year + \beta_2Region + \epsilon
\]
Country-level intercepts
\[
Average\ life\ expectancy = \beta_0 + \beta_1Year + \beta_2Country + \epsilon
\]
Random effects: a brief introduction
\[
Average\ life\ expectancy = \beta_0 + \beta_1Year + \epsilon
\]
Where:
\[
\epsilon \sim \mathcal{N}(0,\sigma)
\]
This error captures all of the variance in life expectancy that is not explained by our IV, year (the annual growth in average life expectancy).
Country-specific starting points
Your error is too full
\[
Average\ life\ expectancy = \beta_0 + \beta_1 Year + (b_{country} + \epsilon)
\]
For example:
\[
Average\ life\ expectancy_{Australia} = \beta_0 + \beta_1 Year + (b_{Australia} + \epsilon)
\]
Country-specific error
\[
Average\ life\ expectancy = \beta_0 + \beta_1 Year + (b_{country} + \epsilon)
\]
Where:
\[
b_{country} \sim \mathcal{N}(0,\tau)
\]
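A short worked consequence, assuming \(b_{country}\) and \(\epsilon\) are independent and reading \(\tau\) and \(\sigma\) as standard deviations (an assumption about the notation used here): the old, overfull error splits into a between-country part and a within-country part.
\[
Var(b_{country} + \epsilon) = \tau^2 + \sigma^2
\]
The larger \(\tau\) is relative to \(\sigma\), the more of the leftover variation comes from stable differences between countries rather than from observation-level noise.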
What can we do with this new power?
Give each country its own starting point (intercept);
Give each country its own slope;
Give each country its own starting point and slope.
Cool!
Country-specific starting point
\[
Average\ life\ expectancy = \beta_0 + b_{0,country} + \beta_1 Year + \epsilon
\]
Country-specific slope
\[
Average\ life\ expectancy = \beta_0 + (\beta_1 + b_{1,country}) Year + \epsilon
\]
Country-specific starting point and slope
\[
Average\ life\ expectancy = \beta_0 + b_{0,country} + (\beta_1 + b_{1,country}) Year + \epsilon
\]
Why not just include a country variable?
Including country as an ordinary predictor is good at capturing each country’s different starting point, but it falls short in capturing the uncertainty around those starting points.
Multi-level modelling
Focusing on country-specific starting points:
\[
Average\ life\ expectancy = \beta_0 + b_{0,country} + \beta_1 Year + \epsilon
\]
In R:
m_multi <- lme4::lmer(life_exp ~ year + (1 | country), data = full_df)
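The `(1 | country)` term is what gives each country its own starting point. As a sketch of how all three specifications from earlier might be written in lme4's formula syntax, the block below fits them to simulated data. The data frame `sim_df` and every value used to generate it are hypothetical, not the lecture's `full_df`, and the fits may print convergence or singular-fit messages on data like this.

```r
# Simulated stand-in for full_df (all values are hypothetical)
set.seed(1)
sim_df <- expand.grid(year = 0:19, country = paste0("country_", 1:30))
idx <- as.integer(sim_df$country)
b0 <- rnorm(30, mean = 0, sd = 8)    # country-specific intercept shifts
b1 <- rnorm(30, mean = 0, sd = 0.1)  # country-specific slope shifts
sim_df$life_exp <- 67 + b0[idx] + (0.3 + b1[idx]) * sim_df$year +
  rnorm(nrow(sim_df), mean = 0, sd = 1.5)

# Country-specific starting point (random intercept)
m_int <- lme4::lmer(life_exp ~ year + (1 | country), data = sim_df)

# Country-specific slope only (random slope, no random intercept)
m_slope <- lme4::lmer(life_exp ~ year + (0 + year | country), data = sim_df)

# Country-specific starting point and slope
m_both <- lme4::lmer(life_exp ~ year + (1 + year | country), data = sim_df)

lme4::fixef(m_both)  # overall intercept and slope
```

In each formula, the part before the `|` names what is allowed to vary, and the part after it names the grouping variable.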
Life expectancy over time
Term                     (1) Linear       (2) Multi-level
(Intercept)              67.516 (0.253)   67.568 (0.591)
Year                     0.289 (0.022)    0.290 (0.004)
SD (Intercept country)                    8.584
SD (Observations)                         1.503
Num.Obs.                 4409             4409
Fixed effects
The fixed effects describe the overall relationship within our data.
broom.mixed::tidy(m_multi, effects = "fixed")
# A tibble: 2 × 5
  effect term        estimate std.error statistic
  <chr>  <chr>          <dbl>     <dbl>     <dbl>
1 fixed  (Intercept)   67.6     0.591       114.
2 fixed  year           0.290   0.00374      77.6
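As a quick worked example using the multi-level estimates reported above (and assuming year counts years since the start of the panel, as in the data shown earlier), the predicted life expectancy for a typical country, one with \(b_{0,country} = 0\), ten years in is:
\[
67.568 + 0.290 \times 10 = 70.468\ years
\]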
Group-level effects
The group-level effects tell us how this relationship differs between groups in our data (here: countries).
broom.mixed::tidy(m_multi, effects = "ran_pars")
# A tibble: 2 × 4
  effect   group    term            estimate
  <chr>    <chr>    <chr>              <dbl>
1 ran_pars country  sd__(Intercept)     8.58
2 ran_pars Residual sd__Observation     1.50
Group-level effects
As we move from country to country, by how much does average life expectancy typically differ?
# A tibble: 1 × 4
  effect   group   term            estimate
  <chr>    <chr>   <chr>              <dbl>
1 ran_pars country sd__(Intercept)     8.58
How much variation remains across all observations once our IV and the country-specific starting points are accounted for?
# A tibble: 1 × 4
  effect   group    term            estimate
  <chr>    <chr>    <chr>              <dbl>
1 ran_pars Residual sd__Observation     1.50
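To put the country-level SD in context, here is a rough worked example. If the country-specific starting points are normally distributed, as the model assumes, about 95% of them should fall within 1.96 SDs of the overall intercept (67.568, from the fixed effects):
\[
67.568 \pm 1.96 \times 8.584 = 67.568 \pm 16.825 \approx [50.7,\ 84.4]\ years
\]
By comparison, the residual SD of 1.50 years says that individual observations sit quite close to their own country's line.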
Substantive interpretation
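The between-country variation makes up a large proportion of our total variation, measured here on the standard-deviation scale:
\[
\frac{SD\ (Intercept\ country)}{Total\ SD} = \frac{8.584\ years}{8.584\ years + 1.503\ years} = 85.1\%
\]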
The between-country SD (8.58 years) is also much larger than the estimated effect of our IV (0.29 years of life expectancy per year): which country you are in matters much more than which year you are in.
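The share above can be reproduced directly from the two reported SDs. The sketch below also computes the same share on the variance scale, a common alternative summary usually called the intraclass correlation (ICC). The ICC is not part of the original slides and is shown only for comparison; the variable names are hypothetical.

```r
# SDs reported for the multi-level model above
sd_country <- 8.584   # SD (Intercept country)
sd_resid   <- 1.503   # SD (Observations)

# Share on the SD scale, as computed in the slide
share_sd <- sd_country / (sd_country + sd_resid)

# Share on the variance scale (intraclass correlation); an added comparison
icc <- sd_country^2 / (sd_country^2 + sd_resid^2)

round(share_sd, 3)  # approximately 0.851
round(icc, 3)       # approximately 0.97
```

Both summaries point the same way: most of the variation left over after accounting for year sits between countries rather than within them.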
Relationship between health and wealth
Relationship between health and wealth over time
Linear regression
\[
Average\ life\ expectancy = \beta_0 + \beta_1 \log(GDP\ per\ capita) + \beta_2Year + \epsilon
\]
Linear regression
Term                           Estimate (SE)
(Intercept)                    31.197 (0.431)
GDP per capita (US$, logged)   4.568 (0.051)
Year                           0.039 (0.013)
Num.Obs.                       4232
Accounting for country-specific variance
Term                           Estimate (SE)
(Intercept)                    56.521 (0.781)
GDP per capita (US$, logged)   1.376 (0.077)
Year                           0.217 (0.006)
SD (Intercept country)         6.945
SD (Observations)              1.459
What did we gain?
A lot of the unexplained variance in the simple linear regression model is explained by country-specific differences:
\[
\frac{SD\ (Intercept\ country)}{Total\ SD} = \frac{6.945\ years}{6.945\ years + 1.459\ years} \approx 0.83
\]
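For a rough sense of what logged GDP per capita adds, the between-country SD can be compared across the two multi-level models reported in this class. This is a back-of-the-envelope comparison only, because the models with GDP appear to use fewer observations (4232 in the linear regression, versus 4409 in the year-only models):
\[
\frac{8.584 - 6.945}{8.584} = \frac{1.639}{8.584} \approx 0.19
\]
So the spread in countries' starting points shrinks by roughly a fifth once wealth enters the model, while most of it remains unexplained.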
Summary
Today you learnt how to: