Working with Panel Data

GVPT722

Today’s class

  1. What are panel data?
  2. Introduce intuition behind panel data
  3. Introduce random effects
  4. Worked example of fitting and interpreting a multi-level model

Looking at the same observations over time

  • Countries’ voting behavior in the UN in the post-Cold War period;

  • Countries’ propensity to go to war with one another;

  • Democratic backsliding across regions or countries;

  • A politician’s voting behavior;

  • An individual’s turnout behavior over all elections in which they are eligible to vote.

Looking at the same observations over time

More generally…

  • Actor 1’s behaviour in time \(t\), \(t+1\), …, \(t + n\);

  • Actor 2’s behaviour in time \(t\), \(t+1\), …, \(t + n\);

  • Actor N’s behaviour in time \(t\), \(t+1\), …, \(t + n\).
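
As a minimal sketch of this structure, the snippet below builds a small balanced panel in long format: one row per actor per time period. The actor names and the number of periods are hypothetical and chosen only for illustration.

```r
# Hypothetical actors and time periods, for illustration only
actors <- c("Actor 1", "Actor 2", "Actor 3")
periods <- 0:4  # t, t + 1, ..., t + 4

# One row per actor-period combination (a balanced panel in long format)
panel <- expand.grid(time = periods, actor = actors, stringsAsFactors = FALSE)
panel <- panel[, c("actor", "time")]

# Each actor is observed once in every period
table(panel$actor)
```

The country-year tibble used in the rest of the class has the same shape, with country as the actor and year as the time period.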

Relationship between health and wealth

What is the relationship between people’s health and wealth?

Countries over time

# A tibble: 4,557 × 7
   iso3c country  region              year gdp_per_cap log_gdp life_exp
   <chr> <chr>    <ord>              <dbl>       <dbl>   <dbl>    <dbl>
 1 ZWE   Zimbabwe Sub-Saharan Africa     0        565.    6.34     44.7
 2 ZWE   Zimbabwe Sub-Saharan Africa     1        569.    6.34     42.0
 3 ZWE   Zimbabwe Sub-Saharan Africa     2        529.    6.27     44.6
 4 ZWE   Zimbabwe Sub-Saharan Africa     3        474.    6.16     43.4
 5 ZWE   Zimbabwe Sub-Saharan Africa     4        477.    6.17     44.5
 6 ZWE   Zimbabwe Sub-Saharan Africa     5        471.    6.15     44.8
 7 ZWE   Zimbabwe Sub-Saharan Africa     6        441.    6.09     45.4
 8 ZWE   Zimbabwe Sub-Saharan Africa     7        425.    6.05     45.6
 9 ZWE   Zimbabwe Sub-Saharan Africa     8        352.    5.86     46.7
10 ZWE   Zimbabwe Sub-Saharan Africa     9        762.    6.64     48.1
# ℹ 4,547 more rows

Modelling life expectancy over time

\[ Average\ life\ expectancy = \beta_0 + \beta_1Year + \epsilon \]

             (1)
(Intercept)  67.516***
             (0.253)
Year         0.289***
             (0.022)
Num.Obs.     4409
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001

Accounting for the structure of our data

Region-level intercepts

\[ Average\ life\ expectancy = \beta_0 + \beta_1Year + \beta_2Region + \epsilon \]

Country-level intercepts

\[ Average\ life\ expectancy = \beta_0 + \beta_1Year + \beta_2Country + \epsilon \]
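
One way to fit country-level intercepts with ordinary least squares is to add country as a categorical predictor, which R expands into one dummy variable per country (minus a reference country). The sketch below uses simulated data as a stand-in for the class data: `sim_df`, `shift`, and every numeric value in the simulation are assumptions for illustration.

```r
set.seed(722)

# Simulated stand-in for the country-year data (hypothetical values)
sim_df <- expand.grid(year = 0:9, country = paste0("C", 1:20))
shift <- rnorm(20, sd = 8)  # each country's own starting point
sim_df$life_exp <- 67 + 0.3 * sim_df$year +
  shift[as.integer(sim_df$country)] +
  rnorm(nrow(sim_df), sd = 1.5)

# Pooled model: one intercept for every country
m_pooled <- lm(life_exp ~ year, data = sim_df)

# Country-level intercepts: one dummy per country, minus the reference
m_country <- lm(life_exp ~ year + country, data = sim_df)

length(coef(m_country))  # intercept + year + 19 country dummies

# Residual standard deviation before and after adding country
c(pooled = sigma(m_pooled), country = sigma(m_country))
```

In this simulation the residual standard deviation falls once the country dummies are added, because the country-specific starting points no longer sit in the error term.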

Random effects: a brief introduction

\[ Average\ life\ expectancy = \beta_0 + \beta_1Year + \epsilon \]

Where:

\[ \epsilon \sim \mathcal{N}(0,\sigma) \]

This error captures all the variance that is not explained by our IV: annual growth in average life expectancy.

Focusing on that error

Country-specific starting points

Your error is too full

\[ Average\ life\ expectancy = \beta_0 + \beta_1 Year + (b_{country} + \epsilon) \]

For example:

\[ Average\ life\ expectancy_{Australia} = \beta_0 + \beta_1 Year + (b_{Australia} + \epsilon) \]

Australia-specific error

Country-specific error

\[ Average\ life\ expectancy = \beta_0 + \beta_1 Year + (b_{country} + \epsilon) \]

Where:

\[ b_{country} \sim \mathcal{N}(0,\tau) \]
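
To make the two error components concrete, here is a small simulation sketch. The values of `tau` and `sigma`, and the numbers of countries and years, are assumptions chosen for illustration rather than estimates from the class data.

```r
set.seed(722)

n_countries <- 50
n_years <- 10
tau <- 8      # assumed SD of the country-specific errors
sigma <- 1.5  # assumed SD of the remaining observation-level error

# One draw of b_country per country, shared by all of its observations
b_country <- rnorm(n_countries, mean = 0, sd = tau)
country <- rep(seq_len(n_countries), each = n_years)

# One draw of epsilon per country-year observation
epsilon <- rnorm(n_countries * n_years, mean = 0, sd = sigma)

# The composite error from the equation: (b_country + epsilon)
total_error <- b_country[country] + epsilon

sd(total_error)                         # roughly sqrt(tau^2 + sigma^2)
sd(tapply(total_error, country, mean))  # roughly tau
```

Because every observation from the same country shares one draw of `b_country`, errors within a country move together, which is the structure a single pooled error term ignores.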

What can we do with this new power?

  1. Give each country its own starting point (intercept);

  2. Give each country its own slope;

  3. Give each country its own starting point and slope.

Cool!

Country-specific starting point

\[ Average\ life\ expectancy = \beta_0 + b_{0,country} + \beta_1 Year + \epsilon \]

Country-specific slope

\[ Average\ life\ expectancy = \beta_0 + (\beta_1 + b_{1,country}) Year + \epsilon \]

Country-specific starting point and slope

\[ Average\ life\ expectancy = \beta_0 + b_{0,country} + (\beta_1 + b_{1,country}) Year + \epsilon \]
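
The three options map onto three random-effects terms in `lme4` formula syntax. The sketch below fits each one to simulated data: `sim_df`, `b0`, `b1`, and all numeric values are assumptions for illustration, not the class data.

```r
library(lme4)

set.seed(722)

# Simulated stand-in for the country-year data (hypothetical values)
n_countries <- 30
sim_df <- expand.grid(year = 0:9, country = paste0("C", seq_len(n_countries)))
idx <- as.integer(sim_df$country)
b0 <- rnorm(n_countries, sd = 8)    # country-specific shifts in starting point
b1 <- rnorm(n_countries, sd = 0.2)  # country-specific shifts in slope
sim_df$life_exp <- (67 + b0[idx]) + (0.3 + b1[idx]) * sim_df$year +
  rnorm(nrow(sim_df), sd = 1.5)

# 1. Country-specific starting point (random intercept)
m_int <- lmer(life_exp ~ year + (1 | country), data = sim_df)

# 2. Country-specific slope with a common starting point (random slope only)
m_slope <- lmer(life_exp ~ year + (0 + year | country), data = sim_df)

# 3. Country-specific starting point and slope
m_both <- lmer(life_exp ~ year + (1 + year | country), data = sim_df)

# Each country's own intercept and year slope under model 3
head(coef(m_both)$country)
```

The term to the left of `|` says what is allowed to vary (`1` for the intercept, `year` for the slope) and the term to the right says which grouping variable it varies across.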

Why not just include a country variable?

Including a country variable is good at capturing each country's different starting point, but it falls down in capturing the uncertainty around those starting points.

  • Over-contextualized;

  • Making inferences based on very little information.

Multi-level modelling

Focusing on country-specific starting points:

\[ Average\ life\ expectancy = \beta_0 + b_{0,country} + \beta_1 Year + \epsilon \]

In R:

m_multi <- lme4::lmer(life_exp ~ year + (1 | country), data = full_df)
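
For a sketch of how the pieces of the equation can be pulled out of a fitted model, the example below fits the same formula to simulated data and uses `lme4`'s own accessor functions. `sim_df`, `m_sim`, and the simulated values are assumptions standing in for `full_df` and `m_multi`.

```r
library(lme4)

set.seed(722)

# Simulated stand-in for full_df (hypothetical values)
sim_df <- expand.grid(year = 0:9, country = paste0("C", 1:30))
b0 <- rnorm(30, sd = 8)
sim_df$life_exp <- 67 + b0[as.integer(sim_df$country)] +
  0.3 * sim_df$year + rnorm(nrow(sim_df), sd = 1.5)

m_sim <- lmer(life_exp ~ year + (1 | country), data = sim_df)

fixef(m_sim)                   # estimates of beta_0 and beta_1
head(ranef(m_sim)$country)     # estimated b_{0,country} for each country
head(coef(m_sim)$country)      # beta_0 + b_{0,country}, alongside beta_1
as.data.frame(VarCorr(m_sim))  # SDs of b_{0,country} and epsilon (sdcor column)
```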

Life expectancy over time

                         Simple     Multi-level
(Intercept)              67.516     67.568
                         (0.253)    (0.591)
Year                     0.289      0.290
                         (0.022)    (0.004)
SD (Intercept country)              8.584
SD (Observations)                   1.503
Num.Obs.                 4409       4409

Fixed effects

The fixed effects describe the overall relationship within our data.

broom.mixed::tidy(m_multi, effects = "fixed")
# A tibble: 2 × 5
  effect term        estimate std.error statistic
  <chr>  <chr>          <dbl>     <dbl>     <dbl>
1 fixed  (Intercept)   67.6     0.591       114. 
2 fixed  year           0.290   0.00374      77.6

Group-level effects

The group-level effects tell us how this relationship differs between groups in our data (here: countries).

broom.mixed::tidy(m_multi, effects = "ran_pars")
# A tibble: 2 × 4
  effect   group    term            estimate
  <chr>    <chr>    <chr>              <dbl>
1 ran_pars country  sd__(Intercept)     8.58
2 ran_pars Residual sd__Observation     1.50

Group-level effects

As we move from country to country, by how much does average life expectancy typically differ?

# A tibble: 1 × 4
  effect   group   term            estimate
  <chr>    <chr>   <chr>              <dbl>
1 ran_pars country sd__(Intercept)     8.58

What is the remaining universal/global variance unexplained by our IV?

# A tibble: 1 × 4
  effect   group    term            estimate
  <chr>    <chr>    <chr>              <dbl>
1 ran_pars Residual sd__Observation     1.50

Substantive interpretation

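The between-country variation makes up a large proportion of the total variation left over in our model:

\[ \frac{SD\ (Intercept\ country)}{Total\ SD} = \frac{8.584\ years}{8.584\ years + 1.503\ years} = 85.1\% \]

This between-country variation (a standard deviation of 8.58 years) is also much larger than the estimated effect of our IV (0.29 years of life expectancy per year): which country you are in matters much more than which year you are in.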
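
The share can be reproduced directly from the two standard deviations reported for the multi-level model. The sketch below also computes the same share on the variance scale, which is how the intraclass correlation is usually defined. The variance-scale figure is an addition for comparison, and the variable names are hypothetical.

```r
sd_country <- 8.584  # SD (Intercept country) from the multi-level model
sd_resid <- 1.503    # SD (Observations) from the multi-level model

# Share computed on the standard deviation scale
sd_share <- sd_country / (sd_country + sd_resid)

# Share computed on the variance scale (intraclass correlation)
icc <- sd_country^2 / (sd_country^2 + sd_resid^2)

round(c(sd_share = sd_share, icc = icc), 3)
# sd_share      icc
#    0.851    0.970
```

Either way, the substantive conclusion is the same: most of the variation the model leaves unexplained by year sits between countries rather than within them.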

Model performance

Relationship between health and wealth

Log transformation
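
As a quick check on the transformation, the `log_gdp` column in the country-year tibble appears to be the natural log of `gdp_per_cap`. The sketch below assumes that reading and uses the first four Zimbabwe values as printed in the tibble, which are rounded.

```r
# GDP per capita as printed for the first four Zimbabwe rows (rounded in print)
gdp_per_cap <- c(565, 569, 529, 474)

# log() in R is the natural logarithm by default
log_gdp <- log(gdp_per_cap)

round(log_gdp, 2)
# 6.34 6.34 6.27 6.16
```

These match the printed `log_gdp` values for the same rows.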

Relationship between health and wealth over time

Linear regression

\[ Average\ life\ expectancy = \beta_0 + \beta_1GDP\ per\ capita + \beta_2Year + \epsilon \]

Linear regression

                               (1)
(Intercept)                    31.197
                               (0.431)
GDP per capita (US$, logged)   4.568
                               (0.051)
Year                           0.039
                               (0.013)
Num.Obs.                       4232

Model performance

Accounting for country-specific variance

                               (1)
(Intercept)                    56.521
                               (0.781)
GDP per capita (US$, logged)   1.376
                               (0.077)
Year                           0.217
                               (0.006)
SD (Intercept country)         6.945
SD (Observations)              1.459

Model performance

What did we gain?

A lot of the unexplained variance in the simple linear regression model is explained by country-specific differences:

\[ \frac{SD\ (Intercept\ country)}{Total\ SD} = \frac{6.945\ years}{6.945\ years + 1.459\ years} \approx 0.83 \]

Model performance across time

Model performance across GDP per capita

Summary

Today you learnt how to:

  • Account for the structure of your data;

  • Isolate fixed and group-level effects in the relationship between your outcome and explanatory variables.