install.packages("devtools")
::install_github("hgoers/polisciols") devtools
Multiple Linear Regression
Set up
You will need to install a new package that I built for this class. Because it is not published on CRAN, you need to download it from Github directly. To do this, you also need devtools
.
Run the following to install these packages:
You then need to load all the relevant packages into your current R session:
library(tidyverse)
library(broom)
library(modelsummary)
library(marginaleffects)
library(janitor)
library(ggdist)
library(polisciols)
The economic benefits of justice
Appel and Loyle (2012) (two wonderful UMD alumni) explore the determinants of foreign direct investment (FDI) flows into and out of post-conflict states. States that are emerging from civil war often have an acute need for foreign and stable sources of capital. However, multinational corporations and other foreign commercial actors are likely to view post-conflict states as high risk countries in which to invest: the risk of a return to violence and instability is often high in the immediate aftermath of a civil war. Understanding this, leaders of post-conflict states often attempt to decrease this perceived risk.
Appel and Loyle argue that leaders can successfully do this by establishing post-conflict justice (PCJ) institutions. These institutions impose both domestic and reputational costs on post-conflict leaders. These costs allow leaders to signal their commitment to minimizing the risk of a return to violence and instability to foreign commercial actors. This is a great paper and I highly encourage you to take a look at their argument in detail.
Their argument focuses on leaders’ attempts to change foreign commercial actors’ perceptions of the risk of a return to violence. We cannot directly observe these commercial actors’ perceptions. However, we can observe the outcome of these perceptions: investment. If Appel and Loyle’s argument is correct, we should observe higher levels of FDI investment coming into post-conflict states that establish PCJ institutions compared to those that do not, on average and holding all else constant.
Appel and Loyle find strong evidence of this. We will replicate and modify this empirical work. In doing so, we will become more familiar with the underlying mechanics of multiple linear regression and strengthen our ability to interpret these models.
Let’s get started!
Net FDI inflows to post-conflict states
Appel and Loyle provide us with data on 95 different post-conflict states. These include all states that had internal armed conflicts, including internationalized conflicts, that resulted in at least 25 battle-related deaths and were settled between 1970 and 2001.
We can access these data through polisciols::ebj
:
head(ebj)
id ccode country_name pcj net_fdi_inflows gdp_per_capita
1 71 41 Haiti No institutions -9.8000 1182.498
2 71 41 Haiti No institutions 6.6000 1088.680
3 154 52 Trinidad & Tobago No institutions 510.1589 7742.736
4 102 70 Mexico No institutions -340.6904 6894.704
5 102 70 Mexico No institutions 6460.7998 7780.053
6 67 90 Guatemala PCJ institutions 431.3800 3061.873
gdp gdp_per_capita_growth ex_rate_fluc cap_account_openness labor
1 8407079981 -2.1324685 0.000000 -0.7681904 68.5
2 8055393763 -14.8832922 1.776603 -0.0871520 67.9
3 9542938026 1.9370972 0.000000 -1.1305820 55.2
4 628418000000 -7.8634830 1.978028 1.1804080 59.8
5 730752000000 5.2345648 1.129686 1.1804080 61.3
6 31339424077 0.6277104 1.046489 1.2642760 63.1
f_life_exp polity2 pol_constraints conflict_duration damage
1 55.02233 7 0.00 1 0.000000
2 56.09750 -7 0.00 1 12.855902
3 72.61483 9 0.84 1 -2.787351
4 74.67104 4 0.39 1 0.000000
5 75.23493 6 0.39 1 -8.949999
6 67.46067 8 0.43 31 -43.714600
peace_agreement victory cold_war
1 No agreement Victory Post-Cold War
2 No agreement Victory Post-Cold War
3 No agreement Victory Post-Cold War
4 Peace agreement No victory Post-Cold War
5 No agreement No victory Post-Cold War
6 Peace agreement No victory Post-Cold War
They focus on the 10-year period immediately following the conflict’s conclusion. This is the period in which we would expect foreign commercial actors to perceive the risk of a return to violence and instability to be greatest and; therefore, the period in which leaders’ attempts to quash these perceptions to be most relevant.
Therefore, each row in this data set represents a single post-conflict state. The data is generally a summary of the 10-years post-conflict. For example net_fdi_inflows
provides us with the total net FDI inflows each post-conflict state received in that 10-year period.
You can learn more about each variable using the following command:
?ebj
Their outcome of interest is net FDI inflows. Remember, if their argument is correct we would expect post-conflict states that have established a PCJ institution to have higher net FDI inflows than than states that did not. Let’s take a look at those net inflows:
summary(ebj$net_fdi_inflows)
Min. 1st Qu. Median Mean 3rd Qu. Max.
-1858.914 -0.572 38.290 759.146 408.250 24836.787
ggplot(ebj, aes(x = net_fdi_inflows)) +
geom_histogram() +
theme_minimal() +
scale_x_continuous(labels = scales::dollar)
There appears to be a clear outlier: a net FDI inflow of $24,836.79 million for Russia. You can see that this is pulling up the average net FDI inflows well above the median inflow across our group of post-conflict states. We will keep this in because this is not the focus of today’s session, but I would encourage you to explore whether these findings are sensitive to its inclusion.
We learn from this that some states have greater FDI outflows than inflows (resulting in negative net FDI inflows). Indonesia had the greatest negative inflow, with net inflows of -$1,858.91 million.
Other states received greater foreign investments than they invested elsewhere, resulting in positive net FDI inflows. In fact, on average, post-conflict states received more inflows than outflows.
PCJ institutions
Appel and Loyle focus on whether states that have PCJ institutions successfully attract greater net FDI inflows than those that do not. Therefore, their main explanatory variable is the existance of PCJ institutions.
This variable, pcj
, indicates whether the the state established a PCJ institution within five years following the end of the conflict. What proportion of states did this?
tabyl(ebj, pcj)
pcj n percent
No institutions 77 0.8105263
PCJ institutions 18 0.1894737
The majority of post-conflict states did not have PCJ institutions.
Relationship between PCJ institutions and net FDI inflows
Of the states that do have a PCJ institution, do they tend to receive higher net FDI inflows than their less reconciliatory counterparts?
|>
ebj group_by(pcj) |>
summarise(avg_net_fdi = mean(net_fdi_inflows))
# A tibble: 2 × 2
pcj avg_net_fdi
<fct> <dbl>
1 No institutions 425.
2 PCJ institutions 2189.
Yes! States that established a PCJ institution received higher net FDI inflows, on average, than those states that did not. This difference looks large! It’s $1,763.52 million in net inflows!
But what are the chances this difference is simply the product of chance? We can formally test this question using linear regression:
<- lm(net_fdi_inflows ~ pcj, data = ebj)
m
modelsummary(m,
coef_rename = c("pcjPCJ institutions" = "PCJ institutions established"),
stars = T)
(1) | |
---|---|
(Intercept) | 425.006 |
(323.874) | |
PCJ institutions established | 1763.517* |
(744.049) | |
Num.Obs. | 95 |
R2 | 0.057 |
R2 Adj. | 0.047 |
AIC | 1784.5 |
BIC | 1792.2 |
Log.Lik. | −889.253 |
F | 5.618 |
RMSE | 2811.91 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 |
According to this model, what is the average predicted net FDI inflows for countries that did not establish PCJ institution?
Post-conflict states that did not establish a PCJ institution received, on average, net FDI inflows of $425.01 million in the 10-year period after conflict.
To get this, I looked at the intercept coefficient which tells us the average predicted value of our outcome variable when all explanatory variables are equal to zero or their baseline category.
According to this model, what is the average predicted net FDI inflows for countries that did establish PCJ institution?
In contrast, states that did establish PCJ institution received, on average, net FDI inflows of $2,188.52 million.
In other words:
\[ Average\ net\ FDI\ inflows = \beta_0 + \beta_1 Institution\ established + \epsilon \]
When an institution was established (i.e. \(Institution\ established = 1\)):
\[ Average\ net\ FDI\ inflows = 425.01 + 1763.52 * 1 + \epsilon \\ \]
\[ Average\ net\ FDI\ inflows = 2188.52 \]
Substantive significance
This finding is substantively significant. These countries are recovering from conflict: their economies are really weak. Leaders are often very keen to find stable and reliable sources of funding to promote and strengthen their battered economies. This average difference of $1,763.52 million is; therefore, likely to incentivize this policy.
Further, if we look at the range of plausible values of the coefficient on the existence of a PCJ institution (i.e. the confidence interval) we can see states that did not establish a commission plausibly receive no net FDI inflows in this post-conflict period. In fact, it is plausible that investment leaves their economies: this net inflow can be negative. On the other hand, states that do establish a commission enjoy, on average, billions in net FDI inflows. For an economy struggling to establish indigenous production in a post-conflict setting, this can be critical to their long-term economic development. Again, this is further proof of the substantive significance of this relationship.
Visualizing the full range of plausible coefficients can be a great way to communicate your findings:
plot_predictions(m, condition = "pcj") +
geom_hline(yintercept = 0, colour = "grey") +
labs(x = NULL,
y = "Net FDI inflows (USD million)") +
scale_y_continuous(labels = scales::dollar) +
theme_minimal()
But what about other factors that shape net FDI inflows?
One of Appel and Loyle’s major contributions is their critique of approaches to estimating net FDI inflows that focus only on economic factors. They argue that there are several political factors that are significant determinants of other countries’ and foreign firms’ willingness to invest in these war-torn countries.
This critique is very valid, but it suggests that there are many different things influencing this outcome of interest, including economic factors. We have only looked at the political! The economists might turn around and accuse us of doing the very thing Appel and Loyle accused them of!
Let’s add some of those economic factors into our model. We will start with an intuitive one: individuals’ economic wealth (measured as GDP per capita). I expect that foreign firms will be more willing to invest larger sums of money into economies with richer citizens. These citizens will be more willing and able to purchase the goods and services provided by those firms.
Therefore, I hypothesize that the greater a state’s GDP per capita, the larger its net FDI inflows. I expect this to be the case regardless of whether the state has established a PCJ institution. Let’s test this claim:
<- lm(net_fdi_inflows ~ pcj + gdp_per_capita, data = ebj)
m
modelsummary(m,
coef_rename = c("pcjPCJ institutions" = "PCJ institutions established",
"gdp_per_capita" = "GDP per capita (current USD)"),
stars = T)
(1) | |
---|---|
(Intercept) | −319.173 |
(396.052) | |
PCJ institutions established | 1673.288* |
(714.015) | |
GDP per capita (current USD) | 0.318** |
(0.105) | |
Num.Obs. | 95 |
R2 | 0.142 |
R2 Adj. | 0.124 |
AIC | 1777.5 |
BIC | 1787.7 |
Log.Lik. | −884.742 |
F | 7.638 |
RMSE | 2681.52 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 |
We continue to find a positive and statistically significant relationship between net FDI inflows and the establishment of a PCJ institution. This model also accounts for the association between the state’s GDP per capita and those inflows.
The intercept here is not informative on its own. It tells us the estimated average net FDI inflows for states that do not have PCJ institutions and in which citizens had a GDP per capita of $0. Although the majority of states did not establish an institution, there are no states in the world that have a GDP per capita of $0.
Let’s instead focus on the other coefficients. We find that states that established a PCJ institution received, on average, net FDI inflows of $1,673.29 million more than states that did not in the 10-year period after conflict. This is a slightly smaller estimated difference than we found in the model that did not account for individuals’ average wealth, but it remains large.
We also find that an increase in the GDP per capita of a state of $1,000 is associated with an increase of $318.45 million in net FDI inflow. This is consistent with our expectations that, holding all else constant, a country with a richer population is a more attractive investment destination than a country that has poorer citizens.
Using this richer model
We now have a richer understanding of the determinants of net FDI inflows to post-conflict countries. We have accounted for both economic and political determinants of those flows. Although it is often useful to look at the estimated relationship of each of those variables individually (as we did just above), we often learn more by looking at the whole model in context.
As usual, one of the easiest ways to communicate this is through a visualization. Let’s look at the predicted net FDI inflows for post-conflict countries that established and did not establish PCJ institutions across a range of plausible GDP per capita values:
plot_predictions(m, condition = c("gdp_per_capita", "pcj")) +
labs(x = "GDP per capita (USD)",
y = "Net FDI inflows (USD million)") +
scale_y_continuous(labels = scales::dollar) +
scale_x_continuous(labels = scales::dollar) +
theme_minimal()
We can see the positive relationship between GDP per capita and net FDI inflows and that across any given value of GDP per capita states that have established a PCJ institution start and stay at a higher predicted inflow compared to those that did not.
The full model
Appel and Loyle control for many more economic and political factors shaping net FDI inflows. Remember, they are arguing that the extensive literature that looks at the determinants of FDI flows to post-conflict states failed to account for this important political factor. However, that same literature did a very good job of identifying the economic factors, including countries’ economic development, size, and growth rates, that shape this outcome. They don’t dispute that these factors are also important, they just argue that we should also think about the role PCJ institutions play in shaping foreign firms’ beliefs about the risk of returning to violence.
So, let’s account for these other factors:
<- lm(net_fdi_inflows ~ pcj + gdp_per_capita + gdp + gdp_per_capita_growth +
m + ex_rate_fluc + labor + f_life_exp +
cap_account_openness + polity2 + damage + conflict_duration + peace_agreement +
pol_constraints + cold_war,
victory data = ebj)
modelsummary(m,
coef_rename = c("pcjPCJ institutions" = "PCJ institutions established",
"gdp_per_capita" = "GDP per capita (current USD)",
"gdp" = "GDP (current USD)",
"gdp_per_capita_growth" = "GDP per capita growth rate (%)",
"cap_account_openness" = "Capital account openness",
"ex_rate_fluc" = "Exchange rate fluctuation",
"labor" = "Labor force participation (%)",
"f_life_exp" = "Average female life expectancy (years)",
"pol_constraints" = "Political constraints",
"polity2" = "Regime type (Polity score)",
"damage" = "Pre-conflict GDP lost",
"conflict_duration" = "Conflict duration (years)",
"peace_agreementPeace agreement" = "Peace agreement",
"victoryVictory" = "Decisive victory",
"cold_warCold War" = "Cold War"),
stars = T)
(1) | |
---|---|
(Intercept) | −1278.322 |
(2852.294) | |
PCJ institutions established | 1960.282** |
(702.992) | |
GDP per capita (current USD) | −0.111 |
(0.133) | |
GDP (current USD) | 0.000*** |
(0.000) | |
GDP per capita growth rate (%) | 37.400 |
(23.239) | |
Capital account openness | 198.823 |
(201.590) | |
Exchange rate fluctuation | −42.516** |
(13.888) | |
Labor force participation (%) | 9.844 |
(25.528) | |
Average female life expectancy (years) | 3.475 |
(32.993) | |
Political constraints | 2557.954+ |
(1459.599) | |
Regime type (Polity score) | −90.169 |
(56.309) | |
Pre-conflict GDP lost | 28.379** |
(10.242) | |
Conflict duration (years) | 0.811 |
(35.543) | |
Peace agreement | −1215.137 |
(793.826) | |
Decisive victory | −33.969 |
(650.725) | |
Cold War | 81.531 |
(654.092) | |
Num.Obs. | 95 |
R2 | 0.514 |
R2 Adj. | 0.422 |
AIC | 1749.5 |
BIC | 1792.9 |
Log.Lik. | −857.752 |
RMSE | 2018.34 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 |
Even when we account for all the political, economic, and conflict-related factors that the literature previously identified to be important, we still find that the existence of PCJ institution substantively and statistically significantly shapes the net FDI inflows of post-conflict states.
Including categorical variables with multiple categories
Appel and Loyle look at a number of political factors driving net FDI inflows to post-conflict states. They include a measure of regime type: the country’s Polity score. This score measures a state’s regime type along a 21-point scale from -10 (perfect autocracy) to 10 (perfect democracy). Broadly speaking, political scientists have usefully broken this spectrum down into three regime types: democracies, hybrid regimes, and autocracies.
Let’s modify their measure of regime type to reflect these broad categories, instead of treating it as a continuous variable:
<- ebj |>
ebj mutate(regime_type = case_when(polity2 > 5 ~ "Democracy",
< -5 ~ "Autocracy",
polity2 TRUE ~ "Hybrid regime"),
regime_type = factor(regime_type, levels = c("Autocracy",
"Hybrid regime",
"Democracy")))
I have a theoretical reason to do this. I suspect that there is not a clear linear relationship between investors’ confidence in a post-conflict state and its regime type when we treat regime type as a continuous spectrum moving linearly from autocracies to democracies. In other words, I don’t think that moving one Polity score away from being an autocracy to being a democracy would have a consistent effect on investor confidence (and; therefore, net FDI inflows). Appel and Loyle’s model agrees with me: the regime type variable is not statistically significant.
Rather, I suspect that strong democracies and strong autocracies provide the political stability required to comfort foreign investors. These investors believe that the strong control democrats and autocrats have over their citizens and institutions reduces the risk that the country will re-enter into conflict. However, hybrid regimes do not tend to have this level of control. Investors are; therefore, less likely to invest in post-conflict countries with hybrid regimes.
Let’s test this!
We now have a categorical variable with three categories: democracy, hybrid regime, and autocracy. We have thus far largely dealt with binary categorical variables (voted or not, Southern or not, female or not). How do we use and interpret multiple categorical variables in regression analysis?
Happily, the intuition remains the same as with our binary categorical variables. We hold one category out as our baseline category and then compare the associated effects of the other categories to this one.
Let’s step through that using a stripped back version of our model:
<- lm(net_fdi_inflows ~ pcj + regime_type, data = ebj)
m
modelsummary(m,
coef_rename = c("pcjPCJ institutions" = "PCJ institutions established",
"regime_typeDemocracy" = "Democracy",
"regime_typeHybrid regime" = "Hybrid"),
stars = T)
(1) | |
---|---|
(Intercept) | 214.555 |
(497.984) | |
PCJ institutions established | 1816.169* |
(764.093) | |
Hybrid | 553.893 |
(667.644) | |
Democracy | −148.124 |
(809.623) | |
Num.Obs. | 95 |
R2 | 0.068 |
R2 Adj. | 0.037 |
AIC | 1787.4 |
BIC | 1800.1 |
Log.Lik. | −888.688 |
F | 2.217 |
RMSE | 2795.25 |
+ p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 |
You’ll note that autocracies are missing from our regression table. This is because they are being held out as our baseline category. Their effect on net FDI inflows is captured by the intercept coefficient.
We often say that the intercept coefficient represents the predicted average value of our outcome of interest when all independent variables are set to zero. It might be useful for you to think of your baseline category as taking on the value zero. For example, we can think of autocracy = 0
.
Our model suggests that autocracies (regime_type = "Autocracy"
) that have not established a commission (pcj = "No institutions"
) have a predicted average net FDI inflow of $214.55 million.
The coefficients on democracies and hybrid regimes need to be interpreted in relation to autocracies (their baseline category). From our model, we can see that the coefficient for democracies is negative and the coefficient for hybrid regimes is positive. That means that, on average, democracies tend to receive less net FDI inflows than autocracies and hybrid regimes tend to receive more net FDI inflows than autocracies.
Predictions with multiple categorical variables
Using our model, what do we predict to be the net FDI inflows for democracies, autocracies, and hybrid regimes that either have a PCJ institutions or do not?
First, let’s create a table with each possible combination of these two variables of interest:
<- tibble(pcj = factor(c("No institutions",
new_data "PCJ institutions"))) |>
cross_join(
tibble(regime_type = factor(c("Autocracy", "Democracy", "Hybrid regime")))
)
new_data
# A tibble: 6 × 2
pcj regime_type
<fct> <fct>
1 No institutions Autocracy
2 No institutions Democracy
3 No institutions Hybrid regime
4 PCJ institutions Autocracy
5 PCJ institutions Democracy
6 PCJ institutions Hybrid regime
Then we can use our model to predict what we expect a hypothetical state with each of these combinations of characteristics to receive in net FDI inflows:
<- augment(m, newdata = new_data)
pred
pred
# A tibble: 6 × 3
pcj regime_type .fitted
<fct> <fct> <dbl>
1 No institutions Autocracy 215.
2 No institutions Democracy 66.4
3 No institutions Hybrid regime 768.
4 PCJ institutions Autocracy 2031.
5 PCJ institutions Democracy 1883.
6 PCJ institutions Hybrid regime 2585.
That’s a bit unweildy. Let’s visualize it!
ggplot(pred, aes(x = .fitted, y = pcj, colour = regime_type)) +
geom_point(size = 5) +
theme_minimal() +
labs(x = "Predicted net FDI inflows (USD, million)",
y = NULL,
colour = "Regime type")
We can clearly see that states that established PCJ institutions received, on average, larger net FDI inflows than states that did not, no matter their regime type. Further and completely counter to my hypothesis, hybrid regimes have, on average, the highest net FDI inflows compared to democracies and autocracies even when we account for whether the state has a PCJ institution.
I’m not too worried: none of these coefficients are anywhere close to being statistically significant. I suspect that there is a more complex relationship underlying commercial actors’ beliefs about the stability of hybrid regimes, democracies, and autocracies and its effect on net investment flows. But I hope this serves as a good illustration of how we can use multiple categorical variables in our analyses.