When it comes to insect and invert borne zoonotic diseases, edge forest habitat is thought of in the literature to be high risk. But because the worlds forests are only getting more fragmented, the area of edge forest is increasing. Lyme disease particularly, has been found to be a major offender, as the land type lends itself well to strong tick vectors like deer.

This has been a strong motivator for studying Lyme disease incidence. If the wildlife-urban interface (WUI) is getting larger, it’s likely that more people will choose to live in those areas, and we may expect the rate of Lyme disease to rise. It’s for this reason that habitat fragmentation has logically been used as an indicator for incidence. But studies on this topic, while generally supporting this idea, have been inconsistent.

MacDonald et al. (2018), suggests that the reason for this inconsistency could be an endogeneity bias, in which as much as habitat fragmentation influences Lyme prevalence, it also works the other way around, in which Lyme incidence influences where people choose to live, which ironically can lead to more fragmentation. This violates a very important assumption that makes most statistical models work, in which the predictor variables must not be correlated with the error term. Because this is the method used in most other studies on this topic, this suggest some bias in the literature.

They therefore try a different approach, and instead use an instrumental variable (IV) model. An IV is a variable that correlates with the predictor variable in a model. We refer to this as endogeneity, and the variable as our endogenous variable. This method can be used to predict Lyme disease by proximity, as habitat fragmentation on it’s own is unreliable.

MacDonald et al. (2018) utilizes this concept by including a density restriction index, which indicates lot size regulation in an area. The thing about the instrumental variable approach to modeling is that it’s a 2 stage process where first, the instrument is regressed on our habitat fragmentation (our endogenous variable), and second, the the endogenous variable is regressed on the dependent variable (Lyme disease incidence). This is a very robust way performing this analysis.

Using this strategy, the paper attempts to model Lyme disease incidence in the WUI of the US northeastern deciduous forests. Along, with and IV approach, it uses a fixed effects approach and checks for robustness with several F-tests.

library(tidyverse)
library(ivreg)
library(plm)
library(jtools)
library(sandwich)
library(lmtest)
library(fixest)
library(kableExtra)
library(ivDiag)

lyme <- read_csv(here::here('posts', 
                            'macdonald_2018_breakdown', 
                            'data', 
                            'MacDonaldetal_JAE_2018.csv'))
lyme_na <- lyme %>% filter(is.na(DensityResIndex) == FALSE)

Workflow

Variable names

Variable	Role in Model	Description
DensityResIndex	IV	Density restriction index, indicates the average lot size regulation for a given observation
PercWuiPop	Endogenous Variable	Percent of county population residing in WUI
LogLyme0103_1113	Dependent Variable	Lyme disease incidence
MeanPatchArea	Exogenous Covariate	Average forest patch area by county
EdgeDensity	Exogenous Covariate	Forest edge density by county
PercLandForest	Exogenous Covariate	% forested area in county
StateYrGroup	Fixed Variable	Identifier for each observation based on state and year
AgStatDistrict	Clustering Variable (for CMSE)	USDA agricultural statistics districts

First and second stage IV regression

A general overview of the methods employed in the paper…

The instrument is first regressed on the endogenous variable

stage_1 <- lm(PercWuiPop ~ DensityResIndex,
              data = lyme_na)

Then the first stage fitted values are regressed against the dependant variable.

stage_2 <- lm(LogLyme0103_1113 ~ PercWuiPop, 
              data = lyme_na)

Both stages include covariates, although the strength of them will be tested later on.

stage_1 <- lm(PercWuiPop ~ DensityResIndex +
              MeanPatchArea + EdgeDensity + PercLandForest, 
              data = lyme_na)

stage_2 <- lm(LogLyme0103_1113 ~ PercWuiPop +
              MeanPatchArea + EdgeDensity + PercLandForest, 
              data = lyme_na)

The ivreg package can run both the stage 1 and stage 2 regressions intertwined in a single function, making this process easy.

lyme_iv <- ivreg(LogLyme0103_1113 ~ MeanPatchArea + EdgeDensity + PercLandForest | # covariates
                 PercWuiPop | # endogenous variable
                 DensityResIndex, # instrumental variable
  data = lyme_na)

summary(lyme_iv)


Call:
ivreg(formula = LogLyme0103_1113 ~ MeanPatchArea + EdgeDensity + 
    PercLandForest | PercWuiPop | DensityResIndex, data = lyme_na)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5836 -0.9984  0.1557  1.0503  3.3776 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     1.789588   0.186272   9.607  < 2e-16 ***
PercWuiPop      0.037167   0.008414   4.417 1.22e-05 ***
MeanPatchArea  -0.042153   0.017051  -2.472   0.0138 *  
EdgeDensity    -0.002510   0.007850  -0.320   0.7493    
PercLandForest  0.007982   0.018260   0.437   0.6622    

Diagnostic tests:
                 df1 df2 statistic  p-value    
Weak instruments   1 509     62.97 1.35e-14 ***
Wu-Hausman         1 508     17.09 4.17e-05 ***
Sargan             0  NA        NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.459 on 509 degrees of freedom
Multiple R-Squared: -0.005559,  Adjusted R-squared: -0.01346 
Wald test: 28.41 on 4 and 509 DF,  p-value: < 2.2e-16

Fixed effects

The authors also ran the second stage as a fixed effects model. This approach is one where you group your data to account for unobserved differences between them. It does this by using a dummy variable (0 or 1) for each class for a group to indicate which data are from which class.

Here, the models were fixed by year or state & year.

fixed_effects <- lm(LogLyme0103_1113 ~ PercWuiPop + as_factor(StateYrGroup) - 1, data = lyme_na)

summary(fixed_effects)


Call:
lm(formula = LogLyme0103_1113 ~ PercWuiPop + as_factor(StateYrGroup) - 
    1, data = lyme_na)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2041 -0.6909 -0.0239  0.6520  4.9930 

Coefficients:
                          Estimate Std. Error t value Pr(>|t|)    
PercWuiPop                0.017941   0.001989   9.022  < 2e-16 ***
as_factor(StateYrGroup)1  3.454231   0.420940   8.206 2.04e-15 ***
as_factor(StateYrGroup)2  2.879821   0.421244   6.836 2.43e-11 ***
as_factor(StateYrGroup)3  2.920626   0.648817   4.501 8.44e-06 ***
as_factor(StateYrGroup)4  3.983158   0.648824   6.139 1.72e-09 ***
as_factor(StateYrGroup)5  0.598705   0.340043   1.761   0.0789 .  
as_factor(StateYrGroup)6  2.711908   0.342733   7.913 1.70e-14 ***
as_factor(StateYrGroup)7  2.458755   0.328560   7.483 3.39e-13 ***
as_factor(StateYrGroup)8  3.046376   0.328395   9.277  < 2e-16 ***
as_factor(StateYrGroup)9  2.471677   0.322881   7.655 1.04e-13 ***
as_factor(StateYrGroup)10 3.369002   0.321252  10.487  < 2e-16 ***
as_factor(StateYrGroup)11 1.505099   0.178967   8.410 4.53e-16 ***
as_factor(StateYrGroup)12 2.742846   0.179086  15.316  < 2e-16 ***
as_factor(StateYrGroup)13 0.968439   0.455144   2.128   0.0339 *  
as_factor(StateYrGroup)14 3.004004   0.456960   6.574 1.26e-10 ***
as_factor(StateYrGroup)15 2.364315   0.267596   8.835  < 2e-16 ***
as_factor(StateYrGroup)16 3.100802   0.266760  11.624  < 2e-16 ***
as_factor(StateYrGroup)17 0.980690   0.188115   5.213 2.74e-07 ***
as_factor(StateYrGroup)18 2.117406   0.187057  11.320  < 2e-16 ***
as_factor(StateYrGroup)19 1.807620   0.198379   9.112  < 2e-16 ***
as_factor(StateYrGroup)20 2.624127   0.197436  13.291  < 2e-16 ***
as_factor(StateYrGroup)21 0.442376   0.381141   1.161   0.2463    
as_factor(StateYrGroup)22 2.936382   0.383114   7.665 9.74e-14 ***
as_factor(StateYrGroup)23 1.525581   0.185235   8.236 1.64e-15 ***
as_factor(StateYrGroup)24 3.005449   0.185549  16.198  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.123 on 489 degrees of freedom
Multiple R-squared:  0.8878,    Adjusted R-squared:  0.8821 
F-statistic: 154.8 on 25 and 489 DF,  p-value: < 2.2e-16

The paper states that the fixed-effects had very little to know effect on the estimate, only changing it by +-0.002.

F-test and Robustness Checks

In the most general sense, a F-test looks for statistical significance by comparing variances. But in a modeling context, a F-test can be used to estimate how much the addition of 1 or more variables improves the model. We do this by comparing our restricted and unrestricted models.

The general equation:

\[ F = \frac{\text{SSE}_R - \text{SSE}_{UR} / e}{\text{SSE}_{UR} / (n - k)} \]

Where:

\(\text{SSE}_{UR}\) is the full model with every variable.
\(\text{SSE}_{R}\) is a model with one or more variables excluded. The excluded variables are being tested for their strength.
\(\text e\) is the restrictions or the total number of omitted variables
\(\text k\) is the total number of variables in the regression, including the intercept.
\(\text n\) is the sample size.

They performed this test on the first stage of the IV approach. To accomplish this manually in R…

# get unrestricted model
ur_model <- lm(PercWuiPop ~ DensityResIndex
               + MeanPatchArea + EdgeDensity + PercLandForest 
              ,data = lyme_na)

# sse unrestricted
sse_ur <- sum((fitted(ur_model) - lyme_na$PercWuiPop)^2)

# restricted models
r_model <- lm(PercWuiPop ~ MeanPatchArea + EdgeDensity + PercLandForest 
              ,data = lyme_na)

# sse restricted
sse_r <- sum((fitted(r_model) - lyme_na$PercWuiPop)^2)

# non-robust f test
F_std <- ((sse_r - sse_ur) / 1) / 
           (sse_ur / (nrow(lyme_na) - 5))

cat("F value: ", F_std, "\n")

F value:  62.96987

This can also be accomplished with an ANOVA…

f_stat <- anova(r_model, ur_model)
cat("F value with ANOVA: ", f_stat$F[2])

F value with ANOVA:  62.96987

A F-value above 10 indicates a strong instrument, and this F is significantly above 10.

You would then obtain a critical value.

\(\text ResDF_{UR}\) is derived from sample size of dataset - total num of variables in model
\(\text ResDF_{R}\) is derived from the previously calculated Residual Df, minus the number of restrictions.
\(\text {Df}_1\) is therefore derived from ResDF_UR - variables in model.
\(\text {Df}_2\) is k - \(\text {Df}_1\) - 1

We will just use the qf() function to calculate it.

With a df₂ this large, it would be approximately 2.20. But we can also calculate it with qf():

df1 <- summary(ur_model)$coefficients %>% nrow()

cat("F distribution critical value: ", qf(0.95, f_stat$Df[2], df1 - f_stat$Df[2] - 1))

F distribution critical value:  10.12796

The problem with doing the F-test this way, is that it’s not robust. It assumes homoscedasticity and i.i.d, which because the data has both a time and spatial component, likely isn’t true. To account for this, the paper also performs a more robust version of the F-test called the effective F, from Olea & Pflueger (2013). This test balances the instrumental variables and endogenous variables by a weighted covariance matrix. The matrix accounts for unequal variance and clustering. Although clustering was already accounted for as the error term for these models are clustered mean squared errors (CMSE).

\(F_{\text{eff}} = \frac{(Y'X) \hat{W}^{-1}_2 (Z'X)}{K \cdot n}\)

\(\text Z\) is an instrumental variable vector. \(\text X\) is an endogenous variable vector. \(\text Z', X'\) are transposes of the prior variables (as in row to column, or vise versa) \(\hat{W}^{-1}_2\) is the covariance matrix \(\text K\) is number of instruments \(\text n\) is sample size

The crux of effective F is the \(\tau\) parameter. A higher tau means lower critical values and less strict criteria. For lower \(\tau\)s, the opposite is true. A generally accepted \(\tau\) is 10%.

This can be done simply with eff_F() from the ivDiag package:

effective_f <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", controls = c("PercLandForest", "EdgeDensity", "MeanPatchArea"), cl = "AgStatDistrict")

cat("Effective F Stat:", effective_f)

Effective F Stat: 11.4094

Clustered Mean Squared Errors

I had mentioned earlier in this blog that clustering should not be a problem when doing the non-robust F-test. This is because this study uses clustered mean squared errors (CMSE) as it’s error term. We can calculate them with vcovCL() from the sandwich package, and get a summary/hypothesis test with the coeftest() from the lmtest package.

You funnel in a model and a clustering variable within the dataset:

# Calculate clustered standard errors
clustered_se <- vcovCL(ur_model, cluster = ~ AgStatDistrict)
# Hypothesis testing with clustered standard errors
coeftest(ur_model, vcov = clustered_se)


t test of coefficients:

                Estimate Std. Error t value  Pr(>|t|)    
(Intercept)     -4.42576    3.44729 -1.2838 0.1997827    
DensityResIndex 23.08206    6.83351  3.3778 0.0007867 ***
MeanPatchArea   -0.14486    0.37884 -0.3824 0.7023420    
EdgeDensity      0.35138    0.18181  1.9326 0.0538361 .  
PercLandForest   0.80993    0.45888  1.7650 0.0781565 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results

This study created a variety of different models and tested different combinations of covariates. The models used were:

OLS
Year Fixed Effects
State * Year Fixed Effects
IV
IV with Year Fixed Effects

And the different combinations of covariates were: * None (Bivariate model) * % Forest & Edge Density * % Forest & Mean Patch Size * Mean Patch Size & Edge Density * All covariates

Here, we run all models…

Code

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                                    OLS                                   ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Bivariate models                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ols_biv <- lm(LogLyme0103_1113 ~ PercWuiPop, data = lyme)
ols_biv_clust <- ols_biv %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(ols_biv, vcov = .)


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Multivariate models                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# %forest + edge
ols_multi_1 <- lm(LogLyme0103_1113 ~ PercWuiPop + EdgeDensity + PercLandForest, data = lyme)
ols_multi_1_clust <- ols_multi_1 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(ols_multi_1, vcov = .)

# %Forest + patch area
ols_multi_2 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + PercLandForest, data = lyme)
ols_multi_2_clust <- ols_multi_2 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(ols_multi_2, vcov = .)

# mean patch + edge
ols_multi_3 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity, data = lyme)
ols_multi_3_clust <- ols_multi_3 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(ols_multi_3, vcov = .)

# %Forest + mean patch + edge
ols_multi_4 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity + PercLandForest, data = lyme)
ols_multi_4_clust <- ols_multi_4 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(ols_multi_4, vcov = .)

     #                Combine all models into list                 ~~~
     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


ols_models <- list(ols_biv = ols_biv_clust[2,1:2], 
                ols_multi_1 = ols_multi_1_clust[2:nrow(ols_multi_1_clust),1:2], 
                ols_multi_2 = ols_multi_2_clust[2:nrow(ols_multi_2_clust),1:2], 
                ols_multi_3 = ols_multi_3_clust[2:nrow(ols_multi_3_clust),1:2], 
                ols_multi_4 = ols_multi_4_clust[2:nrow(ols_multi_4_clust),1:2])





##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                                  YEAR FE                                 ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Bivariate models                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

yearfe_biv <- lm(LogLyme0103_1113 ~ PercWuiPop + as_factor(Year) - 1, data = lyme)
yearfe_biv_clust <- yearfe_biv %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(yearfe_biv, vcov = .)


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Multivariate models                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# %forest + edge
yearfe_multi_1 <- lm(LogLyme0103_1113 ~ PercWuiPop + EdgeDensity + PercLandForest + as_factor(Year) - 1, data = lyme)
yearfe_multi_1_clust <- yearfe_multi_1 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(yearfe_multi_1, vcov = .)

# %Forest + patch area
yearfe_multi_2 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + PercLandForest + as_factor(Year) - 1, data = lyme)
yearfe_multi_2_clust <- yearfe_multi_2 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(yearfe_multi_2, vcov = .)

# mean patch + edge
yearfe_multi_3 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity + as_factor(Year) - 1, data = lyme)
yearfe_multi_3_clust <- yearfe_multi_3 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(yearfe_multi_3, vcov = .)

# %Forest + mean patch + edge
yearfe_multi_4 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity + PercLandForest + as_factor(Year) - 1, data = lyme)
yearfe_multi_4_clust <- yearfe_multi_4 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(yearfe_multi_4, vcov = .)

     #                Combine all models into list                 ~~~
     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


year_fe_models <- list(yearfe_biv = yearfe_biv_clust[1, c("Estimate", "Std. Error")], 
                yearfe_multi_1 = yearfe_multi_1_clust[1:3, c("Estimate", "Std. Error")], 
                yearfe_multi_2 = yearfe_multi_2_clust[1:3, c("Estimate", "Std. Error")], 
                yearfe_multi_3 = yearfe_multi_3_clust[1:3, c("Estimate", "Std. Error")], 
                yearfe_multi_4 = yearfe_multi_4_clust[1:4, c("Estimate", "Std. Error")])






##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                              STATE + YEAR FE                             ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~






##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Bivariate models                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

stateyearfe_biv <- lm(LogLyme0103_1113 ~ PercWuiPop + as_factor(StateYrGroup) - 1, data = lyme)
stateyearfe_biv_clust <- stateyearfe_biv %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(stateyearfe_biv, vcov = .)


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Multivariate models                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# %forest + edge
stateyearfe_multi_1 <- lm(LogLyme0103_1113 ~ PercWuiPop + EdgeDensity + PercLandForest + as_factor(StateYrGroup) - 1, data = lyme)
stateyearfe_multi_1_clust <- stateyearfe_multi_1 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(stateyearfe_multi_1, vcov = .)

# %Forest + patch area
stateyearfe_multi_2 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + PercLandForest + as_factor(StateYrGroup) - 1, data = lyme)
stateyearfe_multi_2_clust <- stateyearfe_multi_2 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(stateyearfe_multi_2, vcov = .)

# mean patch + edge
stateyearfe_multi_3 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity + as_factor(StateYrGroup) - 1, data = lyme)
stateyearfe_multi_3_clust <- stateyearfe_multi_3 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(stateyearfe_multi_3, vcov = .)

# %Forest + mean patch + edge
stateyearfe_multi_4 <- lm(LogLyme0103_1113 ~ PercWuiPop + MeanPatchArea + EdgeDensity + PercLandForest + as_factor(StateYrGroup) - 1, data = lyme)
stateyearfe_multi_4_clust <- stateyearfe_multi_4 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(stateyearfe_multi_4, vcov = .)

     #                Combine all models into list                 ~~~
     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


stateyearfe_models <- list(stateyearfe_biv = stateyearfe_biv_clust[1, c("Estimate", "Std. Error")], 
                stateyearfe_multi_1 = stateyearfe_multi_1_clust[1:3, 1:2], 
                stateyearfe_multi_2 = stateyearfe_multi_2_clust[1:3, 1:2], 
                stateyearfe_multi_3 = stateyearfe_multi_3_clust[1:3, 1:2], 
                stateyearfe_multi_4 = stateyearfe_multi_4_clust[1:4, 1:2])






##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                                     IV                                   ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~







##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Bivariate models                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

iv_biv <- ivreg(LogLyme0103_1113 ~ PercWuiPop |
  DensityResIndex,
  data = lyme_na)
iv_biv_clust <- iv_biv %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_biv, vcov = .)


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Multivariate models                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# %forest + edge
iv_multi_1 <- ivreg(LogLyme0103_1113 ~ EdgeDensity + PercLandForest | 
                        PercWuiPop | 
                        DensityResIndex, 
                    data = lyme_na)
iv_multi_1_clust <- iv_multi_1 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_multi_1, vcov = .)

# %Forest + patch area
iv_multi_2 <- ivreg(LogLyme0103_1113 ~ MeanPatchArea + PercLandForest | 
                        PercWuiPop | 
                        DensityResIndex, 
                    data = lyme_na)
iv_multi_2_clust <- iv_multi_2 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_multi_2, vcov = .)

# mean patch + edge
iv_multi_3 <- ivreg(LogLyme0103_1113 ~ MeanPatchArea + EdgeDensity | 
                        PercWuiPop | 
                        DensityResIndex, 
                    data = lyme_na)
iv_multi_3_clust <- iv_multi_3 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_multi_3, vcov = .)

# %Forest + mean patch + edge
iv_multi_4 <- ivreg(LogLyme0103_1113 ~ MeanPatchArea + EdgeDensity + PercLandForest | 
                        PercWuiPop | 
                        DensityResIndex, 
                    data = lyme_na)
iv_multi_4_clust <- iv_multi_4 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_multi_4, vcov = .)

     #                Combine all models into list                 ~~~
     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

iv_models <- list(iv_biv = iv_biv_clust[1, c("Estimate", "Std. Error")], 
                iv_multi_1 = iv_multi_1_clust[2:4, 1:2], 
                iv_multi_2 = iv_multi_2_clust[2:4, 1:2], 
                iv_multi_3 = iv_multi_3_clust[2:4, 1:2], 
                iv_multi_4 = iv_multi_4_clust[2:5, 1:2])








##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                                IV + YEAR FE                              ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~






##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Bivariate models                              ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

iv_yearfe_biv <- ivreg(LogLyme0103_1113 ~ PercWuiPop  + as_factor(Year) - 1 | 
                           DensityResIndex + as_factor(Year) - 1, 
                       data = lyme_na
  )
iv_yearfe_biv_clust <- iv_yearfe_biv %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>%
    coeftest(iv_yearfe_biv, vcov = .)


##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                            Multivariate models                           ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# %forest + edge
iv_yearfe_multi_1 <- ivreg(
    LogLyme0103_1113 ~ PercWuiPop + PercLandForest + EdgeDensity + as_factor(Year) - 1 | 
                       DensityResIndex + PercLandForest + EdgeDensity + as_factor(Year) - 1, 
                       data = lyme_na
  )
iv_yearfe_multi_1_clust <- iv_yearfe_multi_1 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_yearfe_multi_1, vcov = .)

# %Forest + patch area
iv_yearfe_multi_2 <- ivreg(
    LogLyme0103_1113 ~ PercWuiPop + PercLandForest + MeanPatchArea + as_factor(Year) - 1 | 
                       DensityResIndex + PercLandForest + MeanPatchArea + as_factor(Year) - 1, 
                       data = lyme_na
  )
iv_yearfe_multi_2_clust <- iv_yearfe_multi_2 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_yearfe_multi_2, vcov = .)

# mean patch + edge
iv_yearfe_multi_3 <- ivreg(
    LogLyme0103_1113 ~ PercWuiPop + EdgeDensity + MeanPatchArea + as_factor(Year) - 1 | 
                       DensityResIndex + EdgeDensity + MeanPatchArea + as_factor(Year) - 1 , 
                       data = lyme_na
  )
iv_yearfe_multi_3_clust <- iv_yearfe_multi_3 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_yearfe_multi_3, vcov = .)

# %Forest + mean patch + edge
iv_yearfe_multi_4 <- ivreg(
    LogLyme0103_1113 ~ PercWuiPop + PercLandForest + EdgeDensity + MeanPatchArea + as_factor(Year) - 1| 
                       DensityResIndex + PercLandForest + EdgeDensity + MeanPatchArea + as_factor(Year) - 1, 
                       data = lyme_na
  )
iv_yearfe_multi_4_clust <- iv_yearfe_multi_4 %>% 
    vcovCL(cluster = ~ AgStatDistrict) %>% 
    coeftest(iv_yearfe_multi_4, vcov = .)

     #                Combine all models into list                 ~~~
     #~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

iv_yearfe_models <- list(iv_yearfe_biv = iv_yearfe_biv_clust[1, c("Estimate", "Std. Error")], 
                iv_yearfe_multi_1 = iv_yearfe_multi_1_clust[1:3, 1:2], 
                iv_yearfe_multi_2 = iv_yearfe_multi_2_clust[1:3, 1:2], 
                iv_yearfe_multi_3 = iv_yearfe_multi_3_clust[1:3, 1:2], 
                iv_yearfe_multi_4 = iv_yearfe_multi_4_clust[1:4, 1:2])






##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##                                                                            ~~
##                          FUNCTION TO CREATE MODELS                       ----
##                                                                            ~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~





make_table <- function(models, type_label){

    table_output = data_frame()
    
    for(i in 1:length(models)){
        
        if(length(models[[i]]) < 3){
            
            table_output <- table_output %>% 
                bind_rows(models[[i]]) %>% 
              mutate(model = type_label, 
                     type = "bivariate",
                     covariate = "PercWuiPop",
                     num = 0)
        }
        else{
            table_output <- table_output %>% 
                 bind_rows(
            models[[i]] %>% as_tibble()
         %>%
              mutate(model = type_label,
                     type = "multivariate",
                     covariate = names(models[[i]][,1]),
                     num = i - 1
                     )
                 )
        }
    }
    
    return(table_output)

}




ols_table <- make_table(ols_models, "OLS") %>% 
    select(-model) %>% 
    rename(Std_Error = "Std. Error") %>%
    mutate(Estimate = Estimate %>% signif(2), 
           Std_Error = Std_Error %>% signif(2))

year_fe_table <- make_table(year_fe_models, "Year FE") %>% 
    select(-model) %>% 
    rename(Std_Error = "Std. Error") %>%
    mutate(Estimate = Estimate %>% signif(2), 
           Std_Error = Std_Error %>% signif(2))

stateyearfe_table <- make_table(stateyearfe_models, "State + Year FE") %>% 
    select(-model) %>% 
    rename(Std_Error = "Std. Error") %>%
    mutate(Estimate = Estimate %>% signif(2), 
           Std_Error = Std_Error %>% signif(2))

iv_table <- make_table(iv_models, "IV") %>% 
    select(-model) %>% 
    rename(Std_Error = "Std. Error") %>%
    mutate(Estimate = Estimate %>% signif(2), 
           Std_Error = Std_Error %>% signif(2))

iv_yearfe_table <- make_table(iv_yearfe_models, "IV + Year FE") %>% 
    select(-model) %>% 
    rename(Std_Error = "Std. Error") %>%
    mutate(Estimate = Estimate %>% signif(2), 
           Std_Error = Std_Error %>% signif(2))

Table creation for every model

# A tibble: 14 × 5
   Estimate Std_Error type         covariate        num
      <dbl>     <dbl> <chr>        <chr>          <dbl>
 1   0.017     0.0034 bivariate    PercWuiPop         0
 2   0.0051    0.0038 multivariate PercWuiPop         1
 3   0.03      0.0061 multivariate EdgeDensity        1
 4  -0.013     0.009  multivariate PercLandForest     1
 5   0.0066    0.0037 multivariate PercWuiPop         2
 6  -0.059     0.013  multivariate MeanPatchArea      2
 7   0.057     0.012  multivariate PercLandForest     2
 8   0.0052    0.0037 multivariate PercWuiPop         3
 9  -0.014     0.0077 multivariate MeanPatchArea      3
10   0.026     0.0048 multivariate EdgeDensity        3
11   0.005     0.0038 multivariate PercWuiPop         4
12  -0.022     0.02   multivariate MeanPatchArea      4
13   0.023     0.0094 multivariate EdgeDensity        4
14   0.0095    0.023  multivariate PercLandForest     4

# A tibble: 14 × 5
   Estimate Std_Error type         covariate        num
      <dbl>     <dbl> <chr>        <chr>          <dbl>
 1   0.016     0.0035 bivariate    PercWuiPop         0
 2   0.0046    0.0038 multivariate PercWuiPop         1
 3   0.03      0.0061 multivariate EdgeDensity        1
 4  -0.012     0.0091 multivariate PercLandForest     1
 5   0.0061    0.0038 multivariate PercWuiPop         2
 6  -0.06      0.013  multivariate MeanPatchArea      2
 7   0.058     0.012  multivariate PercLandForest     2
 8   0.0047    0.0038 multivariate PercWuiPop         3
 9  -0.014     0.0078 multivariate MeanPatchArea      3
10   0.027     0.0049 multivariate EdgeDensity        3
11   0.0045    0.0038 multivariate PercWuiPop         4
12  -0.022     0.019  multivariate MeanPatchArea      4
13   0.023     0.0095 multivariate EdgeDensity        4
14   0.01      0.022  multivariate PercLandForest     4

# A tibble: 14 × 5
   Estimate Std_Error type         covariate        num
      <dbl>     <dbl> <chr>        <chr>          <dbl>
 1   0.018     0.0038 bivariate    PercWuiPop         0
 2   0.0057    0.0038 multivariate PercWuiPop         1
 3   0.029     0.0057 multivariate EdgeDensity        1
 4  -0.0069    0.011  multivariate PercLandForest     1
 5   0.0066    0.0038 multivariate PercWuiPop         2
 6  -0.061     0.013  multivariate MeanPatchArea      2
 7   0.064     0.014  multivariate PercLandForest     2
 8   0.0064    0.0039 multivariate PercWuiPop         3
 9  -0.011     0.0089 multivariate MeanPatchArea      3
10   0.027     0.0049 multivariate EdgeDensity        3
11   0.0055    0.0038 multivariate PercWuiPop         4
12  -0.028     0.019  multivariate MeanPatchArea      4
13   0.02      0.0086 multivariate EdgeDensity        4
14   0.021     0.023  multivariate PercLandForest     4

# A tibble: 14 × 5
   Estimate Std_Error type         covariate        num
      <dbl>     <dbl> <chr>        <chr>          <dbl>
 1  1.4        0.4    bivariate    PercWuiPop         0
 2  0.038      0.014  multivariate PercWuiPop         1
 3  0.0093     0.0089 multivariate EdgeDensity        1
 4 -0.031      0.014  multivariate PercLandForest     1
 5  0.037      0.013  multivariate PercWuiPop         2
 6 -0.038      0.019  multivariate MeanPatchArea      2
 7  0.0031     0.026  multivariate PercLandForest     2
 8  0.037      0.014  multivariate PercWuiPop         3
 9 -0.035      0.012  multivariate MeanPatchArea      3
10  0.00038    0.011  multivariate EdgeDensity        3
11  0.037      0.014  multivariate PercWuiPop         4
12 -0.042      0.022  multivariate MeanPatchArea      4
13 -0.0025     0.011  multivariate EdgeDensity        4
14  0.008      0.027  multivariate PercLandForest     4

# A tibble: 14 × 5
   Estimate Std_Error type         covariate        num
      <dbl>     <dbl> <chr>        <chr>          <dbl>
 1   0.037     0.0087 bivariate    PercWuiPop         0
 2   0.038     0.014  multivariate PercWuiPop         1
 3  -0.031     0.014  multivariate PercLandForest     1
 4   0.0095    0.0089 multivariate EdgeDensity        1
 5   0.037     0.013  multivariate PercWuiPop         2
 6   0.0038    0.026  multivariate PercLandForest     2
 7  -0.038     0.019  multivariate MeanPatchArea      2
 8   0.037     0.014  multivariate PercWuiPop         3
 9   0.0007    0.011  multivariate EdgeDensity        3
10  -0.035     0.012  multivariate MeanPatchArea      3
11   0.037     0.014  multivariate PercWuiPop         4
12   0.0084    0.027  multivariate PercLandForest     4
13  -0.0023    0.011  multivariate EdgeDensity        4
14  -0.042     0.022  multivariate MeanPatchArea      4

These tables contain all estimates and errors for every tested model.

A series of robust checks were also performed on the first stage for 5 different models containing the different combinations of covariates specified previously. These tests are the non-robust F test, the Effective F test, and the Specification test. Although here I will just perform the Effective F Weak Instruments Test.

Code

##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ Model 1: No Covariates  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

no_covariates <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", cl = "AgStatDistrict", FE = "as_factor(Year)")



##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ Model 2: % Forest + Edge Density  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

forest_edge <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", controls = c("PercLandForest", "EdgeDensity"), cl = "AgStatDistrict")



##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ Model 3: % Forest + Mean Patch Area  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

forest_patch <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", controls = c("PercLandForest", "MeanPatchArea"), cl = "AgStatDistrict")




##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ Model 4: Edge Density + Mean Patch Area  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

edge_patch <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", controls = c("EdgeDensity", "MeanPatchArea"), cl = "AgStatDistrict")




##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
##  ~ Model 5: All Covariates  ----
##~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

all_covariates <- eff_F(lyme_na, Y = "LogLyme0103_1113", D = "PercWuiPop", Z = "DensityResIndex", controls = c("PercLandForest", "EdgeDensity", "MeanPatchArea"), cl = "AgStatDistrict")




#....................Printing the F statistics...................
cat(" No covariates effective F stat: ", no_covariates, "**", "\n", 
    "% forest + edge density effective F stat: ", forest_edge, "\n", 
    "% forest + Mean Patch Area effective F stat: ", forest_patch, "\n", 
    "Edge density + mean match area effective F stat: ", edge_patch, "\n", 
    "All covariates effective F stat: ", all_covariates, "\n"
    )

 No covariates effective F stat:  23.6541 ** 
 % forest + edge density effective F stat:  11.9416 
 % forest + Mean Patch Area effective F stat:  10.5339 
 Edge density + mean match area effective F stat:  12.8009 
 All covariates effective F stat:  11.4094

The only model that is significant to \(\alpha < 0.05\) is the no covariates model. The other models would be significant if we accept \(\tau < 0.30\), but that is very low precision.

The other robustness tests were not performed because I could not replicate their values, likely due to my own error. This being said, we can still check the F statistic of each of the models, as it is build into summary().

Code

no_covariates_f <- lm(PercWuiPop ~ DensityResIndex + # instrumental variable
       as_factor(Year), # fixed effects
  data = lyme_na) %>% 
    summary()

forest_edge_f <- lm(PercWuiPop ~ DensityResIndex + # instrumental variable
       MeanPatchArea + EdgeDensity + # covariates
       as_factor(Year), # fixed effects
  data = lyme_na) %>% 
    summary()

forest_patch_f <- lm(PercWuiPop ~ DensityResIndex + # instrumental variable
       MeanPatchArea + PercLandForest + # covariates
       as_factor(Year), # fixed effects
  data = lyme_na) %>% 
    summary()

edge_patch_f <- lm(PercWuiPop ~ DensityResIndex + # instrumental variable
       MeanPatchArea + EdgeDensity + # covariates
       as_factor(Year), # fixed effects
  data = lyme_na) %>% 
    summary()


all_covariates_f <- lm(PercWuiPop ~ DensityResIndex + # instrumental variable
       MeanPatchArea + EdgeDensity + PercLandForest + # covariates
       as_factor(Year), # fixed effects
  data = lyme_na) %>% 
    summary()


cat(" No covariates non-robust F stat: ", no_covariates_f$fstatistic[1], "**", "\n", 
    "% forest + edge density non-robust F stat: ", forest_edge_f$fstatistic[1], "\n", 
    "% forest + Mean Patch Area non-robust F stat: ", forest_patch_f$fstatistic[1], "\n", 
    "Edge density + mean match area non-robust F stat: ", edge_patch_f$fstatistic[1], "\n", 
    "All covariates non-robust F stat: ", all_covariates_f$fstatistic[1]
    )

 No covariates non-robust F stat:  37.18896 ** 
 % forest + edge density non-robust F stat:  141.0745 
 % forest + Mean Patch Area non-robust F stat:  140.3725 
 Edge density + mean match area non-robust F stat:  141.0745 
 All covariates non-robust F stat:  116.9124

These F statistics are likely very overinflated, but are significantly above 10.

Conclusion

The study seems pretty rock solid in terms of validating the assumptions of the model, as CMSE accounts for clustering and biases, the effective F accounts for potential heteroscedasticity, and the fixed effects account for any autocorrelation issues (even though it didn’t have much of an effect). Furthermore, the IV approach potentially mitigated any endogeneity bias.

The only mark against the study is that the effective F was not significant when adding the covariates at a \(\tau\) of 10%. It was significant without them, but omitting them could lead of omitted variable bias.

This study possibly suggests that lot size regulations are a good instrument to measure Lyme incidence. At the very least, it suggests that this methodology could be more effective than previously used ones, as the endogeneity bias makes directly using habitat fragmentation as a predictor unreliable.

Citation

BibTeX citation:

@online{paul_cohen2025,
  author = {Paul Cohen, Joshua and Overbye, Amanda and Mull, Josh},
  title = {MacDonald Et Al. (2018) {Breakdown}},
  date = {2025-03-23},
  url = {https://silkiemoth.github.io/},
  langid = {en}
}

For attribution, please cite this work as:

Paul Cohen, Joshua, Amanda Overbye, and Josh Mull. 2025. “MacDonald Et Al. (2018) Breakdown.” March 23, 2025. https://silkiemoth.github.io/.