This exercise continues our examination of the effect of TSPs pollution on infant mortality. Here
we explore violations of the assumptions under which the Gauss-Markov theorem guarantees
that OLS is BLUE. This exercise will help (or perhaps force) you to practice the solutions to
these violations that we have discussed in class.
Once again, feel free to work cooperatively, but each person is required to turn in his/her own
problem set that provides the solutions in his/her own words.
The unit of observation is the county and there are 462 observations of 19 variables. Each
observation records the change between 1972 and 1971 (i.e., the 1972 minus the 1971 value) for
each of the variables. The lone exception is the tbirth variable, which equals the sum of the 1971
and 1972 numbers of births. This variable should be used as a weight (in STATA language
this means w=tbirth) in ALL regressions in this exercise.
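As a minimal sketch of the weighting syntax, assuming the data set is already loaded in STATA: regress treats the shorthand [w=tbirth] as analytic weights, so the two commands below should produce the same estimates. The simple bivariate specification is used here only for illustration.

```stata
* Shorthand form used in this exercise; regress assumes analytic weights
regress dimr7271 dmtspgm [w=tbirth]

* Same regression with the weight type written out explicitly
regress dimr7271 dmtspgm [aweight=tbirth]
```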
The relevant variables (with descriptions in quotations) are:
dimr7271 “# inf death per 1,000 births 72-71”
tbirth “total births 71 & 72”
dwhite “% births, white mom 72-71”
dothr “% births, nonwhite/nonblack mom 72-71”
dfemale “% female births 72-71”
dedudad “father yrs of ed 72-71”
dedumom “mother yrs of ed 72-71”
dlwght “% births with weight<2,500 g 72-71”
dmaried “% mother married 72-71”
dunmard “% mother unmarried 72-71”
dagemom “mother age 72-71”
dpcare1 “% mom began month 1 or 2 72-71”
dpcare2 “% mom began 3rd month 72-71”
dpcare3 “% mom began 4-6th month 72-71”
dpcare4 “% mom began 7-9th month 72-71”
dpcinc “county-level per cap income 72-71”
dmtspgm “county-level tsps concen 72-71”
fstate “fips state code”
reg_tsp “=1 if county regulated for tsps”
a. Plot dimr7271 against dmtspgm. Does it look like there is an association between changes in
infant mortality and changes in TSPs? Repeat the plot with the weight set equal to tbirth. Now, does
there appear to be a relationship?
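One possible way to draw the two plots in part a. is sketched below. The hollow marker option is an illustrative choice, not a requirement of the exercise; with a weight, the marker size scales with tbirth.

```stata
* Unweighted scatterplot of the change in infant mortality against the change in TSPs
scatter dimr7271 dmtspgm

* Weighted scatterplot: marker area is proportional to tbirth;
* hollow circles (an assumed styling choice) keep overlapping counties visible
scatter dimr7271 dmtspgm [aweight=tbirth], msymbol(Oh)
```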
b. Based on the scatterplots from part a., is there any evidence of
homoskedasticity/heteroskedasticity in the model of the change in the infant mortality rate?
c. Suppose that there is heteroskedasticity in the residuals of the change imr regression. Is the
OLS estimator of the effect of TSPs unbiased and consistent? Efficient? Is the “conventional”
estimator of the variance of the estimated effect of TSPs unbiased/consistent?
d. Regress dimr7271 on dmtspgm. Is there evidence of a relationship here? Now add the
complete set of control variables (i.e., dwhite dothr dfemale dedudad dedumom dmaried
dunmard dagemom dlwght dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc). Have your
conclusions changed? What might explain the differences between these estimates and the
cross-sectional ones in the last problem set?
e. Now repeat the complete set of controls regression, but use the “robust” subcommand in
STATA to calculate the White heteroskedasticity-consistent standard errors (reg dimr7271
dmtspgm dwhite dothr dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc [w=tbirth], robust).
Explain briefly how these estimates of the standard errors are corrected for heteroskedasticity.
How do they compare to the “uncorrected” (conventional) LS estimates of the standard
errors? Is there evidence of heteroskedasticity?
f. Using the “predict” STATA command (predict [variable name], residual), save the residuals
from the LS complete set of controls regression. Now regress these residuals on the complete set
of controls. Explain why the R-squared and estimated coefficients from the regression are
all equal to zero.
g. Now let’s apply a more formal test for heteroskedasticity. Regress the squared values of the
residuals from the LS complete set of controls regression on the complete set of controls. Use
the R-squared to test for heteroskedasticity. What do you find? Now include the complete set of
controls and their squares. Does this change your conclusions? Now apply White’s special test
for heteroskedasticity. Does this change your conclusions?
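The mechanics of part g. might be sketched as follows. Several things here are assumptions rather than part of the exercise: the names ehat, ehat2, yhat, yhat2 and nr2 are hypothetical; the controls macro copies the list from part d. (with dunmard spelled as in the variable table), so drop any name your data set does not contain; dmtspgm is included in the auxiliary regression alongside the controls; the auxiliary regressions are weighted, following the instruction to weight all regressions; and "White's special test" is taken to mean the version that uses the fitted values and their squares.

```stata
* Assumed control list, copied from part d.
local controls dwhite dothr dfemale dedudad dedumom dmaried dunmard dagemom dlwght dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc

* Main weighted regression; save residuals, fitted values, and squared residuals
regress dimr7271 dmtspgm `controls' [aweight=tbirth]
predict ehat, residuals
predict yhat, xb
generate ehat2 = ehat^2

* Auxiliary regression of the squared residuals on the regressors
regress ehat2 dmtspgm `controls' [aweight=tbirth]
scalar nr2 = e(N)*e(r2)
display "n*R-squared = " nr2 "   p-value = " chi2tail(e(df_m), nr2)

* Special-case version: squared residuals on fitted values and their squares
generate yhat2 = yhat^2
regress ehat2 yhat yhat2 [aweight=tbirth]
scalar nr2 = e(N)*e(r2)
display "n*R-squared = " nr2 "   p-value = " chi2tail(e(df_m), nr2)
```

The version with the controls and their squares follows the same pattern: generate a squared copy of each control, add those to the auxiliary regression, and compare n times the R-squared with a chi-squared distribution whose degrees of freedom equal the number of auxiliary regressors (e(df_m) after regress).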
h. Now suppose that someone (call her God) tells you that var(e_i) = c * dmtspgm. Is this
evidence of heteroskedasticity? If so, what would you do to return to the Gauss-Markov
assumptions? What are the advantages of this approach relative to White standard errors? In
practice, what are the potential problems with this approach?