
This exercise continues our examination of the effect of TSPs pollution on infant mortality. Here we explore violations of the assumptions underlying the Gauss-Markov theorem, which guarantees that OLS is BLUE. This exercise will help (or perhaps force) you to practice the solutions to these violations that we have discussed in class.

Once again, feel free to work cooperatively, but each person must turn in his or her own problem set, with the solutions written in his or her own words.

The unit of observation is the county, and there are 462 observations of 19 variables. Each observation records the change from 1971 to 1972 (i.e., the 1972 value minus the 1971 value) for each of the variables. The lone exception is the tbirth variable, which equals the sum of the 1971 and 1972 numbers of births. This variable should be used as a weight (in STATA language, this means `w=tbirth`) in ALL regressions in this exercise.
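
Weighting by tbirth turns every regression into weighted least squares: each county's squared residual is multiplied by its number of births, so larger counties, whose mortality rates are measured more precisely, count more. The minimal Python/NumPy sketch below uses simulated data (only the sample size of 462 matches the real data; every value is made up) and assumes that `[w=tbirth]` acts as an analytic weight, whose coefficients coincide with this WLS formula.

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares: minimizes sum_i w_i * (y_i - x_i'b)^2.

    Returns b = (X'WX)^{-1} X'Wy with W = diag(w). Multiplying every weight
    by the same constant leaves b unchanged.
    """
    Xw = X * w[:, None]                      # row i of X scaled by w_i
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

# Simulated, hypothetical data: 462 "counties"
rng = np.random.default_rng(0)
n = 462
tbirth = rng.integers(100, 5000, size=n).astype(float)  # made-up birth counts
dmtspgm = rng.normal(0.0, 10.0, size=n)                  # made-up change in TSPs
X = np.column_stack([np.ones(n), dmtspgm])               # constant + regressor
# Noise shrinks with births: rates from larger counties are less noisy
noise = rng.normal(size=n) * np.sqrt(1000.0 / tbirth)
y = 1.0 + 0.05 * dmtspgm + noise

b_wls = wls(y, X, tbirth)
print("WLS intercept and slope:", b_wls)
```

Equivalently, multiplying y and every column of X (including the constant) by the square root of tbirth and running plain OLS gives the same coefficients; that transformed view is the one used in GLS arguments.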

The relevant variables (with descriptions in quotations) are:

- dimr7271: “# inf death per 1,000 births 72-71”
- tbirth: “total births 71 & 72”
- dwhite: “% births, white mom 72-71”
- dothr: “% births, nonwhite/nonblack mom 72-71”
- dfemale: “% female births 72-71”
- dedudad: “father yrs of ed 72-71”
- dedumom: “mother yrs of ed 72-71”
- dlwght: “% births with weight<2,500 g 72-71”
- dmaried: “% mother married 72-71”
- dunmard: “% mother unmarried 72-71”
- dagemom: “mother age 72-71”
- dpcare1: “% mom began month 1 or 2 72-71”
- dpcare2: “% mom began 3rd month 72-71”
- dpcare3: “% mom began 4-6th month 72-71”
- dpcare4: “% mom began 7-9th month 72-71”
- dpcinc: “county-level per cap income 72-71”
- dmtspgm: “county-level tsps concen 72-71”
- fstate: “fips state code”
- reg_tsp: “=1 if county regulated for tsps”

## 1. Introduction to the New Data and Problems with the Residual

a. Plot dimr7271 against dmtspgm. Does it look like there is an association between changes in infant mortality and TSPs? Repeat this exercise with the weight set equal to tbirth. Now does there appear to be a relationship?

b. Based on the scatterplots from part a, is there any evidence of homoskedasticity or heteroskedasticity in the model of the change in the infant mortality rate?

c. Suppose that the residuals of the change-in-infant-mortality-rate (IMR) regression are heteroskedastic. Is the OLS estimator of the effect of TSPs unbiased and consistent? Is it efficient? Is the “conventional” estimator of the variance of the estimated effect of TSPs unbiased and consistent?
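
A small Monte Carlo experiment is one way to build intuition for part c. The sketch below (Python/NumPy, simulated data only, no weights, and an assumed error structure with sd(u_i) = |x_i|) draws many samples from the same heteroskedastic model, then compares the spread of the OLS slope across samples with the average conventional and White standard errors.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps, beta = 462, 1000, 0.05
x = rng.normal(size=n)                          # hypothetical regressor, fixed across samples
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
A = XtX_inv @ X.T                               # OLS estimates are A @ y; shape (2, n)

# Heteroskedastic errors: sd(u_i) = |x_i|, so the variance rises with the regressor
U = rng.normal(size=(n, reps)) * np.abs(x)[:, None]
Y = (X @ np.array([1.0, beta]))[:, None] + U    # one simulated sample per column

B = A @ Y                                       # OLS estimates for every sample, shape (2, reps)
E = Y - X @ B                                   # residuals, shape (n, reps)
s2 = np.sum(E**2, axis=0) / (n - 2)
se_conv = np.sqrt(s2 * XtX_inv[1, 1])           # conventional SE of the slope
se_white = np.sqrt((A[1] ** 2) @ (E**2))        # White (HC0) SE of the slope

print("mean OLS slope:     ", B[1].mean())
print("sd of slope (truth):", B[1].std())
print("avg conventional SE:", se_conv.mean())
print("avg White SE:       ", se_white.mean())
```

Rerunning with homoskedastic errors (drop the `np.abs(x)` factor) is a useful contrast: the conventional and White standard errors should then roughly agree.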

d. Regress dimr7271 on dmtspgm. Is there evidence of a relationship here? Now add the complete set of control variables (i.e., dwhite dothr dfemale dedudad dedumom dmaried dunmard dagemom dlwght dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc). Have your conclusions changed? What might explain the differences between these estimates and the cross-sectional ones in the last problem set?
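
Why can adding controls move the TSPs coefficient? For least squares there is an exact in-sample identity, the omitted-variable formula: the short-regression slope equals the long-regression slope plus the omitted control's coefficient times the slope from regressing that control on the regressor. The Python/NumPy sketch below checks this with weighted regressions on simulated data; `tsp`, `inc`, and all coefficients are hypothetical stand-ins for dmtspgm and one control such as dpcinc.

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares coefficients (X'WX)^{-1} X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

rng = np.random.default_rng(1)
n = 462
w = rng.integers(100, 5000, size=n).astype(float)   # stand-in for tbirth
tsp = rng.normal(0.0, 10.0, size=n)                  # stand-in for dmtspgm
inc = -0.3 * tsp + rng.normal(0.0, 5.0, size=n)      # a control correlated with tsp
y = 0.02 * tsp - 0.10 * inc + rng.normal(size=n)

one = np.ones(n)
b_short = wls(y, np.column_stack([one, tsp]), w)           # control omitted
b_long = wls(y, np.column_stack([one, tsp, inc]), w)       # control included
delta = wls(inc, np.column_stack([one, tsp]), w)           # auxiliary: control on tsp

# Exact omitted-variable identity (holds in every sample, not just on average)
print("short slope:", b_short[1])
print("long slope + bias term:", b_long[1] + b_long[2] * delta[1])
```

The identity holds because the long-regression residuals are (weighted-)orthogonal to both regressors, so the sign and size of the shift depend on how the omitted control relates to both TSPs and infant mortality.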

e. Now repeat the complete-set-of-controls regression, but use the `robust` option in STATA to calculate the White heteroskedasticity-consistent standard errors (`reg dimr7271 dmtspgm dwhite dothr dpcare0 dpcare1 dpcare2 dpcare3 dpcare4 dpcinc [w=tbirth], robust`). Explain briefly how these estimates of the standard errors are corrected for heteroskedasticity. How do they compare to the “uncorrected” (conventional) LS estimates of the standard errors? Is there evidence of heteroskedasticity?
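
The White correction replaces the single error variance in the conventional formula with each observation's own squared residual, giving the "sandwich" (X'WX)^{-1} (Σ w_i² e_i² x_i x_i') (X'WX)^{-1}. The Python/NumPy sketch below computes both sets of standard errors for a weighted regression on simulated heteroskedastic data. The n/(n − k) factor mirrors the small-sample adjustment that, to our understanding, STATA's `robust` option applies (often labeled HC1); exact agreement with STATA output is an assumption, not a guarantee.

```python
import numpy as np

def wls_with_ses(y, X, w):
    """WLS coefficients with conventional and White (HC1-style) standard errors."""
    n, k = X.shape
    Xw = X * w[:, None]
    bread = np.linalg.inv(X.T @ Xw)            # (X'WX)^{-1}
    b = bread @ (Xw.T @ y)
    e = y - X @ b                              # residuals
    # Conventional: assumes Var(e_i) = sigma^2 / w_i with a single sigma^2
    s2 = np.sum(w * e**2) / (n - k)
    se_conv = np.sqrt(np.diag(s2 * bread))
    # White: the "meat" uses each observation's own squared (weighted) residual
    scores = X * (w * e)[:, None]              # row i: w_i * e_i * x_i'
    meat = scores.T @ scores                   # sum_i w_i^2 e_i^2 x_i x_i'
    V_white = bread @ meat @ bread * n / (n - k)
    se_white = np.sqrt(np.diag(V_white))
    return b, se_conv, se_white

# Simulated, hypothetical data with error variance that grows with |x|
rng = np.random.default_rng(6)
n = 462
w = rng.integers(100, 5000, size=n).astype(float)     # stand-in for tbirth
x = rng.normal(0.0, 10.0, size=n)                      # stand-in for dmtspgm
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.05 * x + rng.normal(size=n) * (1.0 + np.abs(x) / 5.0)

b, se_conv, se_white = wls_with_ses(y, X, w)
print("coefficients:   ", b)
print("conventional SE:", se_conv)
print("White SE:       ", se_white)
```

Both standard errors are unchanged if every weight is multiplied by the same constant, which is why the scale of tbirth does not matter.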

f. Using the STATA `predict` command (`predict [variable name], residual`), save the residuals from the LS complete-set-of-controls regression. Now regress these residuals on the complete set of controls. Explain why the R-squared and the estimated coefficients from this regression are virtually zero.
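
The mechanics behind part f can be checked numerically. With weights W, the least-squares normal equations force the residuals to satisfy X'We = 0. The Python/NumPy sketch below (simulated data, three hypothetical controls) saves the residuals, regresses them on the same regressors with the same weights, and computes a weighted R-squared; if the second regression were run without the weights, the coefficients would generally not be exactly zero.

```python
import numpy as np

def wls(y, X, w):
    """Weighted least squares coefficients (X'WX)^{-1} X'Wy."""
    Xw = X * w[:, None]
    return np.linalg.solve(X.T @ Xw, Xw.T @ y)

def weighted_r2(y, X, w):
    """R-squared of a weighted regression of y on X (X includes a constant)."""
    resid = y - X @ wls(y, X, w)
    ybar = np.sum(w * y) / np.sum(w)             # weighted mean of y
    return 1.0 - np.sum(w * resid**2) / np.sum(w * (y - ybar) ** 2)

rng = np.random.default_rng(2)
n = 462
w = rng.integers(100, 5000, size=n).astype(float)             # stand-in for tbirth
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])   # constant + 3 hypothetical controls
y = X @ np.array([1.0, 0.5, -0.2, 0.3]) + rng.normal(size=n)

e = y - X @ wls(y, X, w)    # step 1: save the residuals
g = wls(e, X, w)            # step 2: regress them on the same regressors
r2 = weighted_r2(e, X, w)
print("coefficients:", g)
print("R-squared:", r2)
```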

g. Now let’s apply a more formal test for heteroskedasticity. Regress the squared residuals from the LS complete-set-of-controls regression on the complete set of controls, and use the R-squared (e.g., through the LM statistic n·R²) to test for heteroskedasticity. What do you find? Now include both the complete set of controls and their squares. Does this change your conclusions? Finally, apply White’s special test for heteroskedasticity. Does this change your conclusions?
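
All three tests in part g share one recipe: regress the squared residuals on a set of variables Z and compare n·R² with a chi-square distribution whose degrees of freedom equal the number of non-constant columns of Z (the LM form of the Breusch-Pagan test). White's special test takes Z to be the fitted values and their squares, so it always has 2 degrees of freedom. The Python/NumPy sketch below is a simplification: it uses unweighted OLS on simulated data with two hypothetical controls, whereas the problem set asks for tbirth-weighted regressions.

```python
import numpy as np

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

def r_squared(y, X):
    resid = y - X @ ols(y, X)
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

def lm_stat(e2, Z):
    """n * R^2 from regressing squared residuals on Z (Z includes a constant)."""
    return len(e2) * r_squared(e2, Z), Z.shape[1] - 1   # (statistic, chi-square df)

rng = np.random.default_rng(3)
n = 462
C = rng.normal(size=(n, 2))                      # two hypothetical controls
X = np.column_stack([np.ones(n), C])
u = rng.normal(size=n) * np.exp(0.5 * C[:, 0])   # error variance rises with the first control
y = X @ np.array([1.0, 0.5, -0.3]) + u

b = ols(y, X)
yhat = X @ b
e2 = (y - yhat) ** 2                             # squared residuals

bp = lm_stat(e2, X)                                              # controls in levels
bp_sq = lm_stat(e2, np.column_stack([X, C**2]))                  # controls and their squares
white_special = lm_stat(e2, np.column_stack([np.ones(n), yhat, yhat**2]))
print("levels:", bp, "with squares:", bp_sq, "White special:", white_special)
```

At the 5% level, the chi-square critical values are 5.99 for 2 degrees of freedom and 9.49 for 4.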

h. Now suppose that someone (call her God) tells you that var(e_i) = c · dmtspgm_i, where c is a constant. Is this evidence of heteroskedasticity? If so, what would you do to return to the Gauss-Markov assumptions? What are the advantages of this approach relative to White standard errors? In practice, what are the potential problems with this approach?
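
If the variance really is proportional to a known quantity v_i, dividing every term of the regression, including the constant, by √v_i produces errors with constant variance c, and OLS on the transformed data is weighted least squares with weights 1/v_i. The Python/NumPy sketch below implements that transformation on simulated data in which the hypothetical stand-in for dmtspgm is drawn strictly positive; it ignores the tbirth weights to isolate the idea, and it rejects non-positive v_i because a variance cannot be proportional to a zero or negative number.

```python
import numpy as np

def wls_known_variance(y, X, v):
    """GLS when Var(e_i) = c * v_i for known v_i > 0 and unknown constant c."""
    v = np.asarray(v, dtype=float)
    if np.any(v <= 0):
        raise ValueError("v must be strictly positive to serve as a variance")
    s = 1.0 / np.sqrt(v)
    Xt = X * s[:, None]          # the constant column becomes 1/sqrt(v_i)
    yt = y * s                   # transformed error e_i/sqrt(v_i) has variance c
    return np.linalg.lstsq(Xt, yt, rcond=None)[0]

rng = np.random.default_rng(4)
n = 462
d = rng.uniform(1.0, 20.0, size=n)        # hypothetical, strictly positive dmtspgm
X = np.column_stack([np.ones(n), d])
c = 0.5
y = 1.0 + 0.05 * d + rng.normal(size=n) * np.sqrt(c * d)   # Var(e_i) = c * d_i

b_gls = wls_known_variance(y, X, d)
print("GLS intercept and slope:", b_gls)
```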