Instructions:
Do not put all R code and outputs at the end of the document.
Researchers from Baystate Medical Centre in Massachusetts were interested in identifying risk factors associated with giving birth to a low birth weight baby (weighing less than 2500 grams). The main risk factors of interest were the mother’s smoking status, age, weight prior to pregnancy, and the presence of uterine irritability. The main aim of this assignment is to analyse the data using several hypothesis tests and regression.
The columns of the file contain the following information:
Column  Name      Description
C1      LowWt     Low birth weight (= 1 if birth weight < 2500 g; = 0 if birth weight ≥ 2500 g)
C2      BirthWt   Birth weight in grams
C3      Age       Mother’s age in years
C4      MotherWt  Mother’s weight prior to pregnancy in kg
C5      Smoke     Mother’s smoking status during pregnancy (= 1 if yes; = 0 if no)
C6      UterIrr   Presence of uterine irritability (= 1 if yes; = 0 if no)
(a) Regress BirthWt on Age, MotherWt, Smoke, UterIrr and write down the estimated multiple linear regression equation.
(b) Interpret the slope coefficients associated with Age and Smoke.
(c) Find and interpret the coefficient of determination.
(d) Conduct a hypothesis test to determine whether or not the model is useful. Include mention of H0 and H1, the observed value of the test statistic, the p-value, the decision, and a conclusion.
(e) Identify all the independent variables which are significant at the 5% significance level.
(f) Perform backward elimination using the step() function in R with the BIC criterion to obtain the best subset of independent variables for predicting BirthWt.
(g) Perform the residual diagnostics on the final model.
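
For the regression parts above, the general workflow in R could be sketched roughly as follows. This is only an illustration on simulated stand-in data: the data frame `sim`, its sample size, and every number used to generate it are assumptions and are not the Baystate data. The column names follow the table above. Note that step() uses the AIC penalty by default; setting k = log(n) is the usual way to obtain the BIC penalty.

```r
# Simulated stand-in data (hypothetical values, NOT the assignment data)
set.seed(1)
n <- 100
sim <- data.frame(
  Age      = round(runif(n, 15, 40)),
  MotherWt = round(rnorm(n, 60, 10)),
  Smoke    = rbinom(n, 1, 0.4),
  UterIrr  = rbinom(n, 1, 0.15)
)
sim$BirthWt <- 2500 + 15 * sim$MotherWt - 400 * sim$Smoke + rnorm(n, 0, 300)

# Full model: summary() reports the coefficients, R-squared,
# the overall F test, and the individual t tests
full <- lm(BirthWt ~ Age + MotherWt + Smoke + UterIrr, data = sim)
summary(full)

# Backward elimination; k = log(n) replaces the default AIC penalty with BIC
final <- step(full, direction = "backward", k = log(nrow(sim)))

# Standard residual diagnostic plots for the selected model
par(mfrow = c(2, 2))
plot(final)
```

With the real data, `sim` would be replaced by the data frame read from the assignment file, and the output of each step would be reported alongside the relevant part rather than collected at the end.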
(a) Construct a contingency table that shows the number of women who have just given birth, grouped by low birth weight baby (LowWt) and smoking status (Smoke).
(b) At the 0.05 level of significance, is there evidence of a significant association between smoking status of the mother and a baby of low birth weight? Include mention of H0 and H1, the observed value of the test statistic, the p-value, a decision, and a conclusion.
(c) What are the odds of a woman who smoked having a low birth weight baby?
(d) What are the odds of a woman who did not smoke having a low birth weight baby?
(e) What are the odds of a woman who had a low birth weight baby being a non-smoker?
(f) What is the odds ratio for women who did and did not smoke having a low birth weight baby?
(g) Fit a logistic regression of normal birth weight on smoking status. Treat non-smokers as the base group. Write down the fitted logistic regression equation.
(h) Refer to part (g). What is the odds ratio for women who did and did not smoke? Is it the same as your calculation in part (f)? Explain why or why not.
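
The tools needed for the contingency-table and logistic regression parts above could be sketched in R as follows. The counts in `tab` are made up purely for illustration and are not the assignment data, and the logistic fit here uses LowWt as the response only to show the mechanics; the response coding required in part (g) is left to the reader. Note that chisq.test() applies a continuity correction to 2x2 tables by default, so `correct = FALSE` is set here to get the uncorrected Pearson statistic.

```r
# Hypothetical 2x2 counts (NOT the assignment data): rows = Smoke, columns = LowWt
tab <- matrix(c(40, 20,
                10, 30),
              nrow = 2, byrow = TRUE,
              dimnames = list(Smoke = c("0", "1"), LowWt = c("0", "1")))

# Pearson chi-squared test of association, without continuity correction
chisq.test(tab, correct = FALSE)

# Odds of a low birth weight baby within each smoking group
odds_smoker    <- tab["1", "1"] / tab["1", "0"]
odds_nonsmoker <- tab["0", "1"] / tab["0", "0"]
or_tab <- odds_smoker / odds_nonsmoker   # odds ratio from the table

# Logistic regression fitted to the same counts.
# as.data.frame() gives one row per cell with factor columns Smoke, LowWt
# and a count column Freq; level "0" is the base group for both factors.
d <- as.data.frame(as.table(tab))
fit <- glm(LowWt ~ Smoke, family = binomial, weights = Freq, data = d)

# exp(slope) is the odds ratio implied by the fitted model
or_glm <- exp(coef(fit))[["Smoke1"]]
c(table = or_tab, glm = or_glm)
```

For a single binary predictor, the exponentiated slope reproduces the cross-product odds ratio for whichever outcome the model treats as the "success" level, so it is worth checking which outcome is being modelled before comparing the two figures.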