Instructions to Students
- This assignment is worth 7% of your total mark.
- You may choose to either typeset your assignment in LaTeX, or handwrite and scan it to produce an electronic version.
- You may use R for this assignment, including the lm function unless otherwise specified. If you do, include your R commands and output.
- Write your answers on A4 paper. Page 1 should only have your student number, the subject code and the subject name. Write on one side of each sheet only. Each question should be on a new page. The question number must be written at the top of each page.
Scanning and Submitting
- Put the pages in question order and all the same way up. Use a scanning app to scan all pages to PDF. Scan directly from above. Crop pages to A4.
- Submit your scanned assignment as a single PDF file and carefully review the submission in Gradescope. Scan again and resubmit if necessary.
Question 1 (5 marks)
Consider a general full rank linear model y = Xβ + ε with p > 2 parameters. Derive an expression for a joint 100(1 − α)% confidence region for parameters β_i and β_j, where i and j are arbitrary.
Question 2 (11 marks, not including the bonus part (f))
An experiment is conducted to estimate the annual demand for cars, based on their cost, the current unemployment rate, and the current interest rate. A survey is conducted and the following measurements obtained:
For this question, you may not use the lm function in R.
(a) Fit a linear model to the data, and estimate the parameters and error variance.
(b) Calculate 95% confidence intervals for the model parameters.
(c) In a year with 8% unemployment rate and 3.5% interest rate, we price a car at $12,000 and observe that 7,000 cars are sold. Is this an atypical year (according to your model)?
(d) Using your answer from Question 1, find and draw a joint 95% confidence region for the parameters corresponding to unemployment rate and interest rate. Superimpose a rectangle corresponding to the confidence intervals found in (b).
(e) Do you expect the confidence region to be larger or smaller than the rectangle? Justify your answer.
(f) (Bonus) What is the probability that the true parameters for unemployment rate and interest rate (jointly) lie in the rectangle you drew in (d)?
Question 3 (7 marks)
Consider a full rank linear model y = Xβ + ε. Derive a formula for a 100(1 − α)% prediction interval for the sum of the responses of two independent future observations y_1 and y_2, with predictors x_1 and x_2 respectively.
Question 4 (12 marks)
For this question we use the data set bike.csv (available on the LMS). This data set records counts of public bikes rented in an hour with the corresponding weather information. The variables are:
count = the number of bikes rented in an hour
temp = temperature (in Celsius)
hum = relative humidity
wind = windspeed (in m/s)
visi = visibility (in metres)
dew = dew point temperature (in Celsius)
solar = solar radiation (in MJ/m²)
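Before starting part (a), it can help to confirm that the loaded file contains the variables listed above. The following is only a sketch: it assumes bike.csv has been downloaded into the R working directory and that its column names match the list exactly. The helper missing_vars is a hypothetical name introduced for illustration, not part of any package.

```r
# Variable names taken from the list above.
expected_vars <- c("count", "temp", "hum", "wind", "visi", "dew", "solar")

# Hypothetical helper: returns the expected variables that are absent from df.
missing_vars <- function(df, expected = expected_vars) {
  setdiff(expected, names(df))
}

# Assumption: bike.csv is in the current working directory.
if (file.exists("bike.csv")) {
  bike <- read.csv("bike.csv")
  str(bike)                  # variable types and the first few values
  print(missing_vars(bike))  # character(0) if every variable is present
}
```

If the printed result is not empty, check the column names in the file before fitting any model.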
(a) Fit a linear model using all of the variables.
(b) Test for model relevance, using a corrected sum of squares.
(c) Use forward selection with F tests to select variables for your model.
(d) Starting from a null model, use stepwise selection with AIC to select variables for your model. Use this as your final model; comment briefly on the variables included.
(e) Using the full model, test whether the temperature and dew point temperature have the same effect on the number of bikes rented.
(f) Comment on the suitability of your final model, using diagnostic plots.
Question 5 (5 marks)
Suppose that we have a response variable y which is known to have a quadratic relationship with a predictor variable x. Explain all of the differences between fitting a linear model of y against x and x², versus a linear model of √y against x. Which would you use for each of the two datasets shown below?