
Mathematics Assignment Help | Statistics – Flexible Regression

This is a timed mathematics test on Flexible Regression from the UK.


NOTE: Candidates should attempt ALL 3 questions.

  1. A common model for population growth is the logistic growth model:

$$Y_i = \frac{\alpha_1}{1 + \exp\{-(\alpha_2 + \alpha_3 x_i)\}} + \epsilon_i, \quad i = 1, \ldots, n \qquad (1)$$

where $Y_i$ is the population (in 1000s), $x_i$ is the year variable, $\alpha_1, \alpha_2, \alpha_3$ are model parameters and $\epsilon_i \sim N(0, \sigma^2)$.
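As a minimal sketch of the deterministic part of Equation 1, the function below evaluates the logistic mean at a few years. The name `logistic_mean` and the parameter values are hypothetical, chosen only to show the shape of the curve: it rises towards the asymptote `alpha1` and reaches half of it where `alpha2 + alpha3 * x = 0`.

```python
import math


def logistic_mean(x, alpha1, alpha2, alpha3):
    """Deterministic part of Equation 1: alpha1 / (1 + exp(-(alpha2 + alpha3 * x)))."""
    return alpha1 / (1.0 + math.exp(-(alpha2 + alpha3 * x)))


# Hypothetical parameter values (not taken from the question).
alpha1, alpha2, alpha3 = 500.0, -4.0, 0.1

# The midpoint of the curve is at x = -alpha2 / alpha3 = 40,
# where the mean equals alpha1 / 2.
for year in (0, 40, 80, 200):
    print(year, logistic_mean(year, alpha1, alpha2, alpha3))
```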

(a) In order to fit the logistic growth model (Equation 1) to a set of data, we could use non-linear least squares. Derive the iterative steps that would be used to fit this non-linear model using non-linear least squares. In your derivation use $f()$ to define a suitable function of the data and parameters, $\alpha = (\alpha_1, \alpha_2, \alpha_3)^T$ to represent the parameters, $R^t$ to denote the matrix of derivatives at iteration $t$ with $(i, j)$th element $R^t_{ij} = \partial f(x_i, \hat{\alpha}^t)/\partial \alpha_j$, and $v^t = y - f(x, \hat{\alpha}^t) + R^t \hat{\alpha}^t$, which enables a reparameterisation to a linear objective at iteration $t$. Within your derivation explain any approximations and reparameterisations that you have used.
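One way to organise the derivation is sketched below. It is an outline rather than a full model answer, and it rests on the usual Gauss-Newton assumption that $f$ is well approximated by its first-order Taylor expansion around the current estimate $\hat{\alpha}^t$.

```latex
\begin{align*}
f(x, \alpha) &\approx f(x, \hat{\alpha}^t) + R^t(\alpha - \hat{\alpha}^t)
  && \text{(first-order Taylor expansion about } \hat{\alpha}^t\text{)} \\
y - f(x, \alpha) &\approx v^t - R^t\alpha,
  \quad v^t = y - f(x, \hat{\alpha}^t) + R^t\hat{\alpha}^t
  && \text{(residuals become linear in } \alpha\text{)} \\
\hat{\alpha}^{t+1} &= \{(R^t)^T R^t\}^{-1} (R^t)^T v^t
  && \text{(ordinary least squares of } v^t \text{ on } R^t\text{)}
\end{align*}
```

The update is repeated from a starting value $\hat{\alpha}^0$ until successive estimates, or the residual sum of squares, change by less than a chosen tolerance.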


(b) The nls() function in R can be used to fit the logistic growth model using non-linear least squares with user-defined starting values for the model parameters ($\alpha_1$, $\alpha_2$ and $\alpha_3$). An alternative to user-defined starting values would be to obtain initial parameter values automatically using a self-starter function.

An appropriate self-starter function for the logistic growth model (Equation 1) is the logistic self-starter function given by,

$$\frac{\eta_1}{1 + \exp\{(\eta_2 - x)/\eta_3\}}. \qquad (2)$$

Show that the deterministic component of the logistic growth model (Equation 1) is equal to the logistic self-starter function (Equation 2) by re-parameterising $\alpha_1$, $\alpha_2$ and $\alpha_3$ in terms of $\eta_1$, $\eta_2$ and $\eta_3$.
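The equivalence can be checked numerically before it is shown algebraically. The sketch below assumes Equation 2 reads $\eta_1 / (1 + \exp\{(\eta_2 - x)/\eta_3\})$, the form used by R's `SSlogis` self-starter, and tries the mapping $\alpha_1 = \eta_1$, $\alpha_2 = -\eta_2/\eta_3$, $\alpha_3 = 1/\eta_3$. The function names and the values of $\eta$ are hypothetical.

```python
import math


def growth_form(x, a1, a2, a3):
    """Equation 1 without the error term."""
    return a1 / (1.0 + math.exp(-(a2 + a3 * x)))


def self_starter_form(x, e1, e2, e3):
    """Equation 2, assuming the form eta1 / (1 + exp((eta2 - x) / eta3))."""
    return e1 / (1.0 + math.exp((e2 - x) / e3))


def alpha_from_eta(e1, e2, e3):
    """Candidate re-parameterisation: -(a2 + a3 * x) must equal (e2 - x) / e3."""
    return e1, -e2 / e3, 1.0 / e3


# Hypothetical self-starter parameters (asymptote, midpoint, scale).
eta = (500.0, 40.0, 10.0)
alpha = alpha_from_eta(*eta)

# Largest absolute difference between the two forms over a grid of years.
max_gap = max(
    abs(growth_form(x, *alpha) - self_starter_form(x, *eta))
    for x in range(0, 101, 5)
)
print(alpha, max_gap)
```

A gap at the level of floating-point rounding is consistent with the mapping, but the exam answer still needs the algebra: substitute the mapping into the exponent of Equation 1 and simplify.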


(c) An alternative model for a non-linear relationship between population (in 1000s) and year would be a non-parametric model, i.e.

$$Y_i = f(x_i) + \epsilon_i, \quad i = 1, \ldots, n \qquad (3)$$

where $Y_i$ is the population (in 1000s), $f()$ is a smooth function of $x_i$, the year variable, and $\epsilon_i \sim N(0, \sigma^2)$.

i. Local polynomial regression is one possible non-parametric method that could be used. Write down the model fitting criterion for local polynomial regression in vector-matrix notation for target point $x$, where $y$ is the response data, $X$ is the design matrix, $\alpha$ is the vector of parameters and $W$ is a weight matrix.

Ensure you detail the structure of each of these components, and state a suitable function to be used to create $W$. [4 MARKS]

ii. Specifically referring to your suggested formulation for $W$ in part (i), explain the role of the weight function and indicate, for your choice of kernel, how the smoothing parameter changes the smoothness of the function when fitting a local polynomial regression model. [3 MARKS]

iii. Suppose you have fitted a local cubic regression to a set of data, using your chosen kernel for the weight function. Following model fitting you obtain $df_{mod} = 7.2$.

State the formula for $df_{mod}$ in this context, ensuring you provide a definition for any quantities you refer to, and provide an interpretation for this value in terms of the complexity of the model that has been fitted. [3 MARKS]

iv. The Epanechnikov kernel function is defined as

$$w(x_i - x; h) = W\left(\frac{|x_i - x|}{h}\right) \qquad (4)$$

where

$$W(u) = \begin{cases} 1 - u^2 & \text{for } 0 \le u < 1 \\ 0 & \text{otherwise} \end{cases}$$

and $h$ is the smoothing parameter.

Show that, for a local cubic regression with an Epanechnikov kernel function, as $h \to \infty$ then $\mathrm{tr}(S) = 4$, where $S$ is defined to be the smoothing matrix such that $\hat{y} = Sy$. [4 MARKS]
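The limit can be illustrated numerically. As $h$ grows, every $|x_i - x|/h$ tends to 0, so each weight tends to $W(0) = 1$ and the local cubic fit approaches an ordinary cubic regression with four parameters. The sketch below is a pure-Python illustration, not a proof. It assumes the kernel $W(u) = 1 - u^2$ on $[0, 1)$ and a hypothetical grid of 11 design points; the helper names are illustrative.

```python
def epanechnikov(u):
    """Kernel W(u) as read from the question: 1 - u^2 on [0, 1), 0 otherwise."""
    return 1.0 - u * u if 0.0 <= u < 1.0 else 0.0


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - s) / m[r][r]
    return z


def smoother_trace(xs, h, degree=3):
    """tr(S) for local polynomial regression evaluated at each data point."""
    p = degree + 1
    trace = 0.0
    for x0 in xs:
        w = [epanechnikov(abs(xi - x0) / h) for xi in xs]
        # X^T W X for the centred basis 1, (xi - x0), ..., (xi - x0)^degree.
        xtwx = [
            [sum(wi * (xi - x0) ** (j + k) for wi, xi in zip(w, xs)) for k in range(p)]
            for j in range(p)
        ]
        e1 = [1.0] + [0.0] * degree
        z = solve(xtwx, e1)
        # The row of X at xi = x0 is e1, so S_ii = W(0) * e1^T (X^T W X)^{-1} e1.
        trace += epanechnikov(0.0) * z[0]
    return trace


# Hypothetical design points on [0, 1].
xs = [i / 10 for i in range(11)]
for h in (10.0, 1e3, 1e6):
    print(h, smoother_trace(xs, h))
```

The printed traces should approach 4 as `h` increases, matching the number of coefficients in a cubic polynomial.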

  2. Phosphorus is an essential element for plant life. However, phosphorus can also be a pollutant in water bodies such as lakes and rivers, with high levels of it affecting the aquatic life below the water's surface. Data were collected from lakes in Michigan (MI), Maine (ME) and Wisconsin (WI) to investigate the relationships between water chemistry and catchment variables and the levels of phosphorus in lakes.

A generalised additive model was fitted with a gamma response of Phosphorus (using a log link), smooth functions for each of log area (in m²) (area.m2), log elevation (in m) (elev.m), log mean depth (in m) (mean.depth.m) and the percentage of the catchment area surrounding the lake that was agricultural (agri), and a factor effect for State (coded ‘ME’, ‘MI’ and ‘WI’).

(a) State the equation for a generalised additive model in this context, clearly defining any notation you use that is not defined in the question. [3 MARKS]
