Author: essaygo
Published on: November 8, 2022
Submission: Read the submission instructions carefully! There are 4 questions in this assignment.

You need to submit two files through Quercus for this assignment.

• The first file should be a PDF file titled hw3_writeup.pdf containing your answers to Questions 1 – 4, as well as the R code and R outputs requested for Questions 3 and 4. You can produce the file however you like (e.g. LaTeX, Microsoft Word, scanner), as long as it is readable.
• The second file should be your completed R code, named penalized_logistic_regression.R.

You need to ensure that this file has the exact name indicated. DO NOT set or modify the working directory within this file.

Neatness Point: One point will be deducted if we have a hard time reading your solutions or understanding the structure of your code.

Late Submission: 10% of the total possible marks will be deducted for each day late, up to a maximum of 3 days. After that, no submissions will be accepted.

• Problem 1 (3 pts)

Consider the classification problem with the label Y taking values in C := {1, 2, . . . , K} and any realization x of X ∈ R^p. Let f be any classifier that maps any x ∈ R^p to a label in C.

1. (2 pts) Prove that the best function f* (i.e. the Bayes classifier)

$$ f^* := \operatorname*{argmin}_{f:\,\mathbb{R}^p \to C} \; \mathbb{E}\big[\mathbf{1}\{Y \neq f(X)\} \mid X = x\big] \tag{0.1} $$

satisfies f^*(x) = argmax_{k ∈ C} P(Y = k | X = x).

2. (1 pt) Argue that the Bayes error equals

$$ \mathbb{E}\big[\mathbf{1}\{Y \neq f^*(X)\} \mid X = x\big] = 1 - \max_{k \in C} P(Y = k \mid X = x). $$

• Problem 2 (3 pts)

Consider a classification problem. Assume that the response variable Y can only take values in C = {1, 2, 3}. For a fixed x0, assume that the conditional probability of Y given X = x0 satisfies P(Y = 1 | X = x0) = 0.6, P(Y = 2 | X = x0) = 0.3, P(Y = 3 | X = x0) = 0.1.

Consider a naive classifier f̂, called random guessing, which randomly picks one label from C = {1, 2, 3} with equal probability.

1. (2 pts) Compute the expected test error rate of f̂ at X = x0.
2. (1 pt) Compute the Bayes error rate at X = x0 and compare it with that of f̂.
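[Hint on structure, not a full solution: for a classifier f̂ whose guess is randomized independently of Y, the expected error at x0 decomposes by conditioning on the guessed label,

$$ \mathbb{E}\big[\mathbf{1}\{Y \neq \hat f\} \mid X = x_0\big] = \sum_{k \in C} P(\hat f = k)\, P(Y \neq k \mid X = x_0), $$

and for random guessing each P(f̂ = k) equals 1/3.]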
• Problem 3 (21 pts)

In this problem, you will implement logistic regression by completing the provided code in penalized_logistic_regression.R and hw3_starter.R, and experiment with the completed code.

Throughout this homework, you will be working with a subset of hand-written digits, 2's and 3's, represented as 16 × 16 pixel arrays. We show example digits in Figure 1. The pixel intensities are between 0 and 1, and were read into the vectors in a raster-scan manner. You are given one training set, train, which contains 300 examples of each class. You can access and load this training set using the functions

source("hw3_starter/utils.R")

x_train <- train$x

y_train <- train$y

y_train contains the labels of these training images, while x_train contains the 256 pixel values of each image. You are also given a validation set that you should use for tuning and a test set that you should use for reporting the final performance. Optionally, the code for visualizing the dataset is located in utils.R.
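As an optional sanity check, one way to visualize a digit is to reshape its 256-vector into a 16 × 16 matrix and call image(). The sketch below uses a random vector standing in for a row of x_train; because of the raster-scan storage, the actual data may need a transpose or flip to display upright.

```r
# Sketch: viewing one 16 x 16 digit; a random vector stands in for x_train[1, ].
set.seed(0)
pixels <- runif(256)                              # intensities in [0, 1], as in the data
img <- matrix(pixels, nrow = 16, ncol = 16)       # undo the raster scan
image(t(img[16:1, ]), col = grey.colors(256), axes = FALSE)
```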

Figure 1: Example digits. Top and bottom show digits of 2s and 3s, respectively.

You need to implement the penalized logistic regression model by minimizing the cost

$$ J(\beta, \beta_0) := -\frac{1}{n} \sum_{i=1}^{n} \Big\{ y_i \log p(x_i; \beta, \beta_0) + (1 - y_i) \log\big(1 - p(x_i; \beta, \beta_0)\big) \Big\} + \frac{\lambda}{2} \|\beta\|_2^2 $$

over (β, β0) ∈ R^p × R, where

$$ p(x_i; \beta, \beta_0) = \frac{e^{\beta_0 + x_i^\top \beta}}{1 + e^{\beta_0 + x_i^\top \beta}}. $$

Here n is the total number of data points, p is the number of features in x_i, λ ≥ 0 is the regularization parameter, and β and β0 are the parameters to optimize over. Note that we should only penalize the coefficient parameters β and not the intercept term β0.

1. (2 pts) Verify that the gradients of J(β, β0) at any (β̄, β̄0) have the following expressions:

$$ \frac{\partial J(\beta, \beta_0)}{\partial \beta} \bigg|_{(\bar\beta, \bar\beta_0)} = \frac{1}{n} \sum_{i=1}^{n} \left[ -y_i + \frac{e^{\bar\beta_0 + x_i^\top \bar\beta}}{1 + e^{\bar\beta_0 + x_i^\top \bar\beta}} \right] x_i + \lambda \bar\beta, $$

$$ \frac{\partial J(\beta, \beta_0)}{\partial \beta_0} \bigg|_{(\bar\beta, \bar\beta_0)} = \frac{1}{n} \sum_{i=1}^{n} \left[ -y_i + \frac{e^{\bar\beta_0 + x_i^\top \bar\beta}}{1 + e^{\bar\beta_0 + x_i^\top \bar\beta}} \right]. $$
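For reference, these formulas can be computed in a fully vectorized way (no for-loops). The sketch below is illustrative only: comp_loss_sketch and comp_gradient_sketch are hypothetical names, not the signatures of the starter file's Comp_loss and Comp_gradient.

```r
# Vectorized sketch of the penalized logistic loss and its gradients.
# X is an n x p matrix, y a 0/1 vector; names and signatures are illustrative.
sigmoid <- function(z) 1 / (1 + exp(-z))

comp_loss_sketch <- function(X, y, beta, beta0, lbd) {
  p_hat <- sigmoid(beta0 + as.vector(X %*% beta))
  -mean(y * log(p_hat) + (1 - y) * log(1 - p_hat)) + (lbd / 2) * sum(beta^2)
}

comp_gradient_sketch <- function(X, y, beta, beta0, lbd) {
  # r_i = -y_i + p(x_i), the shared bracket term in both gradient formulas
  r <- -y + sigmoid(beta0 + as.vector(X %*% beta))
  list(grad_beta  = as.vector(crossprod(X, r)) / length(y) + lbd * beta,
       grad_beta0 = mean(r))
}
```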

2. (4 pts) Implement the functions

Evaluate, Predict_logis, Comp_gradient, and Comp_loss located in penalized_logistic_regression.R. While implementing the functions, remember to vectorize the operations; you should not have any for-loops in these functions. Include your code in the report.

Important note: carefully read the provided code in penalized_logistic_regression.R.

You should understand the code and its structure instead of using it as a black box!

3. (2 pts) Complete the missing parts in the function Penalized_Logistic_Reg located in penalized_logistic_regression.R. This function should train the penalized logistic regression model using gradient descent on the given training set. You may use the implemented functions from step 2. Include your code in the report.
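The training routine asked for here is plain gradient descent on J. A minimal, hedged sketch follows; train_sketch and its signature are illustrative and do not match the starter file's actual Penalized_Logistic_Reg interface.

```r
# Minimal gradient-descent sketch for the penalized logistic cost J.
# Names and signature are illustrative; the starter code defines its own.
sigmoid <- function(z) 1 / (1 + exp(-z))

train_sketch <- function(X, y, lbd, stepsize, max_iter) {
  n <- nrow(X)
  beta <- rep(0, ncol(X)); beta0 <- 0
  losses <- numeric(max_iter)
  for (t in 1:max_iter) {
    p_hat <- sigmoid(beta0 + as.vector(X %*% beta))
    losses[t] <- -mean(y * log(p_hat) + (1 - y) * log(1 - p_hat)) +
      (lbd / 2) * sum(beta^2)
    r <- p_hat - y                                  # shared bracket term
    beta  <- beta  - stepsize * (as.vector(crossprod(X, r)) / n + lbd * beta)
    beta0 <- beta0 - stepsize * mean(r)
  }
  list(beta = beta, beta0 = beta0, losses = losses)
}

# Tiny synthetic demo: the recorded losses can be plotted against iterations.
set.seed(0)
X <- matrix(rnorm(100 * 4), 100, 4)
y <- rbinom(100, 1, sigmoid(as.vector(X %*% c(1, -1, 0.5, 0))))
fit <- train_sketch(X, y, lbd = 0, stepsize = 0.5, max_iter = 200)
```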

For parts 2 and 3, your completed penalized_logistic_regression.R should NOT import other R packages.

4. (4 pts) Complete part (a) in hw3_starter.R.

In this part, you need to fix your regularization parameter at lbd = 0 and experiment with the hyperparameters stepsize (the learning rate) and max_iter (the number of iterations).

[Hints: (1) You only need to use the training data for this part. (2) Too small a learning rate takes longer to converge. (3) Too large a learning rate is also problematic.]

In the write-up, report and briefly explain which hyperparameter settings you found worked best.

For this choice of hyperparameters, generate and report a plot that shows how the training loss changes (iteration counter on x-axis and training loss on y-axis).

For this choice of hyperparameters, generate and report a plot for the training 0-1 error (iteration counter on x-axis and training error on y-axis).

Did the training 0-1 error follow the same pattern as the training loss? Is your finding aligned with your expectation? State your reasoning.

5. (7 pts) Complete part (b) in hw3_starter.R.

Using the selected setting of hyperparameters (for the learning rate and the number of iterations) that you identified in step 4, fit the model using λ ∈ {0, 0.01, 0.05, 0.1, 0.5, 1}.

(1 pt) Does your selected setting of hyperparameters guarantee convergence for all λ's? If not, re-identify hyperparameters for those λ's for which convergence is not guaranteed. Report the hyperparameter setting(s) you used for each λ.

(2 pts) Generate and report one plot that shows how the training 0-1 error changes as you train with different values of λ.

(2 pts) Generate and report one plot that shows how the validation 0-1 error changes as you train with different values of λ.

(2 pts) Comment on the effects of λ based on these two plots. Which is the best value of λ based on your experiment?

6. (2 pts) Complete part (c) in hw3_starter.R.

Fit the model using the best value of λ identified in step 5 and report its test 0-1 error. Compare your test error with that of the model fitted using glmnet with the same λ.
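The glmnet comparison might look like the guarded sketch below. Everything here is a stand-in: x and y are synthetic, best_lbd is a placeholder (not the answer to step 5), and whether glmnet's internal λ scaling exactly matches the cost J above is something to verify against its documentation.

```r
# Sketch: ridge-type logistic fit via glmnet (alpha = 0), guarded because
# glmnet may not be installed. x, y, best_lbd are synthetic stand-ins.
set.seed(0)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- rbinom(200, 1, 1 / (1 + exp(-x[, 1])))
best_lbd <- 0.05  # placeholder, NOT the lambda selected in step 5

if (requireNamespace("glmnet", quietly = TRUE)) {
  fit  <- glmnet::glmnet(x, y, family = "binomial", alpha = 0, lambda = best_lbd)
  pred <- predict(fit, newx = x, type = "class")
  test_err <- mean(as.numeric(pred) != y)   # 0-1 error on held-out data in practice
  print(test_err)
}
```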

• Problem 4 (10 pts)

In this problem, you will develop a model to predict whether a given car gets high or low gas mileage based on the Auto data set.

1. (1 pt) Create a binary variable, mpg01, that contains a 1 if mpg contains a value above its median, and a 0 if mpg contains a value below its median. You can compute the median using the median() function.

Split the data into a training set (70%) and a test set (30%). (Use set.seed(0) to ensure reproducibility.)
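A sketch of this step is below; since the Auto data set typically comes from the ISLR package, the built-in mtcars data (which also has an mpg column) stands in for it here.

```r
# Sketch: create mpg01 and do a 70/30 split; mtcars stands in for Auto.
dat <- mtcars
dat$mpg01 <- ifelse(dat$mpg > median(dat$mpg), 1, 0)

set.seed(0)  # for reproducibility, as the question asks
train_idx <- sample(nrow(dat), size = floor(0.7 * nrow(dat)))
train <- dat[train_idx, ]
test  <- dat[-train_idx, ]
```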

2. (2 pts) Perform LDA on the training data in order to classify mpg01 using the variables cylinders, displacement, horsepower, weight, acceleration, and year. What is the test error of the model obtained?
3. (2 pts) Perform QDA on the training data in order to classify mpg01 using the same variables as in part 2. What is the test error of the model obtained?
4. (2 pts) Perform logistic regression on the training data in order to classify mpg01 using the same variables as in part 2. What is the test error of the model obtained?
5. (3 pts) Draw the ROC curves of LDA, QDA, and logistic regression on the test data.

Compute their AUCs and comment on which classifier you would choose. (You may find the R package pROC useful.)
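The overall workflow might be sketched as follows, again with mtcars standing in for Auto: am (already 0/1) replaces mpg01 and hp, wt replace the six predictors. MASS provides lda() and qda(); the pROC call is guarded since that package may not be installed.

```r
# Sketch of the LDA/QDA/logistic workflow with ROC; mtcars stands in for Auto.
library(MASS)  # lda(), qda()

dat <- mtcars
set.seed(0)
idx   <- sample(nrow(dat), floor(0.7 * nrow(dat)))
train <- dat[idx, ]
test  <- dat[-idx, ]

lda_fit <- lda(am ~ hp + wt, data = train)
lda_err <- mean(predict(lda_fit, test)$class != test$am)   # test 0-1 error

qda_fit <- qda(am ~ hp + wt, data = train)
qda_err <- mean(predict(qda_fit, test)$class != test$am)

glm_fit  <- glm(am ~ hp + wt, data = train, family = binomial)
glm_prob <- predict(glm_fit, test, type = "response")
glm_err  <- mean((glm_prob > 0.5) != test$am)

if (requireNamespace("pROC", quietly = TRUE)) {
  r <- pROC::roc(test$am, glm_prob)  # ROC curve; pROC::auc(r) gives the AUC
  print(pROC::auc(r))
}
```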
