Part B: Short answer questions (14 marks)
Please provide brief answers to the six questions (next six slides) in this section and submit them.
All the questions are compulsory.
Part B: Q1 (2 marks)
If a Decision Tree is overfitting the training set, is it a good idea to try decreasing max_depth? Briefly explain your answer.
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part B: Q2 (2 marks)
Briefly explain the most important difference between the AdaBoost and the Gradient Boosting methods.
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part B: Q3 (3 marks)
Briefly describe two techniques to select the right number of clusters when using K-Means.
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part B: Q4 (2 marks)
In a multi-layer perceptron, does increasing the number of hidden layers improve performance?
Explain your answer with reference to any dataset example from the lessons or the assignment.
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part B: Q5 (2 marks)
Explain what is happening in the code below.
def BackwardPass(self, input_vec, desired):
    out_delta = (desired - self.out) * (self.out * (1 - self.out))
    hid_delta = out_delta.dot(self.W2.T) * (self.hidout * (1 - self.hidout))
    if self.vanilla == True:
        self.W2 += self.hidout.T.dot(out_delta) * self.learn_rate
        self.B2 += (-1 * self.learn_rate * out_delta)
        self.W1 += (input_vec.T.dot(hid_delta) * self.learn_rate)
        self.B1 += (-1 * self.learn_rate * hid_delta)
    else:
        v2 = self.W2.copy()
        v1 = self.W1.copy()
        b2 = self.B2.copy()
        b1 = self.B1.copy()
        self.W2 += (v2 * self.momenRate) + (self.hidout.T.dot(out_delta) * self.learn_rate)  # velocity update
        self.W1 += (v1 * self.momenRate) + (input_vec.T.dot(hid_delta) * self.learn_rate)
        self.B2 += (b2 * self.momenRate) + (-1 * self.learn_rate * out_delta)  # velocity update
        self.B1 += (b1 * self.momenRate) + (-1 * self.learn_rate * hid_delta)
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part B: Q6 (3 marks)
What are the major similarities and differences between Adam and AdaGrad? If given a regression or classification problem, which one would perform better and why? How would you evaluate them?
Type your response in the Challenge workspace (in the file answer.txt) and then click on the Submit button at the bottom right of the screen.
Part C: Programming questions (26 marks)
For Part C questions, you need to answer
Part C: Q1 and
one of Part C: Q2 OR Part C: Q3
Part C: Q1 (6 marks)
For the following tasks, you need to write Python (or R) code, along with the required comments, in the file answer.py (or answer.r) and submit your solution.
Load the dataset available in dataset_clustering.csv. Your task is to cluster the dataset using K-Means. You need to use silhouette scores to select a suitable number of clusters, store that value in the variable named best_k, and store the corresponding model in the variable named best_model.
In your comments, provide brief justifications, with clearly articulated reasons, for the alternatives you explored to build the model you submitted.
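One way Q1 could be approached is sketched below (an illustration, not a model answer): fit K-Means for each candidate number of clusters, score each fit with the silhouette coefficient, and keep the best. The candidate range 2–10, the helper name pick_k, and the synthetic stand-in data are assumptions for illustration; in the submission, X would instead be loaded from dataset_clustering.csv.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def pick_k(X, k_range=range(2, 11)):
    """Fit K-Means for each candidate k and return (best_k, best_model) by silhouette score."""
    best_k, best_model, best_score = None, None, -1.0
    for k in k_range:
        model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
        score = silhouette_score(X, model.labels_)  # mean silhouette over all samples
        if score > best_score:
            best_k, best_model, best_score = k, model, score
    return best_k, best_model

# In the assignment, X would come from the provided file:
#   X = pd.read_csv("dataset_clustering.csv").values
# Here a synthetic stand-in with three well-separated blobs illustrates the idea.
from sklearn.datasets import make_blobs
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=42)
best_k, best_model = pick_k(X)
```

The elbow method (plotting inertia against k) is a common alternative to cross-check the chosen k, and mentioning both would fit the required justification comments.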
How to submit
Type your solution (Python code and comments) in the Challenge workspace (in the file answer.py or answer.r) and then click on the Submit button at the bottom right of the screen.
Part C: Q2 (20 marks)
For the following tasks, you need to write Python (or R) code, along with the required comments, in the file answer.py (or answer.r) and submit your solution.
Task-1 (7.5 marks):
Load the dataset available in dataset.csv, train a Random Forest classifier on the data set, and then properly evaluate the resulting model.
If required, consider different data pre-processing approaches discussed in the course. Also consider different ways you could build models using a Random Forest classifier. You need to explore possible alternative approaches before selecting the most appropriate model for the data set. Your model should satisfy the following criteria: precision should be at least 80% and recall should be at least 85%, for the target value 1.
Your best model must be saved in the variable named best_model_task1 .
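A minimal sketch of the Task-1 workflow (the helper name fit_and_score and the synthetic stand-in data are illustrative assumptions; in the submission, X and y would come from dataset.csv): hold out a stratified test set, fit a Random Forest, and check precision and recall for class 1 against the stated thresholds.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score

def fit_and_score(X, y):
    """Hold out a stratified test set, fit a Random Forest, and report precision/recall for class 1."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)
    model = RandomForestClassifier(n_estimators=300, random_state=42).fit(X_tr, y_tr)
    pred = model.predict(X_te)
    return (model,
            precision_score(y_te, pred, pos_label=1),
            recall_score(y_te, pred, pos_label=1))

# In the assignment, the data would come from the provided file, e.g.:
#   df = pd.read_csv("dataset.csv"); X, y = df.drop(columns=[...]), df[...]
# A synthetic stand-in illustrates the workflow:
from sklearn.datasets import make_classification
X, y = make_classification(n_samples=1000, n_informative=8, random_state=42)
best_model_task1, prec, rec = fit_and_score(X, y)
```

To push precision and recall toward the required 80%/85%, one could tune n_estimators, max_depth, or class_weight, or adjust the decision threshold via predict_proba; each alternative tried should be justified in the comments.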
Task-2 (7.5 marks):
Next, use PCA to reduce the data set's dimensionality, with an explained variance ratio of 95%. Train a new Random Forest classifier on the reduced data set and evaluate the classifier on the test set.
Your best model must be saved in the variable named best_model_task2 .
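For Task-2, passing a float to PCA's n_components keeps the smallest number of components whose cumulative explained variance ratio reaches that fraction. The sketch below uses stand-in data (in the submission, reuse the Task-1 data and split) and chains PCA with the classifier in a pipeline so the same transform is applied at train and test time.

```python
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

# Stand-in data; in the assignment, reuse the split from Task-1.
X, y = make_classification(n_samples=1000, n_informative=8, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# n_components=0.95 asks PCA for the smallest number of components
# whose cumulative explained variance ratio is at least 95%.
best_model_task2 = make_pipeline(
    PCA(n_components=0.95),
    RandomForestClassifier(n_estimators=300, random_state=42))
best_model_task2.fit(X_tr, y_tr)
acc = best_model_task2.score(X_te, y_te)  # test-set accuracy of the reduced model
```

Since PCA is scale-sensitive, standardizing the features before it (e.g. with StandardScaler) is often worthwhile in practice.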
Task-3 (5 marks):
Describe and discuss the above results.
How to submit
Type your solution (Python code and comments) in the Challenge workspace (in the file answer.py or answer.r) and then click on the Submit button at the bottom right of the screen.
Part C: Q3 (20 marks)
Task 1: 5 Marks
Data processing for machine learning: Given the dataset:
https://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(original)
Use either R or Python with the needed libraries to process the data. (Note: you can use find-and-replace in a text editor for the non-numeric values.)
Type your response in the Challenge workspace (in the file process.py or process.r) and then click on the Submit button at the bottom right of the screen.
Task 2: 10 Marks
Machine learning using neural networks
You can use either R or Python with needed libraries for this task.
Using the processed data from Part C: Q3 Task 1, use a neural network (with an appropriate number of hidden layers and neurons) and, for one model run, report your results with a 60 percent training split (the remaining 40 percent for testing).
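A sketch of this setup (train_size=0.6 gives the required 60/40 split; the single 16-neuron hidden layer and the use of sklearn's built-in diagnostic breast-cancer data as a stand-in are assumptions, and the submission would use the Task 1 processed data instead):

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in features/labels; in the assignment these come from Task 1's output.
from sklearn.datasets import load_breast_cancer
X, y = load_breast_cancer(return_X_y=True)

# train_size=0.6 -> 60% training data, 40% held out for testing.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=42)

clf = make_pipeline(
    StandardScaler(),  # MLPs are sensitive to feature scale
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=42))
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
```

The scaler is included because multi-layer perceptrons converge poorly on unscaled features; reporting accuracy plus a confusion matrix for the single run would cover the "show the results" part of the task.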
Type your solution in the Challenge workspace (in the file nnmodel.py or nnmodel.r), along with answer.txt, and then click on the Submit button at the bottom right of the screen.
Task 3: 5 Marks
Describe and discuss the results.