
User-centric Systems for Data Science Assignment 3


1. Data schema

The MRIs you will use are given as NumPy arrays (.npy), the input format expected by the pretrained CNN
model. The segmented MRIs are given as NIfTI files (.nii) that you can manipulate with the NiBabel
library. Each MRI is associated with a set of metadata given in a CSV file (ADNI.csv). The
schema includes 31 attributes (categorical, binary, and numerical). The data have not been cleaned:
some values are missing from the CSV file, while others may be inconsistent or erroneous. The attribute
AD contains the MRI labels (ground truth), which indicate whether the patient has Alzheimer's disease
(AD=1) or not (AD=0).
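A minimal sketch of loading the three kinds of input with NumPy, pandas and NiBabel. It relies only on what the schema above states (an ADNI.csv file with an AD column, .npy volumes, .nii segmentations); the helper names are hypothetical, and the cleaning rule (dropping rows whose AD value is missing or not 0/1) is one possible choice, not a prescribed one.

```python
import numpy as np
import pandas as pd


def load_metadata(csv_path, label_col="AD"):
    """Load the ADNI metadata and keep only rows with a usable 0/1 label."""
    df = pd.read_csv(csv_path)
    # The CSV is uncleaned: report how many values each attribute is missing.
    missing = df.isna().sum()
    print(missing[missing > 0])
    # Coerce the label to numeric; malformed entries (e.g. "x") become NaN.
    df[label_col] = pd.to_numeric(df[label_col], errors="coerce")
    # Assumption: rows whose label is not exactly 0 or 1 are dropped.
    df = df[df[label_col].isin([0, 1])].copy()
    df[label_col] = df[label_col].astype(int)
    return df


def load_mri(npy_path):
    """Load one preprocessed MRI volume stored as a NumPy array."""
    return np.load(npy_path).astype(np.float32)


def load_segmented_mri(nii_path):
    """Load a segmented MRI stored in NIfTI format as a float array."""
    import nibabel as nib  # imported lazily so the other helpers work without it

    return np.asarray(nib.load(nii_path).get_fdata(), dtype=np.float32)
```

Inspecting the printed missing-value counts before deciding on a cleaning rule is advisable, since other attributes may also need repair.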

2. Task I: Load model and data (credits: 10/100)

The first task is to load the pretrained CNN model into memory from the given checkpoint
(cnn_best.pth) using PyTorch. You also need to divide the 19 MRIs into two parts:

A. The first part contains the test data, i.e., the MRIs that you will use to probe the CNN model and
generate classifications. Make sure that this dataset contains at least 5 MRIs.

B. The second part contains the background data, i.e., the MRIs that SHAP uses as reference inputs when
perturbing the instances to approximate the Shapley values (see Lectures 12 and 14).

Report how many instances in your test set are classified correctly.
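The split described in parts A and B could be sketched with NumPy as follows, assuming `labels` holds the AD value of each of the 19 MRIs in a fixed order. Beyond the required minimum of 5 test MRIs, this hypothetical helper also forces at least one MRI of each label into the test set, because the heatmap task later needs a correctly classified example of both classes; that extra constraint is an assumption, not a stated requirement.

```python
import numpy as np


def split_test_background(labels, n_test=5, seed=0):
    """Return (test_idx, background_idx) as sorted index arrays."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if n_test < max(5, len(classes)) or n_test >= len(labels):
        raise ValueError("need at least 5 test MRIs (one per label) and a non-empty background set")
    rng = np.random.default_rng(seed)
    # Assumption: guarantee one randomly chosen MRI of every label in the test set.
    test = [int(rng.choice(np.flatnonzero(labels == c))) for c in classes]
    # Fill the remaining test slots uniformly at random from the other MRIs.
    rest = np.setdiff1d(np.arange(len(labels)), test)
    test += rng.choice(rest, size=n_test - len(test), replace=False).tolist()
    test_idx = np.sort(np.array(test))
    # Everything not in the test set becomes SHAP background data.
    background_idx = np.setdiff1d(np.arange(len(labels)), test_idx)
    return test_idx, background_idx
```

Calling it again with different `n_test` values is one simple way to vary the test/background ratio.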

Hint #1: You may want to have a look at the PyTorch model and data loaders.
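Along the lines of Hint #1, a minimal PyTorch sketch of loading the checkpoint, wrapping the .npy volumes in a Dataset/DataLoader, and counting correct test predictions. The CNN's class definition is not given here, so `model` stands for an instance of the provided architecture. The following are hypothetical and should be checked against the course code: that cnn_best.pth stores a state_dict (possibly wrapped under a "state_dict" key), that the network takes a single-channel volume of shape (1, X, Y, Z), and that the predicted class is the argmax of its outputs.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class MRIDataset(Dataset):
    """Serves (volume, label) pairs read from .npy files."""

    def __init__(self, paths, labels):
        self.paths, self.labels = list(paths), list(labels)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        vol = np.load(self.paths[i]).astype(np.float32)
        # Assumption: the CNN expects one channel, so (X, Y, Z) -> (1, X, Y, Z).
        return torch.from_numpy(vol).unsqueeze(0), int(self.labels[i])


def load_cnn(model, ckpt_path, device="cpu"):
    """Load weights from a checkpoint such as cnn_best.pth into `model`."""
    state = torch.load(ckpt_path, map_location=device)
    # Assumption: the file holds a state_dict, possibly wrapped as {"state_dict": ...}.
    if isinstance(state, dict) and "state_dict" in state:
        state = state["state_dict"]
    model.load_state_dict(state)
    return model.to(device).eval()  # eval() disables dropout/batch-norm updates


@torch.no_grad()
def count_correct(model, loader, device="cpu"):
    """Return (#correct, #total), taking the argmax over outputs as the prediction."""
    correct = total = 0
    for x, y in loader:
        preds = model(x.to(device)).argmax(dim=1).cpu()
        correct += int((preds == y).sum())
        total += len(y)
    return correct, total
```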

Hint #2: After implementing the whole pipeline, you can also experiment with different ratios of test and
background data sizes to see if (and how) they affect the results.

3. Task II: Generate SHAP values for individual pixels (credits: 40/100)

The second task is to compute the SHAP values for the pixels of the MRIs whose predictions we want to
explain. In this case, each pixel (element of the NumPy array) corresponds to a feature value that has a
unique SHAP value for the given classification. The output of this task is another set of NumPy arrays
(one for each input MRI) that contain the SHAP values.
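As a sanity check on these arrays, it may help to recall the additivity (local accuracy) property of SHAP values. Writing f_c for the CNN's output for class c, B for the background set and phi_{c,i}(x) for the SHAP value of pixel i of MRI x, the values should approximately satisfy the identity below, where d is the number of array elements in x (approximately, because DeepExplainer and GradientExplainer estimate rather than compute exact Shapley values):

```latex
f_c(x) \approx \mathbb{E}_{b \in B}\left[ f_c(b) \right] + \sum_{i=1}^{d} \phi_{c,i}(x)
```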

Hint: You may use the DeepExplainer or the GradientExplainer from the SHAP repository.
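A minimal sketch using SHAP's GradientExplainer, one of the two explainers named in the hint. Assumptions: `model` is the loaded CNN in eval mode, `background` and `test_batch` are torch tensors shaped like the model input (e.g. N x 1 x X x Y x Z), and the model outputs one score per class with index 1 meaning AD. The helper names and the `_shap.npy` file naming are hypothetical. Because SHAP's return format for multi-output models differs between releases (a list with one array per output in older versions, a single array with outputs on the last axis in newer ones), `select_class` accepts both.

```python
import os

import numpy as np


def select_class(shap_values, class_idx):
    """Return SHAP values for one output class, shaped like the inputs."""
    if isinstance(shap_values, list):  # older SHAP: one array per model output
        return np.asarray(shap_values[class_idx])
    return np.asarray(shap_values)[..., class_idx]  # newer SHAP: outputs on last axis


def explain(model, background, test_batch, class_idx=1, nsamples=200):
    """Compute per-pixel SHAP values with SHAP's GradientExplainer."""
    import shap  # imported lazily so the helpers here run without SHAP

    explainer = shap.GradientExplainer(model, background)
    # nsamples trades accuracy for time/memory; 3D volumes may need a smaller value.
    return select_class(explainer.shap_values(test_batch, nsamples=nsamples), class_idx)


def save_per_mri(values, names, out_dir):
    """Write one .npy file of SHAP values per explained MRI; returns the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for vals, name in zip(values, names):
        path = os.path.join(out_dir, f"{name}_shap.npy")
        np.save(path, vals)
        paths.append(path)
    return paths
```

Each saved array has the shape of one model input, so it can be sliced exactly like the corresponding MRI.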

4. Task III: Generate SHAP heatmaps on 2D MRI slices (credits: 10/100)

The third task is to extract 2D slices from the 3D MRIs and plot the SHAP values as heatmaps on the 2D
images. Each heatmap serves as an explanation that highlights the individual pixels that contribute
positively or negatively to a particular prediction, as shown in Fig. 1.

Figure 1
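A minimal matplotlib sketch of one such heatmap: take the same 2D slice from an MRI volume and from its SHAP array, draw the MRI in grayscale, and overlay the SHAP values with a diverging colormap centred on zero. The helper names, the default middle slice along the last axis, the transpose/origin choice for orientation and the 0.5 overlay opacity are all assumptions to adjust to the data; the exact styling of Fig. 1 is not reproduced.

```python
import numpy as np


def extract_slice(volume, axis=2, index=None):
    """Return a 2D slice of a 3D volume; defaults to the middle slice."""
    volume = np.squeeze(volume)  # drop singleton axes, e.g. (1, X, Y, Z) -> (X, Y, Z)
    if volume.ndim != 3:
        raise ValueError(f"expected a 3D volume, got shape {volume.shape}")
    if index is None:
        index = volume.shape[axis] // 2
    return np.take(volume, index, axis=axis)


def symmetric_limit(values):
    """Largest absolute value, so the colour scale is centred on zero."""
    return float(np.max(np.abs(values))) or 1.0  # avoid a zero-width scale


def plot_shap_slice(mri, shap_vals, axis=2, index=None, title="", ax=None):
    """Overlay the SHAP values of one slice on the grayscale MRI slice."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    img = extract_slice(mri, axis, index)
    heat = extract_slice(shap_vals, axis, index)
    lim = symmetric_limit(heat)
    if ax is None:
        _, ax = plt.subplots()
    # Assumption: transposing with origin="lower" gives an upright view of this data.
    ax.imshow(img.T, cmap="gray", origin="lower")
    # Red: positive contribution to the explained class; blue: negative contribution.
    im = ax.imshow(heat.T, cmap="bwr", vmin=-lim, vmax=lim, alpha=0.5, origin="lower")
    ax.set_title(title)
    ax.axis("off")
    plt.colorbar(im, ax=ax, fraction=0.046)
    return ax
```

Centring the colour scale on zero keeps the sign of each contribution readable; using one limit for all slices instead of a per-slice limit would make slices directly comparable.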

You must generate heatmaps for two randomly selected MRI instances, one classified as “AD” and one
classified as “Not AD”. Make sure that the CNN model predictions for both MRIs are correct according
to their labels (in ADNI.csv).
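The selection rule above can be sketched with NumPy, assuming `preds` holds the CNN's predicted classes for the test set and `labels` the matching AD values from ADNI.csv (1 = AD, 0 = Not AD); the function name and the fixed seed are illustrative.

```python
import numpy as np


def pick_examples(preds, labels, seed=0):
    """Randomly pick one correctly classified AD and one correctly classified Not-AD index."""
    preds, labels = np.asarray(preds), np.asarray(labels)
    rng = np.random.default_rng(seed)
    picks = {}
    for cls, name in [(1, "AD"), (0, "Not AD")]:
        # Candidates are instances whose prediction and label both equal cls.
        candidates = np.flatnonzero((preds == cls) & (labels == cls))
        if candidates.size == 0:
            raise ValueError(f"no correctly classified '{name}' instance in the test set")
        picks[name] = int(rng.choice(candidates))
    return picks
```

If either class has no correct prediction, the test set has to be redrawn (or enlarged) before the heatmaps can be produced.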

Hint: You may find the plotting examples in the SHAP repository helpful.
