In each case, write down the exact lines of code that you have used to answer the question,
including all intermediate steps. For each line of code, you should write an explanation below
it that describes what it is doing and the rationale behind its use. Some of the questions can be
answered with material learned directly in lectures and workshops; other parts may
require additional research into which approaches are best suited to the question. There
are 10 marks available for each question (100 marks in total).
For this set of questions we will use the dataset “cell.measurements.csv”. This dataset
contains measurements describing cell shape and cell movement at different time points.
The aim of the study is to understand whether treatment alters cell shape or movement in the
context of injury. Within the dataset there are:
• 11 different cell measurements
• 6 animals: 3 treated and 3 not treated (control)
• Measurements from several time points (time_frame)
• For each animal and each time point, measurements have been collected with and without injury
The second and third columns contain information about injury (1 = presence of injury; 0 = absence of
injury) and treatment (1 = presence of treatment; 0 = absence of treatment).
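A layout like the one described above is usually easier to plot once it is reshaped from wide to long format, with one row per measurement value. The following is a minimal sketch of that step, not a prescribed solution. Apart from time_frame, every column name here (animal, injury, treatment, area, speed) is an assumption made for illustration, and the small table is a toy stand-in for the real file.

```python
import pandas as pd

# With the real file this would be something like:
# df = pd.read_csv("cell.measurements.csv")

# Toy stand-in for the dataset. Only "time_frame" is named in the brief;
# the other column names are hypothetical.
toy = pd.DataFrame({
    "animal":     [1, 1, 2, 2],
    "injury":     [0, 1, 0, 1],
    "treatment":  [1, 1, 0, 0],
    "time_frame": [1, 1, 1, 1],
    "area":       [10.0, 12.0, 9.0, 15.0],
    "speed":      [0.5, 0.7, 0.4, 0.9],
})

# Columns that identify a row rather than measure a cell (assumed names).
ID_COLS = ["animal", "injury", "treatment", "time_frame"]


def to_long(df, id_cols=ID_COLS):
    """Reshape wide data so each row holds one value of one measurement."""
    return df.melt(id_vars=id_cols, var_name="measurement", value_name="value")


long_df = to_long(toy)
print(long_df.head())
```

In long format, the measurement column can be used directly as a grouping or faceting variable in most plotting libraries.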
Visualise the distribution of each cell measurement within each animal, showing the
behaviour in the presence or absence of Injury.
Discuss which measurement has the most variability between animals, in the presence and
absence of injury. Can you see any apparent difference between treated animals and the control animals?
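One possible way to back up the visual comparison with a number is the coefficient of variation (CV) of the per-animal means, computed separately for each injury condition. This is only one of several reasonable choices, and it is meaningful only for measurements that are strictly positive. The sketch below runs on toy data, and the column names animal, injury, area and speed are assumptions.

```python
import pandas as pd

# Toy stand-in for the dataset; column names are hypothetical.
toy = pd.DataFrame({
    "animal": [1, 2, 3, 1, 2, 3],
    "injury": [0, 0, 0, 1, 1, 1],
    "area":   [10.0, 10.0, 10.0, 10.0, 20.0, 30.0],
    "speed":  [1.0, 2.0, 3.0, 2.0, 2.0, 2.0],
})


def between_animal_cv(df, measurements):
    """CV of the per-animal means, for each injury condition."""
    # Step 1: average each measurement within every (injury, animal) pair.
    per_animal = df.groupby(["injury", "animal"])[measurements].mean()
    # Step 2: compare the animal means with each other within a condition.
    grouped = per_animal.groupby(level="injury")
    return grouped.std() / grouped.mean().abs()


cv = between_animal_cv(toy, ["area", "speed"])
print(cv)
# Measurement with the largest between-animal CV in each condition.
most_variable = cv.idxmax(axis=1)
print(most_variable)
```

Because the CV is unitless, it allows measurements recorded on different scales to be compared, which a raw standard deviation would not.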
For each cell measurement, plot how the values change over time, distinguishing between
animals and faceting by the presence or absence of injury.
Discuss the final plot. Can you see similar behaviours across animals? Does time have a big
effect on all or some of the measurements?
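A plot of this kind could be sketched as follows for a single measurement: one line per animal, one panel per injury condition, and a shared y-axis so the panels are directly comparable. This is an illustrative sketch on toy data rather than a model answer. The column names animal, injury and area are assumptions, and the helper plot_over_time is hypothetical.

```python
from itertools import product

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the dataset; only "time_frame" is named in the brief.
rows = [
    {"animal": a, "injury": i, "time_frame": t, "area": 10.0 + a + 2 * i + 0.5 * t}
    for a, i, t in product([1, 2], [0, 1], [1, 2, 3])
]
toy = pd.DataFrame(rows)


def plot_over_time(df, measurement):
    """Draw one panel per injury condition with one line per animal."""
    fig, axes = plt.subplots(1, 2, sharey=True, figsize=(8, 3))
    for ax, injury in zip(axes, [0, 1]):
        subset = df[df["injury"] == injury]
        for animal, grp in subset.groupby("animal"):
            # Average over cells recorded at the same time point.
            trend = grp.groupby("time_frame")[measurement].mean()
            ax.plot(trend.index, trend.values, marker="o", label=f"animal {animal}")
        ax.set_title(f"injury = {injury}")
        ax.set_xlabel("time_frame")
    axes[0].set_ylabel(measurement)
    axes[1].legend()
    return fig


fig = plot_over_time(toy, "area")
fig.savefig("area_over_time.png")
```

Calling the helper in a loop over the measurement columns would give one figure per measurement.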
Which cell measurements in the dataset are the most strongly correlated? Create a correlation plot and
discuss which pairs of variables are the most positively and the most negatively correlated.
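Before drawing a correlation plot, it can help to rank the pairs numerically. The sketch below computes a Pearson correlation matrix and lists each pair of variables once, from most positive to most negative. It uses toy data with assumed column names (area, perimeter, speed). With the real dataset, the identifier columns (animal, injury, treatment, time point) would normally be dropped first.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the measurement columns; the names are hypothetical.
toy = pd.DataFrame({
    "area":      [1.0, 2.0, 3.0, 4.0, 5.0],
    "perimeter": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises with area
    "speed":     [5.0, 4.1, 2.9, 2.2, 1.0],  # falls as area rises
})


def ranked_pairs(df):
    """Return each pairwise Pearson correlation once, sorted high to low."""
    corr = df.corr(method="pearson")
    # Keep the strict upper triangle: drops the diagonal and mirrored pairs.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().dropna()
    return pairs.sort_values(ascending=False)


pairs = ranked_pairs(toy)
most_positive = pairs.index[0]   # pair with the highest correlation
most_negative = pairs.index[-1]  # pair with the lowest correlation
print(pairs)
```

The full matrix from df.corr() is also what a heatmap-style correlation plot would take as input.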