In each case, write down the exact lines of code that you used to answer the question, including all intermediate steps. Below each line of code, write an explanation of what it does and why you used it. Some of the questions can be answered with material covered directly in lectures and workshops; others may require additional research into which approaches best answer the question. Each question is worth 10 marks (100 marks in total).
For this set of questions we will use the dataset “cell.measurements.csv”. This dataset contains measurements describing cell shape and cell movement at different time points.
The aim of the study is to understand whether treatment alters cell shape or movement in the context of injury. The dataset contains:
• 11 different cell measurements
• 6 animals: 3 treated and 3 not treated (control)
• Measurements from several time points (time_frame)
• For each animal and each time point, measurements collected both with and without injury
The second and third columns contain information about injury (1 = injury present; 0 = injury absent) and treatment (1 = treated; 0 = not treated).
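A minimal Python/pandas sketch for reading the data and making the 0/1 codes readable. The brief fixes only the `time_frame` column and the positions of the injury and treatment columns, so the sketch assumes, hypothetically, that the first column identifies the animal and renames the first three columns to `animal`, `injury` and `treatment`. The helper names (`label_cells`, `measurement_columns`, `load_cells`) and the added `injury_label` and `group` columns are illustrative only; check everything against the real file header.

```python
import pandas as pd


def label_cells(df):
    """Give the first three columns fixed names and add readable labels.

    Hypothetical assumptions (check against the real header): column 0
    holds the animal ID; columns 1 and 2 hold the 0/1 injury and
    treatment codes described in the brief.
    """
    animal_col, injury_col, treat_col = df.columns[:3]
    df = df.rename(columns={animal_col: "animal", injury_col: "injury",
                            treat_col: "treatment"})
    # Map the 0/1 codes to labels that read well in plot legends.
    df["injury_label"] = df["injury"].map({0: "no injury", 1: "injury"})
    df["group"] = df["treatment"].map({0: "control", 1: "treated"})
    return df


def measurement_columns(df):
    """All columns that are not IDs, codes, labels or the time point.
    Extend `skip` if the file has other non-measurement columns."""
    skip = {"animal", "injury", "treatment", "injury_label", "group", "time_frame"}
    return [c for c in df.columns if c not in skip]


def load_cells(path="cell.measurements.csv"):
    """Read the CSV named in the brief and apply the labels above."""
    return label_cells(pd.read_csv(path))
```

If the assumptions hold and the file has no further identifier columns, `measurement_columns(load_cells())` should return the 11 cell measurements.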
Visualise the distribution of each cell measurement within each animal, showing its behaviour in the presence and absence of injury.
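One way to draw this is a grid of box plots, one panel per measurement, with two boxes per animal (injury absent and present). The sketch below uses matplotlib and assumes hypothetical column names `animal`, `injury` (0/1) and `group` ("treated"/"control"), for example after renaming the file's first three columns; violin plots would be an equally reasonable choice.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.patches import Patch


def plot_distributions(df, measurements, ncols=4):
    """Grid of box plots: one panel per measurement, two boxes per
    animal (blue = no injury, red = injury).

    Assumes hypothetical `animal`, `injury` (0/1) and `group` columns.
    """
    animals = sorted(df["animal"].unique())
    # Treatment group of each animal, shown under its tick label.
    groups = df.groupby("animal")["group"].first()
    colours = {0: "tab:blue", 1: "tab:red"}
    nrows = -(-len(measurements) // ncols)  # ceiling division
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows),
                             squeeze=False)
    for ax, m in zip(axes.flat, measurements):
        for offset, injury in ((-0.2, 0), (0.2, 1)):
            # All values of measurement m for each animal at this injury status.
            data = [df.loc[(df["animal"] == a) & (df["injury"] == injury), m]
                    .dropna().to_numpy() for a in animals]
            boxes = ax.boxplot(data, positions=[i + offset for i in range(len(animals))],
                               widths=0.35, patch_artist=True)
            for patch in boxes["boxes"]:
                patch.set_facecolor(colours[injury])
        ax.set_xticks(list(range(len(animals))))
        ax.set_xticklabels([f"{a}\n{groups[a]}" for a in animals])
        ax.set_title(m)
    # Hide any unused panels in the last row.
    for ax in axes.flat[len(measurements):]:
        ax.set_visible(False)
    fig.legend(handles=[Patch(color=colours[0], label="no injury"),
                        Patch(color=colours[1], label="injury")],
               loc="upper right")
    fig.tight_layout()
    return fig
```

Putting the treatment group in each tick label keeps the treated and control animals visible side by side, which also helps with the comparison asked for in the next question.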
Discuss which measurement varies most between animals, both in the presence and in the absence of injury. Can you see any apparent difference between treated and control animals?
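To support the discussion with a number, one possible metric (an assumption, not something the brief prescribes) is to z-score each measurement over the whole dataset, average it per animal, and take the standard deviation of those animal means separately for each injury status. The coefficient of variation of animal means is a common alternative, but it misbehaves for measurements whose mean is near zero. The sketch assumes hypothetical `animal`, `injury` (0/1) and `group` columns; the function names are illustrative.

```python
import pandas as pd


def between_animal_spread(df, measurements):
    """Standard deviation of per-animal means, per injury status.

    Each measurement is first standardised (z-scored) over the whole
    dataset, so measurements recorded in different units can be ranked
    against each other. Rows are measurements, columns are injury
    status (0/1); rows are sorted by their largest spread.
    """
    values = df[measurements]
    z = (values - values.mean()) / values.std()
    z = z.assign(animal=df["animal"], injury=df["injury"])
    animal_means = z.groupby(["injury", "animal"])[measurements].mean()
    # Spread across animals, computed separately for each injury status.
    spread = animal_means.groupby(level="injury").std().T
    order = spread.max(axis=1).sort_values(ascending=False).index
    return spread.loc[order]


def treatment_means(df, measurements):
    """Mean of each measurement for treated vs control animals,
    split by injury status (columns are (injury, group) pairs)."""
    return df.groupby(["injury", "group"])[measurements].mean().T
```

With only three animals per group, these numbers describe the sample rather than test a difference; they are best read alongside the box plots.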
For each cell measurement, plot how the values change over time, distinguishing between animals and facetting by the presence or absence of injury.
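A minimal matplotlib sketch of such a plot for one measurement: two panels (injury absent and present), one line per animal, solid for treated and dashed for control animals. It averages over cells at each time point, which assumes several cells per animal and time point, and relies on hypothetical `animal`, `injury`, `group` and `time_frame` columns; the function name is illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_time_course(df, measurement):
    """Mean of one measurement per time point, one line per animal,
    with one panel for each injury status (0 = absent, 1 = present).

    Assumes hypothetical `animal`, `injury`, `group` and `time_frame`
    columns.
    """
    fig, axes = plt.subplots(1, 2, figsize=(10, 3.5), sharey=True)
    for ax, injury in zip(axes, (0, 1)):
        sub = df[df["injury"] == injury]
        # Average over cells within each animal and time point.
        means = sub.groupby(["animal", "time_frame"])[measurement].mean()
        for animal, series in means.groupby(level="animal"):
            group = sub.loc[sub["animal"] == animal, "group"].iloc[0]
            ax.plot(series.index.get_level_values("time_frame").to_numpy(),
                    series.to_numpy(), marker="o",
                    linestyle="-" if group == "treated" else "--",
                    label=f"{animal} ({group})")
        ax.set_title(f"{measurement}: injury = {injury}")
        ax.set_xlabel("time_frame")
    axes[0].set_ylabel(measurement)
    axes[1].legend(fontsize="small")
    return fig
```

Looping over the measurement columns and saving each figure (for example with `fig.savefig(...)`) gives one figure per measurement.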
Discuss the resulting plot. Do the animals show similar behaviour? Does time have a large effect on all of the measurements, or only on some?
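One hedged way to quantify "a big effect of time" is a Spearman rank correlation between `time_frame` and each measurement, computed per animal and injury status; it is calculated here directly as the Pearson correlation of ranks. It only captures monotonic trends, so a measurement that rises and then falls will score near 0 and should be checked against the plot. Column names (`animal`, `injury`, `time_frame`) and the function name are hypothetical.

```python
import pandas as pd


def time_trend(df, measurements):
    """Spearman correlation between time_frame and each measurement,
    computed separately for every animal and injury status.

    Values near +1/-1 suggest a steady monotonic rise/fall over time;
    values near 0 suggest little monotonic trend. Spearman is computed
    as the Pearson correlation of ranks (ties get average ranks).
    """
    rows = []
    for (animal, injury), sub in df.groupby(["animal", "injury"]):
        t_rank = sub["time_frame"].rank()
        for m in measurements:
            rho = t_rank.corr(sub[m].rank())
            rows.append({"animal": animal, "injury": injury,
                         "measurement": m, "spearman_rho": rho})
    table = pd.DataFrame(rows)
    # Wide view: one row per measurement, one column per (injury, animal).
    return table.pivot(index="measurement", columns=["injury", "animal"],
                       values="spearman_rho")
```

Consistent signs across animals for a measurement would support "similar behaviours"; mixed signs would suggest animal-specific trends.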
Which cell measurements are most strongly correlated in the dataset? Create a correlation plot and discuss which variables are the most positively correlated and which are the most negatively correlated.
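A minimal sketch using pandas and matplotlib: compute the correlation matrix with `DataFrame.corr` (Pearson by default; `method="spearman"` is an option), list each pair once from most positive to most negative, and draw a heatmap on a fixed -1 to 1 diverging colour scale so the sign reads directly. It pools all animals, time points and injury statuses; subsetting by injury first would show whether the correlations differ between conditions. Function names are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd


def ranked_correlations(df, measurements, method="pearson"):
    """Correlation matrix plus every pair listed once, sorted from
    most positive to most negative."""
    corr = df[measurements].corr(method=method)
    cols = list(corr.columns)
    # Upper triangle only: each pair once, diagonal (always 1) dropped.
    pairs = pd.Series({(a, b): corr.loc[a, b]
                       for i, a in enumerate(cols) for b in cols[i + 1:]})
    return corr, pairs.sort_values(ascending=False)


def plot_correlation(corr):
    """Heatmap of a correlation matrix on a fixed -1..1 colour scale."""
    fig, ax = plt.subplots(figsize=(7, 6))
    im = ax.imshow(corr.to_numpy(), cmap="RdBu_r", vmin=-1, vmax=1)
    ax.set_xticks(list(range(len(corr.columns))))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(list(range(len(corr.index))))
    ax.set_yticklabels(corr.index)
    fig.colorbar(im, ax=ax, label="correlation")
    fig.tight_layout()
    return fig
```

The head and tail of the sorted `pairs` series give the most positively and most negatively correlated measurements to discuss.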