Questions:
Drug producers and wholesalers add extra chemicals to their product to increase its volume and so make larger profits. (Retail drug dealers do this too, but that’s not our concern here.) As drugs pass through the pipeline from drug-producing countries (South America, Central Asia) to Western countries, adulterants can be added at several stages. The pattern of these adulterants acts as a kind of fingerprint, capturing the provenance of each shipment.
Law enforcement can use this fingerprint to understand how drugs are handled. If multiple shipments all have the same pattern of adulterants, then they were presumably sourced from the same place. If all the patterns are different, then the pipelines are complex and overlapping.
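The fingerprint idea can be made concrete by grouping shipments whose adulterant patterns are exactly identical. This is a minimal sketch that assumes each shipment is encoded as a tuple of 0/1 presence flags, one per adulterant; the shipment IDs and profiles are hypothetical toy data, not taken from the dataset.

```python
from collections import defaultdict


def group_by_pattern(profiles):
    """Group shipment IDs that share exactly the same adulterant pattern.

    profiles: dict mapping a shipment ID to a tuple of 0/1 flags, one flag
    per adulterant (1 = present). The binary encoding is an assumption.
    """
    groups = defaultdict(list)
    for shipment_id, pattern in profiles.items():
        groups[tuple(pattern)].append(shipment_id)
    return dict(groups)


# Hypothetical toy profiles over three adulterants.
profiles = {
    "S1": (1, 0, 1),
    "S2": (1, 0, 1),
    "S3": (0, 1, 0),
}
groups = group_by_pattern(profiles)
print(groups)  # {(1, 0, 1): ['S1', 'S2'], (0, 1, 0): ['S3']}
```

Exact matching is brittle: a single extra chemical added at one stage splits an otherwise shared fingerprint into two groups. That is one reason to cluster with a similarity measure rather than rely on identical patterns alone.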
This dataset describes the chemical composition of the adulterants added to drug shipments that were interdicted by Customs. Each row describes the chemical composition of a single shipment, and each column corresponds to one adulterant whose presence Customs checks for.
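Because each column records whether one adulterant is present, the choice of distance between rows matters for clustering. If most shipments lack most adulterants, a measure that counts shared absences (such as simple matching, or Euclidean distance on 0/1 values) can make unrelated shipments look similar. Jaccard distance ignores shared absences. The sketch below assumes the columns are binary presence flags; if the dataset records concentrations instead, a different distance would be needed.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two binary adulterant profiles.

    Only adulterants present in at least one of the two shipments are
    counted; adulterants absent from both are ignored.
    """
    if len(a) != len(b):
        raise ValueError("profiles must cover the same adulterants")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        # Neither shipment contains any checked adulterant: treat as identical.
        return 0.0
    return 1.0 - both / either


# Hypothetical profiles: one shared adulterant out of three present in either.
print(jaccard_distance((1, 0, 1, 0), (1, 1, 0, 0)))
```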
Process
The primary goals of the analytics are:
o To cluster the samples according to their adulterant pattern, and see what this reveals about the possible pathways by which the drugs arrived at this country’s border.
o To reinforce any conclusions drawn from these clusterings by trying to predict shipment label from adulterant profile.
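The two goals can be sketched together using only the Python standard library. This sketch uses average-linkage agglomerative clustering on Jaccard distance for the first goal, and leave-one-out k-nearest-neighbour classification for the second. Both techniques are illustrative choices, not ones the brief prescribes. The toy profiles and the label values `route_a`/`route_b` are hypothetical, and the profiles are assumed to be binary presence flags. KNIME or scikit-learn provide tuned versions of both techniques; this version just shows the mechanics.

```python
from collections import Counter

# Hypothetical toy data: four shipments, five adulterants, binary presence flags.
PROFILES = [
    (1, 1, 0, 0, 0),
    (1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
    (0, 0, 1, 1, 1),
]
LABELS = ["route_a", "route_a", "route_b", "route_b"]  # hypothetical labels


def jaccard(a, b):
    """Jaccard distance between two binary profiles (shared absences ignored)."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Jaccard distance until n_clusters remain.
    Returns a list of clusters, each a list of row indices."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                total = sum(jaccard(profiles[p], profiles[q])
                            for p in clusters[i] for q in clusters[j])
                d = total / (len(clusters[i]) * len(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])  # merge cluster j into cluster i
        del clusters[j]                  # j > i, so index i is unaffected
    return clusters


def loo_knn_predict(profiles, labels, k=1):
    """Leave-one-out k-NN: predict each shipment's label by majority vote
    over its k nearest *other* shipments by Jaccard distance."""
    predictions = []
    for i, target in enumerate(profiles):
        nearest = sorted(
            (jaccard(target, other), labels[j])
            for j, other in enumerate(profiles) if j != i
        )[:k]
        votes = Counter(label for _, label in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions


clusters = average_linkage(PROFILES, n_clusters=2)
predicted = loo_knn_predict(PROFILES, LABELS, k=1)
accuracy = sum(p == t for p, t in zip(predicted, LABELS)) / len(LABELS)
print("clusters:", clusters)
print("leave-one-out accuracy:", accuracy)
```

If the clusters line up with the shipment labels and the leave-one-out accuracy is high, that agreement supports reading the clusters as distinct pathways. If the clusters cut across the labels, that disagreement is itself a result worth discussing.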
Start early so that you can mull over your initial results and think about how to improve them. As we’ve seen, much of data analytics is iterative.
This is a design project, so it is inherently open-ended. But as a starting point, make sure that you have carried out the following steps (and label them as sections in your report):
If you carry out these steps competently and discuss the results and their implications clearly, you can expect a mark in the B to B+ range. The Queen’s marking scheme reserves As and A+s for those who show exceptional understanding and use of the course material. In this context, that might mean using a more difficult technique, noticing something subtle about the results, or gaining a deeper insight into how someone might act on these models; there are other possibilities too.
Your mark depends on the quality of your design: which techniques you try, and in which order. Don’t be afraid to mention dead ends, as long as they weren’t obviously dead before you tried them. You don’t have to build every possible model, but you should build the best few, so that you know your results are not an accident.
Results need to be discussed. It isn’t enough to produce a confusion matrix; tell the reader what it means, or what you learn from it.
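One way to begin reading a confusion matrix, rather than just displaying it, is to compute per-class recall and list the largest off-diagonal cells, which show which labels are mistaken for which. This is a minimal standard-library sketch. The labels `A`, `B`, `C` and the predictions are hypothetical, for example the output of a leave-one-out classifier.

```python
from collections import Counter


def confusion_matrix(true_labels, predicted_labels):
    """Return (classes, matrix) where matrix[t][p] counts shipments whose
    true label is t and predicted label is p."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    counts = Counter(zip(true_labels, predicted_labels))
    matrix = {t: {p: counts[(t, p)] for p in classes} for t in classes}
    return classes, matrix


def per_class_recall(classes, matrix):
    """Fraction of each true class that was predicted correctly
    (NaN for a class that never occurs as a true label)."""
    recall = {}
    for t in classes:
        total = sum(matrix[t].values())
        recall[t] = matrix[t][t] / total if total else float("nan")
    return recall


def largest_confusions(classes, matrix):
    """Non-zero off-diagonal cells, largest first: (count, true, predicted)."""
    cells = [(matrix[t][p], t, p) for t in classes for p in classes
             if t != p and matrix[t][p] > 0]
    return sorted(cells, reverse=True)


# Hypothetical true and predicted shipment labels.
true_labels = ["A", "A", "A", "B", "B", "C"]
predicted = ["A", "A", "B", "B", "B", "A"]
classes, matrix = confusion_matrix(true_labels, predicted)
print(per_class_recall(classes, matrix))
print(largest_confusions(classes, matrix))
```

In this toy case the only C shipment is predicted as A. The numbers alone don't say why. C and A might share a stage that adds the same adulterants, or C might simply be too rare to learn. Weighing explanations like these is the discussion the brief is asking for.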
You aren’t restricted to KNIME, and you can use techniques other than those we’ve discussed in class or that you’ve used on previous exercise sheets.
The overall project report should be 5-7 pages, not counting the space you need to display your results.
The most common way to get poor marks in this project is to try a few random or obvious things (in Week 10!!) without explaining why you chose them, and then report the results without any discussion of what they mean.
Upload your project report by the deadline. Depending on the COVID situation, the deadline might be moved.