
CSE 347/447 Data Mining: Project 2 – Classification Algorithm



Standard and General Requirements

  • This is a research project, so you are not required to program from scratch; you can call built-in functions or import packages. Do your utmost to learn and explore how the performance of different algorithms varies with different datasets and parameter settings. For example, on small datasets like USFGait, a simple SVM algorithm may produce better results than DNN-based methods in terms of both effectiveness and efficiency.
  • Please note that directly copying code or text from a source without citation constitutes plagiarism, which is forbidden and will result in an F grade. Partial credit will be given for partial solutions.
  • It is important for you to participate equally in the group work. If some team members do not keep their commitments, it becomes difficult for the other team members to complete the project. You are ALLOWED to change your team members before April 29, 2022; please send me an email to discuss it before removing anyone from the group. If you do remove someone, you are NOT ALLOWED to add additional members to the team.
  • No Late Policy: There is no late policy for this assignment unless another arrangement is agreed to before the hard deadline. That is, a late submission earns no credit. Exceptions will be made in, well, exceptional circumstances (a life-threatening illness, for example).
  • Submission Instructions: Please submit your Code, Report and Slides as a .zip file to CourseSite. Schedule a group meeting with the TA after submission to demo your code.
  • Group Project Presentation: Each group is required to give an 8-minute presentation to a panel of judges, followed by 2 minutes of Q&A. The presentations will be held from 8:30 AM to 2:00 PM on May 16, 2022. Please RESERVE your preferred time slot by filling out this online Excel file on sheet 2: Presentation Time.

  • Presentation Template: Here is a Presentation Example for your reference.

Complete the following tasks:

  • Please implement the following algorithms for classification.

– K-Nearest Neighbors (KNN)

– Support Vector Machine (SVM)

– Convolutional Neural Network (CNN)
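As a rough starting point for the two traditional algorithms, the sketch below fits KNN and SVM with scikit-learn. It is only an illustration under stated assumptions: the bundled `load_digits` data stands in for the course datasets, the 80/20 split and the hyperparameter values are placeholders to be tuned, and the CNN is left out because it needs a deep learning framework such as Keras.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data (assumption): scikit-learn's bundled 8x8 digits set,
# NOT one of the course datasets.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Placeholder hyperparameters; they should be tuned with cross-validation.
classifiers = {
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

# Fit each model on the training split and score it on the held-out test split.
test_accuracy = {}
for name, model in classifiers.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy = {test_accuracy[name]:.3f}")
```

Keeping the models in a dictionary keyed by name also makes it easy to let a user pick one algorithm at run time.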

  • Apply your algorithms to any three of the seven datasets described in Table 1 (of course, you can use all the datasets). You can download them here. For the CIFAR-10 and MNIST datasets, you can also import them from Keras with the given splits.
  • You are required to use cross-validation for hyperparameter tuning to achieve better performance.

For the smaller datasets (Iyer, Cho, YaleB and USFGait), use K-fold cross-validation (by default, K = 3). For the larger datasets (PIE, CIFAR-10, MNIST), you can use 10% of the training data as the validation set and the rest as the training set.
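Both tuning schemes can be sketched with scikit-learn as follows. This is a minimal illustration, not a required implementation: `load_digits` is a stand-in dataset, and the parameter grids and the 80/20 train/test split are assumptions chosen only to keep the example small.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Stand-in data (assumption), split once into training and test sets.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Smaller datasets: 3-fold cross-validation on the training data only.
# With cv=3 and a classifier, scikit-learn uses stratified folds.
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.001]}  # illustrative grid
search = GridSearchCV(SVC(), param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("mean CV accuracy:", round(search.best_score_, 3))

# Larger datasets: hold out 10% of the training data as a validation set.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_train, y_train, test_size=0.1, stratify=y_train, random_state=0
)
best_k, best_val_acc = None, -1.0
for k in (1, 3, 5, 7):  # illustrative candidate values
    val_acc = KNeighborsClassifier(n_neighbors=k).fit(X_fit, y_fit).score(X_val, y_val)
    if val_acc > best_val_acc:
        best_k, best_val_acc = k, val_acc
print("best k on validation set:", best_k)
```

In both cases the test split is never touched during tuning; it is used only once the hyperparameters are fixed.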

  • Evaluate your classification algorithms with Accuracy, F1 score and AUC metrics.

Hint: For multi-class classification, the AUC score can be calculated using multi-class strategies; see Lecture 13, Page 67 for details on the AUC-ROC Curve Scoring Function for Multi-class Classification. Here is another example for your reference: AUC-ROC for Multi-Class Classification.
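One possible way to compute the three metrics for a multi-class problem is sketched below with scikit-learn. The one-vs-rest strategy and macro averaging are assumptions made for illustration and may differ from the strategy presented in the lecture; `load_digits` is again only a stand-in dataset.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data (assumption), not one of the course datasets.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
y_pred = model.predict(X_test)         # hard labels for Accuracy and F1
y_score = model.predict_proba(X_test)  # class probabilities for AUC

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    # Macro averaging weights every class equally (an assumption).
    "f1_macro": f1_score(y_test, y_pred, average="macro"),
    # One-vs-rest AUC: each class against all others, then macro-averaged.
    "auc_ovr": roc_auc_score(y_test, y_score, multi_class="ovr", average="macro"),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that AUC needs per-class scores rather than predicted labels, so an SVM would need `probability=True` (or another source of per-class scores) to be evaluated the same way.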

  • For traditional machine learning algorithms, you may consider using PCA for dimensionality reduction before classification. See an example of PCA for dimensionality reduction in the code demo Lec9: Visualization Similarity Matrix & High-Dimensional Data.
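A minimal sketch of PCA followed by a traditional classifier is shown below. It is not the Lec9 demo: the stand-in `load_digits` data, the 95% explained-variance threshold and the choice of KNN are all assumptions for illustration.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in data (assumption), not one of the course datasets.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# A float n_components keeps enough components to explain that fraction
# of the variance; 0.95 is an illustrative value, not a recommendation.
pipe = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),
    KNeighborsClassifier(n_neighbors=5),
)
pipe.fit(X_train, y_train)  # scaler and PCA are fitted on training data only

n_kept = pipe.named_steps["pca"].n_components_
pca_accuracy = pipe.score(X_test, y_test)
print(f"kept {n_kept} of {X_train.shape[1]} dimensions")
print(f"test accuracy: {pca_accuracy:.3f}")
```

Wrapping PCA in the pipeline means the projection is learned from the training split and merely applied to the test split, which avoids leaking test information into the reduction step.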

Please note that, to reduce the influence of randomness, for the smaller datasets (i.e., Iyer, Cho, YaleB and USFGait) you are required to run your algorithm t times (t = 3) and report the average along with the standard deviation (std) on the test sets. The YaleB and USFGait datasets provide training and testing sets, but Iyer and Cho do not, so you need to divide their data into training and test sets yourself. Be careful: when tuning the parameters of your algorithms, you may use only the training data; the testing data are only for testing.
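The repeated-run protocol could look roughly like the sketch below, which applies to datasets that you split yourself. The stand-in `load_wine` data, the 80/20 split, the use of a different random seed per run and the small KNN grid are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in small dataset (assumption), not Iyer or Cho.
X, y = load_wine(return_X_y=True)

t = 3  # number of repeated runs
scores = []
for seed in range(t):
    # A different random split per run (assumption: 80/20, stratified).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
    grid = {"kneighborsclassifier__n_neighbors": [1, 3, 5]}  # illustrative grid
    search = GridSearchCV(pipe, grid, cv=3)
    search.fit(X_train, y_train)                 # tuning sees training data only
    scores.append(search.score(X_test, y_test))  # test data used once, for scoring

scores = np.array(scores)
print(f"accuracy over {t} runs: {scores.mean():.3f} ± {scores.std():.3f}")
```

The same loop can collect F1 and AUC per run so that each metric is reported as a mean with its standard deviation.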

Final submission. Your final submission should include the following:

  • Code: Three classification algorithms. Ideally, your code should let users choose among the classification algorithms, as in Project 1, but this is optional for this project.
  • Report: Describe the flow of all implemented algorithms. Compare the performance of these approaches on the selected datasets in terms of Accuracy, F1, and AUC. State the pros and cons of each algorithm and any findings you get from the experiments, such as parameter sensitivity analysis. Submit your report as a PDF.
  • Presentation: Complete your slides and prepare for an 8-minute presentation. Submit your slides as a PDF.

You may format your report like a paper submitted to a journal or conference, but this is not required. In your report, you can use tables and figures to compare the results of different algorithms on different datasets. For example, Table 2 compares two dimensionality-reduction-based KNN classification methods named “MPCA” and “FMPCA” on several datasets.
