Date: 2024-02-21 08:24

Group Assignments 1 and 2

2024 Session 1 (SIM)

10 Marks

Deadline: Refer to the submission link of assignments on Moodle

One task is included in each assignment. The specification of each task starts on a separate page.

You must implement and run all your Python code in Jupyter Notebook. The deliverables are project presentation slides and source code.

All results of your implementation must be reproducible from your submitted Jupyter Notebook source files. In addition, the submission must include all execution outputs as well as a clear explanation of the algorithms you implemented (e.g., in Markdown cells or as comments in your Python code).

Submissions must be made online using the correct submission link for this subject on Moodle.

This is a group assignment. Make only one submission per group. State the names and student numbers of all group members at the beginning of each submitted file.

Marking guidelines:

Correctness of the source code, and completeness and clarity of the project presentation.

Assignment 1

(10 marks)

Dataset: Credit score data set

(https://www.kaggle.com/datasets/parisrohan/credit-score-classification)

Objective

The objective of this task is to develop an end-to-end data mining project using the Python machine learning library Scikit-Learn. Scikit-Learn is the only machine learning library that can be used in this task. However, all non-ML libraries (e.g., SciPy) are allowed.

Requirements

(1)      This is a classification problem.

(2)      Use stratified sampling to select 80% data for training and 20% for testing.

(3)      Main steps of the project are (a) “discover and visualise the data”, (b) “prepare the data for machine learning algorithms”, (c) “select and train models”, (d) “fine-tune the model” and (e) “evaluate the outcomes”. You can structure the project in your own way. Some steps can be performed more than once.

(4)      In the steps (c) and (d) above, you must work with at least three machine learning algorithms.

(5)      In step (b), define at least one new feature by using a user-defined transformer. This transformer must include a parameter indicating whether to use the new feature(s) or not. In step (d), the fine-tuning step must use this parameter as a hyperparameter.

(6)      An explanation of each step, together with the Python code, must be included.

(7)      A comparison of the models’ performance must be included.
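Requirements (2), (4) and (7) above can be sketched together as follows. This is a minimal illustration under stated assumptions, not a prescribed solution: the data is synthetic (generated with `make_classification`) rather than the credit score data set, and the three algorithms and the two metrics are arbitrary choices made for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced three-class data standing in for the real data set.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5,
    n_classes=3, weights=[0.2, 0.5, 0.3], random_state=0,
)

# Requirement (2): 80/20 split that preserves the class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Requirement (4): three algorithms (an arbitrary selection for illustration).
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Requirement (7): collect the same metrics for every model in one table.
rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, pred),
        "macro_f1": f1_score(y_test, pred, average="macro"),
    })

results = pd.DataFrame(rows).sort_values("macro_f1", ascending=False)
print(results.to_string(index=False))
```

Macro-averaged F1 is reported next to accuracy because it weights every class equally, which makes it more informative than accuracy alone when the classes are imbalanced.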

The assessment is based on the correctness and quality of your project. You must not copy any code from any public source directly.
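The mechanism behind requirement (5) can be sketched as shown below. The sketch assumes a hypothetical debt-to-income ratio as the new feature and uses synthetic data; the class name `DebtToIncomeAdder`, the parameter name `add_ratio` and the column order are illustrative choices, not part of the assignment. The point is only that a constructor parameter of a custom transformer becomes tunable through `GridSearchCV` once the transformer is a named pipeline step.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class DebtToIncomeAdder(BaseEstimator, TransformerMixin):
    """Optionally append debt / income as an extra column (hypothetical feature).

    Assumes column 0 holds debt and column 1 holds a strictly positive income.
    """

    def __init__(self, add_ratio=True):
        self.add_ratio = add_ratio  # stored unchanged so the estimator can be cloned

    def fit(self, X, y=None):
        return self  # nothing to learn; the transformer is stateless

    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not self.add_ratio:
            return X
        ratio = X[:, 0] / X[:, 1]
        return np.c_[X, ratio]


# Synthetic data: the label depends on the ratio, not on either raw column alone.
rng = np.random.default_rng(0)
debt = rng.uniform(100, 5000, size=200)
income = rng.uniform(10000, 100000, size=200)
X = np.c_[debt, income]
y = (debt / income > np.median(debt / income)).astype(int)

pipe = Pipeline([
    ("adder", DebtToIncomeAdder()),
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# The transformer flag is searched alongside an ordinary model hyperparameter.
param_grid = {
    "adder__add_ratio": [True, False],
    "clf__C": [0.1, 1.0, 10.0],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_)
```

Storing the flag unchanged in `__init__` matters: `GridSearchCV` clones the pipeline for every parameter combination, and cloning relies on `get_params` returning exactly what was passed to the constructor.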

Deliverables

Deliverables include (1) a project presentation* and (2) a submission including the following files:

  • the Jupyter Notebook source code,

  • a PDF document generated from your Jupyter Notebook source code, and

  • the presentation slides.

*Note: The project presentation will be announced by your tutorial teacher.

Assignment 2

(10 marks)

Dataset: Credit score data set

(The same as in Assignment 1.)

Objective

The objective of this task is to develop an end-to-end data mining project using the machine learning library Spark MLlib from Python. Spark MLlib is the only machine learning library that can be used in this task. However, all non-ML libraries (e.g., SciPy) are allowed.

Requirements

(1)      This is a classification problem.

(2)      Use stratified sampling to select 80% data for training and 20% for testing.

(3)      Main steps of the project are (a) “discover and visualise the data”, (b) “prepare the data for machine learning algorithms”, (c) “select and train models”, (d) “fine-tune the models” and (e) “evaluate the outcomes”. You can structure the project in your own way. Some steps can be performed more than once.

(4)      In the steps (c) and (d) above, you must work with at least three machine learning algorithms.

(5)      An explanation of each step, together with the Python code, must be included.

(6)      A comparison of the models’ performance must be included.

(7)      Based on your experience in the assignments, write a brief report that compares Spark MLlib and Scikit-Learn (e.g., their pros and cons, or their similarities and differences).

The assessment is based on the correctness and quality of your project. You must not copy any code from any public source directly.

Deliverables

Deliverables include (1) a project presentation* and (2) a submission including the following files:

  • the Jupyter Notebook source code,

  • a PDF document generated from your Jupyter Notebook source code, and

  • the presentation slides.

*Note: The project presentation will be announced by your tutorial teacher.

