
Date: 2023-03-31 07:09

RMIT Classification: Trusted

COSC2673/COSC2793 Semester 1 2023

Machine Learning & Computational Machine Learning

Assignment 1

Introduction to Machine Learning

Weight: 30% of the final course mark

Type: Individual

Due Date: 5.00pm, 3rd of April 2023 (Week 6)

Learning Outcomes: This assignment contributes to CLOs: 1, 3, 4

Note: Marks will be awarded for meeting the requirements as closely as possible. Clarifications and updates may be made via announcements and relevant discussion forums; you are required to check them regularly.

1 Introduction

1.1 Summary

In this assignment you will explore a modified real dataset and practice the typical machine learning process. This assignment is designed to help you become more confident in applying machine learning approaches to solving tasks. In this assignment you will:

1. Select the appropriate ML techniques and apply them to solve a real-world ML problem.

2. Analyse the output of the algorithm(s).

3. Research how to extend the modelling techniques that are taught in class.

4. Provide an ultimate judgement of the final trained model that you would use in a real-world setting.

To complete this assignment, you will require skills and knowledge from lecture and lab material for Weeks 1 to 4 (inclusive). You may find that you will be unable to complete some of the activities until you have completed the relevant lab work. However, you will be able to commence work on some sections. Thus, do the work you can initially, and continue to build in new features as you learn the relevant skills. A machine learning model cannot be developed within a day or two. Therefore, start early.

This assignment has three deliverables:

1. A PDF report, preferably in the form of a notebook. Regardless, your report should include the graphs produced by your analysis. If you are using a notebook, it needs to be in the format of the provided tutorials. That means the report should include markdown text explaining the rationale, critical analysis of your approach and ultimate judgement. The report needs to be self-explanatory, well structured, and fulfill all the assignment specifications.

2. A set of predictions from your ultimate judgement. A sample solution is included. The ID column needs to contain the IDs of the records selected from Data_Set that make up your test set (manual selection is not acceptable).

3. Your Python scripts or Jupyter notebooks used to perform your modelling and analysis, with embedded explanatory comments and instructions on how to run them.

1.2 Outcomes

This assignment contributes to the following course CLOs:

• CLO 1: Understand the fundamental concepts and algorithms of machine learning and applications.

• CLO 3: Set up a machine learning configuration, including processing data and performing feature engineering, for a range of applications.

• CLO 4: Apply machine learning software and toolkits for diverse applications.

1.3 Academic Integrity

Academic integrity is about honest presentation of your academic work. It means acknowledging the work of others while developing your own insights, knowledge and ideas. You should take extreme care that you have:

• Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied), summarised, paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods

• Provided a reference list of the publication details so your reader can locate the source if necessary.

This includes material taken from Internet sites. If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off the work and ideas of another person without appropriate referencing, as if they were your own.

RMIT University treats plagiarism as a very serious offence constituting misconduct. Plagiarism covers a variety of inappropriate behaviours, including:

• Failure to properly document a source

• Copyright material from the internet or databases

• Collusion between students

For further information on our policies and procedures, please refer to the following:

https://www.rmit.edu.au/students/student-essentials/rights-and-responsibilities/academic-integrity.


In this assignment, you will predict the life expectancy of a device based on several attributes (features) related to the manufacturing specification (such as the company’s unique code, the average faulty parts of the same products in that factory, etc.).

Your task is to develop an ML algorithm to predict the life expectancy of a device on unseen data (test data). You will also set up an evaluation framework, including selecting appropriate performance measures and determining how to split the data into training and testing data (a manual split is not acceptable).
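One way to avoid a manual split is to shuffle the record IDs with a fixed seed and cut the shuffled list. The sketch below uses only the Python standard library; the 80/20 ratio, the seed and the ID values are illustrative assumptions, not values taken from this specification. scikit-learn's `train_test_split` with a fixed `random_state` follows the same idea.

```python
import random


def random_split(ids, test_fraction=0.2, seed=42):
    """Shuffle the IDs with a fixed seed and return (train_ids, test_ids)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(ids)                   # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical record IDs standing in for the ID column of the data set.
all_ids = list(range(1, 101))
train_ids, test_ids = random_split(all_ids, test_fraction=0.2, seed=42)
print(len(train_ids), len(test_ids))
```

Fixing the seed keeps the test-set IDs stable between runs, so the submitted predictions can be traced back to the same records.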

You need to come up with an approach (that follows the restrictions in 3.2), where each element of the system is justified using data analysis, performance analysis and/or knowledge from relevant literature.

• As one of the aims of the assignment is to become familiar with the machine learning paradigm, you should evaluate multiple different models (only use techniques taught in class up to and including week 5) to determine which one is most appropriate for this task.

• Set up an evaluation framework, including selecting appropriate performance measures, and determining how to split the data.

• Finally, you need to analyse the model and the results from your models using appropriate techniques, establish how adequate your model is to perform the task in the real world, and discuss its limitations, if there are any (ultimate judgement).

• Predict the result for the test set.
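Comparing several candidate models under one evaluation framework could look like the sketch below. It is an illustration under stated assumptions: the data are synthetic (not the assignment data), RMSE is just one possible performance measure, and the two candidates (plain linear regression, and linear regression with added squared terms) are examples rather than a recommended shortlist. The helper names are hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept column; returns the weight vector."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict(w, X):
    return np.column_stack([np.ones(len(X)), X]) @ w


def add_squares(X):
    """Degree-2 expansion without cross terms: [x, x^2] for every feature."""
    return np.hstack([X, X ** 2])


def cv_rmse(X, y, transform, k=5, seed=0):
    """Mean RMSE over k shuffled folds for OLS fitted on transform(X)."""
    idx = np.random.default_rng(seed).permutation(len(X))
    scores = []
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)    # every row that is not in this fold
        w = fit_ols(transform(X[train]), y[train])
        err = predict(w, transform(X[fold])) - y[fold]
        scores.append(np.sqrt(np.mean(err ** 2)))
    return float(np.mean(scores))


# Synthetic stand-in data with a quadratic effect (NOT the assignment data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 70 + 2 * X[:, 0] - 1.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=300)

results = {
    "linear": cv_rmse(X, y, lambda Z: Z),
    "linear + squared terms": cv_rmse(X, y, add_squares),
}
for name, rmse in results.items():
    print(f"{name}: CV RMSE = {rmse:.3f}")
```

Because both candidates are scored on the same folds, the difference in cross-validated error can be used as evidence when justifying the choice of model.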

This assignment has three deliverables:

1. A PDF report, preferably in the form of a notebook. Regardless, your report should include the graphs produced by your analysis. If you are using a notebook, it needs to be in the format of the provided tutorials. That means the report should include markdown text explaining the rationale, critical analysis of your approach and ultimate judgement. The report needs to be self-explanatory, well structured, and fulfill all the assignment specifications.

2. A set of predictions from your ultimate judgement. A sample solution is included. The ID column needs to contain the IDs of the records selected from Data_Set that make up your test set (manual selection is not acceptable).

3. Your Python scripts or Jupyter notebooks used to perform your modelling and analysis, with embedded explanatory comments and instructions on how to run them.

2.1 Data Set

The data set for this assignment is available on Canvas. It has been modified and pre-processed to some extent, such that all the attributes/features are integers or floats, and missing values have been estimated and filled in.

There are the following files:

• Data-set.csv contains the entire dataset. You need to divide this data set into training and testing sets (don’t divide the dataset manually), then perform your analysis and tasks on them.

• metadata.txt contains a brief description of each of the fields (attribute names).

• sample_solution.csv shows the expected format for your predictions on the unseen test data (reminder: the test set is the result of randomly dividing your entire dataset into train and test).

2.1.1 Restrictions

As the aim of this assignment is to encourage you to learn to explore different approaches, while you can explore feature importance and regularization, your approach must not explicitly perform feature selection. That is, your models should have all features as input (except the “ID” field, which is not an attribute).

2.3 Marking guideline

A detailed rubric is attached on Canvas. In summary:

• Approach 60%;

• Implementation 20%;

• Report Presentation 20%.

Approach: You are required to use a suitable approach to find a predictive model. You may use any ML technique taught in class during weeks 2-5, including linear, non-linear and regularization techniques. Each element of the approach needs to be justified using data analysis, performance analysis, your analytical argument and/or published work in the literature. This assignment isn’t just about your code or model, but the thought process behind your work. The elements of your approach may include:

• Setting up the evaluation framework

• Selecting models, loss function and optimization procedure

• Hyper-parameter setting and tuning

• Identifying problem-specific issues/properties and solutions

• Analysing the model and outputs
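As an illustration of hyper-parameter setting and tuning, the sketch below selects a ridge regularization strength on a validation split carved out of the training data, so the test set stays untouched. Everything here is an assumption made for the example: the synthetic data, the grid of `alpha` values, the 90/30 split and the closed-form ridge solver are not prescribed by this specification.

```python
import numpy as np


def standardise(train, other):
    """Scale both arrays using the mean and std of the training array only."""
    mu, sigma = train.mean(axis=0), train.std(axis=0)
    sigma[sigma == 0] = 1.0              # guard against constant features
    return (train - mu) / sigma, (other - mu) / sigma


def fit_ridge(X, y, alpha):
    """Closed-form ridge; assumes X is centred so the intercept is mean(y)."""
    y_mean = y.mean()
    A = X.T @ X + alpha * np.eye(X.shape[1])
    w = np.linalg.solve(A, X.T @ (y - y_mean))
    return w, y_mean


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Synthetic stand-in data: 20 features, only three of which carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
true_w = np.zeros(20)
true_w[:3] = [3.0, -2.0, 1.0]
y = 70 + X @ true_w + rng.normal(scale=2.0, size=120)

# Random train/validation split of the training data.
idx = rng.permutation(len(X))
tr, va = idx[:90], idx[90:]
X_tr, X_va = standardise(X[tr], X[va])

alphas = [0.01, 0.1, 1.0, 10.0, 100.0]   # illustrative grid
val_rmse = {}
for alpha in alphas:
    w, b = fit_ridge(X_tr, y[tr], alpha)
    val_rmse[alpha] = rmse(y[va], X_va @ w + b)

best_alpha = min(val_rmse, key=val_rmse.get)
print("validation RMSE per alpha:", val_rmse)
print("selected alpha:", best_alpha)
```

All 20 features remain in the model; regularization shrinks the weights rather than removing inputs, which keeps the sketch within the restriction on explicit feature selection.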

All the elements of your approach should be justified, and the justifications should be visible in the PDF version of the notebook (inserted as Markdown text). The justifications you provide may include:

• How you formulate the problem and the evaluation framework.

• The modelling techniques you select and why you selected them.

• Parameter settings and other approaches you have tried.

• Limitations and improvements that are required for real-world implementation.

This will allow us to understand your rationale. We encourage you to explore this problem and not just focus on maximising a single performance metric. By the end of your report, we should be convinced of your ultimate judgement and that you have considered all reasonable aspects in investigating this problem.

Remember that good analysis provides factual statements, evidence and justifications for the conclusions that you draw. A statement such as:

“I did xyz because I felt that it was good”

is not analysis. This is an unjustified opinion. Instead, you should aim for statements such as:

“I did xyz because it is more efficient. It is more efficient because . . . ”

Ultimate Judgement & Analysis: You must make an ultimate judgement of the “best” model that you would use and recommend in a real-world setting for this problem. It is up to you to determine the criteria by which you evaluate your model and determine what it means to be “the best model”. You need to provide evidence to support your ultimate judgement and discuss the limitations of your approach/ultimate model, if there are any, in the notebook as Markdown text.

Performance on test set (Unseen data): You must use the model chosen in your ultimate judgement to predict the target for unseen testing data (provided in test data.csv). Your ultimate prediction will be evaluated, and the performance of all of the ultimate judgements will be published.

Implementation

Your implementation needs to be efficient and understandable by the instructor. It should follow good programming practices.

You must use the model chosen in your ultimate judgement to predict the TARGET_LifeExpectancy on unseen testing data (which is the result of dividing the dataset into train and test sets). Your ultimate prediction will be evaluated, and the performance of all of the ultimate judgements will be published.
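Writing the test-set predictions to a CSV file might be done as in the sketch below. The header row (`ID`, `TARGET_LifeExpectancy`), the two-decimal rounding and the output file name are assumptions; the authoritative layout is whatever sample_solution.csv shows. The IDs and predicted values are made up for illustration.

```python
import csv
import os
import tempfile


def write_predictions(path, ids, predictions):
    """Write one row per test record: its ID and the predicted target."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # Assumed header; check sample_solution.csv for the real column names.
        writer.writerow(["ID", "TARGET_LifeExpectancy"])
        for record_id, value in zip(ids, predictions):
            writer.writerow([record_id, f"{value:.2f}"])


# Hypothetical test-set IDs and model outputs, for illustration only.
test_ids = [17, 42, 99]
test_predictions = [71.25, 64.8, 80.031]

out_path = os.path.join(tempfile.gettempdir(), "predictions_example.csv")
write_predictions(out_path, test_ids, test_predictions)

# Read the file back to confirm what was written.
with open(out_path, newline="") as f:
    rows = list(csv.reader(f))
print(rows)
```

Keeping the IDs next to the predictions makes it possible to check that the submitted rows correspond to the randomly selected test records.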

3.1 Getting Started

To help you get started, we suggest the following:

• Load the dataset into Jupyter or your favourite Python IDE

• Do some preliminary data exploration to understand the data better (this will help you later on when working out which regression approach is ideal and how to improve it)

• Split your data into training and testing datasets

• Select the basic linear regression algorithm, train it, then evaluate it

• Analyse the results and see what is going on (to help you determine what needs to be changed to improve the regression model)

• Now you can continue with your method development, discussion and ultimate judgement, etc.

3.2

Most questions should be asked on Canvas; however, please do not post any code. There is a FAQ, and anything in the FAQ will override what is specified in this specification if there is ambiguity.

Your lecturer is happy to discuss questions and your results with you. Please feel free to come and talk to us during consultation, or ask a quick question during the lecture break.
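The getting-started steps (load, split, fit a basic linear regression, evaluate) can be sketched end to end as follows. This is a minimal illustration, not a reference solution: the table is synthetic, the feature names `feature_a` to `feature_c` are hypothetical, and plain NumPy least squares stands in for whichever regression toolkit is used in class. Only `ID` and `TARGET_LifeExpectancy` are names taken from this specification.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200

# Synthetic table standing in for the real file (NOT the assignment data).
columns = ["ID", "feature_a", "feature_b", "feature_c", "TARGET_LifeExpectancy"]
features = rng.normal(size=(n, 3))
target = 65 + features @ np.array([4.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)
table = np.column_stack([np.arange(1, n + 1), features, target])

# Keep every feature as input, but exclude the ID field and the target column.
excluded = ("ID", "TARGET_LifeExpectancy")
feature_cols = [i for i, name in enumerate(columns) if name not in excluded]
X = table[:, feature_cols]
y = table[:, columns.index("TARGET_LifeExpectancy")]

# Random (not manual) 80/20 split.
order = rng.permutation(n)
train, test = order[:160], order[160:]


def design(X):
    """Prepend a column of ones so the model has an intercept."""
    return np.column_stack([np.ones(len(X)), X])


# Fit the baseline linear regression on the training rows only.
w, *_ = np.linalg.lstsq(design(X[train]), y[train], rcond=None)


def rmse(idx):
    return float(np.sqrt(np.mean((design(X[idx]) @ w - y[idx]) ** 2)))


def r2(idx):
    resid = y[idx] - design(X[idx]) @ w
    return float(1 - resid @ resid / np.sum((y[idx] - y[idx].mean()) ** 2))


print(f"train RMSE={rmse(train):.3f}  test RMSE={rmse(test):.3f}  test R^2={r2(test):.3f}")
```

Comparing the training and test errors is a first check on what is going on: a large gap would suggest overfitting, while similar but high errors would suggest the model is too simple.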

3.3 Marking Rubric

The rubric is attached on Canvas.

3.4 Instructions

Submission instructions will be placed on Canvas.

3.5 Late Assessment Policy

A penalty of 10% of the maximum mark per day (including weekends) will apply to late assignments up to a maximum of five days or the end of the eligible period for this assignment, whichever occurs first. Assignments will not be marked after this time.

3.5.1 Extensions and Special Consideration

A penalty of 10% per day is applied to late submissions up to 5 business days, after which you will lose ALL the assignment marks. Extensions will be given only in exceptional cases; refer to the Special Consideration process. Special Considerations given after grades and/or solutions have been released will automatically result in an equivalent assessment in the form of a test, assessing the same knowledge and skills of the assignment (location and time to be arranged by the course coordinator).
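The late-penalty arithmetic can be illustrated with a small function. This is one reading of the rule, not an official calculator: it deducts 10% of the maximum mark per late day and returns zero once more than five days have passed. The two statements of the policy differ on whether days include weekends or are business days, so the function takes the already-counted number of late days as input. The marks in the example are made up.

```python
def mark_after_late_penalty(raw_mark, max_mark, days_late,
                            rate_percent=10, max_days=5):
    """Return the mark after the late penalty under the assumptions above."""
    if days_late <= 0:
        return float(raw_mark)           # on time: no deduction
    if days_late > max_days:
        return 0.0                       # past the cut-off: not marked
    penalty = max_mark * rate_percent * days_late / 100
    return max(0.0, raw_mark - penalty)  # a mark cannot go below zero


# Hypothetical example: 24 out of 30, submitted two days late.
print(mark_after_late_penalty(24, 30, 2))
```

Note that the deduction is taken from the maximum mark, not from the mark earned, so a modest raw mark can reach zero before the five-day cut-off.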

