
Date: 2020-03-07 10:18

Assignment 1

Due: Monday 2nd March at 10.00pm

The aims of this assignment are to put into practice the concepts covered in lectures, to apply these to a real dataset, and to demonstrate your ability to use Python to carry out machine learning tasks.

Problems and Data

The data you are going to be working on comes from Spotify and is based on this set: https://www.kaggle.com/cnic92/spotify-past-decades-songs-50s10s which captures various attributes about songs and includes a popularity score. You will work on this as a group of four and you are going to address two problems:

⚫ A regression problem which aims to predict the popularity score of a song

⚫ A classification problem which aims to predict the top genre that a song belongs to

A description of each problem along with a detailed overview of the data is available at:

⚫ Regression Problem: https://www.kaggle.com/t/fcb56e49f46d4bfb999148579d857fbc

⚫ Classification Problem: https://www.kaggle.com/t/38bfabc24c8942d1802d2214522a3249

Instructions on how to work with Kaggle Data in Google Colab are provided here:

https://colab.research.google.com/drive/1EXYGOLT_uoYm9dHM8U51qCropg1MhXFI

Instructions

Think carefully about the problem you are working on and the main question you are trying to answer. Take your time to make sure you understand the data. It is not necessary to use every attribute: you may find yourself working with many or just a few. The emphasis in this assignment is very much on the process. If you find that the techniques you have chosen don't work very well or fail to produce particularly interesting results, this is not a problem, provided you followed the appropriate steps to understand and prepare the data and to select appropriate models, and can provide some insights or explanations into why your model failed (or performed brilliantly!).

For both the classification and regression tasks, you should aim to apply a range (around 2-4) of techniques, including the basic ones (which might be useful as baseline comparisons) and also the more sophisticated ones covered in this class. The emphasis should be on using the techniques appropriately and interpreting the results, not on a scatter-gun of algorithms.
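As an illustration of what a baseline comparison could look like, here is a minimal sketch using only the Python standard library. It assumes a mean predictor for the regression task and a majority-class predictor for the classification task. The function names and the toy values are hypothetical and are not taken from the Spotify data or from the course material.

```python
from collections import Counter
from math import sqrt
from statistics import mean


def mean_baseline(train_targets):
    """Return a predictor that always outputs the training mean."""
    constant = mean(train_targets)
    return lambda rows: [constant for _ in rows]


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return lambda rows: [most_common_label for _ in rows]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


# Made-up toy values (assumption): stand-ins for popularity scores and genres.
toy_popularity = [40, 60, 50, 70]
toy_genres = ["rock", "pop", "rock", "rock"]

predict_popularity = mean_baseline(toy_popularity)
predict_genre = majority_baseline(toy_genres)

baseline_rmse = rmse(toy_popularity, predict_popularity(toy_popularity))
baseline_accuracy = accuracy(toy_genres, predict_genre(toy_genres))
print(f"baseline RMSE = {baseline_rmse:.2f}, baseline accuracy = {baseline_accuracy:.2f}")
```

A more sophisticated model is only worth discussing if it beats numbers like these on held-out data, so reporting the baseline alongside each model helps with interpreting the results.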

When you have developed a solution to the problem you should evaluate it on the test data and upload the results file to the competition page for scoring. Each page provides detailed instructions on the file format required. You will then be able to evaluate your solution in relation to previous submissions and also see how you are performing in relation to other teams in the class. Your position in the rankings will make a (small) contribution to your final mark for the assignment.
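The results file is typically a small CSV with one row per test example. The sketch below shows one way to write such a file with the standard library. The column names `Id` and `pop` are placeholders chosen for illustration (an assumption); the required header and format are whatever the competition page specifies. The helper name `write_submission` is also hypothetical.

```python
import csv
import io


def write_submission(handle, ids, predictions, id_column="Id", target_column="pop"):
    """Write one header row plus one (id, prediction) row per test example."""
    if len(ids) != len(predictions):
        raise ValueError("ids and predictions must have the same length")
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow([id_column, target_column])  # placeholder column names
    writer.writerows(zip(ids, predictions))


# Made-up ids and predictions; an in-memory buffer stands in for a real file.
buffer = io.StringIO()
write_submission(buffer, ids=[1, 2, 3], predictions=[55, 61, 48])
submission_text = buffer.getvalue()
print(submission_text)
```

To write to disk instead, the same function can be passed a file opened with `open("submission.csv", "w", newline="")`. The length check guards against uploading a file whose row count does not match the test set.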

Teams

You should work as a team of four. You are permitted to select your own groups, and when you have done so you should add your team into myplace. How you choose to split the work amongst your group is up to you, but remember that this is a learning opportunity and you should aim to contribute evenly and understand all aspects of the work. Everyone in the group is responsible for the final submission and should be prepared to answer questions about it.

Submission

For this assignment, your team will need to submit the following for each task:

- Your final predictions to Kaggle via the InClass Competitions

- Your exported Jupyter/Colab Notebook (submitted as an .ipynb file)

- Your exported Jupyter/Colab Notebook (submitted as a PDF file)

In addition, your team should also submit a PDF of the assignment cover sheet, with the contribution percentages from each student.

Each Jupyter/Colab Notebook should include the following:

- Your team’s name

- Your team members’ names and student numbers

- A description of the final architecture and solution that you employed for the final set of predictions.

- A justification for why you chose this architecture and solution, including: how you came up with the approach, why you selected or modified input variables, what worked and did not work, and what other models were tried.

- All code needed to reproduce the final predictions, along with any code that justifies your choices.

- A conclusion reporting your performance in the Kaggle InClass competition.

The Python code used and the explanations of the steps should be interleaved within the notebook, and provided in a logical manner, to show your working and justify your interpretations and analysis of the outcomes. Explanations should be succinct and clear, with the emphasis on justifying the choices made and on critical interpretation of results.

Each task is worth 25 marks, so the assignment is out of a total of 50 marks and is worth 25% of your overall mark.

Marking Scheme

This assignment is out of 50 and each problem will be assessed according to the following marking scheme:

Solution (based on comments and code) (10 marks per task):

⚫ Explanation of your solution and setup (packages, algorithms etc. used, data analysis and preparation)

⚫ Justification for the choices (rationale for the models used which may also be based on your analysis of the data)

⚫ Explanation of the various models tried (what worked, what didn't, and why)

⚫ No more than 1500 words.

Code Quality (8 marks per task)

⚫ Readability, configurability (how easily it can be adapted to other models, problems etc.), structure

⚫ Correspondence to solution

⚫ How cleanly it runs

Performance (7 marks per task)

⚫ Performance on the Training Data should be reported in the text (and the code should output the values reported).

⚫ Performance on the Kaggle Test Data should be reported in the text.

⚫ Explanation of the difference.

⚫ Relative performance in comparison to other solutions
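One way to anticipate and explain the difference between training and test performance is to hold out part of the training data and compare the two errors before submitting. The sketch below assumes a single numeric feature, a least-squares line as the model, and RMSE as the metric. These are illustrative choices, not the required method, and the helper names and toy data are hypothetical.

```python
import random
from math import sqrt
from statistics import mean


def holdout_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, validation)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept on (x, y) pairs."""
    x_bar = mean(x for x, _ in pairs)
    y_bar = mean(y for _, y in pairs)
    sxx = sum((x - x_bar) ** 2 for x, _ in pairs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rmse(pairs, slope, intercept):
    """Root mean squared error of the fitted line on (x, y) pairs."""
    return sqrt(mean((y - (slope * x + intercept)) ** 2 for x, y in pairs))


# Made-up (feature, popularity) pairs (assumption): a noisy straight line.
noise = [1, -2, 0, 3, -1, 2, -3, 1, 0, -2, 2, -1]
toy_rows = [(x, 2 * x + 10 + n) for x, n in zip(range(12), noise)]

train_rows, validation_rows = holdout_split(toy_rows)
slope, intercept = fit_line(train_rows)
train_rmse = rmse(train_rows, slope, intercept)
validation_rmse = rmse(validation_rows, slope, intercept)
print(f"train RMSE = {train_rmse:.2f}, validation RMSE = {validation_rmse:.2f}")
```

A validation error much larger than the training error suggests over-fitting, and the same reasoning can then be applied to any gap between the reported training score and the Kaggle score.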

More details on the assessment criteria are given in the table below.

Assessment bands: Poor, Fair, Good, Excellent

Solution (Poor 0-3, Fair 4-5, Good 6-8, Excellent 9-10)

Model and Data Engineering
  Poor to Fair: A simple naive baseline, with limited processing or engineering of features and configuration (unless justified).
  Good to Excellent: A sophisticated and appropriate configuration given the data, processes features appropriately, and ensures over-fitting is limited.

Justification
  Poor to Fair: No or little justification for the choices made to produce the solution.
  Good to Excellent: A well-motivated justification based on theory/course-work/previous experience and/or other attempts.

Additional Models
  Poor to Fair: No other models tried, or other features/parameters experimented with.
  Good to Excellent: A range of models, parameters and configurations explored, to provide a strong justification for the final choice.

Code (Poor 0-1, Fair 2-3, Good 4-6, Excellent 7-8)

Readability
  Poor to Fair: Unclear or unclear in parts, poorly structured.
  Good to Excellent: Functions used and added appropriately, code is clear and readable, and well-structured.

Correspondence
  Poor to Fair: Does not or only partly corresponds to the actual solution described.
  Good to Excellent: Mostly or completely corresponds to the actual solution described.

Works
  Poor to Fair: Does not work, or only parts of it work.
  Good to Excellent: Works, runs efficiently, and obtains the reported outputs.

Performance (Poor 0-1, Fair 2-3, Good 4-6, Excellent 7)

Relative Performance
  Poor to Fair: Bottom 25/50% of scores.
  Good to Excellent: Top 50/75% of scores.

Explanation
  Poor to Fair: No or little justification for poor performance, or difference with training performance. No explanation as to what was impacting performance.
  Good to Excellent: Provides justification for differences with training performance, or what gave the submission the edge, i.e. what made the biggest impact to the performance.

Example Notebook Reports

Below are some illustrations of reports which combine the code and text together.

⚫ Using Python to see how the Times writes about men and women

⚫ An open science approach to a recent false-positive between solar activity and the Indian monsoon

⚫ Kaggle Competition | Titanic Machine Learning from Disaster

⚫ An example machine learning notebook

⚫ An exploratory statistical analysis of the 2014 World Cup Final

CS98X - Assignment Cover Page

Team Name:

Contributions

Fill in your name, number, and contribution (typed). Please also sign or mark that you have agreed. If you can’t agree, fill in the percentage contribution out of 100 that you think you deserve, with a short justification below.

Student Name    Student No    Percentage Contribution    Signature / Check

Notes on Contribution (if required):

