Date: 2023-05-02 11:50

COMP5310 Project Stage 2B

Develop and Evaluate Predictive Model

Due: 11:59pm on 14th of May 2023 (end of Week 11)

Value: 15% of the unit


This stage is usually done with the same group members as you worked with for Stage 2A.

However, under exceptional circumstances an alternative group may be created by the unit

coordinator when a group is reduced in size due to a member discontinuing this unit. If this

applies to you, please urgently email Nazanin.borhan@sydney.edu.au to discuss this.


DISPUTE RESOLUTION

If, during the course of the assignment work, there is a dispute among group members that

you can’t resolve, or that will impact your group’s capacity to complete the task well, you

need to inform the unit coordinator, Nazanin.borhan@sydney.edu.au. Make sure that your

email includes your group number and tutorial session, and is explicit about the difficulty.

Also, make sure this email is copied to your tutor and all the members of the group (including

anyone you are complaining about). We need to know about problems in time to help fix

them and deal with non-performance promptly (don’t wait until a few days before the work

is due to complain that someone is not delivering on their tasks). If necessary, the unit

coordinator will split a group, and leave anyone who didn’t participate effectively in a group

by themselves (they will need to achieve all the outcomes on their own). This option is only

available up until Monday May 1st, which is the last day with time to resolve the issue before

the due date. For any group issues that arise after this time, you will need to try to resolve

the problem on your own, and you will continue to be treated as a single group. If someone

doesn’t provide the material required for the report, or their material is not of the agreed

standard, you should still have the report show what that person did. Their section of the

report may be empty if they don’t produce anything, or it may have material but not enough.

In such cases, please put a “Note to marker” on the front page of the report, which describes

the circumstances. That way, we can consider how best to apply the marking scheme. Note

that it is not expected or sensible for other members to do the work that someone failed to

deliver.


TASKS

GROUP TASKS:

1. Identify an attribute that you will all make predictions about and find a dataset that

contains this attribute. The attribute you are predicting may be quantitative or nominal.

The dataset may be one from the previous stages of this project.

2. Decide on the measure of success for the predictive models you will be producing. You

will need to justify your choice of measure and describe its strengths and limitations.

3. Divide the dataset into a training set and a test set. We suggest having at least

one-tenth of the original dataset in the test dataset.

4. Coordinate in choosing the methods you will use, to each produce a predictive model for

this attribute, using the training dataset (the coordination is needed to avoid duplication

between members, and to enable a good conclusion for your report).

5. Write Part B of the report, which discusses the different models and their strengths and

weaknesses. This should be written for a reader who is interested in your research or

business question.
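
As a rough illustration of group task 3, the sketch below holds out one-tenth of a dataset as the shared test set. It assumes a nominal target and uses synthetic data in place of your own dataset; the 500-row size, the variable names and the `random_state` value are illustrative assumptions, not requirements.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the group's dataset (hypothetical: 500 rows, 8 attributes).
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out one-tenth of the rows as the shared test set.
# stratify=y keeps class proportions similar in both parts (nominal targets only);
# a fixed random_state lets every member reproduce the identical split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=42
)

print("training rows:", X_train.shape[0], "test rows:", X_test.shape[0])
```

Fixing the split once and sharing it means every member's model is trained and tested on the same rows, which keeps the later comparison in Part B fair.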


Note: The models created in this Stage must ALL be predicting (in different ways) one

common attribute in the one common dataset. You are allowed to use a dataset you already

have from Stage 1 or 2A, but you are equally free to change dataset and even domain. However,

keep in mind that many machine learning techniques do not work well unless the dataset is

large enough and quite clean. We recommend that you do some preliminary data analysis to

convince yourself that there is some relationship between the other attributes and the one you

are going to predict (otherwise predictions will not be very effective). You also need to choose

how you will measure the effectiveness of predicting. We recommend that you use one of the

measures that is built-in for scikit-learn to calculate, given the test data and the predictions

made for those items. For higher levels than pass, you need more than one measure that you

will calculate on each model.
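
For instance, the built-in measures in `sklearn.metrics` each take the true test values and the predictions. The sketch below computes two measures for a nominal target and two for a quantitative target; the small label and value lists are made-up placeholders, and which measures suit your question is for the group to justify.

```python
from sklearn.metrics import accuracy_score, f1_score, mean_absolute_error, r2_score

# Nominal target: hypothetical true labels and predictions for six test items.
y_true_cls = [1, 0, 1, 1, 0, 0]
y_pred_cls = [1, 0, 0, 1, 0, 1]
accuracy = accuracy_score(y_true_cls, y_pred_cls)
f1 = f1_score(y_true_cls, y_pred_cls)  # F1 for the positive class (label 1)

# Quantitative target: hypothetical true values and predictions for four test items.
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.5, 5.0, 3.0, 6.5]
mae = mean_absolute_error(y_true_reg, y_pred_reg)
r2 = r2_score(y_true_reg, y_pred_reg)

print(f"accuracy={accuracy:.3f}  F1={f1:.3f}")
print(f"MAE={mae:.3f}  R-squared={r2:.3f}")
```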


INDIVIDUAL TASKS:

1. Use Python (for example, the scikit-learn library) to produce a predictive model for the

chosen attribute from the training dataset, using the kind of model and training method

allocated to you by the group. If your method for training has hyper-parameters, you

should adjust them as well as possible, but only using parts of the training dataset in

doing so (you must not use any of the test dataset for this).

2. Evaluate the quality of the predictive model you produced, in terms of the measure of

success that the group chose.

3. Write your section in Part A of the report, in which you present the work you have done

individually.
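
One possible way to follow individual task 1 is to tune hyper-parameters by cross-validation inside the training set, and to touch the test set only once at the end. The sketch below assumes a k-nearest-neighbours classifier, an F1 measure and synthetic data; the model choice, the candidate values and all names are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the group's dataset and its agreed split.
X, y = make_classification(n_samples=400, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=42
)

# Putting the scaler inside the pipeline means it is re-fitted on each
# training fold, so no information leaks from the validation folds.
pipeline = Pipeline([("scale", StandardScaler()), ("knn", KNeighborsClassifier())])
param_grid = {"knn__n_neighbors": [3, 5, 9, 15]}  # hypothetical candidate values

# 5-fold cross-validation uses only the training data to compare candidates.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="f1")
search.fit(X_train, y_train)

print("best hyper-parameters:", search.best_params_)
print("cross-validated F1 (training folds):", round(search.best_score_, 3))

# The test set is used once, after tuning, for the final evaluation.
test_score = search.score(X_test, y_test)
print("F1 on the held-out test set:", round(test_score, 3))
```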


WHAT TO SUBMIT

There are TWO deliverables in this stage of the project, and both should be submitted by

ONE PERSON on behalf of the whole group.

1. A written report on your work, as a PDF document. There is a maximum length for the

report of 2500 words for groups of 2 and 3000 words for groups of 3. The report

should have a front page, that gives the group name and lists the members involved

(giving their SID and unikey, not their name), and then the body of the report has a

structure as follows (this corresponds to the marking scheme):

Part A: It should be targeted at a tutor or lecturer whose goal is to see what you

achieved, so they can allocate a mark. In this section you must:

a. State your research or business question.

b. State the domain and the dataset you are using.

c. Indicate how you split your dataset into training and test data.

d. Then, there should be one section for each member (the section should state the

SID/unikey of the group member who did the work reported in this section). In

this section, there should be the following sub-sections:

o A description of the way you produced the predictive model, including the

Python code you wrote that produces the model and any pre-processing

(e.g., rescaling some attributes). If possible, you should also give the

predictive model itself (e.g., for a linear regression, you would report what

coefficients each attribute has in the model; for a decision tree you would

state the different decision points).

o The evaluation of how well your predictive model does in predicting. This

must include the Python code you wrote that calculates some measure of

effectiveness (on the test data), as well as stating the actual value of this

measure for your predictive model. For higher marks, textual discussion

is also needed (see the mark scheme below). For example, you may

consider using significance testing, confidence intervals, regression R-squared,

clustering V-measure, classification F1-score, etc.
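
To illustrate what "giving the predictive model itself" could look like, the sketch below prints the coefficients of a linear regression and the decision points of a shallow decision tree. It assumes a quantitative target and synthetic data; the attribute names `attr_a`, `attr_b` and `attr_c` are hypothetical placeholders for your own columns.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic stand-in for the training data, with hypothetical attribute names.
feature_names = ["attr_a", "attr_b", "attr_c"]
X, y = make_regression(n_samples=200, n_features=3, noise=5.0, random_state=0)

# Linear regression: report the coefficient of each attribute and the intercept.
linear = LinearRegression().fit(X, y)
for name, coef in zip(feature_names, linear.coef_):
    print(f"{name}: {coef:.3f}")
print(f"intercept: {linear.intercept_:.3f}")

# Decision tree: export the decision points as indented text rules.
# max_depth=2 keeps the printed tree short enough to paste into a report.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
tree_rules = export_text(tree, feature_names=feature_names)
print(tree_rules)
```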


Part B: Targeted at someone who is interested in your research or business question,

and wants to understand how well various machine learning approaches work for

producing predictive models in the context of your research or business question. This

part is written as a group, and you must:

a. Describe the different ways the members produced predictive models.

b. Comment on the evaluations to draw conclusions about the strengths and

limitations of the different approaches, tying this back to your business question

(see the marking scheme for more guidance on what is expected here).


2. The code and dataset you used to produce your predictive model and calculate some

measure of effectiveness of the model. If you have done any further transforms on

attributes before training/testing, this code should also be included. The code should be

submitted as a single zip or tar.gz file which contains a subfolder for each group

member.


MARKING

Here is the mark scheme for this assignment. The score (out of 15) is the sum of separate

scores for each of four components. Note that there is an individual and a group component

to each member’s mark.


Predictive Models [3 points] [Individual Mark]

This component is assessed based on the corresponding subsection of the separate member

section in Part A of the report; the uploaded data and code may be checked by the marker as

supporting evidence for claims made in the report.


[Full marks]: The Distinction criteria hold, and also there is a clear explanation of any

method that is not presented in the tutorials, including an argument for why this is a

reasonable approach to consider for the task (this discussion should go well beyond simply

reporting that the model predicts well, to argue that one could reasonably hope that it might

be good, in several ways).


[Distinction]: The Pass criteria hold, and also at least one of the methods used must go

beyond what is covered in the tutorials.


[Pass]: The group member uses Python and the agreed training dataset and correctly

produces a predictive model for the agreed attribute. The code that each member wrote to

produce their model (including doing any preliminary attribute transformations) must be

explicitly shown in the report. The ways in which the various members’ models are produced

should all be different from one another (this could be different algorithmic training

techniques, different choice of hyper-parameters, different scaling, or choice of input

attributes, etc.).


[Flawed]: Some predictive model is produced using Python.


Evaluation of Predictive Models [4 points] [Individual Mark]

This component is assessed based on the corresponding sub-section of the separate member

section in Part A of the report. The uploaded data and code may be checked by the marker as

supporting evidence for claims made in the report.


[Full marks]: The Distinction criteria hold, and also, for each approach, there is a reasonable

discussion relating the outcome of the measurements to the nature of the training approach,

characteristics of the dataset and any transformations done.


[Distinction]: The group member has correctly reported on more than one measure of

performance of the model on the test dataset. The code that does this measurement must be

explicitly shown in the report. Also, for each approach there is a sensible discussion of the

interpretation of the measurements (for example, whether it is indicating overfitting or

underfitting, whether the accuracy/precision/recall/F1 score differs between different

classes in your data).
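
As one hedged example of such an interpretation, the sketch below compares training and test accuracy (a large gap can suggest overfitting) and prints precision, recall and F1 separately for each class. It assumes an imbalanced two-class synthetic dataset and an unpruned decision tree; both are illustrative choices only.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the dataset (about 80% class 0, 20% class 1).
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=4, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=42
)

# An unpruned tree can memorise the training data, which makes the
# train/test comparison below a useful check for overfitting.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

train_acc = accuracy_score(y_train, model.predict(X_train))
test_pred = model.predict(X_test)
test_acc = accuracy_score(y_test, test_pred)
print(f"training accuracy={train_acc:.3f}  test accuracy={test_acc:.3f}")

# Per-class precision, recall and F1 on the test set.
print(classification_report(y_test, test_pred))
report = classification_report(y_test, test_pred, output_dict=True)
```

If the minority class shows clearly lower recall than the majority class, overall accuracy alone would hide that difference, which is worth discussing in your section.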


[Pass]: The group member has correctly reported on some measure of performance of the

model on the test dataset. The code that does this measurement must be explicitly shown in

the report. The ways in which the various members’ models are produced should all be

different from one another (this could be different algorithmic training techniques, different

choice of hyper-parameters, different scaling or choice of input attributes, etc.).


[Flawed]: Some reasonable attempt to evaluate the effectiveness of a predictive model.


Discussion [7 points] [Group Mark]

This component is assessed based on Part B of the report. Material in Part A, or the submitted

data and code may be checked by the marker as supporting evidence for claims made in this

part of the report.


[Full marks]: The Discussion section meets the Distinction criteria and suggests at least one

reasonable improvement that can be made to each member’s predictive model. The structure

needs to be logical and well-organised.


[Distinction]: The Discussion section provides some accurate and clear information about the

different machine learning methods that were used for this task, and provides useful insight

into strengths and weaknesses of the different machine learning methods for answering the

business or research question. It also indicates features of the dataset that impact on the

outcomes. It also discusses, honestly and with insight, the strengths, limitations and

uncertainties about the comparisons made between different machine learning techniques

(for example, what are strengths and limitations of the measurements which were used).
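
One way to express uncertainty about a measurement is a bootstrap confidence interval on the test-set score. The sketch below is a minimal percentile bootstrap under stated assumptions: the labels and predictions are randomly generated placeholders, and `bootstrap_interval` is a hypothetical helper written for this illustration, not a scikit-learn function.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

# Hypothetical test labels and predictions (100 items, roughly 80% agreement).
y_test = rng.integers(0, 2, size=100)
flip = rng.random(100) < 0.2
y_pred = np.where(flip, 1 - y_test, y_test)


def bootstrap_interval(y_true, y_hat, metric, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for a metric computed on the test set."""
    local_rng = np.random.default_rng(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_resamples):
        # Resample test items with replacement and re-compute the metric.
        idx = local_rng.integers(0, n, size=n)
        scores.append(metric(y_true[idx], y_hat[idx]))
    lower, upper = np.quantile(scores, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


point = f1_score(y_test, y_pred)
low, high = bootstrap_interval(y_test, y_pred, f1_score)
print(f"F1={point:.3f}, approximate 95% interval=({low:.3f}, {high:.3f})")
```

When two members' intervals overlap heavily, a small difference in their point scores may not be strong evidence that one method is better for the business question.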


[Pass]: The Discussion section provides some accurate and clear information about the

machine learning techniques that were used for this task, and how the resulting predictive

models performed.


[Flawed]: The Discussion section describes the machine learning techniques that were used.


Conclusion [1 point] [Group Mark]

This component is assessed based on Part B (group component) of the report. Material in

Part A, or the submitted data and code, may be checked by the marker as supporting evidence

for claims made in the report.


[Full marks]: The Conclusion section meets the Distinction criteria and makes reasonable

suggestions for future work on your analysis and predictive models that can help achieve the

recommended course of action.


[Distinction]: In addition to the Pass criteria, the Conclusion section describes the extent of

support for this course of action, based on the information in the Discussion section,

identifying what risks, limitations and caveats apply.


[Pass]: The Conclusion section describes a recommended course of action in relation to your

research or business question, that is supported by the information in the Discussion section.


[Flawed]: The Conclusion section describes a recommended course of action in relation to

your research or business question.


Penalties

10% of the overall mark will be deducted if your report is unnecessarily long-winded and

does not address the marking criteria within the word limits.


Late Work

As announced in the unit outline, late work (without approved special consideration or other

arrangements) suffers a penalty of 5% of the maximum marks, for each calendar day after

the due date. No late work will be accepted more than 10 calendar days after the due date.

