
Date: 2022-11-02 09:14

QBUS2810, S2 2022

Statistical Modelling for Business

Group Assignment

This group assignment will contribute 20% towards your final result in the unit.
The deadline is 11:59pm Friday 4th November, 2022. Submission is via Canvas.

This assignment must be completed in your Canvas group. It is entirely the
students’ responsibility to form and/or join a group in the People section of
the 2810 Canvas site. Groups must consist of exactly 3 students.

Maximum Length: There is no maximum page length for this assignment. If you
have something interesting and worthwhile to include, then please do so without
worrying about a page limit. However, irrelevant or overly long-winded material
may reduce your overall mark (as well as the marker’s enjoyment of life). As a
guideline, in previous runs of this class the typical report had between 20 and
25 pages, excluding Python code.

Notes on Marking:

The assignment will initially be marked out of 64.

Up to an additional five (5) marks will be awarded based on the overall
presentation quality of your report. Thus, you will receive a total mark for
this assignment out of 69. You will lose some of these 5 presentation marks for
poor, inefficient, unclear and/or unprofessional presentation. You will be
rewarded for professional, efficient and clear presentation methods. I expect
your final report to be done in a professional editing package and to be
submitted in PDF only. HTML files of Jupyter notebooks are not suitable.

You must use Python for this assignment. You are being assessed on how well you
can use Python to complete the assignment tasks. NB: You can use Excel for
simple data manipulations and clean-up, but Python is better at these tasks
too! All plots and statistical output in the assignment must have been produced
in Python, though you can of course make nicer tables in a text editor to
include in your assignment. Please include an appendix in your assignment that
contains the Python code your group used to produce ALL outputs in your
assignment. A heavy penalty will apply if the Python code is not supplied (or
the code supplied does not run or work when the marker tries to run it).

Key requirements:

Pre-analysis instructions for data:

Please include the Python code from the Jupyter notebook file
“grp assnt gendata.ipynb” in your own Jupyter notebook to input and clean the
data. Collect the student ID numbers of the members of your group and add these
numbers together. Input the result into the Python code where instructed. Run
the subsequent code to generate two datasets: “train” and “test”. Most of the
analysis you do will use only the “train” dataset. Any forecasting your group
does will use only the “test” dataset. The purpose of these commands is to
ensure that each group receives different randomly selected datasets for
training and testing purposes. Two other Python scripts are included in case
you need them: forward selection.py and backword selection.py
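The seeding step described above might look roughly like the following. This is only a sketch: the actual logic lives in the supplied notebook, which is not reproduced here, so the student ID numbers, the `split_train_test` helper and the 80/20 split ratio are all hypothetical placeholders.

```python
import random

# Hypothetical student ID numbers for a group of three (placeholders only).
student_ids = [490000001, 490000002, 490000003]

# The brief asks for the IDs to be added together; the sum acts as the seed.
group_seed = sum(student_ids)


def split_train_test(rows, seed, test_fraction=0.2):
    """Shuffle rows reproducibly with `seed`, then split into (train, test).

    The 0.2 test fraction is an assumption for illustration; the supplied
    notebook decides the real split.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    shuffled = list(rows)      # copy so the caller's data is not reordered
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for the cleaned vehicle records: just row indices here.
rows = list(range(100))
train, test = split_train_test(rows, group_seed)
```

Because the seed is fixed by the group's ID numbers, rerunning the split gives the same two datasets every time, while a group with different IDs would generally get a different partition.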

Business problem:

The US Department of Energy Office of Energy Efficiency runs the website
www.fueleconomy.gov, which is the official source of fuel economy information
for consumers and organisations in the US. The US government is interested in
understanding the drivers of fuel economy in a large range of vehicles for
private consumer, organisational and government use in the US. In particular,
they are very interested in the effect on fuel economy of a variable called
engine displacement, which is the total volume of all the cylinders in an
engine. They wish to build a model that can accurately predict the level of
fuel economy for the cars in their database, so that they can improve their
understanding, communicate it, and make better recommendations on their
website. Your group has been commissioned to research and analyse the data
provided and then report back to the Department of Energy Office of Energy
Efficiency, principally regarding the major goals they are interested in.

Data and Description:

Please see the file Fueleconomy.pdf for descriptions of the variables in the
study and for more information on the data collected. The data used here are
from a wide range of cars manufactured in the years 1984-2023 and are available
at https://www.fueleconomy.gov/feg/ws/index.shtml. The dataset at this site is
in the file “vehicles.csv”. The measure of fuel economy to be used is the
average miles per gallon (MPG) achieved over various tested journeys for each
car, labelled comb08 in the dataset.

Goals and primary questions:

There are four primary goals that the Department of Energy Office of Energy
Efficiency would like your group to focus on:

(a) Understand the relationship between fuel economy and, primarily, engine
    displacement;

(b) Develop a causal model for fuel economy that includes engine displacement;

(c) Develop an optimal model for predicting fuel economy, as well as understand
    the relationship between fuel economy and the optimal set of useful
    explanatory variables;

(d) Understand how the useful predictor variables interact to help explain the
    variations in fuel economy.

The focus is on vehicles that use only a single fuel, either petrol or diesel:
cars that are powered by electricity or gas (solely or as hybrids) are not to
be considered in your analysis. Only cars made in the years 1984-2022 should be
included. As in many real data sets, there are many extraneous variables here,
including other potential response variables, none of which are suitable for
inclusion as explanatory variables in any predictive or causal model for fuel
economy. These include several variables to do with electric, gas or hybrid
cars, and many others, all of which should be ignored. These variables are
removed by the code in “grp assnt gendata.ipynb”.
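The scope restriction above (single-fuel petrol or diesel cars, made in 1984-2022) can be expressed as a simple row filter. The sketch below uses made-up records, and the `year` and `fuel` field names and fuel labels are hypothetical: the real column names and categories are described in Fueleconomy.pdf, and the supplied notebook may already apply an equivalent filter.

```python
# Hypothetical field names and fuel labels; map them to the real ones in
# vehicles.csv (see Fueleconomy.pdf) before using anything like this.
ALLOWED_FUELS = {"petrol", "diesel"}
FIRST_YEAR, LAST_YEAR = 1984, 2022


def in_scope(vehicle):
    """Return True for a single-fuel petrol or diesel car made in 1984-2022."""
    return (
        vehicle["fuel"] in ALLOWED_FUELS
        and FIRST_YEAR <= vehicle["year"] <= LAST_YEAR
    )


# Made-up records, purely to exercise the filter.
vehicles = [
    {"year": 1990, "fuel": "petrol", "comb08": 21},
    {"year": 2015, "fuel": "diesel", "comb08": 30},
    {"year": 2020, "fuel": "electricity", "comb08": 110},
    {"year": 2018, "fuel": "petrol/electricity hybrid", "comb08": 50},
    {"year": 2023, "fuel": "petrol", "comb08": 28},
]

kept = [v for v in vehicles if in_scope(v)]
print([v["year"] for v in kept])  # prints [1990, 2015]
```

The electric car, the hybrid and the 2023 model are all dropped; only the two in-scope records remain.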

Tasks:

1. (6 marks) Conduct a suitable exploratory analysis on this dataset;
specifically one that is relevant to the goals of this study.

2. (6 marks) Analyse the relationship between fuel economy (MPG) and
displacement and test the significance of this relationship using a suitably
chosen SLR. Include a discussion of whether the assumptions of your analysis
and test could hold for this data and whether and how strongly the data
actually fits the model.

3. (12 marks)

a. Discuss which variables in the dataset could be causing omitted variable
   bias in your analysis in task 2, and justify clearly why you think that.
   (3 marks)

b. Include these omitted variables, together with displacement, in a standard
   MLR model, without any transformations or interactions or nonlinear
   effects; then fit the model and present and interpret the estimated model.
   (3 marks)

c. Assess the (partial) relationship between MPG and displacement, and include
   a discussion of whether the assumptions could hold for this data and
   whether and how well the data actually fits the model. (3 marks)

d. Also discuss the level and sources of multi-collinearity present and
   whether you think this is problematic, or not, and why; and if so,
   problematic for what? (3 marks)

4. (6 marks) Conduct a variable and model selection exercise, including some
potential interaction effects and also considering some
transformations/nonlinear effects for the regressors and/or response variable.
You must properly motivate and discuss all your choices here.

5. (6 marks) Provide a summary of the comparison of the strength of model fits
over at least 5 different models/transformations/variable sets that you tried,
all while forcing displacement to stay in the model in some form. Discuss your
findings.

6. (6 marks) Fully report a diagnostic analysis on the final “optimal” model,
as well as briefly discussing any collinearity issues it may have. Also, if
there are any nonlinear effects in this model, clearly discuss and illustrate
their effects on MPG.

7. (6 marks) Discuss your results and conclusions regarding the overall goals
of this study, in light of the results from your overall analysis of the
“train” dataset. Be technical but clear here. Also, interpret the effect of
displacement on fuel economy, using (at least) the optimal model.

8. (6 marks) Using (at least) the 3 best model specifications considered so far
(and any others you think relevant), generate forecast predictions in the
“test” dataset for MPG. Present a summary table, and suitable plot(s), of the
forecasts and their accuracy, using the forecast measures RMSE, MAD and
forecast R².

9. (5 marks) Re-discuss your results and conclusions regarding the overall
goals of this study, in light of these results and your overall analysis. Be
technical but clear here.

10. (5 marks) Write a final report, in as close to plain English as is
practical and possible, that discusses and summarises your analysis above and
gives conclusions on the overall goals of this study. Address the report to,
and write it at a level appropriate for, the Department of Energy Office of
Energy Efficiency, who may not be that savvy in business analytics. Include in
your report a recommendation for what the Department should spend money on in
order to increase efficiency of road transport in general; plus any suggestions
for future studies they should do to better achieve the goals they have.
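For the forecast measures named in Task 8, a minimal sketch of how they could be computed is given below. It makes two assumptions that the brief does not state: MAD is taken to be the mean absolute forecast error, and forecast R² is taken as one minus the ratio of the squared forecast errors to the squared deviations of the test-set actuals from their own mean (other conventions exist). The function name and the sample values are illustrative only.

```python
import math


def forecast_accuracy(actual, predicted):
    """Return RMSE, MAD and forecast R^2 for paired actual/predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")

    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)

    sse = sum(e * e for e in errors)        # sum of squared forecast errors
    rmse = math.sqrt(sse / n)               # root mean squared error
    mad = sum(abs(e) for e in errors) / n   # mean absolute error (assumed meaning of MAD)

    mean_actual = sum(actual) / n
    sst = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - sse / sst                      # undefined if all actual values are identical

    return {"RMSE": rmse, "MAD": mad, "R2": r2}


# Made-up MPG values, purely for illustration.
actual_mpg = [20.0, 30.0, 40.0]
predicted_mpg = [22.0, 28.0, 40.0]
scores = forecast_accuracy(actual_mpg, predicted_mpg)
```

Calling the same function once per candidate model, with each model's predictions for the “test” dataset, would give the rows of a summary table.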

