STATS 201/208 Data Analysis
Assignment 4, First Semester, 2019
Due: 3pm Thursday 30th May
Instructions concerning this assignment:
We are providing you an R Markdown document called STATS20x_2019_S1_A4.rmd
(available on Canvas) which will have some answers already filled in. You will need to fill in
and complete the rest of the document. The data files you will be using for the assignment are
described in the questions and are available from Canvas. Make sure you put these datasets in
the same place you put the R Markdown document because it is going to look for them there.
The first change you need to make to the markdown document is put your name and ID number at the top.
Notes:
This assignment is worth 7% of your final mark and requires a substantial amount of work. Do
not leave it until the last few days.
Late assignments are not accepted unless an extension has been granted for a good reason
(usually a medical one, supported by a medical certificate).
The total marks for this assignment will be 35 (this includes 4 marks for presentation and
communication) which will be converted to a mark out of 10 for recording. Most of the marks
for assignments will tend to be for interpretation.
The 4 Presentation and Communication marks for this assignment are lost for not doing the
following:
Coversheet. Using and filling in the correct coversheet.
Name and ID number at top of R Markdown document.
Space saving and printing assignment 2-up. Not printing out unnecessary output (listing
data sets or showing erroneous R output). Assignment work printed out in "2-up" layout. 2-up
layout prints 2 pages side-by-side reduced to one page.
Readability. This is for your general communication ability in the assignment. This includes
sentences clearly conveying the correct idea; sentences making sense; comments not being
excessively long or short; conclusions following logically from previous statements.
Use of Natural Language in Executive Summaries. In executive summaries, this is for
discussing the analysis in context, not using variable names, using units when known and
rounding sensibly.
Keeping to the Point in Executive Summaries. In executive summaries this is for not going
into far more detail than required.
It is your responsibility to back up your computer files. If you are using your own computer, it is
your responsibility to ensure that you can access the data and run R and RStudio well ahead of
the assignment due date. Technical problems outside our control are not accepted as excuses
for submitting coursework late.
Question 1. [11 Marks]
A researcher was interested in comparing the e-mail habits of adults, depending on their
marital status. They collected the following data from the U.S. General Social Surveys.
The data can be found in the file "email.txt", with variables:
email The number of e-mails a person sent in the last 24 hours (day), recorded at
the time of the survey.
marital The person's current marital status (Married, Divorced or Never).
Fit a GLM modelling the number of e-mails sent as counts. Consider all possible pairwise
comparisons of marital status.
Write a Method and Assumption Checks section.
Write an Executive Summary.
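As a rough illustration of the workflow Question 1 describes, the following sketch fits a Poisson GLM to counts and obtains all three pairwise comparisons by changing the baseline level. It is not a model answer. The data frame email.df is simulated here as a stand-in (the values and group sizes are invented), the commented read.table call assumes the file has a header row, and the names fit, fit.q and fit.married are hypothetical.

```r
# Simulated stand-in for email.txt (invented values, NOT the survey data).
set.seed(1)
email.df <- data.frame(
  marital = factor(rep(c("Married", "Divorced", "Never"), each = 30)),
  email   = rpois(90, lambda = rep(c(4, 6, 8), each = 30))
)
# With the real file this might be (assumes a header row):
# email.df <- read.table("email.txt", header = TRUE, stringsAsFactors = TRUE)

# Poisson GLM for counts; the baseline is the first factor level ("Divorced").
fit <- glm(email ~ marital, family = poisson, data = email.df)
summary(fit)

# Rough overdispersion check: compare residual deviance to its degrees of freedom.
p.over <- 1 - pchisq(deviance(fit), df.residual(fit))

# If the check suggests overdispersion, a quasi-Poisson refit is one option.
fit.q <- glm(email ~ marital, family = quasipoisson, data = email.df)

# The first fit compares Married and Never against Divorced.
# Releveling to "Married" supplies the remaining Never vs Married comparison.
email.df$marital.m <- relevel(email.df$marital, ref = "Married")
fit.married <- glm(email ~ marital.m, family = poisson, data = email.df)

# Wald intervals on the multiplicative (rate ratio) scale.
ci.divorced <- exp(confint.default(fit))
ci.married  <- exp(confint.default(fit.married))
```

Exponentiating the coefficients turns them into multiplicative effects on the expected count, which is usually the scale an Executive Summary would report. Whether the Poisson or quasi-Poisson fit is appropriate depends on the real data.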
Question 2. [20 Marks]
A series of tests were carried out to investigate the effectiveness of two variants of an
insecticide called rotenone. Rotenone is a naturally occurring chemical with insect-killing
properties, obtained from the roots of several tropical and subtropical plant species belonging
to the genus Lonchocarpus or Derris. It is used in home gardens for insect control.
For each test, a number of insects were exposed to a randomly assigned dose of either variant
A or variant B of the insecticide. The numbers of insects that died and survived in each group
were then recorded.
We want to know the effect of increasing the dosage of the insecticide and are particularly
interested in the effectiveness at a dosage of 1 unit. This is because it is thought that there are
some harmful environmental side effects with high doses of the pesticide. We also want to
know if there is a difference in effectiveness between the two variants and whether any
difference depends on the level of dosage.
The data can be found in the file "rotenone.csv", with variables:
Dead The number of dead insects from that particular test.
Alive The number of insects surviving that particular test.
Dose The dose (e.g., concentration) of the insecticide.
Poison The variant of the poison. Either A or B.
Comment on the plots of the data.
Fit a logistic regression model with an interaction term between Poison and Dose.
Determine whether the model can be simplified (i.e., apply Ockham's razor), and
determine an appropriate final model. (Remember – only drop ONE term from your model
at a time.)
For both variants of the insecticide, use your model to predict the proportion of insects that
will be killed if the dose level is 1.
Write a Method and Assumption Checks section.
Write an Executive Summary.
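The steps in Question 2 can be sketched along the following lines: fit a logistic regression with an interaction, test whether the interaction can be dropped, and predict at a dose of 1. This is an illustration under stated assumptions, not a model answer. The data frame rot.df is simulated (the doses, group size of 50 and true coefficients are invented), and the names fit.int, fit.add, new.df and pred are hypothetical. The sketch falls back to the additive model purely to show the mechanics; whether a term can be dropped depends on the real data.

```r
# Simulated stand-in for rotenone.csv (invented values, NOT the real data).
set.seed(2)
rot.df <- expand.grid(Dose = c(0.5, 1, 1.5, 2, 2.5),
                      Poison = factor(c("A", "B")))
n.insects <- 50                                   # hypothetical group size
p.true <- plogis(-2 + 1.5 * rot.df$Dose + 0.5 * (rot.df$Poison == "B"))
rot.df$Dead  <- rbinom(nrow(rot.df), size = n.insects, prob = p.true)
rot.df$Alive <- n.insects - rot.df$Dead
# With the real file this might be:
# rot.df <- read.csv("rotenone.csv", stringsAsFactors = TRUE)

# Grouped binomial response: cbind(successes, failures).
fit.int <- glm(cbind(Dead, Alive) ~ Dose * Poison,
               family = binomial, data = rot.df)

# Sequential deviance tests; the last row tests the Dose:Poison interaction.
anova(fit.int, test = "Chisq")

# Additive model, used only if the interaction is judged unnecessary.
fit.add <- glm(cbind(Dead, Alive) ~ Dose + Poison,
               family = binomial, data = rot.df)

# Predicted proportion killed at a dose of 1 for each variant.
new.df <- data.frame(Dose = 1,
                     Poison = factor(c("A", "B"),
                                     levels = levels(rot.df$Poison)))
pred <- predict(fit.add, newdata = new.df, type = "response")
```

Using type = "response" returns predictions as proportions rather than log-odds, which matches the quantity the question asks for.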