

ISTA 116 Final Project

October 22, 2019

1 Overview

In your final project, you will apply the methods and techniques that you have learned in ISTA 116 to the analysis of real-world data. The main goal of this project is to formulate a statistically answerable question and address it through the analysis and/or collection of data, using summary statistics, data visualization, and statistical inference.

1.1 Public Data Option

One option for completing this project is to analyze publicly available data. Data is available from several public repositories, some focused on statistics education and some with a broader scope. By using public data, you will have a lot of flexibility in your choice of topic, but you should begin your data search early to ensure that you are able to find data that is useful for answering your question.

1.2 Data Collection Option

If you prefer, you may collect data yourself using a survey or another method of data collection. If you choose to survey people, you should make an effort to avoid convenience sampling or other potentially biased methods of sampling.

The data collection option may require less time researching data, but will require you to begin earlier to ensure you have enough time to collect the data you need.
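A minimal sketch of one way to avoid convenience sampling: draw a simple random sample from a complete list of the population you want to study (a sampling frame), so that every member has the same chance of being selected. It is written in Python for illustration only, and the roster and sample size are hypothetical; in R, base `sample()` does the same job. Note that a sampling frame that leaves out part of the population will still give a biased sample, however randomly you draw from it.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members of the sampling frame, without replacement.

    Every subset of size n is equally likely to be chosen.
    """
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)

# Hypothetical sampling frame: a numbered roster of 200 students.
roster = [f"student_{i:03d}" for i in range(1, 201)]

# Select 20 of them at random; the seed value is arbitrary.
chosen = simple_random_sample(roster, 20, seed=116)
print(chosen)
```

Fixing the seed makes the draw reproducible, so someone else can check exactly which people were selected.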

1.3 Suggested data sources

If you choose to use publicly available data, a couple of sources are suggested below. This list may be updated over time!

• The Data and Story Library (DASL) at dasl.datadescription.com, a collection of data sets intended for students learning statistics.

• data.gov offers a large amount of government data on subjects including the economy, energy, and climate. There is a huge amount of data here, but it can be somewhat difficult to search through it all and find a good data set.

These are not the only acceptable sources of data! A great deal of data is available elsewhere, and if you find another source that is more directly applicable to your topic, you are free to use it. However, you should make sure that the source is reputable, and that you have enough information about how the data was collected to assess whether good data collection procedures were used.


2 Requirements

Your project submission will consist of two parts: a report and an R script. The report is a full description of your project, including the question that you set out to answer, the source of data or data collection methods you used, a visual and quantitative summary of the data you are investigating, and an analysis using inference methods such as confidence intervals, hypothesis tests, and regression models. Your report should be 3-7 pages long including tables and graphs, and should have the following sections:

• An introduction in which you state the underlying research question and how you hope to answer it.

• A data/methods section in which you describe the source of your data, whether you are using a publicly available data set or collecting your own. This should also include appropriate visualizations of your data, in the form of tables and plots.

• An analysis section in which you describe the results of your statistical analysis, including: any associations relevant to your question that you have observed; summary statistics describing the data that you have collected/found, with confidence intervals where appropriate; the results of any hypothesis tests you conducted; and any linear or logistic regression models you fit.

• A discussion section in which you state the conclusions that you have reached from your analysis. Ideally, this will include a clear answer to the question that you posed in the introduction.
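The analysis section calls for summary statistics and, where appropriate, confidence intervals. The sketch below illustrates both for a single numeric variable, written in Python for illustration rather than in the R required for the submitted script. The data are simulated and hypothetical, and the interval uses a normal (z) critical value, which is a reasonable approximation only for fairly large samples; R's `t.test()` reports a t-based interval instead, which is somewhat wider for small samples.

```python
import math
import random
import statistics

def summarize(values):
    """Return common summary statistics for a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation (n - 1 in the denominator)
        "min": min(values),
        "max": max(values),
    }

def mean_ci(values, level=0.95):
    """Large-sample confidence interval for a population mean: mean +/- z* x SE."""
    n = len(values)
    center = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # estimated standard error of the mean
    z_star = statistics.NormalDist().inv_cdf((1 + level) / 2)  # about 1.96 for a 95% interval
    return center - z_star * se, center + z_star * se

# Hypothetical data: simulated nightly sleep hours for 50 survey respondents.
rng = random.Random(116)
sleep = [round(rng.gauss(7, 1), 1) for _ in range(50)]

print(summarize(sleep))
print(mean_ci(sleep))
```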

The R script accompanying your report should include the code that you used in your analysis. This includes calculating summary statistics, creating plots, and performing statistical inference (confidence intervals, hypothesis tests, and regression). Your script should be documented with comments briefly describing what each piece of your code does.

3 Assessment

Your project will be evaluated based on the following qualities:

• Statistical question: Your project should attempt to answer a research question. This question should be clearly stated, focused enough to be answered by the data available or the planned study, and interesting.

• Data sourcing: Your project may use data that you collect by a survey or other type of study, or data that is publicly available. Several suggested public data sources are available in another document.

  If you choose to collect data with your own study, your report should address your sampling methods and your study design, and note possible sources of bias in the data.

  If you choose to use publicly available data, you should ensure that your data come from a reputable source, and consider any limitations in the sampling or data collection process.

• Display and Visualization: Your report should include appropriately chosen, well-labeled, and accurate visualizations of your data, including tables, plots, and/or graphs.

• Analysis: Your report should include appropriately chosen summary statistics to describe the data in your data set, and use inference methods such as confidence intervals, hypothesis tests, and linear or logistic regression models to estimate population parameters. The conditions for each inference procedure should be checked, and your report should discuss the extent to which the results may be generalized beyond the sample.

• Use of R: All of the calculations, summary statistics, plots, and inference described in your report should be reproducible by the R script that you submit alongside it. This script should use functions and techniques covered in class and in the lab assignments.


• Discussion, conclusion, and reflection: Your report should include a clear answer to the original statistical question, consistent with the available data and the results of your visualization and analysis. If a satisfactory answer to the question cannot be reached with the available data, you should discuss what additional data might be needed.

  If you collected your own data, include a discussion of what went well and what did not in your data collection process. If you used publicly available data, you should discuss any weaknesses or limitations of the data set that was available: Is it missing important variables? Was the data collected in a less-than-optimal manner?

  Finally, your conclusion should propose some ideas for further study in the same area. This could be a follow-up question informed by the results of your analysis, or a new study that could address some of the limitations in the existing data.
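The Analysis criterion asks that conditions be checked for every inference procedure. The sketch below shows what that can look like for a one-proportion confidence interval, using the success-failure condition (at least 10 observed successes and 10 observed failures, one common textbook threshold) and the 10% condition for sampling without replacement. Whether the sample was random cannot be checked from the counts; that has to be argued from the study design. The survey counts and population size are hypothetical, the interval is the normal-approximation (Wald) interval, and the code is Python for illustration; R's `prop.test()` computes a score-based interval by default, so its bounds will differ slightly.

```python
import math
from statistics import NormalDist

def check_proportion_conditions(successes, n, population_size=None, threshold=10):
    """List problems with the conditions for a one-proportion z-interval (empty list if none found)."""
    problems = []
    failures = n - successes
    # Success-failure condition: enough observed successes and failures.
    if successes < threshold or failures < threshold:
        problems.append(f"success-failure condition fails ({successes} successes, {failures} failures)")
    # 10% condition: when sampling without replacement, n should be under 10% of the population.
    if population_size is not None and n > 0.10 * population_size:
        problems.append("sample is more than 10% of the population")
    return problems

def proportion_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a population proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error of p_hat
    z_star = NormalDist().inv_cdf((1 + level) / 2)
    return p_hat - z_star * se, p_hat + z_star * se

# Hypothetical survey: 62 of 150 sampled students answered "yes", out of 40,000 enrolled.
issues = check_proportion_conditions(62, 150, population_size=40000)
if issues:
    print("Conditions not met:", issues)
else:
    print(proportion_ci(62, 150))
```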

4 Dates and Deadlines

To help you make progress on the project, there are two intermediate deadlines. Each intermediate deadline is worth 10% of the total project points; if you miss one, the point value of your final submission will scale up to replace it. So, you should think of the intermediate deadlines as a way both to “lock in” some of the points for the final project and to get some initial feedback on your planned project.

4.1 Topic Proposal: October 31st

Your topic proposal should be a brief description (1-2 paragraphs) stating your research question and indicating where you intend to acquire data. If you are choosing publicly available data, you should identify at least one specific data set that you will use and its source; if you are planning to collect your own data, you should describe how you will select your sample and make your observations. The proposal should specify the variables of interest, which (if any) are explanatory and response variables, and any associations you plan to investigate.

4.2 Summary of Methods: November 21st

In your summary of methods, you should submit a brief description of the methods you plan to use for:

• Data visualization: which variables will you plot, and how?

• Data summarization: which summary statistics (mean, median, etc.) will you calculate, for which variables, and why?

• Inference methods: which types of confidence intervals, hypothesis tests, and regression models will you use, and on which variables?
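When planning inference methods, it can help to see how a hypothesis test turns data into a p-value. The sketch below uses a two-sided permutation (randomization) test for a difference in two group means. It is chosen here only because it needs nothing beyond Python's standard library; a two-sample t-test (R's `t.test()`) is the more usual tool for this comparison. The group data are hypothetical.

```python
import random

def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)

def permutation_test_diff_means(group_a, group_b, n_perm=10_000, seed=None):
    """Two-sided permutation test of H0: group labels are unrelated to the outcome.

    Test statistic: difference in sample means (group_a minus group_b).
    Returns the observed difference and an estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign observations to the two groups
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # Count shuffles at least as extreme as the observed difference
        # (the small tolerance guards against floating-point round-off).
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # Adding 1 to numerator and denominator counts the observed arrangement itself,
    # so the estimated p-value is never exactly 0.
    p_value = (extreme + 1) / (n_perm + 1)
    return observed, p_value

# Hypothetical data: a numeric response measured in two groups.
group_a = [12, 15, 14, 16, 13, 17, 15, 14]
group_b = [10, 11, 13, 12, 9, 11, 12, 10]
diff, p = permutation_test_diff_means(group_a, group_b, seed=116)
print(f"observed difference = {diff:.2f}, permutation p-value = {p:.4f}")
```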

Your summary of methods does not have to include any of the results of this analysis, just a plan for what you will do.

4.3 Rough Draft (Optional, December 5th)

We encourage you to submit a rough version of your project report ahead of the final deadline to get feedback. This is not required and will not factor directly into your grade, but if you submit by the rough draft deadline, we will be able to point out any major aspects of your report that are missing or need revision.


4.4 Final Submission Deadline: December 17

The final project, including the report and R code, is due by 11:59 PM on Tuesday, December 17 (the date assigned to ISTA 116 for final exams this semester).
