Date: 2019-11-23 10:53

Guideline of R Project

Overview

In this project, you will use R to formulate and answer a series of specific questions about a dataset of your choice. You are expected to:

- Form a group of 5 teammates.

- Only one group may have 4 members, and only with special permission from the instructor.

- Identify a dataset of interest.

- Perform exploratory analysis with R to understand the data.

- Investigate hypotheses (i.e., potential questions you want to answer by analyzing this dataset) and develop preliminary insights.

- Prepare a report in Word: include a set of at least 6 visualizations that illustrate your findings, and interpret these visualizations.

- Prepare a presentation in PPT to share your findings with the class.

Final Deliverables and Important Dates

1. Proposal

- A 1-page proposal consisting of:

  o Title

  o Team formation

  o Dataset of your choice

  o Background information: what is it about?

  o What attributes/fields are available, and how many records?

  o The source of the dataset (e.g., web link)

  o Sample records (e.g., the first 10)

  o An initial set of at least 3 questions you'd like to investigate

  o You also need to submit the downloaded raw dataset with your proposal

- Due on Nov. 15; submit to Moodle.
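
The dataset facts requested in the proposal (available fields, number of records, sample records) can be pulled directly in R. The following is a minimal sketch, not a required template. It uses the built-in `mtcars` data frame as a stand-in for your own dataset, and the file name in the comment is hypothetical.

```r
# Stand-in dataset (assumption for illustration); in your project you would
# load your own file instead, e.g. read.csv("your_file.csv") (hypothetical name).
dataset <- mtcars

# Attributes/fields available
field_names <- names(dataset)
print(field_names)

# Number of records and number of fields
n_records <- nrow(dataset)
n_fields <- ncol(dataset)
cat("Records:", n_records, " Fields:", n_fields, "\n")

# Sample records: the first 10 rows
sample_records <- head(dataset, 10)
print(sample_records)
```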

2. Presentation

- Nov. 29 (the last class)

- see details below

3. R project of solution

- A self-contained project file with source code and raw data

- due on Dec. 3

- submit to Moodle by the team leader

4. Final report

- See details below

- due on Dec. 3

- submit to Moodle by the team leader

5. Peer evaluation

- See details below

- due on Dec. 3

- submit individually to Moodle

Details

Data Selection and Preparation

- First, choose a topic of interest to your team and find a dataset that can provide insights into that topic. See the recommended sources at the end of this guideline.

- Please check with the instructor to ensure the dataset is appropriate for this assignment, and write a 1-page proposal.

- Be advised that data collection and preparation (also known as data wrangling) can be a very time-consuming process. Be sure to leave sufficient time for exploratory analysis after preparing the data.

Exploratory and Visual Analysis

You are expected to perform an exploratory analysis of your dataset using R. You should consider two different phases of exploration.

- In the first phase, you should seek to gain an overview of the shape and structure of your dataset. What variables does the dataset contain? How are they distributed? Are there any notable data quality issues? Are there any surprising relationships among the variables?

- In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, start by creating a visualization that might provide a useful answer. Then refine the visualization (for example, by adding additional variables, changing sorting or axis scales, filtering or subsetting data, etc.) to develop better perspectives and explore unexpected observations. You should repeat this process for each of your questions, but feel free to revise your questions or branch off to explore new questions if the data allow.
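
The two phases can be sketched in base R as follows. This is only one possible starting point: the built-in `airquality` data frame stands in for your own dataset, and the question explored (how ozone relates to temperature) is an illustrative assumption.

```r
# Stand-in dataset (assumption): built-in daily air quality measurements
dataset <- airquality

# Avoid writing Rplots.pdf when this is run as a non-interactive script
if (!interactive()) pdf(NULL)

# Phase 1: overview of shape and structure
str(dataset)                                   # variables and their types
print(summary(dataset))                        # distribution of each variable
missing_per_column <- colSums(is.na(dataset))  # data quality: missing values
print(missing_per_column)
correlations <- cor(dataset, use = "pairwise.complete.obs")
print(round(correlations, 2))                  # relationships among variables

# Phase 2, first draft: a plain scatter plot for one question
plot(dataset$Temp, dataset$Ozone)

# Phase 2, refinement: drop rows with missing values, add a third variable
# (month) as color, and label the axes
complete_rows <- dataset[complete.cases(dataset[, c("Ozone", "Temp")]), ]
month_factor <- factor(complete_rows$Month)
plot(complete_rows$Temp, complete_rows$Ozone,
     col = as.integer(month_factor), pch = 19,
     xlab = "Temperature (degrees F)", ylab = "Ozone (ppb)",
     main = "Ozone vs. temperature by month")
legend("topleft", legend = levels(month_factor),
       col = seq_along(levels(month_factor)), pch = 19, title = "Month")
```

Each refinement step (subsetting, adding a variable, relabeling) changes one thing at a time, which makes it easier to see what each change reveals.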

Group Presentation

- Design your presentation slides

- Presentation: 5 minutes of storytelling about your work, followed by 2 minutes of Q&A

- Introduce your data and background information, hypotheses/questions, and results, and discuss limitations/future directions.

- Try to make it interesting and rich in information (if time allows).

- Do NOT highlight the technical details of your work (such as code, functions, special tricks, etc.) during the presentation. Focus on storytelling.

- Because time is short, choose 1 or 2 representatives to present. However, all members must attend and prepare for Q&A.

Coding

- This is an R project: you are expected to use R to process data and present results throughout the entire project (rather than Excel, Power BI, etc.)

- Create a self-contained R project folder (refer to the structure requirement in the first R assignment)

- Provide appropriate comments in your code

- Working code: your code should run without any errors (tip: try it on different computers)

- Results should be consistent with those in your report and presentation

- Zip the whole project directory into a compressed package and submit it to Moodle, including:

- Your raw data

- Your code

- Anything else you use
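
One common way to keep a project self-contained, so that it runs on different computers, is to build file paths relative to the project root instead of hard-coding absolute paths. The sketch below assumes a hypothetical layout with a `data` subfolder and a file named `raw_data.csv`; follow the structure required in the first R assignment if it differs. A temporary folder stands in for the project root so the example runs on its own.

```r
# Hypothetical layout (assumption): <project root>/data/raw_data.csv
# A temporary folder stands in for the project root in this demo.
project_root <- file.path(tempdir(), "demo_project")
dir.create(file.path(project_root, "data"),
           recursive = TRUE, showWarnings = FALSE)

# Create a small stand-in raw data file so the example is runnable
write.csv(head(mtcars, 5),
          file.path(project_root, "data", "raw_data.csv"),
          row.names = FALSE)

# Build the path from its parts rather than hard-coding an absolute path;
# file.path() uses a separator that works on every platform
raw_path <- file.path(project_root, "data", "raw_data.csv")

# Fail early with a clear error if the raw data is missing
stopifnot(file.exists(raw_path))

raw_data <- read.csv(raw_path)
print(dim(raw_data))
```

In an actual project, with the working directory set to the project root, the path would simply be `file.path("data", "raw_data.csv")`.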

Final Report

Your final submission will be a written report. Focus on the answers to your initial questions. If applicable, describe surprises as well as challenges encountered along the way, e.g., data quality issues. Each visualization image should be accompanied by a title and a short caption (<2 sentences). Provide sufficient detail in each caption that anyone could read through your report and understand your findings. Feel free to annotate your images to draw attention to specific features of the data.

- Recommended report outline (revise or enhance if needed)

  o Title page (report title and team members)

  o Abstract (no more than 150 words)

  o Data description: introduce the dataset and related background information. You should indicate the source of the data.

  o Research questions: introduce the questions you want to answer, and the motivation.

  o Results: analytical results and visualizations

  o Summary: briefly summarize and discuss your findings

  o Future work: a description of how your solution could be extended or improved

  o References: the literature you have used

  o Do NOT put code into this report. The code should be submitted separately.

General Grading Criteria

- Poses clear questions applicable to the chosen dataset.

- Appropriate data wrangling (preprocessing) and exploratory data analysis (EDA)

- Breadth and depth of analysis

- Expressive & effective visualizations appropriate to analysis questions.

- Clearly written, understandable captions that communicate primary insights.

- Originality. Submissions will be checked with Turnitin for originality. Remember to cite all references properly.

Detailed Grading Components (100 points in total)

o Part 1: proposal (10 points)

o Part 2: report (30 points)

- In general, the report will be graded on its content (correctness and accuracy), breadth and depth of discussion, report structure, originality, and writing quality.

o Part 3: presentation content (20 points, delivered by 1 or 2 representatives)

- Slide design

- Correct and accurate information, logical arguments

- Content richness (relevant and rich information, well-defined terms)

- Presentation delivery (preparation, expression clarity)

- Ability to answer questions

- Time management

o Part 4: coding (30 points)

- Working code

- Code readability, necessary comments

- Output consistent with report

- Originality

- A well-structured self-contained project

o Part 5: peer evaluation (10 points, individual-based evaluation)

- The evaluation in this part is based on the average contribution percentage (CP) obtained through intra-group peer evaluation. Each student is expected to submit his/her evaluation separately to Moodle.

- Your CP = average(intra-group evaluation of your contribution)

- For a group of 5, for example, the equal-contribution percentage (ECP) is 100% ÷ 5 = 20%

- You gain all 10 points if your CP = ECP. You may gain as many as 15 points in this part if your CP is significantly higher than ECP, and as few as 0 points if your CP is significantly lower than ECP.
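
The CP and ECP arithmetic can be illustrated with a small worked example. The ratings below are invented for illustration, and the sketch assumes that each member distributes 100% across all five members (including themselves); the guideline does not specify the exact rating form or how a CP difference is converted into points, so neither is modeled here.

```r
# Hypothetical ratings (assumption): each row is one rater, each column is the
# member being rated; entries are contribution percentages summing to 100 per row.
members <- c("A", "B", "C", "D", "E")
ratings <- matrix(
  c(20, 20, 20, 20, 20,
    25, 20, 20, 20, 15,
    20, 25, 20, 20, 15,
    25, 20, 20, 20, 15,
    20, 20, 25, 20, 15),
  nrow = 5, byrow = TRUE,
  dimnames = list(rater = members, rated = members)
)

# Sanity check: every rater distributed exactly 100%
stopifnot(all(rowSums(ratings) == 100))

# Equal-contribution percentage: 100% divided by the group size
ecp <- 100 / length(members)

# CP of each member: average of the ratings that member received
cp <- colMeans(ratings)

print(ecp)       # 20
print(cp)        # A = 22, B = 21, C = 21, D = 20, E = 16
print(cp - ecp)  # positive: above equal contribution; negative: below
```

In this example, member D sits exactly at the ECP, member A is slightly above it, and member E is below it.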

Data Sources

- Open databases

o Kaggle datasets

o Awesome Public Datasets: topic-centric list of high-quality open datasets in public domains

o Macau government open database: Macau regional statistics

o Chinese government open databases: provided by the National Bureau of Statistics of China

o Databases in business-related subjects: commercial databases available in the UM library, only accessible within UM

- Non-public datasets

o You may also choose datasets that are not open to the public. In such a case, please indicate the source of the data.

- Notes and hints

o You are recommended to choose a business-related dataset; interesting datasets in other domains are also good choices.

o You are not recommended to choose datasets in a highly specialized domain (e.g., biology, physics, etc.), unless you are very familiar with this domain.

o Choose a dataset that comes with sufficient descriptions and/or background information. It is not wise to choose a dataset with little accompanying description, because you would have to guess the meaning of its attributes and values.

