Date: 2022-11-17 10:26

STA 4373 – Computational Methods in Statistics

Fall 2022

STA 4373 Assignment 1

Instructions.

In this assignment you’ll analyze a college sports dataset and create a PDF of your results using the Quarto

template I’ve posted to Canvas. Quarto essentially functions equivalently to R Markdown but will likely

become a kind of successor to that technology. Like R Markdown, Quarto interweaves code with text using

Markdown syntax, and presently most R Markdown files (file extension .Rmd) can be changed to Quarto files

(file extension .qmd) and compile in exactly the same way. You will need to download and install Quarto on

your machine; see the above link.

When you turn in the file, the filename of the turn-in should be last names separated by dashes and

terminated with -1.pdf. For example, if Joe Shmo, Jane Doe, and Mickey Mouse worked together, they would

turn in shmo-doe-mouse-1.pdf.
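
As a quick check of the naming rule, the convention can be sketched in base R. The helper name turn_in_filename() is hypothetical and is not part of the Quarto template; it only assumes lowercase last names joined by dashes, followed by the assignment number.

```r
# Hypothetical helper (not part of the course template): build the turn-in
# filename from the collaborators' last names.
turn_in_filename <- function(last_names, assignment = 1) {
  # Lowercase the names and join them with dashes.
  stem <- paste(tolower(last_names), collapse = "-")
  # Append the assignment number and the .pdf extension.
  paste0(stem, "-", assignment, ".pdf")
}

turn_in_filename(c("Shmo", "Doe", "Mouse"))
```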

You may use your text and work in groups of up to three. Only one delegate of your team will submit

the resulting PDF on Canvas. The PDF should have the names of each of the collaborators on top. The

main advantage to working in a group is that you can bounce ideas off one another, and hopefully uncover

more interesting features of the data.

You may use the internet to access the text’s webpage, other websites directly linked in this document, and

other general-purpose data science in R questions. However, you may not read or use any analyses of this

or related datasets you find online. Failure to follow this rule may be considered a violation of this course’s

academic integrity policy. If you have any questions about this, please contact me.

Please put a new page break before each question so each question starts on its own page (this will

facilitate grading) and never provide output that runs over more than one page!

College athletics.

According to Data is Plural, a well-known data science blog:

The Equity in Athletics Disclosure Act (EADA) requires thousands of US colleges to provide

annual data on athletic participation, staffing, and finances by team gender and sport. School-

and team-level datasets are available through the Department of Education for the academic

years ending 2003–19.

The good folks at TidyTuesday, a popular online data-science education community, have scraped and

aggregated some of this data and posted it to GitHub. You can download the dataset using the tidytuesdayR

package like this:

tuesdata <- tidytuesdayR::tt_load(2022, week = 13)

sports <- tuesdata$sports

A brief explanation of the data, along with the code used to pull it off of the DoE’s webpage, can be seen at

the GitHub link above.
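
If you want to prototype without a network connection, a small stand-in for the data can help. The sketch below is only an illustration: the column names are assumed to match the ones documented on the GitHub page, every value is invented, and the names toy_sports, toy_fbs, ratio_men, n_schools, prop_close and toy_women are hypothetical. It walks through a few dplyr and tidyr verbs that tend to be useful on data of this shape.

```r
library(dplyr)
library(tidyr)

# Invented stand-in for `sports`. Column names are assumed to match the
# TidyTuesday data; the values are made up for illustration only.
toy_sports <- tibble(
  institution_name    = c("A State", "A State", "B Tech", "C College"),
  classification_name = c("NCAA Division I-FBS", "NCAA Division I-FBS",
                          "NCAA Division I-FBS", "NCAA Division III"),
  sports    = c("Football", "Basketball", "Basketball", "Soccer"),
  rev_men   = c(5e6, 2e6, 1e6, 5e4),
  exp_men   = c(4e6, 2e6, 1.5e6, 5e4),
  rev_women = c(NA, 8e5, 6e5, 4e4),
  exp_women = c(NA, 8e5, 9e5, 4e4)
)

# Size and structure of the data.
dim(toy_sports)
glimpse(toy_sports)

# Number of rows per level of a categorical variable.
toy_sports |> count(classification_name)

# Keep one level, then add a ratio of two columns.
toy_fbs <- toy_sports |>
  filter(classification_name == "NCAA Division I-FBS") |>
  mutate(ratio_men = rev_men / exp_men)

# Number of distinct schools among the remaining rows.
n_schools <- n_distinct(toy_fbs$institution_name)

# The mean of a logical vector is a proportion: here, the share of rows
# whose two columns agree to within a tolerance of 2.
prop_close <- mean(abs(toy_fbs$rev_men - toy_fbs$exp_men) <= 2, na.rm = TRUE)

# Drop rows that are missing a value in a chosen column.
toy_women <- toy_fbs |> drop_na(rev_women)
```

The same calls should carry over to the real sports object, though the real data will need more care (more missing values, more classification levels).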

Note: This assignment involves the analysis of an interesting dataset that can easily lead to controversy.

As is so often the case, such data tend to speak to phenomena that are surprisingly nuanced. In such

situations, try to be slow to form opinions, skeptical of the ones you do form, and limit any conclusions you

draw to analyses you yourself have conducted on the data at hand.

Questions.

1. Describe the dataset: How many observations does it contain? What does each represent? How many

variables are present?

2. classification_name refers to the league that the team plays in. Count how many observations

pertain to each level of classification_name.

3. NCAA Division I-FBS corresponds to the Football Bowl Subdivision of Division 1 NCAA sports; read

the top blurb here for a brief introduction.

Make a dataset div_1_fbs that contains only the observations from Division 1 FBS schools. glimpse()

the dataset to show you’ve succeeded.

4. How many Division 1 FBS schools are in the dataset?

5. Presumably the revenues of a team are realized after the expenditures. Make a scatterplot of the

revenues (y) vs expenditures (x) colored by sex. Polish the graphic so that the axes are clearly legible

and understood. Use geom_function(fun = ~ .x, linetype = 2) to add a reference break-even line

to the graphic.

Hint 1: See slide 52 in the data visualization slides. Also, check out scale_color_manual().

Hint 2: In your axes scales, use labels = label_dollar(suffix = "mil", scale = 1e-6); it’ll

make them look much better!

Hint 3: Use na.rm = TRUE in your geom calls to silently drop missing values, which come from, e.g., not

having female football teams.

6. Comment on the graphic according to the standard criteria: 1) general trend, 2) local behavior, 3)

outliers. Be sure to comment on the difference between the two groups in the process.

7. Re-create the previous graphic using log base 10 scales. Address overplotting with alpha blending

and shrinking point size (to the reasonable extent possible). Remove the break-even line.

Note: start the limits of the axes at $100k in order to eliminate lower-value amounts.

Hint 1: Setting aesthetics for point layers will percolate into the legend, so that if you make the points

very transparent and small, the legend will be hard to see. You can override those set aesthetic values

in the legend by adding this:

guides(

color = guide_legend(override.aes = list(size = 3, alpha = 1))

)

Hint 2: The log axes will make the scale such that it’s possible to distinguish among values in the

thousand range and the million range simultaneously. A clean way to address this problem is to set

scale_cut = c(0, k = 1e3, m = 1e6) in label_dollar(); see the documentation of that function

for details. Be sure to put in many breaks (not just 3, say) to illustrate the scale.

Hint 3: Try label_dollar(scale_cut = c(0, k = 1e3, m = 1e6)) here.

8. What proportion of men’s sports’ expenditures are the same or within $2 (say) of their revenues?

9. If we assume expenditures are investments intended to generate revenue, the return on investment

(ROI) of each team on a per-dollar-spent basis is simply the ratio of revenues to expenditures. Create

new variables roi_men and roi_women and add them to the div_1_fbs dataset.

10. Investigate and compare the distribution of men’s teams’ and women’s teams’ ROIs. How are they

similar? How are they different? Please present no more than 3 graphics in your write-up (not including

combinations of graphics made with patchwork). Comment briefly to explain your line of reasoning.

Note: this is not asking you to look at their joint distribution.

Hint: Remember you can use the patchwork package to put more than one graphic in a figure!

11. Which sports seem to garner the highest ROI for each sex? Provide graphics to support your

conclusion.

Hint: You can use drop_na(roi_men) (for example) to drop out levels for sports that aren’t played by

one sex or the other.

12. Restricting our attention to basketball, where the team sizes and game requirements are more or less

the same, compare the expenditures of Division 1 FBS schools for men and women by visualizing the

joint distribution of the two. Comment briefly.
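
The plotting hints in Questions 5 and 7 can be tried out on invented data before touching the real thing. The sketch below is a rough illustration only: toy_long and its columns sex, expenditure and revenue are hypothetical (they are not columns of the assignment's dataset), and the colors, breaks and point sizes are arbitrary choices rather than recommended settings.

```r
library(ggplot2)
library(scales)

# Invented long-format data; `sex`, `expenditure`, and `revenue` are
# hypothetical column names, and every value is made up.
toy_long <- data.frame(
  sex         = c("men", "men", "men", "women", "women", "women"),
  expenditure = c(4e6, 2e6, 1.5e5, 8e5, 9e5, NA),
  revenue     = c(5e6, 2e6, 1.2e5, 8e5, 6e5, NA)
)

# Dollar labels whose suffix depends on magnitude (k for thousands, m for millions).
dollar_km <- label_dollar(scale_cut = c(0, k = 1e3, m = 1e6))

# Linear scales with a dashed y = x reference line.
p_linear <- ggplot(toy_long, aes(expenditure, revenue)) +
  geom_point(aes(color = sex), na.rm = TRUE) +
  geom_function(fun = ~ .x, linetype = 2) +
  scale_x_continuous(labels = label_dollar(suffix = "mil", scale = 1e-6)) +
  scale_y_continuous(labels = label_dollar(suffix = "mil", scale = 1e-6)) +
  scale_color_manual(values = c(men = "steelblue", women = "firebrick")) +
  labs(x = "Expenditures", y = "Revenues", color = "Sex")

# Log base 10 scales starting at $100k, with small, transparent points and
# a legend whose keys are overridden to stay readable.
log_breaks <- c(1e5, 3e5, 1e6, 3e6, 1e7)
p_log <- ggplot(toy_long, aes(expenditure, revenue)) +
  geom_point(aes(color = sex), alpha = 0.3, size = 0.8, na.rm = TRUE) +
  scale_x_log10(breaks = log_breaks, labels = dollar_km, limits = c(1e5, NA)) +
  scale_y_log10(breaks = log_breaks, labels = dollar_km, limits = c(1e5, NA)) +
  scale_color_manual(values = c(men = "steelblue", women = "firebrick")) +
  guides(color = guide_legend(override.aes = list(size = 3, alpha = 1))) +
  labs(x = "Expenditures", y = "Revenues", color = "Sex")
```

Printing p_linear or p_log draws the figure. Mapping color inside geom_point() rather than in ggplot() keeps the reference line from being split by group.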

