Part 1 (10 points)
The following file contains information on capital cities as well as the rural and urban population in UN countries:
SYB61_T03_Population Growth Rates in Urban areas and Capital cities.xlsx
1. Examine the file in R. How many unique countries are there? (HINT: Africa is not a country, and neither is Puerto Rico, so you need a strategy that takes this into account.) Are the data in long or wide form? Which wide-format variables are available?
2. You need to make sure that only one type of information is stored in each column. If that is not the case, pivot the data using applicable functions from the tidyverse package. Refer to the cheat sheets and class code for help.
3. We want to merge the gapminder data with this new file. The problem is that gapminder has data for 2007, but the new file only has data for 2005 and 2010. The new file also does not have data on the size of the total population of the country. Predict the total country population in 2007 from this UN file using the available information and list all the assumptions you made in the process. You might approach the problem differently, but make sure to describe exactly what you did.
HINT: Think about the assumptions you need to make about how the population changes from year to year. How likely is it that these changes will be drastic? What can cause drastic changes? If you do not expect any such causes, can you assume that the population changes slowly and steadily?
4. Merge the data using an appropriate command from the tidyverse. HINT: The problem here is that gapminder and the UN name countries inconsistently. You might want to develop a strategy to account for it and describe it.
5. Compare the predictions for 2007 with the data available in gapminder. Remember that the gapminder dataset already has the (presumably correct) population data for 2007. Calculate the absolute and the relative error of your interpolated prediction for each country in the data. What is the smallest error? What is the largest error? How did your interpolation perform on average?
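As an illustration of the interpolation and error calculations in items 3 and 5, here is a minimal sketch of one possible approach, not the required one. It assumes that total-population estimates for 2005 and 2010, written P_2005 and P_2010, have already been derived from the UN file, and that the population changes smoothly between those years. The symbols, the choice between the two interpolation rules, and the numbers in the worked example are all hypothetical.

```latex
% Linear interpolation: the population changes by the same absolute amount each year.
\hat{P}_{2007} = P_{2005} + \frac{2007 - 2005}{2010 - 2005}\,\bigl(P_{2010} - P_{2005}\bigr)
               = 0.6\,P_{2005} + 0.4\,P_{2010}

% Geometric interpolation: the population grows at the same percentage rate each year.
\hat{P}_{2007} = P_{2005}\left(\frac{P_{2010}}{P_{2005}}\right)^{2/5}

% Errors against the gapminder value P^{gm}_{2007}.
\text{absolute error} = \bigl|\hat{P}_{2007} - P^{gm}_{2007}\bigr|, \qquad
\text{relative error} = \frac{\bigl|\hat{P}_{2007} - P^{gm}_{2007}\bigr|}{P^{gm}_{2007}}

% Worked example with made-up numbers (millions): P_{2005} = 10.0, P_{2010} = 11.0.
% Linear:    \hat{P}_{2007} = 0.6 \cdot 10.0 + 0.4 \cdot 11.0 = 10.4
% Geometric: \hat{P}_{2007} = 10.0 \cdot 1.1^{0.4} \approx 10.39
% If gapminder reported 10.5, the linear prediction would have
% absolute error 0.1 and relative error 0.1 / 10.5 \approx 0.0095 (about 0.95\%).
```

Over a window of only five years the two rules give very similar values unless growth is fast, so the choice matters less than stating clearly which assumption was made.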
Part 2 (10 points)
Examine the experiment data. You do not need to do any statistical testing when answering the questions, just describe the data using measures of central tendency and measures of variability. Make sure to always verbally explain the results. Even if your code is correct, not giving a verbal answer will result in the loss of points.
1. Examine groups by the type of message they received. Is the composition of groups similar by age, gender, and self-reported risk? What is it telling us about the validity of the experimental data?
2. Are there differences in cheating behavior by gender?
3. Split people into two groups using the median age. Compare the cheating behavior of those two groups.
4. Compare the cheating behavior of people by year. Is your year different from previous years?
5. Compare the proportion of correct responses in each round of the experiment (1 through 20). Do you observe any changes in behavior over time?
NOTE: Whenever the phrasing is ambiguous or allows multiple solutions, it is up to you to decide how to resolve it. As long as you explain what decisions you made and why you made them, there should be no problem if you reach different answers than your peers.