Date: 2019-10-12 11:39

STAT 385 Fall 2019 - Homework Assignment 03

Due by 12:00 PM 10/13/2019

The Homework Problems

Below you will find problems for you to complete as an individual. It is fine to discuss the homework problems with classmates, but cheating is prohibited and will be harshly penalized if detected.


1. Create a custom volume measurement function that will convert the following units of volume:

(a) 13 imperial (liquid) cups to cubic inches.


(b) 2.5 US customary (liquid) gallons to fluid ounces.


(c) 3 US customary (dry) teaspoons to milliliters.


(d) 75 (dry) liters to imperial quarts.
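One common way to structure such a conversion function is to route every conversion through a single base unit. The sketch below is only an illustration of that idea: the names `ml_per_unit` and `convert_volume` are hypothetical, milliliters are an arbitrary choice of base unit, and the table deliberately lists only a few units whose definitions are standard. Factors for imperial cups, teaspoons, and imperial quarts are not included and would need to be looked up and added.

```r
# Milliliters per unit. Only a handful of units are listed; extend as needed.
ml_per_unit <- c(
  milliliter = 1,
  liter      = 1000,
  cubic_inch = 16.387064,      # (2.54 cm)^3
  us_fl_oz   = 29.5735295625,  # US liquid gallon / 128
  us_gallon  = 3785.411784     # 231 cubic inches
)

# Convert `value` from unit `from` to unit `to` by going through milliliters.
# (Hypothetical helper, not part of the assignment.)
convert_volume <- function(value, from, to) {
  stopifnot(from %in% names(ml_per_unit), to %in% names(ml_per_unit))
  value * ml_per_unit[[from]] / ml_per_unit[[to]]
}

convert_volume(1, "us_gallon", "cubic_inch")
convert_volume(2, "liter", "milliliter")
```

With a lookup table like this, supporting a new unit only requires adding one entry rather than one formula per pair of units.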


2. Do the following:

(a) Create a 25 × 25 matrix with autoregressive structure with $p = 9/10$: every element of the matrix should equal $(9/10)^{|i-j|}$, where $i$ is the row index and $j$ is the column index. Report the row and column sums of this matrix.


(b) Run the commands:


set.seed(13)

x <- c(10, 10)

n <- 2

Create a while loop that concatenates a new mean-zero normal random variable with $\sigma = 2$ to the existing vector x at every iteration. Have this loop terminate when the standard error (the estimated standard deviation of x divided by $\sqrt{n}$) is lower than 1/10. Report $n$.


(c) Repeat part b and report $n$ after running the commands:

set.seed(13)

x <- rnorm(0, sd = 2)

n <- 1

(d) The sample size required to get a standard error lower than 1/10 was smaller in part c than it was in part b. We would expect this to be the case before running any code. Why?
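The two ideas in this problem can be sketched on a smaller scale. The example below is a toy illustration, not the assignment's setup: it uses a 4 × 4 matrix with p = 1/2, and a loop with standard deviation 1, threshold 1/2, and seed 1, all chosen arbitrarily. The helper name `se` and the variable names are assumptions.

```r
# Toy autoregressive matrix: entry (i, j) is p^|i - j|.
p <- 1 / 2
idx <- 1:4
ar_mat <- outer(idx, idx, function(i, j) p^abs(i - j))
rowSums(ar_mat)
colSums(ar_mat)  # equal to the row sums because the matrix is symmetric

# Standard error as defined in the problem: sd of the sample over sqrt(n).
se <- function(v) sd(v) / sqrt(length(v))

# Grow a sample one draw at a time until its standard error drops below
# an (illustrative) threshold of 1/2.
set.seed(1)
z <- rnorm(2, sd = 1)
while (se(z) >= 0.5) {
  z <- c(z, rnorm(1, sd = 1))
}
length(z)  # sample size at which the loop stopped
```

Because a while loop checks its condition before the first iteration, the starting values of the vector determine whether the loop body runs at all.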

3. Do the following (Efron’s bootstrap):

(a) Load in the dataset dataHW3.csv.


(b) Call the first column of this dataset x. Compute the statistic (mean(x) - 10)/se(x), where se is shorthand for standard error (see the previous problem for the definition of standard error).


(c) Now resample the elements of x with replacement 10000 times, and compute and store the statistic (mean(x') - mean(x))/se(x') at each iteration, where x' is the resample of the elements of x. Call the vector that contains these resampled statistics `resamples`. Use an apply function for this part.


(d) Run the command `hist(resamples, breaks = 20)` to make a histogram, and include this histogram in your assignment.


(e) Repeat parts b through d with respect to the second column of dataHW3.csv. Would you say that the test statistic calculated from each column has the same distribution?
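The resampling step can be sketched with `sapply`. Since dataHW3.csv is not reproduced here, the example below substitutes simulated data (30 draws from a normal distribution with mean 10) and uses 1000 resamples instead of 10000; both choices, the seed, and the helper name `se` are assumptions made only for illustration.

```r
# Standard error: estimated standard deviation divided by sqrt(n).
se <- function(v) sd(v) / sqrt(length(v))

set.seed(385)
x <- rnorm(30, mean = 10)  # simulated stand-in for a data column
B <- 1000                  # number of bootstrap resamples (illustrative)

# Each iteration draws a resample of x with replacement and computes the
# studentized statistic centered at the original sample mean.
resamples <- sapply(seq_len(B), function(b) {
  x_star <- sample(x, size = length(x), replace = TRUE)
  (mean(x_star) - mean(x)) / se(x_star)
})

length(resamples)
summary(resamples)
```

Centering at mean(x) rather than at 10 is what makes the resampled statistic mimic the sampling distribution of the original statistic under the hypothesis being examined.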


4. Do the following:

(a) Make sure you have the dataset WPP2010.csv (you may need to change the file location) and then run the commands:

# load in UN dataset and remove irrelevant variables

options(warn=-1)

WPP2010 <- read.csv("WPP2010.csv", header = TRUE)

colnames(WPP2010)[3] <- c("region")

colnames(WPP2010)[6] <- c("year")

colnames(WPP2010)[7:17] <- paste("age", 0:10 * 5, sep = "")

WPP2010 <- WPP2010[, c(3, 6, 11, 12)]


# restrict attention to countries of interest

countries <- c("Canada", "Mexico", "United States of America")


# obtain population data for all countries for all years

dataset <- WPP2010[WPP2010[, 1] %in% countries, ]

dataset[, 3] <- as.numeric(levels(dataset[, 3]))[dataset[, 3]]

dataset[, 4] <- as.numeric(levels(dataset[, 4]))[dataset[, 4]]

dataset[, 3:4] <- dataset[, 3:4] / 1000


# get population dataset for this analysis corresponding to the

# Census years

dataset.years <- dataset[dataset[, 2] %in%

 c("1960", "1970", "1980", "1990", "2000", "2010"), ]

dataset.years[, 2] <- factor(dataset.years[, 2])

dataset.years.list <- split(dataset.years, f = as.factor(dataset.years[, 2]))

pops <- unlist(lapply(dataset.years.list, function(x) sum(x[, 3:4])))

(b) The code in part a is partially commented. Add comments to all remaining lines of code to make the script clear.


(c) Determine the proportion of mainland North American males aged 20-29 that lived in 1970 or before.
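Two idioms in the script from part a may be unfamiliar, and a toy example can help when writing the comments. The sketch below uses made-up values and hypothetical column names (`year`, `a`, `b`); it is not the WPP2010 data. It shows why `as.numeric(levels(f))[f]` is used to turn a factor of number-like strings into numbers, and how `split` followed by `lapply` and `unlist` produces one total per group.

```r
# A factor whose labels look like numbers.
f <- factor(c("10", "2.5", "10"))
as.numeric(f)             # internal integer codes, not the labelled values
as.numeric(levels(f))[f]  # the labelled values as numbers: 10, 2.5, 10

# split() breaks a data frame into a list of data frames, one per group;
# lapply() applies a function to each piece; unlist() flattens the result.
df <- data.frame(
  year = c("1960", "1960", "1970"),
  a    = c(1, 2, 3),
  b    = c(10, 20, 40)
)
groups <- split(df, f = as.factor(df$year))
totals <- unlist(lapply(groups, function(g) sum(g[, c("a", "b")])))
totals  # one named total per year
```

Calling `as.numeric` directly on a factor returns the level codes, which is why the script indexes the converted levels by the factor instead.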


5. With the tidyverse package and its functions, do the following with the CCSO Bookings Data:

(a) Show only the 2012 bookings for people aged 17-23 not residing in Illinois, and show the data dimensions.


(b) Show only the bookings for people with employment status “student” who were booked after 2012 and reside in Danville, and show the data dimensions.


(c) Show only the bookings for Asian people residing in the cities of Champaign or Urbana, and show the data dimensions.


(d) Repeat parts a-c using only pipe operators.
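The difference between a nested call and a piped call can be sketched on a toy data frame. Everything below is illustrative: the data are made up, and the column names `booking_year`, `age`, and `state` are hypothetical, so the actual CCSO variable names and value codings will differ. The sketch assumes the dplyr package (part of the tidyverse) is installed.

```r
library(dplyr)

# Toy stand-in for the bookings data (hypothetical columns and values).
bookings <- data.frame(
  booking_year = c(2012, 2012, 2013, 2012),
  age          = c(19, 30, 21, 23),
  state        = c("INDIANA", "ILLINOIS", "ILLINOIS", "MISSOURI")
)

# Function-call style: the data frame is the first argument to filter().
nested <- filter(bookings,
                 booking_year == 2012,
                 between(age, 17, 23),
                 state != "ILLINOIS")
dim(nested)

# Pipe style: the same filter, with the data passed along by %>%.
piped <- bookings %>%
  filter(booking_year == 2012, between(age, 17, 23), state != "ILLINOIS")
piped %>% dim()
```

Both versions apply the same conditions, so they return the same rows; the pipe form only changes how the steps are written and read.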


Select in-class tasks

Completion of select in-class tasks will be worth 1 point and will be graded largely by completion. Obvious errors and incomplete work will receive deductions. Problems 3-5 are directly copied from your notes. Problems 1-2 are copied from the notes with minor alterations. In these problems I ask that you display the first 5 rows of the dataset instead of the entire dataset.


1. Load in the CCSO dataset and identify 3 factor (or categorical) variables and 3 numeric variables. Show the first 5 rows of this dataset with only those 6 variables.


2. Rename one of the factor variables to a name that is easier to understand than the original variable name. Show the first 5 rows of the dataset with all variables, arranged so that the renamed variable is the first column in the dataset.


3. Write 3 separate loops (a for loop, a while loop, and a repeat loop) that give the same result. The result should be the cumulative sum of Days in jail among Black people with Arrest Ages 18-24 and Student as Employment status within the CCSO Bookings Data.


4. Here are some images of R code. Read the code, debug it if necessary, and judge it on its efficiency and correctness. Decide which set of code is better and improve the better one.


5. Using the vector y below

set.seed(385)

y <- rnorm(100)

(a) Use the which.min and which.max functions to display the indices corresponding to the minimum and maximum elements of y.


(b) Do the which.min and which.max functions work? (try: max(y) == y[which.max(y)]).


(c) Use the which function and the length function to report the proportion of the elements of y that are greater than 0.


(d) Discuss why the proportion in part c is close to 0.5. Hint: What is the mean of the normal distribution that generated the elements in y?


(e) Create a factor variable with 50 values of A and 50 values of B, and name this factor variable trt.


(f) Create a data frame consisting of x and trt.
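For the task above that asks for a for loop, a while loop, and a repeat loop giving the same result, the three forms can be compared on a small made-up vector. This is a sketch under stated assumptions: the vector `days` is invented and stands in for whatever filtered column of the bookings data is used, and the result is interpreted as the running (cumulative) total.

```r
days <- c(3, 0, 12, 1, 7)  # made-up values, not CCSO data

# for loop: iterate over the positions of the vector.
cs_for <- numeric(length(days))
total <- 0
for (i in seq_along(days)) {
  total <- total + days[i]
  cs_for[i] <- total
}

# while loop: the counter is advanced by hand and the condition is
# checked before each iteration.
cs_while <- numeric(length(days))
total <- 0
i <- 1
while (i <= length(days)) {
  total <- total + days[i]
  cs_while[i] <- total
  i <- i + 1
}

# repeat loop: runs until an explicit break.
cs_repeat <- numeric(length(days))
total <- 0
i <- 1
repeat {
  if (i > length(days)) break
  total <- total + days[i]
  cs_repeat[i] <- total
  i <- i + 1
}

cs_for
```

All three vectors can be checked against the built-in `cumsum(days)`, which is a convenient way to confirm that the loops agree.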

