Date: 2019-10-17 10:10

Homework 2: SDGB 7844

Submit two files through Blackboard: (a) .Rmd R Markdown file with answers and code and (b) Word document of knitted R Markdown file. Your file should be named as follows: “HW2-[Full Name]-[Class Time]” and include those details in the body of your file.

For those of you who have studied U.S. government, you know that Congress (the legislature) is made up of the House of Representatives and the Senate. The number of people each state sends to the House depends on that state’s population, whereas every state sends two people to the Senate.

A census of the U.S. population is required every ten years by the U.S. Constitution (Article 1, Section 2). The primary purpose of the census is to determine how many representatives each state will send to the House. This procedure is called apportionment. There are 435 representatives in the House, and each state sends at least one person. Once the census is complete, the equal proportions method is used to apportion those 435 seats among the states.

The first census was conducted in 1790, when people were hired to visit each home and count who lived there. At that time, only white males were eligible to vote, but according to the Constitution everyone was to be counted, not just eligible voters or citizens. Slaves were counted too, but each was considered only three-fifths of a person (see Constitution Article 1, Section 2, Clause 3). This provision became moot after the Civil War, when the 13th Amendment to the Constitution was ratified in 1865 and abolished slavery; the 14th Amendment (1868) formally replaced it.

The Electoral College is the body that decides who is president. The number of House members plus two equals the number of electoral votes each state gets. During a presidential election, citizens technically vote for the Electoral College members (even though the presidential candidates are on the ballot), and the Electoral College votes for president. For all practical purposes, though, whichever candidate gets the most votes in a state gets all of the electoral votes for that state. (Note: There are 538 Electoral College members and so 538 electoral votes: 538 = 435 House reps + 100 Senators + 3 electors for the District of Columbia. Therefore, whoever gets at least 270 electoral votes wins.)
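
The electoral vote arithmetic can be checked with a few lines of R. This is only a minimal sketch; the variable names are illustrative and not part of the assignment.

```r
# Electoral votes: one per House seat, one per Senator, plus 3 for D.C.
house_seats  <- 435
senate_seats <- 2 * 50   # two Senators for each of the 50 states
dc_votes     <- 3

total_votes <- house_seats + senate_seats + dc_votes

# A majority is more than half of the total.
votes_to_win <- floor(total_votes / 2) + 1

total_votes    # 538
votes_to_win   # 270
```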

The next census is in 2020 when, again, everyone will be counted. Every residential address will receive a form to fill out regarding the occupants of that residence. Between censuses, the government keeps track of population changes through the Population Estimates Program (PEP), which is administered by the U.S. Census Bureau.

Goal: Use 2018 Population Estimates Program (PEP) data to estimate the number of House of Representatives members for each state expected from the results of the upcoming 2020 census. Compare your estimates with the current House distribution, which is based on the 2010 census.[1]

Information Sources:

DO NOT CHANGE ANY OF THE FILE NAMES OR FILES THEMSELVES!!

• “PEP_2018_PEPANNRES_with_ann.csv”: 2018 population for each state from the PEP, obtained from American FactFinder, a website maintained by the Census Bureau. Instructions are at the end of this assignment.

• “ApportionmentPopulation2010.xls”: 2010 population for each state and the 2010 apportionment results. Instructions are at the end of this assignment.

• Equal proportions algorithm: In “Congressional Apportionment...” file posted with this assignment.

• U.S. map: from the R package usmap. You need to install this package on your computer and then load it by using the command require(usmap). See Lecture 3 slides for instructions on installing an R package.

1. What was the “residence rule” for the 2010 census and why is it important? (Use the internet and provide a link for any sources you use.)

2. Upload the 2018 data file into R. Only keep the columns Geography; April 1, 2010 - Census; and Population Estimate (as of July 1) - 2018. Rename the columns state; res2010; and pep2018 (all lowercase).

(a) There are 50 states, so why are there more than 50 rows in the data set?

(b) What is the resident population of the U.S. according to the 2010 census? Which geographies are included/excluded from this total? Remove the extra rows from your 2018 PEP data set so you only have the data for the 50 states. (The functions sum() and is.element() are useful here.)

(c) Calculate the percent change of the total resident population between the 2010 census and 2018. How much has the population grown? Once you’ve answered this question, remove the res2010 column from the data set.
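
The column renaming, row filtering, and percent change in Question 2 can be sketched as follows. This is a minimal illustration on a toy data frame with invented numbers, not the real PEP file. The names pep, pep50, and pct_change are hypothetical. The sketch assumes the percent change is computed over the rows that are kept; which geographies belong in the total is part of what the question asks you to decide.

```r
# Toy stand-in for the PEP file; all numbers are invented for illustration.
pep <- data.frame(
  Geography = c("United States", "Alabama", "Alaska",
                "District of Columbia", "Puerto Rico"),
  census    = c(1000, 480, 70, 60, 370),
  estimate  = c(1100, 490, 74, 70, 320),
  stringsAsFactors = FALSE
)

# Rename the columns to the names the assignment asks for.
names(pep) <- c("state", "res2010", "pep2018")

# Keep only rows whose name is one of the 50 states.
# state.name is a built-in R vector holding the 50 state names.
pep50 <- pep[is.element(pep$state, state.name), ]

# Percent change in total population from the 2010 census to the 2018 estimate.
pct_change <- 100 * (sum(pep50$pep2018) - sum(pep50$res2010)) / sum(pep50$res2010)

# Drop res2010 once the percent change has been computed.
pep50$res2010 <- NULL
```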

[1] Note: The population used for apportionment purposes is slightly higher than the resident populations given in the 2018 data file. That is because people like overseas military members are included as part of their home state population totals for apportionment purposes. That means our 2018 population values will undercount the population used for 2020 apportionment.

3. Upload the 2010 data file into R. This file has some extra bits, so the arguments skip and n_max in the read_excel() function from the package readxl may be useful. Keep the columns STATE; APPORTIONMENT POPULATION (APRIL 1, 2010); and APPORTIONED REPRESENTATIVES BASED ON 2010 CENSUS. Rename them state; appor2010; and rep2010 (again, all lowercase).

(a) Calculate the following summary statistics for the 2010 census population values and put them into a table in Word: minimum, maximum, mean, median, and standard deviation.

(b) Which state has the largest population? Which has the smallest? Where does New York fall in the ranking of population size?
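
A minimal sketch of the reading and summarizing steps in Question 3 follows. The helper read_apportionment() is hypothetical and is not called here; the right skip and n_max values depend on the spreadsheet layout and have to be found by inspecting the file. The summary statistics are computed on a toy data frame with invented numbers and placeholder state names.

```r
# Hypothetical helper (not called below): skip and n_max must be chosen by
# looking at how many extra rows sit above and below the data in the file.
read_apportionment <- function(path, skip, n_max) {
  readxl::read_excel(path, skip = skip, n_max = n_max)
}

# Toy stand-in for the 2010 data; all numbers are invented for illustration.
appor <- data.frame(
  state     = c("A", "B", "C", "D", "E"),
  appor2010 = c(500, 1200, 800, 30000, 2500),
  stringsAsFactors = FALSE
)

# Summary statistics requested in part (a).
stats <- c(
  minimum = min(appor$appor2010),
  maximum = max(appor$appor2010),
  mean    = mean(appor$appor2010),
  median  = median(appor$appor2010),
  sd      = sd(appor$appor2010)
)

# Rank 1 is the largest population; useful for part (b).
appor$rank <- rank(-appor$appor2010, ties.method = "min")
largest  <- appor$state[which.max(appor$appor2010)]
smallest <- appor$state[which.min(appor$appor2010)]
```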

4. Create two histograms: (a) 2010 apportionment population and (b) log of the 2010 apportionment population (log always means natural log in statistics). Describe the shape of both distributions.

5. Looking at your histograms in Question 4, is the mean or the median a better measure of center in each case? Justify your answer.
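
The histogram and center comparison in Questions 4 and 5 might be set up as below. This is only a sketch on invented, right-skewed toy values; pop, h_raw, and h_log are illustrative names.

```r
# Toy right-skewed populations; invented for illustration.
pop <- c(600, 700, 900, 1300, 2000, 2900, 4500, 6500, 12000, 37000)

# plot = FALSE returns the bin counts without drawing; drop it to see the plot.
h_raw <- hist(pop, plot = FALSE)
h_log <- hist(log(pop), plot = FALSE)   # log() is the natural log in R

# Compare the two measures of center on each scale.
center_raw <- c(mean = mean(pop), median = median(pop))
center_log <- c(mean = mean(log(pop)), median = median(log(pop)))
```

On the toy values the mean sits well above the median on the raw scale, which is what a long right tail does; the real data should be checked in the same way before drawing a conclusion.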

6. Create two scatter plots: (a) 2010 apportionment population on the x-axis and number of House members on the y-axis; and (b) log of 2010 apportionment population on the x-axis and number of House members on the y-axis. Which plot shows a clearer relationship between the two variables? Can we use correlation, r, to represent the relationships in either graph? Justify your answers.
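
The two scatter plots and their correlations in Question 6 can be sketched like this. The data are invented, with seats set roughly proportional to population; pop, reps, r_raw, and r_log are illustrative names, and the plots are sent to a null device so the code also runs non-interactively.

```r
# Toy data, invented for illustration: seats roughly proportional to population.
pop  <- c(600, 700, 900, 1300, 2000, 2900, 4500, 6500, 12000, 37000)
reps <- c(1, 1, 1, 2, 3, 4, 6, 9, 17, 52)

# pdf(NULL) opens a device that writes no file; remove it to see the plots.
pdf(NULL)
plot(pop, reps,
     xlab = "2010 apportionment population", ylab = "House members")
plot(log(pop), reps,
     xlab = "log(2010 apportionment population)", ylab = "House members")
invisible(dev.off())

# Pearson's r measures the strength of a linear relationship only.
r_raw <- cor(pop, reps)
r_log <- cor(log(pop), reps)
```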

7. Merge the 2018 population data and the 2010 apportionment data into a single R object called data.x. Estimate what the number of House members for each state would be in 2020 based on your 2018 population data using the equal proportions method. Add your calculated apportionment numbers as a new column in data.x.

The equal proportions method of calculating the number of House members is given in the “Congressional Apportionment” report posted along with this assignment. Read it first so you can understand the instructions given below.

Equal Proportions Method:

Step 1: Calculate a vector of values of the formula $1/\sqrt{n(n-1)}$, where n goes from 2 to 60, and call it denom. This means that we are assuming that the maximum number of seats for a state is 60, which seems reasonable given the 2010 representative numbers. (Make sure you’ve merged your 2010 and 2018 data sets first.)

Step 2: Multiply each value of denom in Step 1 by each state’s 2018 population. For example, each element in denom is multiplied by Alabama’s population, and this is then repeated for Alaska, Arizona, etc. These values are called priority values:

$$PV_n = \frac{\text{state population}}{\sqrt{n(n-1)}}$$

There are many ways to do this, but the simplest in terms of coding is to use some matrix algebra: c(t(outer(data.x$pep2018, denom))) where outer() calculates the outer product of two vectors, t() transposes the resulting matrix, and c() converts the matrix into a vector.

Step 3: Create a data set with the priority values as one column and the corresponding state names as a second column.

Step 4: Sort your data set in Step 3 in descending order by priority value so that the highest priority values are on top. Extract the first 385 rows (435-50=385). Each row of the resulting data set represents one seat in the House.

Step 5: Make a frequency table of the state names in Step 4 using the function count(). The frequency of each state is the initial number of representatives for that state.

Step 6: Merge your frequency table with data.x. Then, replace all NA counts with 0 using the function replace_na().

Step 7: Add 1 to each state representative count so that each state has at least one representative and the total number of representatives equals 435.
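
The seven steps can be traced on a small example. The sketch below is not the assignment's solution: it uses four hypothetical states sharing 10 seats, with invented populations, instead of 50 states and 435 seats. The column name rep2020 is made up for illustration, and base R's table() on a factor stands in for count() and replace_na(), since it reports a count of 0 for states that win no extra seats.

```r
# Toy example: 4 hypothetical states share 10 seats. Populations are invented.
data.x <- data.frame(
  state   = c("A", "B", "C", "D"),
  pep2018 = c(5000, 3000, 1500, 500),
  stringsAsFactors = FALSE
)
total_seats <- 10   # 435 in the assignment
max_seats   <- 10   # 60 in the assignment

# Step 1: multipliers 1 / sqrt(n (n - 1)) for n = 2, ..., max_seats.
n     <- 2:max_seats
denom <- 1 / sqrt(n * (n - 1))

# Step 2: priority values; after t(), each state's values are contiguous.
pv <- c(t(outer(data.x$pep2018, denom)))

# Step 3: pair every priority value with its state name.
priority <- data.frame(
  state = rep(data.x$state, each = length(denom)),
  pv    = pv,
  stringsAsFactors = FALSE
)

# Step 4: sort descending; keep one row per seat beyond the guaranteed one.
priority <- priority[order(priority$pv, decreasing = TRUE), ]
seats    <- head(priority, total_seats - nrow(data.x))

# Steps 5-6: count seats per state; states with no rows get a count of 0.
extra <- table(factor(seats$state, levels = data.x$state))

# Step 7: add the guaranteed seat (rep2020 is a hypothetical column name).
data.x$rep2020 <- as.integer(extra[data.x$state]) + 1L
```

In this toy case the six highest priority values belong to A, B, A, A, B, A, so the states end up with 5, 3, 1, and 1 seats, which sum to 10.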

Now, answer the following questions:

(a) Make a table in Word with the three states with the highest number of representatives. What fraction of the total number of representatives do these 3 states comprise? Currently, do the same states have the highest number of representatives?

(b) How many states have only a single House of Representatives member?

8. Calculate the following difference: (estimated 2020 house reps − 2010 house reps) as a new column in data.x and convert it to a character data type. Call this column difference. Make a frequency table of the difference column in Word.
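
A minimal sketch of Question 8 on invented numbers follows. The column rep2020 is a hypothetical name for the estimated 2020 seats, and freq is an illustrative name for the frequency table.

```r
# Toy data, invented for illustration; rep2020 is a hypothetical column name.
data.x <- data.frame(
  state   = c("A", "B", "C", "D", "E"),
  rep2010 = c(5, 3, 1, 1, 8),
  rep2020 = c(6, 3, 1, 1, 7),
  stringsAsFactors = FALSE
)

# Seat change, stored as character so it is treated as a category.
data.x$difference <- as.character(data.x$rep2020 - data.x$rep2010)

# Frequency table: how many states gain or lose each number of seats.
freq <- table(data.x$difference)
```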

9. A way of representing the information in Question 8 is by creating a map.

(a) Make a map of the US color-coded by the difference column. Then answer the following questions.

(b) Why does the legend include an NA?

(c) Describe what you see in the map.

(d) Various research/media organizations have made their own predictions about the distribution of House seats. Pick one and compare your results with their predictions. Include links to any references you use.

(e) Describe one way we could improve our analysis.
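
One possible way to draw the map in Question 9(a) uses plot_usmap() from the usmap package. The sketch below assumes usmap is installed and uses a toy data frame with invented values for three states only; map_data and p are illustrative names. The data frame needs a column named state (or fips) plus the column named in the values argument.

```r
# Toy data, invented for illustration; only three states have values.
map_data <- data.frame(
  state      = c("California", "Texas", "New York"),
  difference = c("-1", "2", "-1"),
  stringsAsFactors = FALSE
)

# Build the map only if usmap is installed, so the sketch fails gracefully.
if (requireNamespace("usmap", quietly = TRUE)) {
  # plot_usmap() returns a ggplot object; print(p) draws it.
  p <- usmap::plot_usmap(data = map_data, values = "difference")
}
```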

Downloading 2018 PEP Data

1. Go to the American FactFinder website: https://factfinder.census.gov

2. In the section titled “What We Provide” near the bottom, click on the “get data” link next to Population Estimates Program.

3. Click on the table called PEPANNRES, “Annual Estimates of the Resident Population: April 1, 2010 to July 1, 2018”. It should bring you to a view of that table.

4. Click on the Download button; select the “Use” option in the pop-up window and click OK.

5. Unzip the downloaded file. The file you will be using is called “PEP_2018_PEPANNRES_with_ann.csv”. The other files in the folder contain information about the data.

6. You can put the entire folder wherever you have your R code for this assignment. When you upload the data, use the filepath “PEP_2018_PEPANNRES/PEP_2018_PEPANNRES_with_ann.csv” to indicate that the file you want is inside the folder called “PEP_2018_PEPANNRES”. That way you can keep all of the information relevant to the data file together.

Downloading 2010 Apportionment Data

1. Go to this website: https://www.census.gov/data/tables/2010/dec/2010-apportionment-data.html

2. Download the Excel file titled “Apportionment Population and Number...”
