Algorithm Assignment

Date: 2019-05-01 09:55

Instruction:

All calculations must be performed with R using the comma separated files provided on Canvas. For each question, I want the R output table included in the pdf-file using Courier New as the font. Use Arial or Times New Roman for the write-up. I would like you to turn in one(!) R script file that, when executed, loads the data from the .csv files and produces all the output. You should also upload one(!) pdf document that answers the questions below (e.g., interpretation of the results, etc.). Besides just estimating the models, also say something about the marginal effects (i.e., how a change in the independent variable changes the dependent variable).
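
The single-script requirement above can be met in many ways. The following is only a minimal sketch of one layout; the names `data_dir`, `load_csv`, and `report` are hypothetical helpers introduced for illustration, and the assumption that the .csv files sit next to the script is not stated in the assignment.

```r
# Sketch of a single-script layout. `data_dir`, `load_csv`, and `report`
# are hypothetical names, not part of the assignment.
data_dir <- "."   # assumption: the .csv files are in the working directory

# Read one data file and fail with a clear message if it is missing.
load_csv <- function(name, dir = data_dir) {
  path <- file.path(dir, name)
  if (!file.exists(path)) stop("Missing data file: ", path)
  read.csv(path, stringsAsFactors = FALSE)
}

# Print a fitted model's coefficient table in plain fixed-width form,
# which can then be pasted into the write-up and set in Courier New.
report <- function(fit, digits = 4) {
  print(round(coef(summary(fit)), digits))
  invisible(fit)
}

# Example use, guarded so the sketch still runs when the file is absent.
if (file.exists(file.path(data_dir, "gpa.csv"))) {
  gpa <- load_csv("gpa.csv")
  str(gpa)
}
```

Keeping the loading and reporting steps in small functions makes it easier to check that the script runs from top to bottom without manual steps.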

Question 1: Grade Point Average (gpa.csv) (10 Points)

Consider the equation

colgpa = β0 + β1 · hsize + β2 · hsize^2 + β3 · hsperc + β4 · sat + β5 · female + β6 · athlete + ε

where colgpa is cumulative college grade point average, hsize is size of high school graduating class (in hundreds), hsperc is academic percentile in graduating class, sat is combined SAT score, female is a binary gender variable, and athlete is a binary variable, which is one for student-athletes.

1. What are your expectations for the coefficients in this equation? Which ones are you unsure about?

2. Estimate the equation in part (1) and report the results in the usual form. What is the estimated GPA differential between athletes and non-athletes? Is it statistically significant?

3. Drop sat from the model and re-estimate the equation. Now, what is the estimated effect of being an athlete? Discuss why the estimate is different than that obtained in part (2).

4. In the model from part (1), allow the effect of being an athlete to differ by gender and test the null hypothesis that there is no difference between women athletes and women non-athletes.

5. In the model from part (1), does the effect of sat on colgpa differ by gender? Justify your answer.

6. Make sure to test for heteroscedasticity in all your models.
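
The steps in parts 2, 4, and 6 can be sketched as follows. The data are simulated as a stand-in for gpa.csv, so the numbers mean nothing; the column names are taken from the equation above and are assumed to match the real file. The heteroscedasticity check is a hand-coded Breusch-Pagan test in its studentized (n times R-squared) form, which is one common choice and not necessarily the one intended by the assignment.

```r
# Simulated stand-in for gpa.csv (assumption: the real file uses these names).
set.seed(1)
n <- 500
gpa <- data.frame(
  hsize   = runif(n, 0.5, 9),
  hsperc  = runif(n, 1, 90),
  sat     = round(rnorm(n, 1030, 140)),
  female  = rbinom(n, 1, 0.45),
  athlete = rbinom(n, 1, 0.10)
)
gpa$colgpa <- 1.2 - 0.05 * gpa$hsize + 0.005 * gpa$hsize^2 - 0.013 * gpa$hsperc +
  0.0016 * gpa$sat + 0.15 * gpa$female + 0.17 * gpa$athlete + rnorm(n, sd = 0.5)

# Part 2: baseline model; I() keeps ^ as arithmetic inside the formula.
m1 <- lm(colgpa ~ hsize + I(hsize^2) + hsperc + sat + female + athlete, data = gpa)

# Part 4: let the athlete effect differ by gender.
m4 <- lm(colgpa ~ hsize + I(hsize^2) + hsperc + sat + female * athlete, data = gpa)
b <- coef(m4)
V <- vcov(m4)

# Women athletes vs. women non-athletes: athlete + female:athlete.
diff_women <- unname(b["athlete"] + b["female:athlete"])
se_women <- sqrt(V["athlete", "athlete"] + V["female:athlete", "female:athlete"] +
                 2 * V["athlete", "female:athlete"])
t_women <- diff_women / se_women
p_women <- 2 * pt(-abs(t_women), df = df.residual(m4))

# Part 6: studentized Breusch-Pagan test, n * R^2 from regressing the
# squared residuals on the model's regressors.
bp_test <- function(fit) {
  u2 <- residuals(fit)^2
  X <- model.matrix(fit)[, -1, drop = FALSE]
  aux <- lm(u2 ~ X)
  stat <- nobs(fit) * summary(aux)$r.squared
  k <- ncol(X)
  c(statistic = stat, df = k, p.value = pchisq(stat, df = k, lower.tail = FALSE))
}
bp_test(m1)
bp_test(m4)
```

In the interacted model the coefficient on athlete alone is the effect for men, which is why the test for women combines two coefficients and needs their covariance.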

Question 2: Housing Prices (housing.csv) (10 points)

This data set contains observations on home selling prices. In your regression model, let y be the selling price of the home in $1000 (price), x1 the size of the home (sqrft), x2 the number of bedrooms (bdrms), and x3 the lot size in square feet (lotsize).

1. Construct a scatterplot matrix. You will need to use the function pairs for this. Interpret what the resulting scatterplot matrix shows.

2. Run the regression model in R. Write down the prediction equation, and interpret the coefficient of size of home by its effect.

3. Report the t statistic for testing H0: β2 = 0. Report the p-value for Ha: β2 < 0, and interpret.

4. Construct and examine the histogram of the residuals for the multiple regression model. What does this describe?
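
A minimal sketch of the four steps, using simulated data in place of housing.csv (the column names price, sqrft, bdrms, and lotsize are assumed to match the file). The point worth illustrating is part 3: summary() reports a two-sided p-value, so the one-sided p-value for Ha: β2 < 0 has to be taken from the lower tail of the t distribution.

```r
# Simulated stand-in for housing.csv (assumption: same column names).
set.seed(2)
n <- 88
housing <- data.frame(
  sqrft   = round(runif(n, 1200, 3800)),
  bdrms   = sample(2:6, n, replace = TRUE),
  lotsize = round(runif(n, 1000, 20000))
)
housing$price <- -20 + 0.12 * housing$sqrft + 14 * housing$bdrms +
  0.002 * housing$lotsize + rnorm(n, sd = 60)

# Part 1: scatterplot matrix.
pairs(housing[, c("price", "sqrft", "bdrms", "lotsize")])

# Part 2: multiple regression.
fit <- lm(price ~ sqrft + bdrms + lotsize, data = housing)
tab <- coef(summary(fit))

# Part 3: t statistic for bdrms and the one-sided p-value for Ha: beta2 < 0.
t_b2    <- tab["bdrms", "t value"]
p_two   <- tab["bdrms", "Pr(>|t|)"]             # two-sided, as printed by summary()
p_lower <- pt(t_b2, df = df.residual(fit))      # lower tail only

# Part 4: histogram of residuals.
hist(residuals(fit), main = "Residuals", xlab = "Residual ($1000)")
```

When the t statistic is negative the one-sided p-value is half the two-sided one; when it is positive the one-sided p-value exceeds 0.5, so the data give no support to Ha.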

Question 3: Housing Prices and Student Teacher Ratio (stratio.csv) (10 Points)

The variables are labeled as follows: price is the median housing price, nox represents nitrous oxide (pollutant) in parts per 100 million, stratio is the average student-teacher ratio, and proptax is the property tax.

1. Summarize the data in terms of number of observations, mean, standard deviation, minimum, and maximum.

2. Use price as the dependent variable and stratio as the independent variable. Run a linear regression. What is the intuition behind the negative coefficient for stratio?

3. Generate a scatter plot of the two variables and display the fitted line.

4. Generate the natural logarithm of the following variables: price, nox, and dist. Estimate the multivariate regression model:

ln(price) = β0 + β1 · ln(nox) + β2 · ln(dist) + β3 · stratio

5. Report and interpret the results from your multivariate regression model.

6. Create an interaction variable between ln(proptax) and stratio and add the new variable to the regression model. Interpret the result. Think about the relationship of property taxes and school funding at the local level.
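
Parts 1, 4, and 6 can be sketched as below on simulated data standing in for stratio.csv. The column names are assumed from the variable descriptions, and `tax_x_str` is a hypothetical name for the interaction variable. In the log-linear model, 100 times the stratio coefficient is approximately the percent change in price per one-unit change in stratio; once the interaction is added, that effect depends on ln(proptax).

```r
# Simulated stand-in for stratio.csv (assumption: these column names).
set.seed(3)
n <- 200
st <- data.frame(
  nox     = runif(n, 3.8, 8.7),
  dist    = runif(n, 1, 12),
  stratio = runif(n, 12, 22),
  proptax = runif(n, 18, 71)
)
st$price <- exp(11 - 0.9 * log(st$nox) - 0.1 * log(st$dist) -
                0.05 * st$stratio + rnorm(n, sd = 0.25))

# Part 1: one row per variable with n, mean, sd, min, max.
summ <- t(sapply(st, function(v)
  c(n = length(v), mean = mean(v), sd = sd(v), min = min(v), max = max(v))))
print(round(summ, 3))

# Part 4: generate logs and estimate the log model.
st$lprice <- log(st$price)
st$lnox   <- log(st$nox)
st$ldist  <- log(st$dist)
m <- lm(lprice ~ lnox + ldist + stratio, data = st)

# Part 6: interaction between ln(proptax) and stratio.
st$lproptax  <- log(st$proptax)
st$tax_x_str <- st$lproptax * st$stratio
m_int <- lm(lprice ~ lnox + ldist + stratio + tax_x_str, data = st)

# Marginal effect of stratio on ln(price), evaluated at the mean of ln(proptax).
me_stratio <- unname(coef(m_int)["stratio"] +
                     coef(m_int)["tax_x_str"] * mean(st$lproptax))
```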

Question 4: Happiness (happy.csv) (10 Points)

Using the data set happy.csv, generate an ordered logit regression model that regresses the dependent variable happiness on those variables that have the strongest potential causal relationship (see below). For your model, interpret the R output and indicate why each independent variable that is included in the model would contribute to better or worse health. Speak to the possible multicollinearity in the variables.

sexfreq: Frequency of sex during last year

gun: Have gun in home

sclass: Subjective class identification

health1: Condition of health

happiness: General happiness

party: Political party affiliation. You may want to combine the “Ind, near democrat” and “Not str democrat” into the same category, e.g., “lean democrat”. Do the same for republicans.

education: Highest year of school completed

age: Age of respondent
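
A minimal sketch of the party recoding and the ordered logit, using polr() from the MASS package on simulated data. Only the two quoted party labels come from the text above; the remaining party labels, the three happiness levels, and the choice of regressors are assumptions made for illustration and should be replaced by what happy.csv actually contains.

```r
library(MASS)  # polr(); MASS ships with standard R installations

# Simulated stand-in for happy.csv. Labels other than "Ind, near democrat"
# and "Not str democrat" are assumptions.
set.seed(4)
n <- 600
party_levels <- c("Strong democrat", "Not str democrat", "Ind, near democrat",
                  "Independent", "Ind, near republican", "Not str republican",
                  "Strong republican")
happy <- data.frame(
  education = sample(8:20, n, replace = TRUE),
  age       = sample(18:85, n, replace = TRUE),
  party     = sample(party_levels, n, replace = TRUE),
  stringsAsFactors = FALSE
)
latent <- 0.10 * happy$education + rlogis(n)
happy$happiness <- cut(latent, breaks = c(-Inf, 0.5, 2.5, Inf),
                       labels = c("Not too happy", "Pretty happy", "Very happy"),
                       ordered_result = TRUE)

# Combine the leaning categories as suggested in the variable list.
happy$party2 <- happy$party
happy$party2[happy$party2 %in% c("Ind, near democrat", "Not str democrat")] <- "lean democrat"
happy$party2[happy$party2 %in% c("Ind, near republican", "Not str republican")] <- "lean republican"
happy$party2 <- factor(happy$party2)

# Ordered logit; Hess = TRUE stores the Hessian needed for standard errors.
fit <- polr(happiness ~ education + age + party2, data = happy, Hess = TRUE)
ctab <- coef(summary(fit))
p_values <- 2 * pnorm(-abs(ctab[, "t value"]))  # normal approximation
odds_ratios <- exp(coef(fit))

# A rough multicollinearity check: variance inflation factor for education.
vif_edu <- 1 / (1 - summary(lm(education ~ age + party2, data = happy))$r.squared)
```

The response must be an ordered factor with its levels in the intended order, otherwise polr() either refuses the model or fits the categories in the wrong sequence.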

Question 5: Hybrid Vehicles (hybrid.csv) (10 points)

Consider the problem of choosing whether or not to purchase a hybrid vehicle, e.g., the Toyota Prius, Honda Civic Hybrid, Ford Escape, etc. As an analyst, you assume that whether or not an individual purchases a hybrid depends upon the current price of gasoline (gas), the difference in purchase price of a hybrid vehicle compared to a comparably equipped vehicle (increment), college education, represented by a dummy variable that equals 1 if the individual has completed college and 0 otherwise (college), and a dummy variable that equals 1 if the individual is a member of an environmental organization, e.g., Nature Conservancy or National Audubon Society (env). Answer the following questions using the data in hybrid.csv:

1. Provide summary statistics (i.e., means, minimums, maximums, and standard deviations) for each variable.

2. Estimate a regression model that allows you to calculate the probability that a person would buy a hybrid.

3. Using your parameter estimates, compute the probability that the following "types" of individuals will buy a hybrid:

Type I: gasoline price = 2.50; difference in purchase price = 1,500; college = 0; and member of an environmental organization = 0

Type II: gasoline price = 3.50; difference in purchase price = 500; college = 1; and member of an environmental organization = 1

Type III: gasoline price = 3.00; difference in purchase price = 1,000; college = 1; and member of an environmental organization = 0

4. Given the above probabilities, calculate the marginal effect that gasoline prices have on the probability that each of the three "types" of individuals will purchase a hybrid vehicle.

5. Given the above probabilities, calculate the marginal effect that joining an environmental group has on Type I and Type III individuals.

6. Estimate the proportion of individuals in the population who would purchase a hybrid vehicle given a $500 rebate on the cost of purchasing a hybrid vehicle.
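
Parts 2 through 6 can be sketched with a logit model; a probit would be an equally reasonable choice and the assignment does not say which is expected. The data are simulated in place of hybrid.csv, and the name of the dependent variable, `buy`, is an assumption because the text does not name it. For a continuous regressor in a logit the marginal effect at a point is β · p · (1 − p); for the dummy env the effect is the difference between the two predicted probabilities.

```r
# Simulated stand-in for hybrid.csv; `buy` is a hypothetical column name.
set.seed(5)
n <- 1000
hybrid <- data.frame(
  gas       = runif(n, 2, 4),
  increment = runif(n, 0, 3000),
  college   = rbinom(n, 1, 0.5),
  env       = rbinom(n, 1, 0.3)
)
hybrid$buy <- rbinom(n, 1, plogis(-3 + 1.0 * hybrid$gas - 0.001 * hybrid$increment +
                                  0.5 * hybrid$college + 0.8 * hybrid$env))

# Part 2: logit model.
fit <- glm(buy ~ gas + increment + college + env,
           family = binomial(link = "logit"), data = hybrid)

# Part 3: predicted probabilities for the three types.
types <- data.frame(
  gas = c(2.5, 3.5, 3.0), increment = c(1500, 500, 1000),
  college = c(0, 1, 1), env = c(0, 1, 0),
  row.names = c("Type I", "Type II", "Type III")
)
p <- predict(fit, newdata = types, type = "response")

# Part 4: marginal effect of the gasoline price, beta * p * (1 - p).
me_gas <- unname(coef(fit)["gas"]) * p * (1 - p)

# Part 5: discrete change from env = 0 to env = 1 for Types I and III.
joiners <- transform(types[c("Type I", "Type III"), ], env = 1)
me_env <- predict(fit, newdata = joiners, type = "response") - p[c("Type I", "Type III")]

# Part 6: average predicted probability after a $500 rebate.
rebate <- transform(hybrid, increment = increment - 500)
share_rebate <- mean(predict(fit, newdata = rebate, type = "response"))
```

Averaging the predicted probabilities over the sample, as in the last step, treats the sample as representative of the population; that is an assumption to state in the write-up.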

Question 6: Milk Demand (milk.csv) (10 points)

Consider the problem of modeling the weekly demand for milk (measured in gallons) (milk). As an analyst, you assume that the demand depends upon the current price of milk for the individual (price), the number of children in the household (kids), household income (measured in $10,000's) (income), and a dummy variable that equals 1 if the individual has completed college and 0 otherwise (college). Answer the following questions using the data in milk.csv:

1. Provide summary statistics (i.e., means, minimums, maximums, and standard deviations) for each of these variables.

2. Estimate the model using a multivariate regression model (OLS).

3. Estimate the model using a technique that takes the truncation into account.

4. Using your parameter estimates from part 3, compute point estimates for the following:

Type I: price of milk = 3.50, children = 0, income = 2, and college = 0

Type II: price of milk = 2.50, children = 4, income = 8, and college = 1

5. Compute the marginal effect that milk prices have on the quantity of milk demanded by each of the "types" of individuals.

6. Compute the impact that a $0.50 increase in the price of milk will have on the quantity of milk demanded in the population.
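
The sketch below assumes that "truncation" in part 3 refers to households whose demand piles up at zero, i.e. left-censoring, for which a Tobit model is a common choice. If zero-demand households are absent from milk.csv altogether, a truncated regression would be the matching tool instead, and this sketch would not apply. The Tobit is fitted with survreg() from the survival package on simulated data; the data frame name `milk_df` and the helper `e_y` are hypothetical.

```r
library(survival)  # survreg(); a recommended package shipped with R

# Simulated stand-in for milk.csv with demand censored at zero.
set.seed(6)
n <- 800
milk_df <- data.frame(
  price   = runif(n, 2, 4.5),
  kids    = rpois(n, 1.5),
  income  = runif(n, 1, 12),
  college = rbinom(n, 1, 0.4)
)
ystar <- 3 - 1.2 * milk_df$price + 0.9 * milk_df$kids + 0.15 * milk_df$income +
  0.3 * milk_df$college + rnorm(n, sd = 1.5)
milk_df$milk <- pmax(ystar, 0)

# Part 2: OLS, which ignores the censoring.
ols <- lm(milk ~ price + kids + income + college, data = milk_df)

# Part 3: Tobit as a Gaussian survival regression with left-censoring at 0.
tobit <- survreg(Surv(milk, milk > 0, type = "left") ~ price + kids + income + college,
                 data = milk_df, dist = "gaussian")
sigma <- tobit$scale

# Expected observed demand given the latent index xb: Phi(xb/s)*xb + s*phi(xb/s).
e_y <- function(xb, s) pnorm(xb / s) * xb + s * dnorm(xb / s)

# Part 4: point estimates for the two types.
types <- data.frame(price = c(3.5, 2.5), kids = c(0, 4), income = c(2, 8),
                    college = c(0, 1), row.names = c("Type I", "Type II"))
xb <- predict(tobit, newdata = types, type = "lp")
ey <- e_y(xb, sigma)

# Part 5: marginal effect of price on expected demand, beta * Phi(xb/s).
me_price <- unname(coef(tobit)["price"]) * pnorm(xb / sigma)

# Part 6: average change in expected demand after a $0.50 price increase.
xb_now  <- predict(tobit, newdata = milk_df, type = "lp")
xb_more <- predict(tobit, newdata = transform(milk_df, price = price + 0.5), type = "lp")
impact  <- mean(e_y(xb_more, sigma) - e_y(xb_now, sigma))
```

The Tobit coefficient on price describes the latent demand; the marginal effect on observed demand is scaled down by the probability of a positive purchase, which is why it differs across the two types.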

Question 7: Egg Demand (eggs.csv) (10 points)

Use eggs.csv, which contains data on the number of eggs produced (in millions) and the price (in cents per dozen) for 1990 and 1991.

1. Estimate the model yi = β0 + β1 · price + εi for the years 1990 and 1991 separately.

2. Pool the observations for the 2 years and estimate the pooled regression. What assumptions are you making in pooling the data?

3. Use the fixed effects model, distinguishing the 2 years, and present the regression results.

4. Can you use the fixed effects model, distinguishing the 50 states? Why or why not?

5. Would it make sense to distinguish both the state effect and the year effect? If so, how many dummy variables would you have to introduce?

6. Would the random effects model be appropriate to model the production of eggs? Why or why not?
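
Parts 1 through 5 can be sketched with dummy variables in lm(). The data are simulated as 50 states observed in 2 years; the column names `state`, `year`, `price`, and `eggs` and the data frame name `egg_data` are assumptions about eggs.csv. The sketch also counts the dummies and the remaining degrees of freedom, which is the core of parts 4 and 5.

```r
# Simulated stand-in for eggs.csv: 50 states x 2 years = 100 rows.
set.seed(7)
egg_data <- expand.grid(state = sprintf("S%02d", 1:50), year = c(1990, 1991),
                        stringsAsFactors = FALSE)
egg_data$price <- runif(100, 50, 150)
egg_data$eggs  <- 3000 - 15 * egg_data$price + 100 * (egg_data$year == 1991) +
  rnorm(100, sd = 800)

# Part 1: one regression per year.
by_year <- lapply(split(egg_data, egg_data$year),
                  function(d) lm(eggs ~ price, data = d))

# Part 2: pooled regression (same intercept and slope in both years).
pooled <- lm(eggs ~ price, data = egg_data)

# Part 3: year fixed effect via one dummy.
fe_year <- lm(eggs ~ price + factor(year), data = egg_data)

# Part 4: state fixed effects use 49 dummies and leave few degrees of freedom.
fe_state <- lm(eggs ~ price + factor(state), data = egg_data)

# Part 5: both effects, with an intercept: (50 - 1) + (2 - 1) dummies.
fe_both <- lm(eggs ~ price + factor(state) + factor(year), data = egg_data)
n_dummies <- (length(unique(egg_data$state)) - 1) + (length(unique(egg_data$year)) - 1)

c(dummies = n_dummies,
  df_state = df.residual(fe_state),
  df_both  = df.residual(fe_both))
```

With only two observations per state, the state dummies absorb roughly half of the degrees of freedom, which is worth weighing when answering part 4.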

Question 8: Unemployment (unemployment.csv) (10 points)

File unemployment.csv gives data on the civilian unemployment rate (y in percent) and manufacturing hourly compensation in U.S. dollars (x, index, 1992 = 100) for Canada, the United Kingdom, and the United States for the period 1980-1999. Consider the model:

yit = β0 + β1 · xit + εit

A priori, what is the expected relationship between Y and X? Why? Estimate the model for each country. Estimate the model, pooling all the 60 observations. Estimate the fixed effects model. Estimate the random effects model. Which is a better model? Justify your answer.
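
A minimal sketch of the country-by-country, pooled, and fixed effects estimates on simulated data (3 countries, 20 years). The column names `country`, `year`, `x`, and `y` are assumptions about unemployment.csv. The fixed effects slope is computed twice, once with country dummies and once by demeaning within country, to show that the two are the same estimator; the F test compares the pooled and fixed effects fits.

```r
# Simulated stand-in for unemployment.csv: 3 countries x 20 years = 60 rows.
set.seed(8)
un <- expand.grid(year = 1980:1999, country = c("Canada", "UK", "US"),
                  stringsAsFactors = FALSE)
alpha <- c(Canada = 10, UK = 9, US = 7)   # made-up country intercepts
un$x <- 60 + 2.5 * (un$year - 1980) + rnorm(60, sd = 4)
un$y <- unname(alpha[un$country]) - 0.03 * un$x + rnorm(60, sd = 1)

# One regression per country.
by_country <- lapply(split(un, un$country), function(d) coef(lm(y ~ x, data = d)))

# Pooled regression on all 60 observations.
pooled <- lm(y ~ x, data = un)

# Fixed effects via country dummies (least squares dummy variables).
lsdv <- lm(y ~ x + factor(country), data = un)

# Fixed effects via the within transformation: subtract country means.
un$y_dm <- un$y - ave(un$y, un$country)
un$x_dm <- un$x - ave(un$x, un$country)
within <- lm(y_dm ~ x_dm - 1, data = un)

# F test of the pooled model against the fixed effects model.
f_test <- anova(pooled, lsdv)
print(f_test)
```

The random effects model and a Hausman-type comparison are not shown here; they are usually estimated with an add-on package such as plm rather than with base R.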

Question 9: Impeachment (impeach.csv) (10 points)

Using the data found in the file impeach.csv, carry out two Logit analyses of U.S. Senate votes concerning the impeachment of President Clinton. A description of this data, arranged by Alan Reifman from Texas Tech, is in the appendix.

In your first Logit, use Votes on the Article I perjury charge as your dependent variable and three independent variables: the Senator's degree of ideological conservatism, the percent of the vote Clinton received in the 1996 Presidential election in each state, and the year each Senator's seat is up and he/she must run for re-election. In your second Logit, drop out all insignificant variables to develop the final regression. Based upon your results, predict the probability that a Senator with the following characteristics would have voted to find President Clinton guilty of perjury:

Party: Democrat

Degree of ideological conservatism: 63

Percent of the vote Clinton received in the 1996 Presidential election: 45

The year the Senator's seat is up and he/she must run for re-election: 2002

First-term senator: no

For each model, find the predicted probabilities based on the numbers above, then compare and contrast them. Run a Linear Probability Model (LPM) with the same variables you used in your second logit and compare and contrast the results of both models (the LPM and the Second Logit). For the LPM:

Write out the regression equation

Complete hypothesis tests for t’s and F

Interpret R-squared and adjusted R-squared
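
The comparison between the two logits and the LPM can be sketched as follows on simulated data. The column names `vote1`, `conserv`, `clint96`, and `reelect` are hypothetical, since the appendix describes the variables without giving their names in impeach.csv. Which variables are dropped in the second logit depends on the real estimates; the sketch simply supposes that `reelect` turned out insignificant.

```r
# Simulated stand-in for impeach.csv with hypothetical column names.
set.seed(9)
imp <- data.frame(
  conserv = runif(100, 0, 100),
  clint96 = runif(100, 33, 62),
  reelect = sample(c(2000, 2002, 2004), 100, replace = TRUE)
)
imp$vote1 <- rbinom(100, 1, plogis(-2 + 0.07 * imp$conserv - 0.03 * imp$clint96))

# First logit: all three regressors.
logit1 <- glm(vote1 ~ conserv + clint96 + reelect, family = binomial, data = imp)

# Second logit: assumption for illustration only that reelect was insignificant.
logit2 <- glm(vote1 ~ conserv + clint96, family = binomial, data = imp)

# Linear probability model with the same regressors as the second logit.
lpm <- lm(vote1 ~ conserv + clint96, data = imp)

# The senator described above (party and first-term status are not in these models).
senator <- data.frame(conserv = 63, clint96 = 45, reelect = 2002)
p_logit1 <- predict(logit1, newdata = senator, type = "response")
p_logit2 <- predict(logit2, newdata = senator, type = "response")
p_lpm    <- predict(lpm, newdata = senator)

# Pieces needed for the LPM write-up.
s <- summary(lpm)
s$coefficients                      # t tests
s$fstatistic                        # overall F test
c(r2 = s$r.squared, adj_r2 = s$adj.r.squared)
range(fitted(lpm))                  # LPM fitted values are not confined to [0, 1]
```

Logit predictions always lie strictly between 0 and 1, while the LPM's fitted values can fall outside that range, which is one of the contrasts worth discussing.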

Question 10: Electric Vehicles (evs.csv) (10 points)

The file evs.csv contains data about the choice of consumers with respect to alternative fuel vehicles. The variable "choices" represents the choice by the consumer for gasoline vehicles (choice = 1), conventional hybrids (choice = 2), plug-in hybrids (choice = 3), and electric vehicles (choice = 4). For each consumer, you have the following variables: age, suv (whether they are interested in buying an SUV), level2 (indicating whether people have a fast charger for electric cars in their community), own . . . (indicating whether the respondent currently has a gas, hybrid, plug-in hybrid, or battery electric vehicle), gender (1=female) and numcars (number of cars). The variables politics, edu, and income are coded as follows:

Income

– Under $15,000 (1)
– $15,000 to $24,999 (2)
– $25,000 to $34,999 (3)
– $35,000 to $49,999 (4)
– $50,000 to $74,999 (5)
– $75,000 to $99,999 (6)
– $100,000 to $149,999 (7)
– $150,000 to $199,999 (8)
– $200,000 to $249,000 (9)
– above $250,000 (10)

Education

– Less than High School (1)
– High School / GED (2)
– Some College (3)
– 2-year College Degree (4)
– 4-year College Degree (5)
– Masters Degree (6)
– Doctoral Degree (7)

Politics

– Extremely Liberal (1)
– Liberal (2)
– Slightly liberal (3)
– Moderate (4)
– Slightly conservative (5)
– Conservative (6)
– Extremely conservative (7)
– Other (8)
– None (9)

Estimate a multinomial logit model that estimates the probability of a consumer to purchase a gasoline, hybrid, plug-in hybrid, or battery electric vehicle. Calculate the marginal probabilities as well.
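
A minimal sketch of a multinomial logit using multinom() from the nnet package, on simulated data in place of evs.csv. The column name `choice` and the set of regressors are assumptions, and the elided own . . . variables are left out. Because the politics codes 8 (Other) and 9 (None) do not lie on the liberal-conservative scale, politics is entered as a factor here; that is a modeling choice, not a requirement of the assignment. The marginal effect shown is an average discrete change in the predicted probabilities when level2 moves from 0 to 1.

```r
library(nnet)  # multinom(); a recommended package shipped with R

# Simulated stand-in for evs.csv with assumed column names.
set.seed(10)
n <- 600
evs <- data.frame(
  age      = sample(18:80, n, replace = TRUE),
  suv      = rbinom(n, 1, 0.4),
  level2   = rbinom(n, 1, 0.3),
  gender   = rbinom(n, 1, 0.5),
  numcars  = sample(1:4, n, replace = TRUE),
  income   = sample(1:10, n, replace = TRUE),
  edu      = sample(1:7, n, replace = TRUE),
  politics = sample(1:9, n, replace = TRUE)
)
eta <- cbind(0, -0.5 + 0.1 * evs$edu, -1 + 0.1 * evs$income, -1.5 + 0.8 * evs$level2)
prob <- exp(eta) / rowSums(exp(eta))
draw <- apply(prob, 1, function(p) sample(1:4, 1, prob = p))
evs$choice <- factor(draw, levels = 1:4,
                     labels = c("gasoline", "hybrid", "plugin", "electric"))

# Multinomial logit; gasoline (the first level) is the reference category.
fit <- multinom(choice ~ age + suv + level2 + gender + numcars + income + edu +
                  factor(politics), data = evs, trace = FALSE)

# Predicted probabilities for every consumer: an n x 4 matrix.
phat <- predict(fit, type = "probs")

# Average discrete change in each probability when level2 goes from 0 to 1.
p1 <- predict(fit, newdata = transform(evs, level2 = 1), type = "probs")
p0 <- predict(fit, newdata = transform(evs, level2 = 0), type = "probs")
ame_level2 <- colMeans(p1 - p0)
round(ame_level2, 4)
```

Because the four probabilities sum to one for every consumer, the four marginal effects of any regressor sum to zero, which is a quick sanity check on the results.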

Appendix: U.S. Senate Votes on Clinton Removal

On February 12, 1999, for only the second time in the nation’s history, the U.S. Senate voted on whether to remove a President, based on impeachment articles passed by the U.S. House. Dozens of political talk shows featured analyses of why Senators may have voted the way they did, but such discourse was rarely (if ever) informed by systematic statistical analysis of the votes. This dataset allows for such analysis. Further, the magnitude of this event should ensure that classroom students have some familiarity with it, making the dataset a nice one for illustrating statistical principles.

For each U.S. Senator, his or her votes on whether to remove President Clinton on each of the two articles of impeachment (plus a summary variable representing each Senator’s number of “guilty” votes) are provided, as well as each Senator’s values on several variables that could be predictive of vote (e.g., Senator’s degree of conservatism, how well Clinton did in the Senator’s state in the 1996 Presidential election). The description of the variables is as follows:

Name of senator

State (postal code)

Vote on Article I, Perjury: 0 = Not Guilty, 1 = Guilty

Vote on Article II, Obstruction of Justice: 0 = NG, 1 = G

Number of votes for guilt

Party: 0 = Democrat, 1 = Republican

Senator’s degree of ideological conservativism (0-100)

Percent of the vote Clinton received in the 1996 Presidential election in each state

The year each Senator's seat is up and he/she must run for re-election (or retire)

First-term senator? 0 = no, 1 = yes

Name of Senator was limited to eight characters, so some names are cut off. Also, because multiple Senators often have the same or similar last names, nicknames were sometimes created to avoid confusion. For example, there is both a Tim Hutchinson and a Kay Bailey Hutchison; the former is referred to as “timhutch” and the latter as “kaybhut.” Each Senator’s degree of ideological conservativism is based on 1997 voting records as judged by the American Conservative Union, where 100 is most conservative.

