Date: 2022-11-01 08:37

SOC 360: Statistics for Sociologists

Data Analysis Project

Part 1

For this project, you'll use the methods of statistical analysis we've learned to analyze real social relations. The full project will be completed over the course of the term and submitted in two parts. You'll develop a research question and hypothesis about the univariate and bivariate distributions of two variables, using appropriate numerical and graphical methods to present your findings. Later, you'll use inferential statistics to assess the probability that the observed relationship between the variables is due only to chance. We're primarily interested in assessing whether you can properly present and interpret the results of statistical analysis and write clearly about them.

Essentially, there are four types of data analysis project (DAP) that you might conduct; you are required to carry out only one of them:
1. a quantitative predictor and a quantitative outcome
2. a categorical predictor and a categorical outcome
3. a quantitative predictor and a categorical outcome
4. a categorical predictor and a quantitative outcome

The first two types are simpler and easier to carry out with the tools we've developed, but you are welcome to try your hand at any of the four. A detailed tutorial for each type is available on YouTube; each runs roughly an hour (though I speak very slowly, and you may be able to watch it in 40 minutes or less, depending on how much you speed it up). The tutorials walk through a do-file with detailed, clear commentary. You are also more than welcome to ask your TA or the lecturer for support!

Please limit the first part of the assignment to eight (8) double-spaced pages in 12-point font. Contact us if you foresee any problems sticking to this limit.

INSTRUCTIONS FOR DAP I

Step 1: Exploring the data

For this paper, we'll use the General Social Survey (GSS) or Current Population Survey (CPS) data that you have been using in section, unless you've worked with other data-sets in other courses and want to use them; if so, reach out to us! Graduate students should generally try to find data-sets that are directly relevant to their own research interests; ask Griffin or Sanghyo for help. The CPS data are more limited, but their economic data are better and closer to true quantitative variables than the GSS economic data; conversely, the GSS includes many more questions. In either data-set, going to Stata's Variables window and typing the first few letters of a keyword you're interested in brings up relevant variables.

The GSS codebook, available at http://gss.norc.org/documents/codebook/gss_codebook.pdf, describes the many variables in the GSS. While the codebook is very large, you can press Ctrl+F to search quickly through the initial listing of variables for your variable of interest (it's best to have a rough idea of what you're looking for first, though); then keep scrolling through the hits until you reach that variable's page in the codebook. Alternatively, the list of all variables begins on page 1 of the document (page 12 of the PDF file). Note that not every question is asked in every year, so stick to questions that were asked in your survey year.

For the CPS, we are technically using a 2019 extract compiled by the Center for Economic and Policy Research (CEPR), which is described here. Documentation for this data-set is a bit more complicated, but on the other hand, the set of variables is smaller, and it is usually clear what they mean.

Select the following:
1. An outcome variable whose variation interests you, such as income or education. For this part, it is best to select variables that we generally think of as dependent on other sociological variables (such as class or ascribed race).
2. A predictor variable that you think might have some causal influence on the outcome variable.

For the variables you have identified, do all of the following:
1. Formulate a research question and hypothesis about a) how the values of the variables are distributed in the population and b) how the outcome variable might relate to the predictor variable.
   a. E.g., a) on balance, most people in the US have either a college education or less than a 12th-grade education, as do their fathers;¹ and b) a respondent's education level (educ²) is positively related to their father's education level (paeduc).³
   b. Consider whether any of your variables need basic transformations (especially the creation of a dummy variable); if so, create them.
2. Present basic descriptive statistics for each variable in clearly formatted, titled tables, taking care to show multiple measures of both central tendency and spread. Look for outliers and consider their cause, commenting on whether they are simply extreme values or might be data errors. Avoid including measures that are not meaningful for your type of variable (e.g., the mean of a nominal variable such as race).
3. Present graphical analyses of each variable's individual distribution. You can present the distribution of each variable in several different ways; just make sure not to present data in a way that does not make sense (e.g., a histogram of race would be much less informative than a bar graph of the same, whereas a pie graph of income would be much less useful than a histogram). Explain what each figure you use shows; you need at least one per variable.
4. Present a bivariate numerical analysis of the variables, again putting the relevant numbers into a nicely formatted table. For quantitative variables, report all of the relevant regression output; for qualitative variables, explain the meaning of the two-way table.
5. Present a bivariate graphical analysis of that relationship.
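The course does this work in Stata (the tutorials walk through a do-file), where, for example, `summarize` with the `detail` option reports most of these measures directly. As a language-neutral illustration of what items 1b and 2 compute, here is a minimal Python sketch using only the standard library. The values are invented toy numbers, not GSS data; the 16-year college cutoff and the 1.5 × IQR outlier rule (Tukey's fences) are illustrative assumptions, not requirements of the assignment. Quartiles can differ slightly across software packages because they use different percentile definitions.

```python
import statistics

# Invented toy values standing in for years of schooling (like educ).
# These are NOT real GSS data; the 99 is planted to mimic a possible data error.
educ = [12, 16, 12, 14, 18, 10, 12, 16, 20, 12, 13, 16, 8, 12, 99]


def describe(values):
    """Measures of central tendency and spread for a quantitative variable."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points (Python's default method)
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,  # interquartile range
    }


def flag_outliers(values, k=1.5):
    """Values beyond Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR."""
    s = describe(values)
    low, high = s["q1"] - k * s["iqr"], s["q3"] + k * s["iqr"]
    return [v for v in values if v < low or v > high]


def make_dummy(values, threshold):
    """Recode a quantitative variable as a 0/1 dummy: 1 if value >= threshold, else 0."""
    return [1 if v >= threshold else 0 for v in values]


suspect = flag_outliers(educ)
print("Flagged for inspection:", suspect)

# Only drop a flagged value after deciding it is a data error, not just an extreme case.
clean = [v for v in educ if v not in suspect]
for name, value in describe(clean).items():
    print(f"{name:>6}: {value:.2f}")

college = make_dummy(clean, threshold=16)  # hypothetical cutoff: 16+ years coded as 1
print("Share coded 1:", round(sum(college) / len(college), 2))
```

Note that the mean and standard deviation shift noticeably once the implausible value is removed, while the median barely moves; that contrast is worth commenting on when you discuss outliers.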

¹ This is a hypothesis which we would have evidence for rejecting; it's just an example.
² Names such as educ and paeduc are Stata variable names: educ is how the GSS labels the respondent's education, whereas paeduc is father's education.
³ You can, for now, take these survey data as basically representative of the population at large and not worry about the problem of sampling. The second half of the course, and the second part of the data project, will put this assumption into question and lead us into statistical inference proper.
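For item 4 of Step 1, the two kinds of bivariate output can be sketched the same way. The Python below (standard library only, invented toy values rather than GSS data) computes a one-predictor least-squares regression (intercept, slope, and R²) and a two-way table with percentages within each category of the predictor, which is one common convention. In Stata, `regress` and `tabulate` with the `row` option produce the corresponding output; this sketch only shows the arithmetic behind them.

```python
import math
from collections import Counter


def ols_fit(x, y):
    """Least-squares regression of y on a single predictor x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)  # Pearson correlation coefficient
    return intercept, slope, r ** 2  # with one predictor, R-squared equals r squared


def two_way_table(predictor, outcome):
    """Cell counts and row percentages (within each predictor category)."""
    counts = Counter(zip(predictor, outcome))
    cols = sorted(set(outcome))
    table = {}
    for row in sorted(set(predictor)):
        row_total = sum(counts[(row, c)] for c in cols)
        table[row] = {c: (counts[(row, c)], 100 * counts[(row, c)] / row_total) for c in cols}
    return table


# Invented toy values (NOT GSS data): father's and respondent's years of schooling.
paeduc = [8, 10, 12, 12, 14, 16, 16, 18]
educ = [12, 12, 12, 14, 16, 16, 18, 20]
a, b, r2 = ols_fit(paeduc, educ)
print(f"predicted educ = {a:.2f} + {b:.2f} * paeduc   (R-squared = {r2:.2f})")

# Invented toy dummies: father has a college degree (predictor) vs. respondent does (outcome).
pa_college = [0, 0, 0, 0, 1, 1, 1, 1]
college = [0, 0, 0, 1, 0, 1, 1, 1]
for row, cells in two_way_table(pa_college, college).items():
    print(row, {c: f"{count} ({pct:.0f}%)" for c, (count, pct) in cells.items()})
```

In the regression output, the slope is the predicted difference in the outcome for a one-unit difference in the predictor, and R² is the share of the outcome's variance accounted for by the linear fit.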

Step 2: Writing the report

After you complete the analysis, you are ready to begin writing your paper. This data analysis project should be written in narrative form (full sentences and paragraphs) and should include the following sections:

1. Introduction
   a. State the research question you are trying to answer in this report (the distribution of your variables of interest and their relationship).
   b. State your research hypotheses, for both the univariate distributions and the relationship between the variables, and justify them with some brief social analysis.
   c. Discuss the relevance or importance of your research question (i.e., why your readers should care about it) in a paragraph or so.

2. Data & methods
   a. Briefly describe your data and its survey design.
   b. Discuss the operationalization of your variables. What type of variable is each? How did the researchers measure it?
      i. Make sure to consider some potential weaknesses of each measure. (Note: for some variables, the GSS variable will be a relatively unproblematic measure of your theoretical construct, while in other cases there may be significant gaps; that is OK. Just report it.)
   c. Briefly discuss the numerical and graphical methods you used for analysis. You can leave the results themselves to the tables later; here, focus less on the particular results and more on why you chose, say, regression or a two-way table.

3. Univariate analysis
   a. Present (report, but also discuss) the contents of the key tables of univariate descriptive statistics, with appropriate titles, effective rounding, sources cited, etc. Do the same with the relevant graphs. Which measures are appropriate to include will depend on the type of variables you select; be as comprehensive as you can without being loquacious.
   b. Comment on the relationship to your hypothesis: is this what you expected?

4. Bivariate analysis
   a. Present (report, but also discuss) the contents of the key tables of bivariate descriptive statistics, with appropriate titles, effective rounding, sources cited, etc. Do the same with the relevant graphs. Again, which measures are appropriate to include will depend on the type of variables you select; be as comprehensive as you can without being loquacious.
   b. Comment on the relationship to your hypothesis: is this what you expected?

5. Conclusion
   a. Assess your hypothesis in light of your findings. Does the evidence seem to support it or count against it? Again, you do not yet need to worry about the problem of statistical inference (whether these survey data are generalizable), although you should be aware of this problem. Are there limits to your form of analysis?
   b. If your hypothesis is supported, what further questions do you have about the findings? If it is not supported, ruminate on why. Is the hypothesis flawed? Were your measures not up to the task of testing it as rigorously as you'd like? What direction should additional research on this subject take?

Data Project Part 1 Rubric

DAP I is due on Sunday, October 30, by 11:59 p.m., uploaded to Canvas. Submit the report as a single Word (.doc) file, with graphs included in the body of the text. The document should have 1-inch margins, and the text should be double-spaced in 12-point font.

| Section/Category | Task | Possible points |
|---|---|---|
| Introduction (1) | Research question clearly stated; hypothesis, reasoning for the hypothesis, and relevance discussed | 1 |
| Data & methods (1.5) | Brief description of the data-set | 0.5 |
| | Pros and cons of the variable measures are discussed | 0.5 |
| | Methodological choices are justified | 0.5 |
| Univariate analysis (3) | Graphs are included & properly labeled/interpreted | 1.5 |
| | Tables are included & properly labeled/interpreted | 1.5 |
| Bivariate analysis (3) | Graphs are included & properly labeled/interpreted | 1.5 |
| | Tables are included & properly labeled/interpreted | 1.5 |
| Conclusion (0.5) | Makes a reasonable conclusion referring to the original hypothesis and considers further questions | 0.5 |
| Style/grammar (1) | Report written in complete sentences and full paragraphs, with nicely labeled sections, relatively few grammatical errors, 12-pt font, double spacing, and 1-inch margins | 1 |
