
ECON 570 Problem Set 3

Due: November 13, 2020

1 Lalonde NSW Data

A. Load the Lalonde experimental dataset with the lalonde_data method from the module causalinference.utils. The outcome variable is earnings in 1978, and the covariates are, in order:

Black: Indicator variable; 1 if Black, 0 otherwise.
Hispanic: Indicator variable; 1 if Hispanic, 0 otherwise.
Age: Age in years.
Married: Marital status; 1 if married, 0 otherwise.
Nodegree: Indicator variable; 1 if no degree, 0 otherwise.
Education: Years of education.
E74: Earnings in 1974.
U74: Unemployment status in 1974; 1 if unemployed, 0 otherwise.
E75: Earnings in 1975.
U75: Unemployment status in 1975; 1 if unemployed, 0 otherwise.

Using CausalModel from the module causalinference, provide summary statistics for the outcome variable and the covariates. Which covariate has the largest normalized difference?
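The normalized difference reported in CausalModel's summary table is the difference in group means scaled by the pooled standard deviation, which makes balance comparable across covariates measured in different units. A minimal numpy sketch of that statistic (the arrays below are made-up illustrations, not NSW values):

```python
import numpy as np

# Hypothetical covariate values for treated and control units.
treated = np.array([2.0, 4.0, 6.0])
control = np.array([1.0, 2.0, 3.0])

# Normalized difference: (mean_t - mean_c) / sqrt((s_t^2 + s_c^2) / 2),
# a scale-free balance measure for comparing covariates.
ndiff = (treated.mean() - control.mean()) / np.sqrt(
    (treated.var(ddof=1) + control.var(ddof=1)) / 2
)
print(round(ndiff, 4))  # 1.2649
```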

B. Estimate the propensity score using the selection algorithm est_propensity_s. In selecting the basic covariate set, specify E74, U74, E75, and U75. What additional linear terms and second-order terms were selected by the algorithm?
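est_propensity_s follows the Imbens-Rubin stepwise procedure: starting from the basic covariates you specify, it adds further linear and then second-order terms whenever a likelihood-ratio test clears a threshold. As a rough illustration of the underlying model only (not the selection algorithm itself), here is a plain logistic propensity score fit with sklearn on synthetic data:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))              # stand-ins for covariates like E74, U74
# Hypothetical assignment model: treatment depends on the covariates.
logits = 0.8 * X[:, 0] - 0.5 * X[:, 1]
D = rng.binomial(1, 1 / (1 + np.exp(-logits)))

# The propensity score is P(D = 1 | X), estimated here by logistic regression.
pscore = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]
print(pscore.min() > 0 and pscore.max() < 1)  # True: probabilities lie in (0, 1)
```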

C. Trim the sample using trim_s to get rid of observations with extreme propensity score values. What is the cut-off that is selected? How many observations are dropped as a result?
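trim_s selects the cut-off automatically and then drops units whose estimated propensity score falls outside [cutoff, 1 - cutoff]. The dropping step itself is simple to sketch in numpy (both the scores and the 0.1 cut-off below are hypothetical):

```python
import numpy as np

pscore = np.array([0.02, 0.10, 0.50, 0.90, 0.98])  # hypothetical scores
cutoff = 0.1                                        # hypothetical selected cut-off

# Keep only observations with cutoff <= pscore <= 1 - cutoff.
keep = (pscore >= cutoff) & (pscore <= 1 - cutoff)
print(int(np.sum(~keep)))  # 2 observations with extreme scores are dropped
```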


D. Stratify the sample using stratify_s. How many propensity bins are created? Report the summary statistics for each bin.
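stratify_s splits the trimmed sample into propensity-score bins, subdividing a bin while treated and control scores inside it still differ significantly. Once the boundaries are chosen, bin membership is just interval lookup; a sketch with hypothetical bin edges:

```python
import numpy as np

pscore = np.array([0.15, 0.25, 0.40, 0.55, 0.70, 0.85])  # hypothetical scores
edges = [0.0, 0.3, 0.6, 1.0]    # hypothetical boundaries chosen by stratify_s

# np.digitize maps each score to the index of the bin containing it.
bins = np.digitize(pscore, edges)
print(bins)  # [1 1 2 2 3 3]
```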

E. Estimate the average treatment effect using OLS, blocking, and matching. For matching, set the number of matches to 2 and adjust for bias. How much do the estimates differ?
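The OLS estimator here is just a regression of the outcome on a treatment indicator and the covariates, with the coefficient on the indicator as the ATE estimate; blocking and matching should give broadly similar answers when overlap is good. A noise-free toy example (all numbers hypothetical) where the true effect of 2.0 is recovered exactly:

```python
import numpy as np

d = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # treatment indicator
x = np.arange(8.0)                                    # a single covariate
y = 2.0 * d + 0.5 * x                                 # outcome, true effect = 2

# Regress y on an intercept, the treatment dummy, and the covariate.
design = np.column_stack([np.ones_like(d), d, x])
beta, *_ = np.linalg.lstsq(design, y, rcond=None)
print(round(beta[1], 6))  # 2.0 -- the coefficient on d is the ATE estimate
```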

2 Document Classification

A. From the module sklearn.datasets, load the training data set using the method fetch_20newsgroups. This dataset comprises around 18,000 newsgroup posts on 20 topics. Print a couple of sample posts and list all the topic names.

B. Convert the posts (blobs of text) into bag-of-words vectors. What is the dimensionality of these vectors? That is, how many distinct words appear in this data set?
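The bag-of-words conversion is what sklearn's CountVectorizer does: it builds a vocabulary from every token seen in the corpus and maps each document to a vector of word counts, so the dimensionality equals the vocabulary size. On three toy documents:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat", "the dog sat", "the cat ran"]
vec = CountVectorizer()
X = vec.fit_transform(docs)   # sparse document-term count matrix

# One column per distinct word: here {cat, dog, ran, sat, the}.
print(X.shape)  # (3, 5)
```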

C. Use your favorite dimensionality reduction technique to compress these vectors into ones of K = 30 dimensions.
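One common choice for sparse count matrices is truncated SVD (latent semantic analysis), which avoids the densifying mean-centering that PCA requires. A sketch reducing a hypothetical 100-feature count matrix to K = 30 dimensions:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(50, 100)).astype(float)  # stand-in for count vectors

svd = TruncatedSVD(n_components=30, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (50, 30)
```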

D. Use your favorite supervised learning method to train a model that predicts the topic of a post from the vectorized representation you obtained in the previous step.
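Any multiclass classifier works here; logistic regression is a reasonable baseline on dense SVD features. A self-contained sketch on synthetic clusters standing in for the reduced post vectors (the data are made up, not the newsgroups):

```python
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in: 3 "topics", 30-dimensional reduced vectors.
X, y = make_blobs(n_samples=150, centers=3, n_features=30, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.score(X, y) > 0.95)  # True: well-separated blobs are easy to classify
```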

E. Use the test data to tune your model. Make sure to include K as a hyperparameter as well. Use accuracy_score from sklearn.metrics as your evaluation metric. What is the highest accuracy you are able to achieve?
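Treating K as a hyperparameter means refitting the whole pipeline (reduction plus classifier) for each candidate value and comparing held-out accuracy with accuracy_score. A sketch of that loop on synthetic data (the candidate K values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=300, n_features=100, n_informative=20,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

scores = {}
for k in (5, 15, 30):                       # candidate values of K
    svd = TruncatedSVD(n_components=k, random_state=0)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(svd.fit_transform(X_tr), y_tr)
    pred = clf.predict(svd.transform(X_te))
    scores[k] = accuracy_score(y_te, pred)

# Keep the K with the highest held-out accuracy.
best_k = max(scores, key=scores.get)
print(best_k in (5, 15, 30))  # True
```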


