Date: 2018-12-15 10:45

Fall 2018 IS542 Final

Due Tuesday December 18, 5:00PM US Central Time

Discuss two or more of the following questions in your own words. You may choose to address any two, three, four, or even all of the questions, but you should target 3-4 pages of text in total (not counting figures, tables, and references). Upload your answers to the final section of the class Moodle page as a single narrative document in PDF format. You may, and are encouraged to, illustrate your answers using R, but code is no substitute for lucid natural-language explanations. To preserve the natural flow of the narrative, figures and tables should be embedded in the document near their first mention. Any supplementary files, such as code or data, should be referenced in the text and uploaded separately. You may use books, articles, notes, search engines, or computers, but you may not solicit or receive direct assistance from other human beings. Cite any sources you use. For the first three questions, you may want to illustrate technical details using R and to discuss both the practical aspects that matter for applications and the theoretical aspects of the subject.

Question 1. Construct a dataset with at least 8 observations and 3 variables (y, x1, and x2) such that the least squares linear regression of y versus x1 produces y = -2x1 + e1, while regressing y versus both x1 and x2 produces y = 2x1 - x2 + e2. How might you interpret the relationship between y and x1? Show your work in R.

Question 2. Write a short essay, in your own words, explaining the four assumptions of linear regression, and show how to test them on a dataset of your choice. Show your work in R.

Question 3. Write a short essay, in your own words, on Bayes' theorem, and illustrate its use in an application of your own making.

Question 4. R challenge. During the last class session we worked with the circle.arff dataset, assessing the cross-validated performance of a wide variety of classification algorithms such as decision trees, random forest, rules, support vector machine, Naïve Bayes, Bayes Net, logistic regression, neural net, k-nearest neighbor, and boosting. Replicate some of these experiments using R.

http://abel.lis.illinois.edu/data/circle.arff

Question 5. R challenge. The data directory contains a file with author names and the associated Ethnea and Genni predictions. Use logistic regression to identify character n-grams of first and/or last names that may help predict the Ethnea categories. It might be helpful to install and use an R package, such as tm, that is able to extract character n-grams. Classification performance can be assessed using precision and recall for each Ethnea ethnicity category, and the most similar (most often confused) classes can be identified using the confusion matrix.

Full dataset:

http://abel.ischool.illinois.edu/data/names_ethnea_genni_country.csv

A smaller random sample of the full dataset is available here:

http://abel.ischool.illinois.edu/data/names_ethnea_genni_country_sample.csv

References:

Torvik VI, Agarwal S. Ethnea -- an instance-based ethnicity classifier based on geo-coded author names in a large-scale bibliographic database. International Symposium on Science of Science, March 22-23, 2016, Library of Congress, Washington DC, USA. http://hdl.handle.net/2142/88927
