
Date: 2019-11-13 05:08

STAT 432: Analysis 03

Directions

IMRAD and Quiz Due: Monday, November 18, 3:00 PM

Reflection Due: Thursday, November 21, 3:00 PM

Analysis Goal

Use the given data to detect credit card fraud. There are three tasks associated with

this analysis: the IMRAD report, the analysis quiz, and the reflection.

Quiz Subset Data: cc-sub.csv (data/cc-sub.csv)

Full Data: creditcard.csv.gz (https://stat432.org/data/creditcard.csv.gz)

Use:

readr::read_csv("https://stat432.org/data/creditcard.csv.gz")

to read the full data directly.

Source: Kaggle - Credit Card Fraud Detection (https://www.kaggle.com/mlg-ulb/creditcardfraud)

Please refer to the source documentation for information on data collection and a data

dictionary. The response was altered.

0 is now labeled genuine

1 is now labeled fraud

The following code is available to show how the quiz data was created, but please use

the .csv linked above. If you choose to use the full data, you will need to run the line

below that alters the response from 0 and 1 to genuine and fraud, unless you

prefer 0 and 1.

# load packages

library("tidyverse")

library("caret")

library("gbm")

library("ROSE")

# extract file obtained from Kaggle

# https://www.kaggle.com/mlg-ulb/creditcardfraud

unzip("creditcardfraud.zip")

# create remote readable compressed file

system("gzip creditcard.csv")

# from gz file

cc = read_csv("creditcard.csv.gz")

# verify data

nrow(cc) == 284807

# make response a factor with names instead of numbers

cc$Class = factor(ifelse(cc$Class == 0, "genuine", "fraud"))

# subset for efficiency and PL

set.seed(42)

sub_idx = sample(nrow(cc), size = 50000)

cc_sub = cc[sub_idx, ]

# write subset to disk

write_csv(cc_sub, "cc-sub.csv")
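As a quick check on the relabeling step above, the following minimal sketch applies the same ifelse() recoding to a small made-up vector and tabulates the resulting classes. The vector class_raw and its values are assumptions for illustration only and are not drawn from the actual data. The sketch also states the factor levels explicitly, which is a choice made here; the code above relies on the default alphabetical ordering, which places fraud first.

```r
# toy response vector standing in for cc$Class (hypothetical values)
class_raw = c(0, 0, 0, 1, 0, 0, 1, 0, 0, 0)

# same relabeling as above, with the factor levels stated explicitly
class_fct = factor(
  ifelse(class_raw == 0, "genuine", "fraud"),
  levels = c("genuine", "fraud")
)

# counts and proportions of each class
class_counts = table(class_fct)
class_props = prop.table(class_counts)
print(class_counts)
print(class_props)
```

Tabulating the response this way on the real data is one way to see how rare the fraud class is before choosing a modeling approach.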

IMRAD

For this analysis, do the following:

Analyze the data however you please! You should keep the stated goal in mind,

but you may define a more specific goal that you are working towards.

Write a report in R Markdown using the IMRAD

(https://en.wikipedia.org/wiki/IMRAD) template

(https://www.cmu.edu/gcc/handouts-and-resources/handouts/imrd.pdf).

Write an abstract.

Write an introduction.

Write a methods section.

Write a results section.

Write a discussion.

IMRAD Submission

Submit a .zip file to Compass that contains:

A .Rmd file that is your IMRAD.

This file should be written assuming that it is in the same folder as a folder

called /data/ which contains cc-sub.csv .

Hint: Create an RStudio Project.

A .html file that is the result of knitting your .Rmd file.

The zip file should contain no other files. (Whether or not these two files are within

another folder does not matter.)

Submit your .zip file to the correct assignment on Compass2g. You are granted an

unlimited number of submissions. Only your final submission will be graded.

R Environment

We assume that your R, R packages, and RStudio are all up-to-date. (Or at least as

recent as the versions found on RStudio Cloud.) You’ve been warned.

R Style

Your code will be graded based on its style. We don’t expect you to have a mature

coding style, so we have a list of rules which must be followed.

The following will be explicitly checked for in your code:

All commas must be followed by a space. (Additionally, commas should never be

preceded by a space.)

Infix operators ( == , + , - , <- , etc.) should always be surrounded by spaces.

(https://style.tidyverse.org/syntax.html#infix-operators)

Exceptions: : , :: , $ , [ , [[ , ] , ]]

^ : Use x ^ 2 instead of x^2 .

Use a consistent assignment operator. Either <- or = , not both.

If you choose to use the <- operator, you will need to replace the =

operator in the given code.

Do not use T or F .

Do not use absolute paths.

Do not use semicolons, ; .

Do not use the attach() function.

No more than one newline (blank line) in a row in an R Markdown document.

No more than one newline (blank line) in a row in an R chunk.

A newline before and after each chunk in an R Markdown document.

No newline to start a chunk. No newline at end of chunk. (The first and last line of

each chunk should contain code, or a comment for the first line.)

A newline at the end of the file.
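The explicitly checked rules above can be illustrated with a short chunk. This is only a sketch of code that appears to satisfy them, not an official example from the course; the variable names and values are made up for illustration.

```r
# sample data, with each comma followed by a space
x = c(1, 2, 3, 4)

# infix operators surrounded by spaces, including ^
x_squared = x ^ 2
is_large = x_squared >= 9

# TRUE and FALSE written out instead of T and F
total = sum(x_squared, na.rm = TRUE)

# exceptions: no spaces around : , :: , or [
first_two = x[1:2]
avg = base::mean(x)
```

The chunk uses = for every assignment, contains no semicolons, starts and ends on a line with content, and never has more than one blank line in a row.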

The following are suggested, but will not be directly assessed:

Variable and function names (that are user created) should only contain lowercase

letters, numbers, and underscores.

Do not use periods, . , or capital letters in variable and function names.

Opening (left) curly braces should not be on their own line.

A space should precede a parenthesis, except in a function call.

Good: for (i in 1:10)

Bad: for(i in 1:10)

Good: mean(x)

Bad: mean (x)

Except for the first, argument names should be written in function calls.

(Exception for the predict() function.)

This will be encouraged, but not enforced, as often it is better to make a

judgement call than to be dogmatic.

Much of this is derived from the tidyverse style guide (https://style.tidyverse.org). If

you follow the tidyverse guide, be aware of our use of ^ and = .
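The suggested rules can be sketched in the same way. The function sum_of_squares() and the values passed to it are hypothetical and exist only to show the naming, brace, spacing, and argument-name conventions in one place.

```r
# lowercase names with underscores, opening brace on the same line
sum_of_squares = function(values) {
  total = 0
  # a space before the parenthesis in for, but not in a function call
  for (i in seq_along(values)) {
    total = total + values[i] ^ 2
  }
  total
}

# argument names written out, except for the first argument
rounded_mean = round(mean(c(1.25, 2.5, 4)), digits = 1)
result = sum_of_squares(c(1, 2, 3))
```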

Analysis Quiz

There will be a PL quiz associated with this analysis to check some of the “objective”

numeric results of your analysis.

Self Assessment

After submission of the analysis, an example “solution” will be released. In addition, a

set of reflection questions will be released. By comparing your submitted analysis to the

“solution” together with the reflection questions, you will write a short self-assessment of

your analysis.

Specifics on formatting, submission, etc., will be released after the analysis is due.

Grading

This analysis is worth a total of 10 points.

IMRAD .Rmd and .html (4 points)

Code Style in .Rmd (0 - 1 - 2)

Eye Test of .html (0 - 1 - 2)

Does it look like you’ve done an analysis? Is it reasonably formatted?

Analysis Quiz (4 points)

Reflection (2 points)

Failure to submit the correct files will result in 0 points for the IMRAD.

Quiz grading will be similar to regular quizzes.

Grading of the self reflection will largely be based on completion. A template will be

provided after submission of the analysis.

Late Policy

The late policy will apply to each individual task. See above for due dates.

Late submissions for both will be accepted up to 48 hours after the initial deadline.

Up to 24 hours late, the assignment will incur a 10 percent reduction.

Up to 48 hours late, the assignment will incur a 30 percent reduction.

No exceptions! Start early and make sure your environment is working correctly

and you are able to produce a working document.

If you submit multiple attempts, the final attempt will be graded. If your first submission

is on time, but your final submission is late, you will incur the late submission penalty.
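The late reductions can be worked through with a small helper. This is a sketch under two assumptions that the policy text does not spell out: that the reduction is taken as a percentage of the score earned on that task, and that anything more than 48 hours late earns nothing because it is no longer accepted. The function late_multiplier() and the example score are hypothetical.

```r
# hypothetical helper: multiplier applied to a task score given hours late
late_multiplier = function(hours_late) {
  if (hours_late <= 0) {
    1
  } else if (hours_late <= 24) {
    0.9
  } else if (hours_late <= 48) {
    0.7
  } else {
    0
  }
}

# example: a full score 4 point IMRAD submitted 30 hours late (made-up case)
imrad_score = 4 * late_multiplier(hours_late = 30)
print(imrad_score)
```

Under these assumptions, a 4 point IMRAD that is 30 hours late would be recorded as 2.8 points.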

