
Date: 2019-11-18 09:30

Assignment #4

Course: ISA 414

Points: 100

Due date: November 18th, 2019, before 11:59 pm

Submission instructions: this assignment is to be done individually. All your answers should be in a single R script. Your code must be well formulated (i.e., free of errors) and sound (i.e., it does what the question asks it to do). In particular, the grader must be able to open your .R file in RStudio and run the code without running into errors. Code with errors may receive zero points. Submit the final script on Canvas before the due date.

Question 1: suppose you are responsible for developing the code that computes the number of times each word appears on Twitter every day. One can use these frequencies, for example, as input when calculating the daily trending words. Clearly, Twitter’s data are massive. Being an expert in Hadoop, you quickly realize that you can use MapReduce to complete your task. I highly encourage you to use Remote Desktop Connection to complete this question, since all of the required libraries are already installed there and these libraries are not straightforward to install. Also, make sure you set the version of R to 3.4.3.

a) To test your solution, you will be working with a sample of Twitter’s data. Start by loading the file tweets_asst_4.csv (available on Canvas) into R using the read.csv command (remember to set the argument stringsAsFactors = FALSE). Next, upload the resulting data frame to HDFS using the command to.dfs. [10 points]
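A minimal sketch of part a), assuming hypothetical columns `date` and `text` (the actual column names in tweets_asst_4.csv are not given here). It reads a small inline CSV so that it runs anywhere; the HDFS upload is left as a comment because it needs rmr2 and a configured Hadoop installation.

```r
# Hypothetical inline sample standing in for tweets_asst_4.csv;
# the real file's column names may differ.
csv_text <- "date,text
2017-04-10,The spider is here
2017-04-10,spider man
2017-04-11,the end"

# The argument is stringsAsFactors (plural "strings"); without it,
# R versions before 4.0 would turn the text columns into factors.
tweets <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(tweets)

# On the Hadoop machine, the upload to HDFS would then be:
# library(rmr2)
# tweets_dfs <- to.dfs(tweets)
```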

b) Define a map function to solve your task. Hint: you might want to consider keys created by combining (“pasting”) the date a tweet was tweeted with each word in the tweet. For example, for a tweet tweeted on April 10th, 2017 containing the word “spider”, a possible key returned by the map function would be 2017-04-10_spider. [20 points]
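The sketch below illustrates the date_word key idea from the hint in base R, so it runs without Hadoop. `keyval_local()` is a hypothetical stand-in for rmr2's `keyval()`, the `date`/`text` column names are assumed, and the tokenizer (lower-casing and splitting on anything that is not a letter, digit, `#`, `@`, or apostrophe) is an illustrative choice rather than part of the assignment. When the input was written with to.dfs(), the map function typically receives a chunk of the data frame as `v`.

```r
# Hypothetical stand-in for rmr2::keyval(): pairs keys with values,
# recycling a scalar value to the number of keys.
keyval_local <- function(k, v) list(key = k, val = rep_len(v, length(k)))

# Map: for each tweet, emit one ("<date>_<word>", 1) pair per word.
# `v` is assumed to be a data frame with character columns `date` and `text`.
map_tweets <- function(k, v) {
  words <- strsplit(tolower(v$text), "[^a-z0-9#@']+")
  dates <- rep(v$date, lengths(words))   # repeat each date once per word
  words <- unlist(words)
  keep  <- nzchar(words)                  # drop empty tokens from leading separators
  keyval_local(paste(dates[keep], words[keep], sep = "_"), 1)
}

tweets <- data.frame(
  date = c("2017-04-10", "2017-04-10"),
  text = c("The spider is here", "spider man"),
  stringsAsFactors = FALSE
)
kv <- map_tweets(NULL, tweets)
kv$key
```

With rmr2, the same body would return keyval(...) instead of keyval_local(...).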

c) Define a reduce function that counts the number of times each word appears on Twitter per day. Hint: see the reduce function in the word-counting example we covered in class. [10 points]
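As in the classic word-count example, the reduce step only has to add up the 1s emitted for each key. This sketch simulates the shuffle phase locally with split() so that it runs without Hadoop; `reduce_counts()` and the sample keys are hypothetical.

```r
# Reduce: called once per distinct key with all values emitted for that key.
# The daily count of a word is simply the sum of its 1s.
reduce_counts <- function(k, counts) list(key = k, val = sum(counts))

# Local simulation of the shuffle: group mapped values by key.
mapped_keys <- c("2017-04-10_spider", "2017-04-10_man", "2017-04-10_spider")
mapped_vals <- c(1, 1, 1)
groups  <- split(mapped_vals, mapped_keys)
reduced <- Map(reduce_counts, names(groups), groups)

# Under rmr2 the equivalent reducer would be: function(k, vv) keyval(k, sum(vv))
```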

d) Run the mapreduce function using the data in a), the map function in b), and the reduce function in c). Then retrieve the final output from HDFS and display it as a data frame (table). [10 points]
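An end-to-end local run of the same idea in base R, turning the counts into a data frame with separate date and word columns. Splitting each key back apart assumes words never contain an underscore, which holds here because the (assumed) tokenizer treats `_` as a separator. The commented lines show the rough shape of the rmr2 calls on the cluster; `map_fn` and `reduce_fn` are hypothetical names for the functions from b) and c).

```r
tweets <- data.frame(
  date = c("2017-04-10", "2017-04-10", "2017-04-11"),
  text = c("The spider is here", "spider man", "the end"),
  stringsAsFactors = FALSE
)

# Map: one "<date>_<word>" key per word occurrence.
words <- strsplit(tolower(tweets$text), "[^a-z0-9#@']+")
dates <- rep(tweets$date, lengths(words))
words <- unlist(words)
mapped_keys <- paste(dates, words, sep = "_")[nzchar(words)]

# Shuffle + reduce: sum the 1s per key.
counts <- tapply(rep(1, length(mapped_keys)), mapped_keys, sum)

# Split "<date>_<word>" back into columns for display.
result <- data.frame(
  date  = sub("_.*$", "", names(counts)),
  word  = sub("^[^_]*_", "", names(counts)),
  count = as.vector(counts),
  stringsAsFactors = FALSE
)
print(result)

# On the cluster with rmr2, the corresponding calls would be roughly:
# out <- mapreduce(input = tweets_dfs, map = map_fn, reduce = reduce_fn)
# res <- from.dfs(out)
# data.frame(key = keys(res), count = values(res))
```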

Question 2 – real-life case study: bol.com

A 2015 study sponsored by the Dutch electronic-commerce company bol.com, led by Arthur Carvalho (previously: Rotterdam School of Management – Erasmus University; currently: Farmer School of Business – Miami University) and Esther Hundepool (PwC), investigated some of the factors that affect customers’ willingness-to-buy in B2C e-commerce environments. The case below is an adaptation of that study.

Business Understanding:

Over the past 20 years, the Internet has changed the way consumers buy goods and/or services. From groceries to vacation packages and clothing, more and more people are using the Internet to shop online. The online selling of products and/or services by businesses to consumers is often referred to as business-to-consumer (B2C) electronic commerce (e-commerce).

E-commerce makes up a big share of the retail industry, often providing more product choices and faster delivery times than "bricks-and-mortar" retailers do. B2C e-commerce transactions in Western Europe totaled 177.7 billion euros in 2013, an increase of 12 percent over the previous year. Another interesting fact is that 95 million consumers in Western Europe bought goods and/or services online in 2013. Total e-commerce sales in the United States amounted to 1,233 billion US dollars in 2013. E-commerce is clearly a booming business, which creates an extensive array of research opportunities, e.g., understanding the factors that influence customers' willingness-to-buy in B2C e-commerce environments.

One can argue that trust perception is one of the biggest barriers for consumers to engage in electronic commerce. A potential lack of trust will likely discourage consumers from shopping online. Therefore, it is interesting to study how to manage trust in e-commerce environments, as well as the influence of different types of trust on consumers' willingness-to-buy online.

In addition to trust perception, risk perception can be another challenging factor in e-commerce. Different types of risk perception are likely to influence consumers' attitudes toward online transactions.

Finally, consumers' demographic traits might also influence their online shopping behavior.

The goal of this study is to investigate the variables that significantly influence, either positively or negatively, customers’ willingness-to-buy in B2C e-commerce environments. Following the above background sketch, one can formulate the underlying business problem as:

What are the determinants of customers' willingness-to-buy in B2C e-commerce environments?

In particular, this study aims at measuring the effects of perceived risk and perceived trust on consumers' willingness-to-buy online. As e-commerce sales are expected to keep growing over the years, understanding these factors, and how to deal with them effectively, will play a crucial role in the online strategies of companies engaging in e-commerce.

Data Understanding:

The data in this study were collected by means of an electronic survey developed in partnership with PwC and bol.com. To illustrate the process of online shopping, the survey started by showing the respondents a 5-minute video of actual browsing and shopping on bol.com, the number one online retailer in the Netherlands. Specifically, after exhibiting some features of the website, the video showed a search for, and a purchase of, a digital camera.

When the video was over, the survey showed a web page from bol.com containing a detailed description of the purchased camera. Following the video and product description, the survey measured three dimensions of perceived risk and three dimensions of perceived trust using five question-items per dimension. The six dimensions are: Perceived Product Risk (PPR), Perceived Informational Risk (PIR), Perceived Economic Risk (PER), Perceived Integrity (PI), Perceived Safety (PS), and Perceived Benevolence (PB).

Next, the survey measured the main dimension of interest, Willingness-to-Buy (WTB), using five question-items. All the question-items used a 0-100 scale. Think of a chosen scale value as the likelihood (expressed as a percentage) that the respondent agrees with the statement in the question-item. At the end, the survey collected demographic information, such as respondents' age, income, and gender.

The survey was available from March 17th, 2015 to April 18th, 2015. We invited participants via social networks and by sending emails to subject pools from Rotterdam School of Management at Erasmus University and the office of the company PricewaterhouseCoopers (PwC) located in Rotterdam (the Netherlands). In total, 360 participants started the survey.

After the data collection phase, we prepared the resulting data set for subsequent analysis by removing all incomplete survey responses, which left 199 complete observations in the data set, a completion rate of 55.28% (199/360). We show below the structure of the survey we used to collect data (translated from Dutch):

• Perceived Product Risk (PPR)
  - PPR_1: I think this product will perform as expected.
  - PPR_2: The product purchased will likely not perform as expected.
  - PPR_3: I think it is difficult to judge the quality of this product adequately.
  - PPR_4: In case of a product purchase on this website, it is likely to fail the performance requirements originally intended.
  - PPR_5: I believe the likelihood is high that something is wrong with the performance of this product.

• Perceived Informational Risk (PIR)
  - PIR_1: It is clear to me whether Bol.com intends to give my personal information to third parties.
  - PIR_2: I believe this website will protect my personal information from exposure to third parties.
  - PIR_3: I believe Bol.com does not intend to misuse the personal information provided by me.
  - PIR_4: I believe Bol.com will protect and store my personal information correctly.
  - PIR_5: I believe Bol.com is likely to misuse my personal information.

• Perceived Economic Risk (PER)
  - PER_1: Purchasing from this website would involve economic risk (fraud, hard to return).
  - PER_2: I believe I can return this product and get a refund easily.
  - PER_3: I believe there is a high chance that I stand to lose money if I purchase this product.
  - PER_4: When I purchase this item from Bol.com I have the chance of financial loss.
  - PER_5: I believe there is a great chance I do not receive the intended product.

• Perceived Integrity (PI)
  - PI_1: Bol.com acts sincere in dealing with their customers.
  - PI_2: I believe this online shop is honest to their customers.
  - PI_3: I believe Bol.com would keep its promise.
  - PI_4: I would characterize Bol.com as honest.
  - PI_5: Bol.com acts truthful in dealing with their customers.

• Perceived Safety (PS)
  - PS_1: I believe this online shop has sufficient technical capacity to ensure my data cannot be intercepted by hackers.
  - PS_2: I believe this online shop shows great concern for the security of any of the transactions.
  - PS_3: I think this online shop has mechanisms to ensure the safe transmission of my information.
  - PS_4: I believe to have a safe transaction when purchasing from Bol.com.
  - PS_5: Purchasing from this online shop is safe.

• Perceived Benevolence (PB)
  - PB_1: When problems occur, I believe this website will be prepared to solve my problems.
  - PB_2: In case of a problem, I believe it will be easy to report a complaint to this website.
  - PB_3: I believe, when required, Bol.com would do its best to offer help.
  - PB_4: In case of a problem, I believe this website will make all the necessary efforts to solve it.
  - PB_5: I believe this online shop keeps the well-being of the consumer needs in mind.

• Willingness to Buy (WTB)
  - WTB_1: The likelihood that I would shop at this online shop is high.
  - WTB_2: I would consider buying this product at this price.
  - WTB_3: I would be willing to recommend this online shop to friends.
  - WTB_4: I would be willing to buy at this online shop.
  - WTB_5: It is likely that I will purchase at this online shop.

• Demographics:
  - Gender: What is your gender?
      ◦ Male
      ◦ Female
  - Age: What is your age?
      ◦ Below 18 years old
      ◦ Between 18 and 25 years old
      ◦ Between 26 and 35 years old
      ◦ Between 36 and 45 years old
      ◦ Between 46 and 55 years old
      ◦ Above 55 years old
  - Income: What is your current yearly income?
      ◦ Less than $20,000
      ◦ Between $20,000 and $35,000
      ◦ Between $35,000 and $50,000
      ◦ Between $50,000 and $65,000
      ◦ More than $65,000
      ◦ I prefer not to say

Data Preparation:

It is now time to analyze our data in order to answer the business problem. From now on, you will be using the Spark technology in conjunction with the R programming language. I highly encourage you to use Remote Desktop Connection to complete this question. Make sure you set the version of R to 3.6.1. Then, run the following commands to install the required libraries:

install.packages("sparklyr")
library("sparklyr")   # spark_install() comes from sparklyr, so load the package first
spark_install(version = "2.0.2")

a) Start by downloading the data set bol.csv from Canvas. Next, run the following commands to load the data locally, connect to a Spark cluster, and send the survey data to the Spark cluster. [0 points]

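library("sparklyr")
library("dplyr")
# Read the survey locally (bol.csv must be in the working directory)
survey_data <- read.csv("bol.csv")
# Connect to a local Spark instance and copy the data into it as the table "survey"
sc <- spark_connect(master = "local", version = "2.0.2")
survey_tbl <- copy_to(sc, survey_data, "survey", overwrite = TRUE)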

Unless otherwise stated, all the following questions must be answered with code that is executed on the Spark cluster. You should expect to use functions from the R package dplyr in conjunction with Spark.

b) Note that the scales of PPR_1, PIR_5, and PER_2 differ from the scales of the other items in their dimensions (constructs). For example, the scale of PPR_1 is increasing in positivity, whereas the scales of PPR_2, PPR_3, PPR_4, and PPR_5 are decreasing in positivity. Hence, you have to transform the scales for the sake of consistency. The goal of this preprocessing step is to have all risk-related variables use scales of increasing negativity, and all trust-related variables use scales of increasing positivity. For PIR, the odd one out is PIR_5, which is already phrased negatively; the other four PIR items are phrased positively, so they are the ones to reverse. To do so, transform (mutate) the variables PPR_1, PIR_1, PIR_2, PIR_3, PIR_4, and PER_2 by subtracting their original values from 100; e.g., the new values of PPR_1 must be equal to 100 minus the old values. These transformations should change the data set in the Spark cluster. [10 points]
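A small base-R sketch of the reverse-coding arithmetic on a hypothetical two-respondent data frame (not the bol.com data). On the Spark table, the same expressions would instead go inside dplyr's mutate(), e.g. survey_tbl <- survey_tbl %>% mutate(PPR_1 = 100 - PPR_1, ...); the lapply() form below is meant for local data frames only.

```r
# Hypothetical toy responses on the 0-100 scale; the real data come from bol.csv.
survey <- data.frame(PPR_1 = c(80, 30), PIR_1 = c(90, 10), PIR_2 = c(70, 50),
                     PIR_3 = c(60, 40), PIR_4 = c(100, 0), PER_2 = c(75, 25))

# Items whose wording runs opposite to the intended direction of their construct.
to_reverse <- c("PPR_1", "PIR_1", "PIR_2", "PIR_3", "PIR_4", "PER_2")

# Reverse-code in place: a score x becomes 100 - x.
survey[to_reverse] <- lapply(survey[to_reverse], function(x) 100 - x)
survey
```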

c) After fixing the scales, it is now time to create our variables. Remember that we measured each risk and trust dimension using five question-items. Since the question-items are highly subjective, one should expect the respondents’ answers to contain some “random component”. A common approach to reduce some of this “randomness” is to average the values of the question-items within each dimension. In practice, one would have to perform reliability analysis and check for internal consistency before doing so (e.g., performing a confirmatory factor analysis and calculating Cronbach’s alpha), but this is beyond the scope of this assignment. Using the mutate function from dplyr, add the following features to the data set in the Spark cluster: [10 points]

PPR = (PPR_1 + PPR_2 + PPR_3 + PPR_4 + PPR_5)/5
PIR = (PIR_1 + PIR_2 + PIR_3 + PIR_4 + PIR_5)/5
PER = (PER_1 + PER_2 + PER_3 + PER_4 + PER_5)/5
PI = (PI_1 + PI_2 + PI_3 + PI_4 + PI_5)/5
PS = (PS_1 + PS_2 + PS_3 + PS_4 + PS_5)/5
PB = (PB_1 + PB_2 + PB_3 + PB_4 + PB_5)/5
WTB = (WTB_1 + WTB_2 + WTB_3 + WTB_4 + WTB_5)/5
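To check the averaging locally, the sketch below builds a small random data frame of hypothetical item values (not survey data) and adds one mean column per construct with rowMeans(). rowMeans() works on local data frames; for the Spark table, the explicit sum-divided-by-five expressions above are the form to pass to mutate(), since we do not expect rowMeans() to be translated to Spark SQL.

```r
set.seed(1)
prefixes   <- c("PPR", "PIR", "PER", "PI", "PS", "PB", "WTB")
item_names <- paste0(rep(prefixes, each = 5), "_", 1:5)   # PPR_1, ..., WTB_5

# Hypothetical 0-100 answers for three respondents.
survey <- as.data.frame(matrix(sample(0:100, 3 * length(item_names), replace = TRUE),
                               nrow = 3, dimnames = list(NULL, item_names)))

# One averaged feature per construct, e.g. PPR = mean of PPR_1 to PPR_5.
for (p in prefixes) {
  survey[[p]] <- rowMeans(survey[paste0(p, "_", 1:5)])
}
survey[prefixes]
```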

Data Modeling:

d) Next, you will build an explanatory model that relates the risk and trust dimensions to willingness-to-buy. To simplify the analysis, ignore the demographic variables in the data set. Using the ml_linear_regression function from the sparklyr package, build a linear regression model where the dependent variable is WTB and the independent variables are PPR, PIR, PER, PI, PS, and PB. Apply the summary function to your model to retrieve the coefficients and associated p-values. [10 points]
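sparklyr's ml_linear_regression() accepts an R model formula, e.g. ml_linear_regression(survey_tbl, WTB ~ PPR + PIR + PER + PI + PS + PB). Because running it requires a Spark connection, the sketch below uses base R's lm() as a local stand-in on synthetic, generated data (not survey responses), purely to show the same formula and how estimates and p-values are read off a coefficient table. The numbers it produces say nothing about bol.com.

```r
set.seed(42)
n <- 200
# Synthetic construct scores on a 0-100 scale (hypothetical, not bol.com data).
d <- data.frame(PPR = runif(n, 0, 100), PIR = runif(n, 0, 100),
                PER = runif(n, 0, 100), PI  = runif(n, 0, 100),
                PS  = runif(n, 0, 100), PB  = runif(n, 0, 100))
# Constructed so that only PER (negatively) and PI (positively) truly matter.
d$WTB <- 50 - 0.3 * (d$PER - 50) + 0.4 * (d$PI - 50) + rnorm(n, sd = 5)

fit   <- lm(WTB ~ PPR + PIR + PER + PI + PS + PB, data = d)
coefs <- summary(fit)$coefficients   # columns: Estimate, Std. Error, t value, Pr(>|t|)
round(coefs[, c("Estimate", "Pr(>|t|)")], 4)
```

A small p-value marks a coefficient that is unlikely to be zero; its sign tells whether the construct raises or lowers WTB. In the Spark version, the assignment relies on summary() of the fitted model for the same information.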

Conclusion:

e) Given the coefficients and p-values from above, which actions would you suggest that bol.com take to increase consumers’ willingness-to-buy? List and carefully explain at least three features that bol.com could add to its website to alleviate significant risk and trust perception issues, e.g., money-back guarantees to reduce perceived economic risk, online reviews to decrease perceived product risk, etc. (sloppy answers will receive zero points) [20 points]

