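data: Programming Assignment Ghostwriting for International Students, Java, CS, and Python Programming Languages - Java Programming Ghostwriting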


Date: 2021-01-29 11:43

3. In this problem, you are required to use the spark.ml API. As in Problem 2, consider three objects:

(1) The first object, denoted by OA, is a ball centered at (0, 0, 0) of radius 1. As a set of points, we write OA = {(x, y, z) | x^2 + y^2 + z^2 ≤ 1}.

(2) The second object, denoted by OB, is a cylinder defined by OB = {(x, y, z) | x^2 + y^2 ≤ 4, 2 ≤ z ≤ 4}.

(3) The third object, denoted by OC, is an ellipsoid

OC = {(x, y, z) | (x − 2)^2 …

Note that OA and OC overlap slightly.
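The set definitions of OA and OB translate directly into membership tests. The following is a minimal sketch in Python; the function names `in_OA` and `in_OB` are illustrative and not part of the assignment. OC is left out because only the beginning of its definition is given above.

```python
def in_OA(x: float, y: float, z: float) -> bool:
    """True if (x, y, z) lies in the ball of radius 1 centered at the origin."""
    return x * x + y * y + z * z <= 1.0


def in_OB(x: float, y: float, z: float) -> bool:
    """True if (x, y, z) lies in the cylinder x^2 + y^2 <= 4, 2 <= z <= 4."""
    return x * x + y * y <= 4.0 and 2.0 <= z <= 4.0
```

Every point of OA has z ≤ 1 and every point of OB has z ≥ 2, so these two objects are disjoint and no point can pass both tests.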

Create a dataset in the following way:

(1) Each record in the dataset corresponds to a point in the union of OA, OB and OC. It has a “features” part, made up of the xyz coordinates of that point, and a “label” part, which tells which of OA, OB or OC contains the point. Since OA ∩ OC is nonempty, a point that happens to lie in OA ∩ OC must still be labeled as either OA or OC, not both.

(2) The dataset you create should contain at least 500000 records. Generate the records randomly as follows:

i. Each time, choose OA, OB or OC at random. Suppose we choose OX (where X is A, B or C).

ii. Randomly create a point P contained in OX (think about how to do this). The features of the newly created record are the coordinates of P, and the corresponding label is “OX”.

iii. After creating all the records, load the dataset and transform it into a Spark DataFrame.
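One possible way to carry out steps i to iii is sketched below in plain Python. It assumes uniform sampling is wanted (the assignment only says "randomly"): rejection sampling for the ball, a square-root radius for the cylinder's disc, and a scaled and shifted ball sample for an axis-aligned ellipsoid. All function names are illustrative. Because the definition of OC is incomplete above, its center and semi-axes are passed in as parameters `oc_center` and `oc_semi_axes`, which are hypothetical names and must be filled in from the full problem statement.

```python
import math
import random


def sample_unit_ball(rng: random.Random) -> tuple:
    """Uniform point in the unit ball, by rejection sampling from [-1, 1]^3."""
    while True:
        x, y, z = (rng.uniform(-1.0, 1.0) for _ in range(3))
        if x * x + y * y + z * z <= 1.0:
            return (x, y, z)


def sample_cylinder(rng, radius=2.0, z_min=2.0, z_max=4.0) -> tuple:
    """Uniform point in the cylinder x^2 + y^2 <= radius^2, z_min <= z <= z_max."""
    r = radius * math.sqrt(rng.random())  # sqrt keeps the disc sample area-uniform
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return (r * math.cos(theta), r * math.sin(theta), rng.uniform(z_min, z_max))


def sample_ellipsoid(rng, center, semi_axes) -> tuple:
    """Uniform point in an axis-aligned ellipsoid: scale and shift a ball sample."""
    u = sample_unit_ball(rng)
    return tuple(c + a * ui for c, a, ui in zip(center, semi_axes, u))


def make_records(n, oc_center, oc_semi_axes, seed=0) -> list:
    """Return n records (label, x, y, z); the label is the object that was sampled."""
    rng = random.Random(seed)
    samplers = {
        "OA": lambda: sample_unit_ball(rng),
        "OB": lambda: sample_cylinder(rng),
        # Assumption: OC's parameters are supplied by the caller.
        "OC": lambda: sample_ellipsoid(rng, oc_center, oc_semi_axes),
    }
    labels = list(samplers)
    records = []
    for _ in range(n):
        label = rng.choice(labels)  # step i: pick OX at random
        records.append((label, *samplers[label]()))  # step ii: point P in OX
    return records


def to_spark_dataframe(spark, records):
    """Step iii: load the records into a Spark DataFrame (spark is a SparkSession)."""
    return spark.createDataFrame(records, schema=["label", "x", "y", "z"])
```

Labeling each point by the object it was drawn from also settles the OA ∩ OC case: a point in the overlap gets exactly one label. For 500000 records a pure Python loop is slow but workable; generating the points inside Spark would be an alternative design.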

You are required to do the following work.

(1) Perform classification using both logistic regression and a decision tree classifier. Try several different training/test split ratios on your dataset and, for each trained model, evaluate the model and report its accuracy on the test set.

(2) Use K-means clustering to perform cluster analysis on your data. Here only the “features” part of your data matters. Set the number of clusters K to 2, 3 and 4 in turn and compare the results. Show the locations of the centroids in each case.

(3) Provide a visualization of the results of your classifications and cluster analysis.
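For the classification task in item (1), a minimal spark.ml sketch might look like the following. It assumes a DataFrame `df` with a string column `label` and numeric columns `x`, `y`, `z`; those column names, the function names, and the split fractions in `TRAIN_FRACTIONS` are illustrative choices, not requirements of the assignment.

```python
TRAIN_FRACTIONS = (0.6, 0.7, 0.8)  # assumed example split ratios


def evaluate_classifiers(df, train_fraction, seed=42):
    """Train both classifiers on one split and return {model name: test accuracy}."""
    # Imported here so the sketch can be loaded even where Spark is not installed.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import DecisionTreeClassifier, LogisticRegression
    from pyspark.ml.evaluation import MulticlassClassificationEvaluator
    from pyspark.ml.feature import StringIndexer, VectorAssembler

    train, test = df.randomSplit([train_fraction, 1.0 - train_fraction], seed=seed)
    # spark.ml classifiers expect a numeric label and a single vector column.
    indexer = StringIndexer(inputCol="label", outputCol="label_idx")
    assembler = VectorAssembler(inputCols=["x", "y", "z"], outputCol="features")
    evaluator = MulticlassClassificationEvaluator(
        labelCol="label_idx", predictionCol="prediction", metricName="accuracy"
    )
    classifiers = {
        "logistic_regression": LogisticRegression(
            labelCol="label_idx", featuresCol="features"
        ),
        "decision_tree": DecisionTreeClassifier(
            labelCol="label_idx", featuresCol="features"
        ),
    }
    accuracies = {}
    for name, clf in classifiers.items():
        model = Pipeline(stages=[indexer, assembler, clf]).fit(train)
        accuracies[name] = evaluator.evaluate(model.transform(test))
    return accuracies


def format_report(results):
    """Turn {train_fraction: {model name: accuracy}} into printable lines."""
    return [
        f"train={frac:.1f} test={1 - frac:.1f} {name}: accuracy={acc:.4f}"
        for frac, scores in sorted(results.items())
        for name, acc in sorted(scores.items())
    ]
```

A typical call would be `format_report({f: evaluate_classifiers(df, f) for f in TRAIN_FRACTIONS})`, which gives one accuracy line per split ratio and model.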

In your report, provide both your code and a demonstration of your results. Take screenshots whenever necessary.
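For the K-means task in item (2), the centroids for each K can be collected along the following lines. This is a sketch under the same assumptions as before: a DataFrame `df` with numeric columns `x`, `y`, `z`, and illustrative function names. The label column is ignored, since only the features matter here.

```python
def kmeans_centroids(df, ks=(2, 3, 4), seed=1):
    """Fit spark.ml KMeans for each K and return {K: list of centroid tuples}."""
    # Imported here so the sketch can be loaded even where Spark is not installed.
    from pyspark.ml.clustering import KMeans
    from pyspark.ml.feature import VectorAssembler

    assembler = VectorAssembler(inputCols=["x", "y", "z"], outputCol="features")
    features = assembler.transform(df)
    centroids = {}
    for k in ks:
        model = KMeans(k=k, seed=seed, featuresCol="features").fit(features)
        # clusterCenters() returns one array per cluster; convert to plain tuples.
        centroids[k] = [tuple(float(v) for v in c) for c in model.clusterCenters()]
    return centroids


def format_centroids(centroids):
    """Turn {K: [(x, y, z), ...]} into one printable line per K."""
    lines = []
    for k, centers in sorted(centroids.items()):
        points = ", ".join(
            "(" + ", ".join(f"{v:.3f}" for v in c) + ")" for c in centers
        )
        lines.append(f"K={k}: {points}")
    return lines
```

Printing `format_centroids(kmeans_centroids(df))` lists the centroid locations for K = 2, 3 and 4 side by side, which makes the comparison between the three cases easier to read.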
