Date: 2023-03-05 11:24


ISE 535 Data Mining Exam 1 Due on March 8 by 12:30 pm

For the following questions, use the data in the file cities1.xlsx. It contains data on 325 metropolitan cities in the United States.

Let column Metropolitan_Area be the row names of your dataframe.

Remove the non-numeric variables (Crime_Trend and Unemployment_Threat).

Use the scale() function to scale all numeric columns.

Use the dist() function to find the distances between cities (on the scaled data).
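The preparation steps above might be sketched as follows. Because cities1.xlsx is not reproduced here, the data frame is a made-up stand-in: the columns Median_Income and Population and all values are assumptions for illustration only. With the real file, one option would be to load it with readxl::read_excel() and convert the result with as.data.frame().

```r
# Hypothetical stand-in for cities1.xlsx so the sketch runs without the file.
set.seed(1)
cities <- data.frame(
  Metropolitan_Area   = paste("City", 1:12),
  Crime_Trend         = sample(c("Up", "Down"), 12, replace = TRUE),
  Unemployment_Threat = sample(c("Low", "High"), 12, replace = TRUE),
  Median_Income       = rnorm(12, mean = 50000, sd = 8000),    # made-up column
  Population          = rnorm(12, mean = 500000, sd = 150000), # made-up column
  stringsAsFactors    = FALSE
)

# Use the metropolitan area as row names, then drop that column.
rownames(cities) <- cities$Metropolitan_Area
cities$Metropolitan_Area <- NULL

# Remove the two non-numeric variables named in the instructions.
drop_cols  <- c("Crime_Trend", "Unemployment_Threat")
cities_num <- cities[, !(names(cities) %in% drop_cols)]

# Standardize each column to mean 0 and standard deviation 1.
cities_scaled <- scale(cities_num)

# Euclidean distances between cities, computed on the scaled data.
d <- dist(cities_scaled)
```

Keeping the un-scaled cities_num alongside cities_scaled is convenient later, since the cluster summaries are requested on the original values.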

K-MEANS CLUSTERING

1. (10 pts) Use set.seed(123) and the user function twcv to find TWCV values for k = 1:16. Use nstart = 25. Display the elbow chart.
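The handout does not define the user function twcv, so the version below is an assumption: it takes TWCV to be the total within-cluster sum of squares that kmeans() reports as tot.withinss. The data are random numbers standing in for the scaled cities data.

```r
# Made-up numeric data standing in for the scaled cities data (assumption).
set.seed(1)
x <- scale(matrix(rnorm(60 * 4), ncol = 4))

# Hypothetical twcv(): total within-cluster variation for k clusters,
# taken here as the tot.withinss component returned by kmeans().
twcv <- function(k, data) {
  kmeans(data, centers = k, nstart = 25)$tot.withinss
}

set.seed(123)
k_values    <- 1:16
twcv_values <- sapply(k_values, twcv, data = x)

# Elbow chart: look for the k after which the curve flattens out.
plot(k_values, twcv_values, type = "b",
     xlab = "Number of clusters k", ylab = "TWCV")
```

If the course-provided twcv has a different signature, only the sapply() call would need adjusting.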

2. (10 pts) The best number of clusters is the smallest k for which the cluster plot shows the least overlap between clusters. Use fviz_cluster() with argument geom = "point" to display cluster plots with no label names. Try fviz_cluster() with different values of K. What is the best K? For this K, find the number of cities in each cluster.
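A minimal sketch of fitting one k-means solution, counting cluster members, and drawing a label-free cluster plot. The data are random stand-ins and k = 3 is an arbitrary illustrative value, not a suggested answer. The fviz_cluster() call comes from the factoextra package and is guarded so the sketch still runs where that package is not installed.

```r
# Made-up scaled data with row names, standing in for the cities data.
set.seed(1)
x <- scale(matrix(rnorm(60 * 4), ncol = 4,
                  dimnames = list(paste("City", 1:60), paste0("V", 1:4))))

set.seed(123)
k  <- 3  # arbitrary value for illustration only
km <- kmeans(x, centers = k, nstart = 25)

# Number of observations in each cluster.
sizes <- table(km$cluster)
print(sizes)

# Cluster plot on the first two principal components, points only (no labels).
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_cluster(km, data = x, geom = "point"))
}
```

Repeating the fit and the plot for several values of k makes it possible to compare how much the cluster regions overlap.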

3. (10 pts) For your choice of K clusters, find the median (or mean, if you prefer) of each numerical column (on the original un-scaled dataset). Write one sentence characterizing each cluster.
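One way to get per-cluster medians on the un-scaled values is aggregate(), grouping the original columns by the cluster labels obtained from the scaled data. The data frame and its column names below are made up for illustration, and k = 3 is again arbitrary.

```r
# Made-up un-scaled data standing in for the numeric cities columns.
set.seed(1)
raw <- data.frame(
  Median_Income = rnorm(60, mean = 50000, sd = 8000),    # hypothetical column
  Population    = rnorm(60, mean = 500000, sd = 150000), # hypothetical column
  row.names     = paste("City", 1:60)
)

# Cluster on the scaled data, as the instructions require.
set.seed(123)
km <- kmeans(scale(raw), centers = 3, nstart = 25)

# Median of each original (un-scaled) column within each cluster.
cluster_medians <- aggregate(raw, by = list(cluster = km$cluster), FUN = median)
print(cluster_medians)
```

Replacing FUN = median with FUN = mean gives cluster means instead.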

HIERARCHICAL CLUSTERING

4. (20 pts) Use the function hclust with linkage ward.D to create object h1 and display the four clusters on the dendrogram. Use the function cutree() to find the clusters. Find the number of cities in each cluster. Use fviz_cluster() with argument geom = "point" to display the cluster plots of your choice with no label names. Find the CCPC for ward.D linkage.
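The hierarchical workflow in this question might look like the sketch below, run on random stand-in data. The handout does not expand CCPC; the code assumes it means the cophenetic correlation coefficient, that is, the correlation between the original distances and the cophenetic distances implied by the dendrogram. The fviz_cluster() call is from factoextra and is guarded.

```r
# Made-up scaled data standing in for the cities data.
set.seed(1)
x <- scale(matrix(rnorm(60 * 4), ncol = 4,
                  dimnames = list(paste("City", 1:60), paste0("V", 1:4))))
d <- dist(x)

# Agglomerative clustering with Ward linkage.
h1 <- hclust(d, method = "ward.D")

# Dendrogram with the four clusters outlined.
plot(h1, labels = FALSE, hang = -1, main = "ward.D linkage")
rect.hclust(h1, k = 4, border = 2:5)

# Cluster membership and the number of observations per cluster.
clusters <- cutree(h1, k = 4)
print(table(clusters))

# Assumed meaning of CCPC: cophenetic correlation coefficient.
ccpc <- cor(d, cophenetic(h1))
print(ccpc)

# Label-free cluster plot; fviz_cluster() accepts a list of data and labels.
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_cluster(list(data = x, cluster = clusters),
                                 geom = "point"))
}
```

The same steps apply to other linkages by changing the method argument of hclust().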

5. (20 pts) Use the function hclust with linkage complete to create object h2 and display the four clusters on the dendrogram. Use the function cutree() to find the clusters. Find the number of cities in each cluster. Use fviz_cluster() with argument geom = "point" to display the cluster plots of your choice with no label names. Find the CCPC for complete linkage.

6. (20 pts) Use the function hclust with linkage average to create object h3 and display the four clusters on the dendrogram. Use the function cutree() to find the clusters. Find the number of cities in each cluster. Use fviz_cluster() with argument geom = "point" to display the cluster plots of your choice with no label names. Find the CCPC for average linkage.

7. (10 pts) Which linkage do you prefer? For the clusters found with this linkage, find the median (or mean, if you prefer) of each numerical column (on the original un-scaled dataset). Write one sentence characterizing each cluster for this linkage.
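To support a choice between linkages, the three methods can be compared side by side. The sketch below does this on random stand-in data, again assuming that CCPC means the cophenetic correlation coefficient. It produces numbers to compare and does not decide which linkage is preferable.

```r
# Made-up scaled data standing in for the cities data.
set.seed(1)
x <- scale(matrix(rnorm(60 * 4), ncol = 4))
d <- dist(x)

methods <- c("ward.D", "complete", "average")

# Cophenetic correlation (assumed meaning of CCPC) for each linkage.
ccpc_by_method <- sapply(methods, function(m) {
  cor(d, cophenetic(hclust(d, method = m)))
})
print(round(ccpc_by_method, 3))

# Cluster sizes for a four-cluster cut under each linkage.
sizes_by_method <- lapply(methods, function(m) {
  table(cutree(hclust(d, method = m), k = 4))
})
names(sizes_by_method) <- methods
print(sizes_by_method)
```

A higher cophenetic correlation indicates that the dendrogram preserves the original pairwise distances more faithfully. Very unbalanced cluster sizes can be another signal worth weighing.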

Submit your report (code and output) as a pdf file onto Blackboard (no screen captures). Read your pdf file before submitting.

