AD699: Data Mining for Business Analytics
Fall 2018
Homework #5
Topic: Clustering
Due by 11:59 p.m. on Monday, 03DEC
Task: k-means clustering
The dataset Cereals.csv contains nutritional information, store display, and consumer ratings for 77 breakfast
cereals. Descriptions of the variables can be found in a text file that accompanies this assignment prompt.
I. Read this dataset into your R environment. Show the steps that you used to accomplish this.
II. Remove all cereals with missing values. Show the steps that you used to accomplish this.
III. Should this data be normalized? Why or why not? If so, normalize your data, and show the steps that
you took in order to make this happen.
IV. Use the k-means algorithm to separate the breakfast cereals into clusters. To determine the optimal
number of clusters, consider using an elbow chart or another method of your choice. (Figure 15.6 in
our textbook shows an elbow chart. The textbook does not provide template code, but a quick online
search will yield sample code for one.)
V. The local elementary school has asked that you identify the healthiest cluster from among the clusters
that you’ve found. Which cluster will you select, and why? Which cereals are in this cluster?
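One possible way to approach steps I through IV is sketched below in R. It is a starting point under stated assumptions, not a model answer. It assumes Cereals.csv is in the working directory, treats every numeric column as a clustering variable (you may want to exclude some after reading the variable descriptions), and uses a placeholder value of k that should be replaced after reading the elbow chart. The helper names prepare_cereals and elbow_wss are made up for illustration.

```r
# Sketch only. Helper names and the seed are illustrative assumptions.

# Steps II and III: drop incomplete rows, then z-score the numeric columns.
prepare_cereals <- function(df) {
  df <- na.omit(df)                                  # remove rows with any NA
  num <- df[, sapply(df, is.numeric), drop = FALSE]  # keep numeric columns only
  list(data = df, scaled = scale(num))               # mean 0, sd 1 per column
}

# Step IV helper: total within-cluster sum of squares for k = 1..k_max.
elbow_wss <- function(scaled, k_max = 10, seed = 699) {
  set.seed(seed)  # k-means starts from random centers, so fix the seed
  sapply(1:k_max, function(k) {
    kmeans(scaled, centers = k, nstart = 25)$tot.withinss
  })
}

# Step I onward: runs only if the file is actually present.
if (file.exists("Cereals.csv")) {
  cereals <- read.csv("Cereals.csv")
  prep <- prepare_cereals(cereals)

  # Elbow chart: look for the k where the curve stops dropping sharply.
  wss <- elbow_wss(prep$scaled)
  plot(seq_along(wss), wss, type = "b",
       xlab = "Number of clusters k",
       ylab = "Total within-cluster sum of squares")

  k <- 4  # placeholder assumption; choose k from your own elbow chart
  set.seed(699)
  fit <- kmeans(prep$scaled, centers = k, nstart = 25)

  # Per-cluster means on the original scale, useful for comparing clusters.
  print(aggregate(prep$data[, colnames(prep$scaled)],
                  by = list(cluster = fit$cluster), FUN = mean))
}
```

The per-cluster means in the last step are on the original (unscaled) units, which makes it easier to compare clusters on the nutritional variables when answering step V.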