MA4601/MAT061 Stochastic Search and Optimisation
Assignment 2: Clustering with Cross-Entropy
Due 9:00am Tuesday 3 March, 2020
The task for assignment 2 is to use the cross-entropy method to do clustering. You will need
to submit two files: a programme file titled YOUR NAME programme.r (or .py, .jl, etc.) and a report
as a pdf file titled YOUR NAME report.pdf. Submission by email to joneso18@cardiff.ac.uk.
Reference: De Boer, Kroese, Mannor, Rubinstein (2005) A tutorial on the Cross-Entropy
method. Annals of Operations Research 134, pp. 19–67.
Clustering
We are given a set of points X ⊂ R
d and are told to find centroids c1, . . . , ck ∈ R
d and a
partition X = X1 ∪ · · · ∪ Xk in order to minimise.
It can be shown that the centroids have to have the form,
so we can simplify the problem to finding the optimal partition. (If we use a distance other
than Euclidean then the centroids may not have this form.)
Solution Space
W.l.o.g. suppose that X = {x1, . . . , xn} and represent a partition into k subsets by a vector
i = (i1, . . . , in) ∈ {1, . . . , k}n, where ih = j when xh ∈ Xj . The problem is thus to minimise
S(i) = X.
Our family of probabilities on Ω = {1, . . . , k}
n
is indexed by matrices P = (pij )i=i,...,n;j=1,...,k,
where
P(I = i) = Ynh=1
ph,ih.
P is constrained to be non-negative, with row-sums equal to 1.
1
Test Problem
Once you have your code working, apply it to the data set jain.txt, available on Learning
Central (taken from Jain & Law (2005) Data Clustering: A Users Dilemma). Do it for k =
2, 3, 4, . . .
The report should describe the choices made in implementing the CE method and why they
were made. It should be presented as a stand-alone document that can be understood without
having to read your code. It should be no more than four pages long. When implementing the
CE method you should consider the following aspects:
• The sample size.
• The selection quantile.
• The smoothing parameter.
Marks will be allocated on the following basis:
50% Code correctness (does it actually work).
20% Quality of solution (how good is the best solution found).
20% Reasoning behind choices for meta-parameters.
10% Clarity of report.
2
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。