联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2020-03-02 10:11

MA4601/MAT061 Stochastic Search and Optimisation

Assignment 2: Clustering with Cross-Entropy

Due 9:00am Tuesday 3 March, 2020

The task for assignment 2 is to use the cross-entropy method to do clustering. You will need

to submit two files: a programme file titled YOUR NAME programme.r (or .py, .jl, etc.) and a report

as a pdf file titled YOUR NAME report.pdf. Submission by email to joneso18@cardiff.ac.uk.

Reference: De Boer, Kroese, Mannor, Rubinstein (2005) A tutorial on the Cross-Entropy

method. Annals of Operations Research 134, pp. 19–67.

Clustering

We are given a set of points X ⊂ R

d and are told to find centroids c1, . . . , ck ∈ R

d and a

partition X = X1 ∪ · · · ∪ Xk in order to minimise.

It can be shown that the centroids have to have the form,

so we can simplify the problem to finding the optimal partition. (If we use a distance other

than Euclidean then the centroids may not have this form.)

Solution Space

W.l.o.g. suppose that X = {x1, . . . , xn} and represent a partition into k subsets by a vector

i = (i1, . . . , in) ∈ {1, . . . , k}n, where ih = j when xh ∈ Xj . The problem is thus to minimise

S(i) = X.

Our family of probabilities on Ω = {1, . . . , k}

n

is indexed by matrices P = (pij )i=i,...,n;j=1,...,k,

where

P(I = i) = Ynh=1

ph,ih.

P is constrained to be non-negative, with row-sums equal to 1.

1

Test Problem

Once you have your code working, apply it to the data set jain.txt, available on Learning

Central (taken from Jain & Law (2005) Data Clustering: A Users Dilemma). Do it for k =

2, 3, 4, . . .

The report should describe the choices made in implementing the CE method and why they

were made. It should be presented as a stand-alone document that can be understood without

having to read your code. It should be no more than four pages long. When implementing the

CE method you should consider the following aspects:

• The sample size.

• The selection quantile.

• The smoothing parameter.

Marks will be allocated on the following basis:

50% Code correctness (does it actually work).

20% Quality of solution (how good is the best solution found).

20% Reasoning behind choices for meta-parameters.

10% Clarity of report.

2


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp