联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2023-05-06 09:31

k-Means and Mean-Shift Clustering

ECE4076/5176 Computer Vision

Lab 3 (Weeks 8,9)

Performance Expectations and Guidelines for Lab Completion

As we progress through the course, we expect you to expand your skills and toolset

to tackle more complex and involved problems. This lab has been designed to help

you achieve that by providing a comprehensive and engaging learning experience.

To ensure that you achieve your best possible results, we recommend that you

begin working on the lab well in advance of your lab sessions. Additionally, please

note that the computation times required for this lab may vary depending on the

efficiency of your code. Sometimes, running your code may take longer than 15

minutes if it is not optimized efficiently. Therefore, we encourage you to put in

your best efforts and work diligently to achieve the desired results.

1

Academic integrity

Every lab submission will be screened for any collusion and/or plagiarism. Breaches

of academic integrity will be investigated thoroughly and may result in a zero for the

assessment along with interviews with the plagiarism officers at Monash University.

Late submissions

Late submission of the lab will incur a penalty of 10% for each day late. That is,

with one day delay, the maximum mark you can get from the assignment is 7.2 out

of 8, so if you score 7.5, we will (sadly) give you 7.2. Labs submitted with more

than a week delay will not be assessed. Please apply for special consideration for

late submission as soon as possible (e.g., documented serious illness).

Lab Instructions and the Use of Generative AI

• You may not use any built-in OpenCV functions for this lab, except for those

used for loading/saving or processing (colour, size, ...) an image.

• You may use NumPy for array handling, and vectorizing your code (reducing

the number of for-loops) is encouraged.

• You should use Matplotlib to display images and any intermediate results.

• When completing the mean-shift algorithm task in this lab, you may use the

MeanShift() function from the scikit-learn (aka. sklearn) library.

• You may use a generative AI tool or coding assistance. We recommend understanding the concepts and coding as much as possible, as K-means and

GMMs are meta-algorithms. If you use generative AI tools, please indicate

the prompts you used and explain in detail any changes you made to modify

the code to complete the assignment.

• Please refrain from using Generative AI in preparing your report, as the

purpose of this lab is to assess your understanding of CV concepts.

Lab Grading

Each lab is worth 8%, and there are a number of sections and tasks with their own

weighting. A task is only considered complete if you can demonstrate a working

program and show an understanding of the underlying concepts. Note that later

tasks should reuse code from earlier tasks.

In this laboratory exercise, you will learn and employ clustering techniques to segment

and encode an RGB image with k number of colours. In all the tasks below, you must

not use the pre-written Python libraries for K-means Clustering; instead, you build your

own K-means code.

• Task 1: Distance Function

• Task 2: k-Means Clustering

2

• Task 3: k-Means++ Initialization

• Task 4: GMM: Analysing Data Distributions

• Task 5: Mean Shift - Data Modelling via its distribution

• Task 6: Comparison between k-Means, k-Means++ Initialization and Mean-Shift

References

• k-means clustering

• k-means++

• Mean shift

• Week 7 Lecture content

Resources

• Lab3 student template.ipynb - please complete the lab tasks in this template

notebook and use the markdown spaces provided to report your findings/

describe your approach.

• sharon.jpg is located in the lab3 folder

• Example outputs in the img folder that links to the notebook

Figure 1: Provided sharon image

3

Task 1: Distance Function

In this task, you will build a helper function to compute the squared distance from a set

of data points to a set of centroids. You are given a small dataset of five data points, X

and two centroids, M to test your function.

X =

0.67187976 0.44254368 0.17900127

0.55085456 0.65891464 0.18370379

0.79861987 0.3439561 0.68334744

0.36695437 0.15391793 0.81100023

0.22898267 0.58062367 0.5637733


0.66441854 0.08332493 0.54049661

0.05491067 0.94606233 0.29515262

You will also visualize whether your distances are correct by assigning the data points to

the closest centroid and displaying the results on a 3D scatter plot.

Task 2: k-Means Clustering and Visualisation

In this task, you will implement the k-Means Clustering and visualize the movement of

the centroids. First, you will write a function to generate k number of random centroids

as initialization. Then you will write a function that uses these centroids to perform

k-Means Clustering using the below steps:

1. Go through all the pixels in the image and calculate which of the k centroids it is

closest to.

2. For each pixel, store the index of the nearest centroids at the same pixel location

in an index image.

3. Re-compute each centroid by going through all the pixels in the RGB colour image

that were assigned to that centroid and taking the mean.

4. Repeat steps 1-3 for a pre-determined number of iterations.

Load the sharon.jpg image and use your function to cluster the pixels of the image into

four clusters (k = 4). Finally, you will display the results.

Run the program several times with different random seeds to see if you always converge

on the same answer. Try changing k from 4 to other numbers (from 2 to 10) and see how

this affects the output and if the results are reproducible.

After completing the k-means implementation, write code to visualize the clustering

process at every iteration. For that, first, you will write code to plot a 3D scatter plot of

data points along with the centroids and display the clustered image at once. Then you

will use that function inside your k-Means Clustering implementation to observe how the

centroids move with each iteration. Draw the RGB colour values from the sharon.jpg

image as well as the centroid RGB colour values as a 3D scatter plot. The colour of each

data point should reflect the centroid colour value that it is closest to.

Plotting all the data points is computationally expensive, so opt for only drawing randomly selected 250 data points. Also, make sure you add a time pause of 1-2 seconds

4

after each iteration so there is enough time for you to observe the convergence of centroids

clearly. Have a look at the template notebook for an example output of what it should

look like.

Task 3: k-Means++ Initialization

In this task, you will implement a variant of the k-Means Clustering algorithm called

the k-Means++ algorithm. In the k-Means++ algorithm, only the initialization of the

centroids will change. The rest of the implementation is the same as the conventional

K-Means Clustering with random initialization. Therefore, first, you will write a function

to generate centroids using the k-Means++ initialization by following the below steps:

1. Choose the first centroid randomly from the data points.

2. Compute the squared distance d

2 between every data point x and the selected

centroids so far. Record the minimum value of d

2 as D(x) for data point x.

3. Compute the probability of selecting each datapoint as the next centroid as:

P(x) = D(x)

P

x D(x)

(1)

4. Randomly draw a new centroid based on the computed probability distribution.

HINT: Compute an array that represents the cumulative distribution from the

probability values using the cumsum function. Generate a random number r

between 0 and 1. The new centroid corresponds to the first element value in the

cumulative distribution array that is greater than the value r.

5. Repeat steps 2-4 until k number of centroids have been chosen.

After having chosen k number of centroids, proceed with the standard k-Means algorithm

using the selected centroids. Load the sharon.jpg image and use your k-Means++

implementation to cluster the pixels of the image into four clusters (k = 4). Finally, you

will display the results.

Plotting all the data points is computationally expensive, so you may only draw randomly

selected 250 data points. Also, make sure you add a time pause of 1-2 seconds after each

iteration so there is enough time for you to observe the convergence of centroids clearly.

Task 4: GMM: Analysing Data Distributions

In this task, you will analyse a toy data set and understand its distribution via Gaussian

Mixture Models (GMMs). GMMs are probabilistic models that represent the probability

distribution of data as a weighted sum of multiple Gaussian distributions. Each Gaussian

distribution in the mixture is referred to as a component and is characterized by its

mean, covariance, and weight. GMMs are commonly used for tasks such as clustering

and density estimation, and can be applied to generate new samples, assign data points

to components, and estimate probability densities in the feature space.

A multivariate Gaussian distribution is defined as:

(5)

In both cases, h is the bandwidth of the kernel and x and z are two data samples (e.g.,

pixels of an image).

To apply the mean-shift (MS) algorithm, you will

1. Get an appreciation of the flat kernel density function by applying it to the toy

dataset from the previous task,

1Kernels are used to smooth out probability density functions (PDFs)

2. Move onto the Epanechnikov-kernel function (a popular kernel choice for the MS

algorithm),

3. Apply MS to the sharon.jpg and analyse how many colours arise from the image.

4. Compare the MS results to the ones obtained via k-means clustering.

Task 6: Comparison between k-Means, k-Means++

Initialization and Mean-Shift

In this task, you will visualize k-Means, k-Means++ initialization, and the mean-shift

algorithm by plotting the final clustering results of each method together with their

final cluster centers. For the two k-Means methods, you will additionally visualise their

initial centroids. Discuss the results and any noticeable differences that you may have

encountered between the different algorithms.


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp