Universiteti i Prishtinës
Problem Description
An unclassified data set contains hidden data structures. The task in this
assignment is to perform a comprehensive cluster analysis in order to reveal
these structures and groups of similar data.
The archive contains the unlabeled data set, test.txt, and a set of initial
centroids, centroids.txt. Both files have the following format:
[attribute1_value <space> attribute2_value <space> ... <space>
attribute90_value].
The unlabeled data set includes 350 samples and the initial centroids set consists
of 15 samples. Data instances in both files have 90 attributes.
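The file format described above can be read with a few lines of NumPy. This is a minimal sketch, assuming that each line of a file holds one sample as plain space-separated numbers and that the square brackets in the format description are notation rather than literal characters in the files. The helper name load_matrix is hypothetical.

```python
import numpy as np


def load_matrix(path, n_attributes=90):
    """Load a whitespace-separated numeric file into a 2-D float array.

    Assumes one sample per line; raises ValueError if the number of
    columns differs from n_attributes.
    """
    # ndmin=2 keeps the result two-dimensional even for a one-line file.
    data = np.loadtxt(path, ndmin=2)
    if data.shape[1] != n_attributes:
        raise ValueError(
            f"{path}: expected {n_attributes} attributes, "
            f"found {data.shape[1]}"
        )
    return data
```

If the files match the description, load_matrix("test.txt") should return an array of shape (350, 90) and load_matrix("centroids.txt") one of shape (15, 90).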
Finally, prepare an academic report and deliver it together with the source code
and any additional material that you used during your work.
Tasks:
3. Perform clustering of the unlabeled data set. You may use the provided initial
centroids set or generate your own. The following stopping criteria may be
used:
3.1 Maximal number of iterations: 100
3.2 Clusters are consistent (no changes in the group matrix or centroids in the
current iteration, which means that the clusters are balanced).
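The two stopping criteria can be illustrated with a short k-means loop. The sketch below assumes the standard Lloyd iteration with squared Euclidean distance, which the assignment does not prescribe explicitly. The function name kmeans and its return values are hypothetical. An empty cluster simply keeps its previous centroid here.

```python
import numpy as np


def kmeans(X, init_centroids, max_iter=100):
    """Lloyd-style k-means that stops on criterion 3.1 or 3.2."""
    X = np.asarray(X, dtype=float)
    centroids = np.array(init_centroids, dtype=float)  # copy of the input
    labels = None
    iteration = 0
    for iteration in range(1, max_iter + 1):  # criterion 3.1
        # Squared Euclidean distance of every sample to every centroid,
        # giving an array of shape (n_samples, n_clusters).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)

        # Recompute each centroid as the mean of its members; an empty
        # cluster keeps its previous centroid (an assumption of this sketch).
        new_centroids = centroids.copy()
        for k in range(len(centroids)):
            members = X[new_labels == k]
            if len(members) > 0:
                new_centroids[k] = members.mean(axis=0)

        # Criterion 3.2: no change in the assignments or in the centroids.
        same_groups = labels is not None and np.array_equal(new_labels, labels)
        same_centroids = np.allclose(new_centroids, centroids)
        labels, centroids = new_labels, new_centroids
        if same_groups or same_centroids:
            break
    return labels, centroids, iteration
```

The returned iteration count shows which criterion ended the run: a value below max_iter means the clusters stopped changing before the iteration limit was reached.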
4. Cluster analysis can also be stated more formally as an optimization
procedure that tries to minimize the Residual Sum of Squares (RSS) objective
function:
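A common textbook form of this objective is given here as a sketch. The notation is an assumption and is not taken from the assignment: K is the number of clusters, \omega_k is the set of samples assigned to cluster k, and \vec{\mu}_k is the centroid of that cluster.

```latex
\mathrm{RSS} = \sum_{k=1}^{K} \sum_{\vec{x} \in \omega_k}
               \lVert \vec{x} - \vec{\mu}_k \rVert^{2},
\qquad
\vec{\mu}_k = \frac{1}{|\omega_k|} \sum_{\vec{x} \in \omega_k} \vec{x}
```

Each k-means iteration, reassigning samples and then recomputing centroids, cannot increase this quantity, which is why the procedure converges to a local minimum.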
5.1 What is the optimal number of clusters K for the given data set?
5.2 Did you get any empty clusters? What is a possible solution to this
problem?
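As an illustration for question 5.2, one common remedy for an empty cluster is to move its centroid onto the sample that lies farthest from its own centroid. The sketch below shows only this one option; others exist, such as dropping the cluster or re-drawing its centroid at random. The function name reseed_empty_clusters is hypothetical, and the sketch assumes there are at least as many samples as clusters.

```python
import numpy as np


def reseed_empty_clusters(X, labels, centroids):
    """Give every empty cluster the sample farthest from its own centroid."""
    X = np.asarray(X, dtype=float)
    labels = np.array(labels)                     # copy, input is untouched
    centroids = np.array(centroids, dtype=float)  # copy, input is untouched
    counts = np.bincount(labels, minlength=len(centroids))
    for k in np.flatnonzero(counts == 0):
        # Squared distance of every sample to its currently assigned centroid.
        d = ((X - centroids[labels]) ** 2).sum(axis=1)
        # Never take the only member of a cluster, so that repairing one
        # empty cluster does not create another.
        d[counts[labels] <= 1] = -1.0
        i = int(d.argmax())
        counts[labels[i]] -= 1
        centroids[k] = X[i]
        labels[i] = k
        counts[k] = 1
    return labels, centroids
```

The centroids of the clusters that gave up a sample are not recomputed here; the next k-means iteration would update them.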