
CLUSTERING COMPONENT OF BUDGETING FROM DISTRICT GOVERNMENT IN INDONESIA


MUHAMAD SAKA SOTYASAKSI / 4SK4 / 14.8247@STIS.AC.ID

BACKGROUND

The widely adopted clustering algorithm uses a sum-of-squared-error
objective function. A detailed analysis shows the close relationship between
hierarchical clustering and principal component analysis (PCA), which is
extensively utilized in unsupervised dimension reduction. Using HCPC as
implemented in R, we tested a dataset provided by the World Bank covering
sub-national government output programs realized in 2012.

DATA

The Indonesia Consolidated Fiscal dataset (COFIS) contains data on
expenditure from the central and sub-national (provinces, districts)
governments. The data comes from publicly available data sources
managed by the Government of Indonesia (GoI) and, unless indicated
otherwise, is audited realized expenditure data.

The dataset contains 511+34 cases and 77 variables. After data
manipulation (eliminating missing values, standardization, and removing
duplicates), only 28 cases and 65 variables remain, and these are the ones
later used in the analysis.

RESULT & CONCLUSION

Figure 3.1: Percentage of explained variance (scree plot; x-axis: Dimensions, y-axis: Percentage of explained variances)
Figure 3.2: Dendrogram based on PCA (cluster dendrogram; y-axis: Height)
Figure 3.3: Cluster plot

From Figure 3.1 we can conclude that the first 4 principal components
explain at least 75% of the variance in the dataset.

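The way a scree plot is read to pick the number of components can be sketched in a few lines. This is only a minimal illustration, not the analysis behind the poster: the eigenvalues are hypothetical placeholders (the poster does not report its own), and the function name `components_needed` is an assumption introduced here.

```python
def components_needed(eigenvalues, threshold):
    """Return (k, share): the smallest number k of leading components whose
    cumulative share of total variance reaches `threshold` (a fraction)."""
    ordered = sorted(eigenvalues, reverse=True)  # largest variance first
    total = sum(ordered)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        share = cumulative / total
        if share >= threshold:
            return k, share
    # Only reached if rounding keeps the share just below a threshold of 1.0.
    return len(ordered), 1.0


# Hypothetical eigenvalues, chosen only to make the arithmetic easy to follow.
example_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.25, 0.25]
k, share = components_needed(example_eigenvalues, 0.75)
print(f"{k} components explain {share:.0%} of the variance")
```

With the real PCA eigenvalues in place of the placeholder list, the same call with a threshold of 0.75 would reproduce the kind of statement made from the scree plot.
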
From Figure 3.2 we know which district belongs to which cluster. The 4th
cluster has only 1 member, the 3rd cluster has 6 members, the 2nd cluster
has 2 members, and the remaining districts belong to the first cluster.

Based on the result, districts in cluster 1 are characterized by their
coordinates on axes 1 and 2. Districts in cluster 2 are characterized by
their coordinates on the second axis, and districts in the third cluster by
their coordinates on the first axis. Districts in the fourth cluster are
characterized by their coordinates on axes 1 and 4. Here, a dimension is
kept only when the v-test is higher than 2.

METHOD

Clustering is one of the important data mining methods for discovering
knowledge in multivariate data sets. The goal is to identify groups (i.e.
clusters) of similar objects within a data set of interest. Briefly, the
two most common clustering strategies are:
• Hierarchical clustering.
  Used for identifying groups of similar observations in a data set.
• Partitioning clustering, such as the k-means algorithm.
  Used for splitting a data set into several groups.
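
The two strategies above can be sketched on a toy data set. This is a minimal, standard-library illustration under stated assumptions: the six 2-D points are hypothetical, Ward's merge criterion is one possible choice of linkage (the text does not name one), and the k-means starting centers are picked by hand rather than at random.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def ward_clusters(points, k):
    """Hierarchical (agglomerative) clustering: start from singletons and keep
    merging the pair whose merge adds the least within-cluster sum of squares
    (Ward's criterion) until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                weight = len(a) * len(b) / (len(a) + len(b))
                cost = weight * sq_dist(centroid(a), centroid(b))
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def kmeans(points, centers, max_iter=100):
    """Partitioning clustering (Lloyd's k-means) from the given start centers."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Keep the old center if a group ends up empty.
        new_centers = [centroid(g) if g else centers[i] for i, g in enumerate(groups)]
        if new_centers == centers:  # assignments have stabilised
            break
        centers = new_centers
    return groups


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
hierarchical = ward_clusters(points, k=2)
partitioned = kmeans(points, centers=[points[0], points[3]])
print(hierarchical)
print(partitioned)
```

On data this cleanly separated both strategies return the same two groups; on real data they can disagree, and the hierarchical version additionally records the merge order that a dendrogram displays.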

The HCPC (Hierarchical Clustering on Principal Components) approach allows
us to combine the three standard methods used in multivariate data analyses
(Husson, Josse, and Pagès 2010):
• Principal component methods (PCA, CA, MCA, FAMD, MFA),
• Hierarchical clustering, and
• Partitioning clustering, particularly the k-means method.

REFERENCE

Ding, Chris, and Xiaofeng He. 2004. "Principal Component Analysis and Effective K-Means Clustering." Proceedings of the 2004 SIAM International Conference on Data Mining. Society for Industrial and Applied Mathematics.

Husson, François, J. Josse, and J. Pagès. 2010. "Principal Component Methods - Hierarchical Clustering - Partitional Clustering: Why Would We Need to Choose for Visualizing Data?" Unpublished Data. http://www.sthda.com/english/upload/hcpc_husson_josse.pdf.
