
Running head: CLUSTER ANALYSIS 1

CLUSTER ANALYSIS

Hal Hagood

u06a1

“Clustering or cluster analysis is a generic name for a group of related techniques (such as

unsupervised pattern recognition, unsupervised classification analysis, numerical taxonomy, typology

constructions, Q-analysis, and so on) that automatically try to find natural groupings in the data. One

crucial difference between clustering and a typical classification model is the absence of any target

variable (where classes or groups are known a priori) in the data. In the context of textual data, this

means that no labeled training examples are needed before documents can be clustered into groups.

This is why clustering is often referred to as unsupervised classification.

As a conceptual activity, the assignment of objects into groups is something humans do routinely

all through their lives to reduce the complexity of the environment that they have to work with. The natural

grouping of objects and observations is extremely important to many disciplines (such as statistics,

psychology, sociology, biology, engineering, economics, and business). Each of these disciplines, in turn,

has used its own label to describe cluster analysis. Although the names might differ across disciplines, all

disciplines share the fundamental concept of separating data suggested by the natural groupings in the

data. In essence, cluster analysis attempts to group objects so that each object in a cluster is similar to

the other objects in the same cluster. However, objects in different clusters are dissimilar to each other. In

the context of textual data, objects are the documents that must be assigned to clusters so that within a

cluster, documents are similar, but between clusters, documents are different.

The basic idea is that documents within a cluster should be similar to each other, and documents

in different clusters should be dissimilar to each other. The similarity between two documents is based on

the similarity of features (such as terms or words) between documents in the vector space model. In this

context, we discuss latent semantic indexing (LSI), which provides a method for determining the similarity

of words and passages by the analysis of large text corpora. Then, we discuss the concept of topic

extraction from a collection of documents. A topic is conceptualized as a collection of terms that capture

the main themes or ideas in the document. Unlike cluster groups, where each document is assigned to

only one cluster, the same document can be assigned to multiple topics, depending on how many ideas

are represented in a document” (Text Mining and Analysis, 2017).
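
The idea of document similarity in the vector space model can be illustrated with a minimal sketch. It assumes plain term counts as features and cosine similarity as the measure; the quoted text names neither, and the sample sentences and function names are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent a document as a sparse vector of term counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical survey-style responses, invented for illustration only.
docs = [
    "great rewards and cash back",
    "cash back rewards are great",
    "long wait at the store",
]
vectors = [term_vector(d) for d in docs]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # share four of five terms
sim_02 = cosine_similarity(vectors[0], vectors[2])  # share no terms
```

Documents that share most of their terms score close to 1 and would tend to fall in the same cluster, while documents with no terms in common score 0.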



For this assignment we worked with survey data that contains both structured and unstructured fields;
mining the unstructured text can provide key insights into the responses. A project was created in SAS
Enterprise Miner, and the textual data from the survey was imported for analysis. The cluster analysis
was set up by following Tutorial Q: A Hands on Tutorial on Text Mining in SAS from the textbook.

Variables should be mined one at a time. The first variable selected is “Why_Best_Lylty_Card”, and
its Use value is set to Yes. This setting is used by each node in this mining procedure.

The Text Parsing node is used to parse the data. In the Text Filter node, the spell-checking option is
set to Yes and the number of terms to display is set to All. The Text Filter node reduces the total
number of parsed terms and/or documents that are analyzed; here, all of the inputs from the survey are
used. Next, the Text Cluster node is added. SAS Enterprise Miner clusters the documents into sets and
supplies a report on the descriptive terms for those clusters. The settings used were those outlined in
the tutorial. The output of the resulting diagram shows that 10 clusters were created.
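
The parse, filter, and cluster sequence described above can be sketched in a few lines of Python. This is only an illustration of the flow, not the algorithm SAS Enterprise Miner uses: the stop-word list, the similarity threshold, the greedy seed-based grouping, and the sample responses are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the real Text Filter node uses its own settings.
STOP_WORDS = {"the", "a", "an", "and", "is", "are", "i", "it", "to", "of"}


def parse(text):
    """Parsing step: lowercase the response and split it into alphabetic terms."""
    cleaned = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    return cleaned.split()


def filter_terms(terms):
    """Filtering step: drop terms that carry little meaning."""
    return [t for t in terms if t not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(responses, threshold=0.3):
    """Clustering step: put each response in the first cluster whose seed
    document is similar enough, otherwise start a new cluster."""
    clusters = []  # each entry: {"seed": Counter, "members": [response index]}
    for i, text in enumerate(responses):
        vec = Counter(filter_terms(parse(text)))
        for c in clusters:
            if cosine(vec, c["seed"]) >= threshold:
                c["members"].append(i)
                break
        else:
            clusters.append({"seed": vec, "members": [i]})
    return [c["members"] for c in clusters]


# Hypothetical responses, invented for illustration only.
responses = [
    "The cash back rewards are great",
    "I like the cash back",
    "Easy to use at the store",
    "It is easy to use",
]
groups = cluster(responses)
```

With these sample responses, the two comments about cash back end up in one group and the two about ease of use in another, which mirrors how the Text Cluster node reports sets of documents along with their descriptive terms.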

The results of these clusters supply an understanding of customer choices, impressions, and opinions.
Examining the clusters individually can help in understanding customers' sentiment about the products
and services provided. This can give insight into the connections between positive and negative
sentiments. Supplying stakeholders with this information can help, and sometimes vastly improve,
business decisions. This in turn relates directly to maintaining customers' satisfaction levels, both
now and in the future, depending on whether their sentiments are negative, positive, or neutral.



Reference

Text Mining and Analysis. (2017). Text Mining and Analysis: Practical Methods, Examples, and Case
Studies Using SAS, Chapter 6 - Clustering and Topic Extraction. Retrieved August 25, 2017, from
http://viewer.books24x7.com/assetviewer.aspx?bookid=59026&chunkid=342485391&resume=yes&resumebookmarkid=dc367bed-ce7d-e711-a9c3-00505686029c
