Gouge is strongly associated with the word Weld. Weld refers to
a defect that occurs when the material is not welded properly,
or to another welding-related issue. Similarly, an in-depth
analysis can be performed to find the root cause of the defect.
Methodology
Analysis
SAS Enterprise Guide 5.1 was used to analyze the data.
Text Parsing
The Text Parsing node parses the entire data set to
identify the unique terms (words) present in the
text data. To speed up the run time, some of the
parsing properties were set to No: Detect Different
Parts of Speech, Noun Groups, and Find Entities.
Based on the properties set, only the required key
words were identified. After the terms were
identified, a term-by-document matrix was created
with terms as rows and documents (comments) as
columns. Such a matrix is usually very sparse. To
reduce the size of the term-by-document matrix, a
Start list was used.
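The parsing step above can be illustrated with a minimal Python sketch. It is not the SAS node itself: the function name `term_by_document`, the sample comments, and the Start list contents are all invented for illustration. It only shows how restricting terms to a Start list keeps the term-by-document matrix small.

```python
import re
from collections import Counter

def term_by_document(comments, start_list):
    """Return (terms, matrix) with terms as rows and documents as columns."""
    keep = {t.lower() for t in start_list}
    # Count only the Start-list terms that occur in each comment.
    counts = [
        Counter(tok for tok in re.findall(r"[a-z]+", c.lower()) if tok in keep)
        for c in comments
    ]
    terms = sorted(keep)
    # One row per term, one column per comment; most cells are zero (sparse).
    matrix = [[doc[t] for doc in counts] for t in terms]
    return terms, matrix

# Hypothetical comments and Start list, not taken from the data set.
comments = [
    "Gouge found near the weld",
    "Cut length too short, mark on surface",
    "Weld reject, gouge on pipe",
]
start_list = ["gouge", "weld", "cut", "length", "mark", "reject"]

terms, matrix = term_by_document(comments, start_list)
for term, row in zip(terms, matrix):
    print(f"{term:<8}{row}")
```

Words outside the Start list (such as "pipe" or "surface" here) never become rows, which is what shrinks the matrix.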
Text Filter
The Text Filter node assigns weights to the words
based on their frequencies. A built-in algorithm
was used to filter out the unnecessary words, which
have a lower weight. The algorithm assigns more
weight to medium- and low-frequency words because
documents can easily be classified based on those
words. Check Spelling was set to Yes. The default
English dictionary was customized as needed. The
cells of the term-by-document matrix contain the
frequency of the term in that particular document.
The SAS default weighting was used to calculate the
term frequencies.
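The text does not spell out which scheme "SAS default weighting" refers to. As an illustration only, the sketch below assumes log frequency weighting combined with an entropy term weight, a common pairing in text mining that favors terms concentrated in few documents. The function name `log_entropy_weights` and the sample counts are hypothetical.

```python
import math

def log_entropy_weights(matrix):
    """Weight a term-by-document count matrix (rows: terms, columns: documents).

    Assumed scheme (not confirmed by the source): log2(count + 1) for each
    cell, multiplied by an entropy-based weight for the whole term.
    """
    n_docs = len(matrix[0])
    if n_docs < 2:
        raise ValueError("need at least two documents")
    weighted = []
    for row in matrix:
        total = sum(row)
        # Sum of p * log2(p) over the documents that contain the term.
        entropy = sum((f / total) * math.log2(f / total) for f in row if f > 0)
        # 1 for a term found in a single document, 0 for a term spread evenly.
        term_weight = 1 + entropy / math.log2(n_docs) if total else 0.0
        weighted.append([term_weight * math.log2(f + 1) for f in row])
    return weighted

# Hypothetical counts: term 0 is spread evenly, term 1 sits in one document.
counts = [
    [1, 1, 1, 1],
    [4, 0, 0, 0],
]
weighted = log_entropy_weights(counts)
```

Under this assumed scheme, the evenly spread term receives a weight of zero in every document, while the concentrated term keeps its log-scaled frequency, which matches the idea that rarer, more selective words are more useful for classification.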
Text Topic
The Text Topic node extracts topics. Topics are
groups of terms that summarize the document
collection. Multi-term topics were used, and after
some research, the number of multi-term topics was
set to 10. The node uses a document cutoff and a
term cutoff, and, based on these threshold values,
it checks whether the association between the
terms in a topic is strong enough.
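The cutoff idea can be sketched in a few lines of Python. This is a simplified illustration, assuming that every document has a weight for each topic and that each topic carries its own document cutoff; the function name `assign_topics`, the weights, and the cutoff values are hypothetical.

```python
def assign_topics(doc_topic_weights, document_cutoffs):
    """Return, for each document, the topics whose weight reaches the cutoff."""
    assignments = {}
    for doc_id, weights in doc_topic_weights.items():
        assignments[doc_id] = [
            topic
            for topic, weight in weights.items()
            if weight >= document_cutoffs[topic]
        ]
    return assignments

# Hypothetical topic weights for two comments and hypothetical cutoffs.
doc_topic_weights = {
    "comment_1": {"topic_2": 0.41, "topic_3": 0.05},
    "comment_2": {"topic_2": 0.02, "topic_3": 0.03},
}
document_cutoffs = {"topic_2": 0.20, "topic_3": 0.10}

assignments = assign_topics(doc_topic_weights, document_cutoffs)
print(assignments)
```

With thresholds like these, a comment can end up in one topic, in several, or in none at all.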
Text Cluster
The Text Cluster node assigns each document to
only one cluster, whereas the Text Topic node
assigns each document to zero or more topics.
Singular Value Decomposition was used to
Suman -Sai
Concept Links:
The concept link diagram shows the association or
relationship of a particular word with the other words.
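As a rough illustration of what a concept link expresses, the sketch below counts how often two terms appear in the same comment. This plain co-occurrence count is a stand-in chosen for simplicity, not the strength measure used by the SAS concept link diagram; the function names and sample comments are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def cooccurrence(comments, terms):
    """Count, for each pair of terms, the comments that contain both."""
    keep = set(terms)
    pairs = Counter()
    for comment in comments:
        present = sorted(keep & set(re.findall(r"[a-z]+", comment.lower())))
        pairs.update(combinations(present, 2))
    return pairs

def links_for(term, pairs):
    """Terms that share at least one comment with `term`, strongest first."""
    linked = Counter()
    for (a, b), n in pairs.items():
        if term == a:
            linked[b] += n
        elif term == b:
            linked[a] += n
    return linked.most_common()

# Hypothetical comments, not taken from the data set.
comments = ["gouge near weld", "weld gouge reject", "cut mark"]
pairs = cooccurrence(comments, ["gouge", "weld", "reject", "cut", "mark"])
gouge_links = links_for("gouge", pairs)
print(gouge_links)
```

In this toy data, "weld" is the strongest link for "gouge" because the two words share the most comments.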
The words [...] number of documents, i.e., 16,093 documents,
followed by the words of Topic 2, which appeared in 12,132
documents. The words
cut, yield, length, id, and mark define the theme of
Topic 3. All the materials assigned to Topic 3 have the
following defects: cut issues, desired length not met, marks
on the surfaces, and inner diameter under tolerances.
Similarly, the words reject, gouge, cut, weld, and UT
define the theme of Topic 2. The materials in Topic 2 were
rejected because of the following defects: gouges, welding
issues, and under tolerances. In conclusion, most of the
comments were assigned to Topics 2, 3, and 10, so most of the
materials in Plant 30 have the defects present in those
topics.
The Text Cluster node discovered six major themes, and each
document (comment) was assigned to exactly one of them. 36%
of the comments were assigned to Cluster 1, but Cluster 1
has no descriptive terms and therefore no theme. This
happened because the comments in Cluster 1 do not contain
any of the words present in the Start list. This can be
improved by removing some of the comments.