
Future of Information and Communication Conference (FICC) 2018

5-6 April 2018 | Singapore

Enriching Existing Ontology using Semi-Automated Method

Amna Islam Badhan
Department of Computer Science
American International University – Bangladesh
Dhaka, Bangladesh
badhan2405@gmail.com

Md Jabed Hasan
Department of Computer Science
American International University – Bangladesh
Dhaka, Bangladesh
jabedhasan21@gmail.com

Nafiz Ishtiaque Ahmed
Department of Computer Science
American International University – Bangladesh
Dhaka, Bangladesh
cse.ishtiaque@gmail.com

Abstract: Ontology is a branch of philosophical study that deals with the nature of being. Ontologies are extremely useful tools for different purposes and various modalities in different areas and communities. A common ontology is very effective for sophisticated software engineering purposes. In the real world, new meaningful words are always enriching a language, and enhancing the most widely used ontologies to cover them requires mapping. Manual mapping is used to assure quality, but it has some limitations. Partially automated mapping may be applied to extend an ontology by extracting and integrating knowledge from existing resources more effectively. In this paper, we present a semi-automated method, a type of machine learning, to enrich an existing ontology. Moreover, the approach can save time and ensure the accuracy that such ontologies need to serve their purpose.

Keywords: Ontology; mapping; semi-automated method; machine learning

I. INTRODUCTION

The word ontology denotes a specification of a conceptualization¹. The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences [1]. Ontology plays an important role in modern fields such as the semantic web, machine translation, and word sense disambiguation, where it is used to exploit lexical knowledge. An ontology basically defines a common vocabulary for researchers who need to share information in a certain domain; it includes machine-interpretable meanings of the basic concepts in the domain and the relations among them². The most effective and widely employed ontologies are still man-made. These include WordNet, Cyc or OpenCyc, and SUMO. Because they are manually assembled, these knowledge sources have the advantage of satisfying the highest quality expectations. However, they are very costly to assemble, and continuous human effort is required to keep them up to date. Language processing typically uses localized knowledge in the ontology: looking up terms, handling synonyms, and better understanding the context surrounding a specific concept [2]. The structure of some ontologies makes them useful tools for language processing. We apply a semi-automated mapping approach to enrich a particular ontology and try to map a computational external source to the target ontology.

WordNet is a free lexical database that is publicly available for download. It is very large and covers general lexical relations in English. WordNet's structure makes it a useful tool for computational linguistics and natural language processing³. We apply semi-automated mapping to enrich an ontology by mapping an external source to the target ontology. Ontologies are considered the pillars of the semantic web. The main thread of ontology is the study of entities and their relations⁴ in the philosophical sense.

Ontology is seen as a key factor for enabling in-house interoperability across different systems and semantic web applications. Ontology mapping is required for combining distributed and different ontologies. Developing such ontology mappings has been an important issue of recent ontology research [7].

The semantic web relies heavily on the formal ontologies that structure its underlying data for comprehensive and transportable machine understanding. Ontology learning greatly facilitates the construction of ontologies. The use of ontologies to model the knowledge of specific domains represents a key aspect for the integration of information coming from different sources, for supporting collaboration within virtual communities, for improving information retrieval, and, more generally, for reasoning on available knowledge. Ontology learning includes a number of complementary disciplines [4]. It feeds on different types of unstructured, semi-structured, and fully structured data to support semiautomatic ontology engineering [8]. Semantic annotation of data in the semantic web is the first critical step to better search, integration, and analytics over heterogeneous data; semantic annotation of web services is an equally critical first step to achieving the target. Finding the right data for scientific research and application development is still a challenge. One important goal of the semantic web is to make the meaning of information explicit through semantic mark-up, thus enabling more effective access to knowledge contained in heterogeneous information environments, such as the web. Semantic search plays an

¹ www.ksl.stanford.edu
² https://en.m.wikipedia.org
³ http://wordnet.princeton.edu/
⁴ http://opennlp.apache.org/

978-1-5386-2056-4/18/$31.00©2018 IEEE
important role in realizing this goal, as it promises to produce precise answers to users' queries by taking advantage of the availability of explicit semantics of information [5], [10]. Semantic similarity relates to computing the similarity between concepts which are not lexicographically similar. Some of the most popular semantic similarity methods are implemented and evaluated using WordNet as the reference ontology [9].

WordNet is a lexical database of English words. Many other resources, such as the Oxford English Dictionary, Wiktionary (a multilingual dictionary, a Wikipedia project), Encyclopedia, Wordweb, and Entitypedia, are also very popular English dictionaries. Can a given word be found in all of these dictionaries? The answer is probably no, because people invent new words all the time, and not all dictionaries update their databases at the same time. For example, the word 'Textpectation' cannot be found in WordNet, but it exists in the Knoworthy database [3].

The main objective of this work is to enrich an ontology with higher accuracy using a semi-automated mapping approach. Manual mapping achieves high accuracy and allows noise to be identified, but, as with other man-made ontologies, it suffers from low coverage and is time consuming and costly. Full automation is also possible; it achieves high coverage in minimal time, but with low accuracy and without the ability to identify noise. The problem can be solved by a semi-automated method that combines the manual and automatic approaches.

In this paper we propose a semi-automated method to enrich a knowledge base by adding missing concepts to an ontology. At the beginning of the paper, we review related work and literature covering the semi-automated method, the structure of WordNet, the WordNet database, semantic web search, etc. We then analyze our method in several steps with descriptions. After that we evaluate our method and compare the results with other possible results. Before the end of the paper we discuss the limitations of the paper and further work that can be done in the future.

II. RELATED WORK

When higher accuracy is necessary, semi-automatic approaches are preferable. Plenty of ideas have been generated using this method. In this section, we briefly mention some relevant work that is closely related to the semi-automated method.

FarsNet is a lexical ontology for the Persian language. It is designed to contain a Persian WordNet with about 10000 synsets in its first phase and to grow to cover verbs' argument structures and their selection restrictions in its second phase. A semi-automatic approach was used to create the first phase, the Persian WordNet [11].

SAM was evaluated using Entitypedia as the target ontology and the 15,480 categories of YAGO that were directly mapped to entity as the external resource. Wikipedia was used as the external vocabulary. Developed at the University of Trento in Italy, Entitypedia is a knowledge base with a precise split between individuals, classes, attributes and relations and their lexicalization as proper nouns and common nouns, respectively. Entitypedia is progressively extended by collecting knowledge from several sources, including WordNet [3].

III. DESCRIPTION

A. Semi-Automatic Method

Semi-automated means partially automated: manual and automatic systems are combined to achieve the target. An ontology can be enriched manually as well as by a semi-automatic system. Although both can reach the same target, the individual steps of the two procedures differ. If both approaches can reach the same goal, one may ask why we use a semi-automatic system. There are some important reasons to choose the semi-automatic approach.

The manual approach has some benefits. For example, when the database of a man-made ontology such as WordNet is enriched manually, it is easy to find noise and common nouns, to find the head for a new common noun, and to build a relational tree with the WordNet database. The constraints of the manual approach, however, are time complexity, high assembly cost, high quality assurance cost, low coverage, etc. To overcome these constraints we suggest a semi-automatic system that can reach a satisfactory level with high accuracy, low cost, and high coverage.

B. WordNet

WordNet was developed and is being maintained at the Cognitive Science Laboratory of Princeton University under the supervision of Professor George A. Miller. Its knowledge base can be downloaded and used free of cost and can also be browsed online.

C. Structure of WordNet

WordNet consists of three separate databases, one for nouns, one for verbs, and one for adjectives and adverbs, and all are organized not in alphabetical order but by meaning. Fig. 1 shows a relational tree and the structure of WordNet, illustrating how one word is related to others in the WordNet database.

The current version available for online use is 3.1, which was released in November 2012 and is also available for download. It contains 155,287 words organized in 117,659 synsets for a total of 206,941 word-sense pairs. The basic structure of WordNet is the synset. WordNet covers more than 118,000 different word forms and more than 90,000 distinct word senses. In WordNet 3.0 there are 440 topics and domains [4]. Each of WordNet's 118,000 synsets is linked with others. Each form-meaning pair in WordNet is unique. That is, a synset contains a brief definition, and a word form with several different meanings is represented in as many different synsets.

About 17% of the words in WordNet are polysemous and about 40% have one or more synonyms [3]. WordNet groups words into nouns, verbs, adjectives, and adverbs, the open-class words. The closed-class categories of English (around 300 prepositions, pronouns, and determiners) are assumed to play a vital role in every parsing system, and they are given no semantic explanation in WordNet.
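To make the synset structure described in Section III-C concrete, the following is a minimal Python sketch. It does not read the real WordNet database: `TOY_SYNSETS`, the synset identifiers, and the helper names are hypothetical and exist only for illustration. It shows how a word form with several meanings appears in several synsets, and how counts of words, synsets, and word-sense pairs (the kind of figures quoted for WordNet) could be derived from such a structure.

```python
from collections import defaultdict

# Hypothetical miniature lexicon (NOT real WordNet data): synset id -> word forms.
TOY_SYNSETS = {
    "car.n.01": ["car", "automobile"],
    "car.n.02": ["car", "railcar"],
    "bank.n.01": ["bank"],
    "bank.n.02": ["bank", "depository"],
    "tree.n.01": ["tree"],
}


def index_by_word(synsets):
    """Map each word form to the list of synsets (senses) it belongs to."""
    index = defaultdict(list)
    for synset_id, word_forms in synsets.items():
        for word in word_forms:
            index[word].append(synset_id)
    return dict(index)


def summarize(synsets):
    """Count words, synsets and word-sense pairs, and list polysemous words."""
    index = index_by_word(synsets)
    return {
        "words": len(index),
        "synsets": len(synsets),
        # One word-sense pair per (word form, synset) membership.
        "word_sense_pairs": sum(len(senses) for senses in index.values()),
        # A word is polysemous when it occurs in more than one synset.
        "polysemous": sorted(w for w, senses in index.items() if len(senses) > 1),
    }


if __name__ == "__main__":
    print(summarize(TOY_SYNSETS))
```

In this toy lexicon, "car" and "bank" each belong to two synsets and are therefore polysemous, while "car" and "automobile" share one synset and are synonyms in that sense.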


Fig. 1. Relational Tree of WordNet.

The main relation among words in WordNet is synonymy, e.g., between the words car and automobile. Synonyms, words that denote the same concept and are interchangeable in many contexts, are grouped into unordered sets (synsets). The relation encoded among synsets is called the ISA relation. The semantic relations included in WordNet are given below [3]:

• Synonymy (same-name) is a symmetric relation between word forms. It is WordNet's elementary relation.
• Antonymy (opposing-name) is also a symmetric semantic relation between word forms, like synonymy. It is significant in forming the meanings of adjectives and adverbs.
• Hyponymy (sub-name) and its inverse, hypernymy (super-name), are transitive relations between synsets. This semantic relation organizes the meanings of nouns into a hierarchical structure.
• Meronymy (part-name) and holonymy (whole-name) are complex semantic relations. WordNet can distinguish component parts, substantive parts, and member parts.
• Troponymy (manner-name) is for verbs what hyponymy is for nouns, although the resulting hierarchies are much shallower.
• Entailment relations between verbs are also encoded in WordNet.

When searching for a new concept, it is very important to identify a head along with the other words. A part-of-speech (POS) tagger is a comparatively good way to find a new concept with high accuracy. However, after a new concept is found, it has to be defined and placed in a relational tree with other concepts. Sometimes a concept's meaning or definition differs from that of the head, but the concept can be related to another concept that is in turn related to the head. In Fig. 2, Graduate Course is not directly related to the head Organization, but it is related to Course, which is related to Work after Thing, and this in turn is directly related to Organization.

Fig. 2. ISA relation among synsets.

D. WordNet database

The WordNet database is stored in an ASCII format consisting of eight files, two for each syntactic category. Additional files are used by the WordNet search code but are not strictly part of the database [2]. WordNet only contains "open-class words": nouns, verbs, adjectives, and adverbs. Prepositions, pronouns, conjunctions, and articles cannot be found in WordNet.

E. Semantic Web Searching

Semantic search is a data searching technique in which a query aims not only to find keywords but also to determine the intent and contextual meaning of the words used for the search. Semantic search provides more meaningful search results by evaluating and understanding the search phrase and finding the most relevant results in a website, database, or any other data repository⁵. For example, as shown in Fig. 3, the architecture of semantic web search is quite different from that of normal web search⁶.
Fig. 3. Architecture of semantic web search.
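As a rough illustration of how a search can go beyond literal keywords, the sketch below contrasts plain keyword matching with a query expanded by synonyms. It covers only one ingredient of semantic search (synonym handling) and is not the architecture of Fig. 3. The `SYNONYMS` table, the `DOCUMENTS` list, and the function names are hypothetical; a real system would draw synonyms from a lexical resource such as WordNet.

```python
# Hypothetical synonym table standing in for a lexical resource such as WordNet.
SYNONYMS = {
    "car": {"car", "automobile"},
    "automobile": {"car", "automobile"},
}

# Hypothetical document collection used only for this illustration.
DOCUMENTS = [
    "The automobile was parked outside",
    "A car needs regular service",
    "Trains run on rails",
]


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def keyword_search(query, documents):
    """Return documents that contain at least one literal query term."""
    terms = set(tokenize(query))
    return [doc for doc in documents if terms & set(tokenize(doc))]


def synonym_expanded_search(query, documents, synonyms):
    """Return documents that contain a query term or one of its synonyms."""
    expanded = set()
    for term in tokenize(query):
        # Fall back to the term itself when no synonyms are known.
        expanded |= synonyms.get(term, {term})
    return [doc for doc in documents if expanded & set(tokenize(doc))]


if __name__ == "__main__":
    print(keyword_search("car", DOCUMENTS))
    print(synonym_expanded_search("car", DOCUMENTS, SYNONYMS))
```

For the query "car", the keyword search finds only the document that literally contains "car", whereas the expanded search also returns the document about the automobile.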

⁵ filosofie.unibuc.ro
⁶ semanticsage.blogspot.com

Semantic search is based on the context, substance, intent, and concept of the searched phrase⁷. In the Semantic Web community, the term semantic search is widely used to refer to a number of different categories of systems, for example those that incorporate location, synonyms of a term, current trends, word variations, and other natural language processing [6]. Semantic search concepts are derived from various search algorithms and methodologies, including keyword-to-concept mapping, graph patterns, and fuzzy logic.

IV. METHOD

In this paper we present a new method that searches for a new word, or a new sense of a word, and merges it with the related concept in WordNet. The search process can be applied to paragraphs and sentences. We pick the common nouns from those sentences and search for them in WordNet; if a noun does not exist there, we search for it in another database or web source. If the word is found in the database or web source, it is adjoined to the related concept in the WordNet database; otherwise it is discarded as noise. Our implementation proceeds as follows:

• Label: When the search input is a paragraph or 5 to 6 sentences, make a label from it. A label is a word picked from these sentences that reflects a major theme of the passage.
• Summary: When searching with a long paragraph, reduce that paragraph to a summarized version.
• Define new concept: Define the new concept being searched, select its head, and merge it with the WordNet database.
• The identification of the head, which in turn is typically based on part-of-speech (POS) tagging, is an approximate process whose accuracy varies according to the tool and the dataset used to train it.

Some new concepts that we found in this research and added to the WordNet database using our process are 'Landform', 'Textpectation', and 'Futurologists'. Fig. 4 shows a new concept being searched with our tool.

This system can merge words in a semi-automatic manner using the JAWS and JWNL tools. Adding and defining a new concept using our tool is shown in Fig. 5.

Fig. 5. Create new concept using proposed system.

V. EVALUATION AND RESULT

To evaluate our method we have tried various ontologies. The aim of this step is to get an estimate of our target and of what we have found as a result. In this process our method takes approximately 2239 ms to search for a concept, which is much faster than manual searching. Although noise is easier to identify with manual searching, it takes more time than partial mapping. Regarding accuracy and cost, partial mapping can detect more heads, and more accurate ones, at lower cost than manual mapping. Overall, the mechanism performs properly in reducing the noise rate when adding a new concept and mapping it to the WordNet database.

A set of new words that are not available in WordNet was examined. Table I describes the noise rate of their suggested conceptual definitions and the accuracy rate of mapping each concept to WordNet. In total we tried 118 different types of concepts, including some new concepts, and the results we found in terms of time, cost, and accuracy are compared in Fig. 6.

TABLE I. NOISE AND ACCURACY RATE FOR ADDING NEW WORD TO WORDNET USING SEMI-AUTOMATED MAPPING

New Word         Noise rate of suggested conceptual definitions    Accuracy rate
Codec            30 %                                              98 %
Landform         20 %                                              100 %
Futurologists    0 %                                               100 %
Rare diseases    0 %                                               100 %
Sexologists      20 %                                              95 %
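As a reading aid for Table I, the short Python sketch below averages the two reported columns over the five listed words. These averages are not reported in the paper; they are simple arithmetic on the table values, and the names `TABLE_I` and `mean` are illustrative only.

```python
# Rows of Table I: (new word, noise rate in %, accuracy rate in %).
TABLE_I = [
    ("Codec", 30, 98),
    ("Landform", 20, 100),
    ("Futurologists", 0, 100),
    ("Rare diseases", 0, 100),
    ("Sexologists", 20, 95),
]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average each column over the five words listed in the table.
mean_noise = mean([noise for _, noise, _ in TABLE_I])
mean_accuracy = mean([accuracy for _, _, accuracy in TABLE_I])

if __name__ == "__main__":
    print(f"mean noise rate: {mean_noise:.1f} %")        # 70 / 5
    print(f"mean accuracy rate: {mean_accuracy:.1f} %")  # 493 / 5
```

Over these five examples the mean noise rate is 14.0 % and the mean accuracy rate is 98.6 %; with so few rows, these figures only summarize the table and should not be read as a general estimate.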
Fig. 4. Searching new concept.
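The search-and-merge procedure of Section IV can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation (which is described as using the JAWS and JWNL tools): the input is assumed to be already POS-tagged with Penn Treebank tags, `WORDNET_WORDS` and `EXTERNAL_SOURCE` are hypothetical stand-ins for WordNet and an external database or web source, the gloss and head given for 'textpectation' are invented for the example, and the `approve` callback represents the manual confirmation step of a semi-automated workflow.

```python
# Hypothetical stand-in for the set of words already present in WordNet.
WORDNET_WORDS = {"phone", "message", "feeling"}

# Hypothetical external source: word -> illustrative gloss and head concept.
EXTERNAL_SOURCE = {
    "textpectation": {
        "definition": "anticipation felt while waiting for a reply to a text",
        "head": "feeling",
    },
}


def common_nouns(tagged_tokens):
    """Keep tokens tagged as common nouns (Penn Treebank tags NN and NNS)."""
    return [word.lower() for word, tag in tagged_tokens if tag in ("NN", "NNS")]


def enrich(tagged_tokens, wordnet_words, external_source, approve):
    """Return (new concepts to add, words discarded as noise)."""
    added, noise = {}, []
    for noun in common_nouns(tagged_tokens):
        if noun in wordnet_words or noun in added:
            continue  # Already known, nothing to do.
        entry = external_source.get(noun)
        # Unknown everywhere, or rejected by the human reviewer: treat as noise.
        if entry is None or not approve(noun, entry):
            noise.append(noun)
            continue
        added[noun] = entry  # To be attached under entry["head"].
    return added, noise


if __name__ == "__main__":
    sentence = [
        ("Her", "PRP$"), ("textpectation", "NN"), ("grew", "VBD"),
        ("while", "IN"), ("the", "DT"), ("phone", "NN"),
        ("showed", "VBD"), ("a", "DT"), ("blorptag", "NN"),
    ]
    # The reviewer is simulated by accepting every suggestion.
    added, noise = enrich(sentence, WORDNET_WORDS, EXTERNAL_SOURCE,
                          approve=lambda word, entry: True)
    print(added)
    print(noise)
```

Passing the reviewer as a callback keeps the automatic lookup separate from the manual decision, which is the division of labour that distinguishes a semi-automated approach from a fully automatic one.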

⁷ blogspace.com


[Bar chart comparing time, cost, and accuracy for Manual Mapping and Semi-Automated Mapping; vertical axis from 0 to 6.]

Fig. 6. Difference between manual mapping and semi-automated mapping.

VI. DISCUSSION AND LIMITATION

The principal purpose of the study is to improve an existing ontology and to realize a new mechanism that provides high accuracy with the lowest noise rate. The study also adjoins missing concepts to the ontology to improve the knowledge base with a semi-automated approach.

With this semi-automated approach, the WordNet database builds an accurate meaning of a newborn word and relates it to other knowledge bases. The advantages drawn from the manual and automated approaches make this semi-automated approach more effective for enriching an ontology. By achieving high accuracy and a minimal noise rate, the implemented work also clearly distinguishes this approach from the manual and automated approaches to ontology enrichment.

The study has several limitations. In the context of unformatted articles in particular, the existing tools that perform the parsing and normalizing tasks on numerous sources are still not adequate to achieve absolute accuracy. Hence, constructing the exact definition of a newborn word with this mechanism occasionally requires human expertise.

For example, when we obtained a new concept, we searched for information related to that concept in Wikipedia in order to normalize the concept (using NLP) from web content and build a relational tree with other concepts. Sometimes, however, we did not find the target concept in Wikipedia; in that case we searched for the content in other domains manually. Because our currently implemented application searches for the targeted concept in Wikipedia (domain) only, it retrieves related information only from Wikipedia.

When we were preparing the head concept list for sentences containing a targeted new concept, the process sometimes added heads that were only partially related to the new concept.

VII. CONCLUSION

The world is getting easier day by day. It is difficult to update, develop, and maintain man-made ontologies like WordNet. So semi-automation can be a preferable technique to enrich a database through mapping with the WordNet ontology. As ontologies support semantic web applications, it is an efficient system that makes merging words possible. Because it uses a semi-automated system, it can reduce cost and time and improve the results in the alignment of WordNet with existing ontologies.

VIII. FUTURE WORK

Further research on adapting the semi-automated approach with more effective natural language processing (NLP) algorithms can benefit ontology enrichment by providing new knowledge base concepts and constructing accurate definitions of new words. Our approach can be further improved through the following future work:

• Alternative search: Add more websites (domains) to our currently implemented application for alternative search, so that relations for a new concept can be searched in other domains automatically.
• Enrich algorithm: To improve the head concept list, we have to enrich the text-parsing regular expressions and the NLP algorithm.
• Alignment of verbs and nouns: Align words whose meaning differs greatly between their noun and verb uses.
• Word Sense Alignment: Word Sense Alignment can be defined as the identification of pairs of senses from two (or more) lexical-semantic resources which denote the same meaning, e.g.
  o Be enamored or in love with.
  o Feel love or affection for someone.
• Definition of alignment senses: The definitions describe equivalent meanings, allowing the alignment of the two senses.

ACKNOWLEDGMENT

First of all we would like to show our gratitude to the Almighty, who gave us the strength to work on this project. We want to thank our honorable supervisor Bayzid Ashik Hossain for guiding us. His profound knowledge in this field, keen interest, patience, and continuous support led to the completion of our work. His instructions have contributed greatly to every aspect of the thesis.

REFERENCES

[1] D. Faria, C. Pesquita, E. Santos, I. F. Cruz and F. M. Couto, "Automatic Background Knowledge Selection for Matching Biomedical Ontologies," 2014.
[2] D. J. Berndt, J. A. McCart and S. L. Luther, "Using Ontology Network Structure in Text Mining," pp. 41-45, 2010.
[3] V. Maltese and B. A. Hossain, "SAM: A Tool for the Semiautomatic Mapping and Enrichment of Ontologies," 2012.
[4] S. Gella, C. Strapparava and V. Nastase, "Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources," 2014.
[5] L. D. Caro and G. Boella, "Automatic Enrichment of WordNet with Common-Sense Knowledge," 2016.

[6] K. Elbedweihy, S. N. Wrigley, F. Ciravegna, D. Reinhard and A. Bernstein, "Evaluating Semantic Search Systems to Identify Future Directions of Research," 2012.
[7] N. Choi, I. Song and H. Han, "A survey on ontology mapping," vol. 35, pp. 34-41, 2006.
[8] M. Gaeta, F. Orciuoli and P. Ritrovato, "Advanced ontology management system for personalized e-Learning," vol. 22, no. 4, pp. 292-301, 2009.
[9] G. Varelas, E. Voutsakis, P. Raftopulou, E. G. Petrakis and E. E. Milios, "Semantic similarity methods in wordNet and their application to information retrieval on the web," 2005.
[10] Y. Lei, V. Uren and E. Motta, "SemSearch: A Search Engine for the Semantic Web," 2016.
[11] M. Shamsfard, A. Hesabi, H. Fadaei, N. Mansoory, A. Famian, S. Bagherbeigi, E. Fekri, M. Monshizadeh and S. M. Assi, "Semi Automatic Development of FarsNet; The Persian WordNet," 2010.
