Abstract—Ontology is a branch of philosophical study that deals with the nature of being. Ontologies are extremely useful tools for different purposes and modalities across many areas and communities. A common ontology is very effective for sophisticated software engineering purposes. In the real world, new meaningful words are continually added to a language, and extending the most widely used ontologies to cover them requires mapping. Manual mapping is used to assure quality, but it has limitations. Partially automated mapping may be applied to extend an ontology by extracting and integrating knowledge from existing resources more effectively. In this paper, we present a semi-automated method, a type of machine learning, to enrich an existing ontology. Moreover, the approach can save time while ensuring the accuracy that such ontologies need to deliver.

Keywords—Ontology; mapping; semi-automated method; machine learning

I. INTRODUCTION

The word ontology denotes a specification of a conceptualization¹. The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences [1]. Ontology plays an important role in modern fields such as the semantic web, machine translation and word sense disambiguation, where it is used to exploit lexical knowledge. An ontology basically defines a common vocabulary for researchers who need to share information in a certain domain, which includes machine-interpretable definitions of the basic concepts in the domain and the relations among them². The most effective and widely employed ontologies are still man-made. These include WordNet, Cyc or OpenCyc, and SUMO. Because they are manually assembled, these knowledge sources have the advantage of satisfying the highest quality expectations. However, they are very costly to assemble, and continuous human effort is required to keep them up to date. Language processing typically uses localized knowledge in the ontology: looking up terms, handling synonyms, and better understanding the context surrounding a specific concept [2]. The structure of some ontologies makes them useful tools for language processing. We apply a semi-automated mapping approach to enrich a particular ontology and try to map a computational external source to the target ontology.

WordNet is a free lexical database that is publicly available for download. It is very large and covers general lexical relations in English. WordNet's structure makes it a useful tool for computational linguistics and natural language processing³. We apply semi-automated mapping to enrich an ontology by mapping an external source to the target ontology. Ontology is considered one of the pillars of the semantic web. The main thread of ontology is the study of entities and their relations⁴ in the philosophical sense.

Ontology is seen as a key factor for enabling interoperability across different systems and semantic web applications. Ontology mapping is required for combining different distributed ontologies. Developing such ontology mappings has been an important issue in recent ontology research [7].

The semantic web relies heavily on formal ontologies that structure its underlying data for comprehensive and transportable machine understanding. Ontology learning greatly facilitates the construction of ontologies. The use of ontologies to model the knowledge of specific domains is a key aspect of integrating information coming from different sources, supporting collaboration within virtual communities, improving information retrieval and, more generally, reasoning on available knowledge. Ontology learning includes a number of complementary disciplines [4]. It feeds on different types of unstructured, semi-structured and fully structured data to support semi-automatic ontology engineering [8]. Semantic annotation of data in the semantic web is the first critical step toward better search, integration and analytics over heterogeneous data; semantic annotation of web services is an equally critical first step toward the same target. Finding the right data for scientific research and application development is still a challenge. One important goal of the semantic web is to make the meaning of information explicit through semantic mark-up, thus enabling more effective access to knowledge contained in heterogeneous information environments, such as the web. Semantic search plays an
¹ www.ksl.stanford.edu
² https://en.m.wikipedia.org
³ http://wordnet.princeton.edu/
⁴ http://opennlp.apache.org/
978-1-5386-2056-4/18/$31.00©2018 IEEE
Future of Information and Communication Conference (FICC) 2018
5-6 April 2018 | Singapore
important role in realizing this goal, as it promises to produce precise answers to users' queries by taking advantage of the availability of explicit semantics of information [5], [10]. Semantic similarity relates to computing the similarity between concepts that are not lexicographically similar. Some of the most popular semantic similarity methods are implemented and evaluated using WordNet as the reference ontology [9].

WordNet is a lexical database of English words. Many other resources, such as the Oxford English Dictionary, Wiktionary (a multilingual dictionary and sister project of Wikipedia), Encyclopedia, WordWeb and Entitypedia, are also popular English dictionaries. Can a given word be found in all of these dictionaries? The answer may be no, because people invent new words all the time and dictionaries do not all update their databases at the same time. For example, the word 'Textpectation' cannot be found in WordNet, but it exists in the Knoworthy database [3].

The main objective of this work is to enrich an ontology with high accuracy using a semi-automated mapping approach. Manual mapping achieves high accuracy and can identify noise but, as with other manually built ontologies, it suffers from low coverage, is time consuming and is costly. Full automation, by contrast, achieves high coverage in minimal time, but at the price of low accuracy and noise that goes unidentified. The problem can be addressed by a semi-automated method that combines the manual and automatic approaches.

In this paper we propose a semi-automated method to enrich a knowledge base by adding missing concepts to an ontology. We first review related work and literature covering semi-automated methods, the structure of WordNet, the WordNet database and semantic web search. We then describe our method step by step. After that, we evaluate the method and compare its results with those of the alternatives. Finally, we discuss the limitations of the work and possible future work.

II. RELATED WORK

When higher accuracy is necessary, semi-automatic approaches are preferable. Many ideas have been developed using this method. In this section, we briefly mention some relevant work that is directly related to the semi-automated method.

FarsNet is a lexical ontology for the Persian language. It is designed to contain a Persian WordNet with about 10,000 synsets in its first phase and to grow to cover verbs' argument structures and their selectional restrictions in its second phase. A semi-automatic approach was used to create the first phase, the Persian WordNet [11].

Entitypedia was used as the target ontology to evaluate SAM, with the 15,480 categories of YAGO that were directly mapped to entity as the external resource. Wikipedia was used as the external vocabulary. Developed at the University of Trento in Italy, Entitypedia is a knowledge base with a precise split between individuals, classes, attributes and relations and their lexicalization as proper nouns and common nouns, respectively. Entitypedia is progressively extended by collecting knowledge from several sources, including WordNet [3].

III. DESCRIPTION

A. Semi-Automatic Method

Semi-automated means partially automated: manual and automatic steps are combined to achieve the target. An ontology can be enriched manually as well as semi-automatically. Although both approaches can reach the same target, the individual steps of the two procedures differ. If both approaches can reach the same goal, why use a semi-automatic system? There are some important reasons to choose the semi-automatic approach.

The manual approach has some benefits. For example, when a man-made ontology such as WordNet is enriched manually, it is easy to find noise, to identify common nouns, to find the head for a new common noun and to build a relational tree with the WordNet database. The constraints of the manual approach are long processing time, high assembly cost, high quality assurance cost and low coverage. To overcome these constraints we suggest a semi-automatic system, which can reach a satisfactory level with high accuracy, low cost and high coverage.

B. WordNet

WordNet was developed at the Cognitive Science Laboratory of Princeton University under the direction of Professor George A. Miller and is maintained at Princeton. Its knowledge base can be downloaded and used free of cost and can also be browsed online.

C. Structure of WordNet

WordNet consists of three separate databases, one for nouns, one for verbs and one for adjectives and adverbs; entries are organized not alphabetically but by meaning. Fig. 1 shows a relational tree and the structure of WordNet, illustrating how one word is related to others in the WordNet database.

The version currently available for online use is 3.1, which was released in November 2012 and is also available for download. It contains 155,287 words organized in 117,659 synsets for a total of 206,941 word-sense pairs. The synset is the basic building block of WordNet. WordNet covers more than 118,000 different word forms and more than 90,000 distinct word senses. In WordNet 3.0 there are 440 topics and domains [4]. Each of WordNet's about 118,000 synsets is linked to other synsets. WordNet is distinctive in that it records form-meaning pairs: a synset contains a brief definition, and a word form with several different meanings is represented in as many distinct synsets.

About 17% of the words in WordNet are polysemous and 40% are synonymous [3]. WordNet groups words into the open-class categories noun, verb, adjective and adverb. It is assumed that the closed-class categories of English, around 300 prepositions, pronouns and determiners, play a vital role in every parsing system; they are given no semantic clarification in WordNet.
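The synset organization described in Section III-C can be illustrated with a minimal Python sketch. It is not the WordNet database or its API: the dictionary `TOY_SYNSETS`, its identifiers and glosses, and the helper functions `build_index`, `polysemous_share` and `synonymous_share` are all hypothetical names introduced here for illustration. The sketch only shows how a word form that has several meanings ends up in several synsets, and how shares of polysemous and synonymous words could be counted over such a structure.

```python
from collections import defaultdict

# Hypothetical miniature lexicon: synset id -> (part of speech, word forms, gloss).
# The entries are illustrative only and are not copied from the WordNet database.
TOY_SYNSETS = {
    "bank.n.01": ("n", ["bank", "riverbank"], "sloping land beside a body of water"),
    "bank.n.02": ("n", ["bank", "depository"], "a financial institution that accepts deposits"),
    "car.n.01": ("n", ["car", "auto", "automobile"], "a motor vehicle with four wheels"),
    "love.v.01": ("v", ["love"], "feel love or affection for someone"),
}


def build_index(synsets):
    """Map each word form to the list of synset ids that contain it."""
    index = defaultdict(list)
    for synset_id, (_pos, forms, _gloss) in synsets.items():
        for form in forms:
            index[form].append(synset_id)
    return dict(index)


def polysemous_share(index):
    """Fraction of word forms that appear in more than one synset."""
    polysemous = [word for word, ids in index.items() if len(ids) > 1]
    return len(polysemous) / len(index)


def synonymous_share(synsets, index):
    """Fraction of word forms that share at least one synset with another form."""
    with_synonym = {
        form
        for _pos, forms, _gloss in synsets.values()
        if len(forms) > 1
        for form in forms
    }
    return len(with_synonym) / len(index)


index = build_index(TOY_SYNSETS)
print(index["bank"])                         # synsets that contain the form "bank"
print(polysemous_share(index))               # share of forms with several senses
print(synonymous_share(TOY_SYNSETS, index))  # share of forms with a synonym
```

In this toy lexicon the form "bank" belongs to two synsets, which is the form-meaning pairing discussed above; the shares computed here say nothing about the real 17% and 40% figures, which come from the full database.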
⁵ filosofie.unibuc.ro
⁶ semanticsage.blogspot.com
Semantic search is based on the context, substance, intent and concept of the searched phrase⁷. In the Semantic Web community, the term semantic search is widely used to refer to a number of different categories of systems, for example systems that incorporate location, synonyms of a term, current trends, word variations and other natural language processing elements [6]. Semantic search concepts are derived from various search algorithms and methodologies, including keyword-to-concept mapping, graph patterns and fuzzy logic.

IV. METHOD

In this paper we present a new method that searches for a new word, or a new sense of a word, and merges it into the related concept in WordNet. The search can be applied to a paragraph or a sentence. We pick the common nouns from the sentences and look them up in WordNet; if a noun does not exist there, we search for it in another database or web source. If the word is found in that database or web source, it is attached to the related concept in the WordNet database; otherwise it is discarded as noise. Our implementation proceeds as follows:

• Label: When the search is run on a paragraph or on 5 to 6 sentences, make a label from the text. A label is a word picked from these sentences that reflects the main topic of the passage.
• Summary: When searching with a long paragraph, reduce the paragraph to a summarized version.
• Define new concept: Define the new concept being searched for, select its head and merge it with the WordNet database.
• The identification of the head, which in turn is typically based on part of speech (POS) tagging, is an approximate process whose accuracy varies according to the tool and the dataset used to train it.

Some of the new concepts that we found in this research and added to the WordNet database using our process are 'Landform', 'Textpectation' and 'Futurologists'. Fig. 4 shows a search for a new concept using our tool.

Fig. 4. Searching new concept.

The system can merge words semi-automatically using the JAWS and JWNL tools. Adding and defining a new concept using our tool is shown in Fig. 5.

Fig. 5. Create new concept using proposed system.

V. EVALUATION AND RESULT

To evaluate our method we have tried various ontologies. The aim of this step is to get an estimate of how our results compare with our target. In this process our method takes approximately 2239 ms to search for a concept, which is much faster than manual searching. Although noise is easier to identify in manual searching, it takes considerably more time than partial mapping. With respect to accuracy and cost, partial mapping can detect more heads, and more accurate ones, at lower cost than manual mapping. Overall, the mechanism performs properly in reducing the noise rate when adding a new concept and mapping it to the WordNet database.

A set of new words that are not available in WordNet was examined. Table 1 reports the noise rate of the suggested conceptual definitions and the accuracy rate of mapping each concept to WordNet. In total we tried 118 different types of concept, including some new concepts; the results in terms of time, cost and accuracy are compared in Fig. 6.

Table 1. Noise rate and mapping accuracy for new concepts.

Concept         Noise rate   Accuracy rate
Landform        20%          100%
Futurologists   0%           100%
Sexologists     20%          95%
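As a rough illustration of the search-and-merge procedure from Section IV, the following Python sketch walks a list of candidate common nouns through the three outcomes described there: already present in the target ontology, found in an external source and merged, or discarded as noise. It is a sketch under stated assumptions, not the implemented tool: the dictionaries `TARGET` and `EXTERNAL`, the function `enrich`, the glosses and the head assignments are hypothetical, POS tagging is assumed to have already produced the noun list, and the extra "review" bucket for concepts whose head is unknown is our own way of modelling the manual step.

```python
# Hypothetical stand-ins for the real resources. TARGET plays the role of the
# target ontology (WordNet in the paper) and EXTERNAL the role of another
# database or web source. Glosses and heads are illustrative assumptions.
TARGET = {
    "object": {"gloss": "a physical entity", "head": None},
    "river": {"gloss": "a large natural stream of water", "head": "object"},
}
EXTERNAL = {
    "landform": {"gloss": "a natural feature of the earth's surface", "head": "object"},
    "textpectation": {"gloss": "anticipation felt while awaiting a text reply",
                      "head": "anticipation"},
}


def enrich(candidate_nouns, target, external):
    """Return (enriched copy of target, words needing review, noise words).

    candidate_nouns is assumed to be the output of a POS tagger that kept
    only common nouns; tagging itself is outside this sketch.
    """
    enriched = dict(target)
    review, noise = [], []
    for noun in candidate_nouns:
        word = noun.lower()
        if word in enriched:
            continue  # already covered by the target ontology
        entry = external.get(word)
        if entry is None:
            noise.append(word)  # found in neither source: treat as noise
        elif entry["head"] in enriched:
            enriched[word] = entry  # attach under an existing head concept
        else:
            review.append(word)  # head unknown: leave to the human editor
    return enriched, review, noise


enriched, review, noise = enrich(
    ["River", "Landform", "Textpectation", "Xyzzy"], TARGET, EXTERNAL
)
print(sorted(enriched))  # concepts in the enriched ontology
print(review)            # concepts held back for manual mapping
print(noise)             # words discarded as noise
```

Keeping a separate review list is one way to reflect the semi-automatic character of the method: the automatic step handles the clear cases, and only concepts whose head cannot be resolved are passed to a human.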
⁷ blogspace.com
Fig. 6. Difference between manual mapping and semi-automated mapping (time, cost, accuracy).

VI. DISCUSSION AND LIMITATION

The principal purpose of the study is to improve an existing ontology and to realize a new mechanism that provides high accuracy with the lowest possible noise rate. The study also adds missing concepts to the ontology to improve the knowledge base with a semi-automated approach.

With this semi-automated approach, the WordNet database acquires an accurate meaning for a newly coined word and relates it to other knowledge bases. Combining the advantages of the manual and automated approaches makes the semi-automated approach more effective for enriching an ontology. By achieving high accuracy and a minimal noise rate, the implemented work also clearly distinguishes this approach from purely manual and purely automated approaches to ontology enrichment.

The study has several limitations. Particularly for unformatted articles, the existing tools that perform parsing and normalization on content from numerous sources are still not adequate to achieve absolute accuracy. Hence, constructing the exact definition of a new word with this mechanism occasionally requires human expertise.

For example, when we obtained a new concept, we searched for information related to that concept in Wikipedia, normalized the web content (using NLP) and built a relational tree with other concepts. Sometimes, however, we did not find the target concept in Wikipedia; in that case we searched for the content in other domains manually, because our current application searches for the targeted concept in the Wikipedia domain only.

When we were preparing the head concept list for sentences containing the targeted new concept, the tool sometimes added heads that were only partially related to the new concept.

VII. CONCLUSION

The world is getting easier day by day. It is difficult to update, develop and maintain man-made ontologies like WordNet. So

• Alternative search: Add more websites (domains) to our current application for alternative search, so that relations for a new concept can be searched in other domains automatically.
• Enrich algorithm: To improve the head concept list, we have to enrich the text-parsing regular expressions and the NLP algorithm.
• Alignment of verbs and nouns: Align words whose meaning differs greatly between their noun and verb uses.
• Word Sense Alignment: Word sense alignment can be defined as the identification of pairs of senses from two (or more) lexical-semantic resources which denote the same meaning, e.g.
  o Be enamored or in love with.
  o Feel love or affection for someone.
• Definition of alignment senses: The definitions describe equivalent meanings, allowing the alignment of the two senses.

ACKNOWLEDGMENT

First of all we would like to express our gratitude to the Almighty, who gave us the strength to work on this project. We want to thank our honorable supervisor Bayzid Ashik Hossain for guiding us. His profound knowledge in this field, keen interest, patience and continuous support led to the completion of our work. His instructions have contributed greatly to every aspect of the thesis.

REFERENCES

[1] D. Faria, C. Pesquita, E. Santos, I. F. Cruz and F. M. Couto, "Automatic Background Knowledge Selection for Matching Biomedical Ontologies," 2014.
[2] D. J. Berndt, J. A. McCart and S. L. Luther, "Using Ontology Network Structure in Text Mining," pp. 41-45, 2010.
[3] V. Maltese and B. A. Hossain, "SAM: A Tool for the Semiautomatic Mapping and Enrichment of Ontologies," 2012.
[4] S. Gella, C. Strapparava and V. Nastase, "Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources," 2014.
[5] L. D. Caro and G. Boella, "Automatic Enrichment of WordNet with Common-Sense Knowledge," 2016.
[6] K. Elbedweihy, S. N. Wrigley, F. Ciravegna, D. Reinhard and A. Bernstein, "Evaluating Semantic Search Systems to Identify Future Directions of Research," 2012.
[7] N. Choi, I. Song and H. Han, "A survey on ontology mapping," vol. 35, pp. 34-41, 2006.
[8] M. Gaeta, F. Orciuoli and P. Ritrovato, "Advanced ontology management system for personalized e-Learning," vol. 22, no. 4, pp. 292-301, 2009.
[9] G. Varelas, E. Voutsakis, P. Raftopoulou, E. G. Petrakis and E. E. Milios, "Semantic similarity methods in WordNet and their application to information retrieval on the web," 2005.
[10] Y. Lei, V. Uren and E. Motta, "SemSearch: A Search Engine for the Semantic Web," 2016.
[11] M. Shamsfard, A. Hesabi, H. Fadaei, N. Mansoory, A. Famian, S. Bagherbeigi, E. Fekri, M. Monshizadeh and S. M. Assi, "Semi Automatic Development of FarsNet; The Persian WordNet," 2010.