Problem | References | Description
… | … | … DataHub documents is frequently scarce and sometimes anecdotal. Descriptions in the LoD cloud present shallow expressivity
Licensing and open initiatives | Bizer et al. (2009), Strasunskas and Tomassen (2010) | The semantic web community prefers open standards, like OWL or RDFS, to alternatives with proprietary encoding formats or results of open academic experiments
Semantic linking | Jain et al. (2010), Palacios (2010) | The high number of description resources causes scalability problems (linking architecture problems due to one-by-one mappings) and overlapping in vocabularies. The LoD Cloud datasets lack schema-level mappings between concepts of different datasets
Obsolescence | Milicic (2011), Bizer et al. (2009) | Updating or removing semantic resources from the web. Link maintenance is poor
Trustworthiness | Morato et al. (2007), Bechhofer et al. (2010), Bizer et al. (2009) | Publishing requirements such as authoring information, quality, credit and attribution are scarcely implemented. Additional problems related to advertisements and semantic spam in vocabulary building and metadata description
Formalization | Bikakis et al. (2013), Bizer et al. (2009), Lanthaler and Gutl (2012), Milicic (2011) | Variety of technologies to represent knowledge (e.g. RDF, XML, UML, TEI, topic maps, TCML, microformats, KIF, common logic) and heterogeneity of linking formats in the semantic web: RDF, JSON linked data, XLink, Hypernotation. Semantics in XML Schema is informal and follows a closed world assumption
URI problems | Milicic (2011) | Problems assigning URIs to b-nodes and RDF molecules
Difficulties in querying with SPARQL | Jain et al. (2010) | Users have to specify the details of the structure of the graph and be familiar with multiple datasets
Privacy problems | Bizer et al. (2009) | Privacy problems caused by integrating data from distinct sources
Vocabulary suitability and adaptation | Palacios (2010), Mangold (2007) | The suitability of a vocabulary is defined on the basis of low or tight coupling. There is a lack of statistical data to help the selection of vocabularies
Usability | Morato et al. (2007), Uren et al. (2007) | Usability of current systems in the semantic web
Table I. Problems identified on the semantic web according to the literature
Semantic
retrieval systems
643
1.2 Semantic retrieval
1.2.1 Semantic search. Following Wei et al. (2008), in this work semantic search
refers to the retrieval of resources described for knowledge modeling and to the use of
logic-based knowledge representation languages for automated machine processing.
The term semantic search on the web is currently a buzzword with different
interpretations (Batzios and Mitkas, 2012). Traditionally, it includes techniques that
improve the accuracy of searches (Fazzinga and Lukasiewicz, 2010;
Girit et al., 2012; Guha et al., 2003): disambiguation and contextualization of queries,
queries over semantic and semantically annotated documents, faceted search,
question answering, query formalization and similarity search. In general, the main
feature of semantic search engines is the ability to solve complex queries by giving an
answer to the query rather than offering a set of documents in which the answer might be found.
1.2.2 Semantic retrieval systems. In the context of the semantic web, the concept of
information retrieval systems is rather generic and vague, and it encompasses different
criteria. Scheir et al. (2007) propose the following classification:
• the system operates on the semantic web with machine-interpretable data;
• the system is based on semantic web technology and ontology-driven
information retrieval approaches; and
• the system performs information retrieval rather than data retrieval based on query
languages such as SPARQL.
Among the first search engines to appear were SHOE (Mangold, 2007) and On2broker.
On2broker (Fensel et al., 1999) aimed to retrieve XML and RDF
documents, as well as vocabularies like MPEG-7 or Dublin Core. Since then, a number of
search engines have been presented: WebOWL (Batzios and Mitkas, 2012), Swoogle
(Ding et al., 2005a), XSearch (Amer-Yahia and Lalmas, 2006), SWSE (Hogan et al., 2011),
Sindice, SemSearch and Watson. Most of these search engines, for example Swoogle,
Falcons and Watson, are based on RDF or OWL documents. The semantic results are
RDF documents (for example, ontologies and ontology instances).
Some web retrieval systems extend the document typology even further: SWSE
transforms XML and HTML documents to RDF for subsequent indexing. Sindice, in
addition to RDF, includes microformats, RDFa and Microdata. Watson focuses on RDF
and OWL, but it also includes other ontology languages such as DAML+OIL.
Result ranking is usually based on solutions similar to Google's
(e.g. Swoogle, WebOWL and SWSE), but other engines, such as Falcons, use
variations of TF-IDF and popularity. XSearch is based on XML data and returns the part
of the XML tree structure that matches the search. A review of XML-based
systems can be found in Amer-Yahia and Lalmas (2006).
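As a rough illustration of a ranking that mixes TF-IDF with popularity, the following Python sketch scores a few toy documents. The documents, the popularity counts and the blending weight `alpha` are hypothetical, and the formula is a generic textbook TF-IDF; it is not the ranking function of Falcons or of any other engine named above.

```python
import math
from collections import Counter

def tf_idf(term, doc_tokens, corpus):
    """Textbook TF-IDF: relative term frequency times a smoothed inverse document frequency."""
    tf = Counter(doc_tokens)[term] / len(doc_tokens)
    df = sum(1 for tokens in corpus.values() if term in tokens)
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf

def rank(query_terms, corpus, popularity, alpha=0.5):
    """Order document ids by a blend of summed TF-IDF and normalized popularity.

    alpha is a hypothetical weight: 1.0 uses text relevance only, 0.0 popularity only.
    """
    max_pop = max(popularity.values()) or 1
    scores = {}
    for doc_id, tokens in corpus.items():
        text_score = sum(tf_idf(t, tokens, corpus) for t in query_terms)
        pop_score = popularity[doc_id] / max_pop
        scores[doc_id] = alpha * text_score + (1 - alpha) * pop_score
    return sorted(scores, key=scores.get, reverse=True)

# Toy data (assumed): tokenized labels of three vocabularies and how often each is reused.
corpus = {
    "a": ["person", "name", "person"],
    "b": ["person", "address"],
    "c": ["event", "date"],
}
popularity = {"a": 10, "b": 2, "c": 50}
```

With `alpha=1.0` the order depends only on how well the labels match the query; with `alpha=0.0` the most reused vocabulary comes first regardless of the query, which shows why the balance between the two signals matters.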
Directories are another type of information retrieval system that searches semantic web
resources such as metadata vocabularies. They are considered simple retrieval
systems because results are found by navigating a tree
hierarchy that contains the resources. Directories are often left out of
studies on semantic search; however, we consider them a relevant resource in the
semantic web environment.
1.2.3 Previous works in evaluation of semantic retrieval systems. Initial works on
evaluation of semantic search engines were mainly focused on query performance
(Tumer et al., 2009; Andago et al., 2010). Many of these studies compare general
LHT
31,4
644
purpose search engines to those that extract semantic knowledge from natural
language texts (for instance, Hakia) by means of a knowledge organization system.
These studies identified and analyzed common elements for the comparison of the two
categories of search engines (general purpose and semantic search). The results show
an advantage for general purpose search engines. These results are different when
structured and formalized documents, such as RDF, are taken into account.
The criteria to evaluate these semantic search engines are not based on
query performance alone (Strasunskas and Tomassen, 2010). These authors state that a
rigorous comparison must take into account factors such as query and ontology
quality, user interaction, semantic indexing criteria, query expansion, filtering,
ranking methods and presentation of results. Therefore, they propose a
classification framework (Table II) that comprises seven categories. All of them
are based on previous works that classify semantic search engines (Esmaili and
Abolhassani, 2006; Mangold, 2007).
Mangold (2007) classifies semantic search approaches. This
study analyzes ten systems according to seven criteria, as shown in Table II. Some of
Mangold's criteria are dependent on each other. Although the author recognizes that
there are other possible characteristics, they are not included in that study because
its purpose was to focus on characteristics that most authors regard as
relevant. For ontology structure, three types of properties are analyzed: anonymous
properties (the only aspect presented is a shared context); standard properties (the common
thesaurus relationships synonym, hypernym, meronym and instance, in addition to
negation); and domain-specific properties. Uren et al. (2007)
identify four characteristics for classifying retrieval systems, none of which is related
to ontology quality criteria.
Mangold (2007) | Strasunskas and Tomassen (2010) | Uren et al. (2007)
Architecture | Architecture | Search environment: large scale, heterogeneity and portability
User context (user's information needs) | Search goal (question answering, ontologies, data) | Query types
Query modification | Search phase | Iterative and exploratory dimensions: refinement, recommendation and reuse
Transparency (transparent/interactive) | User input (keywords, natural language, graphics, formal query or interactive) | Intrinsic problems: understanding, result ranking and matching
Ontology structure | Knowledge richness (taxonomy, thesaurus, ontology) | -
Ontology technology | Ontology encoding (RDFS, OWL, ...) | -
Coupling (ontology-documents tight/low) | - | -
- | Scope (Web, desktop) | -
Table II. Criteria for evaluating semantic retrieval systems
As can be observed, some of the evaluation criteria, such as user context or
search goal, take into account the type of semantic search. Different works distinguish
different types of semantic search. Wei et al. (2008) classify semantic search
research with respect to objectives, methodologies and functionalities:
document-oriented search; entity and knowledge-oriented search; multimedia
information search; relation-centered search; semantic analytics; and mining-based
search. Fazzinga and Lukasiewicz (2010) point out that the evaluation of the
accuracy of a system must depend on its search capacity. The proposals of
Uren et al. (2007) and Strasunskas and Tomassen (2010) reduce the typology
proposed by Wei et al. (2008). Strasunskas and Tomassen (2010) state
that standard IR metrics such as recall and precision are not enough to measure user
satisfaction because of the complexity and the effort needed to use semantic search
tools. Therefore, these authors suggest a holistic evaluation that includes system
quality, ontology quality, query quality, topic complexity and user interaction.
Table III lists the types of semantic search as they are presented in the above
publications.
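For reference, the two standard IR metrics mentioned above can be computed as in the following minimal Python sketch. The document identifiers in the usage note are hypothetical; the definitions are the usual set-based ones (precision is the share of retrieved items that are relevant, recall the share of relevant items that are retrieved).

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

For example, `precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])` counts two hits out of four retrieved and three relevant documents. Neither figure says anything about the effort a user needed to formulate the query, which is the gap the holistic evaluation tries to cover.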
Hence, there is a need to establish criteria to evaluate semantic search engines.
Many of the earlier studies just describe the functionalities of these search engines, but
there is still a need to provide a mechanism to facilitate the comparison in a similar
way to query performance metrics in classical retrieval.
2. Evaluation method
A summary of some problems identified on the semantic web is shown in Table I. As
we observe, all characteristics are qualitative and therefore difficult to measure with
classical information retrieval evaluation methods. In this section, we propose a
method to deal with some criteria scarcely analyzed in previous studies. To this end, we have
selected the DESMET method (Kitchenham, 1996) in order to analyze and evaluate
different types of semantic web retrieval systems (directories and search engines) with
respect to their ability to manage and retrieve semantic documents. The goal is to
clarify whether these semantic system types implement the requirements
discussed in prior studies and whether they deal with the current problems found in the
semantic web.
Wei et al. (2008) | Uren et al. (2007) | Strasunskas and Tomassen (2010) | Fazzinga and Lukasiewicz (2010)
Document-oriented search | Entity search | Information search | Structured languages
Entity and knowledge-oriented search | Relation search | Data search | Keyword-based
Multimedia information search | Parameterized (faceted) search | Question answering | Natural languages
Relation-centered search | - | Ontology retrieval | -
Semantic analytics | - | - | -
Mining-based search | - | - | -
Table III. Types of semantic search
DESMET is a comparative method for performing simple, reliable and impartial
evaluations in software engineering, such as requirement analysis. It is
intended to help an evaluator carry out an evaluation exercise that is unbiased and reliable
(e.g. one that maximizes the chance of identifying the best method/tool). The DESMET method
is context-dependent, which means that we do not expect a specific tool to be the best in
all circumstances. Thus, in this work we do not intend to determine the best retrieval
system but to offer a way to select one semantic system type or another according to
the context. We consider the method adequate because the main evaluation
criteria are functionalities that are difficult to measure in the way classical retrieval
systems are measured. Besides, these web retrieval systems are always evolving, so we favor
methods that can be adapted to changes in functionality. This method enables a
qualitative evaluation of the level of support that various systems provide for the
organization and retrieval of semantic elements.
Following the steps of the DESMET method, first we have identified the specific
circumstances of a context in which ontologies and metadata vocabularies about a
specific subject are retrieved. Second, we have performed a feature analysis, which essentially is an
evaluation based on the identification of requirements and their correspondence to the
characteristics that these specifications support. Finally, we have defined the retrieval
systems to be evaluated and the criteria to evaluate them, and we have assigned the values and
prioritization degrees according to the DESMET method.
2.1 Selecting retrieval systems of semantic documents
We have collected 12 semantic retrieval systems. We have found that retrieval systems
differ in kind and functionality. Consequently, we have classified
them into four types, in order to provide a
comparison framework in which we can analyze the results by group. We propose the
following classification by type of retrieval system and type of document
searched:
• Ontology search engines. These applications crawl the web discovering semantic
web documents. The search engine indexes the ontologies in order to retrieve and
rank the results. Examples are Swoogle (http://swoogle.umbc.edu/), Sindice
(http://sindice.com/) and Watson (http://watson.kmi.open.ac.uk/WatsonWUI/).
• Search engines for metadata. Search engines aimed at retrieving metadata, for
example the Linked Open Vocabularies (LOV) (http://lov.okfn.org/dataset/lov/index.html)
and the DataHub (http://datahub.io/).
• Ontology directories. Ontology catalogues collected by hand. Examples: DAML
Ontology Library (www.daml.org/ontologies/) and Protege Ontologies
(http://protegewiki.stanford.edu/wiki/Protege_Ontology_Library).
• Metadata directories. Metadata catalogues, such as UKOLN metadata resource
(www.ukoln.ac.uk/metadata/resources/), Topic Maps PSIs
(http://psi.mchapman.com/vl/index), RDA vocabulary (http://rdvocab.info/) and the Open
Metadata Registry (http://metadataregistry.org/vocabulary/list.html).
We have excluded some kinds of search engines, such as question-answering systems and
chatbots, because their technology is based on information extraction
instead of metadata description and their KOSs are not public. Although they interact
with a human user, they do not necessarily retrieve semantic documents, but instead
they utilize semantic resources as a natural language processing technique for user
interaction purposes.
2.2 Evaluation criteria for retrieval systems of semantic documents
Tables IV-VI present the set of criteria that we have defined for evaluating the
resources. These characteristics have been selected and refined from the previous
literature and classified into three types of criteria:
(1) Schema management. The related criteria are: interoperability, formalization,
interactivity and semantic framework (Table IV).
(2) Semantic management. Related to the meaning of concepts and their
management; related criteria are: disambiguation, multilingualism, synonymy,
scope, extensibility, reusability, modifiability and language (Table V).
(3) Queries. Concerning the query process and the management of the obtained
results. This category covers sense specification, conceptual queries,
contextual queries and document retrieval (Table VI).
Following the DESMET method, we establish two types of features: simple and
compound. Simple characteristics are those that can be present or absent and can
be assessed using a Boolean scale. Compound characteristics are assessed by the degree to
which they are supported, quantified on an ordinal scale. The characteristics are
identified and prioritized, and we establish the following system for assessing
them with respect to their type and importance:
• Simple types: No (0) and Yes (5).
• Compound types: None (0), Low (1), Medium (3), High/Fundamental (5).
• Importance: Optional (3), Desirable (6) and Obligatory (10).
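The scales above can be written down as small lookup tables. The following Python sketch is only an illustration: the numeric values are those listed above, while the names `SIMPLE_SCALE`, `COMPOUND_SCALE`, `IMPORTANCE` and `assess` are hypothetical and do not come from DESMET or from any tool used in this study.

```python
# Numeric values as listed in the assessment system above.
SIMPLE_SCALE = {"No": 0, "Yes": 5}
COMPOUND_SCALE = {"None": 0, "Low": 1, "Medium": 3, "High": 5}
IMPORTANCE = {"Optional": 3, "Desirable": 6, "Obligatory": 10}

def assess(kind, level, importance):
    """Return (value, weight) for one characteristic.

    kind is "simple" or "compound"; a level that does not belong to the
    scale of that kind is rejected instead of being silently scored.
    """
    scale = SIMPLE_SCALE if kind == "simple" else COMPOUND_SCALE
    if level not in scale:
        raise ValueError(f"{level!r} is not valid for a {kind} characteristic")
    return scale[level], IMPORTANCE[importance]
```

Keeping the two scales separate makes it impossible to record, for instance, a "Medium" level for a characteristic that can only be present or absent.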
3. Results
In the evaluation of the different methods for the retrieval of semantic
documents, we have assigned one value to each of the characteristics. None of the
retrieval systems supports the characteristic of formalization; they all use the schema as it is
defined by the entity responsible for its creation and maintenance. As a consequence,
none of the resources copes with the ambiguity that exists in the
syntactic and semantic representation, that is, none disambiguates the candidate concepts
and properties included in the schema. We have found neither
multilingualism nor sense specification.
Characteristic | Importance/type | Description
Interoperability | Obligatory/simple | Possibility to establish relationships between concepts of different schemas
Formalization | Obligatory/simple | Possibility to realize or improve the formalization of a schema, regarding the management process
Interactivity | Desirable/compound | Possibility that the user participates actively in the schema management, in accordance with the Web 2.0 guidelines
Table IV. Characteristics of schema management for the evaluation of systems for semantic retrieval
In this study, we have obtained criteria to be considered in a semantic retrieval system
instead of answering what system obtains the best results, because in this context
systems are constantly evolving.
3.1 Results of schema management
With respect to interoperability, we have observed that metadata directories do not
support this characteristic. The metadata registries do incorporate one-to-one
Characteristic | Importance/type | Description
Disambiguation | Obligatory/simple | Ability to eliminate structural and semantic ambiguity of concepts, in order to facilitate the conceptual retrieval
Semantic framework | Obligatory/compound | The scope in which the semantic and the conceptual retrieval of concepts is managed. The possible values are: None (0); Local, in the schema (1); Local, with relationships between schemas (3); Global, between schemas that use a shared resource, e.g. an ontology (5)
Multilingualism | Desirable/simple | Possibility to support multiple languages
Synonymy | Obligatory/compound | Possibility to solve problems that arise from different concepts with the same meaning
Scope | Obligatory/simple | The domain in which the semantics of the schemas to be managed are defined. It can be either homogeneous or heterogeneous
Extensibility | Desirable/compound | Possibility to expand the representation of the schema semantics
Reusability | Desirable/compound | Possibility to reuse the representation of the schema semantics
Modifiability | Desirable/compound | Possibility to modify the representation of the schema semantics
Language | Optional/compound | Possibility to represent the language that is used in the formalization of the semantics
Table V. Characteristics of semantic management for the evaluation of systems for semantic retrieval
Characteristic | Importance/type | Description
Sense specification | Obligatory/simple | Possibility to express the concrete meaning of a concept in the query process
Conceptual query | Obligatory/compound | Possibility to perform queries, according to the meaning of the concepts
Contextual query | Obligatory/compound | Possibility to obtain results that derive from the existing relationships between concepts
Document retrieval | Optional/simple | Chance to obtain semantic documents that derive from schemas, as well as the schemas themselves
Table VI. Characteristics of queries for the evaluation of systems for semantic retrieval
relationships between schemas. Some special cases of ontology engines, such as
Watson, analyze relationships between concepts.
Metadata registries and ontology directories often provide extra functionality to the
users so that they can incorporate new resources to the system. Metadata directories,
similar to the ontology engines, are usually closed to user interventions, except for the
query processes.
3.2 Results of semantic management
With respect to the semantic framework, metadata directories do not use the semantics
associated with the concept; rather, they only use the description tokens. In contrast to the
metadata engines, ontology engines and ontology directories use the semantics that
are local to the schema, including relationships with other schemas. Likewise, only these
categories present the characteristics of language and modifiability. The schema
definition language that they use is either XML or RDF. On the other hand, the
correspondence between schemas and the semantic representation model is a
one-to-one relationship, which implies the revision and update of all correspondences.
With respect to synonymy, we have not detected it in metadata directories.
However, we consider it partially covered in the rest of the typologies, because they
support the definition of one-to-one correspondences between concepts.
The scope of retrieval systems is wide and heterogeneous. As an example in the
case of metadata engines, LOV works with 322 vocabulary spaces. This resource
includes statistics such as LOV distribution, LOV popularity and LOD popularity. The
DataHub also includes ratings, but they are scarcely implemented.
Reusability, defined as the ability to reuse the representation of the schema
semantics, is applied only by the ontology directories, through the publication of
one-to-one alignments for possible reuse.
3.3 Queries
Concerning the query process and the management of the obtained results, we analyze
features such as sense specification, conceptual and contextual queries, and document
retrieval. From the point of view of semantic retrieval, which requires differentiating between
polysemic meanings, we have not detected in any of the categories the possibility of
searching for a concrete meaning.
Regarding conceptual queries, metadata directories base the retrieval process on the
syntactic search of labels and attributes. In contrast, metadata engines, ontology
engines and ontology directories extend the search by including meanings and
relationships between concepts in a generic environment, while also permitting
a concrete semantics to be established for the concept to be retrieved.
The possibility of extending queries with related concepts is present in metadata
engines, ontology engines and directories. However, metadata directories do not extend
the results of concepts through their relationships.
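The idea of extending a query through relationships between concepts can be sketched as a bounded traversal of a concept graph. The following Python example is a minimal illustration under assumed data: the concept names, the `relations` mapping and the `depth` parameter are hypothetical, and none of the evaluated systems is claimed to work this way.

```python
def expand_query(concept, relations, depth=1):
    """Breadth-first expansion: the concept plus everything reachable in `depth` hops."""
    seen = {concept}
    frontier = [concept]
    for _ in range(depth):
        next_frontier = []
        for current in frontier:
            for related in relations.get(current, ()):
                if related not in seen:
                    seen.add(related)
                    next_frontier.append(related)
        frontier = next_frontier
    return seen

# Assumed toy relationships between concepts of one or more schemas.
relations = {
    "Person": ["Agent", "Author"],
    "Author": ["Publication"],
}
```

A system without this step, as observed for metadata directories, would search only for `"Person"`; with `depth=1` the query also covers `"Agent"` and `"Author"`, and with `depth=2` it reaches `"Publication"` as well.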
Finally, with respect to document retrieval, schema, metadata and ontology
directories only permit schema retrieval, while the corresponding engines permit the
retrieval of documents that are instances of these schemas.
For each characteristic, we have obtained the product of the assigned value by the
factor of importance. Once the weighted values of each system are calculated, we
calculate the aggregate percentages for each category, in order to facilitate their
interpretation. More specifically, for each of the defined categories (Schema
management, Semantic management and Query), we have summed the values of
their characteristics and calculated the percentage of this
sum over the maximum possible value, which would correspond to 100 percent. In
Figure 1, we present the results of the evaluation of each method, expressed as
percentages and grouped by category.
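The aggregation just described can be sketched in a few lines of Python. The scale values and importance factors are those defined in Section 2.2; the function name and the sample assessment are hypothetical and are not the values assigned to any system in this study.

```python
# Scale values and importance factors as defined in Section 2.2.
VALUE = {"No": 0, "Yes": 5, "None": 0, "Low": 1, "Medium": 3, "High": 5}
IMPORTANCE = {"Optional": 3, "Desirable": 6, "Obligatory": 10}
MAX_VALUE = 5  # highest value on both the simple and the compound scale

def category_percentage(assessments):
    """Weighted sum over the maximum attainable weighted sum, as a percentage.

    assessments is a list of (level, importance) pairs for one category.
    """
    achieved = sum(VALUE[level] * IMPORTANCE[imp] for level, imp in assessments)
    maximum = sum(MAX_VALUE * IMPORTANCE[imp] for _, imp in assessments)
    return 100.0 * achieved / maximum

# Hypothetical assessment of one system on the three Table IV characteristics.
schema_management = [
    ("Yes", "Obligatory"),  # interoperability (simple)
    ("No", "Obligatory"),   # formalization (simple)
    ("Low", "Desirable"),   # interactivity (compound)
]
print(round(category_percentage(schema_management), 1))
```

In this sample the weighted sum is 50 + 0 + 6 = 56 out of a maximum of 50 + 50 + 30 = 130, so an obligatory characteristic that is absent costs far more than a desirable one that is only weakly supported.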
In the schema management category, the metadata search engines and the ontology
search engines and directories obtain the highest results (43.1). The ontology
directories obtain this result mainly because they promote the participation of the user
and support the definition of relationships between schema elements. The ontology
search engines are positioned just below them due to their lesser degree of interactivity
with the user. The rest of the methods obtain noticeably lower values, as a result of the
lack of support for the management of correspondences between elements, as well as a
lesser degree of interactivity with the user.
For the semantic management category (Figure 1), the ontology directories and the
ontology search engines obtain the best results (53.4). In this case, they stand out for the
management of relationships between concepts; their application scope, which is heterogeneous
with respect to the knowledge domain; the modifiability of the solution; and the
semantic representation language employed. The lower value for metadata
search engines (43.3) is caused by the fact that these engines deal with a more restricted
scope and use languages with less semantic expressivity for
representation. The metadata directories obtain the lowest value (6.1), a fact that can be
attributed mainly to the restricted nature of the application environment.
In the Query category (Figure 1), the ontology search engines and the metadata search
engines obtain the best results (57.6), mainly due to their ability to perform contextualized
conceptual queries, as well as the possibility to obtain semantic documents. The next
value corresponds to the ontology directories (36.4). The decrease of their score is due to
the impossibility to obtain documents that are associated to the schemas. The decrease of
the rest of the values is caused by the absence of contextualization of the results and the
local use of the schema semantics. As a result, the previous points cause the schema
directories and the metadata directories to get lower values.
The overall results of the evaluation of the methods (Figure 1) show that the
ontology directories achieve a good score (46.8), resulting from a positive evaluation
Figure 1.
Results of the evaluation
of each system grouped
by category
regarding the schema management and semantic management categories. Proceeding
in descending order, the ontology search engines (52.4) owe this result to the positive
evaluation of the query and semantic management categories. In the case of metadata
search engines (47.0), the obtained assessment arises from the tradeoff between the
semantic management and a good schema and query management. In last place, we
find the metadata directories (7.1), due to shortcomings in the support of all the
evaluated categories.
4. Discussion
In this work, the definition of a semantic document is extended to other schemas and
codifications that contain a semantic description of document content. Standards like
topic maps or OWL can be represented with XML Schema and without the use of RDF.
Rigorous studies of this field must not be limited to the retrieval, maintenance and storage
of RDF documents only. Our main motivation is that this type of semantic document
constitutes a key issue for the semantic description of other resources. Since these
vocabularies are considered semantic documents, they must be retrievable by a
semantic search engine. Nevertheless, it is true that the semantic web community
prefers open standards, like OWL or RDFS, to alternatives with proprietary
encoding formats or the results of open academic experiments (Strasunskas and Tomassen,
2010).
Wei et al. (2008) have stressed the need to develop a formalized semantic search
framework. We believe that a desirable characteristic of semantic search is to
integrate this framework into the semantic web evaluation procedures. As an example
of the challenges that arise on the semantic web, we can take a closer look at the
linked data proposals and their element sets and value vocabularies. The first problem
that arises is the linking architecture approach. There are scalability problems when
connecting resources using one-to-one mappings between vocabularies. In fact, if we
compute the potential number of alignments in the whole set of vocabularies taking
two at a time, we obtain $\frac{n!}{2!\,(n-2)!} = \frac{n(n-1)}{2}$, where $n$ is the number of
vocabularies. The W3C Library Linked Data Incubator Group has undertaken a great
effort to collect and classify value vocabularies and metadata element sets in order to
decrease these possibilities, but it is a long-term project. Another solution is a
unique central resource connected to the rest of the vocabularies, which would
result in $n-1$ mappings between all possible concepts. But updating problems would
still remain: what effects do updates in a vocabulary's hierarchy, or corrections in
descriptions due to ambiguities, have? Besides, we have taken into account that values for
the elements can be drawn not just from value vocabularies but even from free
text. Finally, there are difficulties in identifying and adapting
vocabularies. There is a need to select the
most appropriate vocabulary among the candidates while leaving open the possibility of adapting
it to the resource without modifying the original vocabulary, thus reducing possible
ambiguities.
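The difference between the two linking architectures can be made concrete with a short Python sketch. The function names are hypothetical; the value n = 322 reuses the number of LOV vocabulary spaces reported in Section 3.2, purely as an example of scale.

```python
from math import comb

def pairwise_mappings(n):
    """Number of one-to-one alignments when every pair of vocabularies is linked."""
    return comb(n, 2)

def hub_mappings(n):
    """Number of alignments when one central resource is linked to all the others."""
    return max(n - 1, 0)

n = 322  # vocabulary spaces reported for LOV in Section 3.2
print(pairwise_mappings(n), hub_mappings(n))
```

For 322 vocabularies the pairwise approach requires 51,681 alignments against 321 for the central resource. The sketch counts mappings only; it says nothing about the cost of keeping either set up to date, which remains a problem in both architectures.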
Improving the process of identifying the essential functionalities, such as usability,
in the implementation of a semantic retrieval system is critical for popularizing
these resources (Morato et al., 2012). Interaction with a larger set of users is an essential
element that will help the semantic web and linked data technologies to realize an
even greater degree of their potential.
5. Conclusions
We have performed an evaluation of methods for semantic document retrieval. The
results of the evaluation indicate that, at the moment, many of the resources for
semantic document retrieval lack the minimum functionality needed to popularize
their use. Some of the difficulties can be identified in the ambiguity and lack of
formalization of resource descriptions, difficulties in usability and operation, the isolation
of datasets, which impedes more exhaustive searches, and the difficulty of carrying out
conceptual searches and navigation.
As the analysis has shown, only the ontology directories score barely over 50
percent in the evaluation. In our judgment, the current methods need to cover
characteristics that are essential for the management of semantic documents. Such
characteristics include the formalization of the documents, their disambiguation,
multiple language support and the semantic coverage of queries.
There are many problems that are difficult to measure. If we observe metadata
vocabularies, we realize that selecting the right vocabularies is a tough task due to the
large number of vocabularies in the cloud. The absence of URIs, the low usability and
the lack of consensus between overlapping vocabularies are difficulties that we have
to overcome to facilitate user access to semantic web resources.
In this study we have proposed a mechanism to facilitate comparison in a
way similar to query performance metrics in classical retrieval. Previous studies have
emphasized a descriptive approach to evaluating semantic search engines. We propose
an approach that gives a weight to each evaluation criterion, facilitating future
comparisons.
As work in progress, we are studying how to identify criteria related to
trustworthiness and link quality. Although some search engines
have included statistics to guide the user in the selection of a vocabulary, there is
a lack of studies showing the real importance of these data for user behavior.
References
Allemang, D. and Hendler, J. (2011), RDF-The basis of the Semantic Web, Semantic Web for the
Working Ontologist, 2nd ed., Morgan Kaufmann, Burlington, MA, pp. 27-50.
Amer-yahia, S. and Lalmas, M. (2006), XML Search: languages, Inex and Scoring, SIGMOD
Rec., Vol. 36 No. 7, pp. 16-23.
Andago, M.O., Phoebe, T. and Thanoun, B.A.M. (2010), Evaluation of a semantic search engine
against a keyword search engine using first 20 precision, Intern. Journal for the Advanc.
of Science & Arts, Vol. 1 No. 2, pp. 55-63.
Batzios, A. and Mitkas, P.A. (2012), WebOWL: A semantic web search engine development
experiment, Expert Systems with Applications, Vol. 39 No. 5, pp. 5052-5060.
Bechhofer, S., Ainsworth, J., Bhagat, J., Buchan, I., Couch, P., Cruickshank, D., Delderfield, M.,
Dunlop, I., Gamble, M., Goble, C., Michaelides, D., Missier, P., Owen, S., Newman, D.,
De Roure, D. and Su, S. (2010), Why linked data is not enough for scientists,
IEEE International Conference on eScience, IEEE Sixth International Conference on
e-Science, pp. 300-307.
Bikakis, N., Tsinaraki, C., Gioldasis, N., Stavrakantonakis, I. and Christodoulakis, S. (2013),
The XML and semantic web worlds: technologies, interoperability and integration: a
survey of the state of the art, in Anagnostopoulos, I.E. (Ed.), Semantic Hyper/Multimedia
Adaptation, Vol. 418, Springer, Berlin, pp. 319-360.
Bizer, C., Heath, T. and Berners-Lee, T. (2009), Linked data - the story so far, International
Journal on Semantic Web and Information Systems, Vol. 5 No. 3, pp. 1-22.
Buscaldi, D., Guerrini, G., Mesiti, M. and Rosso, P. (2003), Tag semantics for the retrieval of
XML documents, Proceedings of the 1st International Symposium on Information and
Communication Technologies, 24-26 September, Dublin, Ireland, Trinity College Dublin,
Dublin, pp. 273-278.
Ding, L., Finin, T., Peng, Y., Pinheiro da Silva, P. and McGuinness, D. (2005a), Tracking RDF
Graph Provenance using RDF Molecules, report TR-CS-05-06, Computer Science and
Electrical Engineering, University of Maryland, Baltimore County, April 30.
Ding, L., Pan, R., Finin, T., Joshi, A., Peng, Y. and Kolari, P. (2005), Finding and ranking
knowledge on the semantic web, Proceedings of the 4th International Semantic Web
Conference.
Esmaili, K.S. and Abolhassani, H. (2006), A categorization scheme for semantic web search
engine, AICCSA '06 Proceedings of the IEEE International Conference on Computer
Systems and Applications, IEEE Computer Society, Washington DC, pp. 171-178.
Fazzinga, B. and Lukasiewicz, T. (2010), Semantic search on the web, Semantic Web, Vol. 1,
pp. 89-96.
Fensel, D., Angele, J., Decker, S., Erdmann, M., Schnurr, H.-P., Staab, S., Studer, R. and Witt, A.
(1999), On2broker: Semantic-Based Access to Information Sources at the WWW,
Proc.WWW Conf., Internet (WebNet 99), Honolulu, 25-30 October, pp. 25-30.
Fuentes, D. and Mejía, A. (2013), Cycle management in semantic similarity among Wikipedia's
concepts, degree thesis, University Carlos III.
Girit, H., Eberhard, R., Michelberger, B. and Mutschler, B. (2012), On the precision of search
engines: results from a controlled experiment, 15th Int. Conf. on Business Information
Systems (BIS 2012), pp. 260-271.
Guha, R., McCool, R. and Miller, E. (2003), Semantic search, WWW2003, Budapest, pp. 700-709.
Hogan, A., Harth, A., Umbrich, J., Kinsella, S., Polleres, A. and Decker, S. (2011), Searching and
browsing linked data with SWSE: the semantic web search engine, Web Semantics:
Science, Services and Agents on the World Wide Web, Vol. 9 No. 4, pp. 365-401.
Howarth, L.C. (2000), Creating a metadata-enabled framework for resource, available at:
www.cais-acsi.ca/proceedings/2000/howarth_2000.pdf (accessed January 2013).
Jain, P., Hitzler, P., Yeh, P.Z., Verma, K. and Sheth, A.P. (2010), Linked data is merely more
data, in Brickley, D., Chaudhri, V.K., Halpin, H. and McGuinness, D. (Eds), Linked Data
Meets Artificial Intelligence. Tech. Rep. SS-10-07, AAAI Press, Menlo Park, CA, pp. 82-86.
Kitchenham, B. (1996), DESMET: a method for evaluating software engineering methods and
tools, available at: www.osel.co.uk/desmet.pdf (accessed June 2012).
Kobilarov, G., Scott, T., Raimond, Y., Oliver, S., Sizemore, C., Smethurst, M. and Lee, R. (2009),
Media meets semantic web - how the BBC uses DBpedia and linked data to make
connections, European Semantic Web Conf. Semantic Web in Use Track, Crete.
Lanthaler, M. and Gutl, C. (2012), On using JSON-LD to create evolvable RESTful services,
Proc. 3rd Internat. Workshop on RESTful Design (WS-REST 2012) at WWW2012,
ACM Press, Lyon, pp. 25-32.
Mangold, C. (2007), A survey and classification of semantic search approaches, Int. J. Metadata
Semant. Ontologies, Vol. 2 No. 1, pp. 23-34.
Milicic, V. (2011), Introducing hypernotation, an alternative to linked data, available at:
http://milicicvuk.com/blog/category/hypernotation/ (accessed August 2013).
Morato, J., Sanchez-Cuadrado, S., Fraga, A. and Moreno Pelayo, V. (2007), Towards a social
semantic web environment, El Profesional de la Información, Vol. 17 No. 1, pp. 78-85.
Morato, J., Fraga, A., Andreadakis, Y. and Sanchez-Cuadrado, S. (2012), Eight steps towards the
socialisation of the semantic web, International Journal of Social and Humanistic
Computing, Vol. 1 No. 4, pp. 347-362.
Nesic, S. (2010), Semantic document architecture for desktop data integration and
management, PhD thesis, Univ. Svizzera Italiana.
Palacios, V. (2010), Sistema de recuperación conceptual mediante niveles semánticos en la
representación de esquemas de metadatos [Conceptual retrieval system using semantic
levels in the representation of metadata schemas], PhD thesis, Carlos III University, available at:
http://hdl.handle.net/10016/9332 (accessed June 2012).
Pastor-Sánchez, J.A., Martínez-Méndez, F. and Rodríguez-Muñoz, J.V. (2012), SKOS application
for interoperability of controlled vocabularies in the field of linked open data,
El Profesional de la Información, Vol. 21 No. 3, pp. 245-253.
Research Group Data and Web Science (University of Mannheim) (n.d.), Datasets in the next
LOD Cloud, available at: http://wifo5-03.informatik.uni-mannheim.de/lodcloud/ (accessed
January 2013).
Strasunskas, D. and Tomassen, S.L. (2010), On variety of semantic search systems and their
evaluation methods, Proceedings of the International Conference on Information
Management and Evaluation, Academic Conferences Publishing, Cape Town, pp. 380-387.
Scheir, P., Pammer, V. and Lindstaedt, S.N. (2007), Information retrieval on the semantic web -
does it exist?, Proceedings of Lernen-Wissen-Adaption (LWA) 2007, pp. 252-257.
Tumer, D., Shah, M.A. and Bitirim, Y. (2009), An empirical evaluation on semantic search
performance of keyword-based and semantic search engines: Google, Yahoo, Msn and
Hakia, 2009, 4th International Conference on Internet Monitoring and Protection (ICIMP
09), available at: http://doi.ieeecomputersociety.org/10.1109/ICIMP.2009.16 (accessed
August 2013).
Uren, V., Lei, Y., Lopez, V., Liu, H., Motta, E. and Giordanino, M. (2007), The usability of
semantic search tools: a review, The Knowledge Engineering Review, Vol. 22 No. 4,
pp. 361-377.
W3C (2007), Semantic Annotations for WSDL and XML Schema, Farrell, J. and Lausen, H. W3C
Recommendation 28 August 2007, available at: www.w3.org/TR/sawsdl/ (accessed
January 2013).
W3C (2010), XML Linking Language (XLink) Version 1.1. DeRose, S., Maler, E., Orchard,
D. and Walsh, N. W3C Recommendation 06 May 2010, available at: www.w3.org/TR/xlink11/
(accessed January 2013).
W3C (2011), Library Linked Data Incubator Group Final Report, Baker, T. et al. W3C Incubator
Group Report 25 October 2011, available at: www.w3.org/2005/Incubator/lld/XGR-lld-20111025/
(accessed January 2013).
W3C (2012), OWL 2 Web Ontology Language XML Serialization (second edition), Motik, B.,
Parsia, B. and Patel-Schneider, P.F. W3C Recommendation 11 December 2012, available
at: www.w3.org/TR/owl-xml-serialization (accessed January 2013).
W3C (2013), Ontologies available at: www.w3.org/standards/semanticweb/ontology/ (accessed
January 2013).
Waitelonis, J. and Sack, H. (2009), Augmenting video search with linked open data,
Proceedings of International Conference on Semantic Systems 2009 (i-semantics 2009),
2-4 September, Graz, Verlag der TU Graz, Austria.
Wei, W., Barnaghi, P.M. and Bargiela, A. (2008), Search with meanings: an overview of semantic
search systems, Int. J. Commun. SIWN, No. 3, pp. 76-82.
About the authors
Jorge Luis Morato is currently a Professor of Information Science in the Department of
Informatics at the Carlos III University of Madrid (Spain). In 1999, he received his PhD in Library
Science from Carlos III University. Jorge Luis Morato is the corresponding author and can be
contacted at: jorge.morato@gmail.com
Sonia Sanchez-Cuadrado works as an Assistant Professor in the Department of Informatics at
Carlos III University of Madrid. In 2007, she received her PhD in Library Science and Digital
Environment, designing a methodology for the automatic construction of knowledge
organization systems and NLP.
Christos Dimou is a Visiting Lecturer at the Department of Informatics, at the Carlos III
University of Madrid. In 2010, he obtained his PhD in Electrical and Computer Engineering from
Aristotle University of Thessaloniki, Greece, defining a framework for the performance
evaluation of software agents. His research interests include requirements engineering, software
agents and information retrieval.
Divakar Yadav has been an Assistant Professor in the Department of Computer Science and
Engineering at Jaypee Institute of Information Technology, Noida and Carlos III University for
the last 12 years. His areas of interest include information retrieval, soft computing and
operating systems. He has participated in, reviewed for and organized many international and
national conferences. He received his PhD in Computer Science and Engineering in 2010.
Vicente Palacios is currently working as a Systems Engineer at the Carlos III University of
Madrid, where he is also a lecturer in Software Processes and Advanced Software Design.