CEN WORKSHOP AGREEMENT
CWA - - - - -
Final 2009-06-03
ICS Number
English version
This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, the constitution
of which is indicated in the foreword of this Workshop Agreement.
The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National
Members of CEN but neither the National Members of CEN nor the CEN Management Centre can be held accountable for the technical
content of this CEN Workshop Agreement or possible conflicts with standards or legislation.
This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members.
This CEN Workshop Agreement is publicly available as a reference document from the CEN Members National Standard Bodies.
CEN Members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Poland,
Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, and United Kingdom.
© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN National Members.
Contents
Foreword
Executive summary
Summary of recommendations
Overall recommendations
List of recommendations on different topics
1 Scope
2 Normative references
3 Abbreviations, terms and definitions
3.1 Abbreviations
3.2 Terms and definitions
4 Methodology and thematic overview
4.1 Thematic circle
4.2 Topics
4.2.1 Semantics
4.2.2 Data transformation
4.2.3 Process handling
4.2.4 Metasearch
4.2.5 Object identification
4.3 Cross-cutting concerns / Prerequisites
4.3.1 Legal aspects
4.3.2 Multiculturalism
4.3.3 Business models
4.3.4 Technology
5 Case study
5.1 The processes
5.1.1 The actors
5.1.2 Consumer process
5.1.3 Travel-related professional process
5.2 The information and communication technologies
5.2.1 Multiple levels of data sources
5.2.2 Type of information
5.2.3 Type of data sources
6 Semantics
6.1 Standards
6.1.1 Needs and requirements
6.1.1.1 Introduction
6.1.1.2 Needs
6.1.1.3 Requirements
6.1.2 State of the art
6.1.2.1 Types of standards
6.1.2.2 List of travel industry standards, companies and organizations (examples)
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 3
7.2.4 Recommendations
7.2.4.1 Short-term recommendations (1–3 years)
7.2.4.2 Long-term recommendations (3–10 years)
7.3 Automatic information extraction
7.3.1 Needs and requirements
7.3.1.1 Needs
7.3.1.2 Requirements
7.3.2 State of the art
7.3.2.1 Named entity recognition
7.3.2.2 Event extraction
7.3.2.3 Tourism-specific information extraction
7.3.3 Gaps and future needs
7.3.3.1 Named entity recognition
7.3.3.2 Event extraction
7.3.3.3 Tourism-specific information extraction
7.3.4 Recommendations
7.3.4.1 Short-term recommendations (1–3 years)
7.3.4.2 Long-term recommendations (3–10 years)
7.4 Inter-ontology mapping
7.4.1 Needs and requirements
7.4.1.1 Introduction
7.4.1.2 Needs
7.4.1.3 Requirements
7.4.2 State of the art
7.4.3 Gaps and future needs
7.4.4 Recommendations
7.4.4.1 Short-term recommendations (1–3 years)
7.4.4.2 Long-term recommendations (3–10 years)
8 Process handling
8.1 Needs and requirements
8.1.1 Introduction
8.1.2 Needs
8.1.3 Requirements
8.2 State of the art
8.2.1 Global standardization efforts
8.2.2 Application Integration and APIs
8.3 Gaps and future needs
8.4 Recommendations
8.4.1 Short-term recommendations (1–3 years)
8.4.2 Long-term recommendations (3–10 years)
9 Metasearch
9.1 Methodology
9.1.1 Needs and requirements
9.1.1.1 Introduction
9.1.1.2 Quality of results
Foreword
The objective of the Workshop CEN/ISSS WS/eTOUR on “Harmonization of data
interchange in tourism” and the production of this draft CEN Workshop Agreement
(CWA) was approved by the Workshop at its plenary meeting held in Brussels on 6
February 2008.
This final version of the CWA was approved by letter ballot following the final
Workshop meeting on 15 May 2009.
The Secretary of the Workshop has been Håvard Hjulstad, Standards Norway.
Executive summary
Problem statement
Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing
and online sales (B2C). Yet, in a ranking of various sectors, the tourism industry
achieves only a mid-level score in the overall use of ICT and eBusiness. It still lags
behind especially as regards the deployment of ICT infrastructure and the adoption of
e-integrated business processes [eBusiness W@tch Report 2006/2007, p 167]. At
the same time, tourism is an important and growing sector of the European economy,
with a large presence of SMEs.
Approach
Data interchange has two key components: The electronic data itself and the
exchange of data between two or more tasks in larger process chains. This hinges on
the ability of all tasks to understand the data they are supposed to consume – i.e.
data interoperability – and of processes to be able to meaningfully cooperate –
process interoperability. This draft CWA thus circles around the two core issues
“data” and “processes” and related challenges in the domain that need deeper
analysis. In particular, we have identified five topics for further analysis which are
briefly outlined below: “semantics”, “data transformation”, “process handling”,
“metasearch” and “object identification”.
These five topics are placed in the larger context of four cross-cutting concerns that
permeate all of them. Tourism transactions on the one hand regularly transcend
national and cultural boundaries and frequently involve both very small and very
large players. On the other hand, very many of the parameters – rating systems for
accommodation, opening hours of sites, classification of beaches – are regulated
nationally or even regionally and reflect cultural preferences. All transactions must
naturally follow pertinent national or regional laws and regulations. This leads to the
four cross-cutting concerns “Legal aspects”, “Multiculturalism”, “Business models”,
and “Technology”.
Semantics
Shared semantics enables the flexible integration of heterogeneous data structures
from a wide range of data sources. As such, it is also a central requirement for
building flexible, cross-organizational process chains.
Data transformation
The co-existence of many different data formats already implies the need to
transform data during data exchange. Such mappings can affect data structures on
several different levels.
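By way of illustration, the following sketch maps one provider's record layout onto another's. All field names, the category codes and the conversion factor are invented for the example and are not taken from any tourism standard:

```python
# Minimal data-transformation sketch: map one (hypothetical) provider's
# record layout onto another's. Field names are purely illustrative.

def transform_hotel_record(source: dict) -> dict:
    """Map a source record to a target schema, converting codes and units."""
    return {
        "hotelName": source["name"],
        # Structural mapping: two source fields merge into one target field.
        "location": f'{source["city"]}, {source["country_code"]}',
        # Value-level mapping: a local category code becomes a star count.
        "stars": {"T": 3, "F": 4, "L": 5}[source["category"]],
        # Unit conversion on the data level (square feet to square metres).
        "roomSizeSqm": round(source["room_size_sqft"] * 0.09290304, 1),
    }

record = {"name": "Hotel Example", "city": "Ghent", "country_code": "BE",
          "category": "F", "room_size_sqft": 250}
print(transform_hotel_record(record))
```

Real transformations of this kind are often expressed declaratively (e.g. in XSLT for XML data) rather than in code, but the levels involved are the same: structure, value sets, and units.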
Process handling
The World Wide Web has significantly boosted the use of ICT in the tourism industry
and empowered customers to make travel arrangements autonomously by the use of
a wide variety of different data sources. This requires the seamless interplay of
different computer systems, allowing new online services like dynamic packaging of
tourism products.
Metasearch
Metasearch proper builds on shared semantics and data transformation to enable
searches across the individual search components of heterogeneous websites and to
aggregate the results in a unified list. From a user's perspective, metasearch engines
thus offer a one-stop entry point to a specific type of information; from a technology
perspective, they place high demands on distributed data querying.
Object identification
Electronic transactions often hinge on being able to uniquely identify the objects on
which they operate. In contrast to, for example, flights, many types of objects in
tourism do not have a unique identifier. There is at present no universally accepted
scheme to identify, say, a given hotel that is to be booked, or to compare different
offers for the same hotel.
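In the absence of an agreed scheme, systems typically fall back on heuristic matching. The sketch below shows one purely hypothetical matching key, combining a normalized name with coarse geo-coordinates; it illustrates the problem, and is in no way a proposed identifier standard:

```python
# Purely illustrative: a heuristic matching key for hotels, built from a
# normalized name plus coarse coordinates. NOT a proposed standard.
import re
import unicodedata

def hotel_match_key(name: str, lat: float, lon: float) -> str:
    # Strip accents and non-alphanumeric characters from the name.
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode())
    norm = re.sub(r"[^a-z0-9]", "", ascii_name.lower())
    # Round coordinates to ~100 m so minor data differences still match.
    return f"{norm}:{round(lat, 3)}:{round(lon, 3)}"

# Two offers for the same hotel from different sources yield the same key.
a = hotel_match_key("Hôtel Métropole", 50.8452, 4.3580)
b = hotel_match_key("Hotel Metropole", 50.84521, 4.35803)
print(a == b)  # True
```

Such heuristics inevitably produce false matches and misses, which is precisely why a universally accepted identification scheme would be valuable.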
formats from the cultural heritage and the tourism sector and is confronted very much
with the same challenges as discussed in the workshop report.
“Mediation” has been identified as the key concept for reaching interoperability in a
highly fragmented and diversified area like the tourism industry. This best practice
case demonstrates how interoperability can easily be reached through data
mediation, while leaving each partner enough flexibility to define its own data format.
Recommendations
The workshop came up with a number of recommendations that are all centred
around the basic idea to deal with the diversity of existing standards, technologies,
projects, and entities – rather than bringing another standard to the market. The
keywords in this context are harmonization and mediation.
Summary of recommendations
Overall recommendations
The workshop came up with a number of recommendations that are all centred
around the basic idea to deal with the diversity of existing standards, technologies,
projects, and entities – rather than bringing another standard to the market. The
keywords in this context are harmonization and mediation. As desirable as it may
seem to unify terms and standards to allow easy exchange of information and
execution of processes, it is just as important to leave the market the flexibility and
diversity to define data schemas. Instead, ways should be found to mediate between
the different approaches. The tourism sector has come up with a broad spectrum of
different standards, and for various reasons it will be difficult, if not impossible, to
replace them.
Another reason is the play of market forces, which makes it difficult to reach
consensus on the issues involved. Unlike in many other industries, such as the
construction industry, having different standards appears to bring more advantages
than the resulting lack of interoperability brings disadvantages. This can be observed
in the area of destination management as well as on the side of tour operators. The
need for standardization is nonetheless recognized, as can be seen from various
industry associations and forums. Yet strong resistance can be observed whenever
approaches for European or worldwide standards are discussed.
Beyond the detailed recommendations listed in the chapters below, a general
approach is therefore suggested: to harmonize (keeping differences to a minimum)
and to mediate (enabling understanding across the remaining differences) between
existing formats and standards. This approach must be flexible, easy to use and
cost-effective, as is the case, for example, in the euromuse.net project, which is
described as the best practice case. These criteria are critical to the success of the
approach, since the tourism sector is characterized by a large number of small and
medium-sized organizations.
The mediation approach should in no way be taken as an invitation to establish as
many isolated new standards as possible. Rather, anyone starting to create
something new should carefully take existing standards and approaches into
account, so that later mediation between them remains as easy as possible,
deviating from other standards only where absolutely required.
object identification. In addition, it could keep track of technologies and projects that
ease the problem of data and process interoperability, and come up with
recommendations on interoperability approaches and best practices for data models.
At the same time, the watchtower could operate a central data mediation service
between the recognized standards in the field.
All these recommendations aim at keeping the diversity and flexibility of the
European eTourism landscape, while allowing process and data interoperability so
that the actors involved can achieve a higher level of e-integration.
Standards
Short-term recommendations
Long-term recommendations
Lower the entry barrier for participation in pertinent formal and informal
standardization bodies especially for SMEs and extend the scope of those
activities to cover the requirements of SMEs.
Work on interoperability approaches between different standards.
Taxonomies
Short-term recommendations
Long-term recommendations
Ontologies
Short-term recommendations
Long-term recommendations
Use semantic web technologies (e.g. based on RDF URIs) to name and
represent (data) resources on the Web so that mapping can be automatically
undertaken.
Agree on the degree of formality with which information ought to be defined, so
that automatic mapping tools can compare information.
Ontologies should be developed on different abstraction levels. Agreed high-
level ontologies should be in place and should be used when defining domain
ontologies. General domain ontologies should in turn be reused when more
specific sub-domain ontologies are defined.
Short-term recommendations
Long-term recommendations
Together with a recognized body such as the W3C, agree on the names that
ought to be used for tags representing particular tourism content and that are
valid for search engines.
Develop software that enables (semi-)automatic information annotation
according to the previous recommendation.
Inter-ontology mapping
Short-term recommendations
Long-term recommendations
Process handling
Short-term recommendations
Long-term recommendations
Metasearch methodology
Short-term recommendations
Long-term recommendations
Querying
Short-term recommendations
Long-term recommendations
Research technologies for flexible and adaptive query methods that are able to
understand the semantics of a web repository and send an appropriate query.
Object identification
Short-term recommendations
Long-term recommendations
1 Scope
The CEN/ISSS Workshop on eTourism aims at producing guidelines for reaching
global interoperability, i.e. enabling seamless data interchange and execution of
eBusiness processes in the tourism sector.
The CWA will cover the following topics under a pan-European interoperability
perspective:
a. analysis and identification of the needs of B2B and B2C partners for
harmonized data interchange;
b. analysis of the gaps in the design of current interoperability approaches;
c. description of the metadata and principles and requirements for data
modelling;
d. analysis of business models and legal issues (IPRs, DRM, personal data
protection and privacy);
e. analysis of existing initiatives and approaches for flexible harmonization and
global interoperability (including process interoperability);
f. recommendations concerning a general framework for eTourism related
information exchange;
g. best practice case.
The CEN Workshop will focus on data integration and discovery as well as the
seamless execution of eBusiness processes. Application of the above will support
end-user satisfaction and consumption of travel products, and increase data
reliability, revenue generation and margin contribution, motivating early adoption and
roll-out to the market.
2 Normative references
The following normative documents (European and International Standards) are
referenced in this document. Other documents of interest are listed in the
Bibliography.
R
RDF — resource description framework
RDFS — resource description framework schema
RMSIG — Reference Model Special Interest Group (under IFITT)
S
SCORM — Sharable Content Object Reference Model
SHOE — simple HTML ontology extensions
SME — small and medium enterprises
SOA — service-oriented architectures
SQL — standardized query language
T
TCP/IP — Transmission Control Protocol / Internet Protocol
TGV — train grande vitesse: high-speed train
U
UCS — universal character set (ISO/IEC 10646)
UNWTO — World Tourism Organization
URI — uniform resource identifier
W
W3C — World Wide Web Consortium
WAI — Web Accessibility Initiative
WSMO — web service modeling ontology
WWW — World Wide Web
X
XFT — exchange for travel
XHTML — extensible HTML
XML — extensible markup language
XSLT — extensible stylesheet language transformation
Tourism is an important and growing sector of the European economy, with a large
presence of SMEs. ICT is an enabler to strengthen efficiency, reduce costs and
improve the competitiveness of the industry. Tourism is expected to contribute 8.4 %
of total employment and 9.9 % of GDP worldwide [World Travel and Tourism
Council, 2008, p 4].
For these reasons, it is important that companies and associations in the tourism
sector understand the benefits they can reap from eBusiness, enhance their ICT
infrastructure, and adopt eBusiness processes.
In eBusiness implementations the tourism sector has some specificities. Data quality
and reliability are critical issues (e.g. updated opening hours for a museum, reliable
on-line booking). Other critical issues are territorial definition and coordination
between regional or local groups and national sites. Commercial information (B2B,
B2C, B2G) and “touristic information” (information to the end user, G2C) are both
concerned. All involved parties provide information at different levels (e.g.
government – travel warning; B2C the mentioned opening hours, B2B distribution
prices and their meanings). These specificities lead to a high degree of heterogeneity
in tourism. Tourism market structures are complex and highly fragmented.
Information interchange on the level of processes and data structures is not
harmonized and the electronic execution of business processes on a global level is
still burdened by heterogeneous interfaces and data structures.
Figure 4-1
This hinges on the ability of all tasks to understand the data they are supposed to
consume – i.e. data interoperability – and of processes to be able to meaningfully
cooperate – process interoperability. Our report thus circles around the two key
concepts of data and processes; see figure 4-2.
Figure 4-2
The circle captures the relationship between data and process interoperability with
key enablers that need deeper analysis. In particular, we have identified the following
topics for further analysis:
Semantics
Data transformation
Process handling
Metasearch
Object identification
The topics are placed in the larger context of four cross-cutting concerns that
permeate all of them. Tourism transactions on the one hand regularly transcend
national and cultural boundaries and frequently involve both very small and very
large players. On the other hand, many of the parameters – rating systems for
accommodation, opening hours of sites, classification of beaches – are nationally or
even regionally regulated or reflect cultural preferences. All transactions must
naturally follow pertinent national or regional laws and regulations.
Processes will in particular be implemented in line with the process owner’s overall
business model. The data structures will similarly often be dictated by the owner’s
value proposition. Furthermore, both data and processes will at least to a degree
reflect the technology – software, hardware, overall connectivity etc. – on which the
system in question operates.
4.2 Topics
The following subsections briefly present each of the selected topics, give a bird's-
eye view, and motivate the rationale for their choice. The remainder of the report
then examines the issues methodically and in more detail.
4.2.1 Semantics
The meaning and structure of data is at the heart of data interoperability – and, given
the plethora of pertinent formats, it is unfortunately a complex problem. Differences
on the syntactic level – say, XML messages versus comma-separated files or EDI-
type communications – already affect how much semantics the data carries in itself.
Formal or informal standards can externally assign meaning to an otherwise
meaningless data set (say, to an otherwise arbitrary sequence of fields in the rows of
a CSV file), or explicate the semantics of XML structures that are already partially
self-explanatory to humans.
Taxonomies can help to unambiguously specify possible value sets for the data,
ideally combined with specific definitions of the individual options and their
relationship to others. Ontologies can then reference and use these value sets in
properties of classes, going a long way further towards specifying the exact
semantics of data.
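The contrast between externally assigned and partially self-describing meaning can be sketched as follows. The record, its field names and field order are invented for illustration:

```python
# The same record as CSV (meaning assigned only by an external convention)
# and as XML (element names carry partial semantics for a human reader).
import csv
import io
import xml.etree.ElementTree as ET

# Without the externally agreed field order, the CSV row is just three strings.
FIELD_ORDER = ["name", "stars", "city"]
row = next(csv.reader(io.StringIO("Hotel Example,4,Ghent")))
record_from_csv = dict(zip(FIELD_ORDER, row))

# The XML version is partially self-explanatory, though formal semantics
# still needs an external standard, schema or ontology.
xml_doc = ("<hotel><name>Hotel Example</name>"
           "<stars>4</stars><city>Ghent</city></hotel>")
root = ET.fromstring(xml_doc)
record_from_xml = {child.tag: child.text for child in root}

print(record_from_csv == record_from_xml)  # True
```

Both representations end up as the same record; the difference lies in where the meaning resides, inside the message or in an agreement outside it.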
4.2.3 Process handling
Consumers are getting more and more used to making transactions online, and a
crowding-out process is under way: business actors have to follow demand to keep
or expand their market share. Traditional distribution channels are vanishing, and
more flexible and dynamic networks are arising. A trend towards outsourcing and
focusing on core competences can be observed, leading to a more consumer-centric
approach and allowing highly individualized and ad-hoc product design. This
challenge brings with it the need to orchestrate business processes flexibly and
across organizations.
4.2.4 Metasearch
One of the prerequisites for process handling is the ability to identify the relevant
players for potential joint processes and to find information across those players.
Registries, especially federated registries, will play a leading role in describing
potential partners and their services, and will thus facilitate bringing them together.
At present, search components differ in their query syntax, which makes it difficult to
scale metasearches and to spontaneously integrate new data sources. For the actual
technical realization of metasearches, agreed query strategies and query syntaxes
are therefore desirable and are being worked on.
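A minimal metasearch sketch, with both "sources" as local stubs (real systems would issue network requests, and all names and data here are invented): per-source adapters translate one user query into each source's own query syntax, and results are merged into a unified list.

```python
# Minimal metasearch sketch: per-source adapters hide differing query
# syntaxes; results are aggregated into one unified list.

def search_source_a(query: dict) -> list:
    # Hypothetical source A expects a single keyword string.
    keyword = f'{query["city"]} hotel'
    return [{"name": "Hotel Example", "price": 120.0,
             "source": "A", "native_query": keyword}]

def search_source_b(query: dict) -> list:
    # Hypothetical source B expects structured parameters instead.
    return [{"name": "Pension Demo", "price": 80.0, "source": "B"}]

def metasearch(query: dict) -> list:
    results = []
    for adapter in (search_source_a, search_source_b):
        results.extend(adapter(query))
    # Aggregate into one unified list, here simply ordered by price.
    return sorted(results, key=lambda r: r["price"])

hits = metasearch({"city": "Ghent"})
print([h["name"] for h in hits])  # ['Pension Demo', 'Hotel Example']
```

Adding a new data source means writing one more adapter, which is exactly why agreed query strategies and syntaxes would make metasearch easier to scale.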
While object identification does work for flights, there are many other types of objects
in eTourism that do not have a unique identifier. One of the most important cases in
Laws, however, influence many other areas in eTourism transactions. The following
list is only indicative and certainly not a complete overview of pertinent legislation:
Some countries and regions such as Oberösterreich even have dedicated laws on
tourism (see http://www.oberoesterreich-tourismus.at/alias/lto/recht/410624/
tourismusrecht.html).
These laws are only partially harmonized across Europe; [Directive 90/314/EEC], a
directive setting minimum pan-European standards of customer protection for
package tours, is more the exception than the rule. Furthermore, the legal
systems of countries across Europe do not necessarily cover the same areas. For
example, hotel classification is mandated by law in some countries such as Italy and
Greece and does not even exist in others such as Finland.
For traditional package tours, the legal situation is quite clear from an end user's
point of view. Such tours are always regulated by the national laws in question. From
the customer's perspective, the tour operator is the only contractual partner [Freyer,
2006, p 234] [Directive 90/314/EEC] and is responsible for providing all the services
that were promised. It alone is also responsible for any redress that may result from
unsatisfactory services.
The situation is much muddier for extras, such as car rental at the destination, for
which a travel agency only acts as an intermediary. Dynamic packaging poses even
bigger problems in this regard. An intermediary – often a specialized travel agent –
combines pre-assembled packages based on user preferences. The user does not
administer the different items in the package himself, but receives offers for
packages which are dynamically assembled based on his preferences. However, the
legal and contractual consequences of such dynamic bundles are not yet clear. For
an end user, such a bundle of sub-packages can also imply a set of separate
contracts which do not by themselves necessarily fall under the definition of
“package” in [Directive 90/314/EEC]. In consequence, the contractual situation and
the legal mechanisms for redress can be considerably more complex for dynamic
packaging. An unnamed provider of software components for eTourism transactions
cited this as the single biggest obstacle to the uptake of dynamic packaging.
4.3.2 Multiculturalism
Related to legal aspects are the multicultural facets of many eTourism transactions
which span cultures and frequently involve both very small and very large players,
thus also mixing organizational cultures. Culture here is a much wider concept than
high culture and covers “the set of distinctive spiritual, material, intellectual and
emotional features of society or a social group, and [...] encompasses, in addition to
art and literature, lifestyles, ways of living together, value systems, traditions and
beliefs” [UNESCO, 2002]. Europe in particular is characterized by multiculturalism,
right down to its official motto, “United in diversity”.
Many cultural preconditions have influenced local description systems, such as
rating systems for accommodation or classifications of beaches, which in some
countries are nationally or even regionally regulated. Others, such as typical opening
hours of sites or food offerings, follow local customs without being subject to laws.
Multilingualism: Languages are an integral and often defining part of cultures, and as
such multiculturalism includes multilingualism, the coexistence of many languages.
Until around the turn of the millennium the treatment of multilingual data in computer
systems posed major problems. However, the widespread adoption of the Universal
Character Set (UCS) [ISO/IEC 10646], also known as Unicode, and its companion
standards has changed the game. The UCS is supported in virtually all current
operating systems and many application programs including all major browsers and
email clients. XML is squarely based on the UCS. Thus the internal representation,
the exchange and the display of multilingual data are all now quite unproblematic.
That said, some of the Global Distribution Systems (GDSs) that are at the core of
many eTourism transactions stem from the 1950s and 1960s, and even the youngest
of the “big four” GDSs, Amadeus, was written in the 1970s and 1980s. In this they
long predate the UCS and have at best sketchy support for multilingual data. Many to
this day operate on subsets of ASCII. This can obviously create considerable issues,
notably in the handling of personal names and the names of organizations. It is
outside the scope of this report to elucidate these issues in detail, though such an
overview would be highly beneficial.
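The ASCII limitation just described can be illustrated in a few lines (the name is invented; the lossy path roughly mimics what an ASCII-era system might do, while a UCS encoding such as UTF-8 round-trips the name intact):

```python
# Forcing a name with non-ASCII letters through an ASCII-only channel
# loses information; a UCS encoding such as UTF-8 preserves it.
import unicodedata

name = "Søren Kjærgård"

# Lossy path: decompose, then drop everything outside ASCII.
ascii_name = (unicodedata.normalize("NFKD", name)
              .encode("ascii", "ignore").decode())
print(ascii_name)          # 'Sren Kjrgard' – ø and æ have no decomposition

# Lossless path via UTF-8.
roundtrip = name.encode("utf-8").decode("utf-8")
print(roundtrip == name)   # True
```

Note that even the "clever" decomposition step only rescues letters like å (which decomposes to a plus a combining ring); letters such as ø and æ simply vanish, which is exactly the kind of damage personal names suffer in ASCII-era systems.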
Taxonomies and terminology are another important area in which data is necessarily
language-dependent. The exact definitions of categories such as “double room” (with
or without children), “luxury hotel” etc. will reflect the understanding in a given
language and culture.
In view of this multitude of taxonomies, which may or may not in turn coincide with
the customer’s own cultural and personal preferences, ratings will have to be based
on specific properties of accommodation, and, for that matter, general service, rather
than on general classifications alone. Searches for “hotels with WiFi, restaurant and
rooms over 20 m²” are likely to produce more acceptable results for users from many
cultural backgrounds than searches on “3-star” alone.
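Such property-based filtering can be sketched as follows; the hotel data and attribute names are invented for illustration:

```python
# Property-based filtering: select on concrete attributes rather than on
# culture-dependent star categories. All data is invented.

hotels = [
    {"name": "Hotel A", "stars": 3, "wifi": True,
     "restaurant": True, "room_sqm": 22},
    {"name": "Hotel B", "stars": 4, "wifi": False,
     "restaurant": True, "room_sqm": 25},
    {"name": "Hotel C", "stars": 3, "wifi": True,
     "restaurant": False, "room_sqm": 30},
]

def property_search(hotels, **criteria):
    """Keep hotels whose attributes satisfy every criterion (a predicate)."""
    return [h for h in hotels
            if all(pred(h[key]) for key, pred in criteria.items())]

matches = property_search(hotels,
                          wifi=lambda v: v,
                          restaurant=lambda v: v,
                          room_sqm=lambda v: v > 20)
print([h["name"] for h in matches])  # ['Hotel A']
```

A 3-star filter alone would have returned Hotels A and C, illustrating how a culture-dependent category and a concrete property set can diverge.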
operate automatically and can thus compete primarily on the price front and, in part,
are closely related to today’s GDSs. The GDSs themselves operate on two related,
but distinct business models, namely of being a service company for major service
providers such as airlines, and as an integration platform for intermediaries.
All eTourism activities must be seen in the context of the relevant business models.
These dictate the initial willingness to interchange data and to engage in cross-
organizational processes. In a sense, this willingness is a premise for this report.
4.3.4 Technology
The advent of the World Wide Web marks a watershed for the tourism industry as well.
As we have seen, GDSs have been operational since the early 1960s, but they
depended on highly proprietary distribution networks to allow travel agents to interact
with them. The advent of videotext systems such as BTX in Germany and Minitel in
France and similar technologies in the 1980s somewhat opened and standardized
these channels, but by and large the communication channels remained accessible
only to professional intermediaries.
The success of the WWW has largely standardized the communication channels
between providers onto standard internet protocols – not necessarily HTTP, though,
as many larger data sets are still transferred using FTP or related protocols. The
underlying technology has in many cases changed much less: today's GDSs largely
operate on the same transactional stacks as before, but some – though by no means
all – of the details have been abstracted away through the common protocols.
This standardization on common network protocols has allowed the rise of
collaboration standards such as SOAP-based Web Services, XML-based data
formats, semantic standards and, last but not least, the HTTP standard itself, which
has gained renewed prominence with today's emphasis on RESTful web services.
This report concentrates on the interoperability layer between implementations.
5 Case study
Mechanisms and solutions for electronic data exchange in the tourism industry were
first developed long ago by airline companies, in order to allow them to exchange
data about flights and bookings. Different standards emerged from those initial
operational exchanges, taking into account the limitations of the means of
communication of that time.
Over the years, the need to access inventory, prices, booking files, customer data
and sales or descriptive information has boomed, first through the development of the
GDSs (Sabre in 1960, Galileo in 1971), main CRSs (Pegasus, Wizcom, etc.) and
more recently with the web, used both for B2B and B2C applications.
The thematic circle introduced earlier (4.1) will be illustrated through the following
case study. The base guideline for the case study is a consumer (an end user or a
travel-related professional) wanting to book a trip or gather travel-related information
using information and communication technologies.
Figure 5-1
The case study is first detailed in terms of the different trip phases and the
corresponding information needs and processes used by the consumer.
Platforms, technologies, types of information and data sources are reviewed within
the case study. Some drawbacks and limitations, gaps and future needs will also be
identified and associated with the elements of our thematic circle, which will be
detailed later in the document.
Figure 5-2
Discovering:
o select possible destinations and types of trips based on personal or
family interests (a particular activity or hobby, a destination, etc.);
o select according to a season (winter sport, sun in winter, etc.);
o investigate prices and opportunities, accommodations, services, events,
etc.;
o explore recommendations and ratings from other travellers;
o etc.
Shopping: to match reality with expectations:
o compare prices;
o compare content of offers (similar offers, different types of trips, etc.);
o investigate testimonies.
Constituting the trip itself by:
o validating price and availability for a trip from a unique vendor, or
o amalgamating components from different vendors – such as hotel
vendor, pre-packaged tour vendor, airline company, etc. (in a unique
booking or in multiple bookings);
o requesting bids or quotes or alerts from different vendors.
Finalizing the buying process (confirmed or option booking(s)):
o finally buy from a unique vendor, or
o buy the amalgamated components (stored in a unique or multiple
bookings);
o add links to reference data (to keep track of weather, health or country
data, activities, testimony, etc.);
o pay (deposit or total).
Once a booking is finalized, this is not the end of the process: certain consumers
would continue browsing the web.
Finally, during or after the trip comes the part that is now booming with web 2.0
sites. The consumer could
testify: add their own piece of information on the web, using forums,
testimony sites or polls;
publish newly generated content, such as media or text;
enrich their profile(s) on the different sites in order to keep in touch with
opportunities related to their interests;
follow their subsequent trips, in case they actually prepared more than one trip
or acquired components that would be valid on several trips;
share common interests in order to organize group events;
possibly file and follow up a complaint.
Figure 5-3
Other professional processes also revolve around the major task of publishing data
for professional and end consumer use, such as
publishing fares,
providing information on products and destinations,
referencing other sources of information,
selecting and ranking data (vendors, destinations, etc.),
etc.
Those additional processes either rely on the same systems, platforms and
communication means as those available to end consumers (but with advanced
features), rely on specific systems not available to end consumers, or end up being
manual.
This is of course only possible depending on the flexibility of the exchanges, on the
formats made available by the sources and intermediaries, and on the extensive use of
the semantic web and other mechanisms allowing automated exchanges and recognition
of meaning and data. This will be detailed in the present document.
The user may also consult different sites in parallel, therefore initiating different
processes. This behaviour is considered outside the present case study.
Figure 5-4
The owner of the communication and information technology would usually own one
or several data sources and directly make use of them. That would be the case, for
instance, for a hotel group and its hotel data (editorial text, prices, availabilities,
comments, etc.).
The front system may also connect to other external sources to aggregate additional
information. A hotel chain may not own the inventory of each hotel and could
interrogate the different hotel PMSs or hotel group CRSs to validate availability.
Each of those additional external sources could therefore either own the data or itself
aggregate content from other sources, therefore creating a chain of sources involved
in a single request from the consumer. That would typically be the case for an online
site like Opodo dynamically requesting airline availability and fares from a GDS
(Amadeus in our example), itself launching requests to different airlines in relation
with the expected city pair.
The added value of using layers of sources would reside in their capacity to
concentrate coherent data from different sources (as is the case of GDSs
for airlines, or of comparators);
enrich data from a source, either by directly adding data or by aggregating
data from other external sources (like web sites proposing different types of
trips).
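As a minimal sketch of the layered-source pattern described above (all source names, interfaces and availability figures are hypothetical assumptions, not any real GDS or airline API), a front system could fan a single consumer request out along a chain of sources:

```python
# Sketch of a layered source chain: a front system queries an aggregator,
# which in turn queries several underlying sources (all names hypothetical).

class Source:
    """A leaf data source that owns its availability data."""
    def __init__(self, name, availability):
        self.name = name
        self._availability = availability  # e.g. {"NCE-LHR": 42}

    def query(self, city_pair):
        seats = self._availability.get(city_pair)
        return [] if seats is None else [(self.name, seats)]

class Aggregator(Source):
    """An intermediate source (e.g. a GDS) that concentrates data
    from several underlying sources into one coherent answer."""
    def __init__(self, name, sources):
        self.name = name
        self.sources = sources

    def query(self, city_pair):
        results = []
        for src in self.sources:
            results.extend(src.query(city_pair))
        return results

# One consumer request fans out along the whole chain of sources.
airline_a = Source("AirlineA", {"NCE-LHR": 5})
airline_b = Source("AirlineB", {"NCE-LHR": 12, "NCE-CDG": 3})
gds = Aggregator("GDS", [airline_a, airline_b])
front = Aggregator("OnlineSite", [gds])

print(front.query("NCE-LHR"))  # availability concentrated from both airlines
```

Each intermediate level here simply concatenates the answers of its underlying sources; a real chain would also handle caching, time-outs and format conversion, as discussed under the technical aspects below.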
Online agencies such as Expedia or Opodo also have back office systems to enter
and maintain editorial data, price lists and stocks. That would be their own data
source. They typically do not own destination, weather, policy or health related data
but use external sources such as Lonely Planet or government web sites. Those
distributor in-house systems also usually connect to GDSs (Global Distribution
Systems such as Amadeus or Galileo) to request airline fares and availability. We
would be in the situation where an intermediate data source browses other external
data sources for information.
This need for a distributed architecture, composed of distinct systems around the
world and owned by different companies with various strategies and technologies,
leads to a number of constraints and requirements identified as cross-cutting aspects
in the previous introduction:
Technical aspects come first to mind, with the need to ensure compatibility of
the different systems, increase the reliability of the individual elements,
measure the impact on architectures and scale accordingly. Performance of
the different systems and of the overall chain is key and leads to additional
complexity (such as caching, uniqueness of data, etc.).
Business models must also be taken into account because making money is
central for the complete system to work smoothly. There must therefore be the
capacity
o to use other systems in return for payment (fixed price, price per
transaction, percentage of a booking, etc.);
o to add mark-ups along the chain and still get a competitive price;
o to access net prices directly on intermediate levels in the chain;
o etc.
Legal aspects are equally important, with the necessity to ensure that
o the information and products found and possibly purchased on the
different systems can legally be purchased or used;
o the distributor and the end user will have the capacity to track individual
providers so that they fulfil their obligations (provided the same notion
exists on the provider’s side), in case of any issue.
Even multiculturalism is present when speaking about systems composing the
complete infrastructure:
o Providing services (and support) on a 24-hour basis, without stopping
servers during the night, is unusual in certain countries or for small
companies.
o Documentation on how to consume the service may not be written in a
widely used language such as English, or may lack multiple translations.
The main topics involved in allowing process and data interoperability also come into
play in the case of multilevel data sources:
With all these elements in place in the multilevel data source scenario of our case
study, we could have complex processes in place, such as dynamic packaging, with
data interoperability, sharing and grouping objects with different identifiers and
semantic definitions, and
process interoperability:
o compatibility of the different exchanges for each sub process;
o capacity to have evolution only on certain components of the system;
o etc.
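The dynamic packaging scenario can be sketched in the same spirit (vendor codes, prices and the mark-up are illustrative assumptions): components published under different identifier schemes are reconciled through a shared mapping and amalgamated into one booking:

```python
# Sketch of dynamic packaging: components from different vendors, each with
# its own identifier scheme, are reconciled and priced as one package.
# All vendor names, codes and prices are illustrative.

# Each vendor publishes components under its own object identifiers.
hotel_offers = {"HTL-0042": {"city": "NCE", "price": 90.0}}
flight_offers = {"AF1234/2009-07-01": {"city": "NCE", "price": 120.0}}

# A shared mapping reconciles local identifiers to common semantics.
id_map = {
    "HTL-0042": ("hotel", hotel_offers),
    "AF1234/2009-07-01": ("flight", flight_offers),
}

def build_package(component_ids, markup=0.10):
    """Amalgamate components into one booking and apply a mark-up."""
    components, total = [], 0.0
    for cid in component_ids:
        kind, catalogue = id_map[cid]
        offer = catalogue[cid]
        components.append((kind, cid, offer["price"]))
        total += offer["price"]
    return {"components": components, "total": round(total * (1 + markup), 2)}

package = build_package(["AF1234/2009-07-01", "HTL-0042"])
```

The shared identifier mapping stands in for the object identification and semantic agreement that the thematic circle calls for; without it, the two vendors' codes could not be combined in a single booking.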
According to the type of information, different types of issues and needs arise, which
we have again grouped based on our thematic circle, with first the topics of the
circle and then the prerequisites:
Figure 5-5
As introduced in the previous chapters, each data source may in turn connect to
multiple data sources of the same type or of other types.
6 Semantics
6.1 Standards
6.1.1 Needs and requirements
6.1.1.1 Introduction
The first word that may come to mind when talking about data and information
interoperability and exchange is “standards”. Standards have traditionally been
widely used in different industries. The general goal of standards and standardization
is to allow compatibility, interoperability, safety, repeatability, quality, etc. The process
of developing and agreeing upon a general standard is known as standardization.
Within the computer science domain and Information and Communication Technologies,
standards have also been widely used and are becoming increasingly important. A vast
number of software and hardware developers and manufacturers worldwide produce
different items, and these items need to follow particular standards in order to work
together in a satisfactory manner. As the amount of information contained on the
Internet increases every second, a unified representation for web data and resources
is needed in today’s large-scale Internet data management systems. This unification
of standards will allow machines to meaningfully process the available information
and to (successfully) exchange and integrate data coming from distributed databases
and information management systems. This
has been occurring, e.g. in the context of eLearning with the development of the
SCORM (http://www.adl.net/) and AICC (http://www.aicc.org/) standards, or in the
context of telemedicine applications with the development of standard data transport
protocols such as HL7 and ISO/IEEE 11073, among others.
There have already been some efforts invested in this direction (see 6.1.2) in order to
enable distributed data exchange and integration. Interoperability between databases
and information sources needs to be provided on both a technical and an informational
(semantic) level. The social value of the Web is that it enables human communication,
commerce, and opportunities to share knowledge, information and experiences. One of
the primary goals of the W3C (World Wide Web Consortium, http://www.w3c.org/) is to
make these benefits available to all people, whatever their hardware, software,
network infrastructure, native language, culture, geographical location, or physical
or mental ability might be.
6.1.1.2 Needs
Benefits of use of standards
Standards have proved to be a powerful tool for organizations of all sizes, supporting
innovation, increasing productivity and efficiency in their business processes.
Effective standardization promotes competition and enhances profitability, enabling a
business to take a leading role in shaping the industry itself. Generally speaking,
standards allow a company to:
In modern business, effective communication along the supply chain and with
legislative bodies, clients and customers is imperative. Applying standards within
the everyday operation of a company provides the means to measure various variables
and thus to manage their evolution, bringing benefits when applied within the
infrastructure of the company itself. Business costs and risks can be minimized,
internal processes streamlined and communication improved.
Standardization promotes interoperability, providing a competitive edge necessary for
the effective worldwide trading of products and services.
6.1.1.3 Requirements
Within the tourism industry standards may help companies to be more competitive in
terms of being present on the web by complying with information and communication
standards and recommendations. In order to achieve exchange and integration of
information across different information systems, information formats and transfer
protocols must be compatible and ought to allow any hardware and software used to
access the information to work together.
Furthermore, information integration and exchange are required to provide trade and
commerce capabilities on web sites, so that a local company can be globally present
through the web and increase its business opportunities.
Regarding web and information standards, one of the most active bodies is the W3C.
W3C designs and promotes interoperable, open (non-proprietary) formats and
protocols to avoid the market fragmentation of the past. A W3C Recommendation is
the equivalent of a web standard, indicating that this W3C-developed specification is
stable, contributes to web interoperability, and has been reviewed by the W3C
membership, which favours its adoption by the industry.
Standards can be found throughout daily life, but why would we need to use
standards? Rather than asking why we would need standards, we might usefully ask
ourselves what the world would be like without them. Products would not work
as expected. They would be of inferior quality and incompatible with other products
or equipment (in fact, they would not even connect with them), and in extreme cases
non-standardized products could potentially be dangerous.
From a user’s standpoint, standards are extremely important in the computer industry
because they allow the combination of products from different manufacturers to
create a customized system. Without standards, only hardware and software from the
same company could be used together. In addition, standard user interfaces can
make it much easier to learn how to use new applications.
Most official computer standards are set by one of the following organizations:
There is a need to define and provide (semantic) definitions and clarifications in
order to transform disparate, localized information into a global, coherent resource
within the Internet (the most common communication platform and environment in this
case).
6.1.4 Recommendations
6.1.4.1 Short-term recommendations (1–3 years)
Lower the entry barrier for participation in pertinent formal and informal
standardization bodies especially for SMEs and extend the scope of those
activities to cover their respective requirements.
Work on interoperability approaches between different standards.
6.2 Taxonomies
6.2.1 Needs and requirements
6.2.1.1 Introduction
Traditionally, all sciences classify their objects. Astronomy classifies celestial
bodies such as planets, stars and galaxies. Botany classifies plants, chemistry
classifies chemicals, medicine classifies illnesses, psychology classifies mental
processes, library and information science classifies documents and systems and
methods of knowledge organization, religious studies classifies religions, and the
list could go on forever.
Such classifications are not performed just in order to create an aesthetic effect.
Classifications are constructed in order to work efficiently, and also to provide the
means to efficiently find and retrieve meaningful and required information.
Classification is not something extra put on top of scientific work; rather, it is
something deeply integrated within scientific work itself, as it provides deeper
understanding of the subject matter of study.
For example, if a new group of chemical substances is found to help cure a certain
disease and this fact is widely demonstrated, it will be classified as a kind of drug
(e.g. as antidepressants, tranquillizers or anti-inflammatory drugs) that helps
humans recover from that particular disease.
6.2.1.2 Needs
Tourism information needs to be organized in agreed ways by all agents (public
institutions and bodies, industry, research communities, final users, etc.). Relevant
tourism information in general, its organization within information management
systems and its explicit specification through schemas or information representation
methods and models need to be defined.
Information and content are key. To access the right piece of information at the
right moment, information needs to be clearly stored and classified. Almost anything
(including tourism information, i.e. travel, accommodation, dining, events, useful
information, etc.) has to be classified following a structure, e.g. taxonomic schemas.
6.2.1.3 Requirements
As the amount of (all kinds of) available information increases on the web, the
particular piece of information we seek may be buried among the information we do not
seek. Thus, the activity of classifying information becomes increasingly important,
as it makes it easier to find particular content on the web. In terms of the service
provided by a company, this can be translated into business opportunities.
Easy availability of information is significantly more important to those
planning some kind of leisure activity, as their behaviour pattern indicates that
they will not spend long on web sites looking for information. Thus, information has
to be object oriented, not experience oriented. However, in order to build a
successful tourism taxonomy, both approaches are required.
The benefit of this (taxonomy) approach is that it allows related terms to be grouped
together and categorized in ways that make it easier to find the correct term to use
for whatever purpose. Within the tourism domain, if there is a taxonomical
classification for the notion of “Event”, different events could be classified under
the general one (e.g. sport events, cultural events, etc.), allowing a tourist to
easily find the kind of “Event” s/he wants to undertake.
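A taxonomy of this kind can be sketched as a simple parent-pointer structure (the category names below are illustrative, not a proposed tourism classification):

```python
# Sketch of a small tourism taxonomy: each taxon points to its parent,
# so related terms can be grouped and retrieved under a general notion.

taxonomy = {
    "Event": None,                 # root taxon
    "SportEvent": "Event",
    "CulturalEvent": "Event",
    "Concert": "CulturalEvent",
    "Exhibition": "CulturalEvent",
}

def narrower(term):
    """All taxa classified (directly or indirectly) under a term."""
    direct = [t for t, parent in taxonomy.items() if parent == term]
    result = []
    for child in direct:
        result.append(child)
        result.extend(narrower(child))
    return result

# A tourist browsing "Event" finds every specific kind of event.
print(sorted(narrower("Event")))
```

Grouping works in both directions: walking up the parent pointers yields the broader terms, walking down (as above) yields every specific kind of “Event”.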
Etymologically speaking, the word “taxonomy” comes from the Greek taxis
(“arrangement, order”) and nomos (“law”).
The units in taxonomies are termed taxon (plural: taxa). Initially taxonomy was only
the science of classifying living organisms and species, but later the word was
applied in a wider sense, and may also refer to either a classification of things, or the
principles underlying that classification. Classification of species, however, began
well before the eighteenth century. Aristotle distinguished species by habitat and
means of reproduction, but Andrea Cesalpino produced the first significant taxonomy
of plants in 1583, arranging the species in a hierarchical, graded order. His work was
developed by Marcello Malpighi, who expanded his hierarchical system to include
animals. The word taxonomy is sometimes used synonymously with classification
and sometimes given a special meaning.
There have also been some attempts to differentiate taxonomies from simple
classifications. These attempts may also serve as a review of the different definitions
authors have given to the notion of taxonomy. “A taxonomy obtains when several
fundamenta divisionis are considered in succession, rather than simultaneously, by
an intensional cl. [classification]. The order in which fundamenta are considered is
highly relevant: the taxonomy obtained by using property X to classify a genus and
then property Y to classify its species is by no means the same as that obtained by
considering property Y first and property X afterwards” [Marradi, 1990].
Campbell & Currier (31/10/00) [Campbell, Currier] ask: what is a taxonomy? They
provide the following answer:
There is a vast number of taxonomic classifications within the tourism domain in the
literature. Almost every project applying information management methods uses a
taxonomy in order to organize the existing information of its universe of discourse
(the project to be developed). Taxonomies are later used to design database
structures, ontologies and other tools so that the information is easily accessible
and retrievable for the final user.
In commercial web sites and online travel agencies, services are often
organized under taxonomies, e.g. restaurants and kinds of restaurants.
Accommodation facilities are organized under different categories: hostel, 5-star
hotel, 4-star hotel, etc., or even in ranges of price, depending upon the search
criteria.
Taxonomies usually require strict control over the creation of new entities and
branches, and this restriction needs to be overcome, especially given the way
information is consumed on the web. Systems need to be as dynamic as possible,
i.e. flexible. Users will typically not introduce classification methodologies;
rather, they will categorize their content and make it available via tags, links, etc.
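This user-driven categorization can be sketched as an inverted tag index (the content identifiers and tags are illustrative), where free tags replace the controlled taxonomy:

```python
# Sketch of tag-based categorization: instead of a controlled taxonomy,
# users attach free tags to content, and an inverted index makes the
# content retrievable by tag. All content identifiers and tags are invented.

from collections import defaultdict

tag_index = defaultdict(set)  # tag -> set of content identifiers

def tag_content(content_id, tags):
    """Record a user's free tags for a piece of content."""
    for tag in tags:
        tag_index[tag.lower()].add(content_id)

tag_content("review-17", ["beach", "family", "Nice"])
tag_content("photo-03", ["beach", "sunset"])

# Any tag yields every piece of content users filed under it.
print(sorted(tag_index["beach"]))
```

Unlike the taxonomy above, nothing constrains which tags may be created, which is precisely the flexibility (and the loss of control) discussed in this section.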
6.2.4 Recommendations
6.2.4.1 Short-term recommendations (1–3 years)
6.3 Ontologies
6.3.1 Needs and requirements
6.3.1.1 Introduction
The word “Ontology” (note the upper-case ‘O’) comes originally from philosophy.
From a philosophical point of view, Ontology is the branch of philosophy which deals
with the nature and the organization of reality [Guarino, Giaretta, 1995]. We have to
go as far back as Aristotle to find the first reference to this word, when he tries to
define a “science” that is “on top of” the rest of the sciences, describing in his
Metaphysics Book IV a science that studies being as being (i.e. Ontology):
“There is a science that studies the being as being and its properties as such (being)
which belong to it in virtue of its nature. Now, this science is not the same as any of
the so called special sciences, since none of these other treat (universally) the being
as being itself but reducing the being to one part of it, they (“only”) investigate the
essential properties of this part. Since we are seeking the first principles and the
highest causes, there must (clearly) be something to which these belong in virtue of
its own nature. If then, those who sought the elements of existing things were
seeking these same principles, it is necessary that the elements must be elements of
being not only by accident but just because it is being. Therefore, it is of being as
being that we also must grasp the first causes” [Aristotle, Metaphysics Book IV].
In the computer science domain, ontologies (note now the lower-case ‘o’) aim at
capturing domain knowledge in a generic way and at providing a commonly agreed
understanding of a domain which may be reused and shared across applications and
groups. Ontologies provide a common vocabulary of an area and define, with different
levels of formality, the meaning of terms and the relations between them. Since the
beginning of the 1990s, ontologies have become a popular research topic.
6.3.1.2 Needs
In recent years, the development of ontologies has been moving from the realm of
Artificial Intelligence (AI) laboratories to the desktops of domain experts. Ontologies
have become common on the World Wide Web. Ontologies on the web range from
large taxonomies categorizing web sites to categorizations of products for sale and
their features. Many disciplines now develop standardized ontologies that domain
experts can use to share and annotate information in their fields. Why would
someone want to develop an ontology? Here are some of the (possible) reasons:
Ontological analysis clarifies the structure of knowledge. The first reason is that
ontologies form the heart of any system of knowledge representation. Without
conceptualizations that underlie knowledge, there is no vocabulary for representing
knowledge. Thus, the first step in knowledge representation is performing an
effective ontological analysis of some field of knowledge. Weak analyses lead to
incoherent knowledge bases.
Consider a domain in which there are people, some of whom are students, some
professors, some other types of employees, some females and some males.
For quite some time, a simple ontology was used in which the classes of students,
employees, professors, males and females were represented as “types of” humans.
Soon this caused problems, because it was noted that students could also be
employees at times and could also stop being students. Further ontological analysis
showed that “students”, “employees”, etc. are not “types of” humans but rather
“roles” that humans can play, unlike categories such as “females”, which are in
fact “types of” humans. Clarifying the ontology of this data domain made it possible
to avoid various difficulties in reasoning about the data.
Knowledge sharing
Ontologies enable knowledge sharing. The second reason why ontologies are
important is that they provide a means of sharing knowledge. Suppose we do an
analysis and arrive at a satisfactory set of conceptualizations, and terms standing
for them, for some area of knowledge, say, the domain of “electronic devices”. The
resulting ontology would be likely to include terms such as “transistors” and “diodes”,
and more general terms such as “functions”, “processes”, and also terms in the
electrical domain, such as “voltage”, that could be necessary to represent the
behaviour of these devices. It is important to note that the ontology – defined by the
basic concepts involved and their relations – is intrinsic to the domain, apart from a
choice of vocabulary to represent it. This ontology can be shared with others who
have similar needs for knowledge representation in that domain, avoiding the need
for replicating the knowledge analysis.
Since then, considerable progress has been made in developing the conceptual bases
needed for building technology that allows knowledge components to be reused and
shared.
One of the first definitions of the word “ontology” within the computer science
domain is due to Neches et al. [1991]. They defined an ontology as follows: “An
ontology defines the basic terms and relations comprising the vocabulary of a topic
area as well as the rules for combining terms and relations to define extensions to
the vocabulary”.
It can be affirmed that this definition gives some clues about how to proceed in
building an ontology, though it includes some vague notions:
Later, in 1993, Gruber’s definition became the most referenced in the literature. The
following is his definition of an ontology: “An ontology is an explicit specification
of a conceptualization”. Conceptualization refers to an abstract model of phenomena
in the world, obtained by identifying the relevant concepts of those phenomena.
Explicit means that the types of concepts used and the constraints on their use are
clearly defined. Formal refers to the fact that the ontology should be machine
readable and processable. Shared reflects the notion that an ontology captures
consensual knowledge; that is, it is not private to some individual but accepted by a
representative group of users that belong to a particular domain of knowledge.
Ontologies provide a common vocabulary of an area and define – with different levels
of formality – the meaning of the terms and the relations between them. Knowledge
in ontologies is mainly formalised using five kinds of components: classes, relations,
functions, axioms and instances [Gruber, 1993].
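These five components can be illustrated schematically (the class and instance names below are invented for the example; a real ontology would be expressed in a language such as OWL, discussed later in this section):

```python
# Schematic illustration of Gruber's five ontology components
# (classes, relations, functions, axioms, instances) using plain
# Python structures. All names are illustrative.

classes = {"Accommodation", "Hotel", "City"}
relations = {("Hotel", "subClassOf", "Accommodation"),
             ("Hotel", "locatedIn", "City")}
instances = {("Grand_Hotel", "Hotel"), ("Nice", "City")}

# A function derives one term from others (here: a star-rating label).
def rating_label(stars):
    return "%d-star hotel" % stars

# An axiom constrains valid interpretations: every instance's class
# must be a class declared in the ontology.
def axiom_classes_exist():
    return all(cls in classes for _, cls in instances)

print(axiom_classes_exist())  # a consistent knowledge base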
The tools that can be used for building ontologies usually provide a graphical user
interface, which allows ontologists to create ontologies without directly using a
specific ontology specification language. Some tools, such as Protégé, Chimaera and
FCA-Merge, have been created for merging and integrating ontologies.
In the context of the Semantic Web, some tools have arisen in recent years for the
annotation of web resources in SHOE, RDF, DAML+OIL or OWL. Their main
objective is the creation and maintenance of ontology-based markup in static web
documents. In fact, they are used for easily managing instances, attributes and
relationships between web resources. Some of these annotation tools are
OntoAnnotate, OntoMat, and the SHOE Knowledge Annotator.
There are also some ontology-based text mining tools, which allow extracting
ontologies from structured, semi-structured or free text; these tools are used
to learn ontologies from natural language.
There are some important parameters that can be used in the comparison and
evaluation of existing tools. Some of these parameters are:
A great range of languages have been used for the specification of ontologies during
the last decade: Ontolingua, LOOM, OCML, Flogic, CARIN. Many of these languages
had already been used for representing knowledge inside knowledge-based
applications, others were adapted from existing knowledge representation
languages, and there is also a group of languages that were specifically created for
the representation of ontologies. These languages, which we will call “traditional”
languages, are in a stable phase of development, and their syntax consists of plain
text where ontologies are specified.
Recently, many other languages have been developed in the context of the World
Wide Web: RDF, RDF Schema, SHOE, XOL, OML, OIL, DAML+OIL, and OWL. Their
syntax is based on XML, which has been widely adopted as a ‘standard’ language for
exchanging information on the web, except for SHOE, whose syntax is based on
HTML.
Among all these languages, RDF and RDF Schema cannot be considered to be
ontology specification languages per se, but rather general languages for the
description of metadata in the web. Most of these “markup” languages are still in a
development phase.
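The triple-based data model underlying RDF's metadata descriptions can be illustrated schematically (the URIs and property names below are illustrative; a real application would use an RDF library and established vocabularies such as Dublin Core):

```python
# Schematic sketch of RDF's data model: metadata about web resources as
# (subject, predicate, object) triples. URIs and properties are invented
# for the example.

triples = [
    ("http://example.org/hotel/42", "dc:title", "Grand Hotel"),
    ("http://example.org/hotel/42", "dc:language", "en"),
    ("http://example.org/hotel/42", "rdf:type", "ex:Hotel"),
]

def describe(subject):
    """Collect all metadata recorded about one resource."""
    return {p: o for s, p, o in triples if s == subject}

print(describe("http://example.org/hotel/42")["dc:title"])
```

This is why RDF is a general metadata description language rather than an ontology specification language: the triples say things about resources, while the vocabulary they use (classes, properties) must be defined elsewhere, e.g. in RDF Schema or OWL.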
There are many other languages that have also been considered in this survey. For
instance, some languages have been created for the specification of specific
ontologies, such as CycL and GRAIL. There are also some languages that were not
created specifically for the representation of ontologies and include additional
features that are unusual in ontologies, such as NKRL.
The most commonly used ontology development languages are the following:
There have been some research communities that have already tried to define
standard ontologies that cover a particular area of knowledge in a generic way and
that could thus be used in a standard way.
The CIDOC CRM is a core ontology explaining the extended meaning of data
structures from the humanities and cultural heritage, including the history of
science. It is the outcome of a long-term, disciplined knowledge engineering activity
which excels in its ontological commitment, i.e. the acceptance of its constructs by
domain experts.
The primary role of the CRM is to enable information exchange and integration
between heterogeneous sources of cultural heritage information (Doe, 03). It aims at
providing the semantic definitions and clarifications needed to transform disparate,
localized information sources into a coherent global resource within a larger
institution, in intranets or within the Internet. More concretely, it defines, and is
restricted to, the underlying semantics of database schemas and document structures
used in cultural heritage and museum documentation, in terms of a formal ontology.
The success of the CRM relies on the fact that the explanation of common meaning
can be achieved with a very small set of primitive concepts and relations, in
contrast to data structures that suggest to the user what to say about an object. The
relations in data structures that connect items directly by highly specific, diverse
kinds of relationships can frequently be expressed by data paths composed of a few
fundamental relationships defined within the core ontology.
The CIDOC CRM has become the most promising core element for realizing
semantic interoperability in archives, libraries and museums, through its capability
to link the intellectual structure of highly diverse sources and products of
scientific and scholarly discourse with the elements formally handled by information
systems.
The CIDOC CRM is the culmination of over 10 years work by the CIDOC
Documentation Standards Working Group and CIDOC CRM SIG (Special Interest
Group), which are working groups of CIDOC. Since 2006 it has been the official
standard ISO 21127.
FRBRoo
The FRBRoo is a formal ontology intended to capture and represent the underlying
semantics of bibliographic information and to facilitate the integration, mediation, and
interchange of bibliographic and museum information. The FRBR model was
originally designed as an entity-relationship model by a study group appointed by the
International Federation of Library Associations and Institutions (IFLA).
The CIDOC CRM model has been developed since 1996 under the auspices of the
ICOM-CIDOC (International Council of Museums – International Committee for
Documentation) Documentation Standards Working Group. The idea that both the
library and museum communities might benefit from harmonizing the two models was
first expressed in 2000 and matured over the following years. Eventually it led to
the formation, in 2003, of the International Working Group on FRBR/CIDOC CRM
Harmonisation, which brings together representatives from both communities with the
common goals of:
expressing the IFLA FRBR model with the concepts, tools, mechanisms, and
notation conventions provided by the CIDOC CRM, and
aligning (possibly even merging) the two object-oriented models with the aim
of contributing to the solution of the problem of semantic interoperability
between the documentation structures used for library and museum
information, such that:
o all equivalent information can be retrieved under the same notions;
o all directly and indirectly related information can be retrieved regardless
of its distribution over individual data sources;
o knowledge encoded for a specific application can be repurposed for
other studies;
o recall and precision in systems employed by both communities are
improved;
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 67
o both communities can learn from each other's concepts, for their mutual
progress and for the benefit of the scientific and scholarly communities and
the general public.
In 2006 a first draft of FRBRoo was completed. It is a logically rigorous model of the
conceptualizations expressed in FRBRer and of the concepts necessary to explain the
intended meaning of all FRBRer attributes and relationships. The model is formulated
as an extension of the CIDOC CRM. Any conflicts occurring in the harmonization
process with the CIDOC CRM have been or will be resolved on the CIDOC CRM side
as well. The Harmonization Group intends to continue its work by modelling the FRAR
concepts and elaborating the application of FRBR concepts to the performing arts.
HarmoNET
The Harmonisation Network for the Exchange of Travel and Tourism Information,
HarmoNET, is an international network bringing together people and organizations
with an interest in the topic of harmonization and seamless information exchange in
travel and tourism. HarmoNET provides unique technologies and services enabling
an easy, affordable and fast information exchange.
SUO
Recognizing both the need for large ontologies and the need for an open process
leading to a free, public standard, a diverse group of people has come together to
make such a standard a reality. The Standard Upper Ontology (SUO) will be an
upper level ontology that provides definitions for general-purpose terms and acts as a
foundation for more specific domain ontologies.
It is estimated to contain between 1000 and 2500 terms plus roughly ten definitional
statements for each term.
SOUPA
concepts. Due to the heterogeneity of the travel and tourism industry, it is a challenge
for a single ontology to cover the whole market offer, thus the ontology management
process would potentially be too complicated.
6.3.4 Recommendations
6.3.4.1 Short-term recommendations (1–3 years)
7 Data transformation
7.1 Structured data mapping
7.1.1 Needs and requirements
7.1.1.1 Introduction
7.1.1.2 Needs
Using information systems in the travel and tourism industry implies using information
coming from different data sources. An information system in this domain typically
works in cooperation with other systems, and information coming from various data
sources may be needed to provide a particular service to a client.
The approach to be taken requires the creation of a mapping description using some
kind of formal language that maintains the level of formality and expressivity of both
the ontology and the database. The resulting mapping document has to show the
correspondences between the components of the database's SQL schema and those
of the ontology. Afterwards, the ontology needs to be populated through the
mappings that have been made explicit in the document. The process ought to be as
automatic as possible in order to minimize human effort.
In order to do this, languages to define mappings are needed. These languages have
to have the following features:
The language ought to define how to create instances in the ontology in terms
of the data stored in the database.
The language needs to have a declarative nature, making it possible to
discover inconsistencies and ambiguities in the definition of a mapping. These
potential problems have to be discovered automatically by the mapping language.
The mapping definition language could potentially be used to automatically
characterize data sources to allow dynamic query distribution in intelligent
information integration approaches.
The mapping definition language does not have to declare the degree of
similarity between database elements and ontology components. Rather, it
has to state under which conditions and after which transformations the
database elements are equivalent to the ontology components.
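To make these features concrete, the following sketch shows a tiny declarative mapping in the spirit described above: each rule states how instances of an ontology concept are created from data stored in a database, under which condition the rows qualify, and which columns populate which properties. All table, column and concept names are invented for the illustration, and the rule format is a simplification inspired by languages such as R2O, not any real standard.

```python
import sqlite3

# Declarative mapping rules: each rule states which table and condition
# yield instances of which ontology concept, and which columns populate
# which ontology properties (all names here are illustrative only).
MAPPING_RULES = [
    {
        "concept": "Hotel",
        "table": "accommodation",
        "condition": "type = 'hotel'",            # only rows of this kind qualify
        "properties": {"hasName": "name",          # ontology property -> column
                       "hasStars": "category"},
    },
]

def populate(conn, rules):
    """Create ontology instances from the database via the mapping rules."""
    instances = []
    for rule in rules:
        cols = ", ".join(rule["properties"].values())
        cur = conn.execute(
            f"SELECT {cols} FROM {rule['table']} WHERE {rule['condition']}")
        for row in cur:
            inst = {"a": rule["concept"]}          # rdf:type, informally
            inst.update(zip(rule["properties"], row))
            instances.append(inst)
    return instances

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accommodation (name TEXT, category INT, type TEXT)")
conn.execute("INSERT INTO accommodation VALUES ('Alpenhof', 4, 'hotel')")
conn.execute("INSERT INTO accommodation VALUES ('Camp Rio', 2, 'campsite')")
print(populate(conn, MAPPING_RULES))
```

Because the rules are data rather than code, they can in principle be checked automatically for inconsistencies, which is precisely the benefit of the declarative nature required above.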
7.1.1.3 Requirements
Semantic conflicts occur whenever two contexts do not use the same interpretation of
the information. Goh identifies three main causes for semantic heterogeneity that
need to be overcome in order to achieve semantic interoperability [Goh, 1997]:
Confounding conflicts occur when information items seem to have the same
meaning, but differ in reality, e.g. owing to different temporal contexts.
Scaling conflicts occur when different reference systems are used to measure
a value. Examples are different currencies.
Naming conflicts occur when naming schemes of information differ
significantly. A frequent phenomenon is the presence of homonyms and
synonyms.
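As an illustration, the following sketch resolves two of the conflict types above (scaling and naming) by normalising records from different sources into a common context; the exchange rates and the synonym table are invented example values.

```python
# Illustrative resolution of scaling and naming conflicts when bringing
# (price, currency, room_type) records into a common context.
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.9}      # scaling: different currencies
SYNONYMS = {"double room": "twin room"}       # naming: synonymous terms

def normalise(record):
    """Return the record expressed in the common reference context."""
    price, currency, room = record
    price_eur = price * EUR_PER_UNIT[currency]    # resolve the scaling conflict
    room = SYNONYMS.get(room, room)               # resolve the naming conflict
    return (round(price_eur, 2), "EUR", room)

print(normalise((100, "USD", "double room")))
```

Confounding conflicts are deliberately absent from the sketch: they require contextual knowledge (e.g. the temporal validity of a value) that cannot be resolved by a simple lookup table.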
The use of ontologies for the explication of implicit and hidden knowledge is a
possible approach to overcome the problem of semantic heterogeneity. With respect
to the impact on the data exchange, structuring conflicts can be differentiated:
fully mappable: all clashes can be resolved without any loss of information;
partially or non-mappable: covering the structural conflicts for which any
conceivable transformation will cause a loss of information.
Examples of clashes between different standards have been identified in [Dell'Erba,
Fodor, Höpken, et al., 2005].
Most current approaches to solving the interoperability problem are based on
the idea of fixed, obligatory standards, which define all details of the exchanged
messages. An example of an international XML-based standard is the specification of
OTA [OTA]. Companies using such standards are automatically able to
exchange information with each other. However, all details of the exchanged
message must be agreed among all communication participants. The process of
defining and maintaining such standards requires a lot of effort, and therefore such
standards are almost exclusively used by large companies such as hotel chains,
airlines and Global Distribution Systems (GDS).
There are several approaches in the literature to address database-to-ontology
mapping. In general, they can be classified into two main categories: approaches
that create a new ontology from a database and approaches that map a database to
an already existing ontology. In either case two phases can be distinguished:
mapping definition, i.e. the definition of the mapping from the database
structure (schema) to the ontology structure, and
data migration, i.e. the migration of database content to instances of the ontology.
Most mappings have been defined ad hoc, i.e. for particular cases, and are
neither reusable nor extensible to other cases. Besides, should changes occur within
the databases, the whole mapping and even the ontology would have to be redefined
in order to cover new concepts and relations.
The literature review has shown a number of languages that have been used to map
databases to ontologies. However, there is no evidence of any language that links
(maps) ontology components to database elements.
Creating mappings still requires a lot of human intervention. Although
graphical interfaces have been created (as in the case of R2O), the mapping
work is in general still hand-intensive. This is due to the differing levels of formality
and expressivity with which information is represented in ontologies and stored in
databases. One possible way to automate the mapping creation process to a certain
degree could be to recommend building ontologies using existing standard
languages. This way ontologies could be compared, as they would have the same
degree of expressivity and formality.
7.1.4 Recommendations
7.1.4.1 Short-term recommendations (1–3 years)
Use semantic web technologies (e.g. based on RDF URIs) to name and
represent (data) resources on the Web so that mapping can be automatically
undertaken.
Agree on the degree of formality with which information ought to be defined,
so that automatic mapping tools compare the same kind of information.
Foster high-level general ontologies to describe particular domains of interest
so that lower-level, more concrete ontologies can later be linked or merged within
the (more general) structure (if and only if both ontologies are defined with the
same level of formality and with the same ontology definition language).
particular source of information: author, date, origin, content, type of file, etc. Within
the context of Semantic Web (as defined by Tim Berners-Lee) annotating document
content is proposed by using semantic information from domain ontologies [Berners-
Lee, 2001]. The result of (manually) annotating a Web information resource is Web
pages with machine-interpretable mark-up that provide the source material with which
agents, Semantic Web services and advanced search engines operate. The goal
is to create annotations with well-defined semantics.
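As a minimal illustration of such mark-up, the sketch below wraps a text fragment in an RDFa-style `typeof` attribute pointing to an ontology concept, turning a human-readable page fragment into a machine-interpretable one; the vocabulary URI and concept name are invented for the example.

```python
# Sketch: annotating a Web page fragment with machine-interpretable
# mark-up that types the fragment with an ontology concept.
def annotate(text, concept, vocab="http://example.org/tourism#"):
    """Wrap a text fragment in RDFa-style mark-up for the given concept."""
    return f'<span typeof="{vocab}{concept}">{text}</span>'

page = "Visit the " + annotate("Hofburg", "Museum") + " in Vienna."
print(page)
```

The human reader still sees "Visit the Hofburg in Vienna.", while an agent can now recognise "Hofburg" as an instance of the (hypothetical) Museum concept.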
The amount of tourism information on the Web is huge and the diversity of its nature
is also vast. Furthermore, recent studies have shown that decisions of tourists about
their potential destinations are increasingly influenced by multimedia and web-based
content and comments generated by other tourists. Besides, tourists have begun to
share their experiences on the web in the so-called Web 2.0 phenomenon, and a
tremendous number of web pages have been created by tourists and end users.
Even destination management organizations are beginning to include user-generated
content in their own web sites as a way to promote their destinations.
Travel and tourism is a leading industry in the application of B2C and B2B2C
eCommerce and mCommerce solutions as well as Web-based information channels,
and a huge number of tourism information systems have been developed to
support all the processes related to the electronic market. If the objective is to
automate eBusiness processes over the Web without human intervention, allowing
machines to interoperate automatically, then information sources must be
annotated so that a mediation ontology can integrate information coming from
heterogeneous systems.
Therefore, in order for the tourism industry to succeed, new ways of data and content
annotation have to be developed so that a particular piece of information can be used
by a particular machine for a particular business process, allowing a vertical data
integration approach to the tourism market.
Manual annotation tools allow users to manually create annotations, i.e. metadata
about a particular information source. These tools are in general terms relatively
similar to those used for pure textual annotations, but differ in the sense that they
provide some support for ontologies.
The following is a list of some of the most relevant annotation tools found in the
literature:
Amaya [Quint, 1994] is a Web browser and editor that marks up Web
documents in XML or HTML. The user can make annotations in the same tool
s/he uses for browsing. It facilitates manual annotation of Web pages
but does not support any automatic annotation;
The Annozilla browser aims to make all Amaya annotations readable in the
Mozilla browser;
The Mangrove system is another example of a manual but user-friendly
annotation tool [McDowell, 2003]. The annotation tool is a straightforward GUI
that allows users to associate a selection of tags with text that they highlight;
Due to the increase of multimedia content on the Web, tools to annotate this
kind of content have become very useful. Vannotea [Schroeter, 2003] can be
used to add metadata to MPEG-2 (video), JPEG(2000) image and Direct 3D
(mesh) files, with the mesh being used to define regions of images;
OntoMat Annotizer: this is a tool for making annotations which is built on the
principles of the CREAM framework. It has a Web browser to display the page
which is being annotated and provides some reasonably user friendly
functions for manual annotation, such as drag and drop creation of instances
and the ability to mark-up pages while they are being created;
The M-OntoMat-Annotizer [Bloehdorn, 2005] supports manual annotation of
image and video data by indexers with little multimedia experience by
automatic extraction of low level features that describe objects in the content.
A commercial version of OntoMat, called OntoAnnotate, is available from
Ontoprise;
SHOE Knowledge Annotator [Heflin, 2001] was an early system which allowed
users to mark-up HTML pages in SHOE guided by ontologies available locally
or via a URL. Users were assisted by being prompted for inputs. Unusually,
the SHOE Knowledge Annotator did not have a browser to display Web
pages, which could only be viewed as source code.
Other important challenges for the future in this active research area are automating
the annotation of information in various formats, addressing issues of trust and
security, and resolving problems of storage.
7.2.4 Recommendations
Due to the nature of this topic, there can be some overlapping of recommendations
with other issues that have already been covered, such as ontologies.
These scenarios are only a few examples of many. Actually, the amount of
information that is stored in this way probably vastly surpasses that in structured
sources. As Martin Hepp, Katharina Siorpaes, et al have analyzed, structured and
unstructured data complement each other in many cases, e.g. for hotels where web
sites frequently contain more complete descriptions of the hotel, while the GDSs only
publish the room availability.
Normally, however, the data on the web is unstructured and geared towards human
consumption only. Only rarely do metadata or formal resource descriptions reliably
complement and explicate this unstructured information to facilitate its use in
automated transactions or automated integration with structured resources. It seems
unlikely that this situation is going to improve fundamentally over the next years.
The unstructured nature of the data invariably limits its reuse in electronic
transactions. Based on this type of information it will be difficult at best to, e.g.,
automatically complement a hotel booking with the reservation of museum and
theatre tickets.
7.3.1.1 Needs
7.3.1.2 Requirements
In an ideal world, information extraction would structure free text in such a way that it
can be automatically analyzed, queried and integrated with structured data sources.
This is certainly illusory for the foreseeable future. Nevertheless, it is necessary to
explore the potential of the various facets of information extraction for the eTourism
domain.
automatically structured and usually imported into databases, XML files or other
structured storage formats for subsequent analysis and evaluation.
Currently the two branches of information extraction that have drawn most attention
in the research community are named entity recognition – the explication of
references to persons, organizations, places, etc. – and event extraction; the latter,
e.g., practiced in projects such as JRC’s EMM Violent Events Maps that are
automatically compiled from published news feeds. Both are pertinent to eTourism.
Furthermore, some research has been done on information extraction specific to
eTourism.
Named entity recognition is by now a rather well understood topic with wide
applications both across many fields – computational linguistics, computational
philology and related disciplines, even genetics – and across many languages.
Approaches for name taggers often build either on hand-crafted rules – good
classifiers can reach a precision well above 90 % for English language material (cf.
Grishman, 2003, note 3) – or machine learning technologies including automated
learning and statistical model building. Both maximum entropy [Borthwick, 1999] and
Hidden Markov [Bikel, Miller, Schwartz, Weischedel, 1997] models have been trained
using tagged reference materials. The models have then been successfully applied
to untrained material, reaching again precision levels above 90 % for new material.
Various readily available tools implement named entity recognition. The ANNIE
package of the open-source GATE suite contains resources such as tokenizers,
gazetteers and semantic taggers to build rule-based named entity resolvers. Many
other open-source and commercial offerings are listed at
http://en.wikipedia.org/wiki/Named_entity_recognition.
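The gazetteer-driven approach used by packages such as ANNIE can be illustrated with a toy rule-based tagger; the gazetteer entries below are invented examples, and a real system would add tokenization, context rules and disambiguation on top of the simple lookup shown here.

```python
import re

# A toy gazetteer-based named entity tagger in the spirit of rule-based
# systems such as GATE/ANNIE (the gazetteer entries are illustrative).
GAZETTEER = {
    "Innsbruck": "LOCATION",
    "Lufthansa": "ORGANIZATION",
}

def tag_entities(text):
    """Return (surface form, entity type, character offset) per gazetteer hit."""
    hits = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            hits.append((name, etype, m.start()))
    return sorted(hits, key=lambda h: h[2])

print(tag_entities("Lufthansa flies daily to Innsbruck."))
```

Even this naive lookup illustrates why gazetteers alone are insufficient: without context rules, ambiguous names (a person called "Paris", the city "Paris") cannot be distinguished.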
Whereas named entity recognition is a rather well understood topic, event extraction
is somewhat more experimental and by necessity more closely bound to the type of
events that are supposed to be extracted. A given event type is usually captured
according to a given template – essentially a database table or a set of formal
assertions – whose valencies are filled from entities that are isolated in the free text.
As a rule named entity recognition is a part of this explication process as named
entities frequently occur in the description of events.
To illustrate this situation, a typical example of an event description from the Wall
Street Journal of 1993-02-19 may help. This example is taken directly from GATE
Information Extraction:
New York Times Co. named Russell T. Lewis, 45, president and general
manager of its flagship New York Times newspaper, responsible for all
business-side activities. He was executive vice president and deputy general
manager. He succeeds Lance R. Primis, who in September was named
president and chief operating officer of the parent.
Ideally event extraction might automatically capture the series of events implied in
this article according to a job-related template with fields such as organization, job
title, newly appointed person, and previous job holder. In reality this is often highly
non-trivial, as exemplified by the number of anaphoric references (“he”, “who” and
“the parent”), the need for inference (Primis obviously was the previous job holder
and has now been promoted) and the amount of encyclopaedic knowledge (New
York Time Co. is the holding for the newspaper) needed for interpreting even this
short and seemingly simple news bulletin.
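A minimal sketch of template-based event extraction applied to the announcement quoted above: a hand-crafted pattern fills a job-appointment template with organization, person, age and job title. The pattern is tailored to exactly this sentence shape, and the anaphoric references and the previous job holder are deliberately left uncaptured, since resolving them requires inference beyond pattern matching.

```python
import re

# Template-based extraction for one event type: a job appointment.
# The pattern is hand-crafted for this sentence shape only.
TEXT = ("New York Times Co. named Russell T. Lewis, 45, president and "
        "general manager of its flagship New York Times newspaper, "
        "responsible for all business-side activities.")

PATTERN = re.compile(
    r"(?P<organization>.+?Co\.) named (?P<person>[^,]+), (?P<age>\d+), "
    r"(?P<job_title>[^,]+)")

def extract_appointment(text):
    """Fill the job-appointment template from free text, if the pattern matches."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None

event = extract_appointment(TEXT)
# Note what is NOT captured: "he", "who", "the parent" (anaphora) and the
# previous job holder, which require inference and encyclopaedic knowledge.
```

This is why results improve for source material with recurrent patterns: the more regular the phrasing, the further simple templates carry before inference is needed.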
Unsurprisingly results tend to be better if the source material already follows some
recurrent pattern, as is the case, e.g., for many job postings or medical records, but
also, interestingly, for news articles on violent events such as bombings or
earthquakes.
The number of readily available tools for event extraction is smaller than that for
named entity extraction, and they need to be heavily tailored for any given type of
event extraction and template. One example for such a tool is the open source GATE
Information Extraction package. Commercial offerings include the OpenCalais suite
of web services.
Information extraction for tourism-specific data necessarily has to deal with a number
of different types of events such as performances, sports events, entries from event
calendars, etc. Each of these can have its own display rules and needs its own
templates. Furthermore, pertinent data is regularly spread across many sources in
many different languages, so extraction systems must support parsing many
languages. Ideally, the extracted information should then be stored in
language-independent templates based on language-neutral concept hierarchies.
For end-user consumption the templates must be rendered in various languages,
ideally fully automatically. The FP4 project on Multilingual Information Extraction for
Tourism and Travel Assistance (MIETTA, 1998–2000) already worked precisely on
these issues. Xu, Netter and Stenzhorn
elaborate on two event types, adult education courses and theatre performances and
describe the MIETTA system developed in the project. Sadly, they do not publish any
data on the reliability of the system by testing the extracted information against
manually captured data, as would have been customary. Such data would obviously
be a precondition for gauging the viability of the project's approach.
For eTourism, named entity recognition is key to linking extracted information with
given locations or organizations such as hotels, theatres, or other relevant players.
For this purpose one needs agreement on a suitable model to unambiguously link
the names of organizations against a suitable vocabulary of organizational units in
the eTourism domain, possibly based on the 29 types proposed in "Annotation
guidelines for answer types" [Brunstein, 2002]. These findings need to be validated
against sample data to test the level of granularity and to ensure sufficient precision
in the tagging.
Event extraction is still a research area, though, as we have seen, first applications
are operational, e.g. in the news arena. Standardization in this area would therefore
be premature.
Event extraction for eTourism is still very much an area of research. In particular it
misses performance tests that would allow an informed decision on the precision that
current systems can reach. Given the great potential that information extraction can
have for the domain, it would be highly desirable to have such data.
7.3.4 Recommendations
7.3.4.1 Short-term recommendations (1–3 years)
The mapping between an integrated global ontology and local ontologies may
support enterprise knowledge management and data or information integration. In
the Semantic Web an integrated global ontology extracts information from the local
ones and provides a unified view through which users can query different local
ontologies. In an information integration system a mediated schema is constructed
for user queries. Mappings are used to describe the relationship between the
mediated schema, i.e. an integrated global ontology and local schemas.
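The role of such mappings can be sketched as follows: a query posed against the global (mediated) schema is rewritten for each local schema via a mapping table, and the answers are returned in the global vocabulary. All schema and attribute names are invented for the illustration.

```python
# Sketch of a mediated schema: the mapping table relates global attribute
# names to each local source's own attribute names (all names invented).
GLOBAL_TO_LOCAL = {
    "hotels_a": {"name": "hotel_name", "city": "town"},
    "hotels_b": {"name": "label",      "city": "location"},
}

SOURCES = {
    "hotels_a": [{"hotel_name": "Alpenhof", "town": "Innsbruck"}],
    "hotels_b": [{"label": "Rio", "location": "Lisbon"}],
}

def query(global_attrs):
    """Answer a global-schema query by translating it for every local source."""
    results = []
    for source, mapping in GLOBAL_TO_LOCAL.items():
        for row in SOURCES[source]:
            # rewrite each global attribute to the source's local attribute
            results.append({g: row[mapping[g]] for g in global_attrs})
    return results

print(query(["name", "city"]))
```

The user sees one unified view ("name", "city") and never needs to know that one source calls the attribute `hotel_name` and the other `label`.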
7.4.1.2 Needs
There may be different airlines flying to the same destinations from the same origins,
and that information has to be shown to the final user so that she can decide on the
most convenient way to travel.
Tasks on distributed and heterogeneous systems demand support from more than
one ontology. Multiple ontologies need to be accessed from different systems. In
addition, the distributed nature of ontology development has led to dissimilar
ontologies for the same or overlapping domains. Therefore, various parties with
different ontologies do not fully understand each other and consequently cannot
work together, which prevents electronic transactions. To solve these problems it is
necessary to use ontology mapping to achieve interoperability among information
sources and to enable effective and efficient business transactions over the
Internet.
7.4.1.3 Requirements
Information sharing and integration must not only provide full accessibility to data;
it also ought to make that data fully processable and interpretable by machines.
One possible way to achieve effective heterogeneous information integration is to
create links among already existing ontologies. There are different ways to map
ontologies: from an integrated global ontology to local ontologies, between local
ontologies, and ontology mapping in the course of ontology merging and alignment.
With the growing use of ontologies in different domains of interest, the problem of
overlapping knowledge in a common domain becomes critical. The complexity of the
travel and tourism industry could by no means be represented by a single ontology,
thus multiple ontologies would have to be accessed from various applications.
Inter-ontology mapping could very well provide a common layer through which
several ontologies could be accessed and hence exchange information in a
semantically sound manner.
Many information integration systems use more than one ontology to describe the
information. The problem of mapping different ontologies is well known in knowledge
engineering, and several general approaches are used in information integration
systems.
Although reasonable results have been achieved on the technical side of using
ontologies for intelligent information integration, the use of inter-ontology mapping is
still an exception. Reviewing the literature, it seems that most of the mappings have
been realised ad-hoc, i.e. for the particular purpose of the mapping itself, especially
for the connection of different ontologies. There are approaches that try to provide
well-founded mappings, but they either rely on assumptions that cannot always be
guaranteed or they face technical problems. There is a need to undertake research
on mapping methodologies for general purposes.
Most systems only provide tools to develop ontologies, and they fail to indicate a
particular methodology to develop them. The comparison of different approaches
indicates that requirements concerning ontology language and structure depend on
the kind of information to be integrated and the intended use of the ontology. There is
a need to develop a more general methodology that includes an analysis of the
integration task and supports the process of defining the role of ontologies with
respect to these requirements.
7.4.4 Recommendations
Recommendations within this section are by nature very similar to the
recommendations proposed within chapter 6.3 (“Ontologies”).
8 Process handling
8.1 Needs and requirements
8.1.1 Introduction
Consumers in the tourism industry are getting more and more used to making online
transactions, and the industry is competing with services to attract these customers
and get them to the actual booking act as fast as possible. Traditional distribution
channels are vanishing, and more flexible and dynamic networks are arising. This very
dynamic development puts pressure on service providers: Business actors have to
follow demand to keep or expand their market share, otherwise they might get
crowded out.
These challenges require skills in marketing but most of all in deploying modern
information technology to manage the actual buying or booking process. This
process and other processes in the domain alike usually require the participation of
different players along the value chain to be fulfilled, making it necessary to interact
easily with other computer systems on a process level. But the management of
business processes is already difficult within one organization, making it a much
more sophisticated challenge in a network of organizations.
As already outlined in the introduction to this topic, in the context of information and
communication technology we consider a process to consist of data, being defined as
inputs and outputs, and of its execution, being a “work activity” or step. The problem
of data heterogeneity across different systems is part of the chapter on semantics,
while we want to discuss the dynamic aspect of executing processes by involving
heterogeneous computer systems in this chapter.
In fact even the one-time exchange of data is already a simple process, which implies
that data cannot be exchanged without having some kind of processes being
involved. Since this is already true for web sites being “crawled” to get information,
we do not want to consider passive process participation in our discussion of the
matter. Instead, we consider a rather complex interplay of at least two participants.
This has always been a problematic issue, being a more critical challenge compared
to mere exchange of data. This issue becomes even more pressing within a highly
networked, dynamic and diverse environment like the tourism industry today. The
introduction of standards is of course an elegant way to address interoperability
issues, but we know from the past that it is difficult to find industry-wide acceptance.
One reason is the loss of flexibility accompanying standards,
another one the game of market forces.
Since we leave the problem of data mediation to the chapter on semantics, and since
we consider complex processes, we have named this chapter "Process handling";
we consider it the dynamic component of process interoperability, requiring the
active participation of all actors involved.
8.1.2 Needs
In this chapter the basic and principal needs for process interoperability are
analysed and discussed, while requirements are outlined in the following chapter.
The challenge is to find ways of achieving process interoperability between
heterogeneous systems that allow an easy integration of business processes while
preserving the autonomy and diversity of the different players, which is needed to
match the diversity of requirements on a global scale. The following discussion does
not touch on business issues such as pricing, virtual or ad-hoc organizational forms,
dynamic packaging, legal aspects, etc. The intention is also not to design platforms
for these issues; it is merely about discussing and recommending one or several
ways to allow process interoperability.
These three steps are very basic and might have some backward loops (e.g. if the
room is not available) or sub-steps (the check for availability might include a
temporary blocking of the room for the specific date). But most of all, parts of the
process might run on other systems. Imagine that the booking is done on a portal
comprising a number of different hotel chains. The checking of availability is done on
a hotel chain’s computer system and the check for approval of payment by credit
card (a prerequisite for making a reservation) is done on a third system.
This short use case illustrates that a business process can be broken down into
different steps (or sub-processes), which might need the interaction of different
systems. The entire process could be drawn by using a flow chart showing the
different steps and their dependencies. In any case the completion of the entire
business process requires the handling of the steps and conditions. For example, if
the room has been reserved in a first step and in a later step the credit card is not
accepted, then the reservation of the room must be cancelled. Or it is cancelled
automatically after some time if there is no confirmation which is required to complete
the booking. This is up to the design of the process on the hotel's side. However, the
portal, as owner of the entire business process, might need to deal with as many
different systems as there are hotel chains represented on the portal. And each of the
systems might have different naming for reserving a room (booking, reservation,
locking, etc.) and different conditions (requires confirmation to complete the
reservation, cancels reservation automatically after some time without confirmation,
keeps reservation alive until status is changed, etc.).
Although the portal requires just one step to be done on another system, it might
have to deal with 100 different ways of handling this step if 100 hotels are
involved. And each hotel might have to deal with 100 booking systems if it does
business with 100 portals. Thus each actor might have to implement 100 interfaces to
be interoperable with the required other systems. Obviously, this dramatically
increases the effort needed to run processes automatically with other partners.
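The n-to-m interface problem described above, and the effect of agreeing on a shared canonical interface, can be sketched as follows; the operation and system names are invented for the illustration.

```python
# Sketch: with a shared canonical interface, each party implements one
# adapter instead of one interface per partner (all names invented).
class CanonicalBooking:
    """The single interface every portal talks to."""
    def reserve(self, room, date):
        raise NotImplementedError

class ChainAAdapter(CanonicalBooking):
    def reserve(self, room, date):        # chain A internally calls it "lock"
        return f"lock({room},{date})"

class ChainBAdapter(CanonicalBooking):
    def reserve(self, room, date):        # chain B "holds" and needs a confirm
        return f"hold({room},{date})+confirm"

def interfaces_needed(portals, hotels, canonical):
    """Pairwise integration needs portals*hotels interfaces; a shared
    canonical interface needs only one adapter per participant."""
    return portals + hotels if canonical else portals * hotels

print(interfaces_needed(100, 100, canonical=False))  # pairwise integration
print(interfaces_needed(100, 100, canonical=True))   # shared interface
```

For the 100-portal, 100-hotel scenario from the text, this is the difference between 10 000 pairwise interfaces and 200 adapters.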
Since this simple use case is very common and the industry depends more and
more on the interaction of different computer systems, we can assume a
strong need for a solution that decreases the complexity for process interoperability
in a networked environment. Typical business processes in the tourism industry are:
searching,
selling and buying,
reservation,
booking,
modification,
cancellation,
confirmation,
notification,
payment and other money transfers.
This list might not be complete and could provoke a lot of discussion (e.g. on the
difference between buying and booking, or between confirmation and notification).
However, it is only meant to exemplify the range of possibilities under discussion.
To perform all these processes in a networked environment we can assume that
“the basic industry need is an applicable concept for the technical interaction
of heterogeneous ICT systems to provoke and run complete business process
cycles involving at least two different technical systems.”
“Applicable concept” shall express the need for something that is useful in daily
business life.
90 – CEN/ISSS WS/eTOUR – CWA – 2009-06-03
“Complete business process cycle” means that a business process, wherever it starts
or ends, should be carried out completely as defined.
“Involving at least two different technical systems” shall define that the topic can have
a bi-directional setup, but in any case has to be flexible enough to run in a
network of different technical systems, thus more than two and up to an unspecified
number.
The design and management of business processes is a subject in its own right, but for
the following discussion it is enough to say that business processes can be broken
down into a number of steps. Each step needs a trigger to initiate it, has some
conditions that must hold before it starts, and delivers an output, including information
required for the performance of the overall process (e.g. a trigger for another step).
Furthermore, we assume that a step can only run on one system. If it requires two
systems, the step must be broken down into several steps. This assumption is
reasonable because, regardless of whether running a step on two systems is technically
feasible, a single actor must have authority over each step. Otherwise two or more
actors would be responsible for the same step, which is obviously not feasible in
practice.
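The notion of a step, with its trigger, start conditions, single owning system and output, can be sketched as a minimal data structure (all names and fields here are illustrative assumptions, not part of this CWA):

```python
from dataclasses import dataclass, field

@dataclass
class ProcessStep:
    name: str        # e.g. "reserve_room"
    system: str      # the single system with authority over this step
    trigger: str     # event that initiates the step
    conditions: list = field(default_factory=list)  # preconditions

    def run(self, payload: dict) -> dict:
        # A real step would call the owning system; here we just
        # echo an output carrying the trigger for the next step.
        return {"step": self.name, "system": self.system, "input": payload,
                "next_trigger": f"{self.name}_completed"}

step = ProcessStep("reserve_room", "hotel_crs", "booking_requested",
                   ["room_available"])
out = step.run({"room": "double", "nights": 2})
print(out["next_trigger"])  # reserve_room_completed
```

Note how the output deliberately includes a trigger, so that steps can be chained into a complete business process cycle across systems.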
8.1.3 Requirements
The chapter about requirements casts the industry needs, as described above, into a
more structured and operational form.
Additionally, there are also more flexible initiatives that give a framework within which
players can adapt according to their needs, based on a non-coercive language that
allows common basic elements to be expressed in a similar way for all players while at
the same time allowing those elements to be combined in different ways so as to permit
diversity. The cost of implementation can be reduced compared to a full standard, since
players may publish different levels of services. The use of templates also allows a
certain flexibility in the format of responses according to requesters. A drawback
stems from the fact that integrating different players may require certain adaptations
due to commercial or system-driven specificities. On the other hand, this fact allows
competition and diversity. An example of such a language is the XFT (Exchange For
Travel) language.
gateway for external systems where the corresponding partners do not care about
what happens behind the partner’s gate.
This is feasible with a central player, but not in open and dynamic networks, since an
interface has to be developed for each player. This drastically increases the complexity
and cost of implementation. However, Application Integration and APIs are better
suited to handling different processes and are more responsive to the specific
requirements of the systems involved.
The following table helps to highlight how well the current state of the art as described
above meets the needs and requirements identified:
The different entries might well be questionable and can raise discussions, but in
general they reflect well the current situation: Standards and Application Integration
are not fully suitable for a highly networked and dynamic environment like the tourism
industry today. They result in a loss of autonomy, need some central entity or
controlling power, and are expensive.
For each process run across different systems, the interface needs to be specified,
developed and maintained separately, since the systems do not all make use of the
same standards or interfaces. If a new version of a standard or interface is published, it
cannot be used automatically; it needs to be deployed and maintained manually. It is
obvious that a more flexible solution with a mediating technology meets the
requirements better than rather rigid technologies do.
These projects address the need for a more flexible and cost efficient way to align
business processes between different systems. A promising way is the concept of
Semantic Business Process Management, resulting from the application of Semantic
Web Services to Business Process Management, as for example discussed by Hepp
et al [2005]. Based on this concept, Cimpian, Mocan [2005] proposed a process
mediator, adjusting the bi-directional flow of messages based on the Web Service
Modeling Ontology (WSMO). This approach is similar to that chosen by the
Harmonise project (http://www.harmonet.org/) for data mediation, in which a
technology for mediating between heterogeneous data sources was developed. The
Harmonise technology allows involved parties to exchange information without
changing the local data structure, only by referring to a common understanding of a
domain-specific ontological concept, the Harmonise Ontology [Fodor, Werthner,
2005].
8.4 Recommendations
8.4.1 Short-term recommendations (1–3 years)
Simplify and rationalize existing processes – use stateless process handling or
request-response-pairs only.
Build an ontology of common processes in the tourism industry.
9 Metasearch
9.1 Methodology
9.1.1 Needs and requirements
9.1.1.1 Introduction
Metasearch is the ability to run one search process over the search engines of different
heterogeneous instances (platforms, websites, databases) and aggregate the results in a
unified list. In the tourism industry, metasearch engines are typically used to compile
and compare specific offers. Examples are:
Checkfelix: http://www.checkfelix.at/,
Kayak: http://www.kayak.com/,
Farechase: http://farechase.yahoo.com/,
Trabber: http://www.trabber.com/,
Kelkoo Travel: http://travel.kelkoo.co.uk/, and
Minube: http://www.minube.com/.
Typically, search results are not stored in a database, but delivered as real-time
results. However, some systems make use of data replication for static data (like,
e.g., hotel descriptions).
An acceptable response time is of high importance for search engines to meet users'
expectations. Metasearch engines depend on the response times of the other search
engines and have to employ clever algorithms to avoid deadlocks. However, this is less
a matter of information and process interoperability, except that runtime performance
in aggregating data might be improved.
Data can be accessed either by getting it automatically from the web interface (user
interface) or via a data interface (e.g. web services). Semantic annotation of content
and semantic mapping, which let the metasearch engine find the information it
requires, provide part of the answer and are detailed in the corresponding chapters of
this study. However, some issues remain, for instance regarding data encapsulated in
purely graphical applications such as Flash applications, where the data is not
accessible at all. Possible solutions come from new technology trends such as Flex
applications, where the whole application is XML-based. Other issues stem from
client-side calculations (e.g. options that depend on different settings, or prices
calculated on the fly on the client side). In that case the data is hard-coded directly in
the application, and accessing it would require interpreting the code and the
corresponding rules.
The search on another system is often tailored to the particularities of the foreign
system. If these interfaces do not follow a given pattern or standard, they have to be
updated each time the other system changes in order to keep the service level.
Web crawlers (synonyms: robots, bots, spiders) are software scripts and programs
that browse the World Wide Web in an automated manner to create copies of
websites (which are processed by other software agents later) or to gather specific
information. They are used by search engines but are typically not used for
metasearch processes, since they normally only gather information and do not run
processes on other websites.
HTTP requests can be used to run automated search queries on existing search
engines by rebuilding the HTTP request used on each of the external sites. This
approach is very maintenance-intensive, since every little change in the HTTP
requests requires an update of the process. Depending on the external site, data is
sent back in an unstructured or structured manner and needs to be processed to bring
it into the scheme (semantics) the metasearch engine uses for displaying results.
Depending on how results are provided on the external systems, HTTP requests can
be a lightweight, yet still maintenance-intensive, way of running a metasearch
process.
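Rebuilding an external site's HTTP search request essentially means reproducing its query-string parameters; a minimal standard-library sketch (the endpoint and parameter names are invented for illustration):

```python
from urllib.parse import urlencode

def build_search_request(base_url: str, params: dict) -> str:
    # Rebuild the GET request the external site would issue itself.
    # If the site renames even one parameter, this breaks -- the
    # maintenance burden described above.
    return f"{base_url}?{urlencode(sorted(params.items()))}"

url = build_search_request(
    "http://example.org/search",  # hypothetical endpoint
    {"dest": "Rome", "checkin": "2009-10-20", "checkout": "2009-10-22"},
)
print(url)
```

A real metasearch process would then fetch this URL and parse the (structured or unstructured) response into the engine's own result scheme.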
Since advanced tools can run different operations, website wrappers are well suited
for metasearch. Still, they need considerable maintenance effort, since a change in the
wrapped website requires an update of the wrapper.
APIs are the classic way for interactions between different computer systems and
allow a broad range of possibilities. They can be independent of the programming
language and are therefore open to any kind of integration and information
exchange. It has to be mentioned that implementing an API typically requires
considerable effort, and there is no general standard for APIs, since APIs vary
significantly with the purpose and the domain concerned.
A web service is a software application enabling the exchange of data in XML format
to allow machine-to-machine (M2M) interaction on different platforms. Different from
HTTP requests and web crawlers, the provision of a web service has to be
implemented by the service provider, who is identified by listings in registries (UDDI).
A similar approach is REST (or RESTful web services), which allows the exchange of
domain-specific data over HTTP without an additional messaging layer as in web
services. It is often described as a simpler form of web services.
9.1.2.8 Summary
The first group comprises methods where the agent providing a metasearch engine
can integrate other search engines without any assistance from the search engines
used (web crawler, HTTP requests, website wrapper). The metasearch agent is thus
more independent of the other systems. These methods are therefore more flexible,
but cause considerable effort for the implementation and maintenance of a
metasearch service. However, they would obviously cause less effort if standards
were supported or interoperability problems were solved.
The second group comprises methods where the assistance of the external search
engine is required, where some kind of interface is provided, or where other changes
are necessary. Clearly, these methods make it easier to implement and maintain a
metasearch service, but they require the application of standards or the solution of
interoperability issues to run smoothly.
The methods for metasearch described above provide useful tools to integrate
different search engines, but the quality of results, response time, access, and
maintenance effort depend very much on the use of standards or on the ability to
understand the other system in some other way. Especially the combination of website
wrappers and semantic annotations of websites seems a promising way to enable
improved metasearch functionality. The deployment of metasearch engines could be
One important direction metasearch engines can take is that of semantics. Semantic
search engines are becoming increasingly popular. These are systems that need to
understand (the meaning of) both what the user is asking for and the information
stored on the web. A semantics-based query recognizes the key words used to carry
out a search and uses that same information to display more precise results. The final
and main objective of this search technique is to find all documents on the web that
contain the information most relevant to the query (rather than only those that
syntactically match the search keywords), minimizing the number of false results.
Another big challenge for metasearch engines, after understanding the content, is
system performance. All methods have to fetch data from different systems, transform
it and display it in an appropriate manner (ranking, paging, etc.). The more sources
the system queries, the more benefit it offers to the user, but the slower the system
becomes.
9.1.4 Recommendations
9.1.4.1 Short-term recommendations (1–3 years)
9.2 Querying
9.2.1 Needs and requirements
9.2.1.1 Introduction
More often than not the information involved in eTourism transactions is distributed
across a number of different data stores, usually operated by different companies:
various GDSs (potentially in their respective national incarnations), CRSs, other
sources, not to forget the plethora of unstructured data such as the web. As
discussed throughout this section, we often need to find information in and across
many of these data sources and, indeed, often need to find the data sources
themselves first.
List all hotels in Rome with at least three stars that have availabilities between
October 20th and 22nd.
List all prices for flights to Rome that fly in on the morning of 20th and return
on the evening of the 22nd.
In many cases queries could be much more complex still and be combined with
constraints based on geographical data (hotels not more than 500 metres from the
Spanish Steps), price ranges (not more than EUR 100 per night), etc. In many cases
subsequent queries will build on the existing result sets of simpler queries and further
refine them in a piecemeal manner.
Looking more closely at human search behaviour, we can observe that users do not
search for hotels “not more than 500 metres from the Spanish Steps” but rather for
hotels “close to” or “near” the Spanish Steps. However, a hotel “near” Rome might
imply a different distance than a hotel “near” the Spanish Steps. The translation of
human search needs or peculiarities into a respective machine-readable search
query covers aspects of interoperability we are not going to cover in this chapter,
which focuses on machine-machine interoperability. Nevertheless, natural language
processing and the transformation into search queries remain important aspects and
challenges in querying.
Queries along these lines are typical parts of the selection phase in eTourism
transactions. A given transaction will often involve a considerable number of queries
as the customer or her agents narrow down their result set to a small number of hits
that fit the demands. Queries therefore must be fast and return results within at most
a few seconds. Nevertheless, we can observe metasearch engines on the market today
taking minutes rather than seconds to run real-time queries on external systems.
100 % correctness of query results is highly desirable at this stage, but not absolutely
necessary. The ultimate corroboration or falsification of query results can follow in the
booking phase, when a non-binding service offer is turned into a binding contract
between supplier and customer.
The tourism industry is a highly dynamic environment, and data stores and search
engines appear, and also disappear, almost continuously. Content aggregation and
syndication become indispensable tasks for the provision of one-stop platforms.
Ideally, the integration of new data stores into a user- or travel-agency-facing
metasearch engine should be largely transparent, easy and thus cost-efficient.
Technically, the search query entered into current metasearch engines has to be
translated to other data stores for further processing. This can either be done by
At present, option 1 dominates. For federated queries, individual data stores today
offer their own query strategies. These strategies often reflect their historic evolution
and their specific internal processes. This makes querying one of the biggest
challenges for metasearch, since queries cannot easily be translated from one system
to another. The integration of each new data store means considerable custom
programming, making it a costly and time-consuming enterprise.
“Query by example” was developed by IBM in the 1970s in parallel to what was to
become SQL (cf. Ramakrishnan, Gehrke, 2002, chapter 6). The user supplies
example result sets, which can formulate constraints or other selection criteria in
addition to typical string values. Examples can often be built through graphical user
interfaces.
When looking for a hotel, for example, the client would specify basic hotel
characteristics (e.g. name, category, etc.) and room category, and the system would
on that basis return a suitable set of hotels [Höpken, 2004]. In this way, query by
example partially relieves users of the need to learn formalized query languages and
instead allows them to find entries related to known samples. However, it needs
clear templates for the types of examples that can be constructed and used as the
basis for cross-data-store queries.
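The template matching behind query by example can be pictured as follows (a simplified sketch with invented field names; omitted fields act as wildcards):

```python
def query_by_example(records, example):
    # A record matches when every field given in the example
    # has the same value; fields absent from the example match anything.
    return [r for r in records
            if all(r.get(k) == v for k, v in example.items())]

hotels = [
    {"name": "Hotel Roma", "category": 3, "city": "Rome"},
    {"name": "Grand Palace", "category": 5, "city": "Rome"},
    {"name": "Casa Verde", "category": 3, "city": "Milan"},
]
print(query_by_example(hotels, {"city": "Rome", "category": 3}))
# matches only the "Hotel Roma" record
```

The limitation noted above is visible here: a template can only express equality on fields, not complex constraints such as ranges or joins.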
Standardized query languages usually expect users – which in this case will normally
be system integrators rather than end-users – to learn a specialized language for
querying a system. The best known of these is certainly the Structured Query
Language (SQL) for relational databases, which is used by all current relational
database management systems. At least the core features of ISO/IEC 9075, the
international standard specifying SQL, are implemented by virtually all suppliers of
relational database management systems.
SQL does not lend itself particularly well to federated queries and is normally used to
consult a given database instance. SQL-like syntax is used, however, for drill-down
searches in federated registries such as the ebXML Registry Specification. Likewise,
the SQL syntax has heavily influenced the syntax of a number of other non-relational
query languages such as the Object Query Language (OQL), Simple Protocol and
RDF Query Language (SPARQL), and aspects of the Topic Map Query Language
(TMQL). In the following we shall look at one new query language especially for
(potentially federated) semantic queries.
SPARQL: The Query Language for RDF (SPARQL) is, as the name suggests, a
language for querying RDF triples. This relatively new W3C Recommendation was
only published in January 2008, but can already point to a considerable
implementation base. It can be used to query against a considerable number of
commercial and non-commercial native triple stores – for a non-exhaustive list cf.
http://esw.w3.org/topic/SparqlImplementations –, but also against adaptors such as
D2R Server that sit on top of relational databases. This flexibility has encouraged the
growth of a number of publicly available SPARQL endpoints, some of which are listed
on http://esw.w3.org/topic/SparqlEndpoints.
SPARQL can honour the transitivity properties defined in RDF-S and OWL
ontologies.
SELECT ?resource
WHERE {
?resource dc:creator <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Abbott_Eleanor_Hallowell_1872-1958>
}
which would list all resources created by Eleanor Hallowell Abbott available in a given
triple store, or
SELECT ?title
WHERE {
?book dc:language "en" .
?book dc:title ?title
} ORDER BY ?title
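The basic-graph-pattern semantics of this second query, selecting English-language titles, can be mimicked over an in-memory set of triples (a didactic sketch only, not an RDF or SPARQL implementation; data values are invented):

```python
def match_pattern(triples, pattern, binding=None):
    # Pattern items starting with "?" are variables; yield all
    # variable bindings for which the pattern occurs in the data.
    binding = binding or {}
    for s, p, o in triples:
        b = dict(binding)
        ok = True
        for var, val in zip(pattern, (s, p, o)):
            if var.startswith("?"):
                if b.setdefault(var, val) != val:
                    ok = False
                    break
            elif var != val:
                ok = False
                break
        if ok:
            yield b

triples = {
    ("book1", "dc:language", "en"),
    ("book1", "dc:title", "A Study in Scarlet"),
    ("book2", "dc:language", "de"),
    ("book2", "dc:title", "Der Prozess"),
}

# Join the two triple patterns of the WHERE clause, then ORDER BY ?title.
titles = sorted(
    b2["?title"]
    for b1 in match_pattern(triples, ("?book", "dc:language", "en"))
    for b2 in match_pattern(triples, ("?book", "dc:title", "?title"), b1)
)
print(titles)  # ['A Study in Scarlet']
```

The nested loop is the join: bindings produced by the first pattern constrain the second, which is exactly how a SPARQL engine combines the two clauses.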
Unlike SQL, SPARQL can be used for distributed queries and the aggregation of data
across data stores [Schenk, Staab, 2008], [Haase, Wang, 2007], [Quilitz, Leser,
2008]. Vocabularies can be cross-referenced and cross-queried across data stores.
However, where divergent ontologies are exposed by participating data stores,
suitable mappings, e.g. to a reference ontology, must exist for a distributed query to
succeed. Such a search strategy could in principle scale to manually annotated data
sources such as web pages annotated with RDFa.
That said, at present few, if any, examples of distributed SPARQL queries across a
number of nodes operated by different organizations are known to be used in a
production environment, even less so queries including many individual web pages
(though some commercial products such as Allegrograph
(http://agraph.franz.com/allegrograph/) support elaborate and largely transparent
federated SPARQL queries and reasoning across distributed instances of the
system). Little is known about whether the technology would in fact scale well enough
for large heterogeneous networks and, if so, in which type of network topology.
Just like the query language itself, the interfaces to query services can be
standardized, e.g. through shared interface specifications in WSDL. Without a shared
query language the expressiveness of such services is necessarily limited, but in
many cases the result sets even of simple queries can subsequently be refined in
further query steps.
Using simple syndication protocols based on Topic Map or RDF payloads, only
actually changed records are exchanged between data stores. An Atom-based
general-purpose syndication protocol is specified in part 1b of the nascent
eGov-Share CWA (http://www.egovpt.org/fg/CWA_Part_1b). Nodes subscribe to
change feeds and can thus import new, deleted or updated records on a case-by-case
basis from their source registry, provided they have the necessary credentials to
access the feeds and provided their metadata can be mapped onto a shared reference
ontology.
Figure 9-1
Querying thus becomes a sub-problem of data integration, and queries can then be
run locally against the aggregated data store. Updates can be pulled at short
intervals (say, every 10 minutes), thus providing cached queries with nearly live
results.
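Applying such a change feed to a locally aggregated store can be sketched as follows (the entry and record shapes are invented for illustration and do not follow any particular feed specification):

```python
def apply_feed(store: dict, entries: list) -> dict:
    # Each feed entry carries an id, an action, and (for new or
    # updated records) the record itself; the local cache thus
    # mirrors the source registry after each pull.
    for e in entries:
        if e["action"] == "deleted":
            store.pop(e["id"], None)
        else:  # "new" or "updated"
            store[e["id"]] = e["record"]
    return store

cache = {"h1": {"name": "Hotel Roma", "stars": 3}}
feed = [
    {"id": "h2", "action": "new", "record": {"name": "Grand Palace", "stars": 5}},
    {"id": "h1", "action": "updated", "record": {"name": "Hotel Roma", "stars": 4}},
]
apply_feed(cache, feed)
print(sorted(cache))  # ['h1', 'h2']
```

Pulling and applying such feeds at short intervals is what keeps the locally queried aggregate close to live.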
Query by example (QBE) can be used without any specific query language, only by
means of data samples. This makes it easy to implement once data interoperability is
solved, independently of the method used to achieve it (standard, interfaces,
mediation). The main drawback is that complex queries cannot be formulated, so user
requirements will not be met in most cases. However, in specific scenarios, especially
when looking for descriptions or listings in domains with a shared or even
standardized ontology, QBE might prove sufficient.
Since the ICT infrastructure in the tourism industry is characterized by a broad range
of heterogeneous systems (and thus different databases), it is very unlikely that a
typical query language can be deployed as a standard on a broad basis. SPARQL, on
the other hand, has the potential for broad acceptance, since it can be deployed on
top of existing reference models in the case of divergent data models. In this setup it
has similar benefits and constraints as QBE, but overcomes QBE’s main obstacle by
allowing complex queries. SPARQL seems therefore to be one of the main potential
candidates for handling metasearch queries in a distributed and divergent
environment.
query sequence must be defined as part of the interface(s). This makes query
interfaces difficult to define and adopt, and therefore limits the potential for running
complex queries, since the effort of defining and deploying interfaces becomes
overwhelming. Secondly, participating partners must either implement the standard or
define mappings to be interoperable with it. Thus query interfaces have to be
implemented for each query scenario, or mappings based on a shared reference
model must be set up.
Thus interface standardization is more advanced than QBE, but still has similar
restrictions. It might be well suited for specific scenarios, but has its limits for broader
deployment.
eTourism can build on the experience with metadata syndication in the eGovernment
domain. Those results should be evaluated and screened for their applicability in the
data integration between CRSs, GDSs and intermediates.
9.2.4 Recommendations
9.2.4.1 Short-term recommendations (1–3 years)
Research on technologies for flexible and adaptive query methods that are
able to understand the semantics of a web repository and send an appropriate
query.
As has been discussed above, both services and data are widely distributed in typical
eTourism scenarios. In addition to the information provided by one or more large
players such as a GDS, a typical eTourism transaction can bring together – or could
in the future profit from combining – local and remote data from many sources.
Standardized queries, e.g. based on SPARQL, or ad-hoc protocols can be used to
actually retrieve specific data sets from data stores (see above) and web services
can be used to access specific services, ideally through standardized APIs.
The need for machine-processable information especially on services has long been
recognized. When web services became popular in the late 1990s, three key factors
were considered to be crucial for the success of the then new paradigm:
Solutions for these requirements are based on open specifications and are, in the
context of “traditional” web services, usually identified with the three well-known basic
web service standards SOAP, WSDL and UDDI (questions of semantic
interoperability were largely out of focus at that time). In RESTful web services the
stack is somewhat less clearly defined, especially for machine-processable API
descriptions, but the general requirements are very much the same.
9.3.1.2 Needs
Looking beyond those specific web service standards, the OASIS Reference Model
for Service Oriented Architectures [OASIS Reference Model] explores some of these
requirements on a more precise, technologically neutral level. Around the idea of a
service as “the mechanism by which needs and capabilities are brought together”
gravitate concepts such as interaction of services, their service descriptions and their
visibility and reachability, all grounded in the willingness to collaborate with the goal
of achieving a real-world effect.
Registries are typically regarded as one approach to achieve visibility, other options
being semantic or general-purpose search engines. Registries in this sense help to
find actual resources, thus enabling their discovery. For that purpose, they store
more or less standardized metadata to describe those resources and offer an
interface to query that metadata. This metadata could conceivably one day also be
harvested using information extraction.
9.3.1.3 Requirements
Registries must facilitate finding existing services and data repositories. Together
with standardized query technologies they thus help to put those resources to optimal
use.
While registries focus on the visibility of resources, they build on the often unspoken
assumption that there is already a willingness to collaborate and share those
resources in a given context, be it within an organization or across organizational
boundaries, be it for free or for a charge. This may or may not be true in a given
case, and it may or may not imply that a registry owner is willing to give up control
over the data. Furthermore, in the real world there is rarely a single source of
information for any given area of interest, and, as we have seen in the introduction to
this section, it is particularly true for the tourism sector. Individual registries are
maintained at various levels of government – notably, local authorities supporting
their local tourism industry –, in tourism associations, GDSs and other private sector
organizations. This makes sense; in many cases the maintainers are closest to the
very resources themselves and have both the best first-hand knowledge and the
strongest business case to keep the data up-to-date.
That said, there is also a strong requirement for centrality, or, more exactly, central
interfaces to enable searches across individual registries. Otherwise any one search
will involve direct queries to a large number of eTourism registries, negating the very
idea of visibility of data and services.
Two well known registry standards dominate the relatively small literature on the
subject, namely UDDI and the ebXML Registry Specification. But neither standard
has been widely adopted in the market. This is, as we argue in Küster, Moore,
Ludwig, 2007, due to fundamental design issues that plague both specifications,
UDDI
UDDI is the best-known standard for registries of services. The UDDI 1.0
specification was formally released in 2002, pushed by major software vendors such
as IBM, SAP and Microsoft. It was supposed to lay the basis for the loosely coupled
operation of web services, bringing together service consumers and service
providers, possibly even based on automatic discovery and cooperation. For this
purpose, the vendors created three public UDDI registries that were open to all
interested parties. These public registries, however, were not widely used and were
eventually discontinued in early 2006.
Technically, UDDI is above all an API for a set of SOAP-based web services with
their respective data models. This API has continued to grow over the three
published versions of the standard and covers today amongst others methods for
publishing information on businesses and their services, for finding them and for
establishing links between them. By now, the monolithic UDDI 3.0 standard totals an
estimated 400 pages, not counting the nine XML schemata with the actual API
specifications.
The ebXML Registry Specification is composed of the two sister OASIS standards
[OASIS ebXML Registry], the former specifying its internal data model, the latter its
SOAP-based API. In coverage it is quite similar to UDDI, though it supports more
flexible content models. It distinguishes itself from UDDI by the support of federated
queries across a number of different registries:
Figure 9-2
Neither UDDI nor the ebXML Registry Specification allows per se for detailed
semantic descriptions of (web) services, let alone of other types of resources such as
data stores. Queries can at most leverage rather coarse-grained, domain-independent
taxonomies such as UNSPSC.
As has been argued above, semantic technologies are a key to enabling data and
process interoperability, but are at present largely underused in eTourism in general
and in GDSs in particular. The SATINE project
(http://www.srdc.metu.edu.tr/webpage/projects/satine) was funded under FP6 from
2004 to 2006 with the explicit goal to overcome the shortcomings of some current
GDSs. SATINE set out to “provide tools and mechanisms for publishing, discovering
and invoking web services through their semantics in peer-to-peer networks”
(http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc).
Semantic technologies and specifically ontologies for web services play a significant
role in the SATINE architecture.
Figure 9-3
Local registries are aggregated into larger registries that are often targeted at specific
user communities. Those aggregated registries can, of course, be further aggregated
into other registries still. All the while the origin of certain metadata sets remains fully
traceable through unique identifiers. Furthermore, each of the semantic descriptions
is addressable through normal URLs, making the overall architecture fully RESTful
and an ideal fit for Resource Oriented Architectures (ROAs) and SOAs alike.
Figure 9-4
The resulting multipart CWA is currently out for open consultation and consists of the
following parts:
Future work may add specifications for the organizational arrangements especially in
the eGovernment domain.
Neither UDDI nor ebXML registries have been well received in the marketplace. This
is due to a number of serious shortcomings that affect those registries:
Attempts such as SATINE to build ontology constructs into the registries further
complicate the specifications and have seen little adoption in practice.
In short, UDDI and, to a lesser extent, the ebXML Registry Specification conflate three important but orthogonal concerns that should be kept apart:
Much of the eGov-Share architecture lends itself ideally to this adoption, provided
that a reference ontology for eTourism-related resources is developed.
Figure 9-5
9.3.4 Recommendations
9.3.4.1 Short-term recommendations (1–3 years)
Plan for the long-term operation and business models for the “watchtower”
registry.
10 Object identification
10.1 Needs and requirements
10.1.1 Introduction
Until recently, and still as standard practice, obtaining information or buying travel-related products has been done via intermediaries (such as agencies) that directly provide the information and perform the bookings on dedicated, possibly vendor-specific systems. As introduced in the case study, the use of the internet for travel-related searches and online shopping is increasing and already widely accepted. Multiple sources of information are available, offering single products (such as hotels, car rentals, events, etc.) or complex packaged products, comparing or aggregating information from different sources and thereby becoming sources themselves.
Identifying identical items (such as the same hotel listed under similar names on different sites), comparing information on different items (such as room or price definitions), and merging or filtering similar information from different sources (such as information on the Balearics, reached sometimes by searching for a Spanish region and sometimes for the Balearics directly) are next to impossible in the current situation.
10.1.2 Needs
In this chapter the basic needs for unique identifiers for tourism products or services
are discussed.
10.1.3 Requirements
This chapter outlines the different requirements that may be deduced from the needs set out above. Uniquely identifying objects goes hand in hand with building taxonomies or ontologies for certain domains, some of which are mentioned in the following chapters. More information may be found in the taxonomy chapter of this document.
Unique, precise, exhaustive location codes are a basic requirement for the travel industry. Location coding should not be limited to general codes such as countries, cities or airports. With online information and booking facilities becoming widespread, it is now necessary to associate codes with all levels of locations that can occur in a trip, such as
touristic regions,
terminals,
stations (railways stations, ski stations, car rental pickup stations),
points of interests,
leisure, event or activity locations,
etc.
Location codes are often used directly by experts and are also becoming more and more visible to end users (on itineraries, on displays, in search forms, etc.).
Geodesic coordinates are also becoming vital information for searches (“What can I do in the vicinity of my hotel?”, “What alternative hotel is there?”, etc.) and for representing itineraries, results, etc. However, it does not seem realistic for geodesic coordinates to be the unique coding mechanism: they are complex and in essence correspond to a point. What, then, would a country's coordinate be?
Furthermore, more and more types of leisure, activity or travel-related services are being proposed and published on the internet without any unique identification (or classification). Unique identifiers for all those services are required to stand a chance of discovering and aggregating data efficiently.
To compare or qualify each type of service, structured information based on universally accepted taxonomies is increasingly required, and this information must also be codified. For some services, such as hotels or car rental, codification is more developed than for others, but it amounts to recommended codifications rather than true unique identifiers. For most services, codification is still specific to each service provider.
Here too, it seems unrealistic to expect a single body to be responsible for this type of codification.
The high level of intermediation and the number of different companies involved in a selling process make it complex to explain pricing schemes, to unravel responsibilities in case of complaints, and to proceed with payments. Adding traceability for each step in the process is therefore becoming an important requirement. That would imply unique identifiers for each company involved in those processes, such as
10.2.1 IATA
The IATA codes are the first codes that come to mind in the travel industry, because
they are used for airports, airline companies, etc.
IATA Airport Codes: alpha-3 Codes. The IATA alpha-3 airport codes uniquely
identify individual airports worldwide. They are made up of precisely three
letters; numerals are not allowed. In fact those codes have been expanded to also cover city codes where a city has more than one airport, as well as coach, rail or ferry locations when requested by an airline or CRS. For instance, TGV railway stations usually have IATA codes because TGV trains are used as feeders for the airlines. It is therefore more accurate to define IATA codes as location codes used in travel rather than airport codes only. Except for cities, the codes correspond to transportation boarding locations rather than stay- or service-oriented locations. The main drawback of IATA airport codes is that they cannot easily be extended to cover all locations required by the travel industry.
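A minimal shape check for such codes can be sketched as follows; it verifies only the three-letter format, not membership in the actual IATA registry, and the CD3 pseudo-code is the terminal example discussed later in this chapter.

```python
import re

# IATA location codes are exactly three letters; numerals are not allowed.
IATA_CODE = re.compile(r"^[A-Z]{3}$")

def is_iata_location_code(code):
    """Check only the *shape* of a code; real validity requires the IATA registry."""
    return bool(IATA_CODE.match(code))

assert is_iata_location_code("CDG")      # Paris Charles de Gaulle airport
assert not is_iata_location_code("CD3")  # vendor pseudo-code containing a numeral
assert not is_iata_location_code("PARI") # wrong length
```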
IATA Airline Codes: officially alphanumeric-3 codes, complemented by pure numeric codes (used for ticketing, for instance). They were initially alphanumeric-2 codes, which are still the codes mainly used; the alphanumeric-2 codes appear in combination with others in ticket numbers, timetables, tariffs, etc. Codes are also allocated to railway or coach companies whenever requested by airlines or GDSs. Some codes are even reused for different airlines whose destinations are not likely to overlap, and codes allocated to airlines that discontinue business may be reused after six months.
IATA Agency Codes: numeric codes. IATA is pivotal in the worldwide accreditation of travel agents issuing airline tickets, with the exception of the USA, where this is done by the Airlines Reporting Corporation. Permission to sell airline tickets from the participating carriers is obtained through national member organizations. As a consequence, some agencies do not have IATA numbers, which has led to country-specific alternative solutions, in some cases allocating pseudo-IATA numbers (such as SNCF ticket-issuing agencies in France that are not IATA-accredited).
There are also less widely used IATA codes, such as those for baggage tag issuers, delay codes, accounting prefix codes, logistics company codes, etc.
10.2.2 ICAO
ICAO airport codes: The ICAO (International Civil Aviation Organization)
alpha-4 airport identifier codes uniquely identify individual airports worldwide.
They are used in flight plans to indicate departure, destination and alternate
airfields, as well as in other professional aviation publications. Usually, the first
two letters of ICAO codes identify the country (but do not correspond to ISO
country codes). In the continental USA, however, codes normally consist of a
‘K’ followed by the airport’s IATA code.
ICAO airline designator: The ICAO airline designator is a code assigned by the
International Civil Aviation Organization (ICAO) to aircraft operating agencies,
aeronautical authorities and services. The codes are always unique by airline.
There are ICAO codes for companies that have no correspondence with IATA
codes.
10.2.3 ISO
A number of ISO standards are used on a regular basis in the travel industry:
Country codes, ISO 3166-1 alpha-2, alpha-3 and numeric. ISO 3166-1, as part
of the ISO 3166 standard, provides codes for the names of countries and
dependent territories, and is published by the International Organization for
Standardization (ISO). Some codes in fact denote regions rather than countries (such as MQ for Martinique, which is part of France), which leads to some confusion (“Is FR only the mainland or the whole of France?”, for instance). The alpha-2 codes are the most widely used, alone or in combination.
Region zones ISO 3166-2 alphanumeric codes. ISO 3166-2 is the second part
of the ISO 3166 standard published by the International Organization for
Standardization (ISO). It is a geocode system created for coding the names of
country subdivisions and dependent areas, such as regions, states, departments, etc., depending on the country. They usually correspond to administrative zones.
Language codes: ISO 639-1. Although alpha-2 codes cannot cover all languages, they are sufficient in most cases; where more codes are needed, ISO 639-2 or ISO 639-3 can be used. When local variations of a language matter, the ISO 3166-1 country code is used in combination with the language code (as in fr-FR and fr-CA).
Currency codes: ISO 4217. In most cases the first two letters of the code are the ISO 3166-1 alpha-2 country code and the third is the initial of the currency itself. In some cases the third letter is the initial of “new” in that country's language, distinguishing the currency from an older one that was revalued; the code often long outlasts the use of the term “new” itself.
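For illustration, the way these code families combine in practice can be sketched in a few lines (the helper names are invented for the example):

```python
def language_tag(language, country):
    # ISO 639-1 language code combined with an ISO 3166-1 alpha-2
    # country code, as in fr-FR vs fr-CA for regional variants.
    return f"{language.lower()}-{country.upper()}"

def currency_country(currency_code):
    # For most ISO 4217 codes the first two letters are the
    # ISO 3166-1 alpha-2 country code (EUR is a notable exception).
    return currency_code[:2]

assert language_tag("fr", "ca") == "fr-CA"
assert currency_country("CHF") == "CH"  # Swiss franc -> Switzerland
```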
10.2.4 UN/LOCODE
The United Nations Code for Trade and Transport Locations is commonly known as UN/LOCODE. Although managed and maintained by UNECE, it is the
product of a wide collaboration in the framework of the joint trade facilitation effort
undertaken within the United Nations.
Each code element consists of five characters: the first two indicate the country (according to ISO 3166-1) and the following three represent the place name. Examples such as CHGVA, FRPAR, GBLON, JPTYO and USNYC ring a bell for air travellers, who are used to seeing the last three letters of these codes on their luggage tags. UN/LOCODE adopts the IATA location identifiers wherever possible, to benefit from their recognition value and to avoid unnecessary code conflicts. In allocating codes, the secretariat tries to find some mnemonic association with the place names, to aid human memorization. This is of course increasingly difficult for large country lists, where the 17,576 combinations of three letters are nearing exhaustion.
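The two-plus-three structure makes the codes trivially machine-parseable, as this small sketch shows:

```python
def split_locode(locode):
    # A UN/LOCODE element is five characters: an ISO 3166-1 alpha-2
    # country code followed by a three-character place code.
    code = locode.replace(" ", "")  # the code is often written with a space
    if len(code) != 5:
        raise ValueError(f"not a UN/LOCODE element: {locode!r}")
    return code[:2], code[2:]

assert split_locode("CHGVA") == ("CH", "GVA")   # Geneva, Switzerland
assert split_locode("FR PAR") == ("FR", "PAR")  # spaced form also accepted
```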
10.2.5 HEDNA
HEDNA is an international association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA compiles codes, for instance for hotel chains, room types, etc., and also provides lists and codes of conduct on how to use them.
10.2.6 ACRISS
ACRISS members use an industry-standard vehicle matrix to define car groups, ensuring a like-for-like comparison across countries. This easy-to-use matrix consists of four categories: each position in the four-character vehicle code represents a definable characteristic of the vehicle. The expanded vehicle matrix makes it possible to describe 400 vehicle types.
This coding system has been adopted to ensure that all ACRISS members display
the same coding for the same vehicles, enabling you to make an informed decision
when comparing rates.
This certainly helps in understanding what type of vehicle is being rented, though many surprises can still happen, even among ACRISS members.
ACRISS does not actually provide standardization for all car-rental-related data; for instance, car rental stations are not standardized, nor are opening hours.
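The positional logic of the matrix can be illustrated with a decoder over a small excerpt of the tables; only a handful of letters per position are shown here, and the authoritative, complete tables are maintained by ACRISS.

```python
# Each position of an ACRISS vehicle code describes one characteristic.
# Only a small, illustrative excerpt of the matrix is shown here.
CATEGORY = {"M": "Mini", "E": "Economy", "C": "Compact",
            "I": "Intermediate", "S": "Standard", "F": "Fullsize"}
TYPE = {"B": "2-3 Door", "C": "2/4 Door", "D": "4-5 Door", "V": "Passenger Van"}
TRANSMISSION = {"M": "Manual", "A": "Automatic"}
FUEL_AC = {"R": "Unspecified fuel, air conditioning",
           "N": "Unspecified fuel, no air conditioning"}

def decode_acriss(code):
    # Map each of the four characters through its position's table.
    tables = (CATEGORY, TYPE, TRANSMISSION, FUEL_AC)
    return tuple(table.get(ch, "unknown") for table, ch in zip(tables, code))

assert decode_acriss("ECMR") == (
    "Economy", "2/4 Door", "Manual", "Unspecified fuel, air conditioning")
```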
10.2.7 GIATA
GIATA acquires and standardizes (normalizes) digital image and text data for many tour operators and travel agencies, such as TUI, Thomas Cook, Easyjet, Expedia, Opodo and Lastminute.com. The data are also used by the well-known CRSs/GDSs (Amadeus, Sabre, Galileo/Worldspan) to provide decoding information based on a unique identifier present in those GDSs.
GIATA is not a global standardization body, but it has compiled enough data to become a de facto “standard” source of information, its identifier becoming the identifier. This holds only partly, though, since the identifier is not globally used, nor even used by the hotel owners themselves.
10.2.8 GS1
The GS1 System is an integrated system of global standards that provides for
accurate identification and communication of information regarding products, assets,
services and locations. It is the most implemented supply chain standards system in
the world.
GS1 Identification Keys identify things such as trade items, locations, logistic units and assets in a unique way worldwide. They can be used in bar codes, in online transactions, for selling or synchronization processes, etc.
Though this identification scheme is not used at present in a systematic way in the
travel industry, it is applied in many other trades in a successful manner and could
therefore be easily expanded to the travel trade.
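One concrete and well-known piece of the GS1 system is the mod-10 check digit that closes every GTIN key; a sketch of the standard calculation:

```python
def gs1_check_digit(digits):
    # GS1 check digit (mod 10): weight the digits 3, 1, 3, ... starting
    # from the right, then pick the digit that brings the total to a
    # multiple of ten.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits)))
    return (10 - total % 10) % 10

# GTIN-13 example: the first 12 digits determine the final check digit.
assert gs1_check_digit("400638133393") == 1  # full GTIN-13: 4006381333931
```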
GS1 operates in multiple sectors and industries and already works in close relation
with many corporations throughout the world as well as various standardization
bodies such as
10.2.9 URI
Since we are reviewing methods for obtaining unique identifiers, it is worth noting that the W3C provides a means for globally unique identifiers: URIs. A Uniform Resource Identifier (URI) is a compact string of characters used to identify or name a resource on the Internet. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
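By way of illustration only (the registry.example.org authority and the path layout below are invented, not an established travel-industry scheme), a URI can serve as a globally unique, dereferenceable identifier for a travel resource:

```python
from urllib.parse import urlsplit, urlunsplit

def hotel_uri(country, city, hotel_slug):
    # Hypothetical URI layout: scheme + registry authority + hierarchical path.
    path = f"/hotels/{country}/{city}/{hotel_slug}"
    return urlunsplit(("https", "registry.example.org", path, "", ""))

uri = hotel_uri("fr", "paris", "hotel-du-nord")
assert urlsplit(uri).path == "/hotels/fr/paris/hotel-du-nord"
```

Because such a URI is also a URL, any HTTP client could in principle retrieve a representation of the identified resource.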
URIs could be used in the travel industry in a systematic way, but they have major
drawbacks such as
10.2.10 UUID
Universally Unique Identifier (UUID) is an identifier standard used in software
construction, standardized by the Open Software Foundation (OSF) as part of the
Distributed Computing Environment (DCE). The intent of UUIDs is to enable
distributed systems to uniquely identify information without significant central
coordination. Thus, anyone can create a UUID and use it to identify something with
reasonable confidence that the identifier will never be unintentionally used by anyone
for anything else. Information labelled with UUIDs can therefore be later combined
into a single database without needing to resolve name conflicts.
Though not directly applied in the tourism industry, being technically oriented, UUIDs are interesting in that they do not require a centralised body for validation (though repositories or registries would be useful). UUID keys are nevertheless not directly usable by humans due to their inherent complexity.
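The decentralised property described above is easy to demonstrate (the record contents are invented for the example):

```python
import uuid

# Two systems minting identifiers independently: random (version 4) UUIDs
# make an accidental collision practically impossible, so records can later
# be merged without a central allocation authority.
hotel_record = {"id": str(uuid.uuid4()), "name": "Hotel du Port"}
event_record = {"id": str(uuid.uuid4()), "name": "Jazz Festival"}

merged = {rec["id"]: rec for rec in (hotel_record, event_record)}
assert len(merged) == 2  # no name-conflict resolution was needed

# Version 5 UUIDs derive deterministically from a name within a namespace,
# useful when the same object must always map to the same identifier.
a = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/hotels/42")
b = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/hotels/42")
assert a == b
```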
10.3.1 Location
In the previous chapters we have seen that various associations and organizations
propose location identifiers. However, there is currently no worldwide identification
standard that can uniquely identify and provide information about entities within the
travel industry.
There is broad consensus around country codes (though several coding schemes exist). The ISO 3166 standard is very widely used and even incorporated into other standards (such as the UN codes). However, the alpha-2 codes are predominantly used, which limits migration to alpha-3 codes and may hinder extending the codes.
Some “country” codes are also allocated to regions of certain countries or even to parts of the world larger than countries (such as EU for the European Union, or MQ for Martinique). This most likely stems from the need for travel-oriented zones that often, but not always, coincide with countries. At present this is not done in a systematic way (there is no code for Corsica or the Balearics, for instance). There is a real need to differentiate touristic “zones” from political countries or areas.
For subdivisions below country level there is less consensus. The ISO subdivisions of countries are less widely used because they match the travel industry's needs less well.
There is a need for travel-specific regions that do not really map onto political or administrative boundaries: cruise regions at sea, ski regions or mountain ranges spanning several countries, and specific touristic regions that may lie within a country or across countries (the Mediterranean region, the south of France, Sardinia, the Balearics, La Réunion, etc.).
Some countries have several levels of subdivision, and the current ISO codes only take one level into account (such as the French departments but not the French regions, for which a local coding is used, some codes being identical to ISO subdivision codes but with a different meaning).
Some travel companies also specialize in certain domains (such as diving or hunting) and likewise require specific regions related to their specialty. There is currently no way to submit such regions to a global repository. A mechanism to submit and validate such codifications would allow a better understanding of offers that are at present difficult to compare.
New codes are added only in relation to airline-related business, without a systematic coding process.
Those codes are still widely used, and a global coding process should allow their integration, at least for their original purpose (airport codes).
ICAO also provides airport codes in a more neutral way, including non-ISO country prefixes. They tend to be used internally by airlines and airports, which therefore maintain two sets of codes; they also remain specialized and limited to airports.
All in all, airport codification is fairly well covered, though cluttered. However, no codification integrates terminal data with airport codes, so vendors often create pseudo-codes such as CD3 in lieu of CDG terminal 3, disrupting the original IATA codes.
Furthermore, travel destinations are not limited to airports or main cities (which are covered by the IATA codes). Precise identification of cities in general, villages, stations (airport terminals, ski, railway, car rental, coach, etc.), points of interest within or outside cities, lieux-dits, etc. does not exist on a global scale, which is a major issue for eTourism.
There are several possible ways to move forward: either differentiate airports, railway stations and cities and build identification schemes for each type of item, or, on the contrary, create a single set of identifiers for points of travel.
The second approach corresponds to the historical one, where cities inherited the codes of their airports and differentiation sometimes occurred later. That seems logical because, for the purposes of a trip, a location and its airport are very similar notions, except where there are multiple airports and airport differentiation is in order.
Neither IATA nor ICAO seems in a position to provide such coding schemes. Integrating local postal codes and possibly other codifications into a global identification process could speed up the process. The UN has also initiated the same type of process with stations, harbours, etc., complementing airport codes whenever possible.
At present it is therefore impossible to have unique identifiers for each element of a trip, and consequently impossible to compare or even amalgamate information. Were such identification in place, there would still be a need for additional qualification, such as understanding the rights of the source with respect to the content (is this first-hand information, does the author have the right to create or distribute it, etc.).
Some organizations, such as HEDNA, have such a project for hospitality or other specific services. Private companies in certain countries provide partial data (such as GIATA in Germany). Private companies distributing content also provide unique identifiers within their own systems, which do not allow cross-referencing.
Defining unique codes for travel services is very delicate because it touches marketing- or sales-oriented information, which is subjective and requires many details to be precise. Actual codes are likely to be aggregates of different pieces of information (such as room information, bed information, features, location, etc.).
10.4 Recommendations
10.4.1 Short-term recommendations (1–3 years)
Build a registry of present object identifications in the tourism industry.
Develop travel related global geography identifiers.
Integrate the global geography identifiers in the registry and build transcoding
capability.
Develop travel company related global identifiers.
We have selected an existing eTEN project, which joined the workshop as a member and was also present with keynote speakers during one of the workshop meetings in Berlin. The project, called “euromuse.net”, does not come from the core tourism domain but from cultural heritage, a driver of tourism. The project improves an existing platform that offers services and exhibition data to the tourism industry, and aims to bridge the gap between cultural heritage and the tourism industry. It faces the same problems as those discussed in the workshop and has an appropriate data mediation solution in use, illustrating the approach generally recommended by this CWA for overcoming interoperability problems. It uses Harmonise 2.0 to integrate data from hundreds of Europe's top museums and provides this aggregated information to the various players of the tourism industry. And of course there is a strong need for a cost-effective and easy-to-use solution, since museums usually do not have large IT departments, if any at all.
euromuse.net has been identified as a very good starting point for discussing the issue and for demonstrating a real live system, something that could not easily have been implemented within the course of the CEN workshop itself. It allows a real demonstration and a discussion of the issues presented in this document based on a system in use.
euromuse.net offers both a ‘one-stop’ web tool to the greatest exhibitions in Europe for the public and a special data interface, called Harmonise, that delivers structured data from the museums to the tourism sector. The euromuse.net project will deploy an existing online service, which provides multilingual information about temporary exhibitions and museums as well as other museum resources on a web platform, to develop a wider pan-European data collection based on public sector information to be re-used by different actors in the cultural and tourism fields. The project aims at three main goals:
1. Improve and extend the existing platform, a website offering museum and exhibition information to the general public for free.
2. Integrate the museums’ information of the euromuse.net database with the
Harmonise tools. Through this integration euromuse.net’s rich content will
affiliate with the online offers of other European and national tourism and
marketing services for culture.
3. Enhance the existing services to integrate information on scientific publications
from museums and to expand the current services, which provide an overview
of “virtual” museums and their (online) resources.
The main focus is to improve the connection between existing marketing and promotion channels of the tourism industry and the cultural sector via the euromuse.net database. A general idea of the euromuse.net project is to better connect the museum sector with relevant target groups in the tourism sector, both on a professional and on a non-professional or private level. euromuse.net services will support and strengthen existing connections between the general public interested in museums and exhibitions, the professional tourism sector and the museums.
The service will help to create easily accessible information about exhibitions and museums all over Europe. The information is offered through three complementary services: the website http://www.euromuse.net/, aimed mainly at the general public and free to access; tools for structured data exchange with the databases of the tourism industry and other tourism players; and a scientific literature database of museum publications, mainly for researchers and museum staff. The tools for data exchange will enable representatives of the tourism industry and services to organize personalised tourism packages for their customers through the service.
Because the requests of industrial and private users normally differ, the project offers special access for tourism industry users besides the euromuse.net website. Special search strings and precise queries to the euromuse.net database allow optimized preparation of organized trips. Industrial users will receive structured, XML-formatted data via a special export from the euromuse.net database. The commercial users of this functionality will be asked to pay a contribution for the service provided.
This follows very much the approaches recommended in this CWA to overcome the data interoperability problem. However, this is where the current setup ends, leaving open some issues that have also been discussed in the topics of this CWA. Some of them should be addressed in euromuse.net in the future; some are still not easy to solve.
Following the order of this document, the first issue is process handling. Most museums do not have a system to allow online ticket purchasing, but they might sooner or later. Online ticket buying will therefore become an issue, also because travel agencies might wish to bundle services together dynamically to sell the client a full travel package that also comprises exhibitions. Process handling would be easiest to realize in a stateless way, managing processes purely through the exchange of data. Process mediators are currently being developed in applied ICT research and might offer an improved solution in the longer run (these process mediators work similarly to the data mediator Harmonise).
Metasearch is the next topic, and in some sense euromuse.net is already a metasearch repository, since it aggregates data from different sources and makes them available for search queries. When querying the euromuse.net database at present, a fixed query string or fixed query rules have to be used, since no proper solution could be found to handle different query strings in a flexible and generic way. In the future, it should also be possible to map different queries so that one query can run simultaneously on a larger number of instances, each of which might have a different query language. This shows the need for interoperable query languages, but also the need for registries, in order to find the data instances that should be searched. Clearly, there is a need for some meta-information about where to search, because searching every database in the world to obtain a certain set of data is inefficient, if not impossible. Reliable registries directing search queries to potential data sources would therefore significantly improve search efficiency.
And even if you search various databases and retrieve a large number of results (say, exhibitions in the case of euromuse.net), you do not automatically know how many exhibitions are represented several times in the retrieved data sets. Thus, object identification is the last of the topics covered by this CWA and is also a future enhancement of euromuse.net. If all exhibitions, museums and locations can be identified automatically, then the database can automatically be cleaned of multiple entries of the same object. At the moment the issue is open in euromuse.net, since the number of sources is manageable and the probability that one exhibition is reported by two museums is very low. However, this might rise significantly and quickly as the network grows.
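The duplicate-cleaning step described above can be sketched as follows; the record fields and identifier values are invented for the example, and a real system would merge the fields of all duplicates rather than simply keeping the first record seen.

```python
# Illustrative sketch: once every exhibition carries a shared unique
# identifier, duplicates reported by several sources collapse automatically.
records = [
    {"id": "exh-001", "title": "Impressionism", "source": "museum-a"},
    {"id": "exh-001", "title": "Impressionnisme", "source": "portal-b"},  # same show
    {"id": "exh-002", "title": "Ancient Egypt", "source": "museum-c"},
]

deduplicated = {}
for rec in records:
    # Keep the first record seen for each identifier.
    deduplicated.setdefault(rec["id"], rec)

assert len(deduplicated) == 2
```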
It is easy to see that the topics are exactly the same for exhibitions as they are for accommodation. euromuse.net therefore demonstrates nicely how all of these issues can also be solved on a global scale. The same technology and setup for mediating data and processes can be used for any other object, such as accommodation, flights, car rentals, events, etc.
Finally, one important issue remains open, since it is out of the scope of interoperability: even if all data could be exchanged smoothly, data sources identified easily, the content understood and booking processes run, how can data quality be assured? How can one make sure that a timetable (opening hours, flight schedules) is correct or that price quotes are valid? Quality of service and user acceptance will depend heavily on data quality. In euromuse.net it is being discussed whether to involve users in reporting back on the quality of information; the involvement of users (user-generated content) may be a reliable source for estimating data quality. But although this topic is an important one, it is not part of this CWA on data and process interoperability.
[Adam, Hofer, Zang, et al, 2005] Otmar Adam, Anja Hofer, Sven Zang, Christoph
Hammer, Mirko Jerrentrup, Stefan Leinenbach: “A Collaboration Framework for
Cross-enterprise Business Process Management”. In: Panetto, Hervé (Hrsg.):
Interoperability of Enterprise Software and Applications – INTEROP-ESA’2005.
Geneva, Schwitzerland, February 23–25, 2005, Technical Sessions, 2005, p
499-510
[Addis, Boniface, Goodall, et al, 2003] M. Addis, M. Boniface, S. Goodall, P.
Grimwood, S. Kim, P. Lewis, K. Martinez, A. Stevenson: “SCULPTEUR:
Towards a new paradigm for multimedia museum information handling”, In:
Proceedings of the Second International Conference on Semantic Web, p 582-
596, 2003
[Addis, Stevenson, 2002] M. Addis, A. Stevenson: D6.2 Impact on World-Wide
Metadata Standards, Deliverable report of ARTISTE project, 2002
[Adrian, Sauermann, Roth-Berghofer, 2007] B. Adrian, L. Sauermann, T. Roth-
Berghofer: “ConTag: A semantic tag recommendation system”. In: Proceedings
of I-Semantics ’07, p 297-304, 2007
[Advanced Distributed Learning] http://www.adlnet.gov/
[Agent Link] http://www.agentlink.org/
[Ahern, King, Naaman, et al, 2007] S. Ahern, S. King, M. Naaman, R. Nair, J.H.I.
Yang: “ZoneTag: Rich, Community-Supported Context-Aware Media Capture
and Annotation”. In: Proceedings, MSI workshop CHI2007, San Jose, Calif,
2007
[AICC] Aviation Industry CBT Committee, http://www.aicc.org/
[Amadeus] http://www.amadeus.com/
[Amann, Fundulaki, 1999] B. Amann, I. Fundulaki: “Integrating Ontologies and
Thesauri to build RDF Schemas”, ECDL Research and Advanced Technologies
for Digital Libraries, p 234-253, 1999
[ANSI] American National Standards Institute, http://www.ansi.org/
[ArguGRID] http://www.argugrid.eu/
[Aristotle] Aristotle: Metaphisics Book IV,
http://classics.mit.edu/Aristotle/metaphysics.4.iv.html
[Arnarsdóttir, Berre, Hahn, Missikoff, Taglino] K. Arnarsdóttir, A.-J. Berre, A. Hahn, M.
Missikoff, F. Taglino: Semantic Mapping: ontology based vs. model based
approach. Alternative or complementary approaches?, ftp://ftp.informatik.rwth-
aachen.de/Publications/CEUR-WS/Vol-200/17.pdf
[ARTEMIS] http://www.srdc.metu.edu.tr/webpage/projects/artemis/
[ASG] http://asg-platform.org/cgi-bin/twiki/view/Public
[Aviation Industrie CBTI Committee] http://www.aicc.org/
[Baader, Horrocks, Sattler, 2003] F. Baader, I. Horrocks, U. Sattler: “Description
logics as ontology languages for the semantic web”. In: S. Staab, R. Studer,
eds: Lecture Notes in Artificial Intelligence, Springer Verlag, 2003
[Bailey, 1994] K.D. Bailey: Typologies and Taxonomies - An Introduction to
Classification Techniques, London, Sage Publications, Quantitative Applications
in the Social Sciences, 1994
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 131
[de Laborda, Conrad, 2005] C.P. de Laborda, S. Conrad: Relational.OWL A Data and
Schema Representation Format Based on OWL. In Second Asia-Pacific
Conference on Conceptual Modelling (APCCM2005), volume 43 of CRPIT, p
89-96, Newcastle, Australia, 2005, ACS
[Dell’Erba, Fodor, Höpken, et al, 2005] M. Dell’Erba, O. Fodor, W. Höpken, et al,
“Exploiting Semantic Web Technologies for Harmonizing e-Markets”. In: IT&T
Information Technology & Tourism – Application – Methodologies –
Techniques, 2005
[DIP] http://dip.semanticweb.org/index.html
[Directive 90/314/EEC] Council Directive 90/314/EEC of 13 June 1990 on package
travel, package holidays and package tours
[Dodgeball] http://www.dodgeball.com/
[Dörr, 2003] M. Dörr: “The CIDOC conceptual reference module: An ontological
approach to semantic interoperability of metadata”. AI Magazine 24(3) (2003),
75–92
[Dörr, Guarino, Fernández López, et al, 2001] M. Dörr, N. Guarino, M. Fernández
López, E. Schulten, M. Stefanova, A. Tate: “State of the Art in Content
Standards. OntoWeb Deliverable 3.1.”, Technical Report, 2001
[Dörr, Hunter, Lagoze, 2003] M. Dörr, J. Hunter, C. Lagoze: “Towards a core
ontology for information integration”. Journal of Digital Information 4(1), 2003
[Dou, McDermott, Qi] D. Dou, D. McDermott, P. Qi: “Ontology Translation by
Ontology Merging and Automated Reasoning”
[Duineveld, Stoter, Weiden, et al, 2000] A.J. Duineveld, R. Stoter, M.R. Weiden, B.
Kenepa, V.R. Benjamins: “WonderTools? A comparative study of ontological
engineering tools”, 2000
[Earley, 2005] S. Earley: Resolving Taxonomy Challenges and Information
Architecture Conflicts, 2005 http://www.dama-nj.org/presentations/
Seth%20Earley%20Taxonomies%20May%2012%202005%20(DamaNJ).pdf
[eBusiness W@tch Report 2006/2007] eBusiness W@tch Report 2006/2007,
http://www.ebusiness-watch.org/key_reports/documents/EBR06.pdf
[ebXML] eBusiness XML, http://www.ebxml.org/
[Echarte, Astrain, Cordoba, Villadangos, 2007] F. Echarte, J.J. Astrain, A. Cordoba,
J. Villadangos: Ontology of Folksonomy: A New Modelling Method. Proceedings
of the Semantic Authoring, Annotation and Knowledge Markup Workshop
(SAAKM2007), British Columbia, Canada, Vol-289, 2007,
http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/p08.pdf
[ESP Game] http://www.espgame.org/
[ETSI] European Telecommunications Standards Institute, http://www.etsi.org/
[euromuse] http://www.euromuse.net, http://www.euromuse-project.net
[Expedia] http://www.expedia.com/
[Fabian, 1975] J. Fabian: “Taxonomy and Ideology: On the Boundaries of Concept
Classification”. In: M. Kinkade (ed), Linguistics and Anthropology, Lisse, p 183-
197, 1975
[Facebook] http://www.facebook.com/
[Flickr] http://www.flickr.com/
[Fodor, Werthner, 2005] Oliver Fodor, Hannes Werthner: Harmonise: a step toward
an interoperable e-tourism marketplace. In: International Journal of Electronic
Commerce, Winter 2004-5, Vol 9, No 2, p 11-39, 2005
[Freyer, 2006] Freyer, Walter: Tourismus: Einführung in die
Fremdenverkehrsökonomie, 8th revised ed, München : Oldenbourg, 2006
[Fuxman, Hernández, Ho, et al, 2006] A. Fuxman, M.A. Hernández, H. Ho, R. Miller,
P. Papotti, L. Popa: Nested Mappings: Schema Mapping Reloaded. Proc. VLDB
2006 Conf., p 67-78, Seoul, Korea, 2006
[Garshol, 2004] L.M. Garshol: Metadata? Thesauri? Taxonomies? Topic Maps!
Making Sense of it all, Journal of Information Science, 2004
[Gennari, Musen, Fergerson, et al, 2002] J. Gennari, M.A. Musen, R.W. Fergerson,
W.E. Grosso, M. Crubezy, H. Eriksson, N.F. Noy, S.W. Tu: The Evolution of
Protégé: An Environment for Knowledge-Based Systems Development,
Technical Report SMI-2002-0943, 2002
[Ghawi, Cullot] R. Ghawi, N. Cullot: Database-to-ontology Mapping Generation for
semantic interoperability
[Gilchrist, 2003] A. Gilchrist: Thesauri, taxonomies and ontologies - an etymological
note. Journal of Documentation, 2003, 59 (1), p 7-18
[Goodall, Lewis, Martinez, et al, 2004] S. Goodall, P.H. Lewis, K. Martinez, P.
Sinclair, F. Giorgini, M.J. Addis, M.J. Boniface, C. Lahanier, J. Stevenson:
“SCULPTEUR: Multimedia Retrieval for Museums”, CIVR 2004, LNCS 3115, p
638-646, 2004
[Grishman, 2003] Ralph Grishman, “Information Extraction”. In: The Oxford
Handbook of Computational Linguistics, ed. R. Mitkov, Oxford University Press,
2003
[Grosof, Horrocks, Volz, Decker, 2003] B.N. Grosof, I. Horrocks, R. Volz, S. Decker:
Description logic programs: Combining logic programs with description logic. In
Proc. of the Twelfth International World Wide Web Conference (WWW 2003), p
48-57, ACM, 2003
[Grossman, 2004] Grossman, David: Confusion is the star of hotel rating systems,
http://www.usatoday.com/travel/columnist/grossman/2004-03-05-
grossman_x.htm
[Grove, 2003] A. Grove: Taxonomy. In: Encyclopedia of Library and Information
Science, p 2770-2777, New York, Marcel Dekker Inc, 2003
[Gruber, 1993a] T.R. Gruber: “A translation approach to portable ontology
specifications”, Knowledge Acquisition, Vol 5, 1993
[Gruber, 1993b] T.R. Gruber: “Toward Principles for the Design of Ontologies Used
for Knowledge Sharing”, International Journal of Human Computer Studies, Vol
43, p 907-928, 1993
[Gruber, 2005a] T. Gruber: Ontology of Folksonomy: A Mash-up of Apples and
Oranges, AIS SIGSEMIS Bulletin, 2005
[Gruber, 2005b] T. Gruber: TagOntology, a way to agree on the semantics of tagging
data, 2005
[GS1] http://www.gs1.org/
[Guarino, Giaretta, 1995] N. Guarino, P. Giaretta: “Ontologies and knowledge bases.
Towards a terminological clarification”. In: Towards Very Large Knowledge
Bases, IOS Press, p 25-32, 1995
[GUID] http://en.wikipedia.org/wiki/GUID
[Gulli, Signorini, 2005] A. Gulli, A. Signorini: Building an open source meta search
engine. In: WWW 2005
[Haase, Wang, 2007] P. Haase, Y. Wang: “A decentralized infrastructure for query
answering over distributed ontologies”. In: Proceedings of the 2007 ACM
Symposium on Applied Computing (Seoul, Korea, March 11-15, 2007). SAC
’07. ACM, New York, NY, p 1351-1356,
http://doi.acm.org/10.1145/1244002.1244294
[HarmoNET] The Harmonisation Network for the Exchange of Travel and Tourism
Information, http://www.harmonet.org/
[HEDNA] http://www.hedna.org/
[Heflin, 2001] J. Heflin, J. Hendler: “A portrait of the Semantic Web in action”, IEEE
Intelligent Systems, 16(2), 2001, p 54-59
[Hempel, 1965] C.G. Hempel: “Fundamentals of Taxonomy”, p 137-154. In: C. G.
Hempel: Aspects of scientific explanation and other essays in the philosophy of
science, New York, The Free Press, 1965
[Hepp, Leymann, Domingue, et al, 2005] Martin Hepp, Frank Leymann, John
Domingue, Alexander Wahler, Dieter Fensel: Semantic Business Process
Management: A Vision Towards Using Semantic Web Services for Business
Process Management, Proceedings of the IEEE ICEBE. 2005
[Höpken, 2004] Wolfram Höpken: Reference Model of an Electronic Tourism Market
(IFITT RM), Version 1.3, 2004,
http://www.rmsig.de/documents/ReferenceModel.doc
[Hull, 1998] D.L. Hull: Taxonomy. In: Routledge Encyclopedia of Philosophy, Version
1.0, London, Routledge, 1998
[Hunter, 2002] J. Hunter: “Combining the CIDOC CRM and MPEG-7 to describe
multimedia in museums”, In: Proceedings of Museums on the Web 2002
Conference, Boston, 2002
[IATA] http://www.iata.org/, http://en.wikipedia.org/wiki/IATA
[IEEE] Institute of Electrical and Electronics Engineers, http://www.ieee.org
[IFITT] International Federation for IT and Travel & Tourism, http://www.ifitt.org/
[IFLA] International Federation of Library Associations and Institutions,
http://www.ifla.org/
[ISO] International Organization for Standardization, http://www.iso.org/; for
references to ISO standards see also chapter 2 “Normative references”
[ISO 3166] http://www.iso.org/iso/country_codes.htm,
http://www.iso.org/iso/fr/country_codes.htm
[ISO/IEEE 11073] Health informatics — Point-of-care medical device
communications (multiple parts)
[ISO 21127:2006] Information and documentation — A reference ontology for the
interchange of cultural heritage information
[IST] Information Society Technologies, http://cordis.europa.eu/ist/
[ITU] International Telecommunication Union, http://www.itu.int
[Iurgel, 2004] I. Iurgel: From another point of view: art-E-fact, In: Proc. TIDSE’04
(2004) vol 1, p 26-35
[Kalfoglou, Schorlemmer, 2003] Yannis Kalfoglou, Marco Schorlemmer: Ontology
mapping, the state of the art. Knowledge Engineering Review, 18(1), p 1-31,
2003
[Kim, Yang, Song, et al, 2007] H.L. Kim, S.K. Yang, S.J. Song, J.G. Breslin: “Tag
Mediated Society with SCOT Ontology”, Proceedings of the Semantic Web
Challenge 2007 in conjunction with the Sixth International Semantic Web
Conference, November 11-15, Busan, Korea, 2007
[Knerr, 2006] T. Knerr: Tagging Ontology: Towards a Common Ontology for
Folksonomies, 2006
[Konstantinou, Spanos, Chalas, et al, 2006] N. Konstantinou, D. Spanos, M. Chalas,
E. Solidakis, N. Mitrou: VisAVis: An Approach to an Intermediate Layer between
Ontologies and Relational Database Contents. International Workshop on Web
Information Systems Modeling (WISM 2006), Luxembourg, 2006
[Küster, Moore, Ludwig, 2007] Marc Wilhelm Küster, Graham Moore, and Christoph
Ludwig, “Semantic registries.” In: XMLTage 2007 in Berlin, Berlin, 2007
[Lagoze, Hunter, 2001] C. Lagoze, J. Hunter: “The ABC Ontology and Model”,
Journal of Digital Information, Vol 2, No 2, 2001
[Lahti, Palola, Korva, et al, 2006] J. Lahti, M. Palola, J. Korva, U. Westermann, K.
Pentikousis, P. Pietarila: “A mobile phone-based context-aware video
management application,” In: Multimedia on Mobile Devices II, Edited by
Creutzburg, Takala, Chen, Proceedings of the SPIE, Volume 6074, p 204-215,
2006
[Lamsfus, Linaza, Smithers] Carlos Lamsfus, María Teresa Linaza, Tim Smithers:
“Towards semantic-based information exchange and integration standards: the
art-E-fact ontology as a possible extension to the CIDOC CRM (ISO/CD 21127)
standard”. K-CAP2005, Banff, Alberta, Canada, Proceedings (ISSN 1613-0073)
of the Workshop on Integrating Ontologies, p 49-54
[Landwehr, Bull, McDermott, Choi, 1994] C.E. Landwehr, A.R. Bull, J.P. McDermott,
W.S. Choi: A Taxonomy of Computer Program Security Flaws, with Examples.
ACM Computing Surveys, 26,3 (Sept 1994),
http://chacs.nrl.navy.mil/publications/CHACS/1994/1994landwehr-acmcs.pdf
[Lassila, Swick, 1999] O. Lassila, R.R. Swick: “Resource Description Framework
(RDF): Model and Syntax Specification”, W3C Recommendation, World Wide
Web Consortium, February 1999
[LOCODE] http://www.unece.org/cefact/locode/
[Lu, Meng, Shu, et al, 2005] Y. Lu, W. Meng, L. Shu, C. Yu, K. Liu: Evaluation of
Result Merging Strategies for Metasearch Engines. WISE Conference, 2005
[Lu, Wu, Zhao, et al, 2007] Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng,
King-Lup Liu, Vijay Raghavan, Clement Yu: MySearchView: A Customized
Metasearch Engine Generator. 26th ACM SIGMOD International Conference on
Management of Data (SIGMOD 2007), Demo paper, p 1113-1115, Beijing,
China, June 2007
[Marradi, 1990] A. Marradi: Classification, Typology, Taxonomy. Quality and Quantity,
1990, XXIV, 2, p 129-157. Available at:
http://web.archive.org/web/20040705070709/http://www.unibo.edu.ar/marradi/cl
assqq.pdf (Visited 2004-01-04)
[McDowell, 2003] L. McDowell, O. Etzioni, S. Gribble, A. Halevy, H. Levy, W.
Pentney, D. Verma, S. Vlasseva: Enticing ordinary people onto the Semantic
Web via instant gratification. In: Proceedings of the 2nd International Semantic
Web Conference (ISWC 2003), October 2003
[Medjahed, Bouguettaya, 2005] Brahim Medjahed, Athman Bouguettaya: A Multilevel
Composability Model for Semantic Web Services, IEEE Transactions on
Knowledge and Data Engineering, July 2005, Vol 17, Issue 7, p 954-968
[Meehl, 1995] P.E. Meehl: Bootstraps taxometrics: solving the classification problem
in psychopathology. American Psychologist, 1995, 50(4), p 266-275
[Meng, Yu, Liu, 2002] W. Meng, C. Yu, K. Liu: Building Efficient and Effective
Metasearch Engines. ACM Computing Surveys, 34(1), March 2002, p 48-89
[Merholz, 2004] P. Merholz: Ethnoclassification and vernacular vocabularies, 2004
[metasearch] http://www.trln.org/events/NISO/NISOmetasearch.ppt
[Miles, Brickley, 2005] A. Miles, D. Brickley: SKOS Core Vocabulary Specification,
W3C Working Draft, 2005