CEN WORKSHOP AGREEMENT
CWA - - - - -
Final 2009-06-03
ICS Number
English version
This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, the constitution
of which is indicated in the foreword of this Workshop Agreement.
The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National
Members of CEN but neither the National Members of CEN nor the CEN Management Centre can be held accountable for the technical
content of this CEN Workshop Agreement or possible conflicts with standards or legislation.
This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members.
This CEN Workshop Agreement is publicly available as a reference document from the CEN Members National Standard Bodies.
CEN Members are the national standards bodies of Austria, Belgium, Bulgaria, Cyprus, Czech Republic, Denmark, Estonia, Finland,
France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, the Netherlands, Norway, Poland,
Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, and United Kingdom.
© 2009 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN National Members.
Contents
Foreword
Executive summary
Summary of recommendations
Overall recommendations
List of recommendations on different topics
1 Scope
2 Normative references
3 Abbreviations, terms and definitions
3.1 Abbreviations
3.2 Terms and definitions
4 Methodology and thematic overview
4.1 Thematic circle
4.2 Topics
4.2.1 Semantics
4.2.2 Data transformation
4.2.3 Process handling
4.2.4 Metasearch
4.2.5 Object identification
4.3 Cross-cutting concerns / Prerequisites
4.3.1 Legal aspects
4.3.2 Multiculturalism
4.3.3 Business models
4.3.4 Technology
5 Case study
5.1 The processes
5.1.1 The actors
5.1.2 Consumer process
5.1.3 Travel-related professional process
5.2 The information and communication technologies
5.2.1 Multiple levels of data sources
5.2.2 Type of information
5.2.3 Type of data sources
6 Semantics
6.1 Standards
6.1.1 Needs and requirements
6.1.1.1 Introduction
6.1.1.2 Needs
6.1.1.3 Requirements
6.1.2 State of the art
6.1.2.1 Types of standards
6.1.2.2 List of travel industry standards, companies and organizations (examples)
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 3
7.2.4 Recommendations
7.2.4.1 Short-term recommendations (1–3 years)
7.2.4.2 Long-term recommendations (3–10 years)
7.3 Automatic information extraction
7.3.1 Needs and requirements
7.3.1.1 Needs
7.3.1.2 Requirements
7.3.2 State of the art
7.3.2.1 Named entity recognition
7.3.2.2 Event extraction
7.3.2.3 Tourism-specific information extraction
7.3.3 Gaps and future needs
7.3.3.1 Named entity recognition
7.3.3.2 Event extraction
7.3.3.3 Tourism-specific information extraction
7.3.4 Recommendations
7.3.4.1 Short-term recommendations (1–3 years)
7.3.4.2 Long-term recommendations (3–10 years)
7.4 Inter-ontology mapping
7.4.1 Needs and requirements
7.4.1.1 Introduction
7.4.1.2 Needs
7.4.1.3 Requirements
7.4.2 State of the art
7.4.3 Gaps and future needs
7.4.4 Recommendations
7.4.4.1 Short-term recommendations (1–3 years)
7.4.4.2 Long-term recommendations (3–10 years)
8 Process handling
8.1 Needs and requirements
8.1.1 Introduction
8.1.2 Needs
8.1.3 Requirements
8.2 State of the art
8.2.1 Global standardization efforts
8.2.2 Application Integration and APIs
8.3 Gaps and future needs
8.4 Recommendations
8.4.1 Short-term recommendations (1–3 years)
8.4.2 Long-term recommendations (3–10 years)
9 Metasearch
9.1 Methodology
9.1.1 Needs and requirements
9.1.1.1 Introduction
9.1.1.2 Quality of results
Foreword
The objective of the Workshop CEN/ISSS WS/eTOUR on “Harmonization of data
interchange in tourism” and the production of this draft CEN Workshop Agreement
(CWA) was approved by the Workshop at its plenary meeting held in Brussels on 6
February 2008.
This final version of the CWA was approved by letter ballot following the final
Workshop meeting on 15 May 2009.
The Secretary of the Workshop has been Håvard Hjulstad, Standards Norway.
Executive summary
Problem statement
Tourism is in the vanguard of ICT adoption and eBusiness in the area of eMarketing
and online sales (B2C). Yet, in a ranking of various sectors, the tourism industry
achieves only a mid-level score in the overall use of ICT and eBusiness. It still lags
behind especially as regards the deployment of ICT infrastructure and the adoption of
e-integrated business processes [eBusiness W@tch Report 2006/2007, p 167]. At
the same time, tourism is an important and growing sector of the European economy,
with a large presence of SMEs.
Approach
Data interchange has two key components: The electronic data itself and the
exchange of data between two or more tasks in larger process chains. This hinges on
the ability of all tasks to understand the data they are supposed to consume – i.e.
data interoperability – and of processes to be able to meaningfully cooperate –
process interoperability. This draft CWA thus circles around the two core issues
“data” and “processes” and related challenges in the domain that need deeper
analysis. In particular, we have identified five topics for further analysis which are
briefly outlined below: “semantics”, “data transformation”, “process handling”,
“metasearch” and “object identification”.
These five topics are placed in the larger context of four cross-cutting concerns that
permeate all of them. Tourism transactions on the one hand regularly transcend
national and cultural boundaries and frequently involve both very small and very
large players. On the other hand, very many of the parameters – rating systems for
accommodation, opening hours of sites, classification of beaches – are regulated
nationally or even regionally and reflect cultural preferences. All transactions must
naturally follow pertinent national or regional laws and regulations. This leads to the
four cross-cutting concerns “Legal aspects”, “Multiculturalism”, “Business models”,
and “Technology”.
Semantics
Shared semantics enables the flexible integration of heterogeneous data structures
from a wide range of data sources. As such, it is also a central requirement for
building flexible, cross-organizational process chains.
Data transformation
The co-existence of many different data formats already implies the need to
transform data during data exchange. Such mappings can affect data structures on
several different levels.
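By way of illustration, the following sketch maps one provider's record layout onto another's. All field names, the category codes and the conversion factor are invented for the example and are not taken from any tourism standard:

```python
# Minimal data-transformation sketch: map one (hypothetical) provider's
# record layout onto another's. Field names are purely illustrative.

def transform_hotel_record(source: dict) -> dict:
    """Map a source record to a target schema, converting codes and units."""
    return {
        "hotelName": source["name"],
        # Structural mapping: two source fields merge into one target field.
        "location": f'{source["city"]}, {source["country_code"]}',
        # Value-level mapping: a local category code becomes a star count.
        "stars": {"T": 3, "F": 4, "L": 5}[source["category"]],
        # Unit conversion on the data level (square feet to square metres).
        "roomSizeSqm": round(source["room_size_sqft"] * 0.09290304, 1),
    }

record = {"name": "Hotel Example", "city": "Ghent", "country_code": "BE",
          "category": "F", "room_size_sqft": 250}
print(transform_hotel_record(record))
```

Real transformations of this kind are often expressed declaratively (e.g. in XSLT for XML data) rather than in code, but the levels involved are the same: structure, value sets, and units.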
Process handling
The World Wide Web has significantly boosted the use of ICT in the tourism industry
and empowered customers to make travel arrangements autonomously by the use of
a wide variety of different data sources. This requires the seamless interplay of
different computer systems, allowing new online services like dynamic packaging of
tourism products.
Metasearch
Metasearch proper builds on shared semantics and data transformation to enable
searches across the individual search components of heterogeneous websites and to
aggregate the results in a unified list. From a user's perspective, metasearch engines
thus offer a one-stop entry point to a specific type of information; from a technology
perspective, they place high demands on distributed data querying.
Object identification
Electronic transactions often hinge on being able to uniquely identify the objects on
which they operate. In contrast to, for example, flights, many types of objects in
tourism do not have a unique identifier. There is at present no universally accepted
scheme to identify, say, a given hotel that is to be booked, or to compare different
offers for the same hotel.
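In the absence of an agreed scheme, systems typically fall back on heuristic matching. The sketch below shows one purely hypothetical matching key, combining a normalized name with coarse geo-coordinates; it illustrates the problem, and is in no way a proposed identifier standard:

```python
# Purely illustrative: a heuristic matching key for hotels, built from a
# normalized name plus coarse coordinates. NOT a proposed standard.
import re
import unicodedata

def hotel_match_key(name: str, lat: float, lon: float) -> str:
    # Strip accents and non-alphanumeric characters from the name.
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode())
    norm = re.sub(r"[^a-z0-9]", "", ascii_name.lower())
    # Round coordinates to ~100 m so minor data differences still match.
    return f"{norm}:{round(lat, 3)}:{round(lon, 3)}"

# Two offers for the same hotel from different sources yield the same key.
a = hotel_match_key("Hôtel Métropole", 50.8452, 4.3580)
b = hotel_match_key("Hotel Metropole", 50.84521, 4.35803)
print(a == b)  # True
```

Such heuristics inevitably produce false matches and misses, which is precisely why a universally accepted identification scheme would be valuable.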
formats from the cultural heritage and the tourism sector and is confronted very much
with the same challenges as discussed in the workshop report.
“Mediation” has been identified as the key concept for reaching interoperability in a
highly fragmented and diversified area like the tourism industry. This best practice
case demonstrates how interoperability can easily be reached through data
mediation, while leaving each partner enough flexibility to define its own data format.
Recommendations
The workshop came up with a number of recommendations that are all centred
around the basic idea to deal with the diversity of existing standards, technologies,
projects, and entities – rather than bringing another standard to the market. The
keywords in this context are harmonization and mediation.
Summary of recommendations
Overall recommendations
The workshop came up with a number of recommendations that are all centred
around the basic idea to deal with the diversity of existing standards, technologies,
projects, and entities – rather than bringing another standard to the market. The
keywords in this context are harmonization and mediation. As desirable as it may
seem to unify terms and standards to allow easy exchange of information and
execution of processes, it is just as important to leave the market the flexibility and
diversity to define data schemas. Instead, ways should be found to mediate between
the different approaches. The tourism sector has come up with a broad spectrum of
different standards, and for various reasons it will be difficult, if not impossible, to
replace them.
Another reason is the play of market forces, which makes it difficult to reach
consensus on the issues involved. Unlike in many other industries, such as the
construction industry, having different standards appears to bring more advantages
than the resulting lack of interoperability brings disadvantages. This can be observed
in the area of destination management as well as on the side of tour operators. The
need for standardization is nonetheless recognized, as can be seen from various
industry associations and forums. Yet strong resistance can be observed whenever
approaches for European or worldwide standards are discussed.
Beyond the detailed recommendations listed in the chapters below, a general
approach is therefore suggested: to harmonize (keeping differences to a minimum)
and to mediate (enabling understanding across the remaining differences) between
existing formats and standards. This approach must be flexible, easy to use and
cost-effective, as is the case, for example, in the euromuse.net project, which is
described as the best practice case. These criteria are critical to the success of the
approach, since the tourism sector is characterized by a large number of small and
medium-sized organizations.
The mediation approach should in no way be taken as an invitation to establish as
many isolated new standards as possible. Rather, anyone starting to create
something new should carefully take existing standards and approaches into
account, so that later mediation between them remains as easy as possible,
deviating from other standards only where absolutely required.
object identification. In addition, it could keep track of technologies and projects that
ease the problem of data and process interoperability, and come up with
recommendations on interoperability approaches and best practices for data models.
At the same time, the watchtower could operate a central data mediation service
between the recognized standards in the field.
All these recommendations aim at keeping the diversity and flexibility of the
European eTourism landscape, while allowing process and data interoperability so
that the actors involved can achieve a higher level of e-integration.
Standards
Short-term recommendations
Long-term recommendations
Lower the entry barrier for participation in pertinent formal and informal
standardization bodies especially for SMEs and extend the scope of those
activities to cover the requirements of SMEs.
Work on interoperability approaches between different standards.
Taxonomies
Short-term recommendations
Long-term recommendations
Ontologies
Short-term recommendations
Long-term recommendations
Use semantic web technologies (e.g. based on RDF URIs) to name and
represent (data) resources on the Web so that mapping can be automatically
undertaken.
Agree on the degree of formality with which information ought to be defined, so
that automatic mapping tools can compare information.
Ontologies should be developed on different abstraction levels. Agreed high-
level ontologies should be in place and should be used when defining domain
ontologies. General domain ontologies should in turn be reused when more
specific sub-domain ontologies are defined.
Short-term recommendations
Long-term recommendations
Together with a recognized body such as the W3C, agree on the names that
ought to be used for tags representing particular tourism content and that are
valid for search engines.
Develop software that enables (semi-)automatic information annotation
according to the previous recommendation.
Inter-ontology mapping
Short-term recommendations
Long-term recommendations
Process handling
Short-term recommendations
Long-term recommendations
Metasearch methodology
Short-term recommendations
Long-term recommendations
Querying
Short-term recommendations
Long-term recommendations
Research technologies for flexible and adaptive query methods that are able to
understand the semantics of a web repository and send an appropriate query.
Object identification
Short-term recommendations
Long-term recommendations
1 Scope
The CEN/ISSS Workshop on eTourism aims at producing guidelines for reaching
global interoperability, i.e. enabling seamless data interchange and execution of
eBusiness processes in the tourism sector.
The CWA will cover the following topics under a pan-European interoperability
perspective:
a. analysis and identification of the needs of B2B and B2C partners for
harmonized data interchange;
b. analysis of the gaps in the design of current interoperability approaches;
c. description of the metadata and principles and requirements for data
modelling;
d. analysis of business models and legal issues (IPRs, DRM, personal data
protection and privacy);
e. analysis of existing initiatives and approaches for flexible harmonization and
global interoperability (including process interoperability);
f. recommendations concerning a general framework for eTourism related
information exchange;
g. best practice case.
The CEN Workshop will focus on data integration and discovery as well as the
seamless execution of eBusiness processes. Application of the above will support
end-user satisfaction and consumption of travel products, and increase data
reliability, revenue generation and margin contribution, motivating early adoption and
roll-out to the market.
2 Normative references
The following normative documents (European and International Standards) are
referenced in this document. Other documents of interest are listed in the
Bibliography.
R
RDF — resource description framework
RDFS — resource description framework schema
RMSIG — Reference Model Special Interest Group (under IFITT)
S
SCORM — Sharable Content Object Reference Model
SHOE — simple HTML ontology extensions
SME — small and medium enterprises
SOA — service-oriented architectures
SQL — standardized query language
T
TCP/IP — Transmission Control Protocol / Internet Protocol
TGV — train grande vitesse: high-speed train
U
UCS — universal character set (ISO/IEC 10646)
UNWTO — World Tourism Organization
URI — uniform resource identifier
W
W3C — World Wide Web Consortium
WAI — Web Accessibility Initiative
WSMO — web service modeling ontology
WWW — World Wide Web
X
XFT — exchange for travel
XHTML — extensible HTML
XML — extensible markup language
XSLT — extensible stylesheet language transformation
Tourism is an important and growing sector of the European economy, with a large
presence of SMEs. ICT is an enabler to strengthen efficiency, reduce costs and
improve the competitiveness of the industry. Tourism is expected to contribute 8.4 %
of total employment and 9.9 % of GDP worldwide [World Travel and Tourism
Council, 2008, p 4].
For these reasons, it is important that companies and associations in the tourism
sector understand the benefits they can reap from eBusiness, enhance their ICT
infrastructure, and adopt eBusiness processes.
In eBusiness implementations the tourism sector has some specificities. Data quality
and reliability are critical issues (e.g. updated opening hours for a museum, reliable
on-line booking). Other critical issues are territorial definition and coordination
between regional or local groups and national sites. Commercial information (B2B,
B2C, B2G) and “touristic information” (information to the end user, G2C) are both
concerned. All involved parties provide information at different levels (e.g.
government – travel warning; B2C the mentioned opening hours, B2B distribution
prices and their meanings). These specificities lead to a high degree of heterogeneity
in tourism. Tourism market structures are complex and highly fragmented.
Information interchange on the level of processes and data structures is not
harmonized and the electronic execution of business processes on a global level is
still burdened by heterogeneous interfaces and data structures.
Figure 4-1
This hinges on the ability of all tasks to understand the data they are supposed to
consume – i.e. data interoperability – and of processes to be able to meaningfully
cooperate – process interoperability. Our report thus circles around the two key
concepts of data and processes; see figure 4-2.
Figure 4-2
The circle captures the relationship between data and process interoperability with
key enablers that need deeper analysis. In particular, we have identified the following
topics for further analysis:
Semantics
Data transformation
Process handling
Metasearch
Object identification
The topics are placed in the larger context of four cross-cutting concerns that
permeate all of them. Tourism transactions on the one hand regularly transcend
national and cultural boundaries and frequently involve both very small and very
large players. On the other hand, many of the parameters – rating systems for
accommodation, opening hours of sites, classification of beaches – are nationally or
even regionally regulated or reflect cultural preferences. All transactions must
naturally follow pertinent national or regional laws and regulations.
Processes will in particular be implemented in line with the process owner’s overall
business model. The data structures will similarly often be dictated by the owner’s
value proposition. Furthermore, both data and processes will at least to a degree
reflect the technology – software, hardware, overall connectivity etc. – on which the
system in question operates.
4.2 Topics
The following subsections briefly present each of the selected topics, give a bird's-
eye view, and motivate the rationale for their choice. The remainder of the report
then examines the issues methodically and in more detail.
4.2.1 Semantics
The meaning and structure of data is at the heart of data interoperability – and, given
the plethora of pertinent formats, it is unfortunately a complex problem. Differences
on the syntactic level – say, XML messages versus comma-separated files or EDI-
type communications – already affect how much semantics the data carries in itself.
Formal or informal standards can externally assign meaning to an otherwise
meaningless data set (say, to an otherwise arbitrary sequence of fields in the rows of
a CSV file), or explicate the semantics of XML structures that are already partially
self-explanatory to humans.
Taxonomies can help to unambiguously specify possible value sets for the data,
ideally combined with specific definitions of the individual options and their
relationship to others. Ontologies can then reference and use these value sets in
properties of classes, going a long way further towards specifying the exact
semantics of data.
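The contrast between externally assigned and partially self-describing meaning can be sketched as follows. The record, its field names and field order are invented for illustration:

```python
# The same record as CSV (meaning assigned only by an external convention)
# and as XML (element names carry partial semantics for a human reader).
import csv
import io
import xml.etree.ElementTree as ET

# Without the externally agreed field order, the CSV row is just three strings.
FIELD_ORDER = ["name", "stars", "city"]
row = next(csv.reader(io.StringIO("Hotel Example,4,Ghent")))
record_from_csv = dict(zip(FIELD_ORDER, row))

# The XML version is partially self-explanatory, though formal semantics
# still needs an external standard, schema or ontology.
xml_doc = ("<hotel><name>Hotel Example</name>"
           "<stars>4</stars><city>Ghent</city></hotel>")
root = ET.fromstring(xml_doc)
record_from_xml = {child.tag: child.text for child in root}

print(record_from_csv == record_from_xml)  # True
```

Both representations end up as the same record; the difference lies in where the meaning resides, inside the message or in an agreement outside it.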
4.2.3 Process handling
Consumers are getting more and more used to making transactions online, and a
crowding-out process is under way: business actors have to follow demand to keep
or expand their market share. Traditional distribution channels are vanishing, and
more flexible and dynamic networks are arising. A trend towards outsourcing and
focusing on core competences can be observed, leading to a more consumer-centric
approach and allowing highly individualized and ad-hoc product design. This
challenge brings with it the need to orchestrate business processes flexibly and
across organizations.
4.2.4 Metasearch
One of the prerequisites for process handling is the ability to identify the relevant
players for potential joint processes and to find information across those players.
Registries, especially federated registries, will play a leading role in describing
potential partners and their services, and will thus facilitate bringing them together.
At present, search components differ in their query syntax, which makes it difficult to
scale metasearches and to spontaneously integrate new data sources. For the actual
technical realization of metasearches, agreed query strategies and query syntaxes
are therefore desirable and are being worked on.
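A minimal metasearch sketch, with both "sources" as local stubs (real systems would issue network requests, and all names and data here are invented): per-source adapters translate one user query into each source's own query syntax, and results are merged into a unified list.

```python
# Minimal metasearch sketch: per-source adapters hide differing query
# syntaxes; results are aggregated into one unified list.

def search_source_a(query: dict) -> list:
    # Hypothetical source A expects a single keyword string.
    keyword = f'{query["city"]} hotel'
    return [{"name": "Hotel Example", "price": 120.0,
             "source": "A", "native_query": keyword}]

def search_source_b(query: dict) -> list:
    # Hypothetical source B expects structured parameters instead.
    return [{"name": "Pension Demo", "price": 80.0, "source": "B"}]

def metasearch(query: dict) -> list:
    results = []
    for adapter in (search_source_a, search_source_b):
        results.extend(adapter(query))
    # Aggregate into one unified list, here simply ordered by price.
    return sorted(results, key=lambda r: r["price"])

hits = metasearch({"city": "Ghent"})
print([h["name"] for h in hits])  # ['Pension Demo', 'Hotel Example']
```

Adding a new data source means writing one more adapter, which is exactly why agreed query strategies and syntaxes would make metasearch easier to scale.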
While object identification does work for flights, there are many other types of objects
in eTourism that do not have a unique identifier. One of the most important cases in
Laws, however, influence many other areas in eTourism transactions. The following
list is only indicative and certainly not a complete overview of pertinent legislation:
Some countries and regions such as Oberösterreich even have dedicated laws on
tourism (see http://www.oberoesterreich-tourismus.at/alias/lto/recht/410624/
tourismusrecht.html).
These laws are only partially harmonized across Europe; [Directive 90/314/EEC], a
directive setting minimum pan-European standards of customer protection for
package tours, is more the exception than the rule. Furthermore, the legal
systems of countries across Europe do not necessarily cover the same areas. For
example, hotel classification is mandated by law in some countries such as Italy and
Greece and does not even exist in others such as Finland.
For traditional package tours, the legal situation is quite clear from an end user's
point of view. Such tours are always regulated by the national laws in question. From
the customer's perspective, the tour operator is the only contractual partner [Freyer,
2006, p 234] [Directive 90/314/EEC] and is responsible for providing all the services
that were promised. It alone is also responsible for any redress that may result from
unsatisfactory services.
The situation is much muddier for extras, such as car rental at the destination, for
which a travel agency only acts as an intermediary. Dynamic packaging poses even
bigger problems in this regard. An intermediary – often a specialized travel agent –
combines pre-assembled packages based on user preferences. The user does not
administer the different items in the package himself, but receives offers for
packages which are dynamically assembled based on his preferences. However, the
legal and contractual consequences of such dynamic bundles are not yet clear. For
an end user, such a bundle of sub-packages can also imply a set of separate
contracts which do not by themselves necessarily fall under the definition of
“package” in [Directive 90/314/EEC]. In consequence, the contractual situation and
the legal mechanisms for redress can be considerably more complex for dynamic
packaging. An unnamed provider of software components for eTourism transactions
cited this as the single biggest obstacle to the uptake of dynamic packaging.
4.3.2 Multiculturalism
Related to legal aspects are the multicultural facets of many eTourism transactions
which span cultures and frequently involve both very small and very large players,
thus also mixing organizational cultures. Culture here is a much wider concept than
high culture and covers “the set of distinctive spiritual, material, intellectual and
emotional features of society or a social group, and [...] encompasses, in addition to
art and literature, lifestyles, ways of living together, value systems, traditions and
beliefs” [UNESCO, 2002]. Europe in particular is characterized by multiculturalism,
right down to its official motto, “United in diversity”.
Many cultural preconditions have influenced local description systems, such as
rating systems for accommodation or classifications of beaches, which in some
countries are nationally or even regionally regulated. Others, such as typical opening
hours of sites or food offerings, follow local customs without being subject to laws.
Multilingualism: Languages are an integral and often defining part of cultures, and as
such multiculturalism includes multilingualism, the coexistence of many languages.
Until around the turn of the millennium the treatment of multilingual data in computer
systems posed major problems. However, the widespread adoption of the Universal
Character Set (UCS) [ISO/IEC 10646], also known as Unicode, and its companion
standards has changed the game. The UCS is supported in virtually all current
operating systems and many application programs including all major browsers and
email clients. XML is squarely based on the UCS. Thus the internal representation,
the exchange and the display of multilingual data are all now quite unproblematic.
That said, some of the Global Distribution Systems (GDSs) that are at the core of
many eTourism transactions stem from the 1950s and 1960s, and even the youngest
of the “big four” GDSs, Amadeus, was written in the 1970s and 1980s. In this they
long predate the UCS and have at best sketchy support for multilingual data. Many to
this day operate on subsets of ASCII. This can obviously create considerable issues,
notably in the handling of personal names and the names of organizations. It is
outside the scope of this report to elucidate these issues in detail, though such an
overview would be highly beneficial.
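The ASCII limitation just described can be illustrated in a few lines (the name is invented; the lossy path roughly mimics what an ASCII-era system might do, while a UCS encoding such as UTF-8 round-trips the name intact):

```python
# Forcing a name with non-ASCII letters through an ASCII-only channel
# loses information; a UCS encoding such as UTF-8 preserves it.
import unicodedata

name = "Søren Kjærgård"

# Lossy path: decompose, then drop everything outside ASCII.
ascii_name = (unicodedata.normalize("NFKD", name)
              .encode("ascii", "ignore").decode())
print(ascii_name)          # 'Sren Kjrgard' – ø and æ have no decomposition

# Lossless path via UTF-8.
roundtrip = name.encode("utf-8").decode("utf-8")
print(roundtrip == name)   # True
```

Note that even the "clever" decomposition step only rescues letters like å (which decomposes to a plus a combining ring); letters such as ø and æ simply vanish, which is exactly the kind of damage personal names suffer in ASCII-era systems.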
Taxonomies and terminology are another important area in which data is necessarily
language-dependent. The exact definitions of categories such as “double room” (with
or without children), “luxury hotel” etc. will reflect the understanding in a given
language and culture.
In view of this multitude of taxonomies, which may or may not in turn coincide with
the customer’s own cultural and personal preferences, ratings will have to be based
on specific properties of accommodation, and, for that matter, general service, rather
than on general classifications alone. Searches for “hotels with WiFi, restaurant and
rooms over 20 m²” are likely to produce more acceptable results for users from many
cultural backgrounds than searches on “3-star” alone.
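Such property-based filtering can be sketched as follows; the hotel data and attribute names are invented for illustration:

```python
# Property-based filtering: select on concrete attributes rather than on
# culture-dependent star categories. All data is invented.

hotels = [
    {"name": "Hotel A", "stars": 3, "wifi": True,
     "restaurant": True, "room_sqm": 22},
    {"name": "Hotel B", "stars": 4, "wifi": False,
     "restaurant": True, "room_sqm": 25},
    {"name": "Hotel C", "stars": 3, "wifi": True,
     "restaurant": False, "room_sqm": 30},
]

def property_search(hotels, **criteria):
    """Keep hotels whose attributes satisfy every criterion (a predicate)."""
    return [h for h in hotels
            if all(pred(h[key]) for key, pred in criteria.items())]

matches = property_search(hotels,
                          wifi=lambda v: v,
                          restaurant=lambda v: v,
                          room_sqm=lambda v: v > 20)
print([h["name"] for h in matches])  # ['Hotel A']
```

A 3-star filter alone would have returned Hotels A and C, illustrating how a culture-dependent category and a concrete property set can diverge.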
operate automatically and can thus compete primarily on the price front and, in part,
are closely related to today’s GDSs. The GDSs themselves operate on two related,
but distinct business models, namely of being a service company for major service
providers such as airlines, and as an integration platform for intermediaries.
All eTourism activities must be seen in the context of the relevant business models.
These dictate the initial willingness to interchange data and to engage in cross-
organizational processes. In a sense, this willingness is a premise for this report.
4.3.4 Technology
The advent of the World Wide Web marks a watershed for the tourism industry as well.
As we have seen, GDSs have been operational since the early 1960s, but they
depended on highly proprietary distribution networks to allow travel agents to interact
with them. The advent of videotext systems such as BTX in Germany and Minitel in
France and similar technologies in the 1980s somewhat opened and standardized
these channels, but by and large the communication channels remained accessible
only to professional intermediaries.
The success of the WWW has largely standardized the communication channels
between providers onto standard internet protocols – not necessarily HTTP, though,
as many larger data sets are still transferred using FTP or related protocols. The
underlying technology has in many cases changed much less: today's GDSs largely
operate on the same transactional stacks as before, but some – though by no means
all – of the details have been abstracted away through the common protocols.
This standardization on common network protocols has allowed the rise of
collaboration standards such as SOAP-based Web Services, XML-based data
formats, semantic standards and, last but not least, the HTTP standard itself, which
has gained renewed prominence with today's emphasis on RESTful web services.
This report concentrates on the interoperability layer between implementations.
5 Case study
Mechanisms and solutions for electronic data exchange in the tourism industry were
first developed long ago by airline companies, in order to allow them to exchange
data about flights and bookings. Different standards emerged from those initial
operational exchanges, taking into account the limitations of the means of
communication of that time.
Over the years, the need to access inventory, prices, booking files, customer data
and sales or descriptive information has boomed, first through the development of the
GDSs (Sabre in 1960, Galileo in 1971), main CRSs (Pegasus, Wizcom, etc.) and
more recently with the web, used both for B2B and B2C applications.
The thematic circle introduced earlier (4.1) will be illustrated through the following
case study. The base guideline for the case study is a consumer (an end user or a
travel-related professional) wanting to book a trip or gather travel-related information
using information and communication technologies.
Figure 5-1
The case study is first detailed in terms of the different trip phases and the
corresponding information needs and processes used by the consumer.
Platforms, technologies, types of information and data sources are reviewed within
the case study. Some drawbacks and limitations, gaps and future needs will also be
identified and associated with the elements of our thematic circle, which will be
detailed later in the document.
Figure 5-2
Discovering:
o select possible destinations and types of trips based on personal or
family interests (a particular activity or hobby, a destination, etc.);
o select according to a season (winter sport, sun in winter, etc.);
o investigate prices and opportunities, accommodations, services, events,
etc.;
o explore recommendations and ratings from other travellers;
o etc.
Shopping: to match reality with expectations:
o compare prices;
o compare content of offers (similar offers, different types of trips, etc.);
o investigate testimonies.
Constituting the trip itself by:
o validating price and availability for a trip from a unique vendor, or
o amalgamating components from different vendors – such as hotel
vendor, pre-packaged tour vendor, airline company, etc. (in a unique
booking or in multiple bookings);
o requesting bids or quotes or alerts from different vendors.
Finalizing the buying process (confirmed or option booking(s)):
o finally buy from a unique vendor, or
o buy the amalgamated components (stored in a unique or multiple
bookings);
o add links to reference data (to keep track of weather, health or country
data, activities, testimony, etc.);
o pay (deposit or total).
Once a booking is finalized, this is not the end of the process: certain consumers
would continue browsing the web.
Finally, during or after the trip comes the part that is now booming with web 2.0
sites. The consumer could
testify: add their own piece of information on the web, using forums,
testimony sites or polls;
publish newly generated content, such as media or text;
enrich their profile(s) on the different sites in order to keep in touch with
opportunities related to their interests;
follow their subsequent trips, in case they actually prepared more than one trip
or acquired components that would be valid on several trips;
share common interests in order to organize group events;
possibly file and follow up a complaint.
Figure 5-3
Other professional processes also revolve around the major task of publishing data
for professional and end consumer use, such as
publishing fares,
providing information on products and destinations,
referencing other sources of information,
selecting and ranking data (vendors, destinations, etc.),
etc.
Those additional processes either rely on the same systems, platforms and
communication means as those available to end consumers (but with advanced
features), rely on specific systems not available to end consumers, or end up being
manual.
This is of course only possible depending on the flexibility of the exchanges, on the
formats made available by the sources and intermediaries, and on the extensive use of
the semantic web and other mechanisms allowing automated exchanges and recognition
of meaning and data. This will be detailed in the present document.
The user may also consult different sites in parallel, therefore initiating different
processes. This behaviour is considered outside the present case study.
Figure 5-4
The owner of the communication and information technology would usually own one
or several data sources and directly make use of them. That would be the case, for
instance, for a hotel group and its hotel data (editorial text, prices, availabilities,
comments, etc.).
The front system may also connect to other external sources to aggregate additional
information. A hotel chain may not own the inventory of each hotel and could
interrogate the different hotel PMSs or hotel group CRSs to validate availability.
Each of those additional external sources could therefore either own the data or itself
aggregate content from other sources, therefore creating a chain of sources involved
in a single request from the consumer. That would typically be the case for an online
site like Opodo dynamically requesting airline availability and fares from a GDS
(Amadeus in our example), itself launching requests to different airlines in relation
with the expected city pair.
The added value of using layers of sources would reside in their capacity to
concentrate coherent data from different sources (as is the case of GDSs
for airlines, or of comparators);
enrich data from a source, either by directly adding data or by aggregating
data from other external sources (like web sites proposing different types of
trips).
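As a minimal sketch of the layered-source pattern described above (all source names, interfaces and availability figures are hypothetical assumptions, not any real GDS or airline API), a front system could fan a single consumer request out along a chain of sources:

```python
# Sketch of a layered source chain: a front system queries an aggregator,
# which in turn queries several underlying sources (all names hypothetical).

class Source:
    """A leaf data source that owns its availability data."""
    def __init__(self, name, availability):
        self.name = name
        self._availability = availability  # e.g. {"NCE-LHR": 42}

    def query(self, city_pair):
        seats = self._availability.get(city_pair)
        return [] if seats is None else [(self.name, seats)]

class Aggregator(Source):
    """An intermediate source (e.g. a GDS) that concentrates data
    from several underlying sources into one coherent answer."""
    def __init__(self, name, sources):
        self.name = name
        self.sources = sources

    def query(self, city_pair):
        results = []
        for src in self.sources:
            results.extend(src.query(city_pair))
        return results

# One consumer request fans out along the whole chain of sources.
airline_a = Source("AirlineA", {"NCE-LHR": 5})
airline_b = Source("AirlineB", {"NCE-LHR": 12, "NCE-CDG": 3})
gds = Aggregator("GDS", [airline_a, airline_b])
front = Aggregator("OnlineSite", [gds])

print(front.query("NCE-LHR"))  # availability concentrated from both airlines
```

Each intermediate level here simply concatenates the answers of its underlying sources; a real chain would also handle caching, time-outs and format conversion, as discussed under the technical aspects below.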
Online agencies such as Expedia or Opodo also have back office systems to enter
and maintain editorial data, price lists and stocks. That would be their own data
source. They typically do not own destination, weather, policy or health related data
but use external sources such as Lonely Planet or government web sites. Those
distributor in-house systems also usually connect to GDSs (Global Distribution
Systems such as Amadeus or Galileo) to request airline fares and availability. We
would be in the situation where an intermediate data source browses other external
data sources for information.
This need for a distributed architecture, composed of distinct systems around the
world and owned by different companies with various strategies and technologies,
leads to a number of constraints and requirements identified as cross-cutting aspects
in the previous introduction:
Technical aspects come first to mind, with the need to ensure compatibility of
the different systems, increase the reliability of the individual elements,
measure the impact on architectures and scale accordingly. Performance of
the different systems and of the overall chain is key and leads to additional
complexity (such as caching, uniqueness of data, etc.).
Business models must also be taken into account because making money is
central for the complete system to work smoothly. There must therefore be the
capacity
o to use other systems in return for payment (fixed price, price per
transaction, percentage of a booking, etc.);
o to add mark-ups along the chain and still get a competitive price;
o to access net prices directly on intermediate levels in the chain;
o etc.
Legal aspects are equally important, with the necessity to ensure that
o the information and products found and possibly purchased on the
different systems can legally be purchased or used;
o the distributor and the end user will have the capacity to track individual
providers so that they fulfil their obligations (provided the same notion
exists on the provider’s side), in case of any issue.
Even multiculturalism is present when speaking about systems composing the
complete infrastructure:
o Providing services (and support) on a 24-hour basis, without stopping
servers during the night, is unusual in certain countries or for small
companies.
o Documentation on how to consume the service may not be written in a
widely used language such as English, or may lack multiple translations.
The main topics involved in allowing process and data interoperability also come into
play in the case of multilevel data sources:
With all these elements in place in the multilevel data source scenario of our case
study, we could have complex processes in place, such as dynamic packaging, with
data interoperability, sharing and grouping objects with different identifiers and
semantic definitions, and
process interoperability:
o compatibility of the different exchanges for each sub process;
o capacity to have evolution only on certain components of the system;
o etc.
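The dynamic packaging scenario can be sketched in the same spirit (vendor codes, prices and the mark-up are illustrative assumptions): components published under different identifier schemes are reconciled through a shared mapping and amalgamated into one booking:

```python
# Sketch of dynamic packaging: components from different vendors, each with
# its own identifier scheme, are reconciled and priced as one package.
# All vendor names, codes and prices are illustrative.

# Each vendor publishes components under its own object identifiers.
hotel_offers = {"HTL-0042": {"city": "NCE", "price": 90.0}}
flight_offers = {"AF1234/2009-07-01": {"city": "NCE", "price": 120.0}}

# A shared mapping reconciles local identifiers to common semantics.
id_map = {
    "HTL-0042": ("hotel", hotel_offers),
    "AF1234/2009-07-01": ("flight", flight_offers),
}

def build_package(component_ids, markup=0.10):
    """Amalgamate components into one booking and apply a mark-up."""
    components, total = [], 0.0
    for cid in component_ids:
        kind, catalogue = id_map[cid]
        offer = catalogue[cid]
        components.append((kind, cid, offer["price"]))
        total += offer["price"]
    return {"components": components, "total": round(total * (1 + markup), 2)}

package = build_package(["AF1234/2009-07-01", "HTL-0042"])
```

The shared identifier mapping stands in for the object identification and semantic agreement that the thematic circle calls for; without it, the two vendors' codes could not be combined in a single booking.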
According to the type of information, different types of issues and needs arise, which
we have again grouped based on our thematic circle, with first the topics of the
circle and then the prerequisites:
Figure 5-5
As introduced in the previous chapters, each data source may in turn connect to
multiple data sources of the same type or of other types.
6 Semantics
6.1 Standards
6.1.1 Needs and requirements
6.1.1.1 Introduction
The first word that may come to mind when talking about data and information
interoperability and exchange is “standards”. Standards have traditionally been
widely used in different industries. The general goal of standards and standardization
is to allow compatibility, interoperability, safety, repeatability, quality, etc. The process
of developing and agreeing upon a general standard is known as standardization.
Within the computer science domain and Information and Communication Technologies,
standards have also been widely used and are becoming increasingly important. A vast
number of software and hardware developers and manufacturers worldwide produce
different items, and these items need to follow particular standards in order to work
together in a satisfactory manner. As the amount of information contained on the
Internet increases every second, a unified representation for web data and resources
is needed in today’s large-scale Internet data management systems. This unification
of standards will allow machines to meaningfully process the available information
and to (successfully) exchange and integrate data coming from distributed databases
and information management systems. This
has been occurring, e.g. in the context of eLearning with the development of the
SCORM (http://www.adl.net/) and AICC (http://www.aicc.org/) standards, or in the
context of telemedicine applications with the development of standard data transport
protocols such as HL7 and ISO/IEEE 11073, among others.
There have already been some efforts invested in this direction (see 6.1.2) in order to
enable distributed data exchange and integration. Interoperability between databases
and information sources needs to be provided on both a technical and an informational
(semantic) level. The social value of the Web is that it enables human communication,
commerce, and opportunities to share knowledge, information and experiences. One of
the primary goals of the W3C (World Wide Web Consortium, http://www.w3c.org/) is to
make these benefits available to all people, whatever their hardware, software,
network infrastructure, native language, culture, geographical location, or physical
or mental ability might be.
6.1.1.2 Needs
Benefits of use of standards
Standards have proved to be a powerful tool for organizations of all sizes, supporting
innovation, increasing productivity and efficiency in their business processes.
Effective standardization promotes competition and enhances profitability, enabling a
business to take a leading role in shaping the industry itself. Generally speaking,
standards allow a company to:
In modern business, effective communication along the supply chain and with
legislative bodies, clients and customers is imperative. Applying standards within
the everyday operation of a company provides the means to measure various variables
and thus to manage their evolution, bringing benefits when applied within the
infrastructure of the company itself. Business costs and risks can be minimized,
internal processes streamlined and communication improved.
Standardization promotes interoperability, providing a competitive edge necessary for
the effective worldwide trading of products and services.
6.1.1.3 Requirements
Within the tourism industry standards may help companies to be more competitive in
terms of being present on the web by complying with information and communication
standards and recommendations. In order to achieve exchange and integration of
information across different information systems, information formats and transfer
protocols must be compatible and ought to allow any hardware and software used to
access the information to work together.
Furthermore, information integration and exchange are required to provide trade and
commerce capabilities on web sites, so that a local company can be globally present
through the web and increase its business opportunities.
Regarding web and information standards, one of the most active bodies is the W3C.
W3C designs and promotes interoperable, open (non-proprietary) formats and
protocols to avoid the market fragmentation of the past. A W3C Recommendation is
the equivalent of a web standard, indicating that this W3C-developed specification is
stable, contributes to web interoperability, and has been reviewed by the W3C
membership, which favours its adoption by the industry.
Standards can be found throughout daily life, but why would we need to use
standards? Rather than asking why we would need standards, we might usefully ask
ourselves what the world would be like without them. Products would not work
as expected. They would be of inferior quality and incompatible with other products
or equipment (in fact, they would not even connect with them), and in extreme cases
non-standardized products could potentially be dangerous.
From a user’s standpoint, standards are extremely important in the computer industry
because they allow the combination of products from different manufacturers to
create a customized system. Without standards, only hardware and software from the
same company could be used together. In addition, standard user interfaces can
make it much easier to learn how to use new applications.
Most official computer standards are set by one of the following organizations:
There is a need to define and provide (semantic) definitions and clarifications in
order to transform disparate, localized information into a global, coherent resource
within the Internet (the most common communication platform and environment in this
case).
6.1.4 Recommendations
6.1.4.1 Short-term recommendations (1–3 years)
Lower the entry barrier for participation in pertinent formal and informal
standardization bodies especially for SMEs and extend the scope of those
activities to cover their respective requirements.
Work on interoperability approaches between different standards.
6.2 Taxonomies
6.2.1 Needs and requirements
6.2.1.1 Introduction
Traditionally, all sciences classify their objects. Astronomy classifies celestial
bodies such as planets, stars and galaxies. Botany classifies plants, chemistry
classifies chemicals, medicine classifies illnesses, psychology classifies mental
processes, library and information science classifies documents and systems and
methods of knowledge organization, religious studies classifies religions, and the
list could go on forever.
Such classifications are not performed just in order to create an aesthetic effect.
Classifications are constructed in order to work efficiently, and also to provide the
means to efficiently find and retrieve meaningful and required information.
Classification is not something extra put on top of scientific work; rather, it is
something deeply integrated within scientific work itself, as it provides deeper
understanding of the subject matter of study.
For example, if a new group of chemical substances is found to help cure a certain
disease and this fact is widely demonstrated, it will be classified as a kind of drug
(e.g. as antidepressants, tranquillizers or anti-inflammatory drugs) that helps
humans recover from that particular disease.
6.2.1.2 Needs
Tourism information needs to be organized in agreed ways by all agents (public
institutions and bodies, industry, research communities, final users, etc.). Relevant
tourism information in general, its organization within information management
systems and its explicit specification through schemas or information representation
methods and models need to be defined.
Information and content are key. To access the right piece of information at the
right moment, information needs to be clearly stored and classified. Almost anything
(including tourism information, i.e. travel, accommodation, dining, events, useful
information, etc.) has to be classified following a structure, e.g. taxonomic schemas.
6.2.1.3 Requirements
As the amount of (all kinds of) available information increases on the web, the
particular piece of information we seek may be buried among the information we do not
seek. Thus, the activity of classifying information becomes increasingly important,
as it makes it easier to find particular content on the web. In terms of the service
provided by a company, this can be translated into business opportunities.
Easy availability of information is significantly more important to those
planning some kind of leisure activity, as their behaviour pattern indicates that
they will not spend long on web sites looking for information. Thus, information has
to be object oriented, not experience oriented. However, in order to build a
successful tourism taxonomy, both approaches are required.
The benefit of this (taxonomy) approach is that it allows related terms to be grouped
together and categorized in ways that make it easier to find the correct term to use
for whatever purpose. Within the tourism domain, if there is a taxonomical
classification for the notion of “Event”, different events could be classified under
the general one (e.g. sport events, cultural events, etc.), allowing a tourist to
easily find the kind of “Event” s/he wants to undertake.
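A taxonomy of this kind can be sketched as a simple parent-pointer structure (the category names below are illustrative, not a proposed tourism classification):

```python
# Sketch of a small tourism taxonomy: each taxon points to its parent,
# so related terms can be grouped and retrieved under a general notion.

taxonomy = {
    "Event": None,                 # root taxon
    "SportEvent": "Event",
    "CulturalEvent": "Event",
    "Concert": "CulturalEvent",
    "Exhibition": "CulturalEvent",
}

def narrower(term):
    """All taxa classified (directly or indirectly) under a term."""
    direct = [t for t, parent in taxonomy.items() if parent == term]
    result = []
    for child in direct:
        result.append(child)
        result.extend(narrower(child))
    return result

# A tourist browsing "Event" finds every specific kind of event.
print(sorted(narrower("Event")))
```

Grouping works in both directions: walking up the parent pointers yields the broader terms, walking down (as above) yields every specific kind of “Event”.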
Etymologically speaking, the word “taxonomy” comes from the Greek taxis
(“arrangement, order”) and nomos (“law”).
The units in taxonomies are termed taxon (plural: taxa). Initially taxonomy was only
the science of classifying living organisms and species, but later the word was
applied in a wider sense, and may also refer to either a classification of things, or the
principles underlying that classification. Classification of species, however, began
well before the eighteenth century. Aristotle distinguished species by habitat and
means of reproduction, but Andrea Cesalpino produced the first significant taxonomy
of plants in 1583, arranging the species in a hierarchical, graded order. His work was
developed by Marcello Malpighi, who expanded his hierarchical system to include
animals. The word taxonomy is sometimes used synonymously with classification
and sometimes given a special meaning.
There have also been some attempts to differentiate taxonomies from simple
classifications. These attempts may also serve as a review of the different definitions
authors have given to the notion of taxonomy. “A taxonomy obtains when several
fundamenta divisionis are considered in succession, rather than simultaneously, by
an intensional cl. [classification]. The order in which fundamenta are considered is
highly relevant: the taxonomy obtained by using property X to classify a genus and
then property Y to classify its species is by no means the same as that obtained by
considering property Y first and property X afterwards” [Marradi, 1990].
Campbell & Currier (31/10/00) [Campbell, Currier] ask: what is a taxonomy? They
provide the following answer:
There is a vast number of taxonomic classifications within the tourism domain in the
literature. Almost every project applying information management methods uses a
taxonomy in order to organize the existing information of its universe of discourse
(the project to be developed). Taxonomies are later used to design database
structures, ontologies and other tools so that the information is easily accessible
and retrievable for the final user.
In commercial web sites and online travel agencies, services are often
organized under taxonomies, e.g. restaurants and kinds of restaurants.
Accommodation facilities are organized under different categories: hostel, 5-star
hotel, 4-star hotel, etc., or even in ranges of price, depending upon the search
criteria.
Taxonomies usually require strict control over the creation of new entities and
branches, and this restriction needs to be overcome, especially given the way
information is consumed on the web. Systems need to be as dynamic as possible,
i.e. flexible. Users will typically not introduce classification methodologies;
rather, they will categorize their content and make it available via tags, links, etc.
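This user-driven categorization can be sketched as an inverted tag index (the content identifiers and tags are illustrative), where free tags replace the controlled taxonomy:

```python
# Sketch of tag-based categorization: instead of a controlled taxonomy,
# users attach free tags to content, and an inverted index makes the
# content retrievable by tag. All content identifiers and tags are invented.

from collections import defaultdict

tag_index = defaultdict(set)  # tag -> set of content identifiers

def tag_content(content_id, tags):
    """Record a user's free tags for a piece of content."""
    for tag in tags:
        tag_index[tag.lower()].add(content_id)

tag_content("review-17", ["beach", "family", "Nice"])
tag_content("photo-03", ["beach", "sunset"])

# Any tag yields every piece of content users filed under it.
print(sorted(tag_index["beach"]))
```

Unlike the taxonomy above, nothing constrains which tags may be created, which is precisely the flexibility (and the loss of control) discussed in this section.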
6.2.4 Recommendations
6.2.4.1 Short-term recommendations (1–3 years)
6.3 Ontologies
6.3.1 Needs and requirements
6.3.1.1 Introduction
The word “Ontology” (note the upper-case ‘O’) comes originally from philosophy.
From a philosophical point of view, Ontology is the branch of philosophy which deals
with the nature and the organization of reality [Guarino, Giaretta, 1995]. We have to
go as far back as Aristotle to find the first reference to this word, when he tries to
define a “science” that is “on top of” the rest of the sciences, describing in his
Metaphysics Book IV a science that studies being as being (i.e. Ontology):
“There is a science that studies the being as being and its properties as such (being)
which belong to it in virtue of its nature. Now, this science is not the same as any of
the so called special sciences, since none of these other treat (universally) the being
as being itself but reducing the being to one part of it, they (“only”) investigate the
essential properties of this part. Since we are seeking the first principles and the
highest causes, there must (clearly) be something to which these belong in virtue of
its own nature. If then, those who sought the elements of existing things were
seeking these same principles, it is necessary that the elements must be elements of
being not only by accident but just because it is being. Therefore, it is of being as
being that we also must grasp the first causes” [Aristotle, Metaphysics Book IV].
In the computer science domain, ontologies (note now the lower-case ‘o’) aim at
capturing domain knowledge in a generic way and at providing a commonly agreed
understanding of a domain which may be reused and shared across applications and
groups. Ontologies provide a common vocabulary of an area and define, with different
levels of formality, the meaning of terms and the relations between them. Since the
beginning of the 1990s, ontologies have become a popular research topic.
6.3.1.2 Needs
In recent years, the development of ontologies has been moving from the realm of
Artificial Intelligence (AI) laboratories to the desktops of domain experts. Ontologies
have become common on the World Wide Web. Ontologies on the web range from
large taxonomies categorizing web sites to categorizations of products for sale and
their features. Many disciplines now develop standardized ontologies that domain
experts can use to share and annotate information in their fields. Why would
someone want to develop an ontology? Here are some of the (possible) reasons:
Ontological analysis clarifies the structure of knowledge. The first reason is that
ontologies form the heart of any system of knowledge representation. Without
conceptualizations that underlie knowledge, there is no vocabulary for representing
knowledge. Thus, the first step in knowledge representation is performing an
effective ontological analysis of some field of knowledge. Weak analyses lead to
incoherent knowledge bases.
Consider a domain in which there are people, some of whom are students, some
professors, some other types of employees, some females and some males.
For quite some time, a simple ontology was used in which the classes of students,
employees, professors, males and females were represented as “types of” humans.
Soon this caused problems, because it was noted that students could also be
employees at times and could also stop being students. Further ontological analysis
showed that “students”, “employees”, etc. are not “types of” humans but rather
“roles” that humans can play, unlike categories such as “females”, which are in
fact “types of” humans. Clarifying the ontology of this data domain made it possible
to avoid various difficulties in reasoning about the data.
Knowledge sharing
Ontologies enable knowledge sharing. The second reason why ontologies are
important is that they provide a means of sharing knowledge. Suppose we do an
analysis and arrive at a satisfactory set of conceptualizations, and terms standing
for them, for some area of knowledge, say, the domain of “electronic devices”. The
resulting ontology would be likely to include terms such as “transistors” and “diodes”,
and more general terms such as “functions”, “processes”, and also terms in the
electrical domain, such as “voltage”, that could be necessary to represent the
behaviour of these devices. It is important to note that the ontology – defined by the
basic concepts involved and their relations – is intrinsic to the domain, apart from a
choice of vocabulary to represent it. This ontology can be shared with others who
have similar needs for knowledge representation in that domain, avoiding the need
for replicating the knowledge analysis.
Since then, considerable progress has been made in developing the conceptual bases
needed for building technology that allows knowledge components to be reused and
shared.
One of the first definitions of the word “ontology” within the computer science
domain is due to Neches et al. [1991]. They defined an ontology as follows: “An
ontology defines the basic terms and relations comprising the vocabulary of a topic
area as well as the rules for combining terms and relations to define extensions to
the vocabulary”.
It can be affirmed that this definition gives some clues about how to proceed in
building an ontology, though it includes some vague notions:
Later, in 1993, Gruber’s definition became the most referenced in the literature. The
following is his definition of an ontology: “An ontology is an explicit specification
of a conceptualization”. Conceptualization refers to an abstract model of phenomena
in the world, obtained by identifying the relevant concepts of those phenomena.
Explicit means that the types of concepts used and the constraints on their use are
clearly defined. Formal refers to the fact that the ontology should be machine
readable and processable. Shared reflects the notion that an ontology captures
consensual knowledge; that is, it is not private to some individual but accepted by a
representative group of users that belong to a particular domain of knowledge.
Ontologies provide a common vocabulary of an area and define – with different levels
of formality – the meaning of the terms and the relations between them. Knowledge
in ontologies is mainly formalised using five kinds of components: classes, relations,
functions, axioms and instances [Gruber, 1993].
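These five components can be illustrated schematically (the class and instance names below are invented for the example; a real ontology would be expressed in a language such as OWL, discussed later in this section):

```python
# Schematic illustration of Gruber's five ontology components
# (classes, relations, functions, axioms, instances) using plain
# Python structures. All names are illustrative.

classes = {"Accommodation", "Hotel", "City"}
relations = {("Hotel", "subClassOf", "Accommodation"),
             ("Hotel", "locatedIn", "City")}
instances = {("Grand_Hotel", "Hotel"), ("Nice", "City")}

# A function derives one term from others (here: a star-rating label).
def rating_label(stars):
    return "%d-star hotel" % stars

# An axiom constrains valid interpretations: every instance's class
# must be a class declared in the ontology.
def axiom_classes_exist():
    return all(cls in classes for _, cls in instances)

print(axiom_classes_exist())  # a consistent knowledge base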
The tools that can be used for building ontologies usually provide a graphical user
interface, which allows ontologists to create ontologies without directly using a
specific ontology specification language. Some tools, such as Protégé, Chimaera and
FCA-Merge, have been created for merging and integrating ontologies.
In the context of the Semantic Web, some tools have arisen in recent years for the
annotation of web resources in SHOE, RDF, DAML+OIL or OWL. Their main
objective is the creation and maintenance of ontology-based markup in static web
documents. In fact, they are used for easily managing instances, attributes and
relationships between web resources. Some of these annotation tools are
OntoAnnotate, OntoMat, and the SHOE Knowledge Annotator.
There are also some ontology-based text mining tools, which allow extracting
ontologies from structured, semi-structured or free text; these tools are used
to learn ontologies from natural language.
There are some important parameters that can be used in the comparison and
evaluation of existing tools. Some of these parameters are:
A great range of languages have been used for the specification of ontologies during
the last decade: Ontolingua, LOOM, OCML, Flogic, CARIN. Many of these languages
had already been used for representing knowledge inside knowledge-based
applications, others were adapted from existing knowledge representation
languages, and there is also a group of languages that were specifically created for
the representation of ontologies. These languages, which we will call “traditional”
languages, are in a stable phase of development, and their syntax consists of plain
text where ontologies are specified.
Recently, many other languages have been developed in the context of the World
Wide Web: RDF, RDF Schema, SHOE, XOL, OML, OIL, DAML+OIL, and OWL. Their
syntax is based on XML, which has been widely adopted as a ‘standard’ language for
exchanging information on the web, except for SHOE, whose syntax is based on
HTML.
Among all these languages, RDF and RDF Schema cannot be considered to be
ontology specification languages per se, but rather general languages for the
description of metadata in the web. Most of these “markup” languages are still in a
development phase.
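The triple-based data model underlying RDF's metadata descriptions can be illustrated schematically (the URIs and property names below are illustrative; a real application would use an RDF library and established vocabularies such as Dublin Core):

```python
# Schematic sketch of RDF's data model: metadata about web resources as
# (subject, predicate, object) triples. URIs and properties are invented
# for the example.

triples = [
    ("http://example.org/hotel/42", "dc:title", "Grand Hotel"),
    ("http://example.org/hotel/42", "dc:language", "en"),
    ("http://example.org/hotel/42", "rdf:type", "ex:Hotel"),
]

def describe(subject):
    """Collect all metadata recorded about one resource."""
    return {p: o for s, p, o in triples if s == subject}

print(describe("http://example.org/hotel/42")["dc:title"])
```

This is why RDF is a general metadata description language rather than an ontology specification language: the triples say things about resources, while the vocabulary they use (classes, properties) must be defined elsewhere, e.g. in RDF Schema or OWL.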
There are many other languages that have also been considered in this survey. For
instance, some languages have been created for the specification of specific
ontologies, such as CycL and GRAIL. There are also some languages that were not
created specifically for the representation of ontologies and include additional
features that are unusual in ontologies, such as NKRL.
The most commonly used ontology development languages are the following:
There have been some research communities that have already tried to define
standard ontologies that cover a particular area of knowledge in a generic way and
that could thus be used in a standard way.
The CIDOC CRM is a core ontology explaining the extended meaning of data
structures from the humanities and cultural heritage, including the history of
science. It is the outcome of a long-term, disciplined knowledge engineering activity
which excels in its ontological commitment, i.e. the acceptance of its constructs by
domain experts.
The primary role of the CRM is to enable information exchange and integration
between heterogeneous sources of cultural heritage information (Doe, 03). It aims at
providing the semantic definitions and clarifications needed to transform disparate,
localized information sources into a coherent global resource within a larger
institution, in intranets or within the Internet. More concretely, it defines, and is
restricted to, the underlying semantics of database schemas and document structures
used in cultural heritage and museum documentation, in terms of a formal ontology.
The success of the CRM relies on the fact that the explanation of common meaning
can be achieved with a very small set of primitive concepts and relations, in
contrast to data structures that suggest to the user what to say about an object. The
relations in data structures that connect items directly by highly specific, diverse
kinds of relationships can frequently be expressed by data paths composed of a few
fundamental relationships defined within the core ontology.
The CIDOC CRM has become the most promising core element for realizing
semantic interoperability in archives, libraries and museums, through its capability
to link the intellectual structure of highly diverse sources and products of
scientific and scholarly discourse with the elements formally handled by information
systems.
The CIDOC CRM is the culmination of over 10 years work by the CIDOC
Documentation Standards Working Group and CIDOC CRM SIG (Special Interest
Group), which are working groups of CIDOC. Since 2006 it has been the official
standard ISO 21127.
FRBRoo
The FRBRoo is a formal ontology intended to capture and represent the underlying
semantics of bibliographic information and to facilitate the integration, mediation, and
interchange of bibliographic and museum information. The FRBR model was
originally designed as an entity-relationship model by a study group appointed by the
International Federation of Library Associations and Institutions (IFLA).
The CIDOC CRM model has been developed since 1996 under the auspices of the
ICOM-CIDOC (International Council of Museums – International Committee for
Documentation) Documentation Standards Working Group. The idea that both the
library and museum communities might benefit from harmonizing the two models was
first expressed in 2000 and matured over the following years. Eventually it led to
the formation, in 2003, of the International Working Group on FRBR/CIDOC CRM
Harmonisation, which brings together representatives from both communities with the
common goals of:
expressing the IFLA FRBR model with the concepts, tools, mechanisms, and
notation conventions provided by the CIDOC CRM, and
aligning (possibly even merging) the two object-oriented models with the aim
of contributing to the solution of the problem of semantic interoperability
between the documentation structures used for library and museum
information, such that:
o all equivalent information can be retrieved under the same notions;
o all directly and indirectly related information can be retrieved regardless
of its distribution over individual data sources;
o knowledge encoded for a specific application can be repurposed for
other studies;
o recall and precision in systems employed by both communities are
improved;
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 67
o both communities can learn from each other's concepts, for their mutual
progress and for the benefit of the scientific and scholarly communities and
the general public.
In 2006 a first draft of FRBRoo was completed. It is a logically rigorous model of the
conceptualizations expressed in FRBRer and of the concepts necessary to explain the
intended meaning of all FRBRer attributes and relationships. The model is formulated
as an extension of the CIDOC CRM. Any conflicts occurring in the harmonization
process with the CIDOC CRM have been or will be resolved on the CIDOC CRM side
as well. The Harmonization Group intends to continue its work by modelling the FRAR
concepts and elaborating the application of FRBR concepts to the performing arts.
HarmoNET
The Harmonisation Network for the Exchange of Travel and Tourism Information,
HarmoNET, is an international network bringing together people and organizations
with an interest in the topic of harmonization and seamless information exchange in
travel and tourism. HarmoNET provides unique technologies and services enabling
an easy, affordable and fast information exchange.
SUO
Recognizing both the need for large ontologies and the need for an open process
leading to a free, public standard, a diverse group of people has come together to
make such a standard a reality. The Standard Upper Ontology (SUO) will be an
upper level ontology that provides definitions for general-purpose terms and acts as a
foundation for more specific domain ontologies.
It is estimated to contain between 1000 and 2500 terms plus roughly ten definitional
statements for each term.
SOUPA
concepts. Due to the heterogeneity of the travel and tourism industry, it is a challenge
for a single ontology to cover the whole market offer, thus the ontology management
process would potentially be too complicated.
6.3.4 Recommendations
6.3.4.1 Short-term recommendations (1–3 years)
7 Data transformation
7.1 Structured data mapping
7.1.1 Needs and requirements
7.1.1.1 Introduction
7.1.1.2 Needs
Using information systems in the travel and tourism industry implies using information
coming from different data sources. An information system in this domain typically
works in cooperation with other systems, and information coming from various data
sources may be needed to provide a particular service to a client.
The approach to be taken requires the creation of a mapping description using some
kind of formal language that maintains the level of formality and expressivity of both
the ontology and the database. The resulting mapping document has to show the
correspondences between the components of the database's SQL schema and those
of the ontology. Afterwards, the ontology needs to be populated through the
mappings that have been made explicit in the document. The process ought to be as
automatic as possible in order to minimize human effort.
In order to do this, languages to define mappings are needed. These languages have
to have the following features:
The language ought to define how to create instances in the ontology in terms
of the data stored in the database.
The language needs to have a declarative nature, making it possible to
discover inconsistencies and ambiguities in the definition of a mapping. These
potential problems have to be discovered automatically by the mapping language.
The mapping definition language could potentially be used to automatically
characterize data sources to allow dynamic query distribution in intelligent
information integration approaches.
The mapping definition language does not have to declare the degree of
similarity between database elements and ontology components. Rather, it
has to state under which conditions and after which transformations the
database elements are equivalent to the ontology components.
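To make these features concrete, the following sketch shows a tiny declarative mapping in the spirit described above: each rule states how instances of an ontology concept are created from data stored in a database, under which condition the rows qualify, and which columns populate which properties. All table, column and concept names are invented for the illustration, and the rule format is a simplification inspired by languages such as R2O, not any real standard.

```python
import sqlite3

# Declarative mapping rules: each rule states which table and condition
# yield instances of which ontology concept, and which columns populate
# which ontology properties (all names here are illustrative only).
MAPPING_RULES = [
    {
        "concept": "Hotel",
        "table": "accommodation",
        "condition": "type = 'hotel'",            # only rows of this kind qualify
        "properties": {"hasName": "name",          # ontology property -> column
                       "hasStars": "category"},
    },
]

def populate(conn, rules):
    """Create ontology instances from the database via the mapping rules."""
    instances = []
    for rule in rules:
        cols = ", ".join(rule["properties"].values())
        cur = conn.execute(
            f"SELECT {cols} FROM {rule['table']} WHERE {rule['condition']}")
        for row in cur:
            inst = {"a": rule["concept"]}          # rdf:type, informally
            inst.update(zip(rule["properties"], row))
            instances.append(inst)
    return instances

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accommodation (name TEXT, category INT, type TEXT)")
conn.execute("INSERT INTO accommodation VALUES ('Alpenhof', 4, 'hotel')")
conn.execute("INSERT INTO accommodation VALUES ('Camp Rio', 2, 'campsite')")
print(populate(conn, MAPPING_RULES))
```

Because the rules are data rather than code, they can in principle be checked automatically for inconsistencies, which is precisely the benefit of the declarative nature required above.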
7.1.1.3 Requirements
Semantic conflicts occur whenever two contexts do not use the same interpretation of
the information. Goh identifies three main causes for semantic heterogeneity that
need to be overcome in order to achieve semantic interoperability [Goh, 1997]:
Confounding conflicts occur when information items seem to have the same
meaning, but differ in reality, e.g. owing to different temporal contexts.
Scaling conflicts occur when different reference systems are used to measure
a value. Examples are different currencies.
Naming conflicts occur when naming schemes of information differ
significantly. A frequent phenomenon is the presence of homonyms and
synonyms.
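As an illustration, the following sketch resolves two of the conflict types above (scaling and naming) by normalising records from different sources into a common context; the exchange rates and the synonym table are invented example values.

```python
# Illustrative resolution of scaling and naming conflicts when bringing
# (price, currency, room_type) records into a common context.
EUR_PER_UNIT = {"EUR": 1.0, "USD": 0.9}      # scaling: different currencies
SYNONYMS = {"double room": "twin room"}       # naming: synonymous terms

def normalise(record):
    """Return the record expressed in the common reference context."""
    price, currency, room = record
    price_eur = price * EUR_PER_UNIT[currency]    # resolve the scaling conflict
    room = SYNONYMS.get(room, room)               # resolve the naming conflict
    return (round(price_eur, 2), "EUR", room)

print(normalise((100, "USD", "double room")))
```

Confounding conflicts are deliberately absent from the sketch: they require contextual knowledge (e.g. the temporal validity of a value) that cannot be resolved by a simple lookup table.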
The use of ontologies for the explication of implicit and hidden knowledge is a
possible approach to overcome the problem of semantic heterogeneity. With respect
to the impact on the data exchange, structuring conflicts can be differentiated:
fully mappable: all clashes can be resolved without any loss of information;
partially or non-mappable: covering the structural conflicts for which any
conceivable transformation will cause a loss of information.
Examples of clashes between different standards have been identified in [Dell'Erba,
Fodor, Höpken, et al., 2005].
Most current approaches to solving the interoperability problem are based on
the idea of fixed, obligatory standards, which define all details of the exchanged
messages. An example of an international XML-based standard is the specification of
OTA [OTA]. Companies using such standards are automatically able to
exchange information with each other. However, all details of the exchanged
message must be agreed among all communication participants. The process of
defining and maintaining such standards requires a lot of effort, and therefore such
standards are almost exclusively used by large companies such as hotel chains,
airlines and Global Distribution Systems (GDS).
There are several approaches in the literature to address database-to-ontology
mapping. In general, they can be classified into two main categories: approaches
that create a new ontology from a database and approaches that map a database to
an already existing ontology. In either case two phases can be distinguished:
mapping definition, i.e. the definition of the mapping from the database
structure (schema) to the ontology structure, and
data migration, i.e. the migration of database content to instances of the ontology.
Most mappings have been defined ad hoc, i.e. for particular cases, and are
neither reusable nor extensible to other cases. Besides, should changes occur within
the databases, the whole mapping and even the ontology would have to be redefined
in order to cover new concepts and relations.
The literature review has shown a number of languages that have been used to map
databases to ontologies. However, there is no evidence of any language that links
(maps) ontology components to database elements.
Creating mappings still requires a lot of human intervention. Although
graphical interfaces have been created (as in the case of R2O), the mapping
work is in general still hand-intensive. This is due to the differing levels of formality
and expressivity with which information is represented in ontologies and stored in
databases. One possible way to automate the mapping creation process to a certain
degree could be to recommend building ontologies using existing standard
languages. This way ontologies could be compared, as they would have the same
degree of expressivity and formality.
7.1.4 Recommendations
7.1.4.1 Short-term recommendations (1–3 years)
Use semantic web technologies (e.g. based on RDF URIs) to name and
represent (data) resources on the Web so that mapping can be automatically
undertaken.
Agree on the degree of formality with which information ought to be defined,
so that automatic mapping tools compare the same kind of information.
Foster high-level general ontologies to describe particular domains of interest
so that lower-level, more concrete ontologies can later be linked or merged within
the (more general) structure (if and only if both ontologies are defined with the
same level of formality and with the same ontology definition language).
particular source of information: author, date, origin, content, type of file, etc. Within
the context of Semantic Web (as defined by Tim Berners-Lee) annotating document
content is proposed by using semantic information from domain ontologies [Berners-
Lee, 2001]. The result of (manually) annotating a Web information resource is Web
pages with machine-interpretable mark-up that provide the source material with which
agents, Semantic Web services and advanced search engines operate. The goal
is to create annotations with well-defined semantics.
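As a minimal illustration of such mark-up, the sketch below wraps a text fragment in an RDFa-style `typeof` attribute pointing to an ontology concept, turning a human-readable page fragment into a machine-interpretable one; the vocabulary URI and concept name are invented for the example.

```python
# Sketch: annotating a Web page fragment with machine-interpretable
# mark-up that types the fragment with an ontology concept.
def annotate(text, concept, vocab="http://example.org/tourism#"):
    """Wrap a text fragment in RDFa-style mark-up for the given concept."""
    return f'<span typeof="{vocab}{concept}">{text}</span>'

page = "Visit the " + annotate("Hofburg", "Museum") + " in Vienna."
print(page)
```

The human reader still sees "Visit the Hofburg in Vienna.", while an agent can now recognise "Hofburg" as an instance of the (hypothetical) Museum concept.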
The amount of tourism information on the Web is huge and the diversity of its nature
is also vast. Furthermore, recent studies have shown that decisions of tourists about
their potential destinations are increasingly influenced by multimedia and web-based
content and comments generated by other tourists. Besides, tourists have begun to
share their experiences on the web in the so-called Web 2.0 phenomenon, and a
tremendous number of web pages have been created by tourists and end users.
Even destination management organizations are beginning to include user-generated
content in their own web sites as a way to promote their destinations.
Travel and tourism is a leading industry in the application of B2C and B2B2C
eCommerce and mCommerce solutions as well as Web-based information channels,
and a huge number of tourism information systems have been developed to
support all the processes related to the electronic market. If the objective is to
automate eBusiness processes over the Web without human intervention, allowing
machines to interoperate automatically, then information sources must be
annotated so that a mediation ontology can integrate information coming from
heterogeneous systems.
Therefore, in order for the tourism industry to succeed, new ways of data and content
annotation have to be developed so that a particular piece of information can be used
by a particular machine for a particular business process, allowing a vertical data
integration approach to the tourism market.
Manual annotation tools allow users to manually create annotations, i.e. metadata
about a particular information source. These tools are in general terms relatively
similar to those used for pure textual annotations, but differ in the sense that they
provide some support for ontologies.
The following is a list of some of the most relevant annotation tools found in the
literature:
Amaya [Quint, 1994] is a Web browser and editor that marks up Web
documents in XML or HTML. The user can make annotations in the same tool
s/he uses for browsing. It facilitates manual annotation of Web pages
but does not support any automatic annotation;
The Annozilla browser aims to make all Amaya annotations readable in the
Mozilla browser;
The Mangrove system is another example of a manual but user-friendly
annotation tool [McDowell, 2003]. The annotation tool is a straightforward GUI
that allows users to associate a selection of tags with text that they highlight;
Due to the increase of multimedia content on the Web, tools to annotate this
kind of content have become very useful. Vannotea [Schroeter, 2003] can be
used to add metadata to MPEG-2 (video), JPEG(2000) image and Direct 3D
(mesh) files, with the mesh being used to define regions of images;
OntoMat Annotizer: this is a tool for making annotations which is built on the
principles of the CREAM framework. It has a Web browser to display the page
which is being annotated and provides some reasonably user friendly
functions for manual annotation, such as drag and drop creation of instances
and the ability to mark-up pages while they are being created;
The M-OntoMat-Annotizer [Bloehdorn, 2005] supports manual annotation of
image and video data by indexers with little multimedia experience by
automatic extraction of low level features that describe objects in the content.
A commercial version of OntoMat, called OntoAnnotate, is available from
Ontoprise;
SHOE Knowledge Annotator [Heflin, 2001] was an early system which allowed
users to mark-up HTML pages in SHOE guided by ontologies available locally
or via a URL. Users were assisted by being prompted for inputs. Unusually,
the SHOE Knowledge Annotator did not have a browser to display Web
pages, which could only be viewed as source code.
Other important challenges for the future in this active research area are automating
the annotation of information in various formats, addressing issues of trust and
security, and resolving problems of storage.
7.2.4 Recommendations
Due to the nature of this topic, there can be some overlapping of recommendations
with other issues that have already been covered, such as ontologies.
These scenarios are only a few examples of many. Actually, the amount of
information that is stored in this way probably vastly surpasses that in structured
sources. As Martin Hepp, Katharina Siorpaes, et al have analyzed, structured and
unstructured data complement each other in many cases, e.g. for hotels where web
sites frequently contain more complete descriptions of the hotel, while the GDSs only
publish the room availability.
Normally, however, the data on the web is unstructured and geared towards human
consumption only. Only rarely do metadata or formal resource descriptions reliably
complement and explicate this unstructured information to facilitate its use in
automated transactions or automated integration with structured resources. It seems
unlikely that this situation is going to improve fundamentally over the next years.
The unstructured nature of the data invariably limits its reuse in electronic
transactions. Based on this type of information it will be difficult at best to, e.g.,
automatically complement a hotel booking with the reservation of museum and
theatre tickets.
7.3.1.1 Needs
7.3.1.2 Requirements
In an ideal world, information extraction would structure free text in such a way that it
can be automatically analyzed, queried and integrated with structured data sources.
This is certainly illusory for the foreseeable future. Nevertheless, it is necessary to
explore the potential of the various facets of information extraction for the eTourism
domain.
automatically structured and usually imported into databases, XML files or other
structured storage formats for subsequent analysis and evaluation.
Currently the two branches of information extraction that have drawn most attention
in the research community are named entity recognition – the explication of
references to persons, organizations, places, etc. – and event extraction; the latter,
e.g., practiced in projects such as JRC’s EMM Violent Events Maps that are
automatically compiled from published news feeds. Both are pertinent to eTourism.
Furthermore, some research has been done on information extraction specific to
eTourism.
Named entity recognition is by now a rather well understood topic with wide
applications both across many fields – computational linguistics, computational
philology and related disciplines, even genetics – and across many languages.
Approaches for name taggers often build either on hand-crafted rules – good
classifiers can reach a precision well above 90 % for English language material (cf.
Grishman, 2003, note 3) – or machine learning technologies including automated
learning and statistical model building. Both maximum entropy [Borthwick, 1999] and
Hidden Markov [Bikel, Miller, Schwartz, Weischedel, 1997] models have been trained
using tagged reference materials. The models have then been successfully applied
to untrained material, reaching again precision levels above 90 % for new material.
Various readily available tools implement named entity recognition. The ANNIE
package of the open-source GATE suite contains resources such as tokenizers,
gazetteers and semantic taggers to build rule-based named entity resolvers. Many
other open-source and commercial offerings are listed at
http://en.wikipedia.org/wiki/Named_entity_recognition.
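The gazetteer-driven approach used by packages such as ANNIE can be illustrated with a toy rule-based tagger; the gazetteer entries below are invented examples, and a real system would add tokenization, context rules and disambiguation on top of the simple lookup shown here.

```python
import re

# A toy gazetteer-based named entity tagger in the spirit of rule-based
# systems such as GATE/ANNIE (the gazetteer entries are illustrative).
GAZETTEER = {
    "Innsbruck": "LOCATION",
    "Lufthansa": "ORGANIZATION",
}

def tag_entities(text):
    """Return (surface form, entity type, character offset) per gazetteer hit."""
    hits = []
    for name, etype in GAZETTEER.items():
        for m in re.finditer(re.escape(name), text):
            hits.append((name, etype, m.start()))
    return sorted(hits, key=lambda h: h[2])

print(tag_entities("Lufthansa flies daily to Innsbruck."))
```

Even this naive lookup illustrates why gazetteers alone are insufficient: without context rules, ambiguous names (a person called "Paris", the city "Paris") cannot be distinguished.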
Whereas named entity recognition is a rather well understood topic, event extraction
is somewhat more experimental and by necessity more closely bound to the type of
events that are supposed to be extracted. A given event type is usually captured
according to a given template – essentially a database table or a set of formal
assertions – whose valencies are filled from entities that are isolated in the free text.
As a rule named entity recognition is a part of this explication process as named
entities frequently occur in the description of events.
To illustrate this situation, a typical example of an event description from the Wall
Street Journal of 1993-02-19 may help. This example is taken directly from GATE
Information Extraction:
New York Times Co. named Russell T. Lewis, 45, president and general
manager of its flagship New York Times newspaper, responsible for all
business-side activities. He was executive vice president and deputy general
manager. He succeeds Lance R. Primis, who in September was named
president and chief operating officer of the parent.
Ideally event extraction might automatically capture the series of events implied in
this article according to a job-related template with fields such as organization, job
title, newly appointed person, and previous job holder. In reality this is often highly
non-trivial, as exemplified by the number of anaphoric references (“he”, “who” and
“the parent”), the need for inference (Primis obviously was the previous job holder
and has now been promoted) and the amount of encyclopaedic knowledge (New
York Time Co. is the holding for the newspaper) needed for interpreting even this
short and seemingly simple news bulletin.
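A minimal sketch of template-based event extraction applied to the announcement quoted above: a hand-crafted pattern fills a job-appointment template with organization, person, age and job title. The pattern is tailored to exactly this sentence shape, and the anaphoric references and the previous job holder are deliberately left uncaptured, since resolving them requires inference beyond pattern matching.

```python
import re

# Template-based extraction for one event type: a job appointment.
# The pattern is hand-crafted for this sentence shape only.
TEXT = ("New York Times Co. named Russell T. Lewis, 45, president and "
        "general manager of its flagship New York Times newspaper, "
        "responsible for all business-side activities.")

PATTERN = re.compile(
    r"(?P<organization>.+?Co\.) named (?P<person>[^,]+), (?P<age>\d+), "
    r"(?P<job_title>[^,]+)")

def extract_appointment(text):
    """Fill the job-appointment template from free text, if the pattern matches."""
    m = PATTERN.search(text)
    return m.groupdict() if m else None

event = extract_appointment(TEXT)
# Note what is NOT captured: "he", "who", "the parent" (anaphora) and the
# previous job holder, which require inference and encyclopaedic knowledge.
```

This is why results improve for source material with recurrent patterns: the more regular the phrasing, the further simple templates carry before inference is needed.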
Unsurprisingly results tend to be better if the source material already follows some
recurrent pattern, as is the case, e.g., for many job postings or medical records, but
also, interestingly, for news articles on violent events such as bombings or
earthquakes.
The number of readily available tools for event extraction is smaller than that for
named entity extraction, and they need to be heavily tailored for any given type of
event extraction and template. One example for such a tool is the open source GATE
Information Extraction package. Commercial offerings include the OpenCalais suite
of web services.
Information extraction for tourism-specific data necessarily has to deal with a number
of different types of events such as performances, sports events, entries from event
calendars, etc. Each of these can have its own display rules and needs its own
templates. Furthermore, pertinent data is regularly spread across many sources in
many different languages, so extraction systems must support parsing many
languages. Ideally, the extracted information should then be stored in
language-independent templates based on language-neutral concept hierarchies.
For end-user consumption the templates must be rendered in various languages,
ideally fully automatically. The FP4 project on Multilingual Information Extraction for
Tourism and Travel Assistance (MIETTA, 1998–2000) already worked precisely on
these issues. Xu, Netter and Stenzhorn
elaborate on two event types, adult education courses and theatre performances and
describe the MIETTA system developed in the project. Sadly, they do not publish any
data on the reliability of the system by testing the extracted information against
manually captured data, as would have been customary. Such data would obviously
be a precondition for gauging the viability of the project's approach.
For eTourism, named entity recognition is key to linking extracted information with
given locations or organizations such as hotels, theatres, or other relevant players.
For this purpose one needs agreement on a suitable model to unambiguously link
the names of organizations against a suitable vocabulary of organizational units in
the eTourism domain, possibly based on the 29 types proposed in "Annotation
guidelines for answer types" [Brunstein, 2002]. These findings need to be validated
against sample data to test the level of granularity and to ensure sufficient precision
in the tagging.
Event extraction is still a research area, though, as we have seen, first applications
are operational, e.g. in the news arena. Standardization in this area would therefore
be premature.
Event extraction for eTourism is still very much an area of research. In particular it
misses performance tests that would allow an informed decision on the precision that
current systems can reach. Given the great potential that information extraction can
have for the domain, it would be highly desirable to have such data.
7.3.4 Recommendations
7.3.4.1 Short-term recommendations (1–3 years)
The mapping between an integrated global ontology and local ontologies may
support enterprise knowledge management and data or information integration. In
the Semantic Web an integrated global ontology extracts information from the local
ones and provides a unified view through which users can query different local
ontologies. In an information integration system a mediated schema is constructed
for user queries. Mappings are used to describe the relationship between the
mediated schema, i.e. an integrated global ontology and local schemas.
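The role of such mappings can be sketched as follows: a query posed against the global (mediated) schema is rewritten for each local schema via a mapping table, and the answers are returned in the global vocabulary. All schema and attribute names are invented for the illustration.

```python
# Sketch of a mediated schema: the mapping table relates global attribute
# names to each local source's own attribute names (all names invented).
GLOBAL_TO_LOCAL = {
    "hotels_a": {"name": "hotel_name", "city": "town"},
    "hotels_b": {"name": "label",      "city": "location"},
}

SOURCES = {
    "hotels_a": [{"hotel_name": "Alpenhof", "town": "Innsbruck"}],
    "hotels_b": [{"label": "Rio", "location": "Lisbon"}],
}

def query(global_attrs):
    """Answer a global-schema query by translating it for every local source."""
    results = []
    for source, mapping in GLOBAL_TO_LOCAL.items():
        for row in SOURCES[source]:
            # rewrite each global attribute to the source's local attribute
            results.append({g: row[mapping[g]] for g in global_attrs})
    return results

print(query(["name", "city"]))
```

The user sees one unified view ("name", "city") and never needs to know that one source calls the attribute `hotel_name` and the other `label`.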
7.4.1.2 Needs
There may be different airlines flying to the same destinations from the same origins,
and that information has to be shown to the final user so that she can decide on the
most convenient way to travel.
Tasks on distributed and heterogeneous systems demand support from more than
one ontology. Multiple ontologies need to be accessed from different systems. In
addition, the distributed nature of ontology development has led to dissimilar
ontologies for the same or overlapping domains. Therefore, various parties with
different ontologies do not fully understand each other and consequently cannot
work together, which prevents electronic transactions. To solve these problems it is
necessary to use ontology mapping to achieve interoperability among information
sources and to enable effective and efficient business transactions over the
Internet.
7.4.1.3 Requirements
Information sharing and integration must not only provide full accessibility to data;
it also ought to make that data fully processable and interpretable by machines.
One possible way to achieve effective heterogeneous information integration is to
create links among already existing ontologies. There are different ways to map
ontologies: from an integrated global ontology to local ontologies, between local
ontologies, and ontology mapping in the course of ontology merging and alignment.
With the growing use of ontologies in different domains of interest, the problem of
overlapping knowledge in a common domain becomes critical. The complexity of the
travel and tourism industry could by no means be represented by a single ontology,
thus multiple ontologies would have to be accessed from various applications.
Inter-ontology mapping could very well provide a common layer through which
several ontologies could be accessed and hence exchange information in a
semantically sound manner.
Many information integration systems use more than one ontology to describe the
information. The problem of mapping different ontologies is well known in knowledge
engineering, and several general approaches are used in information integration
systems.
Although reasonable results have been achieved on the technical side of using
ontologies for intelligent information integration, the use of inter-ontology mapping is
still an exception. Reviewing the literature, it seems that most of the mappings have
been realised ad-hoc, i.e. for the particular purpose of the mapping itself, especially
for the connection of different ontologies. There are approaches that try to provide
well-founded mappings, but they either rely on assumptions that cannot always be
guaranteed or they face technical problems. There is a need to undertake research
on mapping methodologies for general purposes.
Most systems only provide tools to develop ontologies, and they fail to indicate a
particular methodology to develop them. The comparison of different approaches
indicates that requirements concerning ontology language and structure depend on
the kind of information to be integrated and the intended use of the ontology. There is
a need to develop a more general methodology that includes an analysis of the
integration task and supports the process of defining the role of ontologies with
respect to these requirements.
7.4.4 Recommendations
Recommendations within this section are by nature very similar to the
recommendations proposed within chapter 6.3 (“Ontologies”).
8 Process handling
8.1 Needs and requirements
8.1.1 Introduction
Consumers in the tourism industry are getting more and more used to making online
transactions, and the industry is competing with services to attract these customers
and get them to the actual booking act as fast as possible. Traditional distribution
channels are vanishing, and more flexible and dynamic networks are arising. This very
dynamic development puts pressure on service providers: Business actors have to
follow demand to keep or expand their market share, otherwise they might get
crowded out.
These challenges require skills in marketing but most of all in deploying modern
information technology to manage the actual buying or booking process. This
process and other processes in the domain alike usually require the participation of
different players along the value chain to be fulfilled, making it necessary to interact
easily with other computer systems on a process level. But the management of
business processes is already difficult within one organization, making it a much
more sophisticated challenge in a network of organizations.
As already outlined in the introduction to this topic, in the context of information and
communication technology we consider a process to consist of data, being defined as
inputs and outputs, and of its execution, being a “work activity” or step. The problem
of data heterogeneity across different systems is part of the chapter on semantics,
while we want to discuss the dynamic aspect of executing processes by involving
heterogeneous computer systems in this chapter.
In fact even the one-time exchange of data is already a simple process, which implies
that data cannot be exchanged without having some kind of processes being
involved. Since this is already true for web sites being “crawled” to get information,
we do not want to consider passive process participation in our discussion of the
matter. Instead, we consider a rather complex interplay of at least two participants.
This has always been a problematic issue, being a more critical challenge compared
to mere exchange of data. This issue becomes even more pressing within a highly
networked, dynamic and diverse environment like the tourism industry today. The
introduction of standards is of course an elegant way to address interoperability
issues, but we know from the past that it is difficult to find industry-wide acceptance.
One reason is the loss of flexibility accompanying standards,
another one the game of market forces.
Since we leave the problem of data mediation to the chapter on semantics, and since
we consider complex processes, we have named this chapter "Process handling";
we consider it the dynamic component of process interoperability, requiring the
active participation of all actors involved.
8.1.2 Needs
In this chapter the basic and principal needs for process interoperability are
analysed and discussed, while requirements are outlined in the following chapter.
The challenge is to find ways of achieving process interoperability between
heterogeneous systems that allow an easy integration of business processes while
preserving the autonomy and diversity of the different players, which is needed to
match the diversity of requirements on a global scale. The following discussion does
not touch on business issues such as pricing, virtual or ad-hoc organizational forms,
dynamic packaging, legal aspects, etc. The intention is also not to design platforms
for these issues; it is merely about discussing and recommending one or several
ways to allow process interoperability.
These three steps are very basic and might have some backward loops (e.g. if the
room is not available) or sub-steps (the check for availability might include a
temporary blocking of the room for the specific date). But most of all, parts of the
process might run on other systems. Imagine that the booking is done on a portal
comprising a number of different hotel chains. The checking of availability is done on
a hotel chain’s computer system and the check for approval of payment by credit
card (a prerequisite for making a reservation) is done on a third system.
This short use case illustrates that a business process can be broken down into
different steps (or sub-processes), which might need the interaction of different
systems. The entire process could be drawn by using a flow chart showing the
different steps and their dependencies. In any case the completion of the entire
business process requires the handling of the steps and conditions. For example, if
the room has been reserved in a first step and in a later step the credit card is not
accepted, then the reservation of the room must be cancelled. Or it is cancelled
automatically after some time if there is no confirmation which is required to complete
the booking. This is up to the design of the process on the hotel's side. However, the
portal, as owner of the entire business process, might need to deal with as many
different systems as there are hotel chains represented on the portal. And each of the
systems might have different naming for reserving a room (booking, reservation,
locking, etc.) and different conditions (requires confirmation to complete the
reservation, cancels reservation automatically after some time without confirmation,
keeps reservation alive until status is changed, etc.).
Although the portal requires just one step to be done on another system, it might
have to deal with 100 different ways of handling this step if 100 hotels are
involved. And each hotel might have to deal with 100 booking systems if it does
business with 100 portals. Thus each actor might have to implement 100 interfaces to
be interoperable with the required other systems. Obviously, this dramatically
increases the effort needed to run processes automatically with other partners.
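The n-to-m interface problem described above, and the effect of agreeing on a shared canonical interface, can be sketched as follows; the operation and system names are invented for the illustration.

```python
# Sketch: with a shared canonical interface, each party implements one
# adapter instead of one interface per partner (all names invented).
class CanonicalBooking:
    """The single interface every portal talks to."""
    def reserve(self, room, date):
        raise NotImplementedError

class ChainAAdapter(CanonicalBooking):
    def reserve(self, room, date):        # chain A internally calls it "lock"
        return f"lock({room},{date})"

class ChainBAdapter(CanonicalBooking):
    def reserve(self, room, date):        # chain B "holds" and needs a confirm
        return f"hold({room},{date})+confirm"

def interfaces_needed(portals, hotels, canonical):
    """Pairwise integration needs portals*hotels interfaces; a shared
    canonical interface needs only one adapter per participant."""
    return portals + hotels if canonical else portals * hotels

print(interfaces_needed(100, 100, canonical=False))  # pairwise integration
print(interfaces_needed(100, 100, canonical=True))   # shared interface
```

For the 100-portal, 100-hotel scenario from the text, this is the difference between 10 000 pairwise interfaces and 200 adapters.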
Since this simple use case is very common and the industry depends more and
more on the interaction of different computer systems, we can assume a
strong need for a solution that decreases the complexity for process interoperability
in a networked environment. Typical business processes in the tourism industry are:
searching,
selling and buying,
reservation,
booking,
modification,
cancellation,
confirmation,
notification,
payment and other money transfers.
This list might not be complete and could provoke a lot of discussion (e.g. on the
difference between buying and booking, or between confirmation and notification).
However, it is only meant to exemplify the range of possibilities under discussion.
To perform all these processes in a networked environment we can assume that
“the basic industry need is an applicable concept for the technical interaction
of heterogeneous ICT systems to provoke and run complete business process
cycles involving at least two different technical systems.”
“Applicable concept” shall express the need for something that is useful in daily
business life.
90 – CEN/ISSS WS/eTOUR – CWA – 2009-06-03
“Complete business process cycle” means that a business process, wherever it starts
or ends, should be carried out completely as defined.
“Involving at least two different technical systems” shall define that the topic can have
a bi-directional setup, but in any case has to be flexible enough to run in a
network of different technical systems, thus more than two and up to an unspecified
number.
The design and management of business processes is a subject in its own right, but for
the following discussion it is enough to say that business processes can be broken
down into a number of steps. Each step needs a trigger to initiate it, has some
conditions that must hold before it starts, and delivers an output, including information
required for the performance of the overall process (e.g. a trigger for another step).
Furthermore, we assume that a step can only run on one system. If it requires two
systems, the step must be broken down into several steps. This assumption is
reasonable because, regardless of whether running a step on two systems is technically
feasible, a single actor must have authority over each step. Otherwise two or more
actors would be responsible for the same step, which is obviously not feasible in
practice.
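The notion of a step, with its trigger, start conditions, single owning system and output, can be sketched as a minimal data structure (all names and fields here are illustrative assumptions, not part of this CWA):

```python
from dataclasses import dataclass, field

@dataclass
class ProcessStep:
    name: str        # e.g. "reserve_room"
    system: str      # the single system with authority over this step
    trigger: str     # event that initiates the step
    conditions: list = field(default_factory=list)  # preconditions

    def run(self, payload: dict) -> dict:
        # A real step would call the owning system; here we just
        # echo an output carrying the trigger for the next step.
        return {"step": self.name, "system": self.system, "input": payload,
                "next_trigger": f"{self.name}_completed"}

step = ProcessStep("reserve_room", "hotel_crs", "booking_requested",
                   ["room_available"])
out = step.run({"room": "double", "nights": 2})
print(out["next_trigger"])  # reserve_room_completed
```

Note how the output deliberately includes a trigger, so that steps can be chained into a complete business process cycle across systems.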
8.1.3 Requirements
The chapter about requirements casts the industry needs, as described above, into a
more structured and operational form.
Additionally, there are also more flexible initiatives that give a framework within which
players can adapt according to their needs, based on a non-coercive language that
allows common basic elements to be expressed in a similar way for all players while at
the same time allowing those elements to be combined in different ways so as to permit
diversity. The cost of implementation can be reduced compared to a full standard, since
players may publish different levels of services. The use of templates also allows a
certain flexibility in the format of responses according to requesters. A drawback
stems from the fact that integrating different players may require certain adaptations
due to commercial or system-driven specificities. On the other hand, this fact allows
competition and diversity. An example of such a language is the XFT (Exchange For
Travel) language.
gateway for external systems where the corresponding partners do not care about
what happens behind the partner’s gate.
This is feasible with a central player, but not in open and dynamic networks, since an
interface has to be developed for each player. This drastically increases the complexity
and cost of implementation. However, Application Integration and APIs are better
suited to handling different processes and are more responsive to the specific
requirements of the systems involved.
The following table helps to highlight how well the current state of the art as described
above meets the needs and requirements identified:
The different entries might well be questionable and can raise discussions, but in
general they reflect well the current situation: Standards and Application Integration
are not fully suitable for a highly networked and dynamic environment like the tourism
industry today. They result in a loss of autonomy, need some central entity or
controlling power, and are expensive.
For each process run across different systems, the interface needs to be specified,
developed and maintained separately, since the systems do not all make use of the
same standards or interfaces. If a new version of a standard or interface is published, it
cannot be used automatically; it needs to be deployed and maintained manually. It is
obvious that a more flexible solution with a mediating technology meets the
requirements better than rather rigid technologies do.
These projects address the need for a more flexible and cost efficient way to align
business processes between different systems. A promising way is the concept of
Semantic Business Process Management, resulting from the application of Semantic
Web Services to Business Process Management, as for example discussed by Hepp
et al [2005]. Based on this concept, Cimpian, Mocan [2005] proposed a process
mediator, adjusting the bi-directional flow of messages based on the Web Service
Modeling Ontology (WSMO). This approach is similar to that chosen by the
Harmonise project (http://www.harmonet.org/) for data mediation, in which a
technology for mediating between heterogeneous data sources was developed. The
Harmonise technology allows involved parties to exchange information without
changing the local data structure, only by referring to a common understanding of a
domain-specific ontological concept, the Harmonise Ontology [Fodor, Werthner,
2005].
8.4 Recommendations
8.4.1 Short-term recommendations (1–3 years)
Simplify and rationalize existing processes – use stateless process handling or
request-response-pairs only.
Build an ontology of common processes in the tourism industry.
9 Metasearch
9.1 Methodology
9.1.1 Needs and requirements
9.1.1.1 Introduction
Metasearch is the ability to run one search process over the search engines of different
heterogeneous instances (platforms, websites, databases) and aggregate the results in a
unified list. In the tourism industry, metasearch engines are typically used to compile
and compare specific offers. Examples are:
Checkfelix: http://www.checkfelix.at/,
Kayak: http://www.kayak.com/,
Farechase: http://farechase.yahoo.com/,
Trabber: http://www.trabber.com/,
Kelkoo Travel: http://travel.kelkoo.co.uk/, and
Minube: http://www.minube.com/.
Typically, search results are not stored in a database, but delivered as real-time
results. However, some systems make use of data replication for static data (like,
e.g., hotel descriptions).
An acceptable response time is of high importance for search engines to meet users'
expectations. Metasearch engines depend on the response times of the other search
engines and have to employ clever algorithms to avoid deadlocks. However, this is less
a matter of information and process interoperability, except that runtime performance
in aggregating data might be improved.
Data can be accessed either by getting it automatically from the web interface (user
interface) or via a data interface (e.g. web services). Semantic annotation of content
and semantic mapping, which let the metasearch engine find the information it
requires, provide part of the answer and are detailed in the corresponding chapters of
this study. However, some issues remain, for instance regarding data encapsulated in
purely graphical applications such as Flash applications, where the data is not
accessible at all. Possible solutions come from new technology trends such as Flex
applications, where the whole application is XML-based. Other issues stem from
client-side calculations (e.g. options that depend on different settings, or prices
calculated on the fly on the client side). In that case the data is hard-coded directly in
the application, and accessing it would require interpreting the code and the
corresponding rules.
The search on another system is often tailored to the particularities of the foreign
system. If these interfaces do not follow a given pattern or standard, they have to be
updated each time the other system changes in order to keep the service level.
Web crawlers (synonyms: robots, bots, spiders) are software scripts and programs
that browse the World Wide Web in an automated manner to create copies of
websites (which are processed by other software agents later) or to gather specific
information. They are used by search engines but are typically not used for
metasearch processes, since they normally only gather information and do not run
processes on other websites.
HTTP requests can be used to run automated search queries on existing search
engines by rebuilding the HTTP request used on each of the external sites. This
approach is very maintenance-intensive, since every little change in the HTTP
requests requires an update of the process. Depending on the external site, data is
sent back in an unstructured or structured manner and needs to be processed to bring
it into the scheme (semantics) the metasearch engine uses for displaying results.
Depending on how results are provided on the external systems, HTTP requests can
be a lightweight, yet still maintenance-intensive, way of running a metasearch
process.
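Rebuilding an external site's HTTP search request essentially means reproducing its query-string parameters; a minimal standard-library sketch (the endpoint and parameter names are invented for illustration):

```python
from urllib.parse import urlencode

def build_search_request(base_url: str, params: dict) -> str:
    # Rebuild the GET request the external site would issue itself.
    # If the site renames even one parameter, this breaks -- the
    # maintenance burden described above.
    return f"{base_url}?{urlencode(sorted(params.items()))}"

url = build_search_request(
    "http://example.org/search",  # hypothetical endpoint
    {"dest": "Rome", "checkin": "2009-10-20", "checkout": "2009-10-22"},
)
print(url)
```

A real metasearch process would then fetch this URL and parse the (structured or unstructured) response into the engine's own result scheme.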
Since advanced tools can run different operations, website wrappers are well suited
for metasearch. Still, they need considerable maintenance effort, since a change in the
wrapped website requires an update of the wrapper.
APIs are the classic way for interactions between different computer systems and
allow a broad range of possibilities. They can be independent of the programming
language and are therefore open to any kind of integration and information
exchange. It has to be mentioned that implementing an API typically requires
considerable effort, and there is no general standard for APIs, since APIs vary
significantly with the purpose and the domain concerned.
A web service is a software application enabling the exchange of data in XML format
to allow machine-to-machine (M2M) interaction on different platforms. Different from
HTTP requests and web crawlers, the provision of a web service has to be
implemented by the service provider, who is identified by listings in registries (UDDI).
A similar approach is REST (or RESTful web services), which allows the exchange of
domain-specific data over HTTP without an additional messaging layer as in web
services. It is often described as a simpler form of web services.
9.1.2.8 Summary
The first group comprises methods where the agent providing a metasearch engine
can integrate other search engines without any assistance from the search engines
used (web crawler, HTTP requests, website wrapper). The metasearch agent is thus
more independent of the other systems. These methods are therefore more flexible,
but cause considerable effort for the implementation and maintenance of a
metasearch service. However, they would obviously cause less effort if standards
were supported or interoperability problems were solved.
The second group comprises methods where the assistance of the external search
engine is required, where some kind of interface is provided, or where other changes
are necessary. Clearly, these methods make it easier to implement and maintain a
metasearch service, but they require the application of standards or the solution of
interoperability issues to run smoothly.
The methods for metasearch described above provide useful tools to integrate
different search engines, but the quality of results, response time, access, and
maintenance effort depend very much on the use of standards or on the ability to
understand the other system in some other way. Especially the combination of website
wrappers and semantic annotations of websites seems a promising way to enable
improved metasearch functionality. The deployment of metasearch engines could be
One important direction metasearch engines can take is that of semantics. Semantic
search engines are becoming increasingly popular. These are systems that need to
understand (the meaning of) both what the user is asking for and the information
stored on the web. A semantics-based query recognizes the key words used to carry
out a search and uses that same information to display more precise results. The final
and main objective of this search technique is to find all documents on the web that
contain the information most relevant to the query (rather than only those that
syntactically match the search keywords), minimizing the number of false results.
Another big challenge for metasearch engines, after understanding the content, is
system performance. All methods have to fetch data from different systems, transform
it and display it in an appropriate manner (ranking, paging, etc.). The more sources
the system queries, the more benefit it offers to the user, but the slower the system
becomes.
9.1.4 Recommendations
9.1.4.1 Short-term recommendations (1–3 years)
9.2 Querying
9.2.1 Needs and requirements
9.2.1.1 Introduction
More often than not the information involved in eTourism transactions is distributed
across a number of different data stores, usually operated by different companies:
various GDSs (potentially in their respective national incarnations), CRSs, other
sources, not to forget the plethora of unstructured data such as the web. As
discussed throughout this section, we often need to find information in and across
many of these data sources and, indeed, often need to find the data sources
themselves first.
List all hotels in Rome with at least three stars that have availabilities between
October 20th and 22nd.
List all prices for flights to Rome that fly in on the morning of 20th and return
on the evening of the 22nd.
In many cases queries could be much more complex still and be combined with
constraints based on geographical data (hotels not more than 500 metres from the
Spanish Steps), price ranges (not more than EUR 100 per night), etc. In many cases
subsequent queries will build on the existing result sets of simpler queries and further
refine them in a piecemeal manner.
Looking more closely at human search behaviour, we can observe that users do not
search for hotels “not more than 500 metres from the Spanish Steps” but rather for
hotels “close to” or “near” the Spanish Steps. However, a hotel “near” Rome might
imply a different distance than a hotel “near” the Spanish Steps. The translation of
human search needs or peculiarities into a respective machine-readable search
query covers aspects of interoperability we are not going to cover in this chapter,
which focuses on machine-machine interoperability. Nevertheless, natural language
processing and the transformation into search queries remain important aspects and
challenges in querying.
Queries along these lines are typical parts of the selection phase in eTourism
transactions. A given transaction will often involve a considerable number of queries
as the customer or her agents narrow down their result set to a small number of hits
that fit the demands. Queries therefore must be fast and return results within at most
a few seconds. Nevertheless, we can observe metasearch engines on the market today
taking minutes rather than seconds to run real-time queries on external systems.
100 % correctness of query results is highly desirable at this stage, but not absolutely
necessary. The ultimate corroboration or falsification of query results can follow in the
booking phase, when a non-binding service offer is turned into a binding contract
between supplier and customer.
The tourism industry is a highly dynamic environment, and data stores and search
engines appear, and also disappear, almost continuously. Content aggregation and
syndication become indispensable tasks for the provision of one-stop platforms.
Ideally, the integration of new data stores into a user- or travel-agency-facing
metasearch engine should be largely transparent, easy and thus cost-efficient.
Technically, the search query entered into current metasearch engines has to be
translated to other data stores for further processing. This can either be done by
At present, option 1 dominates. For federated queries, individual data stores today
offer their own query strategies. These strategies often reflect their historic evolution
and their specific internal processes. This makes querying one of the biggest
challenges for metasearch, since queries cannot easily be translated from one system
to another. The integration of each new data store means considerable custom
programming, making it a costly and time-consuming enterprise.
“Query by example” was developed by IBM in the 1970s in parallel to what was to
become SQL (cf. Ramakrishnan, Gehrke, 2002, chapter 6). The user supplies
example result sets, which can formulate constraints or other selection criteria in
addition to typical string values. Examples can often be built through graphical user
interfaces.
When looking for a hotel, for example, the client would specify basic hotel
characteristics (e.g. name, category, etc.) and room category, and the system would
on that basis return a suitable set of hotels [Höpken, 2004]. In this way, query by
example partially relieves users of the need to learn formalized query languages and
instead allows them to find entries related to known samples. However, it needs
clear templates for the types of examples that can be constructed and used as the
basis for cross-data-store queries.
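The template matching behind query by example can be pictured as follows (a simplified sketch with invented field names; omitted fields act as wildcards):

```python
def query_by_example(records, example):
    # A record matches when every field given in the example
    # has the same value; fields absent from the example match anything.
    return [r for r in records
            if all(r.get(k) == v for k, v in example.items())]

hotels = [
    {"name": "Hotel Roma", "category": 3, "city": "Rome"},
    {"name": "Grand Palace", "category": 5, "city": "Rome"},
    {"name": "Casa Verde", "category": 3, "city": "Milan"},
]
print(query_by_example(hotels, {"city": "Rome", "category": 3}))
# matches only the "Hotel Roma" record
```

The limitation noted above is visible here: a template can only express equality on fields, not complex constraints such as ranges or joins.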
Standardized query languages usually expect users – which in this case will normally
be system integrators rather than end-users – to learn a specialized language for
querying a system. The best known of these is certainly the Structured Query
Language (SQL) for relational databases, which is used by all current relational
database management systems. At least the core features of ISO/IEC 9075, the
international standard specifying SQL, are implemented by virtually all suppliers of
relational database management systems.
SQL does not lend itself particularly well to federated queries and is normally used to
consult a given database instance. SQL-like syntax is used, however, for drill-down
searches in federated registries such as the ebXML Registry Specification. Likewise,
the SQL syntax has heavily influenced the syntax of a number of other non-relational
query languages such as the Object Query Language (OQL), Simple Protocol and
RDF Query Language (SPARQL), and aspects of the Topic Map Query Language
(TMQL). In the following we shall look at one new query language especially for
(potentially federated) semantic queries.
SPARQL: The Query Language for RDF (SPARQL) is, as the name suggests, a
language for querying RDF triples. This relatively new W3C Recommendation was
only published in January 2008, but can already point to a considerable
implementation base. It can be used to query against a considerable number of
commercial and non-commercial native triple stores – for a non-exhaustive list cf.
http://esw.w3.org/topic/SparqlImplementations –, but also against adaptors such as
D2R Server that sit on top of relational databases. This flexibility has encouraged the
growth of a number of publicly available SPARQL endpoints, some of which are listed
on http://esw.w3.org/topic/SparqlEndpoints.
SPARQL can honour the transitivity properties defined in RDF-S and OWL
ontologies.
SELECT ?resource
WHERE {
?resource dc:creator <http://www4.wiwiss.fu-berlin.de/gutendata/resource/people/Abbott_Eleanor_Hallowell_1872-1958>
}
which would list all resources created by Eleanor Hallowell Abbott available in a given
triple store, or
SELECT ?title
WHERE {
?book dc:language "en" .
?book dc:title ?title
} ORDER BY ?title
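The basic-graph-pattern semantics of this second query, selecting English-language titles, can be mimicked over an in-memory set of triples (a didactic sketch only, not an RDF or SPARQL implementation; data values are invented):

```python
def match_pattern(triples, pattern, binding=None):
    # Pattern items starting with "?" are variables; yield all
    # variable bindings for which the pattern occurs in the data.
    binding = binding or {}
    for s, p, o in triples:
        b = dict(binding)
        ok = True
        for var, val in zip(pattern, (s, p, o)):
            if var.startswith("?"):
                if b.setdefault(var, val) != val:
                    ok = False
                    break
            elif var != val:
                ok = False
                break
        if ok:
            yield b

triples = {
    ("book1", "dc:language", "en"),
    ("book1", "dc:title", "A Study in Scarlet"),
    ("book2", "dc:language", "de"),
    ("book2", "dc:title", "Der Prozess"),
}

# Join the two triple patterns of the WHERE clause, then ORDER BY ?title.
titles = sorted(
    b2["?title"]
    for b1 in match_pattern(triples, ("?book", "dc:language", "en"))
    for b2 in match_pattern(triples, ("?book", "dc:title", "?title"), b1)
)
print(titles)  # ['A Study in Scarlet']
```

The nested loop is the join: bindings produced by the first pattern constrain the second, which is exactly how a SPARQL engine combines the two clauses.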
Unlike SQL, SPARQL can be used for distributed queries and the aggregation of data
across data stores [Schenk, Staab, 2008], [Haase, Wang, 2007], [Quilitz, Leser,
2008]. Vocabularies can be cross-referenced and cross-queried across data stores.
However, where divergent ontologies are exposed by participating data stores,
suitable mappings, e.g. to a reference ontology, must exist for a distributed query to
succeed. Such a search strategy could in principle scale to manually annotated data
sources such as web pages annotated with RDFa.
That said, at present few, if any, examples of distributed SPARQL queries across a
number of nodes operated by different organizations are known to be used in a
production environment, even less so queries including many individual web pages
(though some commercial products such as Allegrograph
(http://agraph.franz.com/allegrograph/) support elaborate and largely transparent
federated SPARQL queries and reasoning across distributed instances of the
system). Little is known about whether the technology would in fact scale well enough
for large heterogeneous networks and, if so, in which type of network topology.
Just like the query language itself, the interfaces to query services can be
standardized, e.g. through shared interface specifications in WSDL. Without a shared
query language the expressiveness of such services is necessarily limited, but in
many cases the result sets even of simple queries can subsequently be refined in
further query steps.
Using simple syndication protocols based on Topic Map or RDF payloads, only
actually changed records are exchanged between data stores. An Atom-based
general-purpose syndication protocol is specified in part 1b of the nascent
eGov-Share CWA (http://www.egovpt.org/fg/CWA_Part_1b). Nodes subscribe to
change feeds and can thus import new, deleted or updated records on a case-by-case
basis from their source registry, provided they have the necessary credentials to
access the feeds and provided their metadata can be mapped onto a shared reference
ontology.
Figure 9-1
Querying thus becomes a sub-problem of data integration, and queries can then be
run locally against the aggregated data store. Updates can be pulled at short
intervals (say, every 10 minutes), thus providing cached queries with nearly live
results.
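Applying such a change feed to a locally aggregated store can be sketched as follows (the entry and record shapes are invented for illustration and do not follow any particular feed specification):

```python
def apply_feed(store: dict, entries: list) -> dict:
    # Each feed entry carries an id, an action, and (for new or
    # updated records) the record itself; the local cache thus
    # mirrors the source registry after each pull.
    for e in entries:
        if e["action"] == "deleted":
            store.pop(e["id"], None)
        else:  # "new" or "updated"
            store[e["id"]] = e["record"]
    return store

cache = {"h1": {"name": "Hotel Roma", "stars": 3}}
feed = [
    {"id": "h2", "action": "new", "record": {"name": "Grand Palace", "stars": 5}},
    {"id": "h1", "action": "updated", "record": {"name": "Hotel Roma", "stars": 4}},
]
apply_feed(cache, feed)
print(sorted(cache))  # ['h1', 'h2']
```

Pulling and applying such feeds at short intervals is what keeps the locally queried aggregate close to live.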
Query by example (QBE) can be used without any specific query language, only by
means of data samples. This makes it easy to implement once data interoperability is
solved, independently of the method used to achieve it (standard, interfaces,
mediation). The main drawback is that complex queries cannot be formulated, so user
requirements will not be met in most cases. However, in specific scenarios, especially
when looking for descriptions or listings in domains with a shared or even
standardized ontology, QBE might prove sufficient.
Since the ICT infrastructure in the tourism industry is characterized by a broad range
of heterogeneous systems (and thus different databases), it is very unlikely that a
typical query language can be deployed as a standard on a broad basis. SPARQL, on
the other hand, has the potential for broad acceptance, since it can be deployed on
top of existing reference models in the case of divergent data models. In this setup it
has similar benefits and constraints as QBE, but overcomes QBE’s main obstacle by
allowing complex queries. SPARQL seems therefore to be one of the main potential
candidates for handling metasearch queries in a distributed and divergent
environment.
query sequence must be defined as part of the interface(s). This makes query
interfaces difficult to define and adopt, and therefore limits the potential for running
complex queries, since the effort of defining and deploying interfaces becomes
overwhelming. Secondly, participating partners must either implement the standard or
define mappings to be interoperable with it. Thus query interfaces have to be
implemented for each query scenario, or mappings based on a shared reference
model must be set up.
Thus interface standardization is more advanced than QBE, but still has similar
restrictions. It might be well suited for specific scenarios, but has its limits for broader
deployment.
eTourism can build on the experience with metadata syndication in the eGovernment
domain. Those results should be evaluated and screened for their applicability in the
data integration between CRSs, GDSs and intermediates.
9.2.4 Recommendations
9.2.4.1 Short-term recommendations (1–3 years)
Research on technologies for flexible and adaptive query methods that are
able to understand the semantics of a web repository and send an appropriate
query.
As has been discussed above, both services and data are widely distributed in typical
eTourism scenarios. In addition to the information provided by one or more large
players such as a GDS, a typical eTourism transaction can bring together – or could
in the future profit from combining – local and remote data from many sources.
Standardized queries, e.g. based on SPARQL, or ad-hoc protocols can be used to
actually retrieve specific data sets from data stores (see above) and web services
can be used to access specific services, ideally through standardized APIs.
The need for machine-processable information especially on services has long been
recognized. When web services became popular in the late 1990s, three key factors
were considered to be crucial for the success of the then new paradigm:
Solutions for these requirements are based on open specifications and are, in the
context of “traditional” web services, usually identified with the three well-known basic
web service standards SOAP, WSDL and UDDI (questions of semantic
interoperability were largely out of focus at that time). In RESTful web services the
stack is somewhat less clearly defined, especially for machine-processable API
descriptions, but the general requirements are very much the same.
9.3.1.2 Needs
Looking beyond those specific web service standards, the OASIS Reference Model
for Service Oriented Architectures [OASIS Reference Model] explores some of these
requirements on a more precise, technologically neutral level. Around the idea of a
service as “the mechanism by which needs and capabilities are brought together”
gravitate concepts such as interaction of services, their service descriptions and their
visibility and reachability, all grounded in the willingness to collaborate with the goal
of achieving a real-world effect.
Registries are typically regarded as one approach to achieve visibility, other options
being semantic or general-purpose search engines. Registries in this sense help to
find actual resources, thus enabling their discovery. For that purpose, they store
more or less standardized metadata to describe those resources and offer an
interface to query that metadata. This metadata could conceivably one day also be
harvested using information extraction.
9.3.1.3 Requirements
Registries must facilitate finding existing services and data repositories. Together
with standardized query technologies they thus help to put those resources to optimal
use.
While registries focus on the visibility of resources, they build on the often unspoken
assumption that there is already a willingness to collaborate and share those
resources in a given context, be it within an organization or across organizational
boundaries, be it for free or for a charge. This may or may not be true in a given
case, and it may or may not imply that a registry owner is willing to give up control
over the data. Furthermore, in the real world there is rarely a single source of
information for any given area of interest, and, as we have seen in the introduction to
this section, it is particularly true for the tourism sector. Individual registries are
maintained at various levels of government – notably, local authorities supporting
their local tourism industry –, in tourism associations, GDSs and other private sector
organizations. This makes sense; in many cases the maintainers are closest to the
very resources themselves and have both the best first-hand knowledge and the
strongest business case to keep the data up-to-date.
That said, there is also a strong requirement for centrality, or, more exactly, central
interfaces to enable searches across individual registries. Otherwise any one search
will involve direct queries to a large number of eTourism registries, negating the very
idea of visibility of data and services.
Two well known registry standards dominate the relatively small literature on the
subject, namely UDDI and the ebXML Registry Specification. But neither standard
has been widely adopted in the market. This is, as we argue in Küster, Moore,
Ludwig, 2007, due to fundamental design issues that plague both specifications,
UDDI
UDDI is the best-known standard for registries of services. The UDDI 1.0
specification was formally released in 2002, pushed by major software vendors such
as IBM, SAP and Microsoft. It was supposed to lay the basis for the loosely coupled
operation of web services, bringing together service consumers and service
providers, possibly even based on automatic discovery and cooperation. For this
purpose, the vendors created three public UDDI registries that were open to all
interested parties. These public registries, however, were not widely used and were
eventually discontinued in early 2006.
Technically, UDDI is above all an API for a set of SOAP-based web services with
their respective data models. This API has continued to grow over the three
published versions of the standard and covers today amongst others methods for
publishing information on businesses and their services, for finding them and for
establishing links between them. By now, the monolithic UDDI 3.0 standard totals an
estimated 400 pages, not counting the nine XML schemata with the actual API
specifications.
The ebXML Registry Specification is composed of the two sister OASIS standards
[OASIS ebXML Registry], the former specifying its internal data model, the latter its
SOAP-based API. In coverage it is quite similar to UDDI, though it supports more
flexible content models. It distinguishes itself from UDDI by the support of federated
queries across a number of different registries:
Figure 9-2
Neither UDDI nor the ebXML Registry Specification allows per se for detailed
semantic descriptions of (web) services, let alone of other types of resources such as
data stores. Queries can at most leverage rather coarse-grained, domain-independent
taxonomies such as UNSPSC.
As has been argued above, semantic technologies are a key to enabling data and
process interoperability, but are at present largely underused in eTourism in general
and in GDSs in particular. The SATINE project
(http://www.srdc.metu.edu.tr/webpage/projects/satine) was funded under FP6 from
2004 to 2006 with the explicit goal to overcome the shortcomings of some current
GDSs. SATINE set out to “provide tools and mechanisms for publishing, discovering
and invoking web services through their semantics in peer-to-peer networks”
(http://www.srdc.metu.edu.tr/webpage/projects/satine/deliverables/D4.1.1.doc).
Semantic technologies and specifically ontologies for web services play a significant
role in the SATINE architecture.
Figure 9-3
Local registries are aggregated into larger registries that are often targeted at specific
user communities. Those aggregated registries can, of course, be further aggregated
into other registries still. All the while the origin of certain metadata sets remains fully
traceable through unique identifiers. Furthermore, each of the semantic descriptions
is addressable through normal URLs, making the overall architecture fully RESTful
and an ideal fit for Resource Oriented Architectures (ROAs) and SOAs alike.
Figure 9-4
The resulting multipart CWA is currently out for open consultation and consists of the
following parts:
Future work may add specifications for the organizational arrangements especially in
the eGovernment domain.
Neither UDDI nor ebXML registries have been well received in the marketplace. This
is due to a number of serious shortcomings that affect those registries:
Attempts such as SATINE to build ontology constructs into the registries further
complicate the specifications and have seen little adoption in practice.
In short, UDDI and, to a lesser extent, the ebXML Registry Specification conflate three important but orthogonal concerns that should be kept apart:
Much of the eGov-Share architecture lends itself ideally to this adoption, provided
that a reference ontology for eTourism-related resources is developed.
Figure 9-5
9.3.4 Recommendations
9.3.4.1 Short-term recommendations (1–3 years)
Plan for the long-term operation and business models for the “watchtower”
registry.
10 Object identification
10.1 Needs and requirements
10.1.1 Introduction
Until recently, and still as standard practice, obtaining information or buying travel-related products has been done via intermediaries (such as agencies) that directly provide the information and perform the bookings on dedicated, possibly vendor-specific systems. As introduced in the case study, the use of the internet for travel-related searches and online shopping is increasing and already widely accepted. Multiple sources of information are available, offering single products (such as hotels, car rentals, events, etc.) or complex packaged products, comparing or aggregating information from different sources and thereby becoming sources themselves.
Identifying identical items (such as the same hotel listed under similar names on different sites), comparing information on different items (such as room or price definitions), and merging or filtering similar information from different sources (such as information on the Balearics, reached sometimes by searching for a Spanish region and sometimes for the Balearics directly) are next to impossible in the current situation.
10.1.2 Needs
In this chapter the basic needs for unique identifiers for tourism products or services
are discussed.
10.1.3 Requirements
This chapter outlines the different requirements that may be deduced from the needs set out above. Uniquely identifying objects goes hand in hand with building taxonomies or ontologies for certain domains, some of which are mentioned in the following chapters. More information may be found in the taxonomy chapter of this document.
Unique, precise, exhaustive location codes are a basic requirement for the travel industry. Location coding should not be limited to general codes such as countries, cities or airports. With online information and booking facilities becoming widespread, it is now necessary to associate codes with all levels of locations that can occur in a trip, such as
touristic regions,
terminals,
stations (railways stations, ski stations, car rental pickup stations),
points of interests,
leisure, event or activity locations,
etc.
Location codes are often used directly by experts and are also becoming more and more visible to end users (on itineraries, on displays, in search forms, etc.).
Geodesic coordinates are also becoming vital information for searches (“What can I do in the vicinity of my hotel?”, “What alternative hotel is there?”, etc.) and for representing itineraries, results, etc. However, it does not seem realistic for geodesic coordinates to be the unique coding mechanism: they are complex and in essence correspond to a point. What, then, would a country's coordinate be?
Furthermore, more and more types of leisure, activity or travel-related services are being proposed and published on the internet without any unique identification (or classification). Unique identifiers for all those services are required to stand a chance of discovering and aggregating data efficiently.
To compare or qualify each type of service, structured information based on universally accepted taxonomies is increasingly required, and this information must also be codified. For some services, such as hotels or car rental, codification is more developed than for others, but it amounts to recommended codifications rather than true unique identifiers. For most services, codification is still specific to each service provider.
Here too, it seems unrealistic to expect a single body to be responsible for this type of codification.
The high level of intermediation and the number of different companies involved in a selling process make it complex to explain pricing schemes, to unravel responsibilities in case of complaints, and to proceed with payments. Adding traceability for each step in the process is therefore becoming an important requirement. That would imply unique identifiers for each company involved in those processes, such as
10.2.1 IATA
The IATA codes are the first codes that come to mind in the travel industry, because
they are used for airports, airline companies, etc.
IATA Airport Codes: alpha-3 Codes. The IATA alpha-3 airport codes uniquely
identify individual airports worldwide. They are made up of precisely three
letters; numerals are not allowed. In fact those codes have been expanded to also cover city codes where a city has more than one airport, as well as coach, rail or ferry locations when requested by an airline or CRS. For instance, TGV railway stations usually have IATA codes because TGV trains are used as feeders for the airlines. It is therefore more accurate to define IATA codes as location codes used in travel rather than airport codes only. Except for cities, the codes correspond to transportation boarding locations rather than stay- or service-oriented locations. The main drawback of IATA airport codes is that they cannot easily be extended to cover all locations required by the travel industry.
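A minimal shape check for such codes can be sketched as follows; it verifies only the three-letter format, not membership in the actual IATA registry, and the CD3 pseudo-code is the terminal example discussed later in this chapter.

```python
import re

# IATA location codes are exactly three letters; numerals are not allowed.
IATA_CODE = re.compile(r"^[A-Z]{3}$")

def is_iata_location_code(code):
    """Check only the *shape* of a code; real validity requires the IATA registry."""
    return bool(IATA_CODE.match(code))

assert is_iata_location_code("CDG")      # Paris Charles de Gaulle airport
assert not is_iata_location_code("CD3")  # vendor pseudo-code containing a numeral
assert not is_iata_location_code("PARI") # wrong length
```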
IATA Airline Codes: officially alphanumeric-3 codes, complemented by pure numeric codes (used for ticketing, for instance). They were initially alphanumeric-2 codes, which are still the codes mainly used; the alphanumeric-2 codes appear in combination with others in ticket numbers, timetables, tariffs, etc. Codes are also allocated to railway or coach companies whenever requested by airlines or GDSs. Some codes are even reused for different airlines whose destinations are not likely to overlap, and codes allocated to airlines that discontinue business may be reused after six months.
IATA Agency Codes: numeric codes. IATA is pivotal in the worldwide accreditation of travel agents issuing airline tickets, with the exception of the USA, where this is done by the Airlines Reporting Corporation. Permission to sell airline tickets from the participating carriers is obtained through national member organizations. As a consequence, some agencies do not have IATA numbers, which has led to country-specific alternative solutions, in some cases allocating pseudo-IATA numbers (such as SNCF ticket-issuing agencies in France that are not IATA-accredited).
There are also less widely used IATA codes, such as those for baggage tag issuers, delay codes, accounting prefix codes, logistics company codes, etc.
10.2.2 ICAO
ICAO airport codes: The ICAO (International Civil Aviation Organization)
alpha-4 airport identifier codes uniquely identify individual airports worldwide.
They are used in flight plans to indicate departure, destination and alternate
airfields, as well as in other professional aviation publications. Usually, the first
two letters of ICAO codes identify the country (but do not correspond to ISO
country codes). In the continental USA, however, codes normally consist of a
‘K’ followed by the airport’s IATA code.
ICAO airline designator: The ICAO airline designator is a code assigned by the
International Civil Aviation Organization (ICAO) to aircraft operating agencies,
aeronautical authorities and services. The codes are always unique by airline.
There are ICAO codes for companies that have no correspondence with IATA
codes.
10.2.3 ISO
A number of ISO standards are used on a regular basis in the travel industry:
Country codes, ISO 3166-1 alpha-2, alpha-3 and numeric. ISO 3166-1, as part
of the ISO 3166 standard, provides codes for the names of countries and
dependent territories, and is published by the International Organization for
Standardization (ISO). Some codes in fact denote regions rather than countries (such as MQ for Martinique, which is part of France), which leads to some confusion (“Is FR only the mainland or the whole of France?”, for instance). The alpha-2 codes are the most widely used, alone or in combination.
Region zones ISO 3166-2 alphanumeric codes. ISO 3166-2 is the second part
of the ISO 3166 standard published by the International Organization for
Standardization (ISO). It is a geocode system created for coding the names of
country subdivisions and dependent areas, such as regions, states, departments, etc., depending on the country. They usually correspond to administrative zones.
Language codes: ISO 639-1. Although alpha-2 codes cannot cover all languages, they are sufficient in most cases; where more codes are needed, ISO 639-2 or ISO 639-3 can be used. When local variations of a language matter, the ISO 3166-1 country code is used in combination with the language code (as in fr-FR and fr-CA).
Currency codes: ISO 4217. In most cases the first two letters of the code are the ISO 3166-1 alpha-2 country code and the third is the initial of the currency itself. In some cases the third letter is the initial of “new” in that country's language, distinguishing the currency from an older one that was revalued; the code often long outlasts the use of the term “new” itself.
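For illustration, the way these code families combine in practice can be sketched in a few lines (the helper names are invented for the example):

```python
def language_tag(language, country):
    # ISO 639-1 language code combined with an ISO 3166-1 alpha-2
    # country code, as in fr-FR vs fr-CA for regional variants.
    return f"{language.lower()}-{country.upper()}"

def currency_country(currency_code):
    # For most ISO 4217 codes the first two letters are the
    # ISO 3166-1 alpha-2 country code (EUR is a notable exception).
    return currency_code[:2]

assert language_tag("fr", "ca") == "fr-CA"
assert currency_country("CHF") == "CH"  # Swiss franc -> Switzerland
```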
10.2.4 UN/LOCODE
The United Nations Code for Trade and Transport Locations is commonly known as UN/LOCODE. Although managed and maintained by UNECE, it is the
product of a wide collaboration in the framework of the joint trade facilitation effort
undertaken within the United Nations.
Each code element consists of five characters: the first two indicate the country (according to ISO 3166-1) and the following three represent the place name. Examples such as CHGVA, FRPAR, GBLON, JPTYO and USNYC ring a bell for air travellers, who are used to seeing the last three letters of these codes on their luggage tags. UN/LOCODE adopts the IATA location identifiers wherever possible, to benefit from their recognition value and to avoid unnecessary code conflicts. In allocating codes, the secretariat tries to find some mnemonic association with the place names, to aid human memorization. This is of course increasingly difficult for large country lists, where the 17,576 combinations of three letters are nearing exhaustion.
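The two-plus-three structure makes the codes trivially machine-parseable, as this small sketch shows:

```python
def split_locode(locode):
    # A UN/LOCODE element is five characters: an ISO 3166-1 alpha-2
    # country code followed by a three-character place code.
    code = locode.replace(" ", "")  # the code is often written with a space
    if len(code) != 5:
        raise ValueError(f"not a UN/LOCODE element: {locode!r}")
    return code[:2], code[2:]

assert split_locode("CHGVA") == ("CH", "GVA")   # Geneva, Switzerland
assert split_locode("FR PAR") == ("FR", "PAR")  # spaced form also accepted
```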
10.2.5 HEDNA
HEDNA is an international association focused on identifying distribution opportunities and providing solutions for the lodging industry and its distribution community. HEDNA compiles codes, for instance for hotel chains, room types, etc., and also provides lists and codes of conduct on how to use them.
10.2.6 ACRISS
ACRISS members use an industry-standard vehicle matrix to define car groups, ensuring a like-for-like comparison across countries. This easy-to-use matrix consists of four categories: each position in the four-character vehicle code represents a definable characteristic of the vehicle. The expanded vehicle matrix makes it possible to describe 400 vehicle types.
This coding system has been adopted to ensure that all ACRISS members display
the same coding for the same vehicles, enabling you to make an informed decision
when comparing rates.
This certainly helps in understanding what type of vehicle is being rented, though many surprises can still happen, even among ACRISS members.
ACRISS does not actually provide standardization for all car-rental-related data; for instance, car rental stations are not standardized, nor are opening hours.
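The positional logic of the matrix can be illustrated with a decoder over a small excerpt of the tables; only a handful of letters per position are shown here, and the authoritative, complete tables are maintained by ACRISS.

```python
# Each position of an ACRISS vehicle code describes one characteristic.
# Only a small, illustrative excerpt of the matrix is shown here.
CATEGORY = {"M": "Mini", "E": "Economy", "C": "Compact",
            "I": "Intermediate", "S": "Standard", "F": "Fullsize"}
TYPE = {"B": "2-3 Door", "C": "2/4 Door", "D": "4-5 Door", "V": "Passenger Van"}
TRANSMISSION = {"M": "Manual", "A": "Automatic"}
FUEL_AC = {"R": "Unspecified fuel, air conditioning",
           "N": "Unspecified fuel, no air conditioning"}

def decode_acriss(code):
    # Map each of the four characters through its position's table.
    tables = (CATEGORY, TYPE, TRANSMISSION, FUEL_AC)
    return tuple(table.get(ch, "unknown") for table, ch in zip(tables, code))

assert decode_acriss("ECMR") == (
    "Economy", "2/4 Door", "Manual", "Unspecified fuel, air conditioning")
```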
10.2.7 GIATA
GIATA acquires and standardizes (normalizes) digital image and text data for many tour operators and travel agencies, such as TUI, Thomas Cook, Easyjet, Expedia, Opodo and Lastminute.com. The data are also used by the well-known CRSs/GDSs (Amadeus, Sabre, Galileo/Worldspan) to provide decoding information based on a unique identifier present in those GDSs.
GIATA is not a global standardization body, but it has compiled enough data to become a de facto “standard” source of information, its identifier becoming the identifier. This holds only partly, though, since the identifier is not globally used, nor even used by the hotel owners themselves.
10.2.8 GS1
The GS1 System is an integrated system of global standards that provides for
accurate identification and communication of information regarding products, assets,
services and locations. It is the most implemented supply chain standards system in
the world.
GS1 Identification Keys identify things such as trade items, locations, logistic units and assets in a unique way worldwide. They can be used in bar codes, in online transactions, for selling or synchronization processes, etc.
Though this identification scheme is not used at present in a systematic way in the
travel industry, it is applied in many other trades in a successful manner and could
therefore be easily expanded to the travel trade.
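One concrete and well-known piece of the GS1 system is the mod-10 check digit that closes every GTIN key; a sketch of the standard calculation:

```python
def gs1_check_digit(digits):
    # GS1 check digit (mod 10): weight the digits 3, 1, 3, ... starting
    # from the right, then pick the digit that brings the total to a
    # multiple of ten.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(digits)))
    return (10 - total % 10) % 10

# GTIN-13 example: the first 12 digits determine the final check digit.
assert gs1_check_digit("400638133393") == 1  # full GTIN-13: 4006381333931
```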
GS1 operates in multiple sectors and industries and already works in close relation
with many corporations throughout the world as well as various standardization
bodies such as
10.2.9 URI
Since we are reviewing methods for obtaining unique identifiers, it is worth noting that the W3C provides a means for globally unique identifiers: URIs. A Uniform Resource Identifier (URI) is a compact string of characters used to identify or name a resource on the Internet. The main purpose of this identification is to enable interaction with representations of the resource over a network, typically the World Wide Web, using specific protocols.
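By way of illustration only (the registry.example.org authority and the path layout below are invented, not an established travel-industry scheme), a URI can serve as a globally unique, dereferenceable identifier for a travel resource:

```python
from urllib.parse import urlsplit, urlunsplit

def hotel_uri(country, city, hotel_slug):
    # Hypothetical URI layout: scheme + registry authority + hierarchical path.
    path = f"/hotels/{country}/{city}/{hotel_slug}"
    return urlunsplit(("https", "registry.example.org", path, "", ""))

uri = hotel_uri("fr", "paris", "hotel-du-nord")
assert urlsplit(uri).path == "/hotels/fr/paris/hotel-du-nord"
```

Because such a URI is also a URL, any HTTP client could in principle retrieve a representation of the identified resource.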
URIs could be used in the travel industry in a systematic way, but they have major
drawbacks such as
10.2.10 UUID
Universally Unique Identifier (UUID) is an identifier standard used in software
construction, standardized by the Open Software Foundation (OSF) as part of the
Distributed Computing Environment (DCE). The intent of UUIDs is to enable
distributed systems to uniquely identify information without significant central
coordination. Thus, anyone can create a UUID and use it to identify something with
reasonable confidence that the identifier will never be unintentionally used by anyone
for anything else. Information labelled with UUIDs can therefore be later combined
into a single database without needing to resolve name conflicts.
Though not directly applied in the tourism industry, being technically oriented, UUIDs are interesting in that they do not require a centralised body for validation (though repositories or registries would be useful). UUID keys are nevertheless not directly usable by humans due to their inherent complexity.
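The decentralised property described above is easy to demonstrate (the record contents are invented for the example):

```python
import uuid

# Two systems minting identifiers independently: random (version 4) UUIDs
# make an accidental collision practically impossible, so records can later
# be merged without a central allocation authority.
hotel_record = {"id": str(uuid.uuid4()), "name": "Hotel du Port"}
event_record = {"id": str(uuid.uuid4()), "name": "Jazz Festival"}

merged = {rec["id"]: rec for rec in (hotel_record, event_record)}
assert len(merged) == 2  # no name-conflict resolution was needed

# Version 5 UUIDs derive deterministically from a name within a namespace,
# useful when the same object must always map to the same identifier.
a = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/hotels/42")
b = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/hotels/42")
assert a == b
```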
10.3.1 Location
In the previous chapters we have seen that various associations and organizations
propose location identifiers. However, there is currently no worldwide identification
standard that can uniquely identify and provide information about entities within the
travel industry.
There is broad consensus around country codes (though several coding schemes exist). The ISO 3166 standard is very widely used and even incorporated into other standards (such as the UN codes). However, the alpha-2 codes are predominantly used, which limits migration to alpha-3 codes and may hinder extending the codes.
Some “country” codes are also allocated to regions of certain countries or even to parts of the world larger than countries (such as EU for the European Union, or MQ for Martinique). This most likely stems from the need for travel-oriented zones that often, but not always, coincide with countries. At present this is not done in a systematic way (there is no code for Corsica or the Balearics, for instance). There is a real need to differentiate touristic “zones” from political countries or areas.
For subdivisions below country level there is less consensus. The ISO subdivisions of countries are less widely used because they match the travel industry's needs less well.
There is a need for travel-specific regions that do not really map onto political or administrative boundaries: cruise regions at sea, ski regions or mountain ranges spanning several countries, and specific touristic regions that may lie within a country or across countries (the Mediterranean region, the south of France, Sardinia, the Balearics, La Réunion, etc.).
Some countries have several levels of subdivision, and the current ISO codes only take one level into account (such as the French departments but not the French regions, for which a local coding is used, some codes being identical to ISO subdivision codes but with a different meaning).
Some travel companies also specialize in certain domains (such as diving or hunting) and likewise require specific regions related to their specialty. There is currently no way to submit such regions to a global repository. A mechanism to submit and validate such codifications would allow a better understanding of offers that are at present difficult to compare.
New codes are added only in relation to airline-related business, without a systematic coding process.
Those codes are still widely used, and a global coding process should allow their integration, at least for their original purpose (airport codes).
ICAO also provides airport codes in a more neutral way, including non-ISO country prefixes. They tend to be used internally by airlines and airports, which therefore maintain two sets of codes; they also remain specialized and limited to airports.
All in all, airport codification is fairly well covered, though cluttered. However, no codification integrates terminal data with airport codes, so vendors often create pseudo-codes such as CD3 in lieu of CDG terminal 3, disrupting the original IATA codes.
Furthermore, travel destinations are not limited to airports or main cities (which are covered by the IATA codes). Precise identification of cities in general, villages, stations (airport terminals, ski, railway, car rental, coach, etc.), points of interest within or outside cities, lieux-dits, etc. does not exist on a global scale, which is a major issue for eTourism.
There are several possible ways to move forward: either differentiate airports, railway stations and cities and build identification schemes for each type of item, or, on the contrary, create a single set of identifiers for points of travel.
The second approach corresponds to the historical one, where cities inherited the codes of their airports and differentiation sometimes occurred later. That seems logical because, for the purposes of a trip, a location and its airport are very similar notions, except where there are multiple airports and airport differentiation is in order.
Neither IATA nor ICAO seems in a position to provide such coding schemes. Integrating local postal codes and possibly other codifications into a global identification process could speed up the process. The UN has also initiated the same type of process with stations, harbours, etc., complementing airport codes whenever possible.
At present it is therefore impossible to have unique identifiers for each element of a trip, and consequently impossible to compare or even amalgamate information. Were such identification in place, there would still be a need for additional qualification, such as understanding the rights of the source with respect to the content (is this first-hand information, does the author have the right to create or distribute it, etc.).
Some organizations, such as HEDNA, have such a project for hospitality or other specific services. Private companies in certain countries provide partial data (such as GIATA in Germany). Private companies distributing content also provide unique identifiers within their own systems, which do not allow cross-referencing.
Defining unique codes for travel services is very delicate because it touches marketing- or sales-oriented information, which is subjective and requires many details to be precise. Actual codes are likely to be aggregates of different pieces of information (such as room information, bed information, features, location, etc.).
10.4 Recommendations
10.4.1 Short-term recommendations (1–3 years)
Build a registry of present object identifications in the tourism industry.
Develop travel related global geography identifiers.
Integrate the global geography identifiers in the registry and build transcoding
capability.
Develop travel company related global identifiers.
We have selected an existing eTEN project, which joined the workshop as a member and was also present with keynote speakers during one of the workshop meetings in Berlin. The project, called “euromuse.net”, does not come from the core tourism domain but from cultural heritage, a driver of tourism. The project improves an existing platform that offers services and exhibition data to the tourism industry, and aims to bridge the gap between cultural heritage and the tourism industry. It faces the same problems as those discussed in the workshop and has an appropriate data mediation solution in use, illustrating the approach generally recommended by this CWA for overcoming interoperability problems. It uses Harmonise 2.0 to integrate data from hundreds of Europe's top museums and provides this aggregated information to the various players of the tourism industry. And of course there is a strong need for a cost-effective and easy-to-use solution, since museums usually do not have large IT departments, if any at all.
euromuse.net has been identified as a very good starting point for discussing the issue and for demonstrating a real live system, something that could not easily have been implemented within the course of the CEN workshop itself. It allows a real demonstration and a discussion of the issues presented in this document based on a system in use.
euromuse.net offers both a ‘one-stop’ web tool to the greatest exhibitions in Europe for the public and a special data interface, called Harmonise, that delivers structured data from the museums to the tourism sector. The euromuse.net project will deploy an existing online service, which provides multilingual information about temporary exhibitions and museums as well as other museum resources on a web platform, to develop a wider pan-European data collection based on public sector information to be re-used by different actors in the cultural and tourism fields. The project aims at three main goals:
1. Improve and extend the existing platform, a website offering museum and exhibition information to the general public for free.
2. Integrate the museums’ information of the euromuse.net database with the
Harmonise tools. Through this integration euromuse.net’s rich content will
affiliate with the online offers of other European and national tourism and
marketing services for culture.
3. Enhance the existing services to integrate information on scientific publications
from museums and to expand the current services, which provide an overview
of “virtual” museums and their (online) resources.
The main focus is to improve the connection between existing marketing and promotion channels of the tourism industry and the cultural sector via the euromuse.net database. A general idea of the euromuse.net project is to better connect the museum sector with relevant target groups in the tourism sector, both on a professional and on a non-professional or private level. euromuse.net services will support and strengthen existing connections between the general public interested in museums and exhibitions, the professional tourism sector and the museums.
The service will help to create easily accessible information about exhibitions and museums all over Europe. The information is offered through three complementary services: the website http://www.euromuse.net/, aimed mainly at the general public and free to access; tools for structured data exchange with the databases of the tourism industry and other tourism players; and a scientific literature database of museum publications, mainly for researchers and museum staff. The tools for data exchange will enable representatives of the tourism industry and services to organize personalised tourism packages for their customers through the service.
Because the requests of industrial and private users normally differ, the project offers special access for tourism industry users besides the euromuse.net website. Special search strings and precise queries to the euromuse.net database allow optimized preparation of organized trips. Industrial users will receive structured, XML-formatted data via a special export from the euromuse.net database. The commercial users of this functionality will be asked to pay a contribution for the service provided.
This follows very much the approaches recommended in this CWA to overcome the data interoperability problem. However, this is where the current setup ends, leaving open some issues that have also been discussed in the topics of this CWA. Some of them should be addressed in euromuse.net in the future; some are still not easy to solve.
Following the order of this document, the first issue is process handling. Most museums do not have a system to allow online ticket purchasing, but they might sooner or later. Online ticket buying will therefore become an issue, also because travel agencies might wish to bundle services together dynamically to sell the client a full travel package that also comprises exhibitions. Process handling would be easiest to realize in a stateless way, managing processes purely through the exchange of data. Process mediators are currently being developed in applied ICT research and might offer an improved solution in the longer run (these process mediators work similarly to the data mediator Harmonise).
Metasearch is the next topic, and in some sense euromuse.net is already a metasearch repository, since it aggregates data from different sources and makes them available for search queries. When querying the euromuse.net database at present, a fixed query string or fixed query rules have to be used, since no proper solution could be found to handle different query strings in a flexible and generic way. In the future, it should also be possible to map different queries so that one query can run simultaneously on a larger number of instances, each of which might have a different query language. This shows the need for interoperable query languages, but also the need for registries, in order to find the data instances that should be searched. Clearly, there is a need for some meta-information about where to search, because searching every database in the world to obtain a certain set of data is inefficient, if not impossible. Reliable registries directing search queries to potential data sources would therefore significantly improve search efficiency.
And even if you search various databases and retrieve a large number of results (say, exhibitions in the case of euromuse.net), you do not automatically know how many exhibitions are represented several times in the retrieved data sets. Thus, object identification is the last of the topics covered by this CWA and is also a future enhancement of euromuse.net. If all exhibitions, museums and locations can be identified automatically, then the database can automatically be cleaned of multiple entries of the same object. At the moment the issue is open in euromuse.net, since the number of sources is manageable and the probability that one exhibition is reported by two museums is very low. However, this might rise significantly and quickly as the network grows.
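The duplicate-cleaning step described above can be sketched as follows; the record fields and identifier values are invented for the example, and a real system would merge the fields of all duplicates rather than simply keeping the first record seen.

```python
# Illustrative sketch: once every exhibition carries a shared unique
# identifier, duplicates reported by several sources collapse automatically.
records = [
    {"id": "exh-001", "title": "Impressionism", "source": "museum-a"},
    {"id": "exh-001", "title": "Impressionnisme", "source": "portal-b"},  # same show
    {"id": "exh-002", "title": "Ancient Egypt", "source": "museum-c"},
]

deduplicated = {}
for rec in records:
    # Keep the first record seen for each identifier.
    deduplicated.setdefault(rec["id"], rec)

assert len(deduplicated) == 2
```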
It is easy to see that the topics are exactly the same for exhibitions as they are for accommodation. euromuse.net therefore demonstrates nicely how all of these issues can also be solved on a global scale. The same technology and setup for mediating data and processes can be used for any other object, such as accommodation, flights, car rentals, events, etc.
Finally, one important issue remains open, since it is out of the scope of interoperability: even if all data could be exchanged smoothly, data sources identified easily, the content understood and booking processes run, how can data quality be assured? How can one make sure that a timetable (opening hours, flight schedules) is correct or that price quotes are valid? Quality of service and user acceptance will depend heavily on data quality. In euromuse.net it is being discussed whether to involve users in reporting back on the quality of information; the involvement of users (user-generated content) may be a reliable source for estimating data quality. But although this topic is an important one, it is not part of this CWA on data and process interoperability.
[Adam, Hofer, Zang, et al, 2005] Otmar Adam, Anja Hofer, Sven Zang, Christoph
Hammer, Mirko Jerrentrup, Stefan Leinenbach: “A Collaboration Framework for
Cross-enterprise Business Process Management”. In: Panetto, Hervé (Hrsg.):
Interoperability of Enterprise Software and Applications – INTEROP-ESA’2005.
Geneva, Schwitzerland, February 23–25, 2005, Technical Sessions, 2005, p
499-510
[Addis, Boniface, Goodall, et al, 2003] M. Addis, M. Boniface, S. Goodall, P.
Grimwood, S. Kim, P. Lewis, K. Martinez, A. Stevenson: “SCULPTEUR:
Towards a new paradigm for multimedia museum information handling”, In:
Proceedings of the Second International Conference on Semantic Web, p 582-
596, 2003
[Addis, Stevenson, 2002] M. Addis, A. Stevenson: D6.2 Impact on World-Wide
Metadata Standards, Deliverable report of ARTISTE project, 2002
[Adrian, Sauermann, Roth-Berghofer, 2007] B. Adrian, L. Sauermann, T. Roth-
Berghofer: “ConTag: A semantic tag recommendation system”. In: Proceedings
of I-Semantics ’07, p 297-304, 2007
[Advanced Distributed Learning] http://www.adlnet.gov/
[Agent Link] http://www.agentlink.org/
[Ahern, King, Naaman, et al, 2007] S. Ahern, S. King, M. Naaman, R. Nair, J.H.I.
Yang: “ZoneTag: Rich, Community-Supported Context-Aware Media Capture
and Annotation”. In: Proceedings, MSI workshop CHI2007, San Jose, Calif,
2007
[AICC] Aviation Industry CBT Committee, http://www.aicc.org/
[Amadeus] http://www.amadeus.com/
[Amann, Fundulaki, 1999] B. Amann, I. Fundulaki: “Integrating Ontologies and
Thesauri to build RDF Schemas”, ECDL Research and Advanced Technologies
for Digital Libraries, p 234-253, 1999
[ANSI] American National Standards Institute, http://www.ansi.org/
[ArguGRID] http://www.argugrid.eu/
[Aristotle] Aristotle: Metaphisics Book IV,
http://classics.mit.edu/Aristotle/metaphysics.4.iv.html
[Arnarsdóttir, Berre, Hahn, Missikoff, Taglino] K. Arnarsdóttir, A.-J. Berre, A. Hahn, M.
Missikoff, F. Taglino: Semantic Mapping: ontology based vs. model based
approach. Alternative or complementary approaches?, ftp://ftp.informatik.rwth-
aachen.de/Publications/CEUR-WS/Vol-200/17.pdf
[ARTEMIS] http://www.srdc.metu.edu.tr/webpage/projects/artemis/
[ASG] http://asg-platform.org/cgi-bin/twiki/view/Public
[Aviation Industrie CBTI Committee] http://www.aicc.org/
[Baader, Horrocks, Sattler, 2003] F. Baader, I. Horrocks, U. Sattler: “Description
logics as ontology languages for the semantic web”. In: S. Staab, R. Studer,
eds: Lecture Notes in Artificial Intelligence, Springer Verlag, 2003
[Bailey, 1994] K.D. Bailey: Typologies and Taxonomies - An Introduction to
Classification Techniques, London, Sage Publications, Quantitative Applications
in the Social Sciences, 1994
CEN/ISSS WS/eTOUR – CWA – 2009-06-03 – 131
[de Laborda, Conrad, 2005] C.P. de Laborda, S. Conrad: Relational.OWL A Data and
Schema Representation Format Based on OWL. In Second Asia-Pacific
Conference on Conceptual Modelling (APCCM2005), volume 43 of CRPIT, p
89-96, Newcastle, Australia, 2005, ACS
[Dell’Erba, Fodor, Höpken, et al, 2005] M. Dell’Erba, O. Fodor, W. Höpken, et al,
“Exploiting Semantic Web Technologies for Harmonizing e-Markets”. In: IT&T
Information Technology & Tourism – Application – Methodologies –
Techniques, 2005
[DIP] http://dip.semanticweb.org/index.html
[Directive 90/314/EEC] Council Directive 90/314/EEC of 13 June 1990 on package
travel, package holidays and package tours
[Dodgeball] http://www.dodgeball.com/
[Dörr, 2003] M. Dörr: “The CIDOC conceptual reference module: An ontological
approach to semantic interoperability of metadata”. AI Magazine 24(3) (2003),
75–92
[Dörr, Guarino, Fernández López, et al, 2001] M. Dörr, N. Guarino, M. Fernández
López, E. Schulten, M. Stefanova, A. Tate: “State of the Art in Content
Standards. OntoWeb Deliverable 3.1.”, Technical Report, 2001
[Dörr, Hunter, Lagoze, 2003] M. Dörr, J. Hunter, C. Lagoze: “Towards a core
ontology for information integration”. Journal of Digital Information 4(1), 2003
[Dou, McDermott, Qi] D. Dou, D. McDermott, P. Qi: “Ontology Translation by
Ontology Merging and Automated Reasoning”
[Duineveld, Stoter, Weiden, et al, 2000] A.J. Duineveld, R. Stoter, M.R. Weiden, B.
Kenepa, V.R. Benjamins: “WonderTools? A comparative study of ontological
engineering tools”, 2000
[Earley, 2005] S. Earley: Resolving Taxonomy Challenges and Information
Architecture Conflicts, 2005 http://www.dama-nj.org/presentations/
Seth%20Earley%20Taxonomies%20May%2012%202005%20(DamaNJ).pdf
[eBusiness W@tch Report 2006/2007] eBusiness W@tch Report 2006/2007,
http://www.ebusiness-watch.org/key_reports/documents/EBR06.pdf
[ebXML] eBusiness XML, http://www.ebxml.org/
[Echarte, Astrain, Cordoba, Villadangos, 2007] F. Echarte, J.J. Astrain, A. Cordoba,
J. Villadangos: Ontology of Folksonomy: A New Modelling Method. Proceedings
of the Semantic Authoring, Annotation and Knowledge Markup Workshop
(SAAKM2007), British Columbia, Canada, Vol-289, 2007,
http://ftp.informatik.rwth-aachen.de/Publications/CEUR-WS/Vol-289/p08.pdf
[ESP Game] http://www.espgame.org/
[ETSI] European Telecommunications Standards Institute, http://www.etsi.org/
[euromuse] http://www.euromuse.net, http://www.euromuse-project.net
[Expedia] http://www.expedia.com/
[Fabian, 1975] J. Fabian: “Taxonomy and Ideology: On the Boundaries of Concept
Classification”. In: M. Kinkade (ed), Linguistics and Anthropology, Lisse, p 183-
197, 1975
[Facebook] http://www.facebook.com/
[Flickr] http://www.flickr.com/
[Fodor, Werthner, 2005] Oliver Fodor, Hannes Werthner: Harmonise: a step toward
an interoperable e-tourism marketplace. In: International Journal of Electronic
Commerce, Winter 2004-5, Vol 9, No 2, p 11-39, 2005
[Freyer, 2006] Freyer, Walter: Tourismus: Einführung in die
Fremdenverkehrsökonomie, 8th revised ed, München : Oldenbourg, 2006
[Fuxman, Hernández, Ho, et al, 2006] A. Fuxman, M.A. Hernández, H. Ho, R. Miller,
P. Papotti, L. Popa: Nested Mappings: Schema Mapping Reloaded. Proc. VLDB
2006 Conf., p 67-78, Seoul, Korea, 2006
[Garshol, 2004] L.M. Garshol: Metadata? Thesauri? Taxonomies? Topic Maps!
Making Sense of it all, Journal of Information Science, 2004
[Gennari, Musen, Fergerson, et al, 2002] J. Gennari, M.A. Musen, R.W. Fergerson,
W.E. Grosso, M. Crubezy, H. Eriksson, N.F. Noy, S.W. Tu: The Evolution of
Protégé: An Environment for Knowledge-Based Systems Development,
Technical Report SMI-2002-0943, 2002
[Ghawi, Cullot] R. Ghawi, N. Cullot: Database-to-ontology Mapping Generation for
semantic interoperability
[Gilchrist, 2003] A. Gilchrist: Thesauri, taxonomies and ontologies - an etymological
note. Journal of Documentation, 2003, 59 (1), p 7-18
[Goodall, Lewis, Martinez, et al, 2004] S. Goodall, P.H. Lewis, K. Martinez, P.
Sinclair, F. Giorgini, M.J. Addis, M.J. Boniface, C. Lahanier, J. Stevenson:
“SCULPTEUR: Multimedia Retrieval for Museums”, CIVR 2004, LNCS 3115, p
638-646, 2004
[Grishman, 2003] Ralph Grishman, “Information Extraction”. In: The Oxford
Handbook of Computational Linguistics, ed. R. Mitkov, Oxford University Press,
2003
[Grosof, Horrocks, Volz, Decker, 2003] B.N. Grosof, I. Horrocks, R. Volz, S. Decker:
Description logic programs: Combining logic programs with description logic. In
Proc. of the Twelfth International World Wide Web Conference (WWW 2003), p
48-57, ACM, 2003
[Grossman, 2004] Grossman, David: Confusion is the star of hotel rating systems,
http://www.usatoday.com/travel/columnist/grossman/2004-03-05-
grossman_x.htm
[Grove, 2003] A. Grove: Taxonomy. In: Encyclopedia of Library and Information
Science, p 2770-2777, New York, Marcel Dekker Inc, 2003
[Gruber, 1993a] T.R. Gruber: “A translation approach to portable ontology
specifications”, Knowledge Acquisition, Vol 5, 1993
[Gruber, 1993b] T.R. Gruber: “Toward Principles for the Design of Ontologies Used
for Knowledge Sharing”, International Journal of Human Computer Studies, Vol
43, p 907-928, 1993
[Gruber, 2005a] T. Gruber: Ontology of Folksonomy: A Mash-up of Apples and
Oranges, AIS SIGSEMIS Bulletin, 2005
[Gruber, 2005b] T. Gruber: TagOntology, a way to agree on the semantics of tagging
data, 2005
[GS1] http://www.gs1.org/
[Guarino, Giaretta, 1995] N. Guarino, P. Giaretta: “Ontologies and knowledge bases.
Towards a terminological clarification”. In: Towards Very Large Knowledge
Bases, IOS Press, p 25-32, 1995
[GUID] http://en.wikipedia.org/wiki/GUID
[Gulli, Signorini, 2005] A. Gulli, A. Signorini: Building an open source meta search
engine. In: WWW 2005
[Haase, Wang, 2007] P. Haase, Y. Wang: “A decentralized infrastructure for query
answering over distributed ontologies”. In: Proceedings of the 2007 ACM
Symposium on Applied Computing (Seoul, Korea, March 11-15, 2007). SAC
’07. ACM, New York, NY, p 1351-1356,
http://doi.acm.org/10.1145/1244002.1244294
[HarmoNET] The Harmonisation Network for the Exchange of Travel and Tourism
Information, http://www.harmonet.org/
[HEDNA] http://www.hedna.org/
[Heflin, 2001] J. Heflin, J. Hendler: “A portrait of the Semantic Web in action”, IEEE
Intelligent Systems, 16(2), 2001, p 54-59
[Hempel, 1965] C.G. Hempel: “Fundamentals of Taxonomy”, p 137-154. In: C. G.
Hempel: Aspects of scientific explanation and other essays in the philosophy of
science, New York, The Free Press, 1965
[Hepp, Leymann, Domingue, et al, 2005] Martin Hepp, Frank Leymann, John
Domingue, Alexander Wahler, Dieter Fensel: Semantic Business Process
Management: A Vision Towards Using Semantic Web Services for Business
Process Management, Proceedings of the IEEE ICEBE. 2005
[Höpken, 2004] Wolfram Höpken: Reference Model of an Electronic Tourism Market
(IFITT RM), Version 1.3, 2004,
http://www.rmsig.de/documents/ReferenceModel.doc
[Hull, 1998] D.L. Hull: Taxonomy. In: Routledge Encyclopedia of Philosophy, Version
1.0, London, Routledge, 1998
[Hunter, 2002] J. Hunter: “Combining the CIDOC CRM and MPEG-7 to describe
multimedia in museums”, In: Proceedings of Museums on the Web 2002
Conference, Boston, 2002
[IATA] http://www.iata.org/, http://en.wikipedia.org/wiki/IATA
[IEEE] Institute of Electrical and Electronics Engineers, http://www.ieee.org
[IFITT] International Federation for IT and Travel & Tourism, http://www.ifitt.org/
[IFLA] International Federation of Library Associations and Institutions,
http://www.ifla.org/
[ISO] International Organization for Standardization, http://www.iso.org/; for
references to ISO standards see also chapter 2 “Normative references”
[ISO 3166] http://www.iso.org/iso/country_codes.htm,
http://www.iso.org/iso/fr/country_codes.htm
[ISO/IEEE 11073] Health informatics — Point-of-care medical device
communications (multiple parts)
[ISO 21127:2006] Information and documentation — A reference ontology for the
interchange of cultural heritage information
[IST] Information Society Technologies, http://cordis.europa.eu/ist/
[ITU] International Telecommunication Union, http://www.itu.int
[Iurgel, 2004] I. Iurgel: From another point of view: art-E-fact, In: Proc. TIDSE’04
(2004) vol 1, p 26-35
[Kalfoglou, Schorlemmer, 2003] Yannis Kalfoglou, Marco Schorlemmer: Ontology
mapping, the state of the art. Knowledge Engineering Review, 18(1), p 1-31,
2003
[Kim, Yang, Song, et al, 2007] H.L. Kim, S.K. Yang, S.J. Song, J.G. Breslin: “Tag
Mediated Society with SCOT Ontology”, Proceedings of the Semantic Web
Challenge 2007 in conjunction with the Sixth International Semantic Web
Conference, November 11-15, Busan, Korea, 2007
[Knerr, 2006] T. Knerr: Tagging Ontology: Towards a Common Ontology for
Folksonomies, 2006
[Konstantinou, Spanos, Chalas, et al, 2006] N. Konstantinou, D. Spanos, M. Chalas,
E. Solidakis, N. Mitrou: VisAVis: An Approach to an Intermediate Layer between
Ontologies and Relational Database Contents. International Workshop on Web
Information Systems Modeling (WISM 2006), Luxembourg, 2006
[Küster, Moore, Ludwig, 2007] Marc Wilhelm Küster, Graham Moore, and Christoph
Ludwig, “Semantic registries.” In: XMLTage 2007 in Berlin, Berlin, 2007
[Lagoze, Hunter, 2001] C. Lagoze, J. Hunter: “The ABC Ontology and Model”,
Journal of Digital Information, Vol 2, No 2, 2001
[Lahti, Palola, Korva, et al, 2006] J. Lahti, M. Palola, J. Korva, U. Westermann, K.
Pentikousis, P. Pietarila: “A mobile phone-based context-aware video
management application,” In: Multimedia on Mobile Devices II, Edited by
Creutzburg, Takala, Chen, Proceedings of the SPIE, Volume 6074, p 204-215,
2006
[Lamsfus, Linaza, Smithers] Carlos Lamsfus, María Teresa Linaza, Tim Smithers:
“Towards semantic-based information exchange and integration standards: the
art-E-fact ontology as a possible extension to the CIDOC CRM (ISO/CD 21127)
standard”. K-CAP2005, Banff, Alberta, Canada, Proceedings (ISSN 1613-0073)
of the Workshop on Integrating Ontologies, p 49-54
[Landwehr, Bull, McDermott, Choi, 1994] C.E. Landwehr, A.R. Bull, J.P. McDermott,
W.S. Choi: A Taxonomy of Computer Program Security Flaws, with Examples.
ACM Computing Surveys, 26,3 (Sept 1994),
http://chacs.nrl.navy.mil/publications/CHACS/1994/1994landwehr-acmcs.pdf
[Lassila, Swick, 1999] O. Lassila, R.R. Swick: “Resource Description Framework
(RDF): Model and Syntax Specification”, W3C Recommendation, World Wide
Web Consortium, February 1999
[LOCODE] http://www.unece.org/cefact/locode/
[Lu, Meng, Shu, et al, 2005] Y. Lu, W. Meng, L. Shu, C. Yu, K. Liu: Evaluation of
Result Merging Strategies for Metasearch Engines. WISE Conference, 2005
[Lu, Wu, Zhao, et al, 2007] Yiyao Lu, Zonghuan Wu, Hongkun Zhao, Weiyi Meng,
King-Lup Liu, Vijay Raghavan, Clement Yu: MySearchView: A Customized
Metasearch Engine Generator. 26th ACM SIGMOD International Conference on
Management of Data (SIGMOD 2007), Demo paper, p 1113-1115, Beijing,
China, June 2007
[Marradi, 1990] A. Marradi: Classification, Typology, Taxonomy. Quality and Quantity,
1990, XXIV, 2, p 129-157. Available at:
http://web.archive.org/web/20040705070709/http://www.unibo.edu.ar/marradi/cl
assqq.pdf (Visited 2004-01-04)
[McDowell, 2003] L. McDowell, O. Etzioni, S. Gribble, A. Halevy, H. Levy, W.
Pentney, D. Verma, S. Vlasseva: Enticing ordinary people onto the Semantic
Web via instant gratification. In: Proceedings of the 2nd International Semantic
Web Conference (ISWC 2003), October 2003
[Medjahed, Bouguettaya, 2005] Brahim Medjahed, Athman Bouguettaya: A Multilevel
Composability Model for Semantic Web Services, IEEE Transactions on
Knowledge and Data Engineering, July 2005, Vol 17, Issue 7, p 954-968
[Meehl, 1995] P.E. Meehl: Bootstraps taxometrics: solving the classification problem
in psychopathology. American Psychologist, 1995, 50(4), p 266-275
[Meng, Yu, Liu, 2002] W. Meng, C. Yu, K. Liu: Building Efficient and Effective
Metasearch Engines. ACM Computing Surveys, 34(1), March 2002, p 48-89
[Merholz, 2004] P. Merholz: Ethnoclassification and vernacular vocabularies, 2004
[metasearch] http://www.trln.org/events/NISO/NISOmetasearch.ppt
[Miles, Brickley, 2005] A. Miles, D. Brickley: SKOS Core Vocabulary Specification,
W3C Working Draft, 2005