
Automated Ontology Evolution:

Semantic Matching

Examination Number: 5858947

MSc in Speech and Language Processing

THE UNIVERSITY OF EDINBURGH
2010
I have read and understood The University of Edinburgh guidelines on Plagiarism and declare that this written dissertation is all my own work except where I indicate otherwise by proper use of quotes and references.
Acknowledgements

I would like to express my sincerest gratitude to my supervisor, Fiona McNeill, for her continuous support and encouragement throughout this project. She spent hours reading drafts and extending the pre-existing system to accommodate my newly created module. A special thanks goes to Alan Bundy, my second supervisor, who never refused to devote some of his precious time to me and whose every positive comment motivated me to work even harder. I am deeply grateful to my friend Dimitris Kartsaklis, who guided me through my first steps in programming and who was always there every time I needed someone to talk to. Many thanks to my friend Julien Eychenne for the help he offered to me during a difficult time. Last but not least, I would like to thank my parents for offering their love and support to me from miles away.
Abstract

This thesis deals with the problem of semantic mismatch in an agent communication environment. If agents use different words for the same real-world entities, is there any way to prevent communication breakdown? Can we do it in a fully automated manner? In the course of our discussion I will show how this problem can be addressed in practice and discuss the implications of such an approach for ontology engineering and ontology matching in the context of multi-agent systems. I will present the Semantic Matcher, a system that fully implements the ideas that I propose and handles semantic heterogeneity in a novel way: by combining ontologies with folksonomies.
CONTENTS

INTRODUCTION ............................................................. 7

METHODOLOGY .............................................................. 9

ORGANISATION OF THE THESIS .............................................. 10

CHAPTER 1 Setting the scene ............................................. 11

1.1 Ontologies .......................................................... 11
1.2 Ontology mismatch ................................................... 16
1.2.1 Ontology Repair System ............................................ 17
Summary of chapter 1 .................................................... 21

CHAPTER 2 Semantic Matching ............................................. 21

2.1 Our problem ......................................................... 22
2.2 Previous work ....................................................... 23
2.3 Challenges for on-line semantic matching ............................ 27
2.3.1 Implementation challenges ......................................... 27
2.3.2 Theoretical challenges ............................................ 28
2.3.3 The proposed solution ............................................. 34
Summary of chapter 2 .................................................... 37

CHAPTER 3 Implementation ................................................ 37

3.1 The Semantic Matcher ................................................ 37
3.1.1 Building a search engine .......................................... 39
3.1.1.1 Training the Text Acquisition model ............................. 42
3.1.1.2 Sense creation and Term Weighting ............................... 46
3.1.1.3 Query processing ................................................ 54
3.2 Evaluation & Analysis of Results .................................... 63
3.2.1 Effectiveness ..................................................... 64
3.2.2 Efficiency ........................................................ 71
3.3 Integration with ORS ................................................ 72
Summary of chapter 3 .................................................... 74

CHAPTER 4 Discussion .................................................... 75

4.1 Theoretical justification ........................................... 75
4.2 Implications for Ontology Engineering ............................... 79
4.3 Implications for Ontology Matching .................................. 81
Summary of chapter 4 .................................................... 82

CONCLUDING REMARKS ...................................................... 82

REFERENCES .............................................................. 84

APPENDIX ................................................................ 92

A.1 Glossary ............................................................ 92
A.2 Output of evaluation module ......................................... 94
A.3 Additions to PA's ontology .......................................... 99
A.4 ORS output ......................................................... 100

INTRODUCTION

We are entering an era where the amount of information produced and stored (e.g. in text, audio, images) makes our access to knowledge a complicated and time-consuming task. Representing knowledge in such a way that it can be handled by machines, and authorising intelligent agents to perform actions on behalf of humans using this knowledge, is desirable. However, ambitious efforts to satisfy such a demand at a large scale (e.g. the Semantic Web) seem to have reached a bottleneck because of the lack of agreement on a shared ontology, that is, a common representation of the world. Attempts to match different ontologies and update them to represent belief change using ontology matching techniques are of limited use in Semantic Web technologies and have had mixed success because they are still largely laborious 'offline' procedures, since they usually require human expertise and take place before agent interaction. Matching ontologies in advance is fruitless in an on-line environment where service-requesting agents discover service-providing agents automatically and may interact with them only once. Even the seemingly ideal scenario, that is, establishing a universal ontology to which every agent conforms, would still be faced with challenges such as accommodating different opinions (e.g. should 'tomato' be classified as a fruit or as a vegetable?), satisfying the ontology engineer's need for flexibility (e.g. what if 'hasPrice' is a two-place predicate but the engineer needs to add a third argument?) or adapting to new knowledge (e.g. how do we ensure that all ontologies on the web are updated simultaneously?). Moreover, a universal ontology that encodes information from all domains of knowledge would be too big to be usable in practical applications. It is obvious that insisting on a shared ontology is not only unrealistic but also undesirable, as it imposes constraints on how agents can represent their knowledge.

These issues are addressed within Automated Ontology Evolution, a theoretical framework in which communication between agents with disparate ontologies is facilitated. This idea was first introduced by Fiona McNeill (McNeill 2006), who built the Ontology Repair System (ORS); a system which tries to diagnose and repair ontology mismatches automatically and on the fly, that is, during agent interaction, for the purposes of the current communication needs. Mismatches can be of many types. For example, agents can use different words to express the same idea (e.g. loves(?X, ?Y) vs. likes(?X, ?Y) or capital(UK, London) vs. capital(UnitedKingdom, London)), predicates with different arities (e.g. hasPrice(ThisCD, 12) vs. hasPrice(ThisCD, 12, GBPounds)) and many other cases mentioned in the paper.

The first type of heterogeneity, that is, the use of different words for the same meaning, is dealt with in this project. In this paper I present the Semantic Matcher, a new ORS module that helps agents measure the semantic similarity of mismatched terms and negotiate meaning.

Throughout the study I test the hypothesis that combining formal ontologies with folksonomies (i.e. informal, 'folk' taxonomies) allows for efficient and effective matching in cases where relying on ontologies alone would lead to failed or poor matching. In the end I will demonstrate that incorporating matching tools based on these ideas into ORS extends the abilities of this system to allow agents to successfully interact even where their ontologies use different words.


METHODOLOGY

My approach to the problem of semantic matching in an agent communication environment is interdisciplinary. The design decisions that I make are motivated or justified by theories within Artificial Intelligence (Automated Ontology Evolution (McNeill 2006)1, Folksonomy (Vander Wal 2007)), Linguistics (Triangle of Reference (Ogden and Richards 1923)), Philosophy of Language (Intension vs. Extension (Carnap 1947) and models of conceptual structure) and Cognitive Science (Dual Theory of concepts (Laurence and Margolis 1999)).

The system that I present is built on the basis of Information Retrieval methodologies (Vector Space Model (Salton et al. 1975), tf-idf weighting scheme (Robertson and Spärck Jones 1976), Krovetz stemming algorithm (Krovetz 1993), Tag plateau optimisation algorithm (Finn et al. 2001)) and makes use of a number of databases (WordNet (Miller et al. 1993), SUMO/WordNet mappings (Niles and Pease 2003), SUMO ontologies from the Sigmakee repository2).

The Semantic Matcher is written in Python and was incorporated into the pre-existing Ontology Repair System, written in SICStus Prolog3.

Throughout the paper I make some assumptions and explain on what basis they are justified. I also illustrate my main points using tables, diagrams and formulas.

1 The original name as used in (McNeill 2006) is 'Dynamic Ontology Refinement'.
2 http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/
3 The complete code is available through Fiona McNeill (f.j.mcneill@ed.ac.uk).
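Since the tf-idf weighting scheme is one of the central IR tools used here, it is worth recalling its standard textbook form (a generic formulation given for reference, not a quotation from the cited papers): the weight of a term t in a document d is

    w(t, d) = tf(t, d) × log(N / df(t))

where tf(t, d) is the number of occurrences of t in d, df(t) is the number of documents in the collection that contain t, and N is the total number of documents. Term weighting in the Semantic Matcher is discussed further in section 3.1.1.2.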


ORGANISATION OF THE THESIS

This thesis is organised as follows:

In Chapter 1 I introduce the notion of 'ontologies' and some of their basic characteristics and address the problem of ontology mismatch in an on-line agent communication environment.

In Chapter 2 I explain the problem to be dealt with in this project, summarise some previous work in the area and discuss the theoretical and implementation challenges that our system has to meet.

In Chapter 3 I present the Semantic Matcher, discuss its design details, analyse the results from its evaluation and explain how it was integrated with the existing Ontology Repair System.

In Chapter 4 I show how my implementation decisions are justified by philosophical and cognitive theories of conceptual structure and explain how the system's design can provide an argument for the combination of ontologies and folksonomies and improve ontology matching in multi-agent environments.

The Appendix contains a glossary of basic terms, output from the evaluation module and from the Ontology Repair System, and all the information that was added to the SUMO ontologies before demonstrating an agent communication scenario.


CHAPTER 1 Setting the scene

1.1 Ontologies

In this section I briefly define what we mean by ontology in Artificial Intelligence, present the different types of ontologies according to their domain specificity, expressivity and ability to act on the environment, and describe some of their characteristics that will facilitate our understanding of later sections.

The term ontology originates from philosophy and can be referred to as "the study of what there is" (Hofweber 2004), that is, the study of how the world around us is organised. The attempt to categorise existence dates back to Aristotle's Metaphysics, and from then on many philosophers attempted to understand reality in this way (Buchholz 2006; Hofweber 2004). In the context of Knowledge Representation, an ontology is a (usually) machine-understandable4 model of the world or a particular domain of interest. Gruber (1993) defines an ontology as "an explicit specification of a conceptualization" that consists of terms (i.e. words for objects) and the relations between them. Ontologies are useful because they clarify how knowledge is structured (Chandrasekaran et al. 1999), so they support complex querying and inference, and enable agents to reason and perform tasks. This is the main tool for realising the vision of the Semantic Web (Berners-Lee et al. 2001); a web in which information is semantically explicit and can be searched intelligently5. In the authors' words:

    The Semantic Web will bring structure to the meaningful content of Web pages, creating an environment where software agents roaming from page to page can readily carry out sophisticated tasks for users.
    (Berners-Lee et al. 2001)

4 Although in theory an ontology can be specified in different languages (either formal or natural), "the utility of the Semantic Web relates primarily to formal ontologies which are machine-interpretable" (Fortier and Kassel 2006: 747; my emphasis).
5 For example, suppose that a student wants to find a 1-year postgraduate programme in Neurobiology at a North American university, which can provide full funding or pay for travel expenses. To find such a programme the student has to spend time looking at different websites and risks not exploiting all the options. The ideal situation would be to submit a query as complex as the student's information need and wait for a list of answers that satisfy this constraint. With the current data web this is not possible because machines cannot read natural language text and perform inference. In a Semantic Web, where all the knowledge is represented with ontologies, this will be achievable.

An ontology can represent objects in domains and relationships between them, functions, properties and processes in which the objects are involved, constraints and rules (axioms) (Daconta et al. 2003). Of course, how many of the above are actual components of a particular ontology depends on the ontology's formalism. What is important to understand is that, whatever the structure of the ontology, it does not have to be cognitively or metaphysically acceptable. Ontologies are just organisational systems and ontology construction is "an engineering design effort not an exercise in categorizing the world's content" (Gruber 2007).

Ontologies can model either specific domains (e.g. medicine, law, physics, fashion etc.) or general knowledge about the world. Examples of the former, called domain ontologies, are the Gene Ontology6, the Enterprise Ontology7, the Foundational Model of Anatomy8 and others. Examples of the latter, called upper-level ontologies, are SUMO (Pease et al. 2002)9, Cyc10, DOLCE11, WordNet (Miller et al. 1993)12 and others.

6 http://www.geneontology.org/
7 http://www.aiai.ed.ac.uk/project/enterprise/enterprise/ontology.html
8 http://sig.biostr.washington.edu/projects/fm/
9 http://www.ontologyportal.org/
10 http://www.cyc.com/
11 http://www.loacnr.it/DOLCE.html
12 This qualifies as an ontology under our definition of the term, which is broad enough to include taxonomies (see later discussion).

Different ontologies can have different expressive power (i.e. ability to describe the world). The simplest case is taxonomies, that is, graph structures which define a class hierarchy. For example: isSubclassOf(Place, Thing), isSubclassOf(Country, Place), isSubclassOf(City, Place). These can also include instances (e.g. isInstanceOf(China, Country)) or part-of relations (e.g. isPartOf(CityHall, City))13. More expressive graphs can have other kinds of relations, for example, hasColour(Sea, Blue), isBrotherOf(George, Susan). This kind of ontology makes use of binary relations only (i.e. relations that take two arguments) and is therefore not appropriate to represent facts like 'Mary eats fish every week', which typically require a ternary predicate (e.g. eats(Mary, fish, week))14. If we allow arity (i.e. the number of arguments) to be greater than two and also write axioms (rules that support inference) in quantified formulas15, then we have a fully fledged First-Order ontology. A good example of a first-order ontology is the Suggested Upper Merged Ontology (SUMO) (Pease et al. 2002) and the domain ontologies that extend it. Much of the work presented in this paper focuses on SUMO and its sub-ontologies.
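As a small illustration of the arity point above, the following self-contained Python sketch stores ground facts of different arities (unary, binary, ternary) in a single knowledge base. The predicates and facts are the toy examples from this section, not taken from an actual SUMO ontology.

    # Toy fact store: each fact is a (predicate, argument-tuple) pair, so relations of any
    # arity can be represented directly, which a purely binary graph structure cannot do.
    facts = {
        ("philosopher", ("socrates",)),              # unary
        ("isSubclassOf", ("City", "Place")),         # binary
        ("eats", ("Mary", "fish", "week")),          # ternary
    }

    def holds(predicate, *args):
        """Check whether a ground fact of any arity is in the knowledge base."""
        return (predicate, args) in facts

    print(holds("eats", "Mary", "fish", "week"))     # True  (ternary)
    print(holds("philosopher", "socrates"))          # True  (unary)
    print(holds("isSubclassOf", "Place", "City"))    # False (argument order matters)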

Ontologies typically have signatures (also known as T-box; terminological box) and theories (also known as A-box; assertion box) (Colucci et al. 2006: 106). The former include the class hierarchy and axioms and the latter contain facts about individuals. For some (e.g. (Bergman 2009)) the split is of fundamental importance while for others (e.g. Web Ontology Language16) it is not significant, but it is sometimes convenient to think in terms of two different kinds of knowledge (e.g. structural properties of the ontology vs. beliefs/facts). This distinction is important to remember because, as we will see later, the Ontology Repair System (presented in section 1.2.1) is able to perform not only belief revision but also signature repairs.

13 Subsumption (i.e. set inclusion) relations are called 'is-a' and can appear in different forms (e.g. is-a, isa, subclass, subclassOf, hasSubclass (inverse) etc.). Set membership relations can be represented as instanceOf, type, hasType etc. Part-of relations are also called 'has-a'.
14 This does not mean that it is impossible to model such facts in binary relations. For example, we could say eatsFish(Mary, week). But this is not a good enough solution since a predicate like eatsBread would be considered different from eatsFish, and would not answer the question 'What things does Mary eat?'. Another workaround could be: argument1(Mary, eats), argument2(fish, eats), argument3(week, eats), but this makes querying more difficult.
15 e.g. There is a student who likes all books: ∃x(student(x) ∧ ∀y(book(y) → likes(x, y)))
16 http://www.w3.org/TR/owlfeatures/

As mentioned above, ontologies can differ in terms of the domains they model and in terms of their expressivity. Another distinction that we can make is between static and dynamic ontologies. Most existing ontologies are static, that is, they declaratively represent facts that can be decided as true (if the fact is part of the ontology or can be inferred from other facts and axioms) or false. However, the truth or falsity of a fact is only a statement and does not do anything to the world. In multi-agent systems, ontologies serve as knowledge bases for agents, who are able to change the world with their actions (Baral 2010). For example, an agent can be authorised by a human to purchase a book on the internet; when it adds the fact, say, hasBought(agent, thisParticularBook) in its ontology (and therefore the fact becomes true) this is not just a statement but a transformation of reality, since now the human is the owner of the book and his/her credit card balance is lower17. Dynamic ontologies are of increasing importance in multi-agent systems, which are quite often based on a variant of the BDI model (Beliefs, Desires, Intentions) (see Wooldridge 2009): they have beliefs (i.e. facts in their knowledge base), desires (i.e. goals; facts that they would like to make true, i.e. to add among their beliefs) and intentions (i.e. actions whose effects amount to them fulfilling their desires). In order to bring about a desired state of affairs, agents need to follow a course of action, which is decided by planning18. Planning is an important notion to remember since it is central to the design of the Ontology Repair System (section 1.2.1). Furthermore, the need of agents to perform actions in order to fulfil their desires necessitates the existence of dynamic ontologies and not just static representations of the world.

17 It might sound counter-intuitive that ontology languages, which are declarative, can perform actions, but take Prolog as an example: If we open a Prolog interpreter and type write('hello world'), our 'fact' will be evaluated to true but this will also produce a side effect, namely the printing of 'hello world'. So now we have not just asked if write('hello world') is true, but we have also asked Prolog to do something for us. Of course, one obvious question is whether languages without an interpreter (e.g. OWL, KIF etc.; for definitions, see later discussion) can have the same effect. The answer is 'no', but there is a way to represent actions and their consequences in totally declarative ontologies so that they can affect the environment when translated into a language like Prolog. For example, PDDL (Planning Domain Definition Language) (Ghallab et al. 1998) represents actions in simple text files in terms of preconditions (i.e. the state of the world before the action) and effects (i.e. the state of the world after the action).
18 For example, if my goal (desire) is to be in Amsterdam in a week, I plan ahead to perform a number of actions (intentions), e.g. buy a ticket so that the state of affairs of me having a ticket is true (belief), then go to the airport so that my being at the airport is true, and so on. Planning ahead is known as 'practical reason' or 'practical reasoning' and was first introduced in Philosophy by Michael Bratman (Bratman 1987); see (Wallace 2008) for an overview.
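To make the preconditions-and-effects view of actions (footnote 17) concrete, here is a minimal, self-contained Python sketch in which a belief state is a set of facts and an action is applied only if its preconditions hold. The representation is a simplification of mine for illustration; it is neither PDDL nor the representation used by ORS.

    # A belief state is a set of ground facts; an action fires only if its preconditions hold,
    # and its effects then change the state (toy illustration only).
    state = {"hasMoney(agent)", "sells(shop, thisParticularBook)"}

    buy_book = {
        "preconditions": {"hasMoney(agent)", "sells(shop, thisParticularBook)"},
        "add":           {"hasBought(agent, thisParticularBook)"},
        "delete":        {"hasMoney(agent)"},
    }

    def apply_action(state, action):
        """Return the successor state if the action's preconditions are satisfied, else None."""
        if action["preconditions"] <= state:                    # all preconditions hold
            return (state - action["delete"]) | action["add"]
        return None

    print(apply_action(state, buy_book))
    # {'sells(shop, thisParticularBook)', 'hasBought(agent, thisParticularBook)'}  (set order may vary)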


Another thing to know about ontologies is that they are written in ontology languages. Some notable examples are OWL-DL (Web Ontology Language - Description Logic), a W3C recommendation for the Semantic Web19, which supports binary relations and formal definitions of concepts; RDFS (Resource Description Framework Schema)20, similar to OWL-DL but without formal definitions, and therefore it does not support complex inference; and KIF (Knowledge Interchange Format)21 (Genesereth and Fikes 1992), a first-order language which includes arities below and above 2 as well as quantification. It also supports non-monotonic reasoning. "SUO-KIF was derived from KIF [...] to support the definition of the Suggested Upper Merged Ontology" (Pease 2009). This is the language of the ontologies used in this project. For an overview of ontology languages see (Corcho and Gómez-Pérez 2000).

Finally, we should briefly discuss how words and formulas in first-order ontologies get their meanings, as this will be used in section 2.3.2, where I show that ontologies in multi-agent systems are vulnerable to symbol grounding problems. The example ontology language I will be using here is KIF. An ontology language has a syntax (i.e. formation rules, some of them recursive, that specify what kind of formal patterns can be generated from this language) and a formal semantics through which patterns (i.e. strings of symbols) are related to the objective world. Individuals are objects in the 'universe of discourse', that is, a conceptualisation of the things that exist in the world (Pease 2009), therefore they are already grounded in the real world22. Relations are sets of individuals (this is true of unary relations, i.e. classes) or sets of tuples of individuals. For example, the unary relation actress/123, which is just a string of characters, therefore a symbol, achieves semantic grounding by pointing to its denotation (i.e. the set of individuals that make the relation true) through an interpretation function. Under, say, interpretation 1, actress/1 means {Mary, MissBrown, SaraG, Paula}. The binary relation loves/2 gets its meaning by pointing to a set of pairs that satisfy it (e.g. loves/2 can mean {<Mary, John>, <George, Alice>, <Sophie, Mark>} under a particular interpretation). Formulas are connected to the world by pointing to truth values. For example, loves(George, Alice) returns 'true' because we can look at the already grounded relation loves/2 and confirm that <George, Alice> is a tuple in its denotation. This is all we need to know for the moment.

19 http://www.w3.org/TR/owlguide/
20 http://www.w3.org/TR/rdfschema/
21 http://wwwksl.stanford.edu/knowledgesharing/kif/
22 But as we will see later, grounding is not successful when ontologies are used in agent interaction.
23 The number after the slash represents arity.
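The extensional machinery just described can be illustrated with a small, self-contained Python sketch: an interpretation maps each relation symbol to its denotation (a set of tuples), and a ground formula is evaluated by checking set membership. The particular names and tuples are the toy examples from the text, not an actual ontology.

    # A toy interpretation: each relation symbol maps to its denotation, i.e. a set of tuples
    # of individuals (a unary relation is a set of 1-tuples). Illustrative only.
    interpretation = {
        "actress/1": {("Mary",), ("MissBrown",), ("SaraG",), ("Paula",)},
        "loves/2":   {("Mary", "John"), ("George", "Alice"), ("Sophie", "Mark")},
    }

    def evaluate(relation, arguments, interpretation):
        """A ground atom is true iff the argument tuple belongs to the relation's denotation."""
        return tuple(arguments) in interpretation[relation]

    print(evaluate("loves/2", ["George", "Alice"], interpretation))   # True
    print(evaluate("actress/1", ["Paula"], interpretation))           # True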

Now that we have seen what ontologies look like, we can proceed to our next section, which describes what happens when agents with disparate ontologies try to communicate.

1.2 Ontology mismatch

The Semantic Web, as imagined by Tim Berners-Lee (Berners-Lee et al. 2001), involves intelligent agents who are able to manipulate semantically explicit information and interoperate with other agents to carry out complex tasks on behalf of humans. But what happens when the interacting agents have different representations of the world? How is communication possible? The original view within the World Wide Web Consortium (W3C) was that agents somehow have to conform to a common ontology. As Heflin (2003) puts it on the W3C website: "Ontologies should be publicly available and different data sources should be able to commit to the same ontology for shared meaning." However, as discussed in our introduction, this is not only unattainable (Finkelstein et al. 1993) but also undesirable, since it imposes constraints on the ontology engineer's view of the world.

The problem of heterogeneous ontologies is addressed within the field of Ontology Matching (see (Euzenat and Shvaiko 2007) for an extensive overview). There are many different techniques that are currently being developed for the purpose of finding correspondences between parts of different ontologies (e.g. Matching, Merging, Mapping, Alignment, Translation etc.). In all of them, the idea is that one ontology is linked to another or two ontologies are linked to a central one with relations holding between their parts. Another option is to merge two ontologies into one. However, these techniques generally presuppose access to the ontologies of both agents involved, making them unsuitable for communicating agents. As mentioned in the introduction, this is not realistic since service-providing agents might not be willing to reveal parts of their knowledge base to agents who request their services24. Another reason why these approaches are not appropriate for multi-agent systems is that they are usually too slow for the requirements of on-line agent communication.

A solution to this problem was proposed by McNeill (McNeill 2006) (see also (McNeill 2007)), who built the Ontology Repair System; the subject of our next section.

1.2.1 Ontology Repair System

The Ontology Repair System (henceforth ORS) is a system designed to facilitate agent communication at run-time (i.e. during their interaction), not by establishing correspondences between the ontologies of the agents involved but by being available as a tool for one agent (known as the Planning Agent; henceforth PA), whose representation of the world is 'repaired' to match that of a Service Providing Agent (henceforth SPA). The PA tries to fulfil its goal (e.g. book a plane ticket) by forming plans and asking one or more SPAs to perform actions. ORS can be seen as a plug-in to the PA, which helps the agent fix parts of its ontology if they cause communication problems. This is a novel approach to ontology mismatch as it proposes meaning negotiation 'on the fly', even between agents that have never contacted each other before:

    ORS is the first example of a new breed of dynamic, automatic ontology repair mechanisms, which we believe will be essential to realise the vision of autonomous, interacting agents, such as envisaged in the Semantic Web.
    (McNeill and Bundy 2007)

24 For example, if an agent wants to buy a CD from Amazon, then it would be unreasonable to assume that it will have access to private information which is known to Amazon.

The main characteristics of ORS can be summarised below.

ORS:
- does not presuppose shared ontological representations among agents
- deals with agents who interoperate in a planning context, which is necessary for semantic web services
- is plugged into the Planning Agent (PA), and has access to its ontology
- has no access to any Service Providing Agent's (SPA) ontology other than what is revealed during communication
- works with first-order ontologies, which are rich enough to support planning
- performs not just belief revision (i.e. changes in the PA's facts), but also signature repairs (i.e. structural changes in the PA's ontology; changes to the class hierarchy, axioms etc.)
- is minimal (fixes only the parts that inhibit communication at a particular interaction)
- is dynamic and fully automated (no human involvement is required, which is after all the whole idea behind multi-agent Semantic Web systems)
- is written in SICStus Prolog25 and adapted to Ontolingua portable ontologies26 (written in KIF); see (Gruber 1992). With this project, it was also made possible for ORS to support the SUMO ontology and its sub-ontologies27 (written in SUO-KIF).
- consists of a Translation system (which translates the PA's ontology from KIF to Prolog), a Diagnostic system (which diagnoses the type of mismatch that has been discovered) and a Refinement System (which changes parts of the PA's ontology to make it compatible with the SPA's representation). In this project I added a Semantic Matching system, called the Semantic Matcher (discussed in chapter 3).

25 Currently running on SICStus version 4; see (Carlsson et al. 2010)
26 http://ksl.stanford.edu/software/ontolingua/
27 http://www.ontologyportal.org/

ORS is able to solve different kinds of mismatches (for example, existence vs. non-existence of facts, arguments appearing in different places etc.). In this project I am dealing with the most frequently occurring type of mismatches, namely semantic mismatches28, until recently unsolvable by ORS.

In another MSc project, Akinsola listed different kinds of mismatches that are expected to occur in multi-agent systems in the future (Akinsola 2008: 36-42). This list was based on an observation of the changes that occur when ontologies are updated in one of the largest repositories of ontologies, Sigmakee, which lists versions of SUMO and its sub-ontologies (i.e. MILO29 & domain ontologies). The idea was that the changes occurring from one version to another can provide us with an intuition as to what kind of mismatches to expect between the ontologies of different agents. For example, if we see that changing a predicate's arity is a common ontology revision practice, we can assume that in future multi-agent systems, different arity will be a common kind of mismatch. An adapted list can be seen in Table 1:

28 i.e. classes, individuals or relations having different names, e.g. loves vs. likes
29 Mid-Level Ontology


   REVISIONS IN SUMO REPOSITORY                             | EXPECTED ONTOLOGY MISMATCHES
   (from version x to version x+1)                          | (between ontologies of PA and SPA)

1  Changes in names for classes, relations or individuals   | Different names for classes, relations or individuals (semantic mismatches)
2  Addition of facts                                        | Existence vs. non-existence of facts
3  Removal of facts                                         | Existence vs. non-existence of facts
4  Recategorisation of classes                              | Classes with a different place in the hierarchy
5  Redefinitions of relations                               | Relations with different type restrictions
6  Changes in arguments for relations                       | Relations with different arguments
7  Hierarchical recategorisation of classes                 | Classes that are subclasses vs. classes that are instances of another class30
8  Addition of conjunction in axioms                        | Axioms with vs. axioms without conjunction
9  Removal of conjunction in axioms                         | Axioms with vs. axioms without conjunction
10 Removal of universal quantifiers in axioms               | Axioms with vs. axioms without universal quantification
11 Addition of universal quantifiers in axioms              | Axioms with vs. axioms without universal quantification
12 Removal of existential quantifiers in axioms             | Axioms with vs. axioms without existential quantification
13 Addition of existential quantifiers in axioms            | Axioms with vs. axioms without existential quantification
14 Changes in variable names                                | (not a mismatch; variables are just 'containers')
15 Changes in type restrictions for variables within axioms | Axioms with different type restrictions for variables
16 Removal of facts or arguments from axioms                | Axioms with vs. axioms without particular facts or arguments
17 Fixing typos                                             | Differences in wording
18 Refinement of inference rules                            | Individuals are members of different classes
19 Swapping of arguments in relations                       | Relations with arguments in different positions
20 Addition of subclasses to classes                        | Classes with vs. classes without particular subclasses
21 Implication sign (=>) changes to equality sign (<=>)     | Axioms with conditionals vs. axioms with bi-conditionals

Table 1: Ontology mismatches expected to be common in multi-agent systems

30 The latter is a second-order relation.


Currently ORS has the infrastructure for dealing with approximately 1/3 of the mismatches presented earlier. Semantic mismatches, which were addressed in this work, are described in the rest of the paper.

Summary of chapter 1

In this chapter I explained what ontologies are and described some of their basic characteristics. I also addressed the problem of ontology mismatch and presented ORS, an on-line ontology repair system, which I have extended for the purposes of this project. After having set the scene for our discussion, we can proceed to our next section, which looks at how semantic mismatches are to be dealt with.

CHAPTER 2 Semantic Matching

In the context of this paper, semantic matching is defined as determining synonymy between one or more words from the PA's ontology and one or more words from the SPA's ontology31. These words are strings of characters (linguistic symbols) that represent classes, individuals or relations. Throughout this study I refer to them using the cover term 'lexeme'32.

31 For example, how can we determine that 'Cat' and 'FelisCatus' refer to the same class of things in the world?
32 The term 'lexeme' is used by Pease (2009) to refer to strings of characters separated by whitespace. This definition allows variable names (e.g. ?X, ?Y), quantifiers ('forall', 'exists') and operators ('and', 'or', '=>' etc.) to be lexemes. My definition is, therefore, more restrictive.


2.1 Our problem

Semantic matching in the context of on-line agent communication can be of use in two situations, which I call the surprising lexeme situation and the needed lexeme situation. In the former case, the PA receives a query from the SPA33 (e.g. in the form of checking preconditions for an action it has been asked to perform) and one of the lexemes in the query (be it a relation, an individual or a class) is unknown to the PA. The PA's plan to achieve the goal by asking the SPA to execute the action will fail. Then the ORS Diagnostic System will be consulted, which can determine either that the lexeme is missing from the PA's ontology, or that there needs to be a correction (e.g. a typo) in order for the PA to recognise the lexeme, or that there is another lexeme which has the same meaning but a different symbol, in which case semantic matching has to be performed. In the needed lexeme situation, the PA wants the SPA to perform an action but the plan fails because the PA uses different words from the ones the SPA expects. If the needed lexeme is a class name or the name of an individual, the PA can submit a query to the SPA to retrieve the relevant information. This of course presupposes that the name of the predicate is shared. But if it is the predicate that needs to be matched, first-order logic is not enough for this, therefore we need to make some assumptions, for instance, that SPAs support higher-order logics34 or that the SPA is helpful enough to do the search for the PA and suggest a matching predicate. The needed lexeme situation is outside my scope in this study, as is complex diagnosis35. The problem I am dealing with is: Given that the PA receives a surprising question and the diagnostic algorithm determines that there is a semantic mismatch, can we perform semantic matching (i.e. search your ontology and find the synonymous lexeme)? After this stage, the Refinement module will replace the old lexeme with the SPA's word and replan until the plan succeeds or no plan can be found.

33 To simplify our work, we assume that the PA contacts only one SPA in order to get an action performed. However, in real-life situations it is expected that the PA will achieve its goals by requesting services from various agents, according to their ability to perform certain actions. This assumption does not affect the results of our work on semantic matching; it just helps us avoid complex scenarios which are unnecessary for our task.
34 Implementation-wise this is possible in Prolog. For some ideas on how to retrieve predicates using second-order logic in Prolog, see (Sterling and Shapiro 1994), chapter 16.
35 Diagnosing semantic mismatches can be a very complex task, and might need to involve probabilistic techniques. For the purposes of this project, the Diagnostic Algorithm is simplified to determine in advance that Semantic Matching is required; however, as part of the ORS Diagnostic System, the Semantic Matcher diagnoses where exactly (i.e. in which lexemes) the problem lies.

Before we discuss how this problem can be dealt with, let's look at some previous work done on semantic matching and some extra challenges that a multi-agent environment poses.

2.2 Previous work

The term semantic matching was first introduced by Giunchiglia and Shvaiko (Giunchiglia and Shvaiko 2003), who described it as a process in which a 'match' operator takes two graph-like structures (e.g. database schemas or ontologies) and produces a mapping between elements of the two graphs that correspond semantically to each other36. Semantic matching is contrasted to what they call syntactic matching, which matches nodes of a graph by looking at 'labels', that is their string similarity (e.g. 'phone' vs. 'telephone'). What the authors propose instead is a matching on the basis of meaning, which is encoded in the graph structure (i.e. by looking at a node's 'ancestors', 'children' and 'sisters' in the tree structure). This would enable nodes like 'Europe' and 'pictures' to be matched as equivalent (synonymous) if their ancestors are 'Images' and 'Europe' respectively, given that both nodes mean 'pictures of Europe'37. S-Match (Giunchiglia et al. 2004) was the first system that implemented semantic matching as described above. Its input is two graphs and its output is pairs of nodes from the two graphs and their relations (equivalence, more general, less general, mismatch, overlapping).

36 Prior to this, the term 'semantic matching' was used in a broader way.
37 We should also note that relations (i.e. arcs in the graph) are not labelled and that synonyms are found in WordNet (discussed in section 3.1.1.1).


My definition of 'semantic matching' at the beginning of this chapter is narrower with respect to the system's desired output, as it is only concerned with finding equivalence (=) relations between entities in two ontologies, and broader with respect to the techniques that the system can exploit, as manipulating the 'label' itself38 is also permissible. Semantic matching, as seen in this study, can benefit from what Euzenat and Shvaiko call 'name-based' techniques for ontology matching (Euzenat and Shvaiko 2007), which compare lexemes (i.e. words) on the basis of their form and/or the entities they denote. These techniques are subdivided into string-based methods and language-based methods. The former are very close to what Giunchiglia and Shvaiko called 'syntactic' (as opposed to 'semantic') matching since they try to compute correspondences between lexemes disregarding the meaning behind them (Giunchiglia and Shvaiko 2003). For example, using such a method, it is easier to match 'bold' to 'bald' than 'bold' to 'fearless'. String-based matching systems typically involve a number of steps before the lexemes are compared: normalisation (e.g. case normalisation ('CD' → 'cd'), diacritics suppression ('crêpe' → 'crepe'), blank normalisation ('world\tcup' → 'world cup')39, link stripping ('easy-going' → 'easy going'), digit suppression ('cat2144' → 'cat') and punctuation elimination ('C.D.' → 'CD')), string equality (i.e. checking if two strings are equal ('catch' == 'cat'? False)), substring test (checking if one string is part of the other ('cat' in 'catch'? True)), edit distance (e.g. (Levenshtein 1965); 'cat' and 'catch' have distance 2), path comparison (applicable to schemas or directories, e.g. 'Images/Europe/Italy' vs. 'Europe/Pictures/Italy') and token-based distances (discussed later). Language-based methods, on the other hand, regard a string not as a series of symbols (characters) but as a little text (e.g. 'goodwill ambassador' means something like 'ambassador who has goodwill' or 'ambassador because of goodwill' or 'ambassador' + 'good' + 'will'40). These methods can involve linguistic normalisation such as tokenisation ('easy-to-cook cake' → [easy, to, cook, cake]), lemmatisation (also known as stemming; dropped → drop), term extraction (e.g. from 'theory paper' we extract the central term 'paper' because 'theory paper' means 'paper on theory'), stopword elimination ('a long way to go' → [long, way, go]) and extrinsic methods such as consulting: dictionaries (in order to compare their glosses41), multilingual lexicons and thesauri (i.e. dictionaries of synonyms, hypernyms42 and other relations, e.g. WordNet (Miller et al. 1993)).

38 e.g. segmenting the lexeme (i.e. the entity's 'label') into words ('BalletDancer' → ['ballet', 'dancer'])
39 \t represents tabulation. Other 'empty' characters used in blank normalisation are \r (carriage return), \n (newline) etc.
40 I intend this as a concatenation of meanings and not strings.
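As an illustration of the string-based steps just listed, here is a minimal, self-contained Python sketch of case/punctuation normalisation and Levenshtein edit distance. It is a generic textbook implementation, not the code used in the Semantic Matcher.

    import re

    def normalise(label):
        """Case normalisation, link stripping, digit and punctuation suppression (cf. section 2.2)."""
        label = label.lower()                      # 'CD' -> 'cd'
        label = re.sub(r'[-_]', ' ', label)        # 'easy-going' -> 'easy going'
        label = re.sub(r'\d+', '', label)          # 'cat2144' -> 'cat'
        label = re.sub(r'[^\w\s]', '', label)      # 'c.d.' -> 'cd'
        return re.sub(r'\s+', ' ', label).strip()  # blank normalisation

    def edit_distance(a, b):
        """Levenshtein distance: minimum number of insertions, deletions and substitutions."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,               # deletion
                                curr[j - 1] + 1,           # insertion
                                prev[j - 1] + (ca != cb))) # substitution
            prev = curr
        return prev[-1]

    print(normalise("C.D."))              # cd
    print(edit_distance("cat", "catch"))  # 2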

Examples of systems that make use of 'name-based' techniques are too many to discuss here. A list can be found in (Euzenat and Shvaiko 2007: 187-192) and brief descriptions of their main components can be seen on pages 153-187. What is important to note is that all of these systems support graph-like structures (e.g. XML, RDF etc.) or ontologies based on Description Logics (e.g. OWL), which are less expressive than first-order ontologies like KIF, which is used in ORS. Moreover, none of them is designed for multi-agent systems, and therefore they are not directly applicable to the problem addressed in this paper. In chapter 3 we will see that although the methods I use are language-based (because they compare meanings and not strings of characters), my system also benefits from string-based techniques such as normalisation and token-based distances, but diverges significantly from their traditional use (see chapter 3 for implementation details).

One system, built by Qu and his colleagues (Qu et al. 2006), is worth mentioning, since it provided the inspiration for the Semantic Matcher, which I am going to present later. This system computes correspondences between nodes of RDF graphs (which are words)43 by constructing 'virtual documents' for each one of them. The term 'document' comes from Information Retrieval and means a 'bag' of unordered words that represent a web page. In the system described, 'virtual documents' representing nodes are compared for similarity using the vector space model (see section 3.1.1.3). The bags are generated from the tokenised lexeme (i.e. the name of the node) but also from 'neighbouring information', that is, names of other nodes connected to it in the graph. The researchers avoid the use of external linguistic resources such as WordNet on the basis that it is too computationally expensive, which is reasonable since their system compares all virtual documents of one graph against all virtual documents of another graph. The design of the Semantic Matcher is influenced by this system, but as we will see it has major differences (e.g. information to be thrown in the bags is aggregated from all possible sources, WordNet is used because we don't need to compare whole graphs etc.). Euzenat and Shvaiko classify this approach under 'token-based distances' because it treats a lexeme as a 'bag of words' (i.e. a set of tokens) (Euzenat and Shvaiko 2007). It is important to note that the authors regard it as a 'string-based' method on the basis that similarity comparison between bags is in fact string comparison between words in the bag. However, as I will show in the rest of the study, creating appropriate bags for lexemes allows us to predict similarity of meaning even between pairs such as 'corn' and 'maize', and therefore it deserves to be viewed as a proper semantic as opposed to syntactic44 or string-based matching technique.

41 i.e. natural language definitions
42 e.g. 'animal' is a hypernym of 'koala'
43 In fact, both nodes and arcs in RDF graphs are Uniform Resource Identifiers (URIs; discussed later)
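To make the bag-of-words comparison concrete, here is a minimal, self-contained Python sketch of cosine similarity between two bags under the vector space model, using raw term frequencies as weights. It is a generic illustration of the technique, not the Semantic Matcher's actual code (which, as later chapters describe, also applies term weighting and further processing); the two bags for 'corn' and 'maize' are invented for the example.

    import math
    from collections import Counter

    def cosine_similarity(bag_a, bag_b):
        """Cosine of the angle between two term-frequency vectors built from bags of words."""
        tf_a, tf_b = Counter(bag_a), Counter(bag_b)
        shared = set(tf_a) & set(tf_b)
        dot = sum(tf_a[t] * tf_b[t] for t in shared)
        norm_a = math.sqrt(sum(f * f for f in tf_a.values()))
        norm_b = math.sqrt(sum(f * f for f in tf_b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    # Two hypothetical bags built for the lexemes 'corn' and 'maize'
    corn  = ['cereal', 'grain', 'plant', 'yellow', 'crop', 'grain']
    maize = ['cereal', 'grain', 'crop', 'plant', 'corn']
    print(cosine_similarity(corn, maize))   # high overlap gives a score close to 1 (about 0.79 here)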

Now that we have seen some basic semantic matching techniques, we can look at why designing a system for meaning negotiation between agents cannot follow all these methods to the letter.

44 according to Giunchiglia and Shvaiko's use of the term 'syntactic', because in other contexts 'syntactic matching' is matching of structures (Giunchiglia and Shvaiko 2003)


2.3 Challenges for on-line semantic matching

2.3.1 Implementation challenges

Semantic matching in ORS cannot be performed by strictly following any of the ways described above. As discussed earlier, ORS operates in a realistic environment, that is, it tries to resolve agent communication conflicts that will inevitably occur in the future while taking into account all the limitations that this involves: i) time pressure, ii) agents' privacy, iii) ontologies have first-order expressivity, because planning can't be successfully supported by less expressive ontologies, but this automatically raises the complexity of the representation45, and iv) ontologies are dynamic as opposed to static46. These considerations will place some constraints on our choice of semantic matching technique. In our project, the need for efficiency, combined with the semi-decidability of First-Order Logic (henceforth FOL), will discourage us from employing complex reasoning (that is, trying to infer the word's meaning by means of the relations it is involved in or the axioms that encode facts about the entity referred to by the word). The expressive power of the ontology will also prevent us from thinking in terms of graph-like constructs. Graphs are tree structures (usually classifications, taxonomies) where all relations are necessarily binary: every arc is a relation47 and its ends are two nodes; the only two arguments it can take. However, in FOL we can have higher arities (e.g. likes(john, squash, summer)) or lower (i.e. unary relations like philosopher(socrates)). Another issue is that many of the mismatches are at the predicate level (section 1.2), which means that we need to match not only classes and individuals (i.e. 'nodes' in a graph) but also relations. Finally, even if we think of the PA's ontology as beyond tree structures, we have another constraint: there is no other ontology with which to find correspondences. Given the PA's (and ORS's) limited access to the SPA's ontology because of privacy issues, we cannot take for granted that the latter's ontological representation will be available to us. The most likely scenario is that the PA has only seen a few words from the other agent's ontology; the ones that the SPA was willing to reveal during the interaction48. Hence, there is no issue of matching two ontologies, but rather, matching lexical items against an ontology. This means that in a semantic matching situation, we only have one lexeme and we try to find its synonym inside the PA's ontology.

45 although currently less expressive ontologies are more widespread
46 i.e. they include action concepts with preconditions and effects, with which agents act on the environment and change the world.
47 e.g. isa, partof, instanceof, or perhaps ancestorof in family trees, higherthan in military hierarchies etc.

2.3.2 Theoretical challenges

The theoretical challenge that our system has to meet is to tackle the root of the evil, that is, to uncover and solve the problem that causes semantic mismatch. This section attempts to explain what happens in an agent's mind, that is, how the agent represents the world and what the world is anyway. Our discussion will expose the problem of symbol grounding which, as I claim, is the hidden cause of semantic mismatch, and will provide the motivation for the design of a semantic matching system that is theoretically founded on a more robust notion of meaning. This section will also provide the background for a better understanding of the final chapter, where the nature of this meaning is explained. But, before we proceed, we have to answer this question: What does meaning mean?

Meaning is understood in two different ways in Philosophy of Language: as intension49 (i.e. mental representation) and as extension (i.e. the set of things in the world that the lexeme designates50) (Carnap 1947). The distinction between these two facets of meaning was made by Frege (1892), who suggested that signs (i.e. linguistic symbols; words) refer to entities by means of thoughts. In other words, sense (meaning in the head; intension) mediates between the word and the object in the world to determine reference51. Some theorists have focused more on the external aspect of meaning, that is extension (most notably Hilary Putnam (Putnam 1975) and formal semanticists such as Richard Montague (Montague 1970)), based on the idea of meaning as reference to the objective world, which is after all the purpose of language. Viewing meaning as only extension is methodologically convenient: the meaning of a proper name is just the person named (Kripke 1972); the meaning of a verb, adjective or noun (i.e. predicate) is a set of individuals52 in the world; and sentences (i.e. formulas) are determined as true or false with the help of truth functions (i.e. by checking whether an individual or tuple is in the extension of a particular predicate). Extensional definitions are also good at explaining how meaning is shareable among humans, despite the different conceptualisations that each person might have for the same word. However, restricting semantic assignment to reference is not immune to problems: For example, what is the meaning of 'Santa Claus' since no such individual exists? And why is the sentence 'Superman is Clark Kent' not a tautology if the meaning of the two names is the same person in the world?53 This suggests that intension, meaning in the head, is also necessary if we want to have a complete semantic theory. Indeed, nowadays much of the research in Semantics and Cognitive Science has focused on the nature of mental representations and how they determine reference to the world (see (Laurence and Margolis 1999) for a good overview). For example, what is the structure of my concept for cat and how does it help me decide what is, and what is not, a cat in the world? The question of what model of conceptual structure is more plausible will be discussed in the last chapter. What we need to understand now is that words are encoded into concepts in our head and are used to refer to entities in the world, hence meaning can't be only reference, it is also sense. This is very important to remember because the system presented in chapter 3 is theoretically founded on the idea of building mental representations in the agent's head because, as I will show, it is in this way that their words can refer to the world. But how does the whole process work in humans? This will be described with the famous triangle of reference.

48 This is also useful as matching two (potentially large) ontologies can be expensive and largely unnecessary.
49 Note that the word is spelt with an 's' and is not to be confused with 'intention'.
50 Of course, meaning is not only a property of words, but of everything composed out of words, including sentences. For example, the intensional aspect of sentence meaning is the proposition, that is the 'thought' encoded in the sentence, and the extensional aspect is the set of situations which make it true.
51 From now on I'm going to use the pairs intension-extension and sense-reference interchangeably.
52 or tuples of individuals for arity greater than 1
53 A further problem will be exposed later, when we talk about agent communication.
willbedescribedwiththefamoustriangleofreference.

The triangle of reference (also known as the 'meaning triangle'), introduced in Ogden and Richards' classic book The Meaning of Meaning (Ogden and Richards 1923), is a schematic representation of how words are related to entities in the world. As can be seen in Diagram 1, the linguistic symbol54 'cat' on the left stands for the set of entities that the word extends to (i.e. all the cats in the world), but this relation is only indirect (hence the dotted line) because what mediates is the 'thought', that is, the sense CAT of the word.

Diagram 1: Human mental representations

Apart from solving philosophical problems, as discussed above, intensional meaning also accounts for what we call 'understanding' of the word. As we will see later, even if agents' ontologies have a perfectly sound semantics for lexemes, it is useless for semantic matching because the agents don't understand any of the meaning. Now that we have seen how words get their meaning in human language, let's see what happens in the case of agents. Imagine an agent having the lexeme 'cat' in its ontology, say, as a class. Usually, ontologies have extensional semantics (e.g. SUO-KIF ontologies (Pease 2009)), that is, words contained in them are assigned meaning with the help of an interpretation function (see chapter 1), which maps them to their denotation (i.e. the set of individuals X from the domain of discourse for which cat(X) is true). Therefore, linguistic symbols get their meaning by pointing to individuals in the domain of discourse. That makes a very good semantics for ontologies, but there is something wrong: the domain of discourse is not the objective world55 itself but a symbolic world written with other symbols inside the ontology. For example, if the predicate cat/1 returns 'true' when instantiated in the facts cat(Fluffy), cat(Kitty) and cat(Cat2563), then the meaning of cat will be the set {Fluffy, Kitty, Cat2563}. But these individuals are still lexemes in the agent's knowledge base, which are not grounded in the real world:

54 i.e. string of characters or sequence of speech sounds that represent a lexical unit. To be precise, this is not a symbol but a sign in that it is conventionally associated with its referent (need citation). In the context of this study I will use the two terms interchangeably.

    When the knowledge of a domain is represented in a declarative formalism, the set of objects that can be represented is called the universe of discourse. This set of objects, and the describable relationships among them, are reflected in the representational vocabulary with which a knowledge-based program represents knowledge.
    (Gruber 1993; my emphasis)

So, the following situation seems to be the case:

55 our world; what we would call the 'contingent' world in modal philosophy


Diagram 2: Agent 'mental' representations (the situation now)

As shown in the above diagram, the lexeme 'cat' fails to refer to the extension of cats in our world but instead designates a set of other uninterpretable lexemes. Even worse, this set is in the agent's ontology, that is, in their mind. What seems to be happening here is a conflation of sense and reference: If semantics is extensional, why are the referents in the agent's mind? And if it is intensional, why doesn't the mental representation mediate to establish the connection between 'cat' and the set of cats in our world?

However, one could raise the objection that the world does not need to be anything more than a virtual world in the ontology and that semantics in ontologies has nothing to do with the contingent world. This rescues formal semantics and it is perfectly acceptable, but useless if agents have to interact with each other. For example, let's say we have two agents, Jerry and Tom. Jerry knows that cat/1 means {Fluffy, Kitty, Cat2563} and Tom knows that felisCatus/1 means {MyLovelyCat, Kitty, Max, Smokey, Kitten4}. How can the agents determine that cat and felisCatus are synonymous? They can't, because in extensional semantics, synonymous words should be co-referential, that is, they should point to the very same set in the world. But in Jerry and Tom's case, sets are ontology-internal, therefore agent-specific, therefore unable to refer to the 'very same' set. Even if the wording is the same and sets appear to be identical, synonymy cannot be established because names are not grounded56. Here it seems to be the case that each agent has its own private language (to use Wittgenstein's 1953 terminology (Wittgenstein 1953)) which is "comprehensible only to its single originator because the things which define its vocabulary are necessarily inaccessible to others" (Candlish 2008)57. What we need in order to achieve agent communication is a connection between the agent's lexemes and the real world. But as we saw, this has to be achieved through intension. In other words, we should build a sense for every lexeme in the agent's ontology; a mental representation that can help the agent understand the words' meanings and mediate in order to designate the set of cats in the real world. As shown in the diagram below, what we need to achieve is to create a sense CAT58 in the agent's mind.

Diagram 3: Agent 'mental' representations (what we would like to have)

Thenatureofthismentalrepresentationwillbediscussedinthelastchapter,where

56 One could say that the ontology engineers of the two agents assign meaning to 'cat' and 'felisCatus'
respectively, so through the engineers' minds the language-world connection we require is established; but that
would only be useful if there was communication between the engineers, therefore human and not agent
interaction.
57 Toavoidconfusion,itisimportanttomentionthathumanlanguageincludesbothsyntacticrulesandwords,
whosecombinationgeneratessentences(Chomsky1957).However,ontologylanguagessuchasKIFcontain
onlysyntactic rules,basic operators and quantifiers; words are generatedand their meaningis assigned
pseudoextensionally,asIclaimed.Thatexplainswhylexemesandtheirmeaningsarenotalreadyshared
amongagentswhohaveacommonontologylanguage.
58 IcapitalisewordsformentalrepresentationsaccordingtotheconventioninPhilosophyofLanguageand
CognitiveScience.


wewillseehowtheSemanticMatcheriscompatiblewiththisnotionofmeaningand
whatconsequencesthishasforontologyengineering.

Belowisatableofsynonymsforsymbols,mentalrepresentationsandentitiesinthe
world,asfoundinfamousworksinPhilosophyofLanguage(Frege1892;Carnap
1947),Semiotics59 (Saussure1916;Peirce19311958collectedworks)andLinguistics
(OgdenandRichards1923).ThetermsthatIwillmainlybeusingare lexeme, sense
andreferencerespectively.

 | LANGUAGE (symbol) | THOUGHT (concept) | WORLD (entity)
Frege 1892 | zeichen (sign) | sinn (sense) | bedeutung (reference)
Saussure 1916 | signifiant (signifier) | signifié (signified) |
Ogden and Richards 1923 | symbol | thought | referent
Peirce 1931-1958 | representamen | interpretant | referent, object
Carnap 1947 |  | intension | extension
In this paper | lexeme, word | sense, intension, mental representation | reference, referent, extension, set, entity

Table 2: Triangle of reference (terminology)

2.3.3 The proposed solution

As we saw in section 1.2, semantic mismatch is very common across existing


ontologiesandpreventsmutualintelligibilityinagentcommunication.Thesolution
offered in this project breaks away from standard semantic matching techniques.
Here the alignment (matching) process is seen as an Information Retrieval

59 i.e.thestudyofsigns


(henceforthIR)task:givenasurprisinglexeme,thesystemhastosearchthroughall
thePA's 'candidate'lexemesandreturntheones thataresemanticallycloseto it.
Candidatelexemes includenames ofclasses,individuals orrelationsandexclude
variablenames(e.g.?X?Y),quantifiers('forall','exists')andoperators('and','or','=>'
etc).Itwillhopefullyturnoutintheendthatthisisanadequatewayofmeetingboth
theimplementationalandtheoreticalchallengesdiscussedabove.

Thesystemwhichcomputessemanticsimilarity,calledSemanticMatcher,isasearch
engine whose queries and documents are intensional meanings (senses). As a
thoughtexperimentimagineaGoogleofmeanings,wherewecaninputamental
representationandgetasoutputarankedlistofsimilarmentalrepresentations.In
the Semantic Matcher these meanings are simulated by bags of words (that is
multisets60ofwords(Lewis1998:6)).Thebagofwordsmodel(oftenabbreviatedto
bow)isapopularassumptioninInformationRetrievalwherebydocuments(i.e.
representationsofwebpagesaftertextprocessing)aresetsofindexterms(words)
whichcanretaintopicalmeaningwithoutretainingtheoriginalorder.Thefrequency
ofeachwordinthebagdeterminesitsimportanceinthedocument.Forexample,a
web page which talks about the song The rain in Spain (see diagram 8) can be
represented as the bag [where, soggy, plain, spain, spain, rain, spain,

stay, mainly, plain].Abagofwordscanbevisuallyrepresentedasatagcloud,a

term from Folksonomy (Vander Wal 2007). Folksonomy is a folk taxonomy that
emergesoutofcollaborativetaggingofinformationresources.Forexample,usersof
Delicious61 can label bookmarked web pages with keywords that they consider
relevanttothepage'scontent.Wordsthatarerepresentativeofthepagetendtobe
usedmoreoftenandgraduallyabottomupnotcentrallycontrolledclassificationof
bookmarkswillemerge.Atagcloudisthedepictionofabagofwordscreatedby
usersforaparticularresourceandconsistsofacollectionofkeywordswhosesize

60unorderedlistswhosemembers(wordsinourcase)canappearmorethanonce;alsoknownasweightedset
(Blizard1988).Forexample, [scissors, pen, ink, pencil] isasetwhile [scissors, pen, pen,
pen, ink, pencil, pencil]isamultiset.
61 http://www.delicious.com/


differsaccordingtotheirimportance.Inthefollowingtwopictureswecanseewhata
bagofwordsandatagcloudforthewordbachelorcanlooklike.

Picture 1: Bag of words for lexeme 'bachelor'
[bachelor, bachelor, unmarried, male, man, single, young, enjoy, nightclubs, man, wife,
whiskey, bachelor, marry, women, life, women, unmarried, single, outgoing, male]

Picture 2: Tag cloud for lexeme 'bachelor'
(keywords of varying size: whiskey, wife, marry, single, nightclubs, bachelor, young,
unmarried, man, outgoing)
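To make the multiset idea concrete, a bag of words can be held in a standard Python Counter, which simply records how many times each keyword occurs. The following is a minimal sketch using the 'bachelor' keywords above; it illustrates the data structure only and is not code taken from the Semantic Matcher.

from collections import Counter

# A bag of words is a multiset: order is ignored, repetitions are kept.
tags = ["bachelor", "bachelor", "unmarried", "male", "man", "single",
        "young", "enjoy", "nightclubs", "man", "wife", "whiskey",
        "bachelor", "marry", "women", "life", "women", "unmarried",
        "single", "outgoing", "male"]

bag = Counter(tags)

# The counts play the role of tag sizes in a tag cloud:
# the more often a keyword was used, the more prominent it is.
for word, count in bag.most_common(5):
    print(word, count)
# e.g. bachelor 3, then the words used twice (ties may print in any order)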

Folksonomywillbediscussedfurtherinchapter4whilethebagofwordsmodelin
IRwillbeshowntoworkinpracticeinchapter3.TheSemanticMatcherconstructs
sensesbycollectinginformation,whichisparsedtorenderabagofwords.Inthe
nextchapter,thesystem'sarchitecturewillbedescribedbutitisonlylaterthatIwill
showwhyIchoosetocreatementalrepresentationsinthisway.


The outcome of this implementation cannot be compared against previous work


because,tomyknowledge,itisthefirstattempttoincorporatesemanticmatchingin
anagentcommunicationsystemthathas minimalaccess tooneoftheontologies.
Therefore,thereisnoanswertothequestionofwhetherthisisabettersolutiontothe
problem.However,itwillhopefullyturnoutlaterinthestudythatthisisa good
enough solution, not only because of its taskeffectiveness and efficiency, but also
because of its theoretical foundation (section 4.1) and its implications for the
semanticsofformalontologies(section4.2)andforontologymatching(section4.3).

Summaryofchapter2

In this chapter I explained the problem that our system has to deal with and
summarised some previous work done in the area of semantic matching. Then I
discussedthechallengesthatonlinesemanticmatchinghastomeetandintroduced
theprinciplesthatguidedthedesignoftheSemanticMatcher.Nowwecanproceed
tothedetailsofthesystem'simplementation.

CHAPTER 3 Implementation

3.1 The Semantic Matcher

TheSemanticMatcherisasearchenginethattriestofindthe'bestmatch'forthe
SPA'slexemeamongthecandidatelexemesinthePA'sontology.TheSPA'slexemeis
expandedintoa'bagofwords'thatmakeupitsintensionalmeaning(i.e.sense).This


bagwillserveasaquerytothesearchengine62.ThePA'scandidatelexemesarealso
bagsofwords,actingasacollectionofdocumentswhichwillberankedfromthemost
totheleastrelevant63afterthesearchhasbeenperformed.

Throwing unordered sets of words inside an ontology, some of which might fail to
capture the meaning of the lexeme they are supposed to describe, might seem to be
outside the spirit of ontology engineering and formal reasoning, where definitions
have to be precise. However, as discussed earlier, the formal approach has limitations
as to semantic assignment and the approach proposed in this study has its own
merits. First, it produces accurate and fast results (see section 3.2), as is required in
an agent communication situation. Second, it establishes a notion of meaning (in
particular, a combination of formal definitions of lexemes from the ontology and
informal knowledge associated with them) which is compatible with current research
in Philosophy of Language and Cognitive Science (see section 4.1). The third and
most important contribution of such a model of semantic matching is that it brings
together ontologies with folksonomies64, and points to an interesting research
direction in Ontology Engineering, where lexemes can achieve some semantic
grounding (i.e. relation between the words in the ontology and the objective world),
which can in turn i) compensate for the potential absence of Uniform Resource
Identifiers (URIs)65, ii) partly solve the symbol grounding problem associated with

62 e.g.somethinglikewhatwewouldsubmitasinputtoGoogle;nottobeconfusedwithaformalquerywritten
inSQL,SPARQLetc
63 RelevanceisafundamentalnotioninInformationRetrieval,whichwecouldthinkofintwodifferentways:1)
asameasureofhowwellthesearchenginesatisfiestheuser'sinformationneeds(knownasuserrelevance)
and2)asameasureofhowsimilarthequery'scontentistothedocument'scontent(topicalrelevance)(Croft
etal.2010:4).Inrealuserenvironments,thefirstkindofrelevanceiscrucialforarankingalgorithmto
predict,butinmanyothercases,includingtheSemanticMatcher,allwecareaboutistopicalrelevance.
Therefore,inourcontext'relevance'isanotherwordfor'similarity',andthiswillbecomemoreevidentlater
whenwetalkabouttheVectorSpaceModel,whichcomputesthesimilaritybetweenaqueryandevery
documentinthecollection.
64 i.e.informalandimprecisemodelsoftheworldthatarecreatedinawaysimilartocollaborativetagging(the
folk'definition'thatemergesoutofthetagsgivenbyuserstoe.g.photographsonFlickr)
65 URIsareusedinRDFandOWLasameanstorefertoa'resource'thatisanentityintheworld,includinga
webpage,inwhichcaseaURIisaUniformResourceLocator(URL)(BernersLeeetal.1998).URIsare
betterthanlexemesatachievingsymbolgrounding,althoughaswewillseeinchapter4,theyalsohave
limitations.Inthecontextoffirstorderontologies,itisreasonabletoexpectlexemestobeannotatedwith
URIs,andthiswillbecomeusefullaterinthisstudy.


the formal semantics of ontologies (see section 4.2) and iii) provide a theoretical
frameworkonwhichtojudgesemanticproximityasopposedtosemanticidentity,
which might be crucial in agent communication with heterogeneous ontologies
(section4.3).

In what follows I will describe the Semantic Matcher architecture and draw the
analogybetweenmyimplementationandInformationRetrievalmethodologies.Itis
onlyinthefinalchapterthatIwilldemonstratehowthisissupportedbyresearchin
otherfieldsandwhatimplicationsithasfortheideaof'formalontologies'.

3.1.1 Building a search engine

Searchengines havetwomajorcomponents: indexing process and queryprocess


(Croftetal.2010:14).Theformertypicallycomprisestextacquisition(e.g.crawlingthe
web to discoverandstore newwebpages), texttransformation66 and indexcreation
whilethelatterinvolves userinteraction (howtheuser'sinputisprocessedandhow
resultsarepresentedtotheuser), ranking and evaluation onthebasisoftheuser's
behaviour67. To put it simply, during the indexing process the search engine
organisesthedataavailabletoitandduringthequeryprocessitsearchesthrough
themandpresentsthebestresultsforaparticularquery.

FortheSemanticMatchersomeoftheabovestagesareuseful,somenotapplicable,
whilesomeothers(e.g.parsingWordNetandSUMOontologies)hadtobeadded.
Therougharchitectureofthesystemisshowninthefollowingdiagram:

66 i.e.bringingthetextinanappropriaterepresentation.Typicallythisstageinvolves documentparsing (i.e.


extractingthecontentfromwebpagesbyremoving'noise'likeadvertisements,scriptsandtags),tokenisation
(splittingthetextintowordsormultiwordunits), stopping(removingwordslike'and','the'etc.whichare
uninterestingininformationretrievalsincetheycan'tdiscriminatebetweendocuments),stemming(removing
inflectionalmorphologyfromwords,e.g.cats cat,driving driveetc.,seesection3.1.1.2)andinsome
searchenginesalsoinformationextraction(i.e.namedentityrecognition,eventextractionetc).(Croftetal.
2010:63116).
67 i.e.assessmentofhowwelltheuser'sinformationseekingneedshavebeensatisfiedjudgingbytheir
behaviour,e.g.clicks,timespentonapageetc.


Diagram 8: Semantic Matcher architecture
(Databases used: WordNet, the SUMO-WordNet mappings, the SUMO/MILO/domain ontologies and a
stop-words list. Databases created: all_words, words_n_synsets, synsets_info, sumo_wordnet,
docum_subcl_inst. Modules: WordNet parser, SUMO parser, compute bags-of-words & tf-idf, compute
similarity. Input: the Planning Agent's ontology and the Service Providing Agent's URI. Output: a
ranking of candidate bags, Bag1 ... Bagn, up to k = threshold.)

ThewholepurposeoftheprocessistotakethePA'sontology,findthe'candidate'
lexemes, that is lexemes eligible for matching (see section 2.3.3), aggregate
informationfromdifferentdatabasesinordertocreatea'bagofwords'associated
witheachoneofthem,thencomputeabagofwordsfortheSPA'slexeme(butina
differentwayaswewillsee),comparetheSPA'slexemewithallofthePA'scandidate
lexemesforsimilarityandreturnarankingofbestpossiblematches.Fourdatabases68
areusedintotal.Thefirstthreeareparsed,resultinginthecreationofanotherfive
databases69whichhavebeencreatedonceanddon'thavetoberecomputed.Therole
ofthelatteristoholdtherelevantcontentextractedfromtheoriginaldatabasesinan
easilyprocessableformat.Thenextphaseinthediagramisthecreationofthebags
for the PA's candidate lexemes. The process ends with the Semantic Matcher
returningtherankinggiventhe'query'(i.e.theSPA'sbag).

Thesystemcanbebrokendownintothefollowingcomponents:
TrainingtheTextAcquisitionModel(diagram5).
ThisisnotanactualstageinInformationRetrieval,butisusefulforour
purposes(seesection3.1.1.1).
SensecreationandTermWeighting(diagram6)
This is equivalent to the indexing process in IR but with significant
differencesinthesubprocessesinvolved.
QueryProcessing(diagram7)
ThisissimilartothequeryprocessinIRbutwithoutuserinvolvement

68 shownasgreencylindersinthediagram
69 yellowcylindersinthediagram


Diagram 5: Training the Text Acquisition model
Diagram 6: Text Acquisition & Text Transformation
Diagram 7: Query Processing
(the three stages of the Semantic Matcher)

These stages are described in more detail below.

3.1.1.1 Training the Text Acquisition Model

TheTextAcquisitionModelisasetofdatabasescreatedbytheWordNetandSUMO


parserstakingasinputthelexicalresourceWordNet(Milleretal1998)70,adatabaseof
SUMOWordNetmappings71(NilesandPease2003)andacollectionof645ontologyfiles
(different versions of 38 ontologies that extend SUMO)72. WordNet (WN) is an
electronic lexical reference system for English, designed in accordance with
psycholinguistictheoriesoftheorganizationofhumanlexicalmemory(Milleretal.
1988).Itcanbedescribedasageneralthesaurusinwhichwordsaregroupedinto
fourpartsofspeech,namelynoun,verb,adjectiveandadverb.Withineachpartof
speechwordsareorganisedintosynsets,thatissetsofcognitivesynonyms,which
collectivelyrepresentasense,or'meaning'.Forexample,thenoun'bat'isamember
of5differentsynsets,twoofthembeing{bat,chiropteran}and{squash_racket,bat},
referringtotheanimalandtheracketrespectively.Someofthesynonymsetswhere
thenoun'home'belongsare{family,household,house,home,menage},{base,home},
{home, nursing_home, rest_home}, referring to the social unit, location and
institution respectively. The verb 'hold' is part of the sense representations {hold,
support, sustain, hold_up} and {keep, maintain, hold}, among others. WordNet
capturesaveryimportantaspectofLexicalSemantics:thefactthatrelationssuchas
synonymy, antonymy (semantic opposition) and hypernymy (subsumption) hold
betweensensesandnotbetweenwords73.Forinstance,wecanonlyclaimthatthe
noun'head'issynonymoustothenoun'principal'withrespecttooneoftheirsenses
(i.e. the meaning they share) and WordNet captures this notion of synonymy by
listing these two lexical entries in the same synset, namely {principal,
school_principal,head_teacher,head}74.Synsetsareorganisedinacomplexhierarchy
withrelationssuchasISAandPARTOF;therefore,theycanhaveothersuchsense
groupingsashyponyms,hypernymsandmeronyms75.Synonymsetsusuallyhavea

70 Depending on the focus, WordNet can be described as a thesaurus (because of its use of synonyms for lexical
entries) or as a lightweight ontology (because of its taxonomical information, i.e. hypernyms, hyponyms,
meronyms etc). WordNet is downloadable from http://wordnet.princeton.edu.
71 http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/WordNetMappings/
72 foundintheSUMOrepository(http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/)
73 Sinceasemanticrelationisarelationbetweenmeanings,andsincemeaningscanberepresentedbysynsets,
itisnaturaltothinkofsemanticrelationsaspointersbetweensynsets(Milleretal.1993:6)
74 Ofcourse,theword'head'isamemberof32othersynsetsand'principal'belongstoanother5synsets.
75 e.g.'car'isahyponymof'vehicle','vehicle'isahypernymof'car','wheel'isameronymof'car'


'gloss' (thatis anaturallanguagedefinition ofthesense)andexamples featuring


wordsfromtheset.Someofthesewillbeusefulforsensecreationinournextstage76.
SUMOWordNetmappingsisadatabaseinwhichWordNetsynsetsarealignedwith
SUMOconcepts('lexemes'inourterminology)intermsofthreerelations:synonymy
(i.e.theWNsynsetisasynonymtotheSUMOlexeme), subsumption77 (i.e.theWN
synsetisahyponymoftheSUMOlexeme)andinstantiation(i.e.theWNsynsetisan
instance of the SUMO class) (Niles and Pease 2003). From these relations, only
synonymy will be useful for our Text Acquisition Model since extracting synsets
whicharehyponyms(subclasses)orinstancesmightdisorientusfromthelexeme's
meaning.Forexample,throwing,say,ahundreddifferentnamesofcountriesinthe
bagfor'Country'willnothelpusdeterminewhat'Country'means.Similarly,the
senseof'Device'ismoreeffectivelycreatedusingsynonyms(e.g.'gadget','machine')
than subclasses (e.g. 'smartphone'). Indeed avoiding subclasses and instances
producedmoreaccurateresultsinpractice.Finally,the645ontologyfilesthatwere
parsed provided us with natural language documentation and superclass or type
information78 forsomeSUMOlexemes.ThiswasnecessarygiventhatSUMOisa
modular ontology, which means that the class hierarchy, axioms, facts and other
informationissplitintodifferentsubontologies79.

Theabovedatabaseswereparsedtocreateasetofneweasilyreadabledatabases.
WordNet and SUMOWordNet mappings were scanned with regular expressions
usingtheWordNetparserandtheontologyfileswereparsedwiththeSUMOparser.
Thesecondparserwasmoresophisticatedbecauseoftworestrictions:1)SUMOand
itssubontologies(MILOandthevariousdomainontologies)arewritteninSUOKIF
withbalancedparentheses.Thisprecludedthepossibilityofextractingthenested
76 FormoreinformationonthevarioususesofWordNetsee(Fellbaum1998).
77 NilesandPease(2003)callthisrelation'hypernymy'(i.e.thelexemeisahypernymoftheWNsynset).Inthis
contextIavoidthistermbecauseitmightmisleadusintothinkingthatweextracthypernymsofthelexeme,
butthisisnottruesincenorelationoflexeme synsethasbeenestablished.
78 Set inclusion (⊆) and set membership (∈) respectively
79 http://www.ontologyportal.org/SUMOhistory/SUMO1.22.txt
Forexample,documentationofalexemeintheGeographysubontologyversion4mightbefoundinthe
CountriesAndRegionsontologyversion6.


structures with regular expressions (regex), given that languages with balanced
bracketing have context-free expressivity (see (Russell et al. 1995: 656; Manna 1974))
because their syntactic rules support recursion80, while regular expressions describe
regular languages, which are lower in the Chomsky hierarchy (Chomsky and
Schützenberger 1963). While regular expressions can be generated and recognised
(and therefore matched with patterns) using Finite State Automata (FSA), context-free
strings can only be parsed with Push-Down Automata (PDA). Therefore, the SUMO
parser was essentially a PDA with a counter that kept a record of how many brackets
are open and how many are closed. 2) A second complication was that a small number
of ontology files contained errors such as unbalanced parentheses or unbalanced
quotes, which either caused the program to exit or made the wrong predictions.
Therefore the parser was designed to be robust to failures and solve these problems
in a more sophisticated way.

The databases created can be summarised in Table 3:

80 This is true of the SUO-KIF syntax (see Pease 2009)


DATABASE | FULL NAME | INFORMATION CONTAINED
all_words | All words | A set of all words found in WordNet
words_n_synsets | Words and synsets | All words found in WordNet with the ids of all their possible senses (i.e. synsets they belong to), from the most to the least frequently occurring
synsets_info | Synsets information | Synonym sets of synset ids, their gloss and their hypernymous synsets
sumo_wordnet | SUMO-WordNet | SUMO lexemes and the ids of their synonymous synsets
docum_subcl_inst | Documentation, subclass, instance | Documentation, superclass and type information for SUMO lexemes (i.e. which classes the SUMO lexeme is a subclass or instance of)

Table 3: Databases created by the WordNet parser and the SUMO parser

ThesedatabasesformwhatIcalltheTextAcquisitionModel,thatisacollectionof
resources (later read as lookup tables; dictionaries) that will determine what
informationcanenterthebagforeachlexemeinthePA'sontology.Thesefileshave
beencreatedonceandwillonlyhavetoberecomputediftheontologiesorWordNet
versionhavetobeupdated.Theirformatisveryeasilyprocessablesoastodecrease
thecomputationaltimeofthenextstage.

3.1.1.2 Sense Creation and Term Weighting

This module takes as input the PA's ontology and the databases created in the
previous stage and returns a collection of bags of words, each one of which
representstheintensionalmeaning('sense')ofacandidatelexeme.Eachbagcontains


weightedwords,thatiswordswithacoefficientofimportance.

SensecreationismycovertermforTextAcquisition andTextTransformation;thetwo
subprocesses of creating the bags of words. Both of these phases are actual
components of an Information Retrieval task. The main difference is that in the
SemanticMatchertextisnotacquiredbyparsingHTMLpagesandextractingtheir
content,butbyaggregatinginformationfromdifferentdatabases81.Furthermore,in
the Semantic Matcher acquisition and transformation takes place many times for
everylexemesincebagsarefilledincrementally.Acomparisonofdocumentcreation
inIRandsensecreationherecanbeseeninthefollowingtwodiagrams:

Diagram 8: Document creation in Information Retrieval
(crawling and text acquisition: "And where's that soggy plain? In Spain! In Spain! The rain in
Spain stays mainly in the plain!" -> tokenisation: [and, where's, that, soggy, plain, in, spain, in,
spain, the, rain, in, spain, stays, mainly, in, the, plain] -> stopping: [where's, soggy, plain,
spain, spain, rain, spain, stays, mainly, plain] -> stemming -> bag of words (document): [where,
soggy, plain, spain, spain, rain, spain, stay, mainly, plain])

81 An exception is computing the SPA's sense. We will come to this later.


Diagram 9: Sense creation in the Semantic Matcher
(read lexeme and databases: VisaCard -> text acquisition: <raw text> -> tokenisation:
<tokenised text> -> stopping: <stop-word-free multiset> -> stemming -> bag of words (sense):
[card, bank, money, card, visa, visa, financial, card, currency])

Beforeweseewhatthesensecreationalgorithmlookslike,weneedtodiscusshow
textistransformedintowordsafteritisacquiredfromthedatabases.Thisprocessis
generally the same as the one used in Information Retrieval (i.e. involves
tokenisation,stoppingandstemming)(Manning2008;Croftetal.2010)withsome
additionsthatIdiscussbelow.

Tokenisation is the process of segmenting natural language text into strings of
characters ('tokens') usually equivalent to words (Schmid 2007), but tokens can also
include longer stretches of characters such as two words connected with a hyphen
(e.g. 'user-friendly'), abbreviations such as 'U.S.A.', dates (e.g. 15-01-1985) and others.
Sometimes interpreting punctuation such as dots can be very problematic
(Grefenstette and Tapanainen 1994) and today's search engines use very sophisticated
tokenisers. Tokenisation in the Semantic Matcher is shallower and most punctuation
(with the exception of hyphens and underscores) as well as numbers have been
removed beforehand82. An extra feature that has been added, however, is resolving
word sequences from strings in camel case or with underscores (e.g. CreditCard,
credit_card -> credit card), which are very common in SUMO and other ontologies83.
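A lightweight way to resolve such identifiers is a regular expression that splits on underscores, hyphens and lower-to-upper case boundaries. The sketch below illustrates the idea under those assumptions; the function name and the exact pattern are mine, not necessarily those used in the system.

import re

def split_lexeme(lexeme):
    """Split a SUMO-style identifier into lower-cased word tokens.

    Handles underscores, hyphens and camel case, e.g.
    'CreditCard' -> ['credit', 'card'] and 'credit_card' -> ['credit', 'card'].
    """
    # Break the string at underscores/hyphens first.
    parts = re.split(r"[_\-]+", lexeme)
    tokens = []
    for part in parts:
        # 'CreditCard' -> ['Credit', 'Card']; leaves 'card' and 'USA' untouched.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens if t]

print(split_lexeme("CreditCard"))   # ['credit', 'card']
print(split_lexeme("credit_card"))  # ['credit', 'card']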

Thenextstageisstopping,wherefunctionwords(e.g.'and','the','of'etc.),whichare
consideredsemanticallyvacuous,arepreventedfromenteringanybag.Thisisdone
inordertoreducestoragespaceandmakesearchduringthequeryprocessingphase
more efficient. Another reason is that such words are found in almost every
document('bagofwords'),sotheydon'thelpusdiscriminatebetweenthedifferent
optionstorank.Aswewillseelater,rarewords(i.e.wordsthatappearinonlyafew
documents)aremoreimportantthancommonwords.Termsasfrequentlyusedas
'the' are negligible. In the Semantic Matcher the list of tokens generated after
tokenisationischeckedagainstastopwordlist84andblacklistedwordsareremoved.
After that,thetokensthatremain inthelistgothroughastemmingfunction, as
describedbelow.

Stemming(alsoknownasconflation)isthemorphologicalanalysisperformedona
word(e.g.'providing')inanattempttoreduceittoitsstem(e.g.'provid')orbase
form(e.g.'provide').Thereasonforincorporatingastemmerintoasearchengineis
that words of the same class but with different inflections (e.g. drop, dropping,
dropped) are essentially thesame wordappearing morethan once butdisguised
undermorphologicalvariation;therefore,theyshouldbeconflated.Ifnostemming
wasinvolved,thedocumentwouldbeinaccuratelyrepresented,whichwouldinturn
affecttermweighting.Generallyspeaking,wordsthatappearmanytimesaremore
important or more 'representative' of the document than words that appear less

82 TheonlyreasonIusealightweighttokeniseristhatanythingmorecomplexthanthatwouldbetooambitious
forthetimescaleofthisproject.Itwouldbeinterestingtoseeinthefutureifamoresophisticatedtext
segmentationmodulewouldrendertheSemanticMatchermoreeffectiveandwhetherthiswouldhaveany
noticeablenegativeeffectoncomputationaltime.
83 Onethingthatcanbeimplementedinthefutureisabbreviationexpansiontoproducemeaningfultokensout
of
84 IusePedersen'sWordNetstopwordlist(availableon
http://www.d.umn.edu/~tpederse/Group01/WordNet/wordnetstoplist.html),towhichImadesomeadditions.


often.Withoutstemming,thepictureofthedocumentisaltered.Stemmerstypically
dealwithinflection,thatismorphologicalvariationwithoutchangingthewordclass
(i.e.thepartofspeech).Examplesofinflectionalsuffixationarepluralisation(s,ies,
es etc.), past tense (ed, ied), progressive form (ing) and others. Derivational
morphology (e.g. ion) creates a word that has a different class (e.g. create
creation) and consequently a different meaning. This means that words with
derivationalvariationarenotsupposedtobeconflated.FortheSemanticMatcherI
implementedaslightlyalteredversionoftheKrovetzstemmingalgorithm(Krovetz
1993), whichstripsinflectionalsuffixesandbringsthewordtoitsrootform.The
mainadvantageoftheKrovetzstemmeroverothers(e.g.theLovinsStemmer(Lovins
1968)orthePorterStemmer(Porter1980))isthatitisbasedondictionarylookup 85,
therefore it produces actual words rather than stems (e.g. 'describe' instead of
'describ' from the word 'describing')86. For the same reason it can also handle
irregularpluralsorpasttenses.Oneofitsdisadvantagesisthatsinceitperforms
deeperanalysisandmightincludeanumberofdictionarylookupsforasingleword,
itcanslowdowntheindexingprocessinlargescalesearchengines.However,inthe
contextoftheSemanticMatcher,whereonlyafewthousandwordsarestemmed,this
wasnotanissue.
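The dictionary-lookup idea can be illustrated with a toy conflation function that strips a handful of inflectional endings and accepts a candidate only if it appears in a word list. The rules and the tiny dictionary below are invented for the example; the real Krovetz stemmer uses a much richer rule set and lexicon.

# Toy illustration of dictionary-based conflation (not the real Krovetz rules).
DICTIONARY = {"describe", "drop", "cat", "drive", "provide"}  # hypothetical word list

# (suffix to strip, letters to add back) -- simplified, invented rules
RULES = [("ies", "y"), ("ing", ""), ("ing", "e"), ("ied", "y"),
         ("ed", ""), ("ed", "e"), ("es", ""), ("s", "")]

def conflate(word):
    """Return a dictionary word if stripping an inflectional suffix finds one."""
    if word in DICTIONARY:
        return word
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            if candidate in DICTIONARY:
                return candidate       # e.g. 'describing' -> 'describe'
    return word                        # unknown words are left untouched

print(conflate("describing"))  # describe
print(conflate("cats"))        # cat
print(conflate("driving"))     # drive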

Asdescribedabove,theSemanticMatchercreatesbagsofwordsforlexemesintwo
phases:textacquisition,wheresometextrelevanttothelexeme'smeaningisextracted
fromthedatabases,and texttransformation,wherethistextisconvertedintowords
(called'indexterms'inIR).Theprocessoffillingthebagswithwordsisroughly
describedwiththeSenseCreationalgorithm:

85 e.g.itremovesasuffixandaddsletterswhilerepeatedlycheckingagainstadictionaryuntilthebaseformis
foundornomorerulesapply.
86 Foracomparisonofdifferentstemmerssee(FullerandZobel1998)


Sense Creation algorithm

For every candidate lexeme, do:
    Create an empty bag,
    Transform the lexeme and throw the resulting word(s) in the bag,
    If the lexeme exists in sumo_wordnet, then:
        Extract the i.d. of its equivalent synsets,
        For every synset i.d., do:
            Go to synsets_info and extract the synset's members (i.e. words),
            its gloss and its hypernyms,
            Transform all the strings extracted and throw them in the bag,
    If nothing has been extracted from synsets_info so far, then:
        Go to words_n_synsets and try to find the word,
        If you find it, then:
            Extract the i.d. of the most frequently occurring synset for this word,
            Go to synsets_info and extract the synset's members (i.e. words),
            its gloss and its hypernyms,
            Transform all the strings extracted and throw them in the bag,
    If the lexeme exists in docum_subcl_inst, then:
        Extract documentation and/or (superclass or type),
        Transform all the strings extracted and throw them in the bag,
    If there are comments in the ontology that contain the lexeme, then:
        Transform them and throw them in the bag
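Read as a program, the algorithm might look roughly like the sketch below. The database arguments are assumed to be plain dictionaries loaded from the files of Table 3, and transform() stands for the tokenise/stop/stem pipeline of the previous diagrams; this is an illustration of the control flow, not the module's actual source.

def create_senses(candidate_lexemes, sumo_wordnet, synsets_info,
                  words_n_synsets, docum_subcl_inst, ontology_comments,
                  transform):
    """Build a bag of words (a list with repetitions) for every candidate lexeme.

    All database arguments are assumed to be dictionaries; `transform` is the
    tokenise/stop/stem function and returns a list of index terms.
    """
    bags = {}
    for lexeme in candidate_lexemes:
        bag = []
        bag.extend(transform(lexeme))                      # the lexeme itself

        synset_ids = sumo_wordnet.get(lexeme, [])          # synonymous synsets
        for sid in synset_ids:
            members, gloss, hypernyms = synsets_info[sid]
            bag.extend(transform(" ".join(members + hypernyms) + " " + gloss))

        if not synset_ids:                                 # fall back on WordNet
            for word in transform(lexeme):
                senses = words_n_synsets.get(word)
                if senses:                                 # most frequent sense first
                    members, gloss, hypernyms = synsets_info[senses[0]]
                    bag.extend(transform(" ".join(members + hypernyms) + " " + gloss))

        if lexeme in docum_subcl_inst:                     # SUMO documentation etc.
            documentation, super_or_type = docum_subcl_inst[lexeme]
            bag.extend(transform(documentation + " " + super_or_type))

        for comment in ontology_comments.get(lexeme, []):  # comments mentioning it
            bag.extend(transform(comment))

        bags[lexeme] = bag
    return bags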

Now that the senses for our candidate lexemes have been constructed, we can
proceedtotermweighting,thatisassigningtoeverywordineverybaganumber
whichmeasureshowwellthatwordrepresentsitsbag.Thiswillbeessentialforthe
retrieval model used in the next stage (section 3.1.1.3) and can be precomputed
beforetheagentsstarttocommunicate.

In the Semantic Matcher I use the tf-idf weighting scheme (Robertson and Spärck
Jones 1976; see (Ramos 2003) for a brief overview). As Croft and his colleagues (Croft

et al. 2010: 22) explain, there are many variations of these weights, but they are all
based on a combination of the frequency or count of index term occurrences in a
document (the term frequency, or tf) [i.e. the number of times the word occurs in a bag]
and the frequency of index term occurrences over the entire collection of documents
(the inverse document frequency, or idf) [i.e. the number of bags that contain this word]
(authors' emphasis). Below I will briefly explain the intuition behind the version I am
using in the Semantic Matcher (adapted from (Lavrenko 2009)).

Imaginethatwehaveacollectionofterms(i.e.allwordsinallbags)andforevery
bagwewanttodeterminetheweightofeachoneoftheseterms.Thiswillbedoneon
thebasisof6observationsabouthowwellwordsrepresenttheirbag:

1)Presenceorabsenceofawordisthemaincriterion.Ifawordisnotinthebag,its
weightshouldbe0;otherwise,itshouldbegreaterthan0(e.g.1).

2) Frequencyofawordinthebagmightindicatethatthiswordisa'keyword'(i.e.
indicativeofthebag'stopic).Therefore,morefrequentwordsshouldbegivenhigher
weights.Observations1and2explaintheroleofthenumeratorinformulas1.1and
1.2.

3)Ifawordisoftenrepeatedinabag,itmayonlybebecausethebagistoobigand
notbecausethewordisakeyword.Thus,allotherthingsbeingequal,longerbags
shouldhavesmallerweights.Thisexplainstheexistenceof|D|(or|PBc|)inthe
denominator.

4) Rare words (i.e. words that do not appear in many bags) can differentiate between
bags better and therefore carry more meaning (ibid). This is what Spärck Jones (1972)
calls 'term specificity', the rationale behind inverse document frequency: the fewer
the bags that contain this word, the more powerful the word is in


its own bag. This explains the existence of the denominator dfw in the formulas
below.

5) Thefirstoccurrenceofawordinabagismoreimportantthanitssubsequent
occurrences.Forexample,ifweseetheword'prosopagnosia'inatext,itisverylikely
thatthistexttalksaboutprosopagnosiaorperceptiondisordersorsomethingsimilar,
butifweseethewordagain,thisdoesnotaddasmuchmeaningtoitsbagasthefirst
time it was encountered. Hence, we need some correction so that subsequent
encountersofthetermarelessandlessimportant.Thisexplainstheadditionoftfw,D
(ortfw,PBc)inthedenominator.

6)Inlongerdocuments,repetitionsaremoreimportant,therefore,thelongerthebag
theweakertheabovecorrectionshouldbe;hencetheexistenceofthedenominator
avg.doc.len(oravg.bag.len).

\[
\mathrm{tfidf}_{w,D} \;=\; \frac{tf_{w,D}}{\,tf_{w,D} \;+\; k\cdot\dfrac{|D|}{\mathrm{avg.doc.len}}\,}\;\cdot\;\log\frac{C}{df_{w}}
\]

where:
w is a word,
D is a document,
tfidf_{w,D} is the term-frequency/inverse-document-frequency weight of the word in the document,
tf_{w,D} is the frequency of the word in the document,
k is a constant (usually set to small values, e.g. 0.1),
|D| is the length of the document (i.e. how many words it contains),
avg.doc.len is the average document length in the collection,
C is the length of the collection (i.e. how many documents are available to the search engine),
df_w is the document frequency of the word (i.e. how many documents from the collection contain this word)

Formula 1.1: tf-idf term weighting (Information Retrieval notation)


\[
\mathrm{tfidf}_{w,PB_c} \;=\; \frac{tf_{w,PB_c}}{\,tf_{w,PB_c} \;+\; k\cdot\dfrac{|PB_c|}{\mathrm{avg.bag.len}}\,}\;\cdot\;\log\frac{C}{df_{w}}
\]

where:
w is a word,
PB_c is the Planning Agent's bag of words for a particular lexeme,
tfidf_{w,PB_c} is the term-frequency/inverse-document-frequency weight of the word in the PA's bag for this particular lexeme,
tf_{w,PB_c} is the frequency of the word in the PA's bag for this particular lexeme,
k is a constant (usually set to small values, e.g. 0.1),
|PB_c| is the length of the PA's bag for this lexeme,
avg.bag.len is the average bag length in the PA's ontology,
C is the length of the collection (i.e. how many bags there are in the PA's ontology),
df_w is the document frequency of the word (i.e. how many bags from the collection contain this word)

Formula 1.2: tf-idf term weighting (Semantic Matcher notation)
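In code, the weight of Formula 1.2 for one word in one bag could be computed as follows; the bag length, average bag length and document frequency are assumed to have been gathered from the PA's collection of bags beforehand, and k is the small constant from the formula. This is a sketch of the calculation, not the system's implementation.

import math

def tfidf(tf_w, bag_len, avg_bag_len, collection_size, df_w, k=0.1):
    """tf-idf weight of a word in one bag (Formula 1.2).

    tf_w            -- frequency of the word in this bag
    bag_len         -- number of words in this bag (|PBc|)
    avg_bag_len     -- average bag length over the PA's ontology
    collection_size -- number of bags in the PA's ontology (C)
    df_w            -- number of bags that contain the word (dfw)
    """
    length_norm = tf_w + k * bag_len / avg_bag_len     # dampens long bags
    idf = math.log(collection_size / df_w)             # rewards rare words
    return (tf_w / length_norm) * idf

# e.g. a word seen 3 times in a 40-word bag, appearing in 5 of 1000 bags:
print(round(tfidf(3, 40, 50.0, 1000, 5), 4))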

Atthis phaseallthesenses createdforeach candidatelexemearestoredasbags


whichcontainwordswiththeirtfidfscores.Allthisinformationwillbeusefulforthe
nextcomponentoftheSemanticMatcher,discussedbelow.

3.1.1.3 Query processing

Thenextstageinoursystem'sarchitecture(highlightedindiagram7above)isquery
processing(alsoknownasqueryexecution).Thisphaseoccurs'online',thatisduring
agent interaction: when the PA receives a surprising lexeme from the SPA, plan
execution fails.Then the PAsends therelevantsurprising queries to ORS,which
identifies a word in the query which is not in the PA's ontology and asks the
SemanticMatchertosearchthroughallofPA'scandidatelexemesfortheonewhich


ismorelikelytobesynonymoustothesurprisinglexeme.Ascanbeseenindiagram
7, the input to this stage is the PA's bags of weighted words and the SPA's bag.
However,whatwehavenotexplainedyetishowtheSPA'slexemeacquiresasense
sincetheSenseCreationandTermWeightingphasehasalreadybeencompleted.

TheSemanticMatcherispartofORS,whichispluggedintothePlanningAgentto
facilitateitsinteractionwithServiceProvidingAgents.Atthemoment,ORSsupports
ontologies written in KIF while its newly created semantic matching module is
suitableforPAsthatuseSUMOsubontologiesinparticular87(writteninSUOKIF).
However, as mentioned in the previous two chapters, ORS can't make any
assumptionsabouttheSPA'sontology,nordoesithaveaccesstoanypartsofitunless
theSPAitselfiswillingtorevealpartofitsrepresentationduringtheinteraction.All
thatORSknowsisthatSPAscanbequeried(andthisisdoneusingProloginthe
currentimplementation).ThismeansthatiftheSemanticMatcherhastocreatea
sensefortheSPA'slexeme,itcan'tfollowthesameprocessastheonedescribedinthe
SenseCreationandTermWeightingstage,eveniftheSPAhappenstohaveaSUMO
ontology.Moreover,giventhelimitedaccesstoSPA'sontology,weareleftwithout
contextfromwhichtobuildamentalrepresentation(sense)forthelexeme.Itseems
tomethatifpowerfulsemanticmatchingsystemsaretobeconstructedinthefuture,
theyshouldbeprovidedwithsomecluesastolexeme'smeaning,whichthesystem
canenrichwithsynonymsandotherrelatedwordsinordertoachievesensecreation.
Someideasare:

Bags arealready availableandcan beexpandedbythesemanticmatching


system.ThiswouldbeidealandcaneasilybeachievedthroughanOntology

87 EventhoughORSisnotexpectedtoknowpartsofthePA'sontologybeforehand,thisisnotanadhocdesign
decisiongivenhowwidespreadSUMOis.ThatmeansthatthesemanticmatchingsupportingversionofORS
isinfactaSUMOpluginwheretheknowledgeofpredicatessuchas'subclass','instance'or'documentation'
canbesafelypresupposedsincetheyoccurinalltheSUMOontologies.Furthermore,itisnothardto
imagineasystemwhichactuallyguessesthewordsusedforsuchpredicates.Forexample,asystemcantry
wordsthathasaverysmallLevenshteindistance(Levenshtein1965)to'subclass','isa','isa','subClassOf'or
'instance','type'andothers.


EngineeringpracticethatIsuggestinthelastchapter:engineerscantagtheir
lexemes withkeywords oruseURLs (URIs thatpointtoretrievabledigital
resources)whichdirectthesystemtoacollaborativelycreated'tagclouds'(in
fact'broadfolksonomies'(seesection4.2)).

Somecontextismadeavailabletothesemanticmatchingsystemthroughthe
useofURLspointingtonaturallanguagedefinitions,relatedtexts,WordNet
synsets(orsimilargroupingsof'senses'indifferentthesauri)orperhapsother
formalontologies(e.g.OWL,RDF(S)andothers)88.

IntheSemanticMatcher,Ihaverestrictedtheimplementationtooneoftheseoptions,
namelylexemesbeingannotatedwithURLswhichpointtonaturallanguagetext89.
WhentheORSDiagnosisSystemdeterminesthatthereisasemanticmismatch,it
queriestheSPAtoobtaintheURLforthisparticularlexeme90.TheURListhengiven
totheSemanticMatcher,whichextractsthecontentfromthewebpage(i.e.removes
'noise'suchasscripts,tags,advertisementsetc.)byusingregularexpressionsandthe
TagPlateauoptimisationalgorithm(Finnetal.2001).Thisalgorithmisbasedonthe
observationthatthecontentofawebpagecontainsfewerHTMLtagsthanother
partsofthepage(e.g.advertisements).We representthepageasalistof'tokens'
(whichareeithertagsor nontags)andtrytofindthetagpositionsiandjthat
maximisethefollowingobjectivefunction:

88 Thelatterwouldstill befacedwithdifficulties ofinterpretinglexemes butwouldat leastprovide some


context,whichwouldbeusefuliftheURLhaspointedtoadomainontology,wheremeaningstendtobeless
generalandtherefore,lessambiguous.
89 HowtheseURLsarechosenfortheevaluationoftheSemanticMatcherwillbeexplainedinthenextsection.
90 WealsotakeforgrantedthattheSPAwillbewillingtorevealitsURL,butthatmakessensesincethe
purposeofURLsisfacilitatingdisambiguationanyway.


\[
T_{i,j} \;=\; \sum_{n=0}^{i-1} b_n \;+\; \sum_{n=i}^{j} (1 - b_n) \;+\; \sum_{n=j+1}^{N-1} b_n
\]

where:
b_n is the nth token in the web page; b_n = 1 if the nth token is a tag and b_n = 0 otherwise,
i and j are the two values (token positions in the page) which are assumed to enclose the page's content if they maximise the above objective function,
N is the number of tokens in the web page

Objective function for finding the tag plateau
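A direct way to apply the objective function is to try every pair (i, j) and keep the one with the highest score. The quadratic search below is a readable, unoptimised sketch of that idea; the tag/non-tag encoding is assumed to be produced by a separate tokeniser.

def tag_plateau(b):
    """Return (i, j) maximising sum(b[:i]) + sum(1 - b[i:j+1]) + sum(b[j+1:]).

    b is a list where b[n] == 1 if the nth page token is an HTML tag
    and 0 if it is a word; tokens i..j are taken to be the page content.
    """
    n = len(b)
    prefix = [0]
    for x in b:                       # prefix[m] = number of tags among b[:m]
        prefix.append(prefix[-1] + x)

    best_score, best = -1, (0, n - 1)
    for i in range(n):
        for j in range(i, n):
            tags_outside = prefix[i] + (prefix[n] - prefix[j + 1])
            words_inside = (j - i + 1) - (prefix[j + 1] - prefix[i])
            score = tags_outside + words_inside
            if score > best_score:
                best_score, best = score, (i, j)
    return best

# 1 = tag, 0 = word: the low-tag 'plateau' in the middle is the content.
tokens = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1]
print(tag_plateau(tokens))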

ThistechniqueisusedinInformationRetrievalintheTextAcquisitionphase,before
thecreationofdocuments.HereIuseitinasomewhatunorthodoxway,inorderto
extracttextforthe'query'(i.e.theSPA'sbagofwords)91.Oncethecontentisextracted,
thelistofwordsacquiredundergoesstoppingandstemmingandtheresultingbagis
thesenseoftheSPA'slexeme92.Aswewillseelater,termweightingisverysimplein
queries(justfrequencyofthewordinthequery;thisisobvioussincethereisno
notionof'collection'ofqueriesor'average'querylengthetc.)soitdoesnotneedto
bestoredatthisstage.

Now it is time for query processing: the PA's bags and the SPA's bag enter the
compute_similaritymodule,whichreturnsarankingofthebestcandidatelexemes.
OurrankingalgorithmisbasedontheVectorSpaceModel (Saltonetal.1975;Salton
and McGill 1983) (see (Raghavan and Wong 1986) and (Lee et al. 1997) for an
overview),aretrievalmodelwhichtreatsdocumentsandqueriesasvectorsinahigh
dimensionalspace,whereeachwordinthecollectionisadimension.Eachvectorhas
a particularpositionintheVectorSpaceaccordingtowhatvalues itgetsineach
dimension.Thesevaluesarenothingbuttermweights(seeprevious section).For
91 Ofcourse,inrealIRsituations,queriesareinputbytheuseranddon'thavetobecreated(althoughtheycan
beexpanded(Manningetal.2008,chapter10)).
92 InthisprojectIdidnotattempttoexpandthesensewithsynonymsorhypernymsfromWordNet,butthisis
somethingthatcanbeimplementedinthefutureandmightproduceevenbetterresults.


example, suppose that there are 5 words in the collection ('go', 'California', 'sun',
'yoghurt' and 'battery') and 3 documents D1 = {go, sun, sun}, D2 = {California,
battery, sun} and D3 = {yoghurt}. The vectors of the three documents would be
5-tuples (i.e. would have 5 coordinates in the space), which, given a simplistic tf
weighting scheme, would look like this: D1 = (1, 0, 2, 0, 0), D2 = (0, 1, 1, 0, 1),
D3 = (0, 0, 0, 1, 0)93. With the more sophisticated tf-idf weighting scheme that we
described, the vectors would have the following coordinates: D1 = (0.61934, 0,
0.65675, 0, 0), D2 = (0, 0.61934, 0.61934, 0, 0.35261), D3 = (0, 0, 0, 0.67025, 0).
Diagram 10 shows how two documents and one query could be positioned in a
3-dimensional space. Diagram 11 illustrates the same in the context of the Semantic
Matcher.

93 This is the weighting scheme used for queries, as shown in the formulas below.
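The toy collection above can be turned into vectors mechanically: fix an ordering of the collection's vocabulary and read off one coordinate per word. The sketch below does this with the simplistic tf weights (raw counts); substituting the tf-idf weight of section 3.1.1.2 would give the second set of coordinates. Again, this is an illustration rather than the system's code.

from collections import Counter

vocabulary = ["go", "California", "sun", "yoghurt", "battery"]   # the 5 dimensions
documents = {
    "D1": ["go", "sun", "sun"],
    "D2": ["California", "battery", "sun"],
    "D3": ["yoghurt"],
}

def tf_vector(words, vocabulary):
    """One coordinate per vocabulary word; here the weight is the raw count."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]

for name, words in documents.items():
    print(name, tf_vector(words, vocabulary))
# D1 [1, 0, 2, 0, 0]
# D2 [0, 1, 1, 0, 1]
# D3 [0, 0, 0, 1, 0]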


Diagram10:VectorSpaceModelwith3dimensionsand3vectors(InformationRetrievalnotation)


Diagram 11: Vector Space Model with 3 dimensions and 3 vectors (Semantic Matcher notation)


Similaritybetweenaqueryandadocumentisdeterminedbytheirproximityinthe
multidimensionalspace(i.e.thesmallerthedistancebetweenthem,themoresimilar
theyare)andcanbemeasuredinmanyways(e.g.cosinecoefficient,diceproduct,
Jaccardcoefficientetc.)94.IntheSemanticMatcherIusethetfidf weightedsum,
which is shown in Formulas 2.1and2.2.Formulas 3.1and3.2showthesame in
simplernotation.

\[
s(D, Q) \;=\; \sum_{i=1}^{|Q|} \sum_{j=1}^{|D|} tf_{w_i,Q} \cdot \underbrace{\mathrm{tfidf}_{w_j,D}}_{\text{precomputed}}
\]

where:
s(D, Q) is the similarity of the document to the query,
tf_{w_i,Q} is the frequency of word i in the query,
tfidf_{w_j,D} is the term-frequency/inverse-document-frequency weight of word j in the document,
|Q| is the query length (i.e. how many words it contains),
|D| is the document length

Formula 2.1: tf-idf Weighted Sum (Information Retrieval notation)

94 Foranintroductiontothesemethodssee(Hillenmeyer2005).


\[
s(PB_c, SB) \;=\; \sum_{i=1}^{|SB|} \sum_{j=1}^{|PB_c|} tf_{w_i,SB} \cdot \underbrace{\mathrm{tfidf}_{w_j,PB_c}}_{\text{precomputed}}
\]

where:
s(PB_c, SB) is the similarity of the Planning Agent's bag for a particular lexeme to the Service Providing Agent's bag,
tf_{w_i,SB} is the frequency of word i in the Service Providing Agent's bag,
tfidf_{w_j,PB_c} is the term-frequency/inverse-document-frequency weight of word j in the Planning Agent's bag for a particular lexeme,
|SB| is the length of the Service Providing Agent's bag (i.e. how many words it contains),
|PB_c| is the length of the Planning Agent's bag for a particular lexeme

Formula 2.2: tf-idf Weighted Sum (Semantic Matcher notation)

\[
\begin{aligned}
s(D, Q) \;=\;& tf_{w_1,Q}\,\mathrm{tfidf}_{w_1,D} + tf_{w_1,Q}\,\mathrm{tfidf}_{w_2,D} + \ldots + tf_{w_1,Q}\,\mathrm{tfidf}_{w_{|D|},D} \;+ \\
& tf_{w_2,Q}\,\mathrm{tfidf}_{w_1,D} + tf_{w_2,Q}\,\mathrm{tfidf}_{w_2,D} + \ldots + tf_{w_2,Q}\,\mathrm{tfidf}_{w_{|D|},D} + \ldots + \\
& tf_{w_{|Q|},Q}\,\mathrm{tfidf}_{w_1,D} + tf_{w_{|Q|},Q}\,\mathrm{tfidf}_{w_2,D} + \ldots + tf_{w_{|Q|},Q}\,\mathrm{tfidf}_{w_{|D|},D}
\end{aligned}
\]

where:
s(D, Q) is the similarity of the document to the query,
tf_{w_i,Q} is the frequency of word i in the query,
tfidf_{w_j,D} is the term-frequency/inverse-document-frequency weight of word j in the document,
|Q| is the query length (i.e. how many words it contains),
|D| is the document length

Formula 3.1: tf-idf Weighted Sum (easier Information Retrieval notation)


\[
\begin{aligned}
s(PB_c, SB) \;=\;& tf_{w_1,SB}\,\mathrm{tfidf}_{w_1,PB_c} + tf_{w_1,SB}\,\mathrm{tfidf}_{w_2,PB_c} + \ldots + tf_{w_1,SB}\,\mathrm{tfidf}_{w_{|PB_c|},PB_c} \;+ \\
& tf_{w_2,SB}\,\mathrm{tfidf}_{w_1,PB_c} + tf_{w_2,SB}\,\mathrm{tfidf}_{w_2,PB_c} + \ldots + tf_{w_2,SB}\,\mathrm{tfidf}_{w_{|PB_c|},PB_c} + \ldots + \\
& tf_{w_{|SB|},SB}\,\mathrm{tfidf}_{w_1,PB_c} + tf_{w_{|SB|},SB}\,\mathrm{tfidf}_{w_2,PB_c} + \ldots + tf_{w_{|SB|},SB}\,\mathrm{tfidf}_{w_{|PB_c|},PB_c}
\end{aligned}
\]

where:
s(PB_c, SB) is the similarity of the Planning Agent's bag for a particular lexeme to the Service Providing Agent's bag,
tf_{w_i,SB} is the frequency of word i in the Service Providing Agent's bag,
tfidf_{w_j,PB_c} is the term-frequency/inverse-document-frequency weight of word j in the Planning Agent's bag for a particular lexeme,
|SB| is the length of the Service Providing Agent's bag (i.e. how many words it contains),
|PB_c| is the length of the Planning Agent's bag for a particular lexeme

Formula 3.2: tf-idf Weighted Sum (easier Semantic Matcher notation)
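Under the usual reading of the weighted sum, where a word contributes only to bags in which it actually occurs (its tf-idf weight elsewhere being zero), the ranking step reduces to a few lines. The sketch below assumes the PA's bags have already been converted into word-to-tfidf dictionaries (the precomputed factor in the formula) and that the SPA's bag is a plain list of words; the example bags are invented and it illustrates the idea rather than reproducing the compute_similarity module.

from collections import Counter

def weighted_sum(spa_bag, pa_weights):
    """s(PBc, SB): sum over query words of tf(w, SB) * tfidf(w, PBc)."""
    query_tf = Counter(spa_bag)                       # tf weights for the 'query'
    return sum(tf * pa_weights.get(word, 0.0)         # absent words weigh 0
               for word, tf in query_tf.items())

def rank_candidates(spa_bag, pa_bags, threshold=3):
    """Return the top candidate lexemes, most similar first."""
    scores = {lexeme: weighted_sum(spa_bag, weights)
              for lexeme, weights in pa_bags.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking[:threshold]

# Hypothetical precomputed bags for two PA lexemes and an SPA bag for 'VisaCard'.
pa_bags = {
    "CreditCard": {"credit": 1.2, "card": 0.8, "bank": 0.9, "payment": 1.1},
    "Currency":   {"money": 1.0, "currency": 1.3, "bank": 0.4},
}
spa_bag = ["card", "visa", "card", "bank", "money"]
print(rank_candidates(spa_bag, pa_bags))   # ['CreditCard', 'Currency']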

TheVSM,asopposedtootherretrievalmodels95,isdistancebasedandthisproves
useful in a semantic matching system as it can address the issue of similarity of
meanings,whichisnotpossiblewithastrictlyformalapproachtoontologies(see
section4.3).

HavingdiscussedtheSemanticMatcherarchitectureandtheintuitionsbehindit,we
canproceedtothesystem'sevaluation,whichisthesubjectofournextsection.

3.2 Evaluation & Analysis of results

Asmentionedintheintroduction,theimplementationoftheSemanticMatcheris
tryingto testthehypothesisthatintegratingfolksonomies(seenasbagsofwords
here)intoformalontologiesallowsforeffectiveandefficientmatchingwhereusing
ontologiesalinewouldhavecausedfailedorpoormatching.Belowwewillseeto
whatextentourhypothesisisconfirmed.

95 OtherretrievalmodelsaretheBooleanModel,RegionModels,theProbabilisticModel,the2PoissonModel
etc.;see(Hiemstra2009;Croftetal2010:237300)


3.2.1 Effectiveness

In this section I describe how the Semantic Matcher was evaluated and how the
results are to be interpreted. The evaluation process is inspired by techniques in
Information Retrieval but had to diverge from mainstream methodologies due to
some special restrictions thatIwillexplain.TheSUMOrepositoryofontologies96
servedasatestbedforobjectiveevaluation.Somedegreeofsubjectivity(intheformof
assumptions)wasalsonecessarybutIwilltrytoshowthatthisdoesnotaffectthe
qualityoftheresults.

Searchenginesaretypicallyevaluatedwith testcollections (Cleverdon1970),which


havethreecomponents:1)asetofdocuments('corpus'),2)asetofqueries('topics')
and 3) relevance judgements (i.e. human judgements on which documents are
relevant and which documents are nonrelevant for every query)97. This is an
objectivebasisforevaluationsincedifferentsearchenginescancomparetheirresults
usingthesametestcollection.Themostcommonlyusedmetricisrecallandprecision
atparticularranks98.InordertoevaluatetheSemanticMatcherweneedtomeetall
threerequirements.The first oneiseasy:theSUMOrepositorycontainsmorethan
600ontologyfiles(i.e.differentversionof38ontologies)andeachoneofthemcan
serveasacollectionofdocuments.Rememberthatdocumentsareinfactbagsof
words,or'senses'forlexemesinoneontology.So,anyofthe645ontologyfilesthatI
usedcanbeseenasacollectionofdocuments.Thethirdrequirementcanbefulfilled
ifwetakeconceptsthathavebeenrenamedfromoneontologyversiontoanother.For

96 http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/
97 Relevancejudgementsshouldbedoneeitherbythepeoplewhoaskedthequestionsorbyindependent
judgeswhohavebeeninstructedinhowtodeterminerelevancefortheapplicationbeingevaluated.(Croftet
al.2010:308)
98 For example, precision at rank 5 means: Among the top 5 documents in the ranking, how many were
relevant?(theidealwouldbe5,inwhichcasewehave100%precision)andrecallatrank5means:Among
alltherelevantdocumentsinthetestcollections,howmanywereretrievedinthetop5ranks?(Obviously,
hereitisimpossibletohave100%recallunlessthenumberofrelevantdocumentsinthewholecollectionis
5).


example, the lexeme 'Corn' from the SUMO Mid-level ontology version 33⁹⁹ has
changed to 'MaizeGrain' in version 34. Since this is an official renaming of the
concept, we can be sure that these two lexemes are synonyms, therefore we have
excellent relevance judgements if we use the bag created for 'MaizeGrain' as a query
and all the bags corresponding to lexemes in version 33 of the ontology as
documents: 'Corn' is relevant to 'MaizeGrain'. The second requirement is more
difficult to meet, as it is hard to find ontology versions where more than one
renaminghasoccurred.Inotherwords,whilewehaveasetofdocumentswecan't
haveasetofquerieswithrelevancejudgementsforthisparticularsetofdocuments.
Usuallywehaveoneoratmosttwoqueries.Anotherconstraintisthatforevery
query,thereisonlyonerelevantdocument.Forexample,for'MaizeGrain'only'Corn'
isrelevant100.Thismeansthatevaluationmeasureslikerecallandprecisionwouldbe
meaningless:forexample,if'Corn'returnsinrank1,wehave100%recalland100%
precisionatrank1,100%recalland50%precisionatrank2andsoon;ifitreturnsin
rank2,wehave0%recalland0%precisionatrank1,100%recalland50%precisionat
rank2.Evenifwefindsuchnumbersuseful,onequeryisnotenoughtoassessthe
qualityoftheSemanticMatcher.Ifwetakeanotherpairofsynonyms(againofficially
renamed concepts), say, 'TelevisionReceiver' from the Communications ontology
version4and'Television'inversion3,wehaveadifferentsetofdocumentsnow
(sincewearesearchingthroughlexemesofadifferentontology).Allthisshowsthat
IRtechniquescannotbeappliedtotheletter.WhatIdidinsteadwastakeallthe
guaranteedsynonymsfromdifferentSUMOontologies101,treatthebagofthenew

99 http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/Midlevelontology.kif?revision=1.33
100RelevancejudgementsareusuallybinaryinInformationRetrieval(i.e.adocumentiseitherrelevantornon
relevant;nothinginbetween).Similarly,inthisproject,Itrytofindtheoneandonlyexactsynonymforthe
surprisinglexemefromlexemesinthePlanningAgent'sontology.Sensesimilarityisonlyusedasameansto
predictidentity.Aswewillseeinthelastchapter,itwillbeinterestingtoextendagentcommunication
systemslikeORStohandlesemanticsimilarityasopposedtoidentity.Forexample,ifthePAdoesnothave
an exact synonym in its ontology, can it achieve its goals or something close to its goals with similar
predicates,classesorindividuals?(seesection4.3)
101Ihadalreadyexcludedcasesofcorrection,inwhichbothtermsappearedwhensomeoldnameshadbeen
mistakenlyleftintheontology.Ialsoexcludedcaseswhereanamewasdifferentinthenextontologyversion
buttherewasnoofficialrenaming.Thisreasonablesince,forinstance,thedifferentassertions (subclass
Investor CognitiveAgent) and (subclass Investor SocialRole) [from FinancialOntology
versions2and3respectively]don'tmeanthat'CognitiveAgent'issynonymousto'SocialRole'.


lexeme as a query and the bags of the lexemes in the ontology prior to the renaming
as documents; then see how well the Semantic Matcher can predict synonymy.
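Because each 'query' has exactly one relevant document, the whole evaluation boils down to recording the rank at which the officially renamed lexeme comes back, together with the proportion of test collections whose correct lexeme falls within some rank k. The sketch below shows these two measures on made-up rankings, not on the real output listed in the Appendix.

def rank_of_relevant(ranking, relevant):
    """1-based position of the single relevant lexeme in a ranking."""
    return ranking.index(relevant) + 1

def proportion_within(ranks, k):
    """Share of test collections whose relevant lexeme came back at rank <= k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Hypothetical rankings for two test collections (not the real system output).
ranks = [
    rank_of_relevant(["Corn", "Wheat", "Rice"], "Corn"),               # 1
    rank_of_relevant(["Cup", "Glass", "Combustible"], "Combustible"),  # 3
]
print(ranks)                        # [1, 3]
print(proportion_within(ranks, 3))  # 1.0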

Producing accurate results is of high importance in an agent communication


environment.Typically,ifthetwoagentsusedifferentontologiesandthediagnostic
algorithmdeterminesthatthereisasemanticmismatch,itfollowsthatfortheSPA's
lexemethereisonlyonelexemeinthePA'sontologywhichithastomatch.Thisisas
if we submit a query to a search engine and expect the one and only relevant
documenttobereturnedtous.Inotherwords,nomatterhowmanylexemesthere
areinoursearchspace102,itiscrucialthattherightlexemereturnveryhighupinthe
ranking103.

Belowareallthetestcollectionsavailableforevaluationandthesystem'sprediction.
Theoutputoftheevaluationmodulecanbeseenintheappendix.

102Forexample,intheMidlevelontologiesthatIuseformyagentcommunicationscenario(section3.3)the
enginehastosearchthroughmorethan1,000differentlexemes.
103ideallyinrank1,buteveniftheycomeinrank2or3(orslightlylowerdependingonourthreshold),itwill
notbedisastrousbecausethePAcantryallofthemonebyone.IntheORSscenariopresentedinsection3.3,
bothofthesemanticmismatchesthatarise,havetheirsynonymreturnedinrank1and,forthemoment,the
agentdoesnottryanyotherlexemesinaloopeveniffailuredidoccur.Thisiseasytoimplementinthe
futurebutnotpracticallynecessaryforourdemonstration.


TEST COLLECTION | QUERY (from the next version of the ontology in the collection) | DOCUMENT COLLECTION | RELEVANCE JUDGEMENT | RANK OF RELEVANT LEXEME (Semantic Matcher output)
1 | ComputerReport | QoSontology.kif version 3 | ComputerReport = Report | 2
2 | coordinates | MilitaryProcesses.kif version 13 | coordinates = coordinate | 1
3 | DrinkingCup | Midlevelontology.kif version 34 | DrinkingCup = Cup | 1
4 | ElectricalConductor | engineering.kif version 18 | ElectricalConductor = Conductor | 1
5 | familyName | Midlevelontology.kif version 54 | familyName = lastName | 1
6 | FishingIndustry | naics.kif version 2 | FishingIndustry = Fishing | 1
7 | Flammable | Midlevelontology.kif version 87 | Flammable = Combustible | 6
8 | FluidCylinder | engineering.kif version 11 | FluidCylinder = Cylinder | 3
9 | incomeOf | FinancialOntology.kif version 2 | incomeOf = income | 3
10 | InformationIndustry | naics.kif version 15 | InformationIndustry = Information | 3
11 | JuniorCollegeIndustry | naics.kif version 21 | JuniorCollegeIndustry = JuniorColleges | 2
12 | MaizeGrain | Midlevelontology.kif version 33 | MaizeGrain = Corn | 2
13 | ProjectileLauncher | Midlevelontology.kif version 86 | ProjectileLauncher = Launcher | 1
14 | ProjectileWeapon | Midlevelontology.kif version 87 | ProjectileWeapon = ProjectileLauncher | 1
15 | RepublicOfGeorgia | Economy.kif version 18 | RepublicOfGeorgia = Georgia | 1
16 | RiceGrain | pictureList.kif version 23 | RiceGrain = Rice | 2
17 | ScientificLaw | engineering.kif version 4 | ScientificLaw = Law | 3
18 | TelevisionReceiver | Communications.kif version 3 | TelevisionReceiver = Television | 1
19 | TurkeyBird | Economy.kif version 25 | TurkeyBird = Turkey | 1
20 | VehicleTire | Midlevelontology.kif version 86 | VehicleTire = Tire | 1
21 | WaterVehicle | Midlevelontology.kif version 13 | WaterVehicle = Watercraft | 1

Table 4: Test collections for the evaluation of the Semantic Matcher

The results in the last column can be represented in the following pie chart, which
shows what proportion of the 21 correct matches was returned in ranks 1 to 3 or in
lower ranks. As we can see, the majority of the correct lexemes are in rank 1,
approximately 1/5 of them come in rank 2 and an equal number come in rank 3. A
small portion of them comes in a lower rank.

[Pie chart: Rank 1: 57%; Rank 2: 19%; Rank 3: 19%; Lower ranks: 5%]

Howarewetointerprettheseresults?Thisiswheresomequalitativeanalysishasto
comeintoplay,sincetheSemanticMatcheroutputhastobeevaluatedinthecontext
ofORS.Inanagentcommunicationenvironmentitisimportantthattherightlexeme
isina'highenough'rank.Itdoesnotneedtoberank1sincethePlanningAgentcan
try the second option and then the third and so on, if its plan fails104. But how high is
'high enough'? Since Planning Agents can perform a number of ontology repairs
(if we assume dissimilarity between their ontologies) before they achieve their
goals105, it would be acceptable to set something like rank 3 as a threshold for
semanticmatching.InpracticethismeansthatifthePAhastotrythreedifferent
lexemesbeforeitsplansucceeds,itwouldnotbedisastrous.Infact,evenamore
permissivethresholdmightbepossiblesincethePAdoesnotneedtoreplan106;only
tryonewordaftertheother,whichwouldtakeperhapslessthan1secondtogether
withtheSPA'sresponses.Giventhisreasonableassumption,thegraphabovetellsus
that95%ofthe'right'lexemescomeinahighenoughrankandamongthem,abig

104notcurrentlyimplemented
105whichisreasonablesincethePAandSPAshavedifferentontologies.
106althoughinthecurrentimplementationitdoes;thiscaneasilychangeinthefuture


portioncomesinrank1,whichisideal.Whatisalsointerestingtoseeisthatresults
in the first ranks contain words with similar meanings to the 'correct' one. For
example, the top6 results for 'VehicleTire' are 1) Tire (the synonymous one), 2)
VehicleWheel, 3) Wheel, 4) MaterialHandlingEquipment, 5) ArtilleryGun, 6)
Motorcycle.ThefulllistscanbeseenintheAppendix.

Another point where some subjectivity is involved is the choice of URIs107 which
annotatetheSPA'slexemes.AsIexplainedinsection3.1.1.3,wemakethisassumption
abouthowontologyengineersmighthelptodisambiguatethewordstheyuse.This
issomethingthatalreadyhappensinRDF(S)andOWL108ontologies,so,Ibelieve,it
is only a matter of time until this becomes a common practice for firstorder
ontologies,liketheonesderivedfromSUMO.TheURLswerechoseninasystematic
way but with some 'filtering': the lexeme which served as a 'query' (e.g.
'DrinkingCup')wastokenised(i.e.'DrinkingCup')andtypedintoGoogle;theURL
waschosenfromthetop20results.Thisprocesscouldnotbeautomatedbecausewe
hadtomakesurethatthewebpageis appropriate (i.e.static(HTML)asopposedto
dynamic(e.g.php),withenoughtext(typicallyoneparagraphormore),withatext
thatsomehowdescribesthemeaningofthelexeme109).

TheabovediscussionsuggeststheSemanticMatcherissuccessfulif,givenappropriate
URLsfor'query'lexemes110,itcanreturntherightlexemeinahighenoughrank.What
the'right'lexemeiscanbeobjectivelydeterminedfromtherenamingsintheSUMO
repository.'Highenough'and'appropriate'aresubjectivejudgements,butIshowed
whatthesenotionsmightmeaninanagentcommunicationcontext.Aswesawin
section3.1.1.3,the'appropriate'bagforalexemedoesnothavetocomefromaURL.
Thebestpracticeistogetontologyengineerstoprovidetagdataforthewordsthey

[107] URLs in particular.
[108] Although quite often the URIs point to non-retrievable resources.
[109] For example, if we type the word 'university' into Google, the top-ranked results are very likely to be pages of particular universities rather than pages that describe what a university is.
[110] What we would call 'surprising lexemes' in the context of ORS.



With the above assumptions in mind, the Semantic Matcher produced very satisfactory results. Of course, because of the small data set used for evaluation, it would be too optimistic to infer too much from 95% of correct results in high ranks. However, these results are definitely encouraging and show that using Information Retrieval techniques to solve semantic mismatches in the future is on the right track and can be very effective.

3.2.2 Efficiency

As we said in chapter 1, efficiency is very important in agent communication, since a Planning Agent might try to achieve a goal by contacting many different agents and repairing its ontology a number of times. This was one of the reasons why reasoning or other 'deep' processing was avoided. To make the semantic matching process faster, the senses (bags) of the lexemes in the PA's ontology and their weights can be precomputed before agent communication (e.g. while the 'ORS plugin' is installed on the PA), even though the process does not take more than 4 seconds for the largest ontologies [111]. The bags with their weighted words are stored in a Python file as a dictionary and can be loaded by the compute_similarity module very fast. The actual matching takes less than 1 second, which is suitable for ORS, since it can be computed once during agent interaction even when the PA needs to try the lexeme ranked second or third. This was achieved by using data structures such as sets instead of lists where possible [112], and also by limiting the search space according to whether the matcher is looking for a relation or a class/individual [113].

[111] i.e. the Mid-level ontologies (where bags of more than 2,000 lexemes have to be created). Computational time is measured on a laptop with an Intel Core 2 processor.
[112] Sets lack position information and contain no duplicates, therefore they are a 'lighter' data structure than lists.
[113] Classes and individuals are treated as similar (both start with a capital letter) in the SUMO ontologies. Their only difference is that individuals are instances and not subclasses of the top-level concept 'Entity'.
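
To make the precomputation step concrete, the following is a minimal sketch (with my own naming and one standard tf-idf variant; the actual modules may weight terms differently) of how raw bags of words could be turned into the weighted dictionaries described above:

    import math
    from collections import Counter

    def precompute_weighted_bags(raw_bags):
        # raw_bags: lexeme -> list of words; returns lexeme -> {word: tf-idf weight}
        n_docs = len(raw_bags)
        df = Counter()                      # document frequency of each word
        for words in raw_bags.values():
            df.update(set(words))
        weighted = {}
        for lexeme, words in raw_bags.items():
            tf = Counter(words)             # term frequency within this bag
            weighted[lexeme] = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
        return weighted

    # Toy example: the resulting dictionary could be written to a Python file
    # and simply loaded at matching time.
    bags = {'Cup': ['cup', 'drink', 'handle', 'cup'], 'Bottle': ['bottle', 'drink', 'glass']}
    print(precompute_weighted_bags(bags))

At matching time only these weighted dictionaries need to be loaded and compared, which is what keeps the actual matching step well under a second.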


The above results seem to support our initial hypothesis that matching can be made not just possible but also effective and efficient when semi-structured data (here represented as bags of words, later (section 4.2) related to folksonomies) enter the semantics of formal ontologies.

3.3 Integration with ORS

After the Semantic Matcher was implemented and evaluated, it had to be incorporated into the Ontology Repair System (ORS). To demonstrate an agent-communication scenario within ORS I needed to create a Planning Agent (PA) and a Service-Providing Agent (SPA) from existing Prolog templates [114] and equip them with knowledge bases (i.e. ontologies). As mentioned earlier, ORS is a tool that can be plugged into a PA in order to help it communicate when it needs to request services. Given that the Semantic Matcher deals with SUMO and its sub-ontologies [115], the PA's ontology had to be drawn from the SUMO repository. I chose to use Mid-level ontology version 34 [116] for two reasons: 1) some of its lexemes have been renamed in later versions and 2) it is long and therefore has a large search space for 'candidate' lexemes (>1,000), which will better display the system's capabilities. To this ontology I had to add action concepts, with their preconditions and effects (see section 1.1), since the SUMO ontologies are not designed to be used for agent communication and are, therefore, static as opposed to dynamic. I also added some facts (i.e. assertions) that were relevant for this scenario. None of these additions interferes with the quality of the Semantic Matcher output and, thus, should not be seen as ad hoc. Moreover, I kept the additional lines as close to the original as possible by following the SUO-KIF syntax, reusing existing SUMO classes and relations and respecting all the type restrictions specified [117].

[114] In fact, many SPAs could have been used, but this would not provide us with a better demonstration of the Semantic Matcher.
[115] Although it can easily be extended to support anything written in SUO-KIF.
[116] http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/KBs/Midlevelontology.kif?revision=1.34



Our Planning Agent (called 'Jerry the Bot') wants to be employed as an artist at the Scottish National Gallery of Modern Art, and therefore its goal is to bring about a state of affairs where the following is true:

employs(scottishNationalGalleryOfModernArt, jerryTheBot).

Before agent interaction begins, ORS calls the Semantic Matcher, which precomputes all the bags of words from the candidate lexemes in Jerry's ontology. To achieve his goal Jerry plans his course of action (specified in the preconditions and effects of his action concepts (see Appendix)) and will try to 1) be approved (by changing the world so that expressingApproval(jerryTheBot) is true) and, once this is achieved, 2) be hired (by making appointing(jerryTheBot) true). These actions can be performed by one or more SPAs. In this scenario it is 'TomRecruiterAgent' that can offer this service. Jerry contacts Tom and asks him to perform the first action. Tom starts submitting queries to Jerry in order to check that he meets all the prerequisites ('preconditions') for being approved as an artist. During this process Jerry receives a surprising question, so plan execution fails. ORS tries to resolve this by diagnosing the problem and decides that we need to perform semantic matching because the word drinkingCup was a surprising lexeme. It then requests the URI of drinkingCup from Tom and gives it as input to the Semantic Matcher, which returns a ranking of the best candidate lexemes from Jerry's ontology. The first candidate (cup) is tried: the ORS Refinement System replaces cup with drinkingCup in Jerry's ontology. Then Jerry, who persists in his goal, replans [118] and the first action succeeds. Then he goes on to ask Tom to appoint him.

[117] This means that if we run a theorem prover (e.g. Vampire, http://www.vprover.org/) on these ontology files, it should not find any inconsistencies.
[118] Currently ORS replans every time it encounters a mismatched lexeme, although it would be much more efficient to try each one of the 'candidates' without having to find a new plan. Furthermore, the bags of weighted words, which could be precomputed at the beginning of the interaction, are computed every time the PA has to form another plan. This is a limitation of the current ORS implementation, but the system can easily be extended to handle this in the future.


Plan execution will fail again because of another semantic mismatch, but it will be resolved again and in the end Jerry will achieve his initial goal (see Appendix for the ORS output).

The integration of the two systems was done in conjunction with Fiona McNeill, the creator of ORS. The Semantic Matcher is implemented in Python and is called from within ORS (written in Prolog) through the Unix shell. The Python modules are designed to execute their main function with arguments specified externally, as command-line options.
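
What such a command-line entry point might look like is sketched below; the option names and output format are illustrative assumptions on my part, not the interface of the real modules:

    import argparse

    def main():
        parser = argparse.ArgumentParser(
            description='Rank candidate PA lexemes against a surprising SPA lexeme (sketch).')
        parser.add_argument('--surprising-lexeme', required=True,
                            help="the SPA's lexeme, e.g. drinkingCup")
        parser.add_argument('--uri', help='URL annotating the surprising lexeme')
        parser.add_argument('--target', choices=['relation', 'class'], default='class',
                            help='restricts the search space of candidate lexemes')
        args = parser.parse_args()
        # ... build the query bag from args.uri, load the precomputed bags,
        # rank the candidates ...
        for lexeme in ['Cup', 'Bottle', 'Beverage']:   # placeholder ranking
            print(lexeme)                              # one candidate per line for Prolog to read

    if __name__ == '__main__':
        main()

On the Prolog side the call then amounts to building a shell command string such as python compute_similarity.py --surprising-lexeme drinkingCup --uri http://... and reading the printed ranking back; the exact flags used by the real system may differ.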

Summary of chapter 3

In this chapter I presented the Semantic Matcher, a system that resolves semantic mismatches between agents' ontologies during agent communication, and I described how it was integrated into the existing Ontology Repair System (ORS). This new ORS module was designed in an attempt to meet both implementation challenges (i.e. the need for effectiveness and efficiency; limited access to the SPA's ontology, etc.) and theoretical challenges (i.e. construction of intensional meaning in the agent's 'head'). Evaluation gave us encouraging results and showed that practical problems were adequately solved; hence, our first goal was achieved. Whether this is an acceptable design at a theoretical level will be answered in the next chapter.



CHAPTER 4 Discussion

In chapter 2 I exposed the problem of symbol grounding, which is inevitable in formal ontologies, and explained why I think this is the main obstacle for meaning sharing between agents. In chapter 3 I presented a semantic matching system that tries to provide a solution to this problem by building a mental representation ('sense') for each lexeme in the Planning Agent's ontology which would help determine reference to real-world entities. In section 4.1 I will show why using 'folk' mental representations like 'bags of words' (as opposed to formal definitions) is justified by theories of conceptual structure within Philosophy of Language and Cognitive Science. Later on (section 4.2) I will discuss what implications such an approach may have for the future of Ontology Engineering.

4.1 Theoretical justification

In section 2.3.2 we saw that senses, or 'concepts' such as CAT, reside in our mind and act as reference-determining mediators between a word ('cat') and the world (the set of cats in the world). But what is the structure of these representations? What is CAT composed of?

Many models of conceptual structure have been proposed in the Cognitive Science and Philosophy of Language literature [119]. Below I will briefly examine four basic theories: 1) the Classical Theory, 2) the Prototype Theory, 3) the Atomic Theory and 4) the Dual Theory.

[119] All the theories of conceptual structure examined in this paper have been discussed extensively in both Philosophy of Language and Cognitive Science.


According to the Classical Theory, which dates back to Plato's dialogues (see (Laurence and Margolis 1999: 14)) and has been popular for centuries, concepts have definitional structure, that is, they are composed of features that are individually necessary and jointly sufficient for fixing the denotation. For example, BACHELOR can be represented in the mind as UNMARRIED + MALE. This theory was appealing because it was in keeping with the Principle of Compositionality [120] and could determine reference by assembling features: we can determine what entities in the world belong to the set of bachelors if we check whether they satisfy all of the above conditions (i.e. being in the intersection of the set of unmarried entities and the set of male entities) (this is called the 'checklist theory' in (Aitchison 1994)). But, is the Pope a bachelor? Although he fulfils the requirements, he is not a typical bachelor. Such a mental representation is very close to formal definitions in ontologies, but if we want to achieve a language-world connection for agent communication, we should look for another theory of concepts; one that can achieve the same for humans. To solve typicality problems, as in the case of the Pope being a 'bachelor', Rosch and Mervis (Rosch and Mervis 1975) developed the famous Prototype Theory.

In the Prototype Theory concepts point to fuzzy sets in the world, where membership is graded. For example, one entity can be a proper member of a set, while another entity can be less 'welcome' in the set [121]. Reference is determined by checking how many features each entity satisfies (e.g. the Pope is MALE and UNMARRIED but not ELIGIBLE-FOR-MARRYING), which tells us how well this entity belongs to the set of things that the concept denotes. One serious problem, however, is that this theory does not conform to the Principle of Compositionality, since typicality does not compose (e.g. a typical pet fish is not the intersection of typical pets and typical fish, because typical pets are cats, dogs etc.; thus, the meaning of the whole is not a function of the meanings of its parts).

[120] According to the Principle of Compositionality, the meaning of a complex expression is a function of the meanings of its parts plus syntax. This principle, which is fundamental in semantic theory, is attributed to Frege and was adopted and further developed by Richard Montague (Montague 1970).
[121] This has consequences for classical logic, which presupposes that an assertion is either true or false. In Prototype Theory we are allowed to say that bachelor(Pope) is, say, 30% true, while for a man who is eligible for marriage but prefers to be single, let's call him George, bachelor(George) can be close to 100% true, because George enjoys a high status in the set of bachelors while the Pope does not. To handle fuzzy sets, we need Fuzzy Logic (Zadeh 1965), where there are more than two truth values.


Jerry Fodor (Fodor 1998) attempted to tackle this problem by introducing the Atomic Theory.

The Atomic Theory (also known as Conceptual Atomism) posits that concepts have no structure at all, that is, they are atoms. For example, BACHELOR is represented in the mind as BACHELOR and nothing else. But how is reference determined now that we have no features to check entities against? According to Fodor (ibid.) there is a causal link between the property exhibited by the set of things in the world and the concept. For example, when we see a bachelor, their property of belonging to the set of bachelors causes us to entertain the concept BACHELOR [122]. This is a very attractive theory because it solves the compositionality problems discussed above (since complex concepts can be composed of atoms), but it is not as strong as the Prototype Theory in explaining why a concept, say BIRD, applies better to some entities (e.g. sparrows) than to others (e.g. penguins). To get round this problem, Laurence and Margolis (Laurence and Margolis 1999) propose the Dual Theory of conceptual structure.

According to the Dual Theory, concepts are composed of atoms (as in the Atomic Theory), but this does not exhaust the concept's content, as it is only its core. Around the core there is a periphery; an 'identification structure' that helps us identify the 'good' and the 'bad' members of a set. For example, our concept for 'bird' is an atom, BIRD, but around this atom there is some structure that helps us tell typical birds from non-typical ones. This theory is as strong as the Atomic Theory in accounting for compositionality and fixing reference, and as strong as the Prototype Theory in explaining our intuitions of 'typicality'.

[122] Justification for this causal relation can be found in Dretske's Information-based Semantics (Dretske 1981) and Kripke's 'causal theory of reference' (Kripke 1972): when an event A causes an event B, then B carries information about A. For example, a broken window carries information about some kind of event which preceded the breaking, the wrinkles on someone's face carry information about the person's age, and so on. For a discussion of Information-Based Semantics see (Margolis 1998: 349). Furthermore, we should note that Fodor's initial argument that all concepts should be innate (Fodor et al. 1980, cited in Laurence and Margolis 1999: 63) was abandoned later (Fodor 1998).


The periphery of the concept can be seen as an epistemic structure (i.e. a structure which encodes encyclopaedic knowledge), since our knowledge of what counts as, say, a typical bachelor helps us identify good and bad examples of BACHELOR.

Now, let's go back to the senses for lexemes that I proposed in the previous chapters. As mentioned earlier, it is reasonable to expect that ontology engineers in the future will annotate the lexemes of their formal ontologies with URIs [123], since these identifiers are already used as names for relations and entities in RDF and OWL. URIs, if present, can serve as the core of the mental representation (in Laurence and Margolis' terms (ibid.)), since their job is to uniquely and unambiguously identify an information resource (digital, physical or abstract) (Berners-Lee et al. 1998). In our context this means that if two different lexemes in two agents' ontologies are annotated with the same URI, they can be seen as synonymous. In chapter 3 I showed that lexemes can acquire meaning by being enriched with 'bags of words': these bags will serve as the periphery of the mental representation, that is, the world knowledge that surrounds the core. An analogy between the Dual Theory and the approach followed in this paper can be seen in the table below:

                            core                                      periphery

Dual Theory of concepts     CAT                                       encyclopaedic knowledge about cats

Semantic Matcher            URI                                       bag of words
                            (e.g. http://www.cats.com/setofcats#)     (e.g. [cat, cat, whisker, mouse, jump, pet,
                                                                      cat, domestic, domestic, breed, felis, claw])

Table 5: The structure of the mental representations for 'cat' in the Dual Theory and the Semantic Matcher

One problem, however, is that the existence of URIs, though reasonable, cannot be guaranteed for every ontology lexeme. In addition, URIs might not be as unique and unambiguous as they claim to be [124]. Hence, bags themselves should be able to fix the denotation. Reference determination might not be perfect or completely unambiguous, but some semantic grounding can be achieved: the larger and more 'appropriate' the bags, the better the semantic grounding [125].
[123] SUMO already has the relation uniqueIdentifier/2, which can take a string (e.g. a URI) as argument 1 and an entity (what I call a 'lexeme') as argument 2. However, it is fair to say that annotation with URIs hardly exists at the moment.


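To make the analogy concrete, the sketch below (my own illustration, not part of the implemented system) shows how a sense with a URI core and a bag-of-words periphery might be represented, treating a shared URI as assumed identity and falling back to graded bag similarity otherwise:

    import math
    from dataclasses import dataclass
    from typing import Dict, Optional

    @dataclass
    class Sense:
        uri: Optional[str]      # the 'core', e.g. 'http://www.cats.com/setofcats#' (may be absent)
        bag: Dict[str, float]   # the 'periphery': word -> weight, e.g. {'cat': 3, 'whisker': 1}

    def bag_similarity(p, q):
        # cosine similarity between two weighted bags of words
        shared = set(p) & set(q)
        dot = sum(p[w] * q[w] for w in shared)
        norms = math.sqrt(sum(v * v for v in p.values())) * math.sqrt(sum(v * v for v in q.values()))
        return dot / norms if norms else 0.0

    def match_score(a, b):
        # identical URIs are taken as (assumed) identity; otherwise compare the peripheries
        if a.uri is not None and a.uri == b.uri:
            return 1.0
        return bag_similarity(a.bag, b.bag)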

Now that we have seen what model of conceptual structure lies behind our approach to sense creation within the Semantic Matcher, we can go on and see what implications such an idea can have for Ontology Engineering.

4.2 Implications for Ontology Engineering

In section 2.3.3 we saw that bags of words can be seen as little folksonomies and visualised as tag clouds. Equipping lexemes of formal ontologies with informal senses amounts to bringing folksonomies inside ontologies. Below I will show that, if the combination of these different kinds of representation is adopted as an Ontology Engineering practice, it can minimise the problem of symbol grounding and help agents interoperate more effectively.

A folksonomy is a bottom-up, not centrally controlled classification system in which structure emerges out of the practice of users labelling digital resources ('objects') with keywords ('tags'). Vander Wal distinguishes between broad folksonomies and narrow folksonomies (Vander Wal 2005; see also Weller 2007). The former are created when a particular object can be tagged by different people, so the same tag can be assigned more than once.

[124] For example, two ontology engineers might use the same URIs with a different intended meaning (ambiguity of URIs), or might create a new URI for objects for which a URI is already available ('synonymy' of URIs) (see also Hayes and Halpin 2008).
[125] As mentioned in section 3.2.1, 'appropriate' bags are ones that are representative of the lexeme's intended meaning.


For example, a Delicious [126] bookmark about, say, chocolate can have the word 'recipes' assigned to it 600 times, the word 'chocolate' 578 times, the word 'food' 423 times and so on. This pattern reveals some trends as to what vocabularies are generally considered appropriate to describe this resource. Narrow folksonomies, on the other hand, are formed in systems where one object can be labelled only by its author, with distinct tags. For example, a Flickr [127] user can submit a photograph and annotate it with keywords such as 'surfing', 'waves', 'beach' and 'summer'. If it is made publicly available, it can be found by other users who search for photos about 'surfing' or 'waves' and so on.

Here we are concerned only with broad folksonomies because they have the same structure as our bags of words. What I suggest is to use broad folksonomies as folk descriptions of not just digital resources (e.g. Delicious bookmarks) but also physical and abstract resources (i.e. entities in the objective world, or ideas) denoted by lexemes in a formal ontology. As discussed in section 3.1.1.3, ontology engineers can annotate their lexemes with tags of their choice or with URLs that point to collaboratively created broad folksonomies. Alternative or complementary practices could be annotations with WordNet synsets or URLs which point to natural language text, as was implemented in the Semantic Matcher. Any bags, or folksonomies, that have been created by ontology engineers can be enriched with processes in the same spirit as the Sense Creation algorithm that I described in chapter 3. This is not only a practical way to achieve some semantic grounding for lexemes in ontologies, but also a demonstration of how ontologies and folksonomies can work in tandem to facilitate meaning disambiguation, which has immediate applications in agent communication. The fact that a semantic matcher with bags of words produces encouraging results makes such a suggestion for ontology engineering plausible.
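
As a purely illustrative sketch (the tag counts and names below are invented), a broad folksonomy of this kind maps directly onto the weighted bags used by the Semantic Matcher, and any tags provided by the ontology engineer can simply be merged in:

    from collections import Counter

    # a broad folksonomy for a resource: how many users assigned each tag
    broad_folksonomy = {'recipes': 600, 'chocolate': 578, 'food': 423}

    # tags an ontology engineer might attach directly to a lexeme
    engineer_tags = ['chocolate', 'cocoa', 'confectionery']

    def folksonomy_to_bag(tag_counts, extra_tags=()):
        # merge a broad folksonomy (tag -> count) with engineer-provided tags
        # into a single weighted bag of words for a lexeme
        bag = Counter(tag_counts)
        bag.update(extra_tags)      # each extra tag counts once
        return dict(bag)

    print(folksonomy_to_bag(broad_folksonomy, engineer_tags))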

[126] http://www.delicious.com/
[127] http://www.flickr.com/


4.3 Implications for Ontology Matching

The 'bag of words' model of representing senses for lexemes enables the implementation of the idea of meaning similarity as opposed to identity, because even in human language perfect synonymy is impossible. Furthermore, since different ontologies are created by different humans, their conceptualisations of ontology terms might be different and can only be compared for similarity, that is, semantic 'distance'. Similarity cannot be measured with analytic tools such as theorem provers.

In the Vector Space Model we saw how lexemes are ranked from the most to the least relevant on the basis of how semantically similar they are to the SPA's lexeme. In section 2.1 we talked about identity: there is one candidate lexeme whose meaning is identical to that of the SPA's. This is not contradictory, as the ranking algorithm is trying to predict identity (i.e. the same denotation under the same interpretation) on the basis of similarity (i.e. similar folksonomies around the lexeme).

However, if ontologies come from completely different sources, we cannot assume any sort of identity unless we happen to find the same URIs. In the future, when communication between agents with disparate ontologies will be a tractable task, semantic similarity will be a very important issue. Two relations might have a different URI but still be similar. The Planning Agent will have to use some of them to communicate with the SPA, even though they might not have exactly the meaning originally intended. For example, the PA might want to buy a purple bag with the action buy(me, purple_bag), but after semantic matching it might ask the SPA to execute the action buy(me, LilacBag). It might not be exactly what the PA was intending to buy, but it is still close to it.

The use of similarity (semantic proximity) as opposed to synonymy (semantic identity) as a criterion for matching lexemes might pave the way for systems like ORS to deal with more heterogeneous ontologies, where meaning identity cannot be presupposed.



Summary of chapter 4

In this chapter I briefly described four models of conceptual structure and showed how the Dual Theory provided the theoretical framework for the approach to sense creation that I followed during the design of the Semantic Matcher. I also demonstrated how such a design favours the combination of ontologies with folksonomies and made some suggestions for ontology design in the future. Finally, I showed how the system I presented supports semantic matching with respect to similarity and not necessarily identity.

CONCLUDING REMARKS

In the previous chapters we saw that ontology mismatch in an agent-communication environment is inevitable, and ORS is the first example of a system that has the infrastructure (theoretical or implemented) for dealing with it in this way. The most frequently occurring type of heterogeneity, namely semantic mismatch, was addressed in this paper, where the Semantic Matcher was presented.

In the course of building this new ORS module, many design decisions had to be made, the most important being extending the current system to work with genuine ontologies; hence, the SUMO parser and the SUO-KIF-to-Prolog translator were implemented. For the purposes of demonstrating agent interaction within ORS, some slight modifications to the input ontologies had to be performed. This was done with the addition of some facts and Action Concepts to the SUMO ontologies, always making sure that the non-ad-hoc nature of the Semantic Matcher is not affected.



The Semantic Matcher was created on the basis of Information Retrieval principles, treating senses for lexemes in the ontology as 'bags of words'. As we saw earlier, this was not just an engineering decision but also a proposal for incorporating unordered sets of words into the semantics of formal ontologies in order to achieve symbol grounding. This suggestion found theoretical justification in theories of Philosophy of Language and Cognitive Science and went further to show how folksonomies can enrich formal ontologies in practice.

Given some reasonable assumptions, the Semantic Matcher gave us very encouraging results and supported our initial hypothesis that the integration of folksonomies into formal ontologies can lead to effective and efficient matching, in cases where ontologies alone would have resulted in failed or poor matching. If nothing else, this study showed that facilitating communication between agents with disparate ontologies is not a totally intractable task. The take-home message is that ontology mismatch deserves to be viewed more optimistically.


REFERENCES

Aitchison, J. (1994) Words in the Mind: An Introduction to the Mental Lexicon, Oxford: Blackwell.

Akinsola, T. M. (2008) Automated Ontology Evolution, MSc thesis, University of Edinburgh.

Bach, T. L., Dieng-Kuntz, R. and Gandon, F. (2004) 'On ontology matching problems (for building a corporate semantic web in a multi-communities organisation)', in Proceedings of the 6th International Conference on Enterprise Information Systems (ICEIS), Porto, pp. 236-243.

Baral, C. (2010) 'Reasoning about actions and change: From single agent actions to multi-agent actions (extended abstract)', Proceedings of the 12th International Conference on the Principles of Knowledge Representation and Reasoning, KR 2010.

Bergman, M. (2009) 'The Fundamental Importance of Keeping an ABox and TBox Split', online at http://www.mkbergman.com/489/ontologybestpracticesfordatadrivenapplicationspart2/

Berners-Lee, T., Fielding, R. T. and Masinter, L. (1998) 'Uniform resource identifiers (URI): Generic syntax', Internet RFC 2396, August 1998, online at http://www.ietf.org/rfc/rfc2396.txt

Berners-Lee, T., Hendler, J. and Lassila, O. (2001) 'The semantic web', Scientific American 284(5): 34-43.

Blizard, W. D. (1988) 'Multiset theory', Notre Dame J. Formal Logic 30(1): 36-66.

Bratman, M. E. (1987) Intentions, Plans, and Practical Reason, Cambridge, Mass.: Harvard University Press.

Brown, K. (ed.) (1994, 2006) Encyclopedia of Language & Linguistics, Oxford: Elsevier.

Buchholz, W. (2006) 'Ontology', in D. G. Schwartz (ed.) (2006), pp. 694-702.

Candlish, S. (2008) 'Private language', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/privatelanguage/

Carlsson, M. et al. (2010) SICStus Prolog User's Manual, Swedish Institute of Computer Science, online at http://www.sics.se/sicstus/docs/4.0.7/pdf/sicstus.pdf


Carnap, R. (1947) Meaning and Necessity, Chicago: University of Chicago Press.

Casati, R. (2006) 'Events', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/events/

Chandrasekaran, B., Josephson, J. and Benjamins, V. (1999) 'What are ontologies, and why do we need them?', IEEE Intelligent Systems 14(1): 20-26.

Chomsky, N. (1957) Syntactic Structures, The Hague: Mouton.

Chomsky, N. and Schützenberger, M. P. (1963) 'The algebraic theory of context-free languages', in Braffort, P. and Hirschberg, D. (1963) Computer Programming and Formal Languages, Amsterdam: North-Holland, pp. 118-161.

Cleverdon, C. (1970) 'Evaluation tests of information retrieval systems', Journal of Documentation 26(1): 55-67.

Corcho, O. and Gómez-Pérez, A. (2000) 'A roadmap to ontology specification languages', in Proceedings of the 12th European Workshop on Knowledge Acquisition, Modeling and Management, pp. 80-96.

Cohen, S. M. (2008) 'Aristotle's metaphysics', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/aristotlemetaphysics/

Colucci, S., Di Noia, T., Di Sciascio, E., Donini, F. M. and Mongiello, M. (2006) 'Description logic-based information retrieval', in Schwartz, D. G. (ed.) (2006), pp. 105-114.

Croft, W., Metzler, D. and Strohman, T. (2010) Search Engines: Information Retrieval in Practice, Boston: Addison-Wesley.

Daconta, M. C., Obrst, L. J. and Smith, K. T. (2003) The Semantic Web: A Guide to the Future of XML, Web Services and Knowledge Management, Indianapolis, IN: Wiley.

Devlin, K. (1993) The Joy of Sets: Fundamentals of Contemporary Set Theory, New York: Springer-Verlag.

Dretske, F. (1981) Knowledge and the Flow of Information, Cambridge, Mass.: MIT Press.

Enderton, H. B. (2009) 'Second-order and higher-order logic', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/logichigherorder/


Euzenat, J. and Shvaiko, P. (2007) Ontology Matching, Berlin: Springer-Verlag.

Fellbaum, C. (ed.) (1998) WordNet: An Electronic Lexical Database, Cambridge, Mass.: MIT Press.

Finkelstein, A., Gabbay, D. M., Hunter, A., Kramer, J. and Nuseibeh, B. (1993) 'Inconsistency handling in multi-perspective specifications', in European Software Engineering Conference, pp. 84-99.

Finn, A., Kushmerick, N. and Smyth, B. (2001) 'Fact or fiction: Content classification for digital libraries', in DELOS Workshop: Personalisation and Recommender Systems in Digital Libraries (2001).

Fitting, M. (2007) 'Intensional Logic', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/logicintensional/

Flouris, G., Plexousakis, D. and Antoniou, G. (2006) 'Evolving ontology evolution', in Proceedings of the 32nd International Conference on Current Trends in Theory and Practice of Computer Science (SOFSEM 2006).

Fodor, J. A. (1998) Concepts: Where Cognitive Science Went Wrong, Oxford: Clarendon Press.

Fortier, J. and Kassel, G. (2006) 'Organizational Semantic Webs', in Schwartz, D. G. (ed.) (2006), pp. 772-779.

Frege, G. (1892) 'On Sense and Reference', in Ludlow, P. (ed.) (1997), pp. 563-583.

Fuller, M. and Zobel, J. (1998) 'Conflation-based comparison of stemming algorithms', in Proceedings of the 3rd Australian Document Computing Symposium, Sydney, 1998.

Gärdenfors, P. and Rott, H. (1995) 'Belief revision', in Handbook of Logic in Artificial Intelligence and Logic Programming, Vol. 4, Oxford: Oxford University Press, pp. 35-132.

Genesereth, M. and Fikes, R. (1992) 'Knowledge Interchange Format version 3.0 reference manual', Logic Group Report Logic-92-1, Stanford, California: Stanford University.

Ghallab, M., Howe, A., Knoblock, C. and McDermott, D. (1998) 'PDDL: The planning domain definition language', Technical Report DCS-TR-1165, Yale Center for Computational Vision and Control.


Giunchiglia, F. and Shvaiko, P. (2003) 'Semantic matching', The Knowledge Engineering Review 18(3): 265-280.

Giunchiglia, F., Shvaiko, P. and Yatskevich, M. (2004) 'S-Match: an algorithm and an implementation of semantic matching', Proceedings of the European Semantic Web Symposium (ESWS), pp. 61-75.

Gómez-Pérez, A., Corcho-Garcia, O. and Fernandez-Lopez, M. (2003) Ontological Engineering, New York: Springer-Verlag.

Grefenstette, G. and Tapanainen, P. (1994) 'What is a word, what is a sentence? Problems of tokenization', in Proceedings of the 3rd Conference on Computational Lexicography and Text Research, 1994.

Gruber, T. R. (1992) 'Ontolingua: A mechanism to support portable ontologies', Technical Report, Knowledge Systems Laboratory 91-66, Stanford, California: Stanford University.

Gruber, T. (1993) 'A translation approach to portable ontology specifications', Knowledge Acquisition 5(2): 199-220.

Gruber, T. (2007) 'Ontology of folksonomy: A mash-up of apples and oranges', International Journal on Semantic Web and Information Systems 3(1): 1-11.

Gruber, T. (2009) 'Ontology', in Liu, L. and Özsu, M. T. (eds.) (2009) Encyclopedia of Database Systems, Springer-Verlag.

Halpin, H., Robu, V. and Shepherd, H. (2007) 'The complex dynamics of collaborative tagging', in Proceedings of the 16th International Conference on the World Wide Web, Banff, pp. 211-220.

Harnad, S. (1990) 'The symbol grounding problem', Physica D 42: 335-346.

Hawley, K. (2004) 'Temporal parts', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/temporalparts/

Hayes, P. and Halpin, H. (2008) 'In defense of ambiguity', International Journal of Semantic Web and Information Systems 4(3): 1-18.

Heflin, J. (2003) 'OWL Web Ontology Language: Use cases and requirements', W3C, online at http://www.w3.org/TR/webontreq/


Hiemstra, D. (2009) 'Information Retrieval Models', in Goker, A. and Davies, J. (eds.) (2009) Information Retrieval: Searching in the 21st Century, Wiley-Blackwell, pp. 1-19.

Hillenmeyer, M. (2005) 'Distance Metrics', in Hillenmeyer, M. (2005) Machine Learning, Stanford University, online at http://www.stanford.edu/~maureenh/quals/html/ml/node47.html

Hofweber, T. (2004) 'Logic and Ontology', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/logicontology/

Kalfoglou, Y. (2002) 'Exploring ontologies', in Chang, S. (ed.) Handbook of Software Engineering and Knowledge Engineering, Vol. 1: Fundamentals, World Scientific Publishing Company.

Kripke, S. (1972) Naming and Necessity, Cambridge, Mass.: Harvard University Press.

Krovetz, R. (1993) 'Viewing morphology as an inference process', in Korfhage, R. et al., Proceedings of the 16th ACM SIGIR Conference, Pittsburgh, June 27-July 1 1993, pp. 191-202.

Laurence, S. and Margolis, E. (1999) 'Concepts and cognitive science', in Margolis, E. and Laurence, S. (eds.) (1999) Concepts: Core Readings, Cambridge, Mass.: MIT Press, pp. 3-81.

Lavrenko, V. (2009) 'Vector Space Model', online at http://www.inf.ed.ac.uk/teaching/courses/tts/pdf/vspace2x2.pdf

Lebanon, G., Mao, Y. and Dillon, J. (2007) 'The locally weighted bag of words framework for document representation', Journal of Machine Learning Research 8: 2405-2441.

Lee, D. L., Chuang, H. and Seamons, K. (1997) 'Document ranking and the Vector Space Model', IEEE Software 14(2): 67-75.

Levenshtein, V. (1965) 'Binary codes capable of correcting deletions, insertions and reversals', Doklady Akademii Nauk SSSR 163(4): 845-848; translated into English in Soviet Physics Doklady 10(8): 707-710.

Lew, M. S., Sebe, N., Djeraba, C. and Jain, R. (2006) 'Content-based multimedia information retrieval: State of the art and challenges', Transactions on Multimedia Computing, Communications and Applications 2(1): 1-19.

Lewis, D. (1998) 'Naive (Bayes) at forty: The independence assumption in information retrieval', Proceedings of ECML-98, 10th European Conference on Machine Learning, Chemnitz, DE: Springer-Verlag, pp. 4-15.


Lovins, J. B. (1968) 'Development of a stemming algorithm', Mechanical Translation and Computational Linguistics 11: 22-31.

Ludlow, P. (ed.) (1997) Readings in the Philosophy of Language, Cambridge, Mass.: MIT Press.

Manna, Z. (1974) Mathematical Theory of Computation, New York: McGraw-Hill.

Manning, C. D., Raghavan, P. and Schütze, H. (2008) Introduction to Information Retrieval, Cambridge: Cambridge University Press.

Margolis, E. (2006) 'Concepts', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/concepts/

McNeill, F. (2006) Dynamic Ontology Refinement, PhD thesis, University of Edinburgh.

McNeill, F. and Bundy, A. (2007) 'Dynamic, automatic, first-order ontology repair by diagnosis of failed plan execution', International Journal on Semantic Web and Information Systems, special issue on Ontology Matching 3(3): 1-35.

McNeill, F., Bundy, A. and Walton, C. (2003) 'Plan execution failure analysis using plan deconstruction', presented at the Planning Special Interest Group, Glasgow, December 2003.

Miller, G. A., Beckwith, R., Fellbaum, C., Gross, D. and Miller, K. (1993) 'Introduction to WordNet: An on-line lexical database', in Fellbaum, C. (ed.) 1998, online at http://courses.media.mit.edu/2002fall/mas962/MAS962/miller.pdf

Miller, G. A., Fellbaum, C., Kegl, J. and Miller, K. (1988) 'WordNet: An electronic lexical reference system based on theories of lexical memory', Revue Québécoise de Linguistique 17: 181-211.

Montague, R. (1970) 'English as a formal language', in Visentini, B. et al. (eds.) Linguaggi nella Società e nella Tecnica, Milan: Edizioni di Comunità, pp. 189-224. Reprinted in Montague (1974), pp. 188-221.

Niles, I. and Pease, A. (2001) 'Towards a Standard Upper Ontology', in Proceedings of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001), Welty, C. and Smith, B. (eds.), Ogunquit, Maine.


Niles, I. and Pease, A. (2003) 'Linking lexicons and ontologies: Mapping WordNet to the Suggested Upper Merged Ontology', in Proceedings of the 2003 International Conference on Information and Knowledge Engineering (IKE '03), Las Vegas.

Ogden, C. K. and Richards, I. A. (1923) The Meaning of Meaning, New York: Harcourt, Brace & World Inc.

Pease, A., Niles, I. and Li, J. (2002) 'The Suggested Upper Merged Ontology: A large ontology for the Semantic Web and its applications', in Proceedings of the AAAI-2002 Workshop on Ontologies and the Semantic Web, Edmonton, Canada.

Pease, A. (2009) 'Standard Upper Ontology Knowledge Interchange Format', online at http://sigmakee.cvs.sourceforge.net/viewvc/sigmakee/sigma/suokif.pdf

Peters, I. (2009) Folksonomies: Indexing and Retrieval in Web 2.0, Berlin: De Gruyter Saur.

Peirce, C. S. (1931-1958) Collected Papers of C. S. Peirce, ed. by Hartshorne, C., Weiss, P. and Burks, A., 8 vols., Cambridge, Mass.: Harvard University Press.

Porter, M. F. (1980) 'An algorithm for suffix stripping', Program 14(3): 130-137.

Putnam, H. (1975) 'The meaning of meaning', in Gunderson (ed.) Language, Mind and Knowledge, Minnesota Studies in the Philosophy of Science, Vol. 7, Minneapolis: University of Minnesota Press. Reprinted in Mind, Language and Reality, Philosophical Papers, Vol. 2, Cambridge: Cambridge University Press, pp. 215-271.

Qu, W., Hu, W. and Chen, G. (2006) 'Constructing virtual documents for ontology matching', in Proceedings of the 15th International World Wide Web Conference, Edinburgh, pp. 23-31.

Raghavan, V. V. and Wong, S. K. M. (1986) 'A critical analysis of Vector Space Model for information retrieval', Journal of the American Society for Information Science 37(5): 279-287.

Ramos, J. (2003) 'Using TF-IDF to Determine Word Relevance in Document Queries', First International Conference on Machine Learning, Rutgers University.

Robertson, S. and Spärck Jones, K. (1976) 'Relevance weighting of search terms', Journal of the American Society for Information Science 27(3): 129-146.

Rosch, E. and Mervis, C. (1975) 'Family resemblances: Studies in the internal structure of categories', Cognitive Psychology 7: 573-605.


Russell, S. J. and Norvig, P. (1995) Artificial Intelligence: A Modern Approach, New Jersey: Prentice Hall.

Salton, G., Wong, A. and Yang, C. S. (1975) 'A vector space model for automatic indexing', Communications of the ACM 18(11): 613-620.

Salton, G. and McGill, M. J. (1983) Introduction to Modern Information Retrieval, New York: McGraw-Hill.

Saussure, F. (1916) 'Nature of the Linguistic Sign', in Bally, C. and Sechehaye, A. (eds.), Cours de Linguistique Générale, London: McGraw-Hill Education.

Schwartz, D. G. (ed.) (2006) Encyclopedia of Knowledge Management, Hershey, PA: Idea Group Reference.

Searle, J. (1980) 'Minds, brains and programs', Behavioral and Brain Sciences 3(3): 417-457.

Shapiro, S. (2009) 'Classical logic', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/logicclassical/

Schmid, H. (2007) 'Tokenizing', in Lüdeling, A. and Kytö, M. (eds.) Corpus Linguistics: An International Handbook, Berlin: Mouton de Gruyter.

Spärck Jones, K. (1972) 'A statistical interpretation of term specificity and its application in retrieval', Journal of Documentation 28(1): 11-21.

Speaks, J. (2010) 'Theories of Meaning', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/meaning/

Sterling, L. and Shapiro, E. (1994) The Art of Prolog: Advanced Programming Techniques (2nd edition), Cambridge, Mass.: MIT Press.

Steup, M. (2006) 'The analysis of knowledge', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/knowledgeanalysis/

Vander Wal, T. (2005) 'Explaining and showing broad and narrow folksonomies', online at http://www.vanderwal.net/random/entrysel.php?blog=1635

Vander Wal, T. (2007) 'Folksonomy', online at http://www.vanderwal.net/folksonomy.html


Wallace, R. J. (2008) 'Practical reason', in Zalta, E. N. (ed.) (2003), http://plato.stanford.edu/entries/practicalreason/

Weller, K. (2007) 'Folksonomies and ontologies: Two new players in indexing and knowledge representation', in Proceedings of Online Information, London, pp. 108-115.

Wittgenstein, L. (1953) Philosophical Investigations, translated by G. E. M. Anscombe, Oxford: Blackwell.

Wooldridge, M. (2009) An Introduction to Multiagent Systems, New York: John Wiley & Sons.

Zadeh, L. (1965) 'Fuzzy sets', Information and Control 8(3): 338-353.

Zalta, E. N. (ed.) (2003) The Stanford Encyclopedia of Philosophy (online), http://plato.stanford.edu

APPENDIX

A.1 Glossary

The vocabulary below is defined in the context of the Semantic Matcher. Some terms might have
different meanings across disciplines. Where there is no author citation, the definition is mine, for the
purposes of this work only.

bag of words: an unordered multiset of words, that is, a set in which words (members) can appear more than once. The frequency of the word in the bag determines its weight (importance), although in Information Retrieval other parameters also contribute to term weighting (for example, document length, inverse document frequency, etc.). The structure of a bag of words can be compared to that of a tag cloud. See also concept, tag cloud.



candidate lexeme: a lexeme in the Planning Agent's ontology which is eligible for semantic matching, that is, it can be compared to the surprising lexeme for similarity. See also surprising lexeme.

concept: a mental representation of an entity or a set of entities in the world. A concept can be considered synonymous to 'intension' (Carnap 1947) or 'sense' (Frege 1892). For representational theories (i.e. theories which view meaning as intension) a concept is the meaning of the word which evokes it. In chapter 4 I claim that an agent's concept can be created out of a 'bag of words' ('tag cloud'), which constitutes the concept's epistemic (encyclopaedic) part, and a URI, which is the concept's core and achieves symbol grounding (connects the concept and, by consequence, the word, with the world).

IR: Information Retrieval

lexeme: a string of characters representing a predicate (relation), class or individual in an ontology.

needed lexeme: a lexeme that the Planning Agent needs to use in order to be understood by the Service-Providing Agent, but which has never been presented to the former agent, so the PA will have to guess it. See also surprising lexeme.

PA: see Planning Agent

Planning Agent: an agent who forms plans to achieve a particular goal and requests services (actions to be performed) from Service-Providing Agents. See also Service-Providing Agent.



sense: see concept

Service-Providing Agent: an agent who provides services to Planning Agents by performing actions. See also Planning Agent.

SPA: see Service-Providing Agent

surprising lexeme: a lexeme in the Service-Providing Agent's ontology that has appeared in the queries submitted to the Planning Agent (usually while checking the preconditions of an action) and has surprised the latter, who has never seen it before. See also candidate lexeme, needed lexeme.

tag cloud: the visualisation of a multiset of words in which more important words appear in a bigger size. The structure of a tag cloud can be compared to that of a bag of words. See also concept, bag of words.

A.2 Output of evaluation module

user@user:~/ORS/semantic_matching/modules/sem-matching/evaluation$ python
evaluation.py

** ComputerReport ** is matched to the following PA lexemes:

1 NetworkResource
2 Report <---
3 ProcessTask
4 ComputerResource
5 BusNetwork
6 Database
7 CPU
8 NetworkAdapter
9 Server
10 UserName


** coordinates ** is matched to the following PA lexemes:

1 coordinate <---
2 partition
3 range
4 measure
5 located
6 inScopeOfInterest
7 geographicSubregion
8 origin
9 subProcess
10 prevents

** DrinkingCup ** is matched to the following PA lexemes:

1 Cup <---
2 Bottle
3 Beverage
4 Tooth
5 Chewing
6 Birth
7 Toothbrush
8 Dentist
9 Eating
10 DistilledAlcoholicBeverage

** ElectricalConductor ** is matched to the following PA lexemes:

1 Conductor <---
2 ResistorElement
3 InsulatorSubstance
4 Electrical
5 Current
6 SemiconductorComponent
7 ElectricalEngineeringMethod
8 Brushless
9 DcMotor
10 ElectricalDrivesDomain

** familyName ** is matched to the following PA lexemes:

1 lastName <---
2 familyRelation
3 cohabitant
4 legalGuardian
5 stranger
6 acquaintance
7 friend
8 coworker
9 mutualStranger
10 mutualAcquaintance

** FishingIndustry ** is matched to the following PA lexemes:

1 Fishing <---
2 FishAndSeafoodWholesalers
3 FinfishFishing
4 FishingHuntingAndTrapping
5 ShellfishFishing
6 AgricultureForestryFishingAndHunting
7 FishAndSeafoodMarkets
8 OtherMarineFishing
9 DeepSeaPassengerTransportation
10 FinfishFarmingAndFishHatcheries


** Flammable ** is matched to the following PA lexemes:

1 FluidContainer
2 LiquidMixture
3 Spraying
4 Diluting
5 Bubble
6 Combustible <---
7 Drinking
8 LiquidBodySubstance
9 Stirring
10 Liquid

** FluidCylinder ** is matched to the following PA lexemes:

1 PressureControlValve
2 Device
3 Cylinder <---
4 Pressure
5 FluidPowerDomain
6 DirectionalControlValve
7 VolumeControlValve
8 ReliefValve
9 FluidPower
10 Valve

** incomeOf ** is matched to the following PA lexemes:

1 beforeTaxIncome
2 afterTaxIncome
3 income <---
4 taxDeferredIncome
5 lender
6 loanForPurchase
7 customer
8 inflationRate
9 monetaryValue
10 issuedBy

** InformationIndustry ** is matched to the following PA lexemes:

1 MattressManufacturing
2 GrantmakingFoundations
3 Information <---
4 HardwareStores
5 HardwareWholesalers
6 SpecialDieAndToolDieSetJigAndFixtureManufacturing
7 OnLineInformationServices
8 AllOtherInformationServices
9 HardwareManufacturing
10 InformationServices

** JuniorCollegeIndustry ** is matched to the following PA lexemes:

1 CollegesUniversitiesAndProfessionalSchools
2 JuniorColleges <---
3 BarberShops
4 ContinuingCareRetirementCommunities
5 AdministrationOfUrbanPlanningAndCommunityAndRuralDevelopment
6 AdministrationOfHousingProgramsUrbanPlanningAndCommunityDevelopment
7 CommunityHousingServices
8 CommunityCareFacilitiesForTheElderly
9 CommunityFoodAndHousingAndEmergencyAndOtherReliefServices


10 AdministrationOfEducationPrograms

** MaizeGrain ** is matched to the following PA lexemes:

1 Constructing
2 Corn <---
3 IndustrialPlant
4 FinancialService
5 Outdoors
6 GovernmentBuilding
7 Garage
8 Residence
9 PoliceFacility
10 Store

** ProjectileLauncher ** is matched to the following PA lexemes:

1 Launcher <---
2 ArrowProjectile
3 Bullet
4 Missile
5 ProjectileShell
6 GunBarrel
7 TouristSite
8 Motion
9 AutomaticGun
10 GunPowder

** ProjectileWeapon ** is matched to the following PA lexemes:

1 ProjectileLauncher <---
2 Projectile
3 Gun
4 WeaponOfMassDestruction
5 Spear
6 Sword
7 Bomb
8 Bullet
9 AutomaticGun
10 GunBarrel

** RepublicOfGeorgia ** is matched to the following PA lexemes:

1 Georgia <---
2 GeorgianLari
3 Italy
4 Kazakhstan
5 Ukraine
6 Russia
7 Portugal
8 Substance
9 Indonesia
10 Croatia

** RiceGrain ** is matched to the following PA lexemes:

1 RiceFarming
2 Rice <---
3 CerealGrain
4 Flour
5 Whiskey
6 Baking
7 PotOrPan
8 HayFarming


9 PeanutFarming
10 PreparedFood

** ScientificLaw ** is matched to the following PA lexemes:

1 MultipolePostulate
2 CircuitTheoryDomain
3 Law <---
4 NewtonsLaw
5 ScienceDomain
6 NaturalSciencesDomain
7 Set
8 Proposition
9 Process
10 Method

** TelevisionReceiver ** is matched to the following PA lexemes:

1 Television <---
2 TelevisionSystem
3 Radio
4 CableTelevisionSystem
5 CommunicationRadio
6 MobileCellPhone
7 BroadcastingStation
8 Internet
9 TelevisionStation
10 GeopoliticalArea

** TurkeyBird ** is matched to the following PA lexemes:

1 Turkey <---
2 TurkishLira
3 Poultry
4 UnitedStates
5 Meat
6 Canada
7 Seed
8 Ethiopia
9 Brazil
10 AnimalSkin

** VehicleTire ** is matched to the following PA lexemes:

1 Tire <---
2 VehicleWheel
3 Wheel
4 MaterialHandlingEquipment
5 ArtilleryGun
6 Motorcycle
7 LetterBombAttack
8 VehicleController
9 Ballot
10 Mailing

** WaterVehicle ** is matched to the following PA lexemes:

1 Watercraft <---
2 Ice
3 Submarine
4 WaterMotion
5 Water
6 SwimmingPool
7 FreshWaterArea


8 SalineSolution
9 WaterArea
10 Washing

A.3 Additions to PA's ontology

Mid-level Ontology version 34 (added information)

;; Below is the part of the ontology that I created for the


;; purposes of agent communication. It includes actions (T-box/signature)
;; and facts (A-box/theory)
;; I define the individuals JerryTheBot, TomRecruiterBot, MugPaintingContest2010,
CupStillLifeByJerry, JerrysCup, PaintingJerrysCupProcess
;; The relations and classes are all from SUMO & its domain ontologies.
;; Domain restrictions in relations are respected.

(instance JerryTheBot CognitiveAgent)


(instance TomRecruiterBot CognitiveAgent)
(instance MugPaintingContest2010 Contest)
(instance CupStillLifeByJerry PaintedPicture)
(instance JerrysCup Cup)
(instance PaintingJerrysCupProcess Painting)

(serviceProvider Appointing TomRecruiterAgent)


(serviceProvider ExpressingApproval TomRecruiterAgent)

;; JerryTheBot created CupStillLifeByJerry,


;; which depicts JerrysCup

(agent PaintingJerrysCupProcess JerryTheBot)


(result PaintingJerrysCupProcess CupStillLifeByJerry)
(represents CupStillLifeByJerry JerrysCup)

;; JerryTheBot participated in MugPaintingContest2010 with


;; the picture CupStillLifeByJerry.
;; The contest resulted in a victory for JerryTheBot.

(contestParticipant MugPaintingContest2010 JerryTheBot)


(involvedInEvent MugPaintingContest2010 CupStillLifeByJerry)
(result MugPaintingContest2010 Won)
(property JerryTheBot Won)

;; This is the action concept 'Appointing'


;; which takes one argument (?AGENT)
;; However, because this is a Process in SUMO
;; we had to express the action concept in
;; an appropriate way. e.g. involvedInEvent
;; has domain for argument 1 'Process'. This is
;; my way of saying that something is an argument
;; of the action concept.


(=>

(and
(instance ?AP Appointing)
(involvedInEvent ?AP ?AGENT))

(causesProposition
(and
(hasSkill Painting ?AGENT)
(not (employs ?X ?AGENT))
(lastName ?LN ?AGENT))
(and
(employs ScottishNationalGalleryOfModernArt ?AGENT)
(instance YourEmploymentCertificate Certificate)
(titles ?LN YourEmploymentCertificate))))

(=>

(and
(instance ?EX ExpressingApproval)
(involvedInEvent ?EX ?AGENT)
(instance ?AGENT CognitiveAgent))

(causesProposition
(and
(exists (?P ?PROCESS ?ITEM)
(instance ?P PaintedPicture)
(instance ?PROCESS Painting)
(instance ?ITEM Cup)
(represents ?P ?ITEM)
(agent ?PROCESS ?AGENT)
(result ?PROCESS ?P)
(contestParticipant MugPaintingContest2010 ?AGENT)
(involvedInEvent MugPaintingContest2010 ?P)
(result MugPaintingContest2010 Won)
(property ?AGENT Won)))
(and
(hasSkill Painting ?AGENT))))

A.4 ORS output (irrelevant parts have been removed)


SICStus 4.0.4 (x86-linux-glibc2.3): Tue Jun 17 00:01:59 CEST 2008
Licensed to SP4dai.ed.ac.uk
| ?- plan.

_________________________________________________

Consulting the PLANNER...


_________________________________________________

Consulting the PLAN DECONSTRUCTOR...


________________________________________________

Consulting the ONTOLOGY UPDATER...


_________________________________________________

Consulting the DIAGNOSTIC ALGORITHM...


_________________________________________________

Consulting the REFINEMENT SYSTEM...


_________________________________________________

What is the goal?

|: employs(scottishNationalGalleryOfModernArt,jerryTheBot).

Pre-computing NESTED LISTS, BAGS OF WORDS & TF-IDF...

__________
| |
| Goal is: | employs(scottishNationalGalleryOfModernArt,jerryTheBot)
|__________|

Translating ...

Need to find a plan ...

This is the plan:


[expressingApproval(jerryTheBot),appointing(jerryTheBot)]

deconstructing the plan ...

executing the plan ...

im going to ask tomRecruiterAgent to perform expressingApproval(jerryTheBot) for me


i have been asked class(_14987,drinkingCup)
answer is no
plan has failed at expressingApproval(jerryTheBot)

Im diagnosing what the problem is ...

I dont know the class drinkingCup

I am replacing Cup with drinkingCup

Pre-computing NESTED LISTS, BAGS OF WORDS & TF-IDF...

__________
| |
| Goal is: | employs(scottishNationalGalleryOfModernArt,jerryTheBot)
|__________|

Translating ...

Need to find a plan ...

This is the plan:


[expressingApproval(jerryTheBot),appointing(jerryTheBot)]

deconstructing the plan ...

executing the plan ...

im going to ask tomRecruiterAgent to perform expressingApproval(jerryTheBot) for me


i have been asked class(_28157,drinkingCup)


answer is class(jerrysCup,drinkingCup)
expressingApproval(jerryTheBot) completed satisfactorily

im going to ask tomRecruiterAgent to perform appointing(jerryTheBot) for me


i have been asked familyName(_30560,jerryTheBot)
answer is no
plan has failed at appointing(jerryTheBot)

Im diagnosing what the problem is ...

I dont know the predicate familyName

I am replacing lastName with familyName

Pre-computing NESTED LISTS, BAGS OF WORDS & TF-IDF...

__________
| |
| Goal is: | employs(scottishNationalGalleryOfModernArt,jerryTheBot)
|__________|

Translating ...

Need to find a plan ...

This is the plan:


[appointing(jerryTheBot)]

deconstructing the plan ...

executing the plan ...

im going to ask tomRecruiterAgent to perform appointing(jerryTheBot) for me


i have been asked familyName(_40486,jerryTheBot)
answer is familyName(hal9000,jerryTheBot)
i have been asked hasSkill(painting,jerryTheBot)
answer is hasSkill(painting,jerryTheBot)
appointing(jerryTheBot) completed satisfactorily

The plan is completed

The following refinements have been performed:


[semantic,semantic]

To terminate the program, type "t."


yes
| ?-

