
By: Ankita Joshi & Triveni Pal

M. Tech. CSE
NIT Hamirpur, H.P.
How can we represent knowledge?

 We need to create a logical view of the data, based
on how we want to process it
 Natural language is very descriptive, but doesn’t
lend itself to efficient processing
 Semantic networks and search trees are promising
techniques for representing knowledge
 Representational Adequacy: the ability to
represent all the kinds of knowledge that are
needed in that domain.
 Inferential Adequacy: the ability to manipulate the
representational structures in such a way as to
derive new structures corresponding to new
knowledge inferred from old.
 Inferential Efficiency: the ability to incorporate into
the knowledge structure additional information that
can be used to focus the attention of the inference
mechanisms in the most promising directions.
 Acquisitional Efficiency: the ability to acquire new
information easily. The simplest case involves direct
insertion, by a person, of new knowledge into the
database.
 A Semantic Network (SN) is a simple notation scheme for
logical knowledge representation.
 An SN consists of concepts and relations between
concepts.
 Representing an SN with a directed graph:
o Vertices: denote concepts.
o Edges: represent relations between concepts.
 The graphical depiction associated with an SN is a
significant reason for its popularity.
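The directed-graph view can be sketched in a few lines of Python. This is only an illustrative sketch: the class name `SemanticNetwork`, its methods, and the sample concepts are assumptions made for the example, not part of the slides.

```python
class SemanticNetwork:
    """Directed labelled graph: vertices are concepts, edges are relations."""

    def __init__(self):
        # concept -> list of (relation, target concept) pairs
        self.edges = {}

    def add(self, source, relation, target):
        # Store the labelled edge and make sure both ends exist as vertices.
        self.edges.setdefault(source, []).append((relation, target))
        self.edges.setdefault(target, [])

    def related(self, source, relation):
        # All concepts reachable from `source` over one edge labelled `relation`.
        return [t for r, t in self.edges.get(source, []) if r == relation]


net = SemanticNetwork()
net.add("Sparrow", "isa", "Bird")
net.add("Bird", "isa", "Animal")
net.add("Bird", "locomotion", "Fly")

print(net.related("Bird", "isa"))         # ['Animal']
print(net.related("Bird", "locomotion"))  # ['Fly']
```

Storing the edges as an adjacency list keeps all outgoing links of a concept in one place, which matches the "localization of information" idea.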
 Semantic networks can
o show natural relationships between objects/concepts
o be used to represent declarative/descriptive knowledge
 A node can represent a fact description
o physical object
o concept
o event
 An arc (or link) represents relationships between nodes
 Uses of Semantic Nets
o Coding static world knowledge
o Built-in fast inference method (inheritance)
o Localization of information
 Such representations have had a long and distinguished history in
Philosophy and Science:
 Porphyry’s tree (3rd century AD)
 Charles Peirce’s existential graphs (1890s): philosophy/logic
 O. Selz's concept hierarchies (1920s): psychology
 Ross Quillian’s associative memory model (1966):
psychology/computer science
 Semantic networks were first invented for computers by Richard H. Richens of the Cambridge
Language Research Unit in 1956 as an "interlingua" for machine
translation of natural languages
 They were also developed by Robert F. Simmons at System Development Corporation in the early
1960s and later featured prominently in the work of Allan M. Collins
and colleagues
 From the 1960s to the 1980s the idea of a semantic link was developed
within hypertext systems as the most basic unit, or edge, in a
semantic network
[Figure: isa hierarchy rooted at Living Organism. Nodes: Plant, Animal, Bird, Mammal, Penguin, Sparrow, Eagle, Cat family, House Cats, Mice, Fred and Morris. Besides isa, the links are labelled Locomotion (with values Fly, walk and Swim) and Eats.]
 “Ram is taller than Shyam.”
 Inappropriate scheme:
Ram --Taller than--> Shyam
 Appropriate scheme:
Ram --height--> h1
Shyam --height--> h2
h1 --Greater than--> h2
h1 --value--> 180
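[Figure: semantic network with the links
Animal --haspart--> head
Reptile --subclass--> Animal
Mammal --subclass--> Animal
Elephant --subclass--> Mammal
Elephant --livesin--> Africa
Elephant --size--> large
Nellie --instance--> Elephant
Nellie --likes--> apples]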
 By traversing the network we can find:
◦ That Nellie has a head (by inheritance)
◦ That certain concepts are related in certain ways (e.g.,
apples and elephants).
 BUT: the meaning of semantic networks was not always
well defined.
◦ Are all elephants big, or just typical elephants?
◦ Do all elephants live in the “same” Africa?
◦ Do all animals have the same head?
 For machine processing these things must be
defined.
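The traversal that finds "Nellie has a head" can be sketched as follows. The sketch assumes the figure's links are Animal haspart head, Mammal and Reptile subclass Animal, Elephant subclass Mammal, Elephant livesin Africa, Elephant size large, Nellie instance Elephant and Nellie likes apples; the dictionary layout and the function name `lookup` are illustrative choices, not the slides' own notation.

```python
# Links read off the Nellie figure (an assumed reading of the diagram).
LINKS = {
    "Animal": {"haspart": "head"},
    "Mammal": {"subclass": "Animal"},
    "Reptile": {"subclass": "Animal"},
    "Elephant": {"subclass": "Mammal", "livesin": "Africa", "size": "large"},
    "Nellie": {"instance": "Elephant", "likes": "apples"},
}


def lookup(node, relation):
    """Follow instance/subclass links upward until `relation` is found."""
    while node is not None:
        props = LINKS.get(node, {})
        if relation in props:
            return props[relation]
        # Move one level up: from an instance to its class,
        # or from a class to its superclass.
        node = props.get("instance") or props.get("subclass")
    return None


print(lookup("Nellie", "haspart"))  # head   (inherited from Animal)
print(lookup("Nellie", "livesin"))  # Africa (inherited from Elephant)
print(lookup("Nellie", "likes"))    # apples (stored on Nellie itself)
```

Note that the code silently answers the open questions above in one particular way: every elephant gets the same `Africa` and the same `head`. A real system would have to decide this explicitly.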
 For an appropriate scheme:
◦ Draw relations on the basis of primitives.
◦ Represent complicated relations with these
primitives.
[Figure: the relation “Taller than” is expressed through the basic primitive “Greater than” applied to the height values h1 and h2.]
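A minimal sketch of deriving "taller than" from the primitive "greater than" is shown below. The height of 170 for Shyam is an invented illustrative value, attaching 180 to Ram is an assumed reading of the figure, and the helper names are hypothetical.

```python
import operator

# height links: person -> height value.
# 180 for Ram is an assumed reading of the figure; 170 is made up for the example.
heights = {"Ram": 180, "Shyam": 170}


def compare(a, b, attribute, primitive):
    """Apply a basic primitive to the attribute values of two nodes."""
    return primitive(attribute[a], attribute[b])


def taller_than(a, b):
    # "Taller than" is not stored as a link; it is derived from
    # the height links and the primitive "greater than".
    return compare(a, b, heights, operator.gt)


print(taller_than("Ram", "Shyam"))  # True
print(taller_than("Shyam", "Ram"))  # False
```

The same `compare` helper could serve other derived relations (for example "shorter than" with `operator.lt`), which is the point of building on primitives.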
 The ISA (is-a) relation is often used to link
instances to classes, and classes to superclasses.
 Some links (e.g. isPart) are inherited along ISA
paths.
 The semantics of an SN can range from very formal
to informal.
 They are often defined at the implementation level.
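[Figure: semantic network with the links
Chairs --is-a--> furniture
Seat --isPart--> Chairs
My chair --is-a--> Chairs
My chair --owner--> Me
My chair --color--> Tan
My chair --covering--> Leather
Me --is-a--> Person
Tan --is-a--> Brown]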
 Sometimes we have to override a relation for an
inherited node (e.g. travel by).
[Figure: Eagle --is-a--> Birds, Penguin --is-a--> Birds, Birds --is-a--> Animal, Birds --Travel by--> Fly, Penguin --Travel by--> Walk]
 VALVE partOf ENGINE
 ENGINE partOf CAR
⇒ VALVE partOf CAR
 PERSON Likes BIRD
 BIRD Likes WORM
⇒ PERSON Likes WORM
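Chaining links in this way can be sketched as a small closure computation. Which relations may be chained is a modelling decision, so the sketch takes the relation name as a parameter; the function name `infer_transitive` and the triple layout are assumptions made for illustration.

```python
def infer_transitive(triples, relation):
    """Return the triples plus everything implied by chaining `relation`."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(facts):
            for (c, r2, d) in list(facts):
                # Chain a -relation-> b and b -relation-> d into a -relation-> d.
                if r1 == r2 == relation and b == c and (a, relation, d) not in facts:
                    facts.add((a, relation, d))
                    changed = True
    return facts


facts = {
    ("VALVE", "partOf", "ENGINE"),
    ("ENGINE", "partOf", "CAR"),
    ("PERSON", "Likes", "BIRD"),
    ("BIRD", "Likes", "WORM"),
}

closed = infer_transitive(facts, "partOf")
print(("VALVE", "partOf", "CAR") in closed)   # True
print(("PERSON", "Likes", "WORM") in closed)  # False: Likes was not chained
```

Calling `infer_transitive(facts, "Likes")` instead would derive PERSON Likes WORM, which shows why the choice of chainable relations matters.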
 Non-binary relationships can be represented by
“turning the relationship into an object”.
 This is an example of what logicians call “reification”:
◦ treating an abstract concept as if it were a real thing.
◦ We might want to represent the generic give event as
a relation involving four things: an agent, a
recipient, an object and an activation time.
 Consider this: “Ravi gave Anil the book.”
◦ Abstract concept (gave) => real.
[Figure: Gave --is-a--> Verb, Gave --Time--> Past, Gave --Agent--> Ravi, Gave --Recipient--> Anil, Gave --Object--> The book]
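Reification can be sketched as a function that turns one n-ary relation into a node plus binary links. The function name `reify` and the tuple layout are hypothetical; the role names follow the "Ravi gave Anil the book" example.

```python
def reify(event, category, **roles):
    """Turn an n-ary relation into an event node with one binary link per role."""
    triples = [(event, "is-a", category)]
    triples += [(event, role, filler) for role, filler in roles.items()]
    return triples


# "Ravi gave Anil the book."
triples = reify(
    "Gave", "Verb",
    Agent="Ravi", Recipient="Anil", Object="The book", Time="Past",
)

for triple in triples:
    print(triple)
# ('Gave', 'is-a', 'Verb')
# ('Gave', 'Agent', 'Ravi')
# ('Gave', 'Recipient', 'Anil')
# ('Gave', 'Object', 'The book')
# ('Gave', 'Time', 'Past')
```

A network that must hold several giving events would need a separate node per event (for example `Gave-1`, `Gave-2`); the single `Gave` node here simply mirrors the example sentence.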


 Inheritance is a key concept in semantic networks
and can be represented naturally by following ISA
links.
 In general, if concept X has property P, then all
concepts that are a subset of X should also have
property P.
 But exceptions are pervasive in the real world!
 In practice, inherited properties are usually treated
as default values. If a node has a direct link that
contradicts an inherited property, then the default
is overridden.
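Default inheritance with overriding can be sketched with the bird example: birds travel by flying, but a penguin carries its own direct link. The dictionary names `ISA` and `LOCAL` and the function `get_property` are illustrative assumptions.

```python
# ISA links: node -> parent concept.
ISA = {"Eagle": "Birds", "Penguin": "Birds", "Birds": "Animal"}

# Properties attached directly to a node.
LOCAL = {
    "Birds": {"travel_by": "Fly"},     # default for all birds
    "Penguin": {"travel_by": "Walk"},  # direct link that overrides the default
}


def get_property(node, prop):
    """Return the nearest value of `prop`, searching the node first, then its ancestors."""
    while node is not None:
        if prop in LOCAL.get(node, {}):
            return LOCAL[node][prop]
        node = ISA.get(node)
    return None


print(get_property("Eagle", "travel_by"))    # Fly  (inherited default)
print(get_property("Penguin", "travel_by"))  # Walk (default overridden)
```

Because the search starts at the node itself and stops at the first hit, a direct link always wins over an inherited one.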
 Multiple inheritance allows an object to inherit
properties from multiple concepts.
 Multiple inheritance can sometimes allow an object
to inherit conflicting properties.
 Conflicts are potentially unavoidable, so conflict
resolution strategies are needed.
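One possible resolution strategy is sketched below: collect every inherited value, flag a conflict when the values disagree, and let the order of the parent list decide. The example data (the classic "Nixon diamond") is not from the slides, and the priority-by-order policy is only an assumption; other policies are equally legitimate.

```python
# A node may have several parents; list order is used as priority (assumed policy).
PARENTS = {"Nixon": ["Quaker", "Republican"]}
PROPS = {
    "Quaker": {"pacifist": True},
    "Republican": {"pacifist": False},
}


def inherited_values(node, prop):
    """Collect (source, value) pairs for `prop`, breadth-first over ISA links."""
    found, queue, seen = [], [node], set()
    while queue:
        current = queue.pop(0)
        if current in seen:
            continue
        seen.add(current)
        if prop in PROPS.get(current, {}):
            found.append((current, PROPS[current][prop]))
            continue  # a value found here shadows anything further up this path
        queue.extend(PARENTS.get(current, []))
    return found


def resolve(node, prop):
    """Return (chosen value, conflict flag) using parent order as priority."""
    values = inherited_values(node, prop)
    conflict = len({value for _, value in values}) > 1
    chosen = values[0][1] if values else None
    return chosen, conflict


print(resolve("Nixon", "pacifist"))  # (True, True): Quaker wins, conflict reported
```

Returning the conflict flag alongside the chosen value lets the caller decide whether to trust the default or ask for more information.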
 “Every dog has bitten a postman.”
 Is equivalent to:
∀X ( dog(X) → ∃Y ( postman(Y) & bitten(X, Y) ) )
 Represent the SN for one (dog, postman) pair.
 Quantify the represented SN.
 GS is the set of generalized statements that have
been quantified.
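[Figure: partitioned semantic network. d is-a Dog, b is-a Bit, p is-a Postman; b has Agent d and Object p; g is-a GS, and the Form link of g points to the sub-network containing d, b and p.]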
John went downtown to deposit his money in the bank.

Every batsman hit a ball.

All batsmen like the umpire.

Take the case of a cricket player: create a complete semantic
network with problem definition and different queries.

1. WordNet
 WordNet is a lexical database of English.
 It groups English words into sets of synonyms called synsets,
provides short, general definitions, and records the various
semantic relations between these synonym sets.
 Some of the most common semantic relations defined are
o meronymy (A is part of B, i.e. B has A as a part of itself),
o holonymy (B is part of A, i.e. A has B as a part of itself),
o hyponymy (or troponymy) (A is subordinate of B; A is kind of B),
o hypernymy (A is superordinate of B),
o synonymy (A denotes the same as B) and
o antonymy (A denotes the opposite of B).
 WordNet properties have been studied from a network theory
perspective and compared to other semantic networks
created from Roget's Thesaurus and word association tasks.
 From this perspective all three have a small-world
structure.
 These have expressive power equal to or exceeding standard
first-order predicate logic.
 Unlike WordNet or other lexical or browsing networks,
semantic networks using these representations can be used
for reliable automated logical deduction. Some automated
reasoners exploit the graph-theoretic features of the
networks during processing.
 Gellish English with its Gellish English dictionary, is a formal language that is
defined as a network of relations between concepts and names of concepts.
 Gellish English is a formal subset of natural English, just as Gellish Dutch is a
formal subset of Dutch, whereas multiple languages share the same concepts.
 Other Gellish networks consist of knowledge models and information models that
are expressed in the Gellish language.
 A Gellish network is a network of (binary) relations between things.
 Each relation in the network is an expression of a fact that is classified by a relation
type.
 Each relation type itself is a concept that is defined in the Gellish language
dictionary.
 Each related thing is either a concept or an individual thing that is classified by a
concept.
 The definitions of concepts are created in the form of definition models (definition
networks) that together form a Gellish Dictionary.
 A Gellish network can be documented in a Gellish database and is computer
interpretable
 The Hindi WordNet is a system for bringing together different lexical and semantic
relations between the Hindi words.
 It organizes the lexical information in terms of word meanings and can be termed
as a lexicon based on psycholinguistic principles.
 In the Hindi WordNet the words are grouped together according to their similarity
of meanings.
 Two words that can be interchanged in a context are synonymous in that context.
 For each word there is a synonym set, or synset, in the Hindi WordNet, representing
one lexical concept.
 This is done to remove ambiguity in cases where a single word has multiple
meanings.
 Synsets are the basic building blocks of WordNet.
 The Hindi WordNet deals with the content words, or open class category of words.
 Thus, the Hindi WordNet contains the following categories of words:
Noun, Verb, Adjective and Adverb.
A Very-Large Semantic Network of Common Sense Knowledge
 ConceptNet supports many practical textual-reasoning tasks over
real-world documents right out of the box (without
additional statistical training), including
o topic-gisting (e.g. a news article containing the
concepts “gun,” “convenience store,” “demand money”
and “make getaway” might suggest the topics “robbery”
and “crime”),
o affect-sensing (e.g. this email is sad and angry),
o analogy-making (e.g. “scissors,” “razor,” “nail clipper,”
and “sword” are perhaps like a “knife” because they are
all “sharp,” and can be used to “cut something”),
o text summarization,
o contextual expansion,
o causal projection,
o cold document classification,
o and other context-oriented inference.
 It is available in two
versions: concise (200,000 assertions) and
full (1.6 million assertions).
 Commonsense knowledge in ConceptNet encompasses
the spatial, physical, social, temporal, and psychological
aspects of everyday life.
 ArtsSemNet is a lexical reference system for
the fine arts terminology in Bulgarian and
Russian
 The terms are organized into a semantic
network on the basis of several important
lexical relations:
o Polysemy
o Homonymy
o Synonymy
o Antonymy
o Hyponymy
 Provides a specialised term browser for
search and navigation between the terms and
the corresponding terminological relations
 We used ArtsDict for the extraction of
homonyms, synonyms and polysemous terms
 For the extraction of hyponyms and
antonyms we used two techniques:
o A formal technique for extraction of hyponyms
sharing a common term-element (suffix/stem,
affix, …)
• Given a target term-element ArtsDict generates a list
of terms from the dictionary that contain it
• The list is manually examined afterwards
o A semantic technique (based on LSA) for extraction
of hyponyms/antonyms with no common term-
element
 Searching for terms:
o Exact and inexact searching
 Browsing the terms:
o Term glosses list
o Homonyms list
o Absolute synonyms list, relative
synonyms list
o Antonym chains
o Hyponym chains (with target
term hypernym or cohyponym)
 Supports changes between
languages:
o Russian and Bulgarian are
supported
[Figure: semantic network for an ETD collection. Nodes: ETD Metadata, ETD Doc, Person, Abstract, Subject, Chapter, Section, Paragraph, Paper, term and id. Links: hasAuthor, hasAbstract, hasSubject, describes, hasChapter, hasSection, hasParagraph, cites, occursInAbstract and occursInSubject.]
 SNS contains a bilingual (German/English) semantic
network which consists of three components:
o the Environmental Thesaurus UMTHES® with more than
50,000 inter-networked terms (Descriptors and Non-Descriptors),
o the Geo-Thesaurus-Environment (GTU) with more than
25,000 geographic names and the spatial intersections
of all these places,
o an Environmental Chronology containing more than
600 contemporary and historical events that affected
the environment.
 Global Biodiversity Information Facility (GBIF)
 MMI has long considered a semantic component
critical to enabling the highest levels of data
interoperability.
 To that end, MMI has developed a set of guidance
documents, worked with the marine science
community to establish a set of best practices, and
developed tools to allow users to work with
semantic technologies
 The resulting Semantic Framework facilitates data
interoperability in the marine science community
while providing linkages to the broader semantic
web.
Sentient Buildings that Sense,
Think, and Adapt

A semantic network represents events and their dependencies in our everyday lives (left).
The user interface of the ContextSense system shows how to guess user
situation, intention, and activities in space (right).
 The global environment spans transdisciplinary fields, such as meteorology,
hydrology, geology, geography, agriculture, and biology.
 So do measures against global environmental problems, such as climate change, global
warming, and various disasters.
 One of the key issues is arranging data interoperability under these transdisciplinary
conditions.
 A semantic network dictionary is constructed for information sharing using
Semantic MediaWiki, which helps to gather ontological information and associations
for data interoperability among diversified and distributed data sources.
 The key requirements of the semantic network dictionary are reliability,
simple structure, and easy browsing and modification.
Similar works
 SWEET (Semantic Web for Environment and Technology) by NASA (National
Aeronautics and Space Administration)
 FAO (Food and Agriculture Organization of the United Nations) based on AGROVOC,
a multilingual, structured and controlled vocabulary designed to cover the
terminology of all subject fields in agriculture, forestry, fisheries, food and related
domains.
The graph representation is built with
KeyGraph, an open-source Java
library.
The XML data constructed in the Wiki is
visualized together with the results of information
retrieval by the reverse dictionary.
All the related terms from various
ontologies and terminologies are
represented at once.
One of the examples of graph representation is a term from land use classification schema in
Thailand and Indonesia.
The term “water body” land use class can be found in both countries.
Apparently, both land use classes are the same, but the level of hierarchy is a bit different in each
classification schema.
In the case of Indonesian land use, “water body” does not include watercourses, but “water body” in
Thailand includes all water-related geographical features.
Consequently, the graph representation provides a clear distinction between the two terms. New
information about the relations of “water body” in both countries can then be created, for example that the “water
body” class in Thailand is the same as the “water” class in Indonesia. This kind of information is treated
as newly created ontological information, and is added through the Semantic MediaWiki. The
ontological information can grow autonomously by adding relations, becoming more and more
useful.
 Investigates techniques to automate the analysis of (Dutch)
newspaper articles.
 Uses Semantic Web and Natural Language Processing techniques to solve
problems in communication, especially Political Communication such
as newspaper coverage of election campaigns.
 This allows questions about, for example, the relation between media
and politics and the objectivity of media to be answered more easily
and more systematically.
 Trains statistical models (possibly Maximum Entropy models) on a
corpus of manually annotated newspaper articles that has been
created in the past decade.
 To improve performance, a number of features containing
linguistic and background knowledge are included.
The Axon Idea Processor, developed entirely in Visual Prolog,
provides an environment that supports the thinking processes.
It helps to create, communicate, explore, plan, draw, compose,
design and learn.
Axon also provides a variety of tools for working with ideas, starting
with a blank screen or using templates.
Axon requires no special knowledge of particular techniques. Axon
enables you to:
o Work with ideas & concepts rather than words.
o See the Big Picture and not get lost in details.
o Analyze and solve more complex problems.
o Stimulate creativity and discovery.
o Effectively amplify your mental potential.
o Focus attention and minimize distractions.
o Reduce mental fatigue and writer's block.
 The knowledge structures created in Semantica are based on an adaptation of semantic network
theory, which attempts to replicate the way that humans observe, organize and store knowledge
mentally.
 All Semantica Knowledge Structures are composed of four basic primitive elements:
 Concepts: Basically any idea unit that can be described in language (person, place, thing, event,
etc).
 Relation Types: An unambiguous, bi-directional relationship that connects any two related
concepts. Relation types may be symmetric (the same in each direction), or Asymmetric (different
in each direction).
 Triplets: Triplets are the building blocks of Semantica. They are a uniting element formed when
two Concepts are joined by a Relation Type. A triplet should be thought of as a sentence, whether
it would make sense to pronounce it or not. Below is an example of a bi-directional sentence, as
seen from the triplet's two reversed Graphic Frame views. The Relation Type ray's arrows extend
from the Central Concept (Subject) to the Related Concept (Object).
 Note that the Relation Type is grammatically reversed, while the Concepts remain identical.
 Knowledge Objects: Any file or image on the visible computer screen can be easily dragged and
attached as a knowledge object to any other element within Semantica. These can be text files,
images or URL direct links to websites. Simply clicking on any icon will open the Knowledge
Object or navigate to an active website.
 Visual and highly interactive framework for manipulating and analyzing data from
multiple sources, whether structured databases or unstructured text documents.
 Semantica's abstract data model, based on semantic networks, provides powerful
extended fusion and analytical capabilities through improved automation.
 This data model enables analysts to quickly perform sophisticated link and node
analysis of vehicles, transport, cargo, people, places, organizations, etc. without
spending hours on data transformation.
 Semantica provides an easy to use interface that helps analysts see the
relationships among entities contained in information by layering Time and Space
and Relationships between all entities of interest.
 The Semantica software enables access to disparate information sources that
have not been brought together in a single analytical user-defined operating
picture.
 Already fielded with defense/intelligence-related agencies and groups. This
reduces both the risks and the time required to achieve successful field
deployment.
 Tracking, storing, visualizing, and sharing information about aircraft, vessels, vehicles, monetary
systems or other modes of physical or electronic transport, the commodities or cargo transported
with them, and the individuals doing the transportation all rely on being able to store the
information in a manner that helps analysts to quickly discover the relationships among the three.
 Ability to store information about suspected drug traffickers, their transportation routes, vehicles
used, and the dates and times of the specific transactions for quite some time.
 The tool has more recently been applied to tracking vessels, cargo, individuals, and organizations that are
related to each of the above.
 The tool can easily store and provide link analysis of many other kinds of cargo, people, locations,
and organizations of interest. For each of the associated types of nodes, the system can also track
all of the various pieces of metadata associated with those concepts.
 For cargo, as an example, the tool can capture what vessel the cargo container is on, what the
contents of the container are, who shipped the cargo, who the planned recipient is, the date and
time it was shipped, as well as the date and time the cargo was received.
 This would enable the analyst to have a visual display of the links or connections relating to the
cargo container in question, as well as any of the associated information in just a mouse click or
two.
 Using Semantica Pro's built-in geo-spatial and temporal capabilities, analysts can quickly see their
network on a map and show its changes over time.
 This is critical when looking for patterns that can only be revealed when watching how a network
transforms over time and in relation to the specific places or regions on a map or other imagery.
There are also elaborate types of semantic networks
connected with corresponding sets of software
tools used for lexical knowledge engineering, like
 the Semantic Network Processing System (SNePS) of
Stuart C. Shapiro
 the MultiNet paradigm of Hermann Helbig,
especially suited for the semantic representation of
natural language expressions and used in several
NLP applications.
 The semantics behind a knowledge representation model
depends on the way that it is used (implemented). Notation is
irrelevant!
 Whether a statement is written in logic or as a semantic
network is not important -- what matters is whether the
knowledge is used in the same manner.
 Most knowledge representation models can be made to be
functionally equivalent. It is a useful exercise to try converting
knowledge in one form to another form.
 From a practical perspective, the most important consideration
usually is whether the KR model allows the knowledge to be
encoded and manipulated in a natural fashion.
 Some types of properties are not easily expressed using a
semantic network. For example: negation, disjunction, and
general non-taxonomic knowledge.
 There are specialized ways of dealing with these
relationships, for example partitioned semantic networks
and procedural attachment. But these approaches are ugly
and not commonly used.
 Negation can be handled by having complementary
predicates (e.g., A and NOT A) and using specialized
procedures to check for them. Also very ugly, but easy to
do.
 If the lack of expressiveness is acceptable, semantic nets
have several advantages: inheritance is natural and modular,
and semantic nets can be quite efficient.
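The complementary-predicate approach to negation can be sketched as follows. Encoding the negated predicate with a `"NOT "` prefix and raising an error on a clash are assumptions made for this example; the slides only say that a specialized procedure checks for such pairs.

```python
facts = set()  # (subject, predicate) pairs currently asserted


def negate(predicate):
    """Return the complementary predicate: A <-> NOT A."""
    if predicate.startswith("NOT "):
        return predicate[len("NOT "):]
    return "NOT " + predicate


def assert_fact(subject, predicate):
    """Add a fact unless its complement is already present."""
    if (subject, negate(predicate)) in facts:
        raise ValueError(f"{subject}: '{predicate}' contradicts a stored fact")
    facts.add((subject, predicate))


assert_fact("x", "A")
try:
    assert_fact("x", "NOT A")
except ValueError as error:
    print("rejected:", error)
```

The network itself knows nothing about negation here; all of the logic lives in the checking procedure, which is why the approach is easy to build but inelegant.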
 As we stated before, semantic networks and frames
are often used because inheritance is represented
so naturally.
 But rule-based systems can also be used to do
inheritance!
 Semantic networks (and frames) have an
implementation advantage for inheritance because
special-purpose algorithms can be used to follow
the ISA links.
 Rules are appropriate for some types of knowledge,
but do not easily map to others.
 Semantic nets can easily represent inheritance and
exceptions, but are not well-suited for representing
negation, disjunction, preferences, conditionals, and
cause/effect relationships.
 Frames allow arbitrary functions (demons) and typed
inheritance. Implementation is a bit more
cumbersome.
We see hierarchical organizations in the real world all
the time. They may not be "pure" hierarchies, but
they're hierarchical in spirit at least.

It might be easier to think of these things as "networks"
instead of hierarchies.

Take for example the common dictionary. At first
glance, it looks like a very linear organization of the
words in our language.

But what a dictionary really specifies is a very complex
and somewhat hierarchical map of the relationships
between the words in our language.
PHILOSOPHY
Frame system = Semantic Net +
structured nodes +
procedural attachment
INFERENCE PROCESSES:
◦ Inheritance
◦ Procedural attachment (demons)
◦ Frame Matching (a type of unification)
HISTORY
◦ Minsky, 1975 (first ideas)
◦ Bobrow & Winograd, 1977 (KRL)
◦ By 1980 in wide-spread use (FRL, SRL, Units)
◦ By 1985 in robust packaged form (CRL, KEE, FrameKit,…)
◦ By 1990 in general use for knowledge bases, and evolved
into object-oriented data bases (OODBs)
FRAME SLOT FACET FILLER
[PC
  [isa [value COMPUTER]]
  [manufacturer [type-r COMPANY]]
  [retail-price [puller (* &markup &wholesale)]
                [range-min 500]
                [range-max 10000]
                [unit USD]]
  [markup [value 1.5]]
  [owner [type-r LEGAL-ANIMATE]]]
[DELL-150/L
  [isa [value PC]]
  [manufacturer [value DELL]]
  [processor [value pentium-4L]]
  [wholesale [value 1400]]]
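The frame example can be sketched in Python to show inheritance and procedural attachment working together. The sketch assumes that the `puller` facet is a procedure run on demand to compute a missing value, and that `range-min`/`range-max` constrain the result; this reading, the nested-dictionary layout, and the function names are assumptions rather than the semantics of any particular frame system.

```python
FRAMES = {
    "PC": {
        "isa": {"value": "COMPUTER"},
        "manufacturer": {"type-r": "COMPANY"},
        "retail-price": {
            # Assumed reading of (* &markup &wholesale): markup times wholesale.
            "puller": lambda get: get("markup") * get("wholesale"),
            "range-min": 500,
            "range-max": 10000,
            "unit": "USD",
        },
        "markup": {"value": 1.5},
        "owner": {"type-r": "LEGAL-ANIMATE"},
    },
    "DELL-150/L": {
        "isa": {"value": "PC"},
        "manufacturer": {"value": "DELL"},
        "processor": {"value": "pentium-4L"},
        "wholesale": {"value": 1400},
    },
}


def find_slot(frame, slot):
    """Return the facets of `slot`, searching the frame and then its isa chain."""
    while frame in FRAMES:
        if slot in FRAMES[frame]:
            return FRAMES[frame][slot]
        frame = FRAMES[frame].get("isa", {}).get("value")
    return None


def get_value(frame, slot):
    """Return a stored value, or compute one through an attached procedure."""
    facets = find_slot(frame, slot)
    if facets is None:
        return None
    if "value" in facets:
        return facets["value"]
    if "puller" in facets:
        # The procedure reads other slots relative to the frame being queried.
        result = facets["puller"](lambda s: get_value(frame, s))
        low, high = facets.get("range-min"), facets.get("range-max")
        if (low is not None and result < low) or (high is not None and result > high):
            raise ValueError(f"{slot} of {frame} is out of range: {result}")
        return result
    return None


print(get_value("DELL-150/L", "manufacturer"))  # DELL
print(get_value("DELL-150/L", "retail-price"))  # 2100.0
```

The retail price is never stored on DELL-150/L: the slot and its procedure are inherited from PC, `markup` comes from PC, and `wholesale` comes from the instance, giving 1.5 × 1400 = 2100.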
 Semantic vs. Episodic
◦ Events vs. Facts
◦ Temporal and Causal sequences
◦ Use Semantic memory as component
 Scripts
◦ Causally-connected event sequence
◦ Generalized by alternate paths:
 Tree or DAG structure
 Conditionals on branches
◦ Script-role generalization
 Constants → Typed variables with restrictions
 Climb a frame hierarchy
 Script Application Process
◦ Match Trigger events, including roles
◦ Instantiate forwards and backwards ruling out
alternate branches
◦ Interpolation inference (abduction)
◦ Extrapolation inference (prediction)
 Because the syntax is the same
◦ We can guess that Julia’s age
is similar to Bryan’s
 Graphical representation (a graph)
◦ Links indicate subset, member, relation, ...
 Equivalent to logical statements (usually FOL)
◦ Easier to understand than FOL?
◦ Specialised SN reasoning algorithms can be faster
 Example: natural language understanding
◦ Sentences with same meaning have same graphs
◦ e.g. Conceptual Dependency Theory (Schank)
Disadvantages of a semantic network
 incomplete (no explicit operational/procedural
knowledge)
 lack of standards, ambiguity in node/link descriptions

 not temporal (i.e. doesn't represent time or sequence)

Conclusion
 Semantic networks are mainly used as an aid to
analysis to visually represent parts of the problem
domain. The 'knowledge' can be transformed into
rules or frames for implementation.
