
Unit IV Natural Language Processing Parsing Techniques:

Context – free Grammar; Recursive Transition Nets (RTN); Augmented Transition Nets (ATN); Semantic
Analysis, Case and Logic Grammars; Planning Overview – An Example Domain: The Blocks World;
Components of Planning Systems; Goal Stack Planning (linear planning); Non-linear Planning using
constraint posting; Probabilistic Reasoning and Uncertainty; Probability theory; Bayes Theorem
and Bayesian networks;
Certainty Factor.

Natural Language Processing (NLP) refers to a method of communicating with an intelligent system using a
natural language such as English. Processing of natural language is required when you want an intelligent
system such as a robot to act on your instructions, when you want to hear the decision of a dialogue-based
clinical expert system, and so on.
The field of NLP involves making computers perform useful tasks with the natural languages humans use.
The input and output of an NLP system can be −
• Speech
• Written Text
COMPONENTS OF NLP
1. NATURAL LANGUAGE UNDERSTANDING (NLU)
• Mapping the given input in natural language into useful representations.
• Analyzing different aspects of the language.
2. NATURAL LANGUAGE GENERATION (NLG)
It is the process of producing meaningful phrases and sentences in the form of natural language from
some internal representation. It involves:
• Text planning − It includes retrieving the relevant content from the knowledge base.
• Sentence planning − It includes choosing the required words, forming meaningful phrases, and setting
the tone of the sentence.
• Text Realization − It is mapping sentence plan into sentence structure.
STEPS IN NLP
1. Lexical Analysis − It involves identifying and analyzing the structure of words. The lexicon of a language
means the collection of words and phrases in that language. Lexical analysis divides the whole
chunk of text into paragraphs, sentences, and words.

2. Syntactic Analysis (Parsing) − It involves analysis of the words in the sentence for grammar and arranging
the words in a manner that shows the relationships among them.
3. Semantic Analysis − It draws the exact meaning or the dictionary meaning from the text. The text is
checked for meaningfulness. It is done by mapping syntactic structures onto objects in the task domain.
The semantic analyzer disregards sentences such as “hot ice-cream”.
4. Discourse Integration − The meaning of any sentence depends upon the meaning of the sentence just
before it. In addition, it also influences the meaning of the immediately succeeding sentence.
5. Pragmatic Analysis − During this step, what was said is re-interpreted in terms of what it actually meant. It
involves deriving those aspects of language which require real-world knowledge.
IMPLEMENTATION ASPECTS OF SYNTACTIC ANALYSIS
There are a number of algorithms researchers have developed for syntactic analysis, but we consider only the
following simple methods −
• Context-Free Grammar
• Top-Down Parser
CONTEXT-FREE GRAMMAR
It is a grammar whose rewrite rules have a single symbol on the left-hand side. Let us
create a grammar to parse the sentence − “The bird pecks the grains”
Articles (DET) − a | an | the
Nouns − bird | birds | grain | grains
Noun Phrase (NP) − Article + Noun | Article + Adjective + Noun
= DET N | DET ADJ N
Verbs − pecks | pecking | pecked
Verb Phrase (VP) − NP V | V NP
Adjectives (ADJ) − beautiful | small | chirping
The parse tree breaks down the sentence into structured parts so that the computer can easily understand and
process it. In order for the parsing algorithm to construct this parse tree, a set of rewrite rules, which describe
what tree structures are legal, need to be constructed.
These rules say that a certain symbol may be expanded in the tree into a sequence of other symbols. For
example, the first rewrite rule says that if there are two strings, a Noun Phrase (NP) and a Verb Phrase (VP),
then the string formed by NP followed by VP is a sentence.

The rewrite rules for the sentence are as follows −
S → NP VP
NP → DET N | DET ADJ N
VP → V NP
Lexicon −
DET → a | the
ADJ → beautiful | perching
N → bird | birds | grain | grains
V → peck | pecks | pecking
TOP-DOWN PARSER
Here, the parser starts with the S symbol and attempts to rewrite it into a sequence of terminal symbols that
matches the classes of the words in the input sentence, until it consists entirely of terminal symbols.
These are then checked against the input sentence to see if they match. If not, the process is started over again
with a different set of rules. This is repeated until a specific rule is found which describes the structure of the
sentence.
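To make the procedure concrete, here is a minimal recursive-descent (top-down) parser in Python for the toy grammar above. The grammar and lexicon follow the rewrite rules in this section; the function names and the tuple-based parse-tree format are illustrative choices, not part of the notes.

# Toy grammar and lexicon from the rewrite rules above.
GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DET", "N"], ["DET", "ADJ", "N"]],
    "VP": [["V", "NP"]],
}
LEXICON = {
    "DET": {"a", "the"},
    "ADJ": {"beautiful", "perching"},
    "N":   {"bird", "birds", "grain", "grains"},
    "V":   {"peck", "pecks", "pecking"},
}

def parse(symbol, words, pos):
    """Try to expand `symbol` starting at words[pos].
    Returns (parse_tree, next_position) or None."""
    if symbol in LEXICON:                                # terminal category
        if pos < len(words) and words[pos] in LEXICON[symbol]:
            return (symbol, words[pos]), pos + 1
        return None
    for rule in GRAMMAR[symbol]:                         # try each rewrite rule in turn
        children, p = [], pos
        for sub in rule:
            result = parse(sub, words, p)
            if result is None:
                break
            tree, p = result
            children.append(tree)
        else:                                            # every sub-symbol matched
            return (symbol, children), p
    return None                                          # no rule worked: backtrack

sentence = "the bird pecks the grains".split()
result = parse("S", sentence, 0)
print(result[0] if result and result[1] == len(sentence) else "rejected")

Backtracking here is local to each rule, which is enough for this toy grammar; a full top-down parser would also backtrack across constituent boundaries, as described above.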
TYPES OF TRANSITION NETWORKS
1. AUGMENTED TRANSITION NETS (ATN)
An augmented transition network is a top-down parsing procedure that allows various kinds of knowledge to
be incorporated into the parsing system so that it can operate efficiently. ATNs build on the idea of using finite
state machines (Markov models) to parse sentences. Instead of building an automaton for a particular sentence,
a collection of transition graphs is built. A grammatically correct sentence is parsed by reaching a final state in
some state graph. Transitions between these graphs are simply subroutine calls from one state to the initial state
of any graph in the network. A sentence is determined to be grammatically correct if a final state is reached by
the last word in the sentence. The ATN is similar to a finite state machine in which the class of labels that can
be attached to the arcs that define the transitions between states has been augmented.
Arcs may be labelled with:
• Specific words, such as “in”.
• Word categories, such as noun.
• Procedures that build structures that will form part of the final parse.
• Procedures that perform arbitrary tests on the current input and on sentence components that have been identified.

An ATN uses a set of registers to store information. A set of actions is defined for each arc, and the actions
can inspect and modify the registers. An arc may have a test associated with it; the arc is traversed (and its
action taken) only if the test succeeds. When a lexical arc is traversed, the current word is put in a special
variable (*). The ATN was first used in the LUNAR system. In an ATN, an arc can carry an arbitrary test and
an arbitrary action. The structure of an ATN is illustrated in the figure. Like an RTN, an ATN consists of the
substructures S, NP and PP.

The ATN collects sentence features for further analysis. The additional features that can be captured by
the ATN are: the subject NP, the object NP, subject-verb agreement, declarative or interrogative mood,
tense, and so on. So, we can conclude that an ATN requires some more analysis steps compared to an RTN.
If these extra analysis tests are not performed, there may be ambiguity. The ATN represents sentence
structure by using a slot-filler representation, which reflects more of the functional role of phrases in
a sentence. For example, one noun phrase may be identified as the “subject” (SUBJ) and another as the “object”
of the verb. Within noun phrases, parsing will also identify the determiner structure, adjectives, the noun, and so on.
For the sentence “Ram ate an apple”, the representation is as follows:

(S SUBJ (NP NAME Ram)
MAIN_V ate
TENSE PAST
OBJ (NP DET an
HEAD apple))
The ATN maintains this information in various registers such as DET, ADJ and HEAD. Registers are
set by actions that can be specified on the arcs. When an arc is followed, the action associated with
it is executed. An ATN can recognize any language that a general-purpose computer can recognize. ATNs
have been used successfully in a number of natural language systems, as well as in front ends for databases and
expert systems.
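As a rough illustration (not from the notes), the register contents an ATN might build for “Ram ate an apple” can be pictured as nested Python dictionaries whose keys mirror the slot names above:

# Hypothetical read-out of the ATN registers for "Ram ate an apple".
parse = {
    "SUBJ":   {"NP": {"NAME": "Ram"}},
    "MAIN_V": "ate",
    "TENSE":  "PAST",
    "OBJ":    {"NP": {"DET": "an", "HEAD": "apple"}},
}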
Example for Syntactic Processing – Augmented Transition Network
Syntactic Processing is the step in which a flat input sentence is converted into a hierarchical structure that
corresponds to the units of meaning in the sentence. This process is called parsing. It plays an important role in
natural language understanding systems for two reasons:
• Semantic processing must operate on sentence constituents. If there is no syntactic parsing step, then
the semantics system must decide on its own constituents. If parsing is done, on the other hand, it
constrains the number of constituents that semantics can consider.
• Syntactic parsing is computationally less expensive than is semantic processing. Thus, it can play a
significant role in reducing overall system complexity.
2. RECURSIVE TRANSITION NETWORKS (RTN)
RTNs can be seen as a development of finite state automata, extended so that definitions may be recursive. A
recursive transition network consists of nodes (states) and labelled arcs (transitions). Rather than permitting
only word categories, it permits arc labels that refer to other networks, and those networks may in turn refer
back to the referring network. It is thus a modified version of a transition network. A recursive transition
network can have five types of arcs:
• CAT: Current word must belong to category.
• WORD: Current word must match label exactly.
• PUSH: Named network must be successfully traversed.
• JUMP: Can always be traversed.
• POP: Can always be traversed; indicates that the input string has been accepted by the network.
In an RTN, one state is specified as the start state. A string is accepted by an RTN if a POP arc is reached and
all the input has been consumed. Let us consider the sentence “The stone was dark black”.
The: ART
Stone: ADJ NOUN
Was: VERB
Dark: ADJ
Black: ADJ NOUN
The complete RTN structure is described in three parts: sentence (S), noun phrase (NP) and prepositional phrase
(PP).

The number of sentences accepted by an RTN can be extended if backtracking is permitted when a failure
occurs. This requires that states having alternative transitions be remembered until the parse progresses past
possible failure points. In this way, if a failure occurs at some point, the interpreter can backtrack and try
alternative paths. The disadvantage of this approach is that parts of a sentence may be parsed more than once,
resulting in excessive computation. During the traversal of an RTN, a record must be maintained of the word
position in the input sentence, the current state, and the return nodes to be used as return points when control
has been transferred to a lower-level network.
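The following Python sketch shows one way an RTN interpreter with backtracking might be written. The CAT, WORD, PUSH, JUMP and POP arc types follow the list above, but the particular S and NP networks, the lexicon, and the data format are illustrative assumptions, not the networks drawn in the notes.

# Word categories for "The stone was dark black", as listed above.
LEXICON = {"the": {"ART"}, "stone": {"ADJ", "NOUN"}, "was": {"VERB"},
           "dark": {"ADJ"}, "black": {"ADJ", "NOUN"}}

# Each network maps a state to a list of arcs (type, label, next_state).
NETWORKS = {
    "NP": {"NP0": [("CAT", "ART", "NP1")],
           "NP1": [("CAT", "ADJ", "NP1"), ("CAT", "NOUN", "NP2")],
           "NP2": [("POP", None, None)]},
    "S":  {"S0": [("PUSH", "NP", "S1")],
           "S1": [("CAT", "VERB", "S2")],
           "S2": [("CAT", "ADJ", "S2"), ("CAT", "NOUN", "S3"), ("POP", None, None)],
           "S3": [("POP", None, None)]},
}

def traverse(net, state, words, pos):
    """Depth-first traversal with backtracking: yields every input position at
    which `net` can POP after consuming words[pos:position]."""
    for kind, label, nxt in NETWORKS[net][state]:
        if kind == "POP":
            yield pos
        elif kind == "JUMP":
            yield from traverse(net, nxt, words, pos)
        elif kind == "WORD" and pos < len(words) and words[pos] == label:
            yield from traverse(net, nxt, words, pos + 1)
        elif kind == "CAT" and pos < len(words) and label in LEXICON.get(words[pos], set()):
            yield from traverse(net, nxt, words, pos + 1)
        elif kind == "PUSH":                      # call the sub-network, then continue here
            for after in traverse(label, label + "0", words, pos):
                yield from traverse(net, nxt, words, after)

def accepts(sentence):
    words = sentence.lower().split()
    return any(end == len(words) for end in traverse("S", "S0", words, 0))

print(accepts("The stone was dark black"))       # True with these toy networks

Because traverse is a generator, every alternative path is explored lazily, which is exactly the backtracking behaviour described above; the price is that parts of the sentence may be re-parsed along different paths.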
PLANNING
Planning refers to the process of computing several steps of a problem-solving procedure before executing any
of them. Planning has methods which focus on ways of decomposing the original problem into appropriate
subparts and on ways of recording and handling interactions among the subparts as they are detected during
the problem-solving process. Planning in artificial intelligence is about the decision-making tasks
performed by robots or computer programs to achieve a specific goal. The execution of planning is about
choosing a sequence of actions with a high likelihood of completing the specific task.
The features of an ideal planner:
• The planner should be able to represent the states, goals and actions
• The planner should be able to add new actions at any time
• The planner should be able to use Divide and Conquer method for solving very big problems.
BLOCKS WORLD PROBLEM
The blocks world is one of the most famous planning domains in artificial intelligence. An early blocks-world
program, created by Terry Winograd, was a limited-domain natural-language system that could understand typed
commands and move blocks around on a surface. A well-known blocks-world planning problem is the Sussman anomaly.
In order to compare the various methods of planning, it is useful to look at all of them in a
single domain that is complex enough that the need for each mechanism is apparent, yet simple enough
that easy-to-follow examples can be found. Features of the blocks-world environment:
• There is a flat surface on which blocks can be placed.
• There are a number of square blocks, all the same size.
• They can be stacked one upon the other.
• There is a robot arm that can manipulate the blocks.
Illustration: In this illustration, the blocks-world problem has three blocks, labelled 'A', 'B' and 'C', which may
rest on the flat surface. The given condition is that only one block can be moved at a time to achieve the goal. The
start state and goal state are shown in the following diagram:
Initial state: block A is on block B, and block C is on the table.
Goal state: block A is on block B, and block B is on block C (a single stack A-B-C).

An early AI study of planning and robotics (STRIPS) used a block world in which a robot arm performed tasks
involving the manipulation of blocks. In this problem you will model a simple block world under certain rules
and constraints. Rather than determine how to achieve a specified state, you will "program" a robotic arm to
respond to a limited set of commands. The problem is to parse a series of commands that instruct a robot arm
in how to manipulate blocks that lie on a flat table. Initially there are n blocks on the table (numbered from 0
to n-1)
ILLUSTRATION
Actions of the robot arm
• UNSTACK (A, B): Pick up block A from its current position on block B.
• STACK (A, B): Place block A on block B.
• PICKUP(A): Pick up block A from the table and hold it.
• PUTDOWN(A): Put block A down on the table.
Notice that the robot arm can hold only one block at a time.
Predicates
In order to specify both the conditions under which an operation may be performed and the results of
performing it, we need the following predicates:
• ON (A, B): Block A is on Block B.
• ONTABLE(A): Block A is on the table.
• CLEAR(A): There is nothing on the top of Block A.
• HOLDING(A): The arm is holding Block A.
• ARMEMPTY: The arm is holding nothing.
Robot problem-solving systems (STRIPS)
• The list of new predicates that the operator causes to become true is the ADD list.
• The list of old predicates that the operator causes to become false is the DELETE list.
• The PRECONDITIONS list contains those predicates that must be true for the operator to be applied.
STRIPS style operators for BLOCKs World
STACK (x, y)
P: CLEAR(y)^HOLDING(x)
D: CLEAR(y)^HOLDING(x)
A: ARMEMPTY^ON (x, y)

UNSTACK (x, y)
P: ON (x, y) ^ CLEAR(x) ^ ARMEMPTY
D: ON (x, y) ^ ARMEMPTY
A: HOLDING(x) ^ CLEAR(y)
PICKUP(x)
P: CLEAR(x) ^ ONTABLE(x) ^ ARMEMPTY
D: ONTABLE(x) ^ ARMEMPTY
A: HOLDING(x)
PUTDOWN(x)
P: HOLDING(x)
D: HOLDING(x)
A: ONTABLE(x) ^ ARMEMPTY
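A minimal sketch (an assumed representation, not from the notes) of how an operator's P, D and A lists can be stored and applied to a state of ground predicates:

def apply_op(state, pre, delete, add):
    """Return the new state if all preconditions hold, otherwise None."""
    if not pre <= state:                 # every precondition must be true
        return None
    return (state - delete) | add        # remove the DELETE list, then add the ADD list

# STACK(A, B) written out with the lists given above.
stack_A_B = {
    "pre":    {"CLEAR(B)", "HOLDING(A)"},
    "delete": {"CLEAR(B)", "HOLDING(A)"},
    "add":    {"ARMEMPTY", "ON(A,B)"},
}

state = {"HOLDING(A)", "CLEAR(B)", "ONTABLE(B)"}
print(apply_op(state, **stack_A_B))
# -> {'ON(A,B)', 'ARMEMPTY', 'ONTABLE(B)'}  (set order may vary)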
Goal stack planning
To start with, the goal stack is simply:
ON (C, A) ^ ON (B, D) ^ ONTABLE(A) ^ ONTABLE(D). This problem is separated into four sub-problems,
one for each component of the goal. Two of the sub-problems, ONTABLE(A) and ONTABLE(D), are
already true in the initial state (ONTABLE(A) ^ ONTABLE(D) is abbreviated OTAD below). Alternative 1: Goal stack:
• ON (C, A)
• ON (B, D)
• ON (C, A) ^ON (B, D) ^OTAD

Alternative 2: Goal stack:


• ON (B, D)
• ON (C, A)
• ON (C, A) ^ON (B, D) ^OTAD
Exploring Operators
Pursuing alternative 1, we check for operators that could cause ON (C, A). Of the four operators, only
one, STACK, can do this. So, it yields:
o STACK (C, A)
o ON (B, D)
o ON (C, A) ^ON (B, D) ^OTAD

Since the preconditions for STACK (C, A) must be satisfied, we establish them as sub-goals:
• CLEAR(A)
• HOLDING(C)
• CLEAR(A) ^ HOLDING(C)
• STACK (C, A)
• ON (B, D)
• ON (C, A) ^ ON (B, D) ^ OTAD
Here we exploit the heuristic that if HOLDING is one of several goals to be achieved at once, it should
be tackled last. Next, we see if CLEAR(A) is true. It is not. The only operator that could make it true is
UNSTACK (B, A). This produces the goal stack:
• ON (B, A)
• CLEAR(B)
• ON (B, A) ^CLEAR(B)^ARMEMPTY
• UNSTACK (B, A)
• HOLDING(C)
• CLEAR(A)^HOLDING(C)
• STACK (C, A)
• ON (B, D)
• ON (C, A) ^ON (B, D) ^OTAD
We see that we can pop predicates off the stack until we reach HOLDING(C), for which we need to find a suitable
operator. Two operators might make HOLDING(C) true: PICKUP(C) and UNSTACK (C, x).
Without looking ahead, we cannot tell which of these operators is appropriate, so we create two
branches of the search tree corresponding to the following goal stacks:

Alternative 1 (PICKUP):
• ONTABLE(C)
• CLEAR(C)
• ARMEMPTY
• ONTABLE(C) ^ CLEAR(C) ^ ARMEMPTY
• PICKUP(C)
• CLEAR(A) ^ HOLDING(C)
• STACK (C, A)
• ON (B, D)
• ON (C, A) ^ ON (B, D) ^ OTAD

Alternative 2 (UNSTACK):
• ON (C, x)
• CLEAR(C)
• ARMEMPTY
• ON (C, x) ^ CLEAR(C) ^ ARMEMPTY
• UNSTACK (C, x)
• CLEAR(A) ^ HOLDING(C)
• STACK (C, A)
• ON (B, D)
• ON (C, A) ^ ON (B, D) ^ OTAD
Complete plan
• UNSTACK (C, A)
• PUTDOWN (C)
• PICKUP(A)
• STACK (A, B)
• UNSTACK (A, B)
• PUTDOWN(A)
• PICKUP(B)
• STACK (B, C)
• PICKUP(A)
• STACK (A, B)
COMPONENTS OF A PLANNING SYSTEM
1. Choose the best rule to apply next, based on the best available heuristic information.
• The most widely used technique for selecting appropriate rules to apply is first to isolate a set of
differences between the desired goal state and the current state, and then to identify those rules that are
relevant to reducing those differences.
• If there are several rules, a variety of other heuristic information can be exploited to choose among
them.
2. Apply the chosen rule to compute the new problem state that arises from its application.
• In simple systems, applying rules is easy. Each rule simply specifies the problem state that would result
from its application.
• In complex systems, we must be able to deal with rules that specify only a small part of the complete
problem states.
• One way is to describe, for each action, each of the changes it makes to the state description.
3. Detect when a solution has been found.
• A planning system has succeeded in finding a solution to a problem when it has found a sequence of
operators that transforms the initial problem state into the goal state.
• How will it know when this has been done? In simple problem-solving systems, this question is easily
answered by a straightforward match of the state descriptions.
• One representative representation for planning systems is predicate logic. Suppose that, as part of
our goal, we have the predicate P(x). To see whether P(x) is satisfied in some state, we ask whether we
can prove P(x) given the assertions that describe that state and the axioms that define the world model.
4. Detect dead ends so that they can be abandoned and the system’s effort directed in more fruitful directions.
• As a planning system searches for a sequence of operators to solve a particular problem, it must be
able to detect when it is exploring a path that can never lead to a solution.
• The same reasoning mechanisms that can be used to detect a solution can often be used to detect a dead
end.
• If the search process is reasoning forward from the initial state, it can prune any path that leads to a
state from which the goal state cannot be reached.
• If the search process is reasoning backward from the goal state, it can also terminate a path either because it
is sure that the initial state cannot be reached or because little progress is being made.
5. Detect when an almost correct solution has been found and employ special techniques to make it totally correct.
• The kinds of techniques discussed here are often useful in solving nearly decomposable problems.
• One good way of solving such problems is to assume that they are completely decomposable, proceed
to solve the sub-problems separately, and then check that, when the sub-solutions are combined, they do
in fact give a solution to the original problem.
COMPONENTS FOR REPRESENTING AN ACTION
• Action description
• Precondition
• Effect
TYPES OF PLANNING
• Classical planning (also called goal stack planning): the environment is fully observable, deterministic,
finite, static and discrete.
• Non-classical planning (also called nonlinear planning): the environment is partially observable or
stochastic.
GOAL STACK PLANNING METHOD
The goal stack planning method attacks problems involving conjoined goals by solving the goals one at a time,
in order. A plan generated by this method contains a sequence of operators for attaining the first goal, followed
by a complete sequence for the second goal, and so on. The problem solver makes use of a single stack that contains
both goals and operators that have been proposed to satisfy those goals. The problem solver also relies on a database
that describes the current situation and on a set of operators described by PRECONDITION, ADD and DELETE
lists.
At each step of the problem-solving process, the top goal on the stack is pursued. When a sequence
of operators that satisfies it is found, that sequence is applied to the state description, yielding a new description.
Next, the goal that is then at the top of the stack is explored, and an attempt is made to satisfy it,
starting from the situation produced as a result of satisfying the first goal. This process continues until
the goal stack is empty. Then, as one last check, the original goal is compared to the final state derived from the
application of the chosen operators. If any components of the goal are not satisfied in that state, those
unsolved parts of the goal are reinserted onto the stack and the process resumed.
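The following Python sketch implements a stripped-down version of this loop for ground (fully instantiated) blocks-world operators. It is greedy: it always picks the first operator whose ADD list contains the current goal and does no backtracking or heuristic ordering, so it handles easy cases such as stacking one block on another but can fail or loop on strongly interacting goals. The data format and helper names are our own.

from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    pre: frozenset
    add: frozenset
    delete: frozenset

def blocks_ops(blocks):
    """Ground STACK, UNSTACK, PICKUP and PUTDOWN operators for the given blocks,
    using the P/D/A lists given in these notes."""
    ops = []
    for x in blocks:
        ops.append(Op(f"PICKUP({x})",
                      frozenset({f"CLEAR({x})", f"ONTABLE({x})", "ARMEMPTY"}),
                      frozenset({f"HOLDING({x})"}),
                      frozenset({f"ONTABLE({x})", "ARMEMPTY"})))
        ops.append(Op(f"PUTDOWN({x})",
                      frozenset({f"HOLDING({x})"}),
                      frozenset({f"ONTABLE({x})", "ARMEMPTY"}),
                      frozenset({f"HOLDING({x})"})))
        for y in blocks:
            if x == y:
                continue
            ops.append(Op(f"STACK({x},{y})",
                          frozenset({f"CLEAR({y})", f"HOLDING({x})"}),
                          frozenset({f"ON({x},{y})", "ARMEMPTY"}),
                          frozenset({f"CLEAR({y})", f"HOLDING({x})"})))
            ops.append(Op(f"UNSTACK({x},{y})",
                          frozenset({f"ON({x},{y})", f"CLEAR({x})", "ARMEMPTY"}),
                          frozenset({f"HOLDING({x})", f"CLEAR({y})"}),
                          frozenset({f"ON({x},{y})", "ARMEMPTY"})))
    return ops

def goal_stack_plan(state, goal, ops, max_steps=200):
    """Greedy goal-stack (linear) planner: no backtracking, no heuristics."""
    state = set(state)
    stack = [frozenset(goal)] + sorted(goal)       # conjunction below its components
    plan = []
    for _ in range(max_steps):
        if not stack:
            return plan
        top = stack.pop()
        if isinstance(top, Op):                    # operator: apply it
            state = (state - top.delete) | top.add
            plan.append(top.name)
        elif isinstance(top, frozenset):           # conjunctive goal
            if not top <= state:                   # interaction: re-push its parts
                stack.append(top)
                stack.extend(sorted(top))
        elif top not in state:                     # single unsatisfied goal
            op = next((o for o in ops if top in o.add), None)
            if op is None:
                return None                        # nothing achieves this goal
            stack.append(op)                       # the operator ...
            stack.append(op.pre)                   # ... its precondition conjunction
            stack.extend(sorted(op.pre))           # ... and each precondition
    return None

init = {"ONTABLE(A)", "ONTABLE(B)", "CLEAR(A)", "CLEAR(B)", "ARMEMPTY"}
print(goal_stack_plan(init, {"ON(A,B)"}, blocks_ops(["A", "B"])))
# -> ['PICKUP(A)', 'STACK(A,B)']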
NONLINEAR PLANNING USING CONSTRAINT POSTING
Nonlinear planning includes in its search space all possible sub-goal orderings and handles goal
interactions by interleaving. It has the following features:
• The operators used to solve one sub-problem may interfere with the solution to a previous
sub-problem.
• Most problems require an intertwined plan in which multiple sub-problems are worked on
simultaneously.
• It is not composed of a linear sequence of complete sub-plans.
Constraint posting
• The idea of constraint posting is to build up a plan by incrementally hypothesizing operators, partial
orderings between operators, and binding of variables within operators.
• At any given time in the problem-solving process, we may have a set of useful operators but perhaps
no clear idea of how those operators should be ordered with respect to each other.
• A solution is a partially ordered, partially instantiated set of operators; to generate an actual plan,
we convert the partial order into any of a number of total orders.

Constraint posting versus state space search
State space search
• Moves in the space: modify the world state via an operator
• Model of time: depth of a node in the search space
• Plan stored in: a series of state transitions
Constraint posting search
• Moves in the space: add operators, order operators, bind variables, or otherwise constrain the plan
• Model of time: a partially ordered set of operators
• Plan stored in: a single node
Algorithm: Nonlinear Planning (TWEAK)
1. Initialize S to be the set of propositions in the goal state.
2. Remove some unachieved proposition P from S.
3. Achieve P by using step addition, promotion, declobbering, simple establishment or separation.
4. Review all the steps in the plan, including any new steps introduced by step addition, to see if any of their
preconditions are unachieved. Add to S the new set of unachieved preconditions.
5. If S is empty, complete the plan by converting the partial order of steps into a total order and instantiating
any variables as necessary.
6. Otherwise, go to step 2.
PARTIAL-ORDER PLANNING
A partial-order plan consists of a set of actions that make up the steps of the plan. These are taken from the set of actions in the planning
problem. The “empty” plan contains just the Start and Finish actions. Start has no preconditions and has as its
effect all the literals in the initial state of the planning problem. Finish has no effects and has as its
preconditions the goal literals of the planning problem.
• Advantage: Partial-order planning has a clear advantage in being able to decompose problems into sub
problems.
• Disadvantage: It does not represent states directly, so it is harder to estimate how
far a partial-order plan is from achieving a goal.

OVERVIEW OF PARTIAL ORDER PLANNER
– Search in plan space and use least commitment, when possible
• Plan Space Search
– Search space is set of partial plans
– A plan is like a tuple <A, O, B>
• A: Set of actions, of the form (ai: Opj)
• O: Set of orderings, of the form (ai < aj)
• B: Set of bindings, of the form (vi = C), (vi ≠ C), (vi = vj) or (vi ≠ vj)
– Initial plan:
• < {start, finish}, {start < finish}, {}>
• start has no preconditions; Its effects are the initial state
• finish has no effects; Its preconditions are the goals
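Written out as plain Python containers, the initial (empty) partial-order plan <A, O, B> might look like the sketch below; the action identifiers are placeholders.

# Hypothetical encoding of the initial partial plan <A, O, B>.
plan = {
    "A": {"a0": "start", "a_inf": "finish"},   # actions
    "O": {("a0", "a_inf")},                    # orderings: start < finish
    "B": set(),                                # variable bindings: none yet
}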
CONDITIONAL PLANNING
Conditional planning deals with incomplete information by constructing a conditional plan that accounts for
each possible situation or contingency that could arise; it is also known as contingency planning. It is a planning
method for handling bounded indeterminacy: actions can have unpredictable effects, but the possible
effects can be determined, e.g. flipping a coin (the outcome will be heads or tails). It constructs a conditional plan with
different branches for the different contingencies that could arise. It is a way to deal with uncertainty by
checking what is actually happening in the environment at predetermined points in the plan.

Function AND-OR-GRAPH-SEARCH (problem) returns a conditional plan, or failure
return OR-SEARCH (INITIAL-STATE[problem], problem, [])

Function OR-SEARCH (state, problem, path) returns a conditional plan, or failure
if GOAL-TEST[problem](state) then return the empty plan
if state is on path then return failure
for each action, state_set in SUCCESSORS[problem](state) do
plan ← AND-SEARCH (state_set, problem, [state | path])
if plan ≠ failure then return [action | plan]
return failure

Function AND-SEARCH (state_set, problem, path) returns a conditional plan, or failure
for each s(i) in state_set do
plan(i) ← OR-SEARCH (s(i), problem, path)
if plan(i) = failure then return failure
return [if s(1) then plan(1) else if s(2) then plan(2) else ... if s(n-1) then plan(n-1) else plan(n)]

Conditional planning algorithm
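A direct Python transcription of the pseudocode above is sketched below. It assumes a problem object with an initial_state attribute and goal_test(state) and successors(state) methods, the latter returning (action, set_of_possible_next_states) pairs; these names, and the nested-list representation of conditional plans, are our own choices, not a standard API.

def and_or_graph_search(problem):
    return or_search(problem.initial_state, problem, [])

def or_search(state, problem, path):
    if problem.goal_test(state):
        return []                                  # the empty plan
    if state in path:
        return None                                # failure: repeated state on this path
    for action, state_set in problem.successors(state):
        plan = and_search(state_set, problem, [state] + path)
        if plan is not None:
            return [action, plan]                  # corresponds to [action | plan]
    return None                                    # failure

def and_search(state_set, problem, path):
    plans = {}
    for s in state_set:
        plan = or_search(s, problem, path)
        if plan is None:
            return None                            # one contingency is unsolvable
        plans[s] = plan
    # Conditional plan: "if s1 then plan1 else if s2 then plan2 ...".
    return [("if", s, p) for s, p in plans.items()]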

PROBABILISTIC REASONING AND UNCERTAINTY
Uncertainty means that many of the simplifications that are possible with deductive inference are no longer
valid. Utility theory says that every state has a degree of usefulness, or utility, to an agent, and that the agent
will prefer states with higher utility. Agents use utility theory to represent and reason with preferences.
Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance, e.g. in
medical diagnosis when using FOL rules:
• Laziness: It is too much work to list the complete set of antecedents or consequents needed to ensure an
exceptionless rule.
• Theoretical ignorance: Medical science has no complete theory for the domain.
• Practical ignorance: Even if we know all the rules, we may be uncertain about a particular item of information that is needed.
Probability statements do not have quite the same kind of semantics as logical sentences; they depend on the
evidence available. Preferences, as expressed by utilities, are combined with probabilities in the general theory
of rational decisions, called decision theory:
Decision Theory = Probability Theory + Utility Theory.
Conditional probability is used once the agent has obtained some evidence concerning the previously
unknown propositions making up the domain; such conditional or posterior probabilities are written with the
notation P(A|B). It is important to note that P(A|B) can only be used when B is all that is known.
Conditional probability is defined, for any value x of A and any value y of B, as:
P(A = x | B = y) = P(A = x ^ B = y) / P(B = y)
If A and B are independent, then P(A|B) = P(A).
Conditional probabilities can represent causal relationships in both directions:
i. From cause to (probable) effect
ii. From effect to (probable) cause
Example:
• P(Cavity | Toothache) = 0.8
Conditional probabilities can be defined in terms of unconditional probabilities. The equation is:
P(a|b) = P(a ^ b) / P(b)
which holds whenever P(b) > 0. This equation can also be written as:
P(a ^ b) = P(a|b) P(b)

The prior probability associated with a proposition ‘a’ is the degree of belief accorded to it in the absence of
any other information; it is written as P(a).
Example:
If the prior probability of having a cavity is 0.1,
it is written as P(Cavity = true) = 0.1 or P(cavity) = 0.1.
P(a) can be used only when there is no other information. When new information is known, we must reason
with the conditional probability of a given that information.
Prior probability distribution
If we want the probabilities of all the possible values of a random variable, a probability distribution is used;
this notation simplifies many equations. E.g., instead of writing the following four equations:
P (Weather = sunny) = 0.7
P (Weather = rain) = 0.2
P (Weather = cloudy) = 0.08
P (Weather = snow) = 0.02
we write:
P(Weather) = (0.7, 0.2, 0.08, 0.02).
A joint probability distribution completely specifies an agent's probability assignments to all propositions in
the domain. The joint probability distribution P(X1, X2, ..., Xn) assigns probabilities to all possible atomic
events, where X1, X2, ..., Xn are variables. This applies to expressions such as P(Weather, Cavity), which
denotes the probabilities of all combinations of values of the set of random variables. P(Weather, Cavity) can be
represented by a 4 x 2 table of probabilities.
Full joint probability distribution: a joint probability distribution that covers the complete set of random
variables used to describe the world.
Example:
If the world consists of just the variables cavity, toothache, and weather, then the full joint distribution is
shown by:
P (Cavity, Toothache, Weather)
This joint distribution can be represented as a 2 x 2 x 4 table with 16 entries. The distribution specifies the
probability of every atomic event and is a complete specification of one's uncertainty
about the world.
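As a small illustration of using a full joint distribution, the Python sketch below answers a conditional query by summing matching entries. The two-variable table and its numbers are hypothetical (chosen only so that the entries sum to 1 and reproduce the P(Cavity | Toothache) = 0.8 figure quoted earlier); they are not data from the notes.

# Hypothetical full joint distribution P(Cavity, Toothache).
joint = {
    ("cavity", "toothache"):        0.16,
    ("cavity", "no toothache"):     0.04,
    ("no cavity", "toothache"):     0.04,
    ("no cavity", "no toothache"):  0.76,
}

def p(condition):
    """Sum the probabilities of all atomic events matching `condition`."""
    return sum(prob for event, prob in joint.items() if condition(event))

p_toothache = p(lambda e: e[1] == "toothache")                       # marginal
p_cavity_and_toothache = p(lambda e: e == ("cavity", "toothache"))
print(round(p_cavity_and_toothache / p_toothache, 2))                # P(Cavity | Toothache) = 0.8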

BAYESIAN NETWORK
A Bayesian network represents the dependencies among variables and gives a concise specification of any full
joint probability distribution. A Bayesian network is a directed graph in which each node is annotated with
quantitative probability information. It is also called a Bayes network, belief network, decision network,
or Bayesian model.
Specifications:
• A set of random variables makes up the nodes of the network variables may be discrete or continuous
• A set of directed links or arrows connects pairs of nodes. If there is an arrow from node x to node y, x
is said to be a parent of y.
• Each node Xi has a conditional probability distribution P (Xi| Parents (Xi)) that quantifies the effect of
the parents on the node.
• The graph has no directed cycles
There are two ways for understanding the semantics of Bayesian networks:
• The network as a representation of the joint probability distribution − helpful for constructing networks.
• The network as an encoding of a collection of conditional independence statements − helpful for designing
inference procedures.
Note: Bayesian Network can be used for building models from data and experts’ opinions, and it consists of
two parts:
i. Directed Acyclic Graph
ii. Table of conditional probabilities.
Real world applications are probabilistic in nature, and to represent the relationship between multiple events,
we need a Bayesian network. It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction, and decision making under uncertainty. The
generalized form of Bayesian network that represents and solves decision problems under uncertain knowledge
is known as an influence diagram.

A Bayesian network graph is made up of nodes and arcs (directed links), where:

➢ Each node corresponds to a random variable, which can be continuous or discrete.
➢ Arcs or directed arrows represent the causal relationships or conditional dependencies between
random variables. These directed links connect pairs of nodes in the graph; a link indicates that
one node directly influences the other, and if there is no directed link between two nodes,
they are independent of each other.
➢ In the diagram referred to above, A, B, C, and D are random variables represented by the nodes of the
network graph.
➢ If node B is connected to node A by a directed arrow from A to B, then node
A is called the parent of node B.
➢ Node C is independent of node A.
The Bayesian network has mainly two components:
➢ Causal Component
➢ Actual numbers
Each node in the Bayesian network has a conditional probability distribution P (Xi | Parents(Xi)), which
determines the effect of the parents on that node. A Bayesian network is based on joint probability distributions
and conditional probability, so let's first understand the joint probability distribution:
Joint probability distribution: If we have variables X1, X2, X3, ..., Xn, then the probabilities of the different
combinations of X1, X2, X3, ..., Xn are known as the joint probability distribution.
P [X1, X2, X3, ..., Xn] can be written as follows in terms of conditional probabilities (the chain rule):
= P [X1 | X2, X3, ..., Xn] P [X2, X3, ..., Xn]
= P [X1 | X2, X3, ..., Xn] P [X2 | X3, ..., Xn] ... P [Xn-1 | Xn] P [Xn]
In general, for each variable Xi in a Bayesian network we can write:
P (Xi | Xi-1, ..., X1) = P (Xi | Parents(Xi))

INFERENCES IN BAYESIAN NETWORK
➢ Enumeration
➢ The variable elimination algorithm
➢ The complexity of exact inference
➢ Clustering algorithms
➢ Approximate inference
➢ Direct sampling
➢ Rejection Sampling
➢ Likelihood weighting
➢ Inference by Markov chain simulation
EXPLANATION OF BAYESIAN NETWORK
Let's understand the Bayesian network through an example by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm responds reliably to
burglaries, but it also responds to minor earthquakes. Harry has two neighbours, David and Sophia,
who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry
when he hears the alarm, but sometimes he gets confused with the phone ringing and calls then too. On
the other hand, Sophia likes to listen to loud music, so sometimes she misses the alarm. Here we would
like to compute probabilities in the burglary-alarm network.
Problem:
Calculate the probability that the alarm has sounded but neither a burglary nor an earthquake has
occurred, and both David and Sophia have called Harry.
Solution:
➢ The Bayesian network for the above problem is given below. The network structure shows that
Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm
going off, whereas David's and Sophia's calls depend only on the alarm.
➢ The network represents the assumptions that David and Sophia do not directly perceive the burglary,
do not notice minor earthquakes, and do not confer before calling.
➢ The conditional distribution for each node is given as a conditional probability table, or CPT.
➢ Each row in a CPT must sum to 1, because the entries in the row represent an exhaustive set
of cases for the variable.
➢ In a CPT, a Boolean variable with k Boolean parents contains 2^k probabilities. Hence, if there are two
parents, the CPT will contain 4 probability values.
List of all events occurring in this network:
➢ Burglary (B)
➢ Earthquake(E)
➢ Alarm(A)
➢ David Calls(D)
➢ Sophia calls(S)
We can write the events of the problem statement in the form of the probability P [D, S, A, B, E], and rewrite
this probability statement using the joint probability distribution:

P [D, S, A, B, E] = P [D | S, A, B, E] . P [S, A, B, E]
= P [D | S, A, B, E] . P [S | A, B, E] . P [A, B, E]
= P [D | A] . P [S | A, B, E] . P [A, B, E]
= P [D | A] . P [S | A] . P [A | B, E] . P [B, E]
= P [D | A] . P [S | A] . P [A | B, E] . P [B | E] . P [E]

Let's take the observed probability for the Burglary and earthquake component:
P (B= True) = 0.002, which is the probability of burglary.
P (B= False) = 0.998, which is the probability of no burglary.

P (E= True) = 0.001, which is the probability of a minor earthquake
P (E= False) = 0.999, which is the probability that an earthquake has not occurred.
We can provide the conditional probabilities as per the below tables:
Conditional probability table for Alarm A:
The conditional probability of Alarm A depends on Burglary and Earthquake:
B       E       P (A = True)    P (A = False)
True    True    0.94            0.06
True    False   0.95            0.05
False   True    0.31            0.69
False   False   0.001           0.999
Conditional probability table for David Calls:
The conditional probability that David will call depends on the probability of the alarm.
A P (D= True) P (D= False)
True 0.91 0.09
False 0.05 0.95
Conditional probability table for Sophia Calls:
The conditional probability that Sophia will call depends on its parent node "Alarm".
A P (S= True) P (S= False)
True 0.75 0.25
False 0.02 0.98
From the formula of joint distribution, we can write the problem statement in the form of probability
distribution:
P (S, D, A, ¬B, ¬E) = P (S|A) *P (D|A) *P (A|¬B ^ ¬E) *P (¬B) *P (¬E).
= 0.75* 0.91* 0.001* 0.998*0.999
= 0.00068045.
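The same figure can be reproduced in a few lines of Python from the CPT entries listed above; the dictionary layout is just a convenient encoding of those tables.

p_b, p_e = 0.002, 0.001                    # P(Burglary), P(Earthquake)
p_a_given = {(False, False): 0.001, (False, True): 0.31,
             (True, False): 0.95, (True, True): 0.94}    # P(Alarm=T | B, E)
p_d_given_a = {True: 0.91, False: 0.05}    # P(DavidCalls=T | Alarm)
p_s_given_a = {True: 0.75, False: 0.02}    # P(SophiaCalls=T | Alarm)

# P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B,¬E) P(¬B) P(¬E)
p = (p_s_given_a[True] * p_d_given_a[True]
     * p_a_given[(False, False)] * (1 - p_b) * (1 - p_e))
print(round(p, 8))                         # ≈ 0.00068045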
Hence, a Bayesian network can answer any query about the domain by using Joint distribution.
The semantics of Bayesian networks:
There are two ways to understand the semantics of a Bayesian network:
1. To understand the network as a representation of the joint probability distribution.
This is helpful in understanding how to construct the network.
2. To understand the network as an encoding of a collection of conditional independence statements.
This is helpful in designing inference procedures.
APPLICATION OF BAYES' RULE.
Suppose you have been tested positive for a disease; what is the probability that you actually have the disease?
It depends on the accuracy and sensitivity of the test, and on the background (prior) probability of the disease.
Let P (Test=+ve | Disease=true) = 0.95, so the false negative rate,
P (Test=-ve | Disease=true), is 5%.
Let P (Test=+ve | Disease=false) = 0.05, so the false positive rate is also 5%.
Suppose the disease is rare: P(Disease=true) = 0.01 (1%).
Let D denote Disease and "T=+ve" denote the positive test.
Then,
P (T=+ve |D=true) * P(D=true)
P (D=true |T=+ve) = ------------------------------------------------------------
P (T=+ve| D=true) * P(D=true) + P (T=+ve| D=false) * P(D=false)
0.95 * 0.01
= -------------------------------- = 0.161
0.95*0.01 + 0.05*0.99
So, the probability of having the disease given that you tested positive is just 16%. This seems low, but
here is an intuitive argument to support it. Out of 100 people, we expect only 1 to have the disease, but we expect
about 5% of the remaining 99 (about 5 people) to test positive falsely. So, of the roughly 6 people who test positive,
we expect only 1 to actually have the disease; and indeed 1/6 is approximately 0.16.
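The calculation can be checked with a few lines of Python using the numbers above:

p_pos_given_d = 0.95      # P(T=+ve | Disease=true)
p_pos_given_nd = 0.05     # P(T=+ve | Disease=false)
p_d = 0.01                # prior P(Disease=true)

# Bayes' rule: P(D=true | T=+ve)
posterior = (p_pos_given_d * p_d) / (p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d))
print(round(posterior, 3))   # 0.161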
BELIEF NETWORK
A belief network with one node for each state and sensor variable at each time step is called a dynamic
belief network (DBN). A decision network is obtained by adding utility nodes and decision nodes for actions to a
DBN; the resulting dynamic decision network (DDN) calculates the expected utility of each decision sequence.
A belief network is a graph in which the following holds:
• A set of random variables makes up the nodes of the network.
• A set of directed links or arrows connects pairs of nodes.
• Each node has a conditional probability table.
• The graph has no directed cycles.

Uses of a belief network
• Making decisions based on probabilities in the network and on the agent's utilities;
• Deciding which additional evidence variables should be observed in order to gain useful information;
• Performing sensitivity analysis to understand which aspects of the model have the greatest impact on
the probabilities of the query variables (and therefore must be accurate);
• Explaining the results of probabilistic inference to the user.
