QUESTION BANK
PART -A (2 Marks)
PROBLEM SOLVING
1. What is artificial intelligence?
Artificial intelligence is the exciting new effort to make computers think: machines with minds, in the full and literal sense. Artificial intelligence systematizes and automates intellectual tasks and is
therefore potentially relevant to any sphere of human intellectual activity.
2. List the characteristics of an intelligent agent.
Intelligent Agents are autonomous because they function without requiring that the Console
or Management Server be running.
An Agent that services a database can run when the database is down, allowing the Agent to
start up or shut down the database.
The Intelligent Agents can independently perform administrative job tasks at any time,
without active participation by the administrator.
Similarly, the Agents can autonomously detect and react to events, allowing them to monitor
the system and execute a fixit job to correct problems without the intervention of the
administrator.
The golden section search is a technique for finding the extremum (minimum or maximum) of a
strictly unimodal function by successively narrowing the range of values inside which the extremum
is known to exist. The technique derives its name from the fact that the algorithm maintains the
function values for triples of points whose distances form a golden ratio. The algorithm is the limit
of Fibonacci search (also described below) for a large number of function evaluations.
5. List the capabilities that a computer should possess to pass a Turing Test.
The capabilities that a computer should possess to pass a Turing Test are:
Natural Language Processing;
Knowledge Representation;
Automated Reasoning;
Machine Learning.
www.Vidyarthiplus.com
R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy
www.Vidyarthiplus.com
6. Define an agent.
An agent is anything that can be viewed as perceiving its environment through Sensors and acting
upon the environment through effectors.
9. What are the factors that a rational agent should depend on at any given time?
The factors that a rational agent should depend on at any given time are:
The performance measure that defines the criterion of success;
The agent's prior knowledge of the environment;
The actions that the agent can perform;
The agent's percept sequence to date.
Game playing;
Autonomous control;
Diagnosis;
Logistics planning;
Robotics.
PART- B
Fully observable vs. partially observable:
A task environment is effectively fully observable if the sensors detect all aspects that are
relevant to the choice of action.
An environment might be partially observable because of noisy and inaccurate sensors, or
because parts of the state are simply missing from the sensor data.
Deterministic vs. stochastic:
If the next state of the environment is completely determined by the current state and the
action executed by the agent, then we say the environment is deterministic; otherwise, it is
stochastic.
Episodic vs. sequential:
In an episodic task environment, the agent's experience is divided into atomic episodes.
Each episode consists of the agent perceiving and then performing a single action.
Crucially, the next episode does not depend on the actions taken in previous episodes.
For example, an agent that has to spot defective parts on an assembly line bases each
decision on the current part, regardless of previous decisions.
In sequential environments, on the other hand, the current decision could affect all
future decisions. Chess and taxi driving are sequential.
Using a FIFO queue for the fringe results in breadth-first search. The FIFO queue puts all newly
generated successors at the end of the queue, which means that shallow nodes are expanded
before deeper nodes.
In breadth-first search on a simple binary tree, at each stage the node to be expanded next is
indicated by a marker.
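As a rough illustration (not part of the original notes), the FIFO-queue idea can be sketched in Python; the small tree and function names here are invented for the example:

```python
from collections import deque

def breadth_first_search(start, goal, successors):
    """Expand shallowest nodes first by keeping the fringe in a FIFO queue."""
    frontier = deque([[start]])           # queue of paths
    explored = {start}
    while frontier:
        path = frontier.popleft()         # take the oldest (shallowest) path
        node = path[-1]
        if node == goal:
            return path
        for s in successors(node):        # new successors go to the end of the queue
            if s not in explored:
                explored.add(s)
                frontier.append(path + [s])
    return None

# Example on a small binary tree: A -> B, C ; B -> D, E ; C -> F, G
tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
print(breadth_first_search('A', 'F', lambda n: tree.get(n, [])))   # ['A', 'C', 'F']
```

Because the queue is FIFO, every node at depth k is expanded before any node at depth k+1, which is exactly the property described above.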
Properties of breadth-first-search
Suppose the solution is at depth d. In the worst case, we would expand all but the last node at level
d, generating b^(d+1) - b nodes at level d+1.
Then the total number of nodes generated is
b + b^2 + b^3 + ... + b^d + (b^(d+1) - b) = O(b^(d+1)).
Every node that is generated must remain in memory, because it is either part of the fringe or is
an ancestor of a fringe node. The space complexity is, therefore, the same as the time complexity.
UNIFORM-COST SEARCH:
Instead of expanding the shallowest node, uniform-cost search expands the
node n with the lowest path cost. Uniform-cost search does not care about the number of steps a
path has, but only about its total cost.
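A minimal sketch of this, assuming an invented weighted graph, keeps the fringe in a priority queue ordered by path cost g(n):

```python
import heapq

def uniform_cost_search(start, goal, step_cost_graph):
    """Expand the node with the lowest path cost g(n), not the shallowest node."""
    frontier = [(0, start, [start])]      # priority queue ordered by path cost
    best_cost = {start: 0}
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for succ, step in step_cost_graph.get(node, []):
            new_cost = cost + step
            if new_cost < best_cost.get(succ, float('inf')):
                best_cost[succ] = new_cost
                heapq.heappush(frontier, (new_cost, succ, path + [succ]))
    return None

# A cheap three-step route beats an expensive one-step route:
g = {'S': [('A', 1), ('G', 10)], 'A': [('B', 1)], 'B': [('G', 1)]}
print(uniform_cost_search('S', 'G', g))   # (3, ['S', 'A', 'B', 'G'])
```

Note how the direct edge S-G (cost 10) loses to the longer path of total cost 3, which breadth-first search would never find.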
Properties of Uniform-cost-search
2 DEPTH-FIRST-SEARCH
Depth-first search always expands the deepest node in the current fringe of
the search tree. The progress of the search is illustrated in Figure 1.31. The search proceeds
immediately to the deepest level of the search tree, where the nodes have no successors. As
those nodes are expanded, they are dropped from the fringe, and the search “backs up” to
the next shallowest node that still has unexplored successors.
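Swapping the FIFO queue for a LIFO stack turns the sketch above into depth-first search; the tree here is again an invented example:

```python
def depth_first_search(start, goal, successors):
    """Always expand the deepest node; back up when a node has no unexplored successors."""
    frontier = [[start]]                       # LIFO stack of paths
    while frontier:
        path = frontier.pop()                  # take the newest (deepest) path
        node = path[-1]
        if node == goal:
            return path
        for s in reversed(successors(node)):   # reversed so left children are tried first
            if s not in path:                  # avoid cycles along the current path
                frontier.append(path + [s])
    return None

tree = {'A': ['B', 'C'], 'B': ['D', 'E'], 'C': ['F', 'G']}
print(depth_first_search('A', 'E', lambda n: tree.get(n, [])))   # ['A', 'B', 'E']
```

When D is expanded and found to have no successors, it is dropped from the fringe and the search “backs up” to E, matching the behaviour described in the text.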
Using the same assumptions as Figure 1.15, and assuming that nodes at the same
depth as the goal node have no successors, we find that depth-first search would require 118
kilobytes instead of 10 petabytes, a factor of 10 billion times less space.
Drawback of Depth-first-search
The drawback of depth-first search is that it can make a wrong choice and get
stuck going down a very long (or even infinite) path when a different choice would lead to a solution
near the root of the search tree. For example, depth-first search will explore the entire left subtree
even if node C is a goal node.
3 DEPTH-LIMITED-SEARCH:
(Figure: depth-limited search on a small tree, shown for limits 0, 1 and 2. With limit 0 only the
root S is examined; with limit 1 its successors A and D are also examined; with limit 2 the search
reaches their successors as well.)
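Depth-limited search is depth-first search with a cutoff l: nodes at depth l are treated as if they had no successors. A recursive sketch, on a small tree invented to match the shape of the staged figure:

```python
def depth_limited_search(node, goal, successors, limit):
    """Recursive DFS that treats nodes at depth `limit` as if they had no successors.
    Returns a path to the goal, 'cutoff' if the limit was hit, or None (failure)."""
    if node == goal:
        return [node]
    if limit == 0:
        return 'cutoff'
    cutoff_occurred = False
    for s in successors(node):
        result = depth_limited_search(s, goal, successors, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [node] + result
    return 'cutoff' if cutoff_occurred else None

# Hypothetical tree roughly matching the figure: S -> A, D ; A -> B ; D -> E
tree = {'S': ['A', 'D'], 'A': ['B'], 'D': ['E']}
succ = lambda n: tree.get(n, [])
print(depth_limited_search('S', 'E', succ, 1))   # 'cutoff' - E lies below the limit
print(depth_limited_search('S', 'E', succ, 2))   # ['S', 'D', 'E']
```

Distinguishing 'cutoff' from failure is what lets iterative deepening know whether raising the limit could still help.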
5 BIDIRECTIONAL SEARCH:
The idea behind bidirectional search is to run two simultaneous searches, one forward from the
initial state and the other backward from the goal, stopping when the two searches meet in the
middle (Figure 1.18).
The motivation is that b^(d/2) + b^(d/2) is much less than b^d; in the figure, the area of the two small circles
is less than the area of one big circle centered on the start and reaching to the goal.
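The idea can be sketched as two breadth-first frontiers, one from each end; this simplified version alternates expanding one node from each side on an invented undirected graph, and is meant only to show the "meet in the middle" test:

```python
from collections import deque

def bidirectional_search(start, goal, neighbors):
    """Breadth-first from both ends; stop as soon as the two frontiers meet."""
    if start == goal:
        return [start]
    fwd, bwd = {start: [start]}, {goal: [goal]}   # node -> path from each end
    fq, bq = deque([start]), deque([goal])
    while fq and bq:
        node = fq.popleft()                       # expand one forward node
        for n in neighbors(node):
            if n in bwd:                          # frontiers meet at n
                return fwd[node] + [n] + bwd[n][:-1][::-1]
            if n not in fwd:
                fwd[n] = fwd[node] + [n]
                fq.append(n)
        node = bq.popleft()                       # expand one backward node
        for n in neighbors(node):
            if n in fwd:                          # frontiers meet at n
                return fwd[n] + bwd[node][::-1]
            if n not in bwd:
                bwd[n] = bwd[node] + [n]
                bq.append(n)
    return None

# Undirected chain a - b - c - d - e
edges = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b', 'd'], 'd': ['c', 'e'], 'e': ['d']}
print(bidirectional_search('a', 'e', lambda n: edges[n]))   # ['a', 'b', 'c', 'd', 'e']
```

Each frontier only has to reach depth about d/2 before the membership test (`n in bwd` / `n in fwd`) fires, which is the source of the b^(d/2) + b^(d/2) saving.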
Evaluation of search strategies: b is the branching factor; d is the depth of the shallowest solution;
m is the maximum depth of the search tree; l is the depth limit. Superscript caveats are as
follows: (a) complete if b is finite; (b) complete if step costs >= e for some positive e; (c) optimal if step
costs are all identical; (d) if both directions use breadth-first search.
An informed search strategy is one that uses problem-specific knowledge beyond the
definition of the problem itself. It can find solutions more efficiently than an uninformed strategy.
Best-first search
Best-first search is an instance of the general TREE-SEARCH or GRAPH-SEARCH
algorithm in which a node is selected for expansion based on an evaluation function f(n). The
node with the lowest evaluation is selected for expansion, because the evaluation measures the
distance to the goal.
This can be implemented using a priority queue, a data structure that maintains the
fringe in ascending order of f-values.
Heuristic functions:
A heuristic function, or simply a heuristic, is a function that ranks alternatives in various
search algorithms at each branching step, based on available information, in order to decide
which branch to follow during a search.
The key component of Best-first search algorithm is a heuristic function, denoted by
h(n):
h(n) = estimated cost of the cheapest path from node n to a goal node.
For example, in Romania, one might estimate the cost of the cheapest path from Arad to
Bucharest by the straight-line distance from Arad to Bucharest.
The heuristic function is the most common form in which additional knowledge is imparted
to the search algorithm.
Using the straight-line distance heuristic hSLD, the goal state can be reached faster.
Figure 2.2 shows the progress of greedy best-first search using hSLD to find a path from Arad to
Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to
Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras.
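The greedy strategy can be sketched in Python on a fragment of the Romania map; the straight-line distances and roads below are the standard textbook values for the cities involved:

```python
import heapq

# Straight-line distances to Bucharest (textbook values)
h_sld = {'Arad': 366, 'Zerind': 374, 'Timisoara': 329, 'Sibiu': 253,
         'Oradea': 380, 'Fagaras': 176, 'Rimnicu Vilcea': 193,
         'Pitesti': 100, 'Bucharest': 0}

roads = {'Arad': ['Zerind', 'Timisoara', 'Sibiu'],
         'Sibiu': ['Arad', 'Oradea', 'Fagaras', 'Rimnicu Vilcea'],
         'Fagaras': ['Sibiu', 'Bucharest'],
         'Rimnicu Vilcea': ['Sibiu', 'Pitesti'],
         'Pitesti': ['Rimnicu Vilcea', 'Bucharest']}

def greedy_best_first(start, goal):
    """Always expand the node that *appears* closest to the goal, i.e. lowest h(n)."""
    frontier = [(h_sld[start], [start])]
    explored = set()
    while frontier:
        _, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return path
        if node in explored:
            continue
        explored.add(node)
        for n in roads.get(node, []):
            if n not in explored:
                heapq.heappush(frontier, (h_sld[n], path + [n]))
    return None

print(greedy_best_first('Arad', 'Bucharest'))
```

The search expands Sibiu (253), then Fagaras (176), then reaches Bucharest, exactly the progression described for Figure 2.2; note the resulting route is not the cheapest one (the path through Rimnicu Vilcea and Pitesti is 32 km shorter), which is greedy search's characteristic weakness.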
It keeps track of the f-value of the best alternative path available from any ancestor of the
current node.
If the f-value of the current node exceeds this limit, the recursion unwinds back to the alternative path.
As the recursion unwinds, RBFS replaces the f-value of each node along the path with the
best f-value of its children.
function RBFS(problem, node, f_limit) returns a solution or failure and a new f-cost limit
    if GOAL-TEST[problem](STATE[node]) then return node
    successors ← EXPAND(node, problem)
    if successors is empty then return failure, ∞
    for each s in successors do
        f[s] ← max(g(s) + h(s), f[node])
    repeat
        best ← the lowest f-value node in successors
        if f[best] > f_limit then return failure, f[best]
        alternative ← the second-lowest f-value among successors
        result, f[best] ← RBFS(problem, best, min(f_limit, alternative))
        if result ≠ failure then return result
Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each
recursive call is shown on top of each current node. (a) The path via Rimnicu Vilcea is
followed until the current best leaf (Pitesti) has a value that is worse than the best
alternative path (Fagaras).
(b) The recursion unwinds and the best leaf value of the forgotten subtree (417) is backed
up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450.
(c) The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed
up to Fagaras; then Rimnicu Vilcea is expanded. This time, because the best alternative
path (through Timisoara) costs at least 447, the expansion continues to Bucharest.
RBFS Evaluation:
RBFS is somewhat more efficient than IDA*, but it still suffers from excessive node generation
(mind changes). Like A*, it is optimal if h(n) is admissible. Its space complexity is O(bd), whereas
IDA* retains only a single number (the current f-cost limit). Its time complexity is difficult to
characterize: it depends on the accuracy of h(n) and on how often the best path changes. Both
IDA* and RBFS suffer from using too little memory.
2 HEURISTIC FUNCTIONS:
A heuristic function, or simply a heuristic, is a function that ranks alternatives in various
search algorithms at each branching step, based on available information, in order to decide
which branch to follow during a search.
The 8-puzzle:
The 8-puzzle is an example of Heuristic search problem. The object of the puzzle is to
slide the tiles horizontally or vertically into the empty space until the configuration matches the
goal configuration (Figure 2.4)
The average solution cost for a randomly generated 8-puzzle instance is about 22 steps. The
branching factor is about 3. (When the empty tile is in the middle, there are four possible moves;
when it is in a corner there are two; and when it is along an edge there are three.)
This means that an exhaustive search to depth 22 would look at about 3^22 ≈ 3.1 x 10^10 states.
By keeping track of repeated states, we could cut this down by a factor of
about 170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. This is a
manageable number, but the corresponding number for the 15-puzzle is roughly 10^13.
If we want to find the shortest solutions by using A*, we need a heuristic function that
never overestimates the number of steps to the goal.
The two commonly used heuristic functions for the 8-puzzle are:
h1 = the number of misplaced tiles.
For Figure 2.4, all of the eight tiles are out of position, so the start state would have h1 =
8. h1 is an admissible heuristic.
h2 = the sum of the distances of the tiles from their goal positions. This is called the city block
distance or Manhattan distance.
h2 is admissible, because all any move can do is move one tile one step closer to the goal.
Tiles 1 to 8 in start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
Neither of these overestimates the true solution cost, which is 26.
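Both heuristics are easy to compute; the sketch below uses a start state consistent with the figure described in the text (it yields h1 = 8 and h2 = 18), with 0 marking the blank:

```python
# Start and goal configurations (0 marks the blank)
start = (7, 2, 4,
         5, 0, 6,
         8, 3, 1)
goal  = (0, 1, 2,
         3, 4, 5,
         6, 7, 8)

def h1(state, goal):
    """Number of misplaced tiles (the blank is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Sum of Manhattan (city-block) distances of the tiles from their goal squares."""
    total = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal.index(tile)
        total += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return total

print(h1(start, goal), h2(start, goal))   # 8 18, as in the text
```

Since every tile must move at least its Manhattan distance, h2 never overestimates, and h2 >= h1 everywhere, which is why h2 dominates h1 in the comparison below.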
For example, if A* finds a solution at depth 5 using 52 nodes, then effective branching factor is
1.92.
A well designed heuristic would have a value of b* close to 1, allowing fairly large
problems to be solved.
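The effective branching factor b* is the branching factor of a uniform tree of depth d containing N + 1 nodes, i.e. the root of N = b* + (b*)^2 + ... + (b*)^d. A bisection sketch recovers the quoted value for the example above:

```python
def effective_branching_factor(n_nodes, depth, tol=1e-6):
    """Solve N = b* + b*^2 + ... + b*^d for b* by bisection."""
    def total(b):
        return sum(b ** i for i in range(1, depth + 1))   # b + b^2 + ... + b^d
    lo, hi = 1.0, float(n_nodes)          # the root lies between 1 and N
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_nodes:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A* finds a solution at depth 5 using 52 nodes:
print(round(effective_branching_factor(52, 5), 2))   # 1.92
```

Bisection works here because the total node count grows monotonically in b, so there is exactly one root above 1.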
To test the heuristic functions h1 and h2, 1200 random problems were generated with
solution lengths from 2 to 24 and solved with iterative deepening search and with A* search
using both h1 and h2. Figure 2.5 gives the average number of nodes expanded by each strategy
and the effective branching factor.
The results suggest that h2 is better than h1, and that either is far better than using iterative
deepening search. For a solution length of 14, A* with h2 is 30,000 times more efficient than
uninformed iterative deepening search.
If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1(n) gives
the length of the shortest solution.
If the rules are relaxed so that a tile can move to any adjacent square, then h2(n) gives
the length of the shortest solution.
4. Explain about Local Search Algorithms And Optimization Problems.
In many optimization problems, the path to the goal is irrelevant; the goal state itself is
the solution.
For example, in the 8-queens problem, what matters is the final configuration of queens,
not the order in which they are added.
In such cases, we can use local search algorithms. They operate using a single current
state (rather than multiple paths) and generally move only to neighbors of that state.
The important applications of these classes of problems are
(a) Integrated-circuit design,
(b) Factory-floor layout,
(c) Job-shop scheduling,
(d) Automatic programming,
(e) Telecommunications network optimization,
(f) Vehicle routing, and
(g) Portfolio management.
OPTIMIZATION PROBLEMS:
In addition to finding goals, local search algorithms are useful for solving pure
optimization problems, in which the aim is to find the best state according to an objective
function.
Figure 2.6 A one dimensional state space landscape in which elevation corresponds to
the objective function. The aim is to find the global maximum. Hill climbing search
modifies the current state to try to improve it, as shown by the arrow. The various
topographic features are defined in the text.
1. Hill-climbing search:
The hill-climbing search algorithm as shown in Figure 2.6 is simply a loop that
continually moves in the direction of increasing value – that is, uphill. It terminates when it
reaches a “peak” where no neighbor has a higher value.
Hill-climbing is sometimes called greedy local search because it grabs a good neighbor state
without thinking ahead about where to go next. Greedy algorithms often perform quite well.
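The loop "move to the best neighbour until no neighbour is better" can be sketched in a few lines; the one-dimensional landscape below is invented purely to show a climb terminating at a peak:

```python
def hill_climbing(state, value, neighbors):
    """Keep moving to the best neighbour until no neighbour is better (a peak)."""
    while True:
        best = max(neighbors(state), key=value)
        if value(best) <= value(state):
            return state                  # local (or global) maximum reached
        state = best

# One-dimensional landscape: f peaks at x = 3
f = lambda x: -(x - 3) ** 2
print(hill_climbing(0, f, lambda x: [x - 1, x + 1]))   # climbs 0 -> 1 -> 2 -> 3
```

On this landscape the single peak is global, so greedy climbing succeeds; on the ridged landscapes discussed below the same loop stops at the first local maximum it reaches.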
Figure 2.8 Illustration of why ridges cause difficulties for hill-climbing. The grid of
states (dark circles) is superimposed on a ridge rising from left to right, creating a sequence
of local maxima that are not directly connected to each other. From each local maximum,
all the available options point downhill.
Hill-climbing variations
Stochastic hill-climbing: Random selection among the uphill moves. The selection
probability can vary with the steepness of the uphill move.
First-choice hill-climbing: implements stochastic hill-climbing by generating successors randomly
until one better than the current state is found.
Random-restart hill-climbing: Tries to avoid getting stuck in local maxima.
If the move improves the situation, it is always accepted. Otherwise, the algorithm accepts the move
with some probability less than 1. The probability decreases exponentially with the “badness” of the
move – the amount ΔE by which the evaluation is worsened.
Simulated annealing was first used extensively to solve VLSI layout problems in the
early 1980s. It has been applied widely to factory scheduling and other large-scale optimization
tasks.
Figure 2.9 The simulated annealing search algorithm, a version of stochastic hill climbing
where some downhill moves are allowed.
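A minimal sketch of that acceptance rule, on an invented toy landscape (the cooling schedule and step function here are illustrative assumptions, not the textbook's code):

```python
import math
import random

def simulated_annealing(start, value, neighbor, schedule, steps=200, seed=0):
    """Hill climbing that sometimes accepts downhill moves, with probability
    exp(delta_E / T) that shrinks as the temperature T cools."""
    rng = random.Random(seed)
    current = best = start
    for t in range(steps):
        T = schedule(t)
        if T <= 0:
            break
        nxt = neighbor(current, rng)
        delta = value(nxt) - value(current)
        if delta > 0 or rng.random() < math.exp(delta / T):
            current = nxt                 # uphill always; downhill with probability e^(dE/T)
        if value(current) > value(best):
            best = current
    return best

# Maximize f(x) = -(x - 3)^2 on the integers 0..10 (a toy landscape)
f = lambda x: -(x - 3) ** 2
step = lambda x, rng: min(10, max(0, x + rng.choice([-1, 1])))
cool = lambda t: 1.0 * 0.95 ** t
print(simulated_annealing(0, f, step, cool))
```

Early on, T is large and nearly every move is accepted (a random walk); as T cools, downhill moves are accepted less and less often and the process behaves like plain hill climbing.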
3. Genetic algorithms:
A Genetic algorithm (or GA) is a variant of stochastic beam search in which successor
states are generated by combining two parent states, rather than by modifying a single state.
Like beam search, genetic algorithms begin with a set of k randomly generated states,
called the population.
Each state, or individual, is represented as a string over a finite alphabet – most
commonly, a string of 0s and 1s. For example, an 8-queens state must specify the positions of
8 queens, each in a column of 8 squares, and so requires 8 x log2 8 = 24 bits.
Figure 2.10 The genetic algorithm. The initial population in (a) is ranked by the fitness
function in (b), resulting in pairs for mating in (c). They produce offspring in (d), which are
subjected to mutation in (e).
Figure 2.10 shows a population of four 8-digit strings representing 8-queens states.
Fitness function: should return higher values for better states.
Crossover: for each pair to be mated, a crossover point is randomly chosen from the
positions in the string.
Offspring: created by crossing over the parent strings at the crossover point.
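The selection/crossover/mutation cycle can be sketched as follows; the digit strings, mutation rate and toy fitness function are assumptions for the example, not values from the figure:

```python
import random

def crossover(parent1, parent2, point):
    """Splice two parent strings at the crossover point."""
    return parent1[:point] + parent2[point:]

def mutate(individual, rng, alphabet='12345678'):
    """Replace one randomly chosen position with a random symbol."""
    i = rng.randrange(len(individual))
    return individual[:i] + rng.choice(alphabet) + individual[i + 1:]

def genetic_algorithm(population, fitness, generations=20, seed=0):
    """Repeatedly select fit parents, recombine them, and occasionally mutate."""
    rng = random.Random(seed)
    for _ in range(generations):
        weights = [fitness(ind) for ind in population]       # fitter -> more offspring
        new_population = []
        for _ in population:
            p1, p2 = rng.choices(population, weights=weights, k=2)
            child = crossover(p1, p2, rng.randrange(1, len(p1)))
            if rng.random() < 0.1:                           # small mutation probability
                child = mutate(child, rng)
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

print(crossover('24748552', '32752411', 3))   # '24752411'
pop = ['24748552', '32752411', '24415124', '32543213']
best = genetic_algorithm(pop, lambda s: sum(map(int, s)))
```

The crossover call shows the splice mechanism directly: the child takes the first three digits from one parent and the remaining five from the other.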
Figure 2.11 Genetic algorithms. The algorithm is the same as the one diagrammed in Figure.
Varieties of CSPs
(i) Discrete variables
a. Finite domains:
The simplest kind of CSP involves variables that are discrete and have finite domains.
Map coloring problems are of this kind. The 8-queens problem can also be viewed as a finite-
domain CSP, where the variables Q1, Q2, ..., Q8 are the positions of the queens in columns 1, ..., 8,
and each variable has the domain {1,2,3,4,5,6,7,8}.
If the maximum domain size of any variable in a CSP is d, then the number of possible
complete assignments is O(d^n) – that is, exponential in the number of variables n. Finite-domain
CSPs include Boolean CSPs, whose variables can be either true or false.
b. Infinite domains:
Discrete variables can also have infinite domains – for example, the set of integers or the
set of strings. With infinite domains, it is no longer possible to describe constraints by
enumerating all allowed combinations of values.
Instead, a constraint language of algebraic inequalities must be used, such as StartJob1 + 5 <= StartJob3.
Varieties of constraints:
(i) Unary constraints involve a single variable.
Example: SA ≠ green
(ii) Binary constraints involve pairs of variables.
Example: SA ≠ WA
(iii) Higher-order constraints involve 3 or more variables.
Example: cryptarithmetic puzzles.
Figure 2.14 (a) A cryptarithmetic problem. Each letter stands for a distinct digit; the aim
is to find a substitution of digits for letters such that the resulting sum is arithmetically
correct, with the added restriction that no leading zeros are allowed. (b) The
constraint hypergraph for the cryptarithmetic problem, showing the Alldiff constraint
as well as the column addition constraints. Each constraint is a square box connected
to the variables it constrains.
Figure Part of search tree generated by simple backtracking for the map coloring
problem.
1. Forward checking
One way to make better use of constraints during search is called forward checking.
Whenever a variable X is assigned, the forward checking process looks at each unassigned
variable Y that is connected to X by a constraint and deletes from Y's domain any value that is
inconsistent with the value chosen for X. The following figure shows the progress of a map-
coloring search with forward checking.
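A compact sketch of backtracking with forward checking on the map-coloring CSP (the Australia map is the standard example; the helper names are this sketch's own):

```python
def forward_check(assignment, domains, neighbors):
    """Prune neighbour domains: remove values inconsistent with assigned variables."""
    pruned = {v: list(domains[v]) for v in domains}
    for var, val in assignment.items():
        pruned[var] = [val]
        for n in neighbors[var]:
            if n not in assignment:
                pruned[n] = [x for x in pruned[n] if x != val]
    return pruned

def backtrack(assignment, domains, neighbors):
    """Backtracking search with forward checking for map colouring."""
    if len(assignment) == len(domains):
        return assignment
    var = next(v for v in domains if v not in assignment)
    for val in domains[var]:
        new_assignment = dict(assignment, **{var: val})
        current = forward_check(new_assignment, domains, neighbors)
        if all(current[v] for v in current):       # no domain wiped out
            result = backtrack(new_assignment, current, neighbors)
            if result:
                return result
    return None

# Map of Australia (WA, NT, SA, Q, NSW, V, T), three colours
neighbors = {'WA': ['NT', 'SA'], 'NT': ['WA', 'SA', 'Q'],
             'SA': ['WA', 'NT', 'Q', 'NSW', 'V'], 'Q': ['NT', 'SA', 'NSW'],
             'NSW': ['Q', 'SA', 'V'], 'V': ['SA', 'NSW'], 'T': []}
domains = {v: ['red', 'green', 'blue'] for v in neighbors}
solution = backtrack({}, domains, neighbors)
print(solution)
```

When any pruned domain becomes empty, the branch is abandoned immediately, so the search fails far earlier than plain backtracking, which would only detect the conflict when it tried to assign that variable.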
Tree-Structured CSPs
QUESTION BANK
PART -A (2 Marks)
LOGICAL REASONING
1. What factors determine the selection of forward or backward reasoning
approach for an AI problem?
A search procedure must find a path between the initial and goal states. There are two directions in
which a search process could proceed. Reason forward from the initial states: the initial state forms the
root of the search tree. Generate the next level of the tree by finding all the rules whose left sides
match the root node, and use their right sides to generate the siblings. Repeat the process until a
configuration that matches the goal state is generated. Reason backward from the goal states: the goal
state forms the root, and the rules whose right sides match a node are used to generate its successors,
until a configuration that matches the initial state is generated.
5. What is a symbol?
The basic syntactic elements of first-order logic are the symbols. Symbols stand for
objects, relations and functions.
8. What is Logic?
Logic consists of:
A formal system for describing states of affairs, consisting of a) Syntax and b) Semantics;
Proof theory – a set of rules for deducing the entailments of a set of sentences.
9. Define a Sentence?
Each individual representation of facts is called a sentence. The sentences are
expressed in a language called a knowledge representation language.
12. What are the three levels in describing a knowledge-based agent?
The three levels in describing a knowledge-based agent are:
Logical level;
Implementation level;
Knowledge level or epistemological level.
1^ 2^…….^ n
PART - B
1. Explain in detail about knowledge engineering process in FOL.
KNOWLEDGE REPRESENTATION
Intelligent agents need knowledge about the world in order to reach good
decisions.
Knowledge is contained in agents in the form of sentences in a knowledge
representation language that are stored in a knowledge base.
Logic is the formal, systematic study of the principles of valid inference and
correct reasoning.
A system of inference rules and axioms allows certain formulas to be derived,
called theorems, which may be interpreted as true propositions.
Knowledge representation languages should be declarative, compositional,
expressive, context-independent, and unambiguous.
Logical Symbols: These are symbols that have a standard meaning, like: AND, OR,
NOT, ALL, EXISTS, IMPLIES, IFF, FALSE, =.
Constants:
Functions: 0-ary, 1-ary, 2-ary, n-ary. These are usually just identifiers. 0-ary
functions are also called individual constants.
Where predicates return true or false, functions can return any value.
Atomic Formulae
An Atomic Formula is either FALSE or an n-ary predicate applied to n terms: P(t1 t2 ..
tn). In the case that "=" is a logical symbol in the language, (t1 = t2), where t1 and t2 are terms, is
an atomic formula.
Literals
A Literal is either an atomic formula (a Positive Literal), or the negation of an atomic
formula (a Negative Literal). A Ground Literal is a variable-free literal.
Clauses
A Clause is a disjunction of literals. A Ground Clause is a variable-free clause. A Horn
Clause is a clause with at most one positive literal. A Definite Clause is a Horn Clause with
exactly one positive Literal.
Notice that implications are equivalent to Horn or Definite clauses:
(A IMPLIES B) is equivalent to ( (NOT A) OR B)
• an atomic formula, or
• a Universally Quantified Formula, that is a formula of the form (ALL variable formula).
We say that occurrences of variable are bound in formula [we should be more precise].
Or
A formula that is the disjunction of clauses is said to be in Clausal Form. We shall see
that there is a sense in which every formula is equivalent to a clausal form.
Often it is convenient to refer to terms and formulae with a single name. Form or
Expression is used to this end.
Substitutions
Given a term s, the result [substitution instance] of substituting a term t in s for a variable x,
s[t/x], is:
t, if s is the variable x
FALSE, if A is FALSE,
The substitution [t/x] can be seen as a map from terms to terms and from formulae to
formulae. We can define similarly [t1/x1 t2/x2 .. tn/xn], where t1 t2 .. tn are terms and x1 x2 .. xn
are variables, as a map, the [simultaneous] substitution of x1 by t1, x2 by t2, .., of xn by tn. [If
all the terms t1 .. tn are variables, the substitution is called an alphabetic variant, and if they are
ground terms, it is called a ground substitution.] Note that a simultaneous substitution is not the
same as a sequential substitution.
The set of functions (predicates) so introduced form the Functional Basis (Relational Basis) of
the conceptualization.
Given a language L and a conceptualization (U,I), an Assignment is a map from the variables of
L to U. An X-Variant of an assignment s is an assignment that is identical to s everywhere
except at x where it differs.
Given a conceptualization M=(U,I) and an assignment s it is easy to extend s to map each term t
of L to an individual s(t) in U by using induction on the structure of the term.
Then
Formula A is valid or logically true in M iff M satisfies A under any s. We then say that M
is a model of A.
• Formula A is Valid or Logically True iff for any L-structure M and any assignment s, M
satisfies A under s.
{A, NOT A OR B}
B
For example:
{Sam is tall, if Sam is tall then Sam is unhappy}
Sam is unhappy
When we introduce inference rules we want them to be Sound, that is, we want the consequence
of the rule to be a logical consequence of the premises of the rule. Modus Ponens is sound. But
the following rule, called Abduction, is not:
{B, NOT A OR B}
A
For example:
John is wet
If it is raining then John is wet
It is raining
gives us a conclusion that is usually, but not always, true [John takes a shower even
when it is not raining].
A Logic or Deductive System is a language, plus a set of inference rules, plus a set of logical
axioms [formulae that are valid].
In this case we say that Bn is Derived from GAMMA in D, and in the case that GAMMA is
empty, we say that Bn is a Theorem of D.
Soundness, Completeness, Consistency, Satisfiability
A Logic D is Sound iff for all sets of formulae GAMMA and any formula A:
• if A is derived from GAMMA in D, then A is a logical consequence of GAMMA
A Logic D is Complete iff for all sets of formulae GAMMA and any formula A:
• if A is a logical consequence of GAMMA, then A can be derived from GAMMA in D.
A Logic D is Refutation Complete iff for all sets of formulae GAMMA and any formula A:
• if A is a logical consequence of GAMMA, then FALSE can be derived in D from the union of GAMMA and NOT A.
Note that if a Logic is Refutation Complete, then we can enumerate all the logical
consequences of GAMMA and, for any formula A, reduce the question of whether A is a
logical consequence of GAMMA to the question of whether the union of GAMMA and NOT A is
consistent. We will work with logics that are both Sound and Complete, or at least Sound and
Refutation Complete.
1) IDENTIFY THE TASK:-
The knowledge engineer must delimit the range of questions that the knowledge base
will support and the kinds of facts that will be available for each specific problem instance.
For example, will the facts include the current location?
2) ASSEMBLE THE RELEVANT KNOWLEDGE:-
The knowledge engineer might already be an expert in the domain, or might need to work
with real experts to extract what they know – a process called knowledge acquisition.
For a real domain, the issue of relevance can be quite difficult; for example, a system for
simulating VLSI designs might or might not need to take into account stray capacitances and skin
effects.
3) DECIDE ON A VOCABULARY OF PREDICATES, FUNCTION AND CONSTANTS:-
The important domain-level concepts are translated into logic-level names.
This involves many questions of knowledge engineering style.
Like programming style, this can have a significant impact on the eventual success of the
project.
Once the choices have been made, the result is a vocabulary that is known as the ontology
of the domain.
Ontology means a particular theory of the nature of being or existence. It determines
what kinds of things exist, but does not determine their specific properties and
interrelationships.
All gates have one output terminal. Circuits, like gates, have input and output terminals.
To reason about functionality and connectivity, it is not necessary to talk about the
wires themselves, the paths the wires take, or the junctions where two wires come
together: one output terminal can be connected to another input terminal without having to
mention the wire that actually connects them.
REPRESENTATION OF GATES:-
A gate must be distinguished from other gates by naming it with a constant, such as
X1 or X2.
Ways to represent gates:-
Function: Type(X1) = XOR
Binary predicate: Type(X1, XOR)
Several individual type predicates: XOR(X1)
The function Type avoids the need for axioms stating that each individual gate can have
only one type.
REPRESENTATION OF TERMINALS:-
A gate or circuit can have one or more input terminals and one or more output terminals.
Each terminal could be named with a constant; thus a gate X1 could have terminals
named X1In1, X1In2 and X1Out1.
This, however, generates long compound names and is best avoided.
It is better to name a terminal using a function, like In(1, X1) to denote the first
input terminal of gate X1. A similar function Out is used for output terminals.
REPRESENTATION OF CONNECTIVITY:-
The connectivity between gates can be represented by the predicate Connected, as in
Connected(Out(1, X1), In(1, X2)).
REPRESENTATION OF SIGNALS:-
To say whether a signal is on or off, use a unary predicate On, which is true when the
signal at a terminal is on.
For answering questions like "What are all the possible values of the signals at the output terminals of
circuit C1?", introduce two signal values, 1 and 0, and a function Signal that takes a terminal as
argument and denotes the signal value for that terminal.
LIFTING:-
Generalized Modus Ponens is a lifted version of Modus Ponens – it raises Modus Ponens
from propositional to first-order logic. The key advantage of lifted inference rules over
propositionalization is that they make only those substitutions that are required to allow
particular inferences to proceed.
UNIFICATION:-
Lifted inference rules require finding substitutions that make different logical expressions
look identical. This process is called unification, and it is a key component of all first-order
inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for them if
one exists:
UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)
STANDARDIZING APART:-
The problem can be avoided by standardizing apart one of the two sentences being
unified, which means renaming its variables to avoid name clashes. For example, we can rename
x in Knows(x, Elizabeth) to Z17 (a new variable name) without changing its meaning. Now the
unification will work:
UNIFY(Knows(John, x), Knows(Z17, Elizabeth)) = {x/Elizabeth, Z17/John}
MOST GENERAL UNIFIER:-
For every unifiable pair of expressions, there is a single most general unifier that is unique
up to renaming of variables.
In this case, it is {y/John, x/z}.
Occur check:-
When matching a variable against a complex term, one must check whether the variable
itself occurs inside the term; if it does, the match fails because no consistent unifier can be
constructed. This so-called occur check makes the complexity of the entire algorithm quadratic
in the size of the expressions being unified.
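The occur check and the rest of UNIFY can be illustrated with a minimal Python sketch. The term representation here (variables as '?'-prefixed strings, constants as plain strings, compound terms as tuples) is a convention chosen for this sketch, not the book's notation:

```python
# A minimal sketch of first-order unification with the occur check.
# Terms: variables are strings starting with '?', constants are other
# strings, and compound terms are tuples like ('Knows', 'John', '?x').

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def substitute(theta, t):
    """Apply substitution theta to term t."""
    if is_var(t):
        return substitute(theta, theta[t]) if t in theta else t
    if isinstance(t, tuple):
        return tuple(substitute(theta, a) for a in t)
    return t

def occurs_in(var, t, theta):
    """Occur check: does var appear inside term t (under theta)?"""
    t = substitute(theta, t)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs_in(var, a, theta) for a in t)
    return False

def unify(x, y, theta=None):
    """Return a most general unifier for x and y, or None on failure."""
    if theta is None:
        theta = {}
    x, y = substitute(theta, x), substitute(theta, y)
    if x == y:
        return theta
    if is_var(x):
        if occurs_in(x, y, theta):
            return None      # occur check fails: no consistent unifier
        return {**theta, x: y}
    if is_var(y):
        return unify(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for a, b in zip(x, y):
            theta = unify(a, b, theta)
            if theta is None:
                return None
        return theta
    return None

# The standardize-apart example from the text:
print(unify(('Knows', 'John', '?x'), ('Knows', '?z17', 'Elizabeth')))
# {'?z17': 'John', '?x': 'Elizabeth'}
```

Note how `unify('?x', ('F', '?x'))` returns None: the occur check rejects binding a variable to a term containing itself.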
R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy
These queries form a subsumption lattice. A sentence with repeated constants has a slightly
different lattice.
FORWARD CHAINING:-
A forward chaining algorithm starts with the atomic sentences in the knowledge base and
applies Modus Ponens in the forward direction, adding new atomic sentences, until no further
inferences can be made.
First-order literals can include variables, in which case those variables are assumed to be
universally quantified.
DATALOG:-
The knowledge base contains no function symbols and is therefore an instance of the
class of Datalog knowledge bases – that is, sets of first-order definite clauses with no function
symbols.
SIMPLE FORWARD CHAINING ALGORITHM:-
function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration
  repeat until new is empty
    new ← {}
    for each sentence r in KB do
      (p1 ∧ … ∧ pn ⇒ q) ← STANDARDIZE-APART(r)
      for each θ such that SUBST(θ, p1 ∧ … ∧ pn) = SUBST(θ, p1′ ∧ … ∧ pn′)
          for some p1′, …, pn′ in KB
        q′ ← SUBST(θ, q)
        if q′ is not a renaming of a sentence already in KB or new then
          add q′ to new
          φ ← UNIFY(q′, α)
          if φ is not fail then return φ
    add new to KB
  return false
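The loop above can be sketched in Python for the function-free (Datalog) case. The tuple-based fact representation and the crime-example names below are illustrative assumptions of this sketch:

```python
# A simplified forward chainer for function-free definite clauses
# (Datalog), in the spirit of FOL-FC-ASK. Facts are ground tuples;
# rules are (premises, conclusion) with variables written as '?x'.

def is_var(t):
    return isinstance(t, str) and t.startswith('?')

def match(pattern, fact, theta):
    """Extend theta so that pattern matches the ground fact, else None."""
    if len(pattern) != len(fact):
        return None
    theta = dict(theta)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if theta.get(p, f) != f:
                return None
            theta[p] = f
        elif p != f:
            return None
    return theta

def forward_chain(facts, rules):
    """Apply the rules until a fixed point: no new facts can be inferred."""
    kb = set(facts)
    while True:
        new = set()
        for premises, conclusion in rules:
            thetas = [{}]
            for prem in premises:        # pattern-match each premise in turn
                thetas = [t2 for t in thetas for f in kb
                          if (t2 := match(prem, f, t)) is not None]
            for theta in thetas:
                fact = tuple(theta.get(t, t) for t in conclusion)
                if fact not in kb:
                    new.add(fact)
        if not new:                      # fixed point reached
            return kb
        kb |= new

# The crime example (predicate and constant names are assumptions here):
facts = [('American', 'West'), ('Missile', 'M1'),
         ('Owns', 'Nono', 'M1'), ('Enemy', 'Nono', 'America')]
rules = [
    ([('Missile', '?x'), ('Owns', 'Nono', '?x')],
     ('Sells', 'West', '?x', 'Nono')),
    ([('Missile', '?x')], ('Weapon', '?x')),
    ([('Enemy', '?x', 'America')], ('Hostile', '?x')),
    ([('American', '?x'), ('Weapon', '?y'), ('Sells', '?x', '?y', '?z'),
      ('Hostile', '?z')], ('Criminal', '?x')),
]
kb = forward_chain(facts, rules)
print(('Criminal', 'West') in kb)    # True, after two iterations
```

As in the text, the first pass adds Sells, Weapon and Hostile facts, and the second pass fires the Criminal rule.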
EXAMPLE:-
The crime problem can be used to show how FOL-FC-ASK works.
The implication sentences are rules (3), (6), (7) and (8). Two iterations are required:
On the first iteration, rule (3) has unsatisfied premises.
Rule (6) is satisfied with {x/M1}, and Sells(West, M1, Nono) is added.
Rule (7) is satisfied with {x/M1}, and Weapon(M1) is added.
Rule (8) is satisfied with {x/Nono}, and Hostile(Nono) is added.
On the second iteration, rule (3) is satisfied with {x/West, y/M1, z/Nono}, and
Criminal(West) is added.
Figure: proof tree generated by forward chaining
FIXED POINT:-
Notice that no new inferences are possible at this point because every sentence that
could be concluded by forward chaining is already contained explicitly in the KB. Such a
knowledge base is called a fixed point of the inference process.
6 BACKWARD CHAINING:-
These algorithms work backward from the goal, chaining through rules to find
known facts that support the proof.
BACKWARD CHAINING ALGORITHM:-
function FOL-BC-ASK(KB, goals, θ) returns a set of substitutions
  inputs: KB, a knowledge base
          goals, a list of conjuncts forming a query (θ already applied)
          θ, the current substitution, initially the empty substitution {}
  local variables: answers, a set of substitutions, initially empty
  if goals is empty then return {θ}
  q′ ← SUBST(θ, FIRST(goals))
  for each sentence r in KB where STANDARDIZE-APART(r) = (p1 ∧ … ∧ pn ⇒ q)
      and θ′ ← UNIFY(q, q′) succeeds
    new_goals ← [p1, …, pn | REST(goals)]
    answers ← FOL-BC-ASK(KB, new_goals, COMPOSE(θ′, θ)) ∪ answers
  return answers
WORKING:-
FOL-BC-ASK is called with a list of goals containing a single element,
the original query, and returns the set of all substitutions satisfying the query.
The algorithm takes the first goal in the list and finds every clause in the knowledge
base whose head unifies with that goal.
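The goal-driven, depth-first structure can be sketched in Python for the propositional special case (plain string atoms, so unification and bindings are omitted; the atom names are illustrative):

```python
# A minimal propositional backward chainer illustrating the goal-driven,
# depth-first structure of FOL-BC-ASK. Atoms are plain strings, so this
# is the propositional special case of the first-order algorithm.

def bc_ask(kb, goals, visited=frozenset()):
    """Return True if every goal can be proved from kb.
    kb: list of (premises, conclusion); facts have empty premises."""
    if not goals:
        return True
    first, rest = goals[0], goals[1:]
    if first in visited:                 # guard against repeated states
        return False
    proved = False
    for premises, conclusion in kb:
        if conclusion == first and bc_ask(kb, list(premises),
                                          visited | {first}):
            proved = True                # chained backward through a rule
            break
    return proved and bc_ask(kb, rest, visited)

kb = [
    ([], 'american_west'), ([], 'missile_m1'),
    ([], 'owns_nono_m1'), ([], 'enemy_nono_america'),
    (['missile_m1', 'owns_nono_m1'], 'sells_west_m1_nono'),
    (['missile_m1'], 'weapon_m1'),
    (['enemy_nono_america'], 'hostile_nono'),
    (['american_west', 'weapon_m1', 'sells_west_m1_nono', 'hostile_nono'],
     'criminal_west'),
]
print(bc_ask(kb, ['criminal_west']))     # True
```

The `visited` set guards against one of the weaknesses noted below: looping on repeated states.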
DISADVANTAGES:-
(i) Since it is clearly a depth-first search algorithm, its space requirements are linear
in the size of the proof.
(ii) It suffers from problems with repeated states and incompleteness.
LOGIC PROGRAMMING:-
Logic programming is a technology that comes fairly close to the ideal that
systems should be constructed by expressing knowledge in a formal language.
First, the 'inner loop' of the algorithm involves finding all possible unifiers such that
the premise of a rule unifies with a suitable set of facts in the knowledge base. This is often
called pattern matching and can be very expensive.
Second, the algorithm rechecks every rule on every iteration to see whether its
premises are satisfied, even if very few additions are made to the knowledge base on each
iteration. Finally, the algorithm might generate many facts that are irrelevant to the goal.
QUESTION BANK
PART -A (2 Marks)
PLANNING
1. Define partial order planner.
Basic Idea
– Search in plan space and use least commitment, when possible
• Plan Space Search
– Search space is set of partial plans
– Plan is tuple <A, O, B>
• A: Set of actions, of the form (ai : Opj)
• O: Set of orderings, of the form (ai < aj)
• B: Set of bindings, of the form (vi = C), (vi ≠ C), (vi = vj) or (vi ≠ vj)
– Initial plan:
• <{start, finish}, {start < finish}, {}>
• start has no preconditions; Its effects are the initial state
• finish has no effects; Its preconditions are the goals
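The initial plan <{start, finish}, {start < finish}, {}> can be written down directly as a data structure; the Python names below are illustrative only, not part of any standard planner API:

```python
from typing import NamedTuple

# A sketch of the partial plan tuple <A, O, B> described above.

class Plan(NamedTuple):
    actions: set        # A: the plan steps
    orderings: set      # O: pairs (a_i, a_j) meaning a_i before a_j
    bindings: set       # B: variable binding constraints

def initial_plan():
    # <{start, finish}, {start < finish}, {}>
    return Plan(actions={'start', 'finish'},
                orderings={('start', 'finish')},
                bindings=set())

p = initial_plan()
print(sorted(p.actions))        # ['finish', 'start']
```

A planner refines this structure by adding actions to A, ordering constraints to O and bindings to B, committing to as little as possible at each step.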
2. What are the differences and similarities between problem solving and
planning?
We put these two ideas together to build planning agents. At the most abstract level, the task of planning is the
same as problem solving. Planning can be viewed as a type of problem solving in which the agent uses beliefs
about actions and their consequences to search for a solution over the more abstract space of plans, rather than
over the space of situations.
3. Define state-space search.
The most straightforward approach is to use state-space search. Because the
descriptions of actions in a planning problem specify both preconditions and effects, it is
possible to search in either direction: either forward from the initial state or backward from the
goal
4. What are the types of state-space search?
The types of state-space search are,
Forward state space search;
Backward state space search.
5.What is Partial-Order Planning?
A set of actions that make up the steps of the plan. These are taken from the set of
actions in the planning problem. The “empty” plan contains just the Start and Finish actions.
Start has no preconditions and has as its effect all the literals in the initial state of the planning
problem. Finish has no effects and has as its preconditions the goal literals of the planning
problem.
6. What are the advantages and disadvantages of Partial-Order Planning?
Advantage: Partial-order planning has a clear advantage in being
able to decompose problems into sub problems.
Disadvantage: Disadvantage is that it does not represent states
directly, so it is harder to estimate how far a partial-order plan is
from achieving a goal.
7. What is a Planning graph?
A Planning graph consists of a sequence of levels that correspond to time steps in
the plan where level 0 is the initial state. Each level contains a set of literals and a set of actions.
8. What is Conditional planning?
Conditional planning, also known as contingency planning, deals with incomplete
information by constructing a conditional plan that accounts for each possible situation or
contingency that could arise.
9. What is action monitoring?
The process of checking the preconditions of each action as it is executed, rather
than checking the preconditions of the entire remaining plan, is called action monitoring.
10. Define planning.
Planning can be viewed as a type of problem solving in which the agent uses
beliefs about actions and their consequences to search for a solution.
11. List the features of an ideal planner?
The features of an ideal planner are,
The planner should be able to represent the states, goals and
actions;
The planner should be able to add new actions at any time;
The planner should be able to use Divide and Conquer method for
solving very big problems.
12. What are the components that are needed for representing an action?
The components that are needed for representing an action are,
Action description;
Precondition;
Effect.
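The three components can be sketched as a small data structure; the STRIPS-style names and the add-effect-only simplification are assumptions of this sketch:

```python
from dataclasses import dataclass

# A sketch of the three components of an action representation:
# description (name), precondition, and effect.

@dataclass(frozen=True)
class Action:
    name: str            # action description
    precond: frozenset   # literals that must hold before execution
    effect: frozenset    # literals made true afterwards (add-effects only)

def applicable(action, state):
    return action.precond <= state

def apply_action(action, state):
    # Simplification: effects only add literals; delete-effects omitted.
    return state | action.effect

# Illustrative action from the coffee example earlier in the notes:
make_coffee = Action('MakeCoffee',
                     precond=frozenset({'robot_in_lounge'}),
                     effect=frozenset({'coffee_made'}))
state = frozenset({'robot_in_lounge'})
print(sorted(apply_action(make_coffee, state)))
# ['coffee_made', 'robot_in_lounge']
```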
13. What are the components that are needed for representing a plan?
The components that are needed for representing a plan are,
A set of plan steps;
A set of ordering constraints;
A set of variable binding constraints;
PART - B
PROBLEM SOLVING TO PLANNING
Forward search
Backward search
Heuristic search
Solutions
Why Planning ?
Intelligent agents must operate in the world. They are not simply passive reasoners (knowledge
representation, reasoning under uncertainty) or problem solvers (search); they must also act on
the world.
We want intelligent agents to act in “intelligent ways”. Taking purposeful actions, predicting the
expected effect of such actions, composing actions together to achieve complex goals.
E.g. if we have a robot we want robot to decide what to do; how to act to achieve our goals
Planning Problem
How to change the world to suit our needs
Critical issue: we need to reason about what the world will be like after doing a few actions, not
just what it is like now
GOAL: Craig has coffee
CURRENTLY: robot in mailroom, has no coffee, coffee not made, Craig in office, etc.
TO DO: go to lounge, make coffee
PARTIAL ORDER PLANNING
Partial-Order Planning Algorithms
Partially Ordered Plan
• Plan
• Steps
• Ordering constraints
• Variable binding constraints
• Causal links
• POP Algorithm
• Make initial plan
• Loop until the plan is complete
– Select a subgoal
– Choose an operator
– Resolve threats
Choose Operator
• Choose operator(c, Sneeds)
• Another strategy is to ignore such threats until the very end, hoping that the variables will
become bound and make them easier to deal with.
2. Discuss about planning graphs in detail.
Planning graphs for heuristic estimation;
The GRAPHPLAN algorithm;
Termination of GRAPHPLAN.
∀x,y  x = y ⇒ (P2(x) ⇔ P2(y))
∀w,x,y,z  w = y ∧ x = z ⇒ (F1(w,x) = F1(y,z))
∀w,x,y,z  w = y ∧ x = z ⇒ (F2(w,x) = F2(y,z))
Demodulation:-
For any terms x, y and z, where UNIFY(x, z) = θ and mn[z] is a literal containing z:

    x = y,    m1 ∨ … ∨ mn[z]
    ------------------------------
    m1 ∨ … ∨ mn[SUBST(θ, y)]
Paramodulation:-
For any terms x, y and z, where UNIFY(x, z) = θ:

    l1 ∨ … ∨ lk ∨ x = y,    m1 ∨ … ∨ mn[z]
    ----------------------------------------------
    SUBST(θ, l1 ∨ … ∨ lk ∨ m1 ∨ … ∨ mn[y])
Equational unification:-
Equational unification of this kind can be done with efficient algorithms designed for
the particular axioms used.
Resolution strategies:-
Unit preference
Set of support
Input resolution
Subsumption
THEOREM PROVERS:-
We describe the theorem prover OTTER (Organized Techniques for Theorem-proving
and Effective Research). To use it, we must divide the knowledge into four parts:
A set of clauses known as the set of support
A set of usable axioms
A set of equations known as rewrites or demodulators
A set of parameters and clauses that define the control strategy
SKETCH OF THE OTTER THEOREM PROVER:-
procedure OTTER(sos, usable)
  inputs: sos, a set of support clauses defining the problem
          usable, background knowledge potentially relevant to the problem
  repeat
    clause ← the lightest member of sos
    move clause from sos to usable
    PROCESS(INFER(clause, usable), sos)
  until sos = [ ] or a refutation has been found

function INFER(clause, usable) returns clauses
  resolve clause with each member of usable
  return the resulting clauses after applying FILTER
Socratic reasoning:
In backward chaining, we start from a conclusion, which is the hypothesis we wish to prove, and we
aim to show how that conclusion can be reached from the rules and facts in the database.
The conclusion we are aiming to prove is called a goal, and reasoning in this way is known as
goal-driven reasoning.
Fig : Proof tree constructed by backward chaining to prove that West is criminal.
Note:
(a) To prove Criminal(West), we have to prove the four conjuncts below it.
(b) Some of these are in the knowledge base, and others require further backward chaining.
(2) Explain conjunctive normal form for first-order logic with an example.
Every sentence of first-order logic can be converted into an inferentially equivalent CNF
sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence
is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences.
Here we have to eliminate existential quantifiers. We will illustrate the procedure by translating the
sentence "Everyone who loves all animals is loved by someone."
Ontology refers to organizing everything in the world into a hierarchy of categories.
Although interaction with the world takes place at the level of individual objects, much
reasoning takes place at the level of categories.
What is taxonomy?
Subclass relations organize categories into a taxonomy, or taxonomic hierarchy. Taxonomies
have been used explicitly for centuries in technical fields. For example, systematic
biology aims to provide a taxonomy of all living and extinct species; library science has
developed a taxonomy of all fields of knowledge, encoded as the Dewey Decimal system;
and
tax authorities and other government departments have developed extensive taxonomies of
occupations and commercial products. Taxonomies are also an important aspect of general
commonsense knowledge.
First-order logic makes it easy to state facts about categories, either by relating objects
to categories or by quantifying over their members:
For example, ¬Holding(G1, S0) says that the agent is not holding the gold G1 in the initial
situation S0. Age(Wumpus, S0) refers to the wumpus's age in S0.
Atemporal or eternal predicates and functions are also allowed. Examples include the
predicate Gold(G1) and the function LeftLegOf(Wumpus).
QUESTION BANK
PART -A (2 Marks)
UNCERTAINTY
3. Define Uncertainty.
Uncertainty means that many of the simplifications that are possible with deductive
inference are no longer valid.
4. State the reasons why first-order logic fails to cope with a domain like medical
diagnosis.
Three reasons:
Laziness: It is too hard to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule.
Theoretical Ignorance: Medical science has no complete theory for the
domain.
Practical ignorance: Even if we know all the rules, we may be uncertain
about a particular item needed.
6. What is the need for utility theory in uncertainty?
Utility theory says that every state has a degree of usefulness, or utility, to an agent,
and that the agent will prefer states with higher utility. We use utility theory to represent
and reason with preferences.
7. What is called as Decision Theory?
Preferences, as expressed by utilities, are combined with probabilities in the
general theory of rational decisions called decision theory.
Decision theory = probability theory + utility theory.
8. Define conditional probability?
Once the agent has obtained some evidence concerning the previously unknown
propositions making up the domain, conditional or posterior probabilities, written with the
notation P(A|B), are used. It is important that P(A|B) can only be used when all that is known is B.
15. List the three basic classes of algorithms for evaluating multiply connected graphs.
The three basic classes of algorithms for evaluating multiply connected graphs are,
Clustering methods;
Conditioning methods;
Stochastic simulation methods.
16. What is called as principle of Maximum Expected Utility (MEU)?
The basic idea is that an agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the possible outcomes of the action.
This is known as MEU.
PART - B
1. Explain about Probabilistic Reasoning.
At the end of this lesson the student should be able to do the following:
• Represent a problem in terms of probabilistic statements
• Apply Bayes rule and product rule for inferencing
• Represent a problem using Bayes net
• Perform probabilistic inferencing using Bayes net.
Probabilistic Reasoning
Using logic to represent and reason we can represent knowledge about the world with
facts and rules, like the following ones:
bird(tweety).
fly(X) :- bird(X).
We can also use a theorem-prover to reason about the world and deduct new facts about
the world, for e.g.,
?- fly(tweety).
Yes
However, this often does not work outside of toy domains: rules that are certain and
non-tautologous are hard to find.
Unfortunately we cannot really adapt logical inference to probabilistic inference, since the
latter is not context-free.
Replace
smoking -> lung cancer
or
lotsofconditions, smoking -> lung cancer
with
P(lung cancer | smoking) = 0.6
A probabilistic model describes the world in terms of a set S of possible states - the
sample space. We don’t know the true state of the world, so we (somehow) come up with
a probability distribution over S which gives the probability of any state being the true
one. The world is usually described by a set of variables or attributes.
Consider the probabilistic model of a fictitious medical expert system. The ‘world’ is
described by 8 binary valued variables:
Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X
The primitives in probabilistic reasoning are random variables, just like the primitives in
propositional logic are propositions. A random variable is not in fact a variable, but a
function from a sample space S to another space, often the real numbers.
For example, let the random variable Sum (representing outcome of two die throws) be
defined thus:
Sum(die1, die2) = die1 +die2
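Treating Sum as a function on the sample space, its distribution can be tabulated directly (fair dice are assumed here):

```python
from fractions import Fraction
from collections import Counter

# The random variable Sum maps each sample point (die1, die2) to a number.
sample_space = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]

def Sum(die1, die2):
    return die1 + die2

# Its distribution, assuming fair dice (each point has probability 1/36):
counts = Counter(Sum(*point) for point in sample_space)
p = {s: Fraction(n, 36) for s, n in counts.items()}
print(p[7])    # 1/6: six of the 36 sample points map to a sum of 7
```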
Consider the probabilistic model of the fictitious medical expert system mentioned
before. The sample space is described by 8 binary valued variables.
Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X
There are 2^8 = 256 events in the sample space. Each event is determined by a joint
instantiation of all of the variables.
Each of the random variables {A,T,E,L,S,B,D,X} has its own distribution, determined by
the underlying joint distribution. This is known as the marginal distribution. For example,
the distribution for L is denoted P(L), and this distribution is defined by the two
probabilities P(L = f) and P(L = t). For example,
P(L = f) = P(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = f)
         + P(A = f, T = f, E = f, L = f, S = f, B = f, D = f, X = t)
         + P(A = f, T = f, E = f, L = f, S = f, B = f, D = t, X = f)
         + ...
We get the marginal distribution over B by simply adding up the different possible values
of A for any value of B (and put the result in the “margin”).
In general, given a joint distribution over a set of variables, we can get the marginal
distribution over a subset by simply summing out those variables not in the subset.
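Summing out can be written generically; the three-variable joint below is a made-up, normalised example used only to illustrate the operation:

```python
from itertools import product

# Summing out variables from a joint distribution, as described above.
varnames = ['A', 'B', 'C']
joint = {}
for a, b, c in product([False, True], repeat=3):
    # An arbitrary (normalised) joint, for illustration only.
    joint[(a, b, c)] = (0.9 if a == b else 0.1) * (0.7 if c else 0.3) / 2.0

def marginal(joint, varnames, keep):
    """Marginal over the variables in `keep`, summing out the rest."""
    idx = [varnames.index(v) for v in keep]
    out = {}
    for assignment, prob in joint.items():
        key = tuple(assignment[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out

pA = marginal(joint, varnames, ['A'])
print(round(pA[(True,)], 3))   # 0.5 for this particular joint
```

Marginalising over {A, D} in the 8-variable medical example works the same way, just with 2^6 = 64 summands per entry.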
In the medical expert system case, we can get the marginal distribution over, say, A,D by
simply summing out the other variables:
This has 64 summands, each of whose values needs to be estimated from empirical data.
For the estimates to be of good quality, each of the instances that appear in the summands
should appear a sufficiently large number of times in the empirical data. Often such a large
amount of data is not available.
However, computation can be simplified for certain special but common conditions. This
is the condition of independence of variables. Variables A and B are independent iff
P(A,B) = P(A)P(B)
This is quite a strong statement: it means that for any value x of A and any value y of B,
P(A = x, B = y) = P(A = x)P(B = y).
Note that the independence of two random variables is a property of the underlying
joint distribution.
Two rules in probability theory are important for inferencing, namely, the product rule
and the Bayes' rule.
Suppose you have been tested positive for a disease; what is the probability that you
actually have the disease?
It depends on the accuracy and sensitivity of the test, and on the background (prior)
probability of the disease.
Let P(Test=+ve | Disease=true) = 0.95 and P(Test=+ve | Disease=false) = 0.05, so the false positive rate is also 5%.
Suppose the disease is rare: P(Disease=true) = 0.01 (1%).
Then,
P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve|D=true) * P(D=true)+ P(T=+ve|D=false) * P(D=false)
0.95 * 0.01
= -------------------------------- = 0.161
0.95*0.01 + 0.05*0.99
So the probability of having the disease given that you tested positive is just 16%. This
seems too low, but here is an intuitive argument to support it. Of 100 people, we expect
only 1 to have the disease, but we expect about 5% of those (5 people) to test positive. So
of the 6 people who test positive, we only expect 1 of them to actually have the disease;
and indeed 1/6 is approximately 0.16.
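The calculation above can be checked numerically with Bayes' rule:

```python
# The disease-test calculation above, checked with Bayes' rule.
p_disease = 0.01                   # prior P(D = true)
p_pos_given_d = 0.95               # sensitivity, P(T = +ve | D = true)
p_pos_given_not_d = 0.05           # false positive rate

num = p_pos_given_d * p_disease
den = num + p_pos_given_not_d * (1 - p_disease)
posterior = num / den
print(round(posterior, 3))         # 0.161
```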
In other words, the reason the number is so small is that you believed that this is a rare
disease; the test has made it 16 times more likely you have the disease, but it is still
unlikely in absolute terms. If you want to be "objective", you can set the prior to uniform
(i.e. effectively ignore the prior), and then get
P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve)
This, of course, is just the true positive rate of the test. However, this conclusion relies on
your belief that, if you did not conduct the test, half the people in the world have the
disease, which does not seem reasonable.
A better approach is to use a plausible prior (eg P(D=true)=0.01), but then conduct
multiple independent tests; if they all show up positive, then the posterior will increase.
For example, if we conduct two (conditionally independent) tests T1, T2 with the same
reliability, and they are both positive, we get
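With the same numbers as before, the two-test posterior can be computed directly; because the tests are conditionally independent given the disease state, their likelihoods multiply:

```python
# Posterior after two conditionally independent positive tests with the
# same reliability as before; the likelihood terms simply multiply.
prior = 0.01
sens, fpr = 0.95, 0.05             # sensitivity and false positive rate

num = sens * sens * prior
den = num + fpr * fpr * (1 - prior)
posterior2 = num / den
print(round(posterior2, 3))        # 0.785, up from 0.161 after one test
```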
In full: X and Y are conditionally independent given Z iff for any instantiation x, y, z of
X, Y, Z we have P(x | y, z) = P(x | z).
Given
1. A Bayesian network BN
2. Observed values for some evidence variables
3. A query variable Q
We will demonstrate below the inferencing procedure for BNs. As an example consider
the following linear BN without any apriori evidence.
Consider computing all the marginals (with no evidence). P(A) is given, and
Now,
P(B) (the marginal distribution over B) was not given originally, but we just computed
it in the last step, so we're OK (assuming we remembered to store P(B) somewhere).
If C were not independent of A given B, we would have a CPT for P(C|A,B), not
P(C|B). Note that we had to wait for P(B) before P(C) was calculable.
If each node has k values, and the chain has n nodes, this algorithm has complexity
O(nk^2). Summing over the joint has complexity O(k^n).
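The forward pass just described can be sketched for a tiny chain A → B → C; the CPT numbers below are made up purely for illustration:

```python
# Computing the marginals of a chain A -> B -> C by passing results
# forward, as described above: O(n k^2) work instead of O(k^n).
pA = {True: 0.3, False: 0.7}
pB_given_A = {True:  {True: 0.8, False: 0.2},    # P(B | A)
              False: {True: 0.1, False: 0.9}}
pC_given_B = {True:  {True: 0.5, False: 0.5},    # P(C | B)
              False: {True: 0.4, False: 0.6}}

def next_marginal(p_parent, cpt):
    """P(child) = sum over parent of P(child | parent) P(parent)."""
    return {v: sum(cpt[u][v] * p_parent[u] for u in p_parent)
            for v in (True, False)}

pB = next_marginal(pA, pB_given_A)      # needs only P(A)
pC = next_marginal(pB, pC_given_B)      # reuses the stored P(B)
print(round(pB[True], 3), round(pC[True], 3))   # 0.31 0.431
```

Each step is a k x k matrix-vector product, which is where the O(nk^2) bound comes from.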
Dynamic programming may also be used for the problem of exact inferencing in the
above Bayes Net. The steps are as follows:
1. We first compute
2. f1(B) is a function representable by a table of numbers, one for each possible value of
B.
3. Here,
This method of solving a problem (ie finding P(D)) by solving subproblems and storing
the results is characteristic of dynamic programming.
The above methodology may be generalized. We eliminated variables starting from the
root, but we don't have to. We might also have done the following computation.
The following points are to be noted about the above algorithm. The algorithm computes
intermediate results which are not individual probabilities, but entire tables such as
f1(C,E). It so happens that f1(C,E) = P(E|C) but we will see examples where the
intermediate tables do not represent probability distributions.
It was noticed from the above computation that conditional distributions are basically just
normalised marginal distributions. Hence, the algorithms we study are only concerned
with computing marginals. Getting the actual conditional probability values is a trivial
“tidying-up” last step.
It can be done by plugging in the observed values for A and E and summing out B and D.
We don’t really care about P(A = t), since it will cancel out.
Now let us see how evidence-induce independence can be exploited. Consider the
following computation.
Since,
Clever variable elimination would jump straight to (5). Choosing an optimal order of
variable elimination leads to a large amount of computational saving. However, finding
the optimal order is a hard problem.
Notice that, as we perform the innermost sums, we create new terms, which need to be
summed over in turn e.g.,
where,
1. Pick a variable Xi
For the multiplication, we must compute a number for each joint instantiation of all
variables in f, so complexity is exponential in the largest number of variables
participating in one of these multiplicative subexpressions.
If we wish to compute several marginals at the same time, we can use Dynamic
Programming to avoid the redundant computation that would be involved if we used
variable elimination repeatedly.
Exact inferencing in a general Bayes net is a hard problem. However, for networks with
certain special topologies, efficient inferencing techniques exist. We discuss one such
technique for a class of networks called poly-trees.
Inferencing in Poly-Trees
A poly-tree is a graph in which there is at most one undirected path between any pair of
nodes. The inferencing problem in poly-trees may be stated as follows.
X: the query variable
E_X^+: the causal support for X, comprising the variables "above" X connected
through its parents, which are known.
E_X^-: the evidential support for X, comprising the variables "below" X connected
through its children.

P(X|E) = P(X | E_X^+, E_X^-)
       = P(E_X^- | X, E_X^+) P(X | E_X^+) / P(E_X^- | E_X^+)
       = α P(E_X^- | X) P(X | E_X^+)
These stochastic sampling methods are collectively known as Markov chain Monte Carlo
(MCMC), and include as special cases Gibbs sampling and the Metropolis-Hastings
algorithm.
Bounded cutset conditioning. By instantiating subsets of the variables, we can break loops
in the graph. Unfortunately, when the cutset is large, this is very slow. By instantiating only a
subset of values of the cutset, we can compute lower bounds on the probabilities of interest.
Alternatively, we can sample the cutsets jointly, a technique known as block Gibbs sampling.
QUESTION BANK
PART -A (2 Marks)
LEARNING
1. Explain the concept of learning from example.
Each person will interpret a piece of information according to their level of understanding and their
own way of interpreting things. Explanation-based learning extracts
general rules from single examples by explaining the examples and generalizing the explanation.
9. Define Inductive learning. How the performance of inductive learning algorithms can be
measured?
Learning a function from examples of its inputs and outputs is called inductive
learning.
It is measured by their learning curve, which shows the prediction accuracy as a
function of the number of observed examples.
15. State the factors that play a role in the design of a learning system.
The factors that play a role in the design of a learning system are,
Learning element
Performance element
Critic
Problem generator
16. What is memoization?
Memoization is used to speed up programs by saving the results of computation.
The basic idea is to accumulate a database of input/output pairs; when the function is called, it
first checks the database to see if it can avoid solving the problem from scratch.
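The idea can be sketched as a Python decorator (in practice, `functools.lru_cache` in the standard library provides the same behaviour):

```python
# Memoization as described: cache input/output pairs and check the
# cache before recomputing.

def memoize(f):
    cache = {}
    def wrapper(*args):
        if args not in cache:          # solve from scratch only once
            cache[args] = f(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(30))    # 832040, in linear rather than exponential time
```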
PART - B
Each of the components can be learned from appropriate feedback. Consider, for
example, an agent training to become a taxi driver. Every time the instructor shouts "Brake!",
the agent can learn a condition-action rule for when to brake (component 1).
By seeing many camera images that it is told contain humans, it can learn to recognize
them.
By trying actions and observing the results (for example, braking hard on a wet road),
it can learn the effects of its actions.
Then, when it receives no tip from passengers who have been thoroughly shaken up
during the trip, it can learn a useful component of its overall utility function.
The type of feedback available for learning is usually the most important factor in
determining the nature of the learning problem that the agent faces.
Inductive learning:
An algorithm for deterministic supervised learning is given as input the correct value of
the unknown function for particular inputs, and must try to recover the unknown function
or something close to it.
We say that an example is a pair (x, f(x)), where x is the input and f(x) is the output of the
function applied to x.
The task of pure inductive inference (or induction) is this,
Given a collection of examples of ‘f’, return a function ‘h’ that approximates ‘f’.
The function h is called a hypothesis. The reason that learning is difficult, from a
conceptual point of view, is that it is not easy to tell whether any particular h is a good
approximation of f. A good hypothesis will generalize well, that is, it will predict unseen
examples correctly. This is the fundamental problem of induction.
Example:-
The problem is whether to wait for a table at a restaurant. The aim here is to learn a
definition for the goal predicate WillWait.
Figure: a decision tree for the WillWait problem, testing Patrons at the root, with
Bar? and Raining? tested on the branches and Yes/No classifications at the leaves.
We will see how to automate this task; for now, let us suppose we decide on the following list of
attributes:
1. Alternate: Whether there is a suitable alternative restaurant nearby.
2. Bar: Whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: True on Fridays and Saturdays.
If the original hypothesis space allows for a simple and efficient learning algorithm, then
the ensemble method provides a way to learn a much more expressive class of hypotheses
without much additional computational or algorithmic complexity.
Boosting:-
Boosting is the most widely used ensemble method. To understand how it works, it is
necessary to understand the concept of a weighted training set.
Weighted training set:-
In a weighted training set, each example has an associated weight wj≥0. The higher the
weight of an example, the higher is the importance attached to it during the learning of a
hypothesis.
Working:-
Boosting starts with wj = 1 for all the examples.
From this set, it generates the first hypothesis h1. This hypothesis will classify some of
the training examples correctly and some incorrectly.
The next hypothesis can be made to do better on the misclassified examples by increasing
their weights while decreasing the weights of the correctly classified examples.
From this new weighted training set, hypothesis h2 is generated.
The process continues until M hypotheses have been generated, where M is an input to
the boosting algorithm.
The final ensemble hypothesis is a weighted-majority combination of all the M
hypotheses, each weighted according to how well it performed on the training set.
ADABOOST algorithm:-
ADABOOST is one among the variants of boosting and is one of the most commonly used
boosting algorithms.
It has an important property: if the input learning algorithm ‘L’ is a weak learning
algorithm, which means that L always returns a hypothesis whose weighted error on the
training set is slightly better than random guessing, then ADABOOST will return a
hypothesis that classifies the training data perfectly for large enough M.
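The boosting loop described above can be sketched as follows. This is a minimal illustration, not the textbook's own code: it assumes one-level decision stumps as the weak learner L (an illustrative choice) and examples encoded as (feature-tuple, label) pairs with labels in {+1, -1}.

```python
import math

def stump_learner(examples, weights):
    """Weak learner L: choose the one-level decision stump (feature,
    threshold, sign) with the lowest weighted training error."""
    best, best_err = None, float("inf")
    n_features = len(examples[0][0])
    for f in range(n_features):
        for t in sorted({x[f] for x, _ in examples}):
            for s in (1, -1):
                err = sum(w for (x, y), w in zip(examples, weights)
                          if s * (1 if x[f] >= t else -1) != y)
                if err < best_err:
                    best_err, best = err, (f, t, s)
    f, t, s = best
    return lambda x: s * (1 if x[f] >= t else -1)

def adaboost(examples, M):
    """Return a weighted-majority ensemble of M weak hypotheses."""
    N = len(examples)
    w = [1.0 / N] * N                       # start with equal weights
    hyps, z = [], []
    for _ in range(M):
        h = stump_learner(examples, w)
        error = sum(wj for (x, y), wj in zip(examples, w) if h(x) != y)
        error = min(max(error, 1e-10), 1 - 1e-10)   # avoid division by zero
        # decrease the weight of correctly classified examples; after
        # renormalization this raises the weight of the misclassified ones
        for j, (x, y) in enumerate(examples):
            if h(x) == y:
                w[j] *= error / (1 - error)
        total = sum(w)
        w = [wj / total for wj in w]
        hyps.append(h)
        z.append(math.log((1 - error) / error))     # hypothesis weight
    def ensemble(x):
        vote = sum(zi * hi(x) for zi, hi in zip(z, hyps))
        return 1 if vote >= 0 else -1
    return ensemble
```

Each weak hypothesis votes with weight log((1 - error)/error), so a hypothesis that did well on its weighted training set counts for more in the final majority.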
Example  Alt  Bar  Fri  Hun  Pat    Price  Rain  Res  Type     Est    WillWait
X1       Y    N    N    Y    Some   $$$    N     Y    French   0-10   Y
X2       Y    N    N    Y    Full   $      N     N    Thai     30-60  N
X3       N    Y    N    N    Some   $      N     N    Burger   0-10   Y
X4       Y    N    Y    Y    Full   $      Y     N    Thai     10-30  Y
X5       Y    N    Y    N    Full   $$$    N     Y    French   >60    N
X6       N    Y    N    Y    Some   $$     Y     Y    Italian  0-10   Y
X7       N    Y    N    N    None   $      Y     N    Burger   0-10   N
X8       N    N    N    Y    Some   $$     Y     Y    Thai     0-10   Y
X9       N    Y    Y    N    Full   $      Y     N    Burger   >60    N
X10      Y    Y    Y    Y    Full   $$$    N     Y    Italian  10-30  N
X11      N    N    N    N    None   $      N     N    Thai     0-10   N
X12      Y    Y    Y    Y    Full   $      N     N    Burger   30-60  Y
Figure 4.2 Inducing a decision tree from examples
The positive examples are the ones in which the goal ‘WillWait’ is true (X1, X3, …); the
negative examples are the ones in which it is false (X2, X5, …). The complete set of examples is
called the training set.
Trivial tree:-
The problem of finding a decision tree that agrees with the training set might seem difficult,
but in fact there is a trivial solution: constructing a trivial tree.
Construct a decision tree that has one path to a leaf for each example, where the path tests
each attribute in turn and follows the classification of the example.
Problem with trivial tree:-
It just memorizes the observations.
It does not extract any pattern from the examples, so it cannot be expected to
extrapolate to examples it has not seen.
The above Figure 4.2 shows how the algorithm gets started: 12 training examples are
given, which are classified into positive and negative sets. Then it must be decided which
attribute to use as the first test in the tree. Type is a poor attribute, because it leaves us with
four possible outcomes, each of which has the same number of positive and negative examples.
On the other hand, Patrons is a fairly important attribute, because if its value is None or Some,
we can answer No or Yes immediately.
There are four cases to consider for the recursive problem:
1) If there are some positive and some negative examples, then choose the best attribute to
split them. Figure 4.2(b) shows ‘Hungry’ being used to split the remaining examples.
2) If all the remaining examples are positive (or all negative), then we are done: we can answer
Yes or No. Figure 4.2(a) shows examples of this in the None and Some cases.
3) If there are no examples left, it means that no such example has been observed, and a
default value calculated from the majority classification at the node's parent is returned.
4) If there are no attributes left, but both positive and negative examples remain, then these
examples have exactly the same description but different classifications. This happens when
the data are noisy; in this case, return the majority classification.
The decision-tree-learning algorithm:-
function DECISION-TREE-LEARNING(examples, attribs, default) returns a decision tree
  inputs: examples, attribs, default
  if examples is empty then return default
  else if all examples have the same classification then return the classification
  else if attribs is empty then return MAJORITY-VALUE(examples)
  else
    best ← CHOOSE-ATTRIBUTE(attribs, examples)
    tree ← a new decision tree with root test best
    m ← MAJORITY-VALUE(examples)
    for each value vi of best do
      examplesi ← {elements of examples with best = vi}
      subtree ← DECISION-TREE-LEARNING(examplesi, attribs − best, m)
      add a branch to tree with label vi and subtree subtree
    return tree
The final tree produced by the algorithm applied to the 12-example data set is shown in Figure
4.2.
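The pseudocode above can be rendered as a runnable sketch. This is an illustrative implementation rather than the textbook's code: examples are assumed to be (attribute-dictionary, boolean classification) pairs, trees are (attribute, branches) tuples, and CHOOSE-ATTRIBUTE is filled in with the information-content measure that the notes discuss next.

```python
import math
from collections import Counter

def majority_value(examples):
    """MAJORITY-VALUE: the most common classification among the examples."""
    return Counter(c for _, c in examples).most_common(1)[0][0]

def information(probs):
    """I(p1, ..., pn) = sum of -pi log2 pi, with 0 log 0 taken as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def choose_attribute(attribs, examples):
    """CHOOSE-ATTRIBUTE: the attribute with the smallest remainder,
    i.e. the largest information gain (boolean classifications assumed)."""
    def remainder(a):
        rem = 0.0
        for v in {x[a] for x, _ in examples}:
            subset = [(x, c) for x, c in examples if x[a] == v]
            p = sum(1 for _, c in subset if c) / len(subset)
            rem += len(subset) / len(examples) * information([p, 1 - p])
        return rem
    return min(attribs, key=remainder)

def decision_tree_learning(examples, attribs, default):
    """Trees are (attribute, {value: subtree}) pairs; leaves are classes."""
    if not examples:
        return default
    classes = {c for _, c in examples}
    if len(classes) == 1:
        return classes.pop()                 # all examples agree
    if not attribs:
        return majority_value(examples)
    best = choose_attribute(attribs, examples)
    m = majority_value(examples)
    tree = (best, {})
    for v in {x[best] for x, _ in examples}:
        exs = [(x, c) for x, c in examples if x[best] == v]
        tree[1][v] = decision_tree_learning(
            exs, [a for a in attribs if a != best], m)
    return tree
```

On a toy subset of the restaurant data, the learner immediately picks Patrons as the root, since it classifies those examples perfectly.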
Choosing attribute tests:-
To choose an attribute test, the idea is to pick the attribute that goes as far as possible toward
providing an exact classification of the examples. A perfect attribute divides the examples into
sets that are all positive or all negative.
Measure to find ‘fairly good’ and ‘really useless’ attributes:-
In general, if the possible answers vi have probabilities P(vi), then the information content I of
the actual answer is given by
I(P(v1), …, P(vn)) = Σi −P(vi) log2 P(vi)
To check this equation, for the tossing of a fair coin,
I(1/2, 1/2) = −(1/2) log2 (1/2) − (1/2) log2 (1/2) = 1 bit
After a test on attribute A splits the training set (p positive, n negative) into subsets with pi
positive and ni negative examples each, the expected remaining information is
Remainder(A) = Σi (pi + ni)/(p + n) · I(pi/(pi + ni), ni/(pi + ni))
and the information gain from the attribute test is
Gain(A) = I(p/(p + n), n/(p + n)) − Remainder(A)
Gain(Patrons) = 1 − [(2/12)I(0, 1) + (4/12)I(1, 0) + (6/12)I(2/6, 4/6)] ≈ 0.541 bits
Gain(Type) = 1 − [(2/12)I(1/2, 1/2) + (2/12)I(1/2, 1/2) + (4/12)I(2/4, 2/4) + (4/12)I(2/4, 2/4)] = 0
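The two gain values above can be checked numerically. The sketch below encodes each attribute's split of the 12 examples as (positive, negative) counts taken from Figure 4.2:

```python
import math

def I(*probs):
    """Information content I(p1,...,pn) = sum of -pi log2 pi, with 0 log 0 = 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def remainder(splits, p, n):
    """Expected information after the test; splits lists (pi, ni) per value."""
    return sum((pi + ni) / (p + n) * I(pi / (pi + ni), ni / (pi + ni))
               for pi, ni in splits)

def gain(splits, p, n):
    return I(p / (p + n), n / (p + n)) - remainder(splits, p, n)

# Patrons splits the 12 examples into None(0+,2-), Some(4+,0-), Full(2+,4-)
gain_patrons = gain([(0, 2), (4, 0), (2, 4)], 6, 6)
# Type splits into French(1+,1-), Italian(1+,1-), Thai(2+,2-), Burger(2+,2-)
gain_type = gain([(1, 1), (1, 1), (2, 2), (2, 2)], 6, 6)
```

Running this gives gain_patrons ≈ 0.541 bits and gain_type = 0, matching the hand calculation: Patrons is a strong first test, Type is useless.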
Assessing the performance of the learning algorithm:-
A learning algorithm is good if it produces hypotheses that do a good job of predicting the
classifications of unseen examples. Prediction quality can be estimated in advance, or it can be
estimated after the fact. The usual methodology is as follows:
1) Collect a large set of examples.
2) Divide it into two disjoint sets: the training set and the test set.
3) Apply the learning algorithm to the training set, generating a hypothesis ‘h’.
4) Measure the percentage of examples in the test set that are correctly classified by ‘h’.
5) Repeat steps 1 to 4 for different sizes of training sets and different randomly selected
training sets of each size.
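Steps 2 to 4 of this methodology can be sketched as a small helper. The `learn` and `predict` callables are placeholders for any learning algorithm; their signatures here are an assumption of this sketch, not part of the notes.

```python
import random

def holdout_accuracy(examples, learn, predict, train_frac=0.7, seed=0):
    """Split the examples into disjoint training and test sets, learn a
    hypothesis h on the training set, and return the fraction of test
    examples that h classifies correctly."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    cut = int(train_frac * len(shuffled))
    train, test = shuffled[:cut], shuffled[cut:]
    h = learn(train)                        # step 3: generate hypothesis h
    correct = sum(1 for x, y in test if predict(h, x) == y)
    return correct / len(test)              # step 4: fraction correct
```

Step 5 corresponds to calling this repeatedly with different seeds and values of `train_frac` and plotting the resulting accuracies as a learning curve.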
Noise and overfitting:-
The problem of finding meaningless "regularity" in the data whenever there is a large set
of possible hypotheses is called overfitting.
Solutions:-
Decision tree pruning
Cross-validation
Consider again the restaurant learning problem, i.e., learning a rule for deciding whether to wait
for a table. In the logical formulation, each example object is described by logical sentences, and
each attribute becomes a unary predicate.
Example  Alternate  Bar  Fri  Hungry  Patrons  Price  Rain  Reservation  Est   Goal WillWait
X1       Y          N    N    Y       Some     $$$    No    Yes          0-10  Yes
Generalization:-
The hypothesis says the new example should be negative, but it is actually positive (a false
negative). The hypothesis must be extended to include this example. This is called
generalization.
Specialization:-
The hypothesis says that the new example is positive, but it is actually negative (a false
positive). The hypothesis must be restricted to exclude this example. This is called
specialization. The current-best-hypothesis learning approach is used in many machine-learning
algorithms.
Entailment constraint: Background ⊨ Hypothesis, i.e., the background knowledge by itself
entails the extracted rule.
Extracting rules from examples:-
It is a method for extracting general rules from individual observations.
The idea is to construct an explanation of the observation using prior knowledge.
Consider the observation Derivative(X², X) = 2X.
Suppose we wish to simplify 1 × (0 + X). The knowledge here includes the following rules:
Rewrite(u, v) ∧ Simplify(v, w) ⇒ Simplify(u, w)
Primitive(u) ⇒ Simplify(u, u)
ArithmeticUnknown(u) ⇒ Primitive(u)
Number(u) ⇒ Primitive(u)
Rewrite(1 × u, u)
Rewrite(0 + u, u)
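A tiny sketch can show these rewrite rules in action. The tuple encoding of expressions is an assumption of this sketch; the recursion plays the role of the Simplify chaining rule, and a number or unknown simplifies to itself, as the Primitive rules state.

```python
def simplify(expr):
    """Apply Rewrite(1*u, u) and Rewrite(0+u, u) recursively.
    Expressions are nested tuples like ('*', 1, ('+', 0, 'X'));
    anything that is not a tuple is primitive."""
    if not isinstance(expr, tuple):
        return expr                       # Primitive(u) => Simplify(u, u)
    op, a, b = expr
    a, b = simplify(a), simplify(b)       # simplify subexpressions first
    if op == '*' and a == 1:
        return b                          # Rewrite(1*u, u)
    if op == '+' and a == 0:
        return b                          # Rewrite(0+u, u)
    return (op, a, b)
```

For example, simplify(('*', 1, ('+', 0, 'X'))) reduces 1 × (0 + X) to X, the observation from which EBL would then extract a general rule.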
EBL process working:-
1. Construct a proof that the goal predicate applies to the example, using the available
background knowledge.
2. In parallel, construct a generalized proof tree for the variabilized goal using the same
inference steps as in the original proof.
3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and
whose right-hand side is the variabilized goal.
4. Drop any conditions that are true regardless of the values of the variables in the goal.
Improving efficiency:-
1. Keep down the number of derived rules: adding large numbers of rules slows down the
reasoning process, because it increases the branching factor in the search space.
2. Derived rules must offer a significant increase in speed.
3. Derived rules should be as general as possible, so that they apply to the largest possible
set of cases.
Hypothesis:-
In statistical learning, the hypotheses are probabilistic theories of how the domain works,
including logical theories as a special case.
Example:-
Candy comes in two flavors: cherry and lime.
Bayesian learning:-
It calculates the probability of each hypothesis given the data, and makes predictions
by using all the hypotheses, weighted by their probabilities.
The Bayesian view of learning is extremely powerful, providing a general solution to the
problems of noise, overfitting, and optimal prediction.
Ex:-
h1: 100% cherry
h2: 75% cherry + 25% lime
h3: 50% cherry + 50% lime
h4: 25% cherry + 75% lime
h5: 100% lime
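The Bayesian update over these five hypotheses can be sketched as follows. The prior distribution (0.1, 0.2, 0.4, 0.2, 0.1) is an assumed prior for illustration; the notes do not specify one.

```python
def bayesian_update(priors, likelihoods, observations):
    """Posterior P(hi | d) is proportional to P(d | hi) * P(hi);
    the posterior is renormalized after each observation."""
    post = priors[:]
    for obs in observations:
        post = [p * lik[obs] for p, lik in zip(post, likelihoods)]
        total = sum(post)
        post = [p / total for p in post]
    return post

# h1..h5 assign P(lime) = 0, 0.25, 0.5, 0.75, 1 respectively
priors = [0.1, 0.2, 0.4, 0.2, 0.1]          # assumed prior over h1..h5
likelihoods = [{'cherry': 1 - q, 'lime': q}
               for q in (0, 0.25, 0.5, 0.75, 1)]
posterior = bayesian_update(priors, likelihoods, ['lime'] * 10)
```

After ten lime candies in a row, nearly all the posterior probability sits on h5 (100% lime), so Bayesian prediction effectively agrees with the single most probable hypothesis.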
Learning with complete data:-
A parameter learning task involves finding the numerical parameters for a
probability model whose structure is fixed.
Data are complete when each data point contains values for every variable in
the probability model.
Complete data greatly simplify the problem of learning the parameters of a complex model.
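With complete data, maximum-likelihood parameter learning for a discrete variable reduces to counting observed frequencies. A minimal sketch for the candy example:

```python
from collections import Counter

def ml_parameters(data):
    """Maximum-likelihood estimate with complete data: the estimated
    probability of each flavor is simply its observed frequency."""
    counts = Counter(data)
    n = len(data)
    return {flavor: c / n for flavor, c in counts.items()}
```

For instance, unwrapping 8 cherry and 2 lime candies yields the estimates P(cherry) = 0.8 and P(lime) = 0.2.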
REINFORCEMENT LEARNING:-
It involves finding a balance between exploration of new knowledge and exploitation
of current knowledge; the agent learns from rewards rather than from labeled examples.
Ex:- learning to play chess. Rather than being told the correct move for each position,
as in supervised learning, the agent learns from the rewards it receives for its games.
3. Reflex agent:-
It learns a policy that maps directly from states to actions.
Basic reinforcement learning model:-
Types of reinforcement learning:-
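One standard reinforcement learning method is tabular Q-learning, shown here as an illustrative sketch (the notes do not give code). The environment interface `step(s, a) -> (reward, next_state or None)` and the fixed start state are assumptions of this sketch; the epsilon-greedy rule implements the exploration/exploitation balance mentioned above.

```python
import random

def q_learning(states, actions, step, episodes=2000,
               alpha=0.5, gamma=0.9, eps=0.3):
    """Tabular Q-learning: learn action values from rewards alone.
    Episodes start from states[0] and end when step returns None."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    rng = random.Random(0)
    for _ in range(episodes):
        s = states[0]
        while s is not None:
            # epsilon-greedy: explore with probability eps, else exploit
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda a: Q[(s, a)])
            r, s2 = step(s, a)
            best_next = 0.0 if s2 is None else max(Q[(s2, a2)] for a2 in actions)
            # move Q(s, a) toward the reward plus discounted best next value
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q
```

On a two-state chain where only moving right from the second state yields reward, the learned values rank "move right" above "move left" in both states, even though the agent is never told the correct action.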