
www.Vidyarthiplus.com

QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE


UNIT – I

PART -A (2 Marks)

PROBLEM SOLVING
1. What is artificial intelligence?
Artificial intelligence is the exciting new effort to make computers think: machines with minds, in
the full and literal sense. Artificial intelligence systemizes and automates intellectual tasks and is
therefore potentially relevant to any sphere of human intellectual activity.
2. List down the characteristics of intelligent agent.

 Intelligent Agents are autonomous because they function without requiring that the Console
or Management Server be running.
 An Agent that services a database can run when the database is down, allowing the Agent to
start up or shut down the database.
 The Intelligent Agents can independently perform administrative job tasks at any time,
without active participation by the administrator.
 Similarly, the Agents can autonomously detect and react to events, allowing them to monitor
the system and execute a fixit job to correct problems without the intervention of the
administrator.

3. What do you mean by local maxima with respect to search technique?

A local maximum is a peak in the state-space landscape that is higher than each of its
neighboring states but lower than the global maximum. Search techniques such as hill climbing
that reach the vicinity of a local maximum are drawn upward toward the peak and then become
stuck there with nowhere else to go, even though a better solution may exist elsewhere in the
state space.

4. Define Turing test.


The Turing test proposed by Alan Turing was designed to provide a satisfactory operational
definition of intelligence. Turing defined intelligent behavior as the ability to achieve human-level
performance in all cognitive tasks, sufficient to fool an interrogator.

5. List the capabilities that a computer should possess for conducting a Turing Test?
The capabilities that a computer should possess for conducting a Turing Test are,
Natural Language Processing;
Knowledge Representation;
Automated Reasoning;
Machine Learning.
R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy

6. Define an agent.
An agent is anything that can be viewed as perceiving its environment through sensors and acting
upon that environment through effectors.

7. Define rational agent.


A rational agent is one that does the right thing. Here, the right thing is the one that will
cause the agent to be most successful. That leaves us with the problem of deciding how and when
to evaluate the agent's success.

8. Define an Omniscient agent.


An omniscient agent knows the actual outcome of its action and can act accordingly; but
omniscience is impossible in reality.

9. What are the factors that a rational agent should depend on at any given time?
The factors that a rational agent should depend on at any given time are,
The performance measure that defines the criterion of success;
The agent's prior knowledge of the environment;
The actions that the agent can perform;
The agent's percept sequence to date.

10. List the measures to determine agent’s behavior.


The measures to determine agent’s behavior are,
Performance measure,
Rationality,
Omniscience, Learning and Autonomy.

11. List the various types of agent programs.


The various types of agent programs are,
Simple reflex agent program;
Agent that keep track of the world;
Goal based agent program;
Utility based agent program.

12. List the components of a learning agent?


The components of a learning agent are,
Learning element;
Performance element;
Critic;
Problem generator.
13. List out some of the applications of Artificial Intelligence.
Some of the applications of Artificial Intelligence are,
Autonomous planning and scheduling;

Game playing;
Autonomous control;
Diagnosis;
Logistics planning;
Robotics.

14. What is depth-limited search?


Depth-limited search avoids the pitfalls of DFS by imposing a cutoff on the maximum
depth of a path. This cutoff can be implemented with a special depth-limited search algorithm or
by using the general search algorithm with operators that keep track of the depth.

15. Define breadth-first search.


The breadth-first search strategy is a simple strategy in which the root-node is
expanded first, and then all the successors of the root node are expanded, then their successors
and so on. It is implemented using TREE-SEARCH with an empty fringe that is a FIFO queue,
assuring that the nodes that are visited first will be expanded first.

16. Define problem formulation.


Problem formulation is the process of deciding what actions and states to consider
for a goal that has been developed in the first step of problem solving.

17. List the four components of a problem?


The four components of a problem are,
An initial state;
Actions;
Goal test;
Path cost.

18. Define iterative deepening search.


Iterative deepening is a strategy that sidesteps the issue of choosing the best depth
limit by trying all possible depth limits: first depth 0, then depth 1, then depth 2& so on.

19. Mention the criteria for the evaluation of a search strategy.


The criteria for the evaluation of a search strategy are,
Completeness;
Time complexity;
Space complexity;
Optimality.

20. Define the term percept.


The term percept refers to the agent's perceptual inputs at any given instant. An
agent's percept sequence is the complete history of everything that the agent has perceived.

21. Define Constraint Satisfaction Problem (CSP).


A constraint satisfaction problem is a special kind of problem that satisfies some
additional structural properties beyond the basic requirements for problems in general. In a CSP,
the states are defined by the values of a set of variables, and the goal test specifies a set of
constraints that the values must obey.

22. List some of the uninformed search techniques.


Some of the uninformed search techniques are,
Breadth-First Search(BFS);
Depth-First Search(DFS);
Uniform Cost Search;
Depth Limited Search;
Iterative Deepening Search;
Bidirectional Search.

PART- B

1. Explain Agents in detail.


An agent is anything that can be viewed as perceiving its environment through
sensors and acting upon that environment through actuators.
Percept
We use the term percept to refer to the agent's perceptual inputs at any given instant.
Percept Sequence
An agent’s percept sequence is the complete history of everything the agent has ever
perceived.
Agent function
Mathematically speaking, we say that an agent's behavior is described by the agent
function
Properties of task environments
Fully observable vs. partially observable
Deterministic vs. stochastic
Episodic vs. sequential
Static vs. dynamic
Discrete vs. continuous
Single agent vs. multiagent
 Fully observable vs. partially observable:
If an agent's sensors give it access to the complete state of the environment at each point
in time, then we say that the task environment is fully observable.

A task environment is effectively fully observable if the sensors detect all aspects that are
relevant to the choice of action.
An environment might be partially observable because of noisy and inaccurate sensors or
because parts of the state are simply missing from the sensor data.
Deterministic vs. stochastic:
If the next state of the environment is completely determined by the current state and the
action executed by the agent, then we say the environment is deterministic; other-wise, it is
stochastic.
Episodic vs. sequential:
In an episodic task environment, the agent's experience is divided into atomic episodes.
Each episode consists of the agent perceiving and then performing a single action.
Crucially, the next episode does not depend on the actions taken in previous episodes.
For example, an agent that has to spot defective parts on an assembly line bases each
decision on the current part, regardless of previous decisions.
In sequential environments, on the other hand, the current decision could affect all
future decisions. Chess and taxi driving are sequential.

Discrete vs. continuous:


The discrete/continuous distinction can be applied to the state of the environment, to the
way time is handled, and to the percepts and actions of the agent.
For example, a discrete-state environment such as a chess game has a finite number of
distinct states.
Chess also has a discrete set of percepts and actions.
Taxi driving is a continuous- state and continuous-time problem: the speed and location
of the taxi and of the other vehicles sweep through a range of continuous values and do
so smoothly over time.
Taxi-driving actions are also continuous (steering angles, etc.).
Single agent vs. multiagent:
An agent solving a crossword puzzle by itself is clearly in a single-agent environment,
whereas an agent playing chess is in a two-agent environment. As one might expect, the hardest
case is partially observable, stochastic, sequential, dynamic, continuous, and multi agent.


Examples of task environments and their characteristics.

2. Explain uninformed search strategies.


Uninformed search strategies have no additional information about states beyond
that provided in the problem definition. Strategies that know whether one non-goal state is
“more promising” than another are called informed search or heuristic search strategies.
There are five uninformed search strategies as given below.
Breadth-first search;
Uniform-cost search;
Depth-first search;
Depth-limited search;
Iterative deepening search.
BREADTH-FIRST SEARCH

Breadth-first search is a simple strategy in which the root node is expanded


first, then all successors of the root node are expanded next, then their
successors, and so on.
In general, all the nodes are expanded at a given depth in the search tree
before any nodes at the next level are expanded.
Breadth-first search is implemented by calling TREE-SEARCH with an
empty fringe that is a first-in-first-out (FIFO) queue, assuring that the nodes
that are visited first will be expanded first.
In other words, calling TREE-SEARCH (problem, FIFO-QUEUE ())

results in breadth-first search.
The FIFO queue puts all newly generated successors at the end of the queue,
which means that shallow nodes are expanded before deeper nodes.

Breadth-first searches on a simple binary tree. At each stage, the node to be expanded next is
indicated by a marker.

Properties of breadth-first-search


Time and memory requirements for breadth-first search. The numbers
shown assume a branching factor of b = 10; 10,000 nodes/second; 1,000 bytes/node.
Time complexity for BFS
Assume every state has b successors. The root of the search tree generates b nodes at
the first level, each of which generates b more nodes, for a total of b^2 at the second level. Each of
these generates b more nodes, yielding b^3 nodes at the third level, and so on. Now suppose that
the solution is at depth d. In the worst case, we would expand all but the last node at level d,
generating b^(d+1) - b nodes at level d+1.
Then the total number of nodes generated is
b + b^2 + b^3 + ... + b^d + (b^(d+1) - b) = O(b^(d+1)).
Every node that is generated must remain in memory, because it is either part of the fringe or is
an ancestor of a fringe node. The space complexity is, therefore, the same as the time complexity.
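The FIFO-queue mechanics described above can be sketched in Python. This is a minimal illustration, not the book's TREE-SEARCH pseudocode; the graph, goal_test, and successors names are assumptions:

```python
from collections import deque

def breadth_first_search(start, goal_test, successors):
    """Minimal BFS sketch: a FIFO fringe expands shallow nodes first."""
    fringe = deque([(start, [start])])      # FIFO queue of (state, path) pairs
    visited = {start}
    while fringe:
        state, path = fringe.popleft()      # oldest (shallowest) node first
        if goal_test(state):
            return path
        for s in successors(state):
            if s not in visited:            # simple repeated-state check
                visited.add(s)
                fringe.append((s, path + [s]))
    return None                             # no solution
```

Because the fringe is FIFO, all nodes at depth k are expanded before any node at depth k+1, which is exactly the behavior analyzed above.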

UNIFORM-COST SEARCH:
Instead of expanding the shallowest node, uniform-cost search expands the
node n with the lowest path cost. Uniform-cost search does not care about the number of steps a
path has, but only about their total cost.
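A sketch of uniform-cost search in Python using a priority queue keyed on path cost g (the names and the (state, step-cost) successor convention are assumptions):

```python
import heapq

def uniform_cost_search(start, goal_test, successors):
    """Sketch of UCS: always expand the frontier node with the lowest path cost g."""
    frontier = [(0, start, [start])]               # priority queue ordered by g
    explored = set()
    while frontier:
        g, state, path = heapq.heappop(frontier)   # lowest-cost node first
        if goal_test(state):
            return g, path
        if state in explored:
            continue
        explored.add(state)
        for nxt, cost in successors(state):        # successors yields (state, cost)
            if nxt not in explored:
                heapq.heappush(frontier, (g + cost, nxt, path + [nxt]))
    return None
```

Note that UCS can prefer a path with more steps if its total cost is lower, as the test graph below shows.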


Properties of Uniform-cost-search

2 DEPTH-FIRST-SEARCH
Depth-first-search always expands the deepest node in the current fringe of
the search tree. The progress of the search is illustrated in figure 1.31. The search proceeds
immediately to the deepest level of the search tree, where the nodes have no successors. As
those nodes are expanded, they are dropped from the fringe, so then the search “backs up” to
the next shallowest node that still has unexplored successors.


Depth-first-search on a binary tree. Nodes that have been expanded and


have no descendants in the fringe can be removed from the memory; these are shown in
black. Nodes at depth 3 are assumed to have no successors and M is the only goal node.
This strategy can be implemented by TREE-SEARCH with a last-in-
first-out (LIFO) queue, also known as a stack. Depth-first-search has very modest memory
requirements. It needs to store only a single path from the root to a leaf node, along with the
remaining unexpanded sibling nodes for each node on the path. Once the node has been
expanded, it can be removed from the memory, as soon as its descendants have been fully
explored. For a state space with a branching factor b and maximum depth m, depth-first search
requires storage of only b·m + 1 nodes.
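The stack (LIFO) behavior can be sketched as follows; this is a minimal illustration with assumed names, and the path-membership test stands in for full repeated-state checking:

```python
def depth_first_search(start, goal_test, successors):
    """Minimal DFS sketch: a LIFO stack expands the deepest node first."""
    stack = [(start, [start])]                # LIFO stack of (state, path) pairs
    while stack:
        state, path = stack.pop()             # most recently added node first
        if goal_test(state):
            return path
        for s in successors(state):
            if s not in path:                 # avoid looping back along this path
                stack.append((s, path + [s]))
    return None
```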


Using the same assumptions as Figure 1.15, and assuming that nodes at the same
depth as the goal node have no successors, we find that depth-first search would require 118
kilobytes instead of 10 petabytes, a factor of about 10 billion times less space.

Drawback of Depth-first-search
The drawback of depth-first search is that it can make a wrong choice and get
stuck going down a very long (or even infinite) path when a different choice would lead to a
solution near the root of the search tree. For example, depth-first search will explore the entire
left subtree even if node C is a goal node.

3 DEPTH-LIMITED-SEARCH:

The problem of unbounded trees can be alleviated by supplying depth-first search
with a predetermined depth limit l. That is, nodes at depth l are treated as if they have no
successors. This approach is called depth-limited search. The depth limit solves the infinite-path
problem.
Depth-limited search will be nonoptimal if we choose l > d, and incomplete if we choose
l < d. Its time complexity is O(b^l) and its space complexity is O(b·l). Depth-first search can be
viewed as a special case of depth-limited search with l = ∞. Sometimes, depth limits can be based
on knowledge of the problem. For example, on the map of Romania there are 20 cities. Therefore,
if there is a solution, we know it must be of length 19 at the longest, so l = 19 is a possible
choice. However, it can be shown that any city can be reached from any other city in at most 9
steps. This number, known as the diameter of the state space, gives us a better depth limit.
Depth-limited search can be implemented as a simple modification to the
general tree-search algorithm or to the recursive depth-first search algorithm. It can be noted that
the algorithm can terminate with two kinds of failure: the standard failure value indicates no
solution; the cutoff value indicates no solution within the depth limit. Depth-limited search =
depth-first search with depth limit l; it returns cutoff if any path is cut off by the depth limit.
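The two failure values can be made explicit in a short recursive sketch (names are assumptions; the string 'cutoff' marks failure caused only by the depth limit, while None is the standard failure value):

```python
def depth_limited_search(state, goal_test, successors, limit):
    """Recursive DLS sketch: returns a path, 'cutoff', or None (standard failure)."""
    if goal_test(state):
        return [state]
    if limit == 0:
        return 'cutoff'                       # no solution within the depth limit
    cutoff_occurred = False
    for s in successors(state):
        result = depth_limited_search(s, goal_test, successors, limit - 1)
        if result == 'cutoff':
            cutoff_occurred = True
        elif result is not None:
            return [state] + result
    return 'cutoff' if cutoff_occurred else None
```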


4. ITERATIVE DEEPENING DEPTH-FIRST SEARCH:


Iterative deepening search (or iterative deepening depth-first search) is a general
strategy, often used in combination with depth-first search, that finds the best depth limit. It does
this by gradually increasing the limit, first 0, then 1, then 2, and so on, until a goal is found.
This will occur when the depth limit reaches d, the depth of the shallowest goal node. The
algorithm is shown in Figure 1.16.
Iterative deepening combines the benefits of depth-first and breadth-first search.
Like depth-first search, its memory requirements are modest: O(b·d) to be precise.
Like breadth-first search, it is complete when the branching factor is finite and optimal
when the path cost is a nondecreasing function of the depth of the node.
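The increasing-limit loop can be sketched as a self-contained Python function, with its own inner depth-limited routine so the structure is visible (the max_depth cap is an assumption added so the sketch always terminates):

```python
def iterative_deepening_search(start, goal_test, successors, max_depth=50):
    """IDS sketch: try depth limits 0, 1, 2, ... until a goal is found."""
    def dls(state, limit):
        if goal_test(state):
            return [state]
        if limit == 0:
            return None
        for s in successors(state):
            result = dls(s, limit - 1)
            if result is not None:
                return [state] + result
        return None

    for depth in range(max_depth + 1):        # gradually increase the limit
        result = dls(start, depth)
        if result is not None:
            return result                     # found at the shallowest limit d
    return None
```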


Four iterations of iterative deepening search on a binary tree.


Iterative deepening search is not as wasteful as it might seem.

[Figure: iterations of iterative deepening search on a small tree rooted at S, shown for depth
limits 0, 1, and 2.]


Properties of iterative deepening search


In general, iterative deepening is the preferred uninformed search method when


there is a large search space and the depth of solution is not known.

5 BIDIRECTIONAL SEARCH:
The idea behind bidirectional search is to run two simultaneous searches, one forward from the
initial state and the other backward from the goal, stopping when the two searches meet in the
middle (Figure 1.18).
The motivation is that b^(d/2) + b^(d/2) is much less than b^d; in the figure, the area of the two
small circles is less than the area of one big circle centered on the start and reaching to the goal.

A schematic view of a bidirectional search that is about to succeed, when a branch from the
start node meets a branch from the goal node.
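The idea can be sketched with two FIFO frontiers in Python; this is a minimal illustration with assumed names, and it assumes neighbors() is valid in both directions (i.e., predecessors are computable), which the backward search requires:

```python
from collections import deque

def bidirectional_search(start, goal, neighbors):
    """Sketch: two BFS frontiers, forward from the start and backward from
    the goal, stopping when they meet in the middle."""
    if start == goal:
        return [start]
    front = {start: [start]}                  # state -> path from the start
    back = {goal: [goal]}                     # state -> path from the goal
    fq, bq = deque([start]), deque([goal])
    while fq and bq:
        for queue, this, other, forward in ((fq, front, back, True),
                                            (bq, back, front, False)):
            state = queue.popleft()
            for s in neighbors(state):
                if s in other:                # the two searches meet at s
                    fwd = this[state] + [s] if forward else other[s]
                    bwd = other[s] if forward else this[state] + [s]
                    return fwd + bwd[-2::-1]  # join, dropping the duplicate s
                if s not in this:
                    this[s] = this[state] + [s]
                    queue.append(s)
    return None
```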

6 COMPARING UNINFORMED SEARCH STRATEGIES:



Figure 1.19 Comparing Uninformed Search Strategies



Evaluation of search strategies: b is the branching factor; d is the depth of the shallowest
solution; m is the maximum depth of the search tree; l is the depth limit. Superscript caveats are
as follows: (a) complete if b is finite; (b) complete if step costs >= ε for positive ε; (c) optimal if
step costs are all identical; (d) if both directions use breadth-first search.

3. Explain informed search strategies.


Informed search strategy is one that uses problem-specific knowledge beyond the
definition of the problem itself. It can find solutions more efficiently than uninformed strategy.
Best-first search;
Heuristic functions;
Greedy best-first search (GBFS);
A* search;
Memory-bounded heuristic search.

INFORMED (HEURISTIC) SEARCH STRATEGIES:

Best-first search
Best-first search is an instance of general TREE-SEARCH or GRAPH-SEARCH
algorithm in which a node is selected for expansion based on an evaluation function f(n). The
node with lowest evaluation is selected for expansion, because the evaluation measures the
distance to the goal.
This can be implemented using a priority-queue, a data structure that will maintain the
fringe in ascending order of f-values.

Heuristic functions:
A heuristic function, or simply a heuristic, is a function that ranks alternatives in a
search algorithm at each branching step, based on available information, in order to decide which
branch to follow.
The key component of Best-first search algorithm is a heuristic function, denoted by

h (n):

h (n) = estimated cost of the cheapest path from node n to a goal node.
For example, in Romania, one might estimate the cost of the cheapest path from Arad to
Bucharest via a straight-line distance from Arad to Bucharest.
Heuristic function is the most common form in which additional knowledge is imparted
to the search algorithm.

2.1 GREEDY BEST-FIRST SEARCH:


Greedy best-first search tries to expand the node that is closest to the goal, on the
grounds that this is likely to lead to a solution quickly.
It evaluates the nodes by using the heuristic function f (n) = h (n).
Taking the example of Route-finding problems in Romania, the goal is to reach
Bucharest starting from the city Arad. We need to know the straight-line distances to Bucharest
from various cities as shown in Figure 2.1. For example, the initial state is In (Arad), and the
straight line distance heuristic hSLD (In (Arad)) is found to be 366.


Using the straight-line distance heuristic hSLD, the goal state can be reached faster.

Figure 2.1 Values of hSLD - straight line distances to Bucharest


Stages in greedy best-first search for Bucharest using the straight-line distance
heuristic hSLD. Nodes are labeled with their h-values.

Figure 2.2 shows the progress of greedy best-first search using hSLD to find a path from Arad to
Bucharest. The first node to be expanded from Arad will be Sibiu, because it is closer to
Bucharest than either Zerind or Timisoara. The next node to be expanded will be Fagaras,


because it is closest. Fagaras in turn generates Bucharest, which is the goal.


Properties of greedy search:


Complete: No; it can get stuck in loops. It is complete in a finite space with
repeated-state checking.
Time: O(b^m), but a good heuristic can give dramatic improvement.
Space: O(b^m); it keeps all nodes in memory.
Optimal: No.
Greedy best-first search is not optimal, and it is incomplete.
The worst-case time and space complexity is O(b^m), where m is the maximum depth of the
search space.
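The greedy strategy above can be sketched in Python with a priority queue ordered by h(n) alone. The mini route map and hSLD values in the test are trimmed assumptions for illustration, not the full Romania data:

```python
import heapq

def greedy_best_first_search(start, goal_test, successors, h):
    """GBFS sketch: always expand the node with the lowest heuristic value,
    i.e. f(n) = h(n)."""
    frontier = [(h(start), start, [start])]        # priority queue ordered by h
    visited = {start}
    while frontier:
        _, state, path = heapq.heappop(frontier)   # node that looks closest to goal
        if goal_test(state):
            return path
        for s in successors(state):
            if s not in visited:
                visited.add(s)
                heapq.heappush(frontier, (h(s), s, path + [s]))
    return None
```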
2 A* SEARCH:
A* Search is the most widely used form of best-first search. The evaluation function f(n) is
obtained by combining
g(n) = the cost to reach the node.
h(n) = the cost to get from the node to the goal .
f(n) = g(n) + h(n).
A* search is both optimal and complete, provided that h(n) is an admissible heuristic,
that is, provided that h(n) never overestimates the cost to reach the goal.
An obvious example of an admissible heuristic is the straight-line distance hSLD that we
used in getting to Bucharest; it cannot be an overestimate. The progress of an A* tree search for
Bucharest is shown in Figure 2.2.
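A sketch of A* in Python using f(n) = g(n) + h(n) (the names are assumptions; successors yields (state, step-cost) pairs, and best_g keeps only the cheapest known path to each state):

```python
import heapq

def astar_search(start, goal_test, successors, h):
    """A* sketch: expand the frontier node with the lowest f = g + h.
    With an admissible h, the first goal popped is optimal."""
    frontier = [(h(start), 0, start, [start])]     # (f, g, state, path)
    best_g = {start: 0}
    while frontier:
        f, g, state, path = heapq.heappop(frontier)
        if goal_test(state):
            return g, path
        for s, cost in successors(state):
            g2 = g + cost
            if s not in best_g or g2 < best_g[s]:  # keep the cheapest path to s
                best_g[s] = g2
                heapq.heappush(frontier, (g2 + h(s), g2, s, path + [s]))
    return None
```

In the test graph below, greedy best-first search would be misled toward A, but A* returns the cheaper path through B.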

Recursive Best-first Search (RBFS):


Recursive best-first search is a simple recursive algorithm that attempts to mimic the
operation of standard best-first search, but using only linear space.
Its structure is similar to that of recursive depth-first search, but rather than continuing
indefinitely down the current path,


It keeps track of the f-value of the best alternative path available from any ancestor of the
current node.
If the f-value of the current node exceeds this limit, the recursion unwinds back to the alternative path.
As the recursion unwinds, RBFS replaces the f-value of each node along the path with the
best f-value of its children.

Pseudocode for how RBFS reaches Bucharest:

function RECURSIVE-BEST-FIRST-SEARCH(problem) returns a solution or failure
    return RBFS(problem, MAKE-NODE(INITIAL-STATE[problem]), ∞)

function RBFS(problem, node, f_limit) returns a solution or failure, and a new f-cost limit
    if GOAL-TEST[problem](STATE[node]) then return node
    successors ← EXPAND(node, problem)
    if successors is empty then return failure, ∞
    for each s in successors do
        f[s] ← max(g(s) + h(s), f[node])
    repeat
        best ← the lowest f-value node in successors
        if f[best] > f_limit then return failure, f[best]
        alternative ← the second-lowest f-value among successors
        result, f[best] ← RBFS(problem, best, min(f_limit, alternative))
        if result ≠ failure then return result


Stages in an RBFS search for the shortest route to Bucharest. The f-limit value for each
recursive call is shown on top of each current node. (a) The path via Rimnicu Vilcea is
followed until the current best leaf (Pitesti) has a value that is worse than the best
alternative path (Fagaras).
(b) The recursion unwinds and the best leaf value of the forgotten subtree (417) is backed
up to Rimnicu Vilcea; then Fagaras is expanded, revealing a best leaf value of 450.


(c) The recursion unwinds and the best leaf value of the forgotten subtree (450) is backed
up to Fagaras; then Rimnicu Vilcea is expanded. This time, because the best alternative
path (through Timisoara) costs at least 447, the expansion continues to Bucharest.

RBFS Evaluation:
RBFS is a bit more efficient than IDA*, but it still suffers from excessive node generation
(mind changes). Like A*, it is optimal if h(n) is admissible. Its space complexity is O(b·d),
whereas IDA* retains only a single number (the current f-cost limit). Time complexity is difficult
to characterize: it depends on the accuracy of h(n) and on how often the best path changes. Both
IDA* and RBFS suffer from using too little memory.

2 HEURISTIC FUNCTIONS:
A heuristic function, or simply a heuristic, is a function that ranks alternatives in a
search algorithm at each branching step, based on available information, in order to decide which
branch to follow.

A typical instance of the 8-puzzle.


The solution is 26 steps long.


The 8-puzzle:
The 8-puzzle is an example of Heuristic search problem. The object of the puzzle is to
slide the tiles horizontally or vertically into the empty space until the configuration matches the
goal configuration (Figure 2.4)
The average cost for a randomly generated 8-puzzle instance is about 22 steps. The
branching factor is about 3. (When the empty tile is in the middle, there are four possible moves;
when it is in a corner there are two; and when it is along an edge there are three.)
This means that an exhaustive search to depth 22 would look at about 3^22, approximately
3.1 × 10^10 states. By keeping track of repeated states, we could cut this down by a factor of
about 170,000, because there are only 9!/2 = 181,440 distinct states that are reachable. This is a
manageable number, but the corresponding number for the 15-puzzle is roughly 10^13.
If we want to find the shortest solutions by using A*, we need a heuristic function that
never overestimates the number of steps to the goal.
Two commonly used heuristic functions for the 8-puzzle are:
h1 = the number of misplaced tiles.
For Figure 2.4, all of the eight tiles are out of position, so the start state would have h1 =
8. h1 is an admissible heuristic.
h2 = the sum of the distances of the tiles from their goal positions. This is called the city block
distance or Manhattan distance.
h2 is admissible, because any move can do no more than move one tile one step closer to the goal.
Tiles 1 to 8 in start state give a Manhattan distance of
h2 = 3 + 1 + 2 + 2 + 2 + 3 + 3 + 2 = 18.
Neither of these overestimates the true solution cost, which is 26.
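Both heuristics can be written directly in Python. States here are assumed to be 9-tuples read row by row, with 0 for the blank; the test uses a sample instance whose sums match the text (h1 = 8, h2 = 18):

```python
def h1(state, goal):
    """Number of misplaced tiles (the blank, 0, is not counted)."""
    return sum(1 for s, g in zip(state, goal) if s != 0 and s != g)

def h2(state, goal):
    """Sum of Manhattan (city-block) distances of each tile from its goal square."""
    dist = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue                           # skip the blank
        j = goal.index(tile)                   # goal square of this tile
        dist += abs(i // 3 - j // 3) + abs(i % 3 - j % 3)
    return dist
```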

The Effective Branching factor:


One way to characterize the quality of a heuristic is the effective branching factor b*. If the
total number of nodes generated by A* for a particular problem is N, and the solution depth is d,
then b* is the branching factor that a uniform tree of depth d would have to have in order to
contain N+1 nodes. Thus,
N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d


For example, if A* finds a solution at depth 5 using 52 nodes, then the effective branching
factor is 1.92.
A well-designed heuristic would have a value of b* close to 1, allowing fairly large
problems to be solved.
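There is no closed form for b*, but the defining equation can be solved numerically. The bisection helper below is hypothetical, not a standard routine, and assumes b* lies between 1 and N + 1:

```python
def effective_branching_factor(n_nodes, depth, tol=1e-6):
    """Solve N + 1 = 1 + b* + (b*)^2 + ... + (b*)^d for b* by bisection."""
    def total(b):
        # Number of nodes in a uniform tree of branching factor b and depth d
        return sum(b ** i for i in range(depth + 1))
    lo, hi = 1.0, float(n_nodes + 1)          # assumed bracket for b*
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if total(mid) < n_nodes + 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

This reproduces the example in the text: 52 nodes at depth 5 gives b* of about 1.92.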
To test the heuristic functions h1 and h2, 1200 random problems were generated with
solution lengths from 2 to 24 and solved them with iterative deepening search and with A* search
using both h1 and h2. Figure 2.5 gives the average number of nodes expanded by each strategy
and the effective branching factor.
The results suggest that h2 is better than h1, and is far better than using iterative deepening
search. For a solution length of 14, A* with h2 is 30,000 times more efficient than uninformed
iterative deepening search.

Comparison of search costs and effective branching factors for the


ITERATIVE-DEEPENING-SEARCH and A* algorithms with h1 and h2. Data are averaged
over 100 instances of the 8-puzzle, for various solution lengths.

Inventing admissible heuristic functions:


Relaxed problems:
A problem with fewer restrictions on the actions is called a relaxed problem
The cost of an optimal solution to a relaxed problem is an admissible heuristic for the
original problem


If the rules of the 8-puzzle are relaxed so that a tile can move anywhere, then h1 (n) gives
the shortest solution
If the rules are relaxed so that a tile can move to any adjacent square, then h2 (n) gives
the shortest solution.
4. Explain about Local Search Algorithms And Optimization Problems.
In many optimization problems, the path to the goal is irrelevant; the goal state itself is
the solution
For example, in the 8-queens problem, what matters is the final configuration of queens,
not the order in which they are added.
In such cases, we can use local search algorithms. They operate using a single current
state (rather than multiple paths) and generally move only to neighbors of that state.
The important applications of these classes of problems include
(a) integrated-circuit design,
(b) factory-floor layout,
(c) job-shop scheduling,
(d) automatic programming,
(e) telecommunications network optimization,
(f) vehicle routing, and
(g) portfolio management.

Key advantages of Local Search Algorithms:


(1) They use very little memory – usually a constant amount;
(2) They can often find reasonable solutions in large or infinite (continuous) state spaces for
which systematic algorithms are unsuitable.

OPTIMIZATION PROBLEMS:
In addition to finding goals, local search algorithms are useful for solving pure
optimization problems, in which the aim is to find the best state according to an objective
function.


State Space Landscape


A landscape has both “location” (defined by the state) and “elevation”(defined by the
value of the heuristic cost function or objective function).
If elevation corresponds to cost, then the aim is to find the lowest valley – a global
minimum; if elevation corresponds to an objective function, then the aim is to find the highest
peak – a global maximum.
Local search algorithms explore this landscape. A complete local search algorithm
always finds a goal if one exists; an optimal algorithm always finds a global
minimum/maximum.

Figure 2.6 A one dimensional state space landscape in which elevation corresponds to
the objective function. The aim is to find the global maximum. Hill climbing search
modifies the current state to try to improve it, as shown by the arrow. The various
topographic features are defined in the text.

1. Hill-climbing search:
The hill-climbing search algorithm, shown in Figure 2.7, is simply a loop that
continually moves in the direction of increasing value – that is, uphill. It terminates when it
reaches a “peak” where no neighbor has a higher value.


function HILL-CLIMBING(problem) returns a state that is a local maximum


input: problem, a problem
local variables: current, a node
                 neighbor, a node
current ← MAKE-NODE(INITIAL-STATE[problem])
loop do
    neighbor ← a highest-valued successor of current
    if VALUE[neighbor] ≤ VALUE[current] then return STATE[current]
    current ← neighbor
Figure 2.7 The hill-climbing search algorithm (steepest-ascent version), which is the
most basic local search technique. At each step the current node is replaced by the best
neighbor; here, that is the neighbor with the highest VALUE. If a heuristic cost estimate h is
used instead, we would find the neighbor with the lowest h.

Hill-climbing is sometimes called greedy local search because it grabs a good neighbor state
without thinking ahead about where to go next. Greedy algorithms often perform quite well.
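To make this concrete, the following Python sketch applies steepest-ascent hill climbing to the 8-queens problem; the state encoding (a tuple giving the row of the queen in each column) and the conflict-counting objective are assumptions made for this example.

```python
def conflicts(state):
    """Number of attacking queen pairs; 0 means a solution."""
    n = len(state)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if state[i] == state[j] or abs(state[i] - state[j]) == j - i)

def hill_climbing(state):
    """Move to the best neighbor until no neighbor is better."""
    while True:
        best, best_val = state, -conflicts(state)   # VALUE = -cost
        for col in range(len(state)):
            for row in range(len(state)):
                if row != state[col]:
                    neighbor = state[:col] + (row,) + state[col + 1:]
                    val = -conflicts(neighbor)
                    if val > best_val:
                        best, best_val = neighbor, val
        if best == state:            # no neighbor has a higher value
            return state             # local maximum (possibly a goal)
        state = best
```

Because only uphill moves are taken, a run may halt at a local maximum with a few conflicts remaining rather than at a solution – exactly the failure modes discussed below.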

Problems with hill-climbing:


Hill-climbing often gets stuck for the following reasons:
Local maxima: a local maximum is a peak that is higher than each of its neighboring
states, but lower than the global maximum. Hill-climbing algorithms that reach the
vicinity of a local maximum will be drawn upwards towards the peak, but will then be
stuck with nowhere else to go.
Ridges: A ridge is shown in Figure 2.8. Ridges result in a sequence of local maxima
that is very difficult for greedy algorithms to navigate.
Plateaux: A plateau is an area of the state space landscape where the evaluation function
is flat. It can be a flat local maximum, from which no uphill exit exists, or a shoulder,
from which it is possible to make progress.


Figure 2.8 Illustration of why ridges cause difficulties for hill-climbing. The grid of
states(dark circles) is superimposed on a ridge rising from left to right, creating a sequence
of local maxima that are not directly connected to each other. From each local maximum,
all the available options point downhill.

Hill-climbing variations
Stochastic hill-climbing: Random selection among the uphill moves. The selection
probability can vary with the steepness of the uphill move.
First-choice hill-climbing: implements stochastic hill climbing by generating successors
randomly until one that is better than the current state is found.
Random-restart hill-climbing: Tries to avoid getting stuck in local maxima.

2. Simulated annealing search


A hill-climbing algorithm that never makes “downhill” moves towards states with lower
value (or higher cost) is guaranteed to be incomplete, because it can get stuck on a local maximum.
In contrast, a purely random walk –that is, moving to a successor chosen uniformly at random
from the set of successors – is complete, but extremely inefficient.
Simulated annealing is an algorithm that combines hill-climbing with a random walk in
some way that yields both efficiency and completeness.
Figure 2.9 shows simulated annealing algorithm. It is quite similar to hill climbing.
Instead of picking the best move, however it picks the random move. If the move improves the


situation, it is always accepted. Otherwise, the algorithm accepts the move with some probability
less than 1. The probability decreases exponentially with the “badness” of the move – the amount
ΔE by which the evaluation is worsened.
Simulated annealing was first used extensively to solve VLSI layout problems in the
early 1980s. It has been applied widely to factory scheduling and other large-scale optimization
tasks.

Figure 2.9 The simulated annealing search algorithm, a version of stochastic hill climbing
where some downhill moves are allowed.
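The acceptance rule in Figure 2.9 can be sketched in Python as follows; the geometric cooling schedule and the toy one-dimensional landscape are assumptions made for this example, not part of the algorithm itself.

```python
import math
import random

def simulated_annealing(start, value, neighbors,
                        T0=1.0, cooling=0.995, T_min=1e-4):
    """Pick a random move; accept it always if it improves the value,
    otherwise with probability e^(ΔE/T), where T slowly decreases."""
    current = start
    T = T0
    while T > T_min:
        successor = random.choice(neighbors(current))
        delta_e = value(successor) - value(current)
        if delta_e > 0 or random.random() < math.exp(delta_e / T):
            current = successor
        T *= cooling            # geometric cooling schedule (assumed)
    return current

# Toy 1-D landscape: maximize value(x) = -(x - 7)**2 over integers 0..20
best = simulated_annealing(
    0, lambda x: -(x - 7) ** 2,
    lambda x: [max(0, x - 1), min(20, x + 1)])
```

Early on, when T is large, even bad moves are often accepted, letting the search escape local maxima; as T falls, the behavior approaches pure hill climbing.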

3. Genetic algorithms:
A Genetic algorithm (or GA) is a variant of stochastic beam search in which successor
states are generated by combining two parent states, rather than by modifying a single state.
Like beam search, genetic algorithms begin with a set of k randomly generated states,
called the population.
Each state, or individual, is represented as a string over a finite alphabet – most
commonly, a string of 0s and 1s. For example, an 8-queens state must specify the positions of
8 queens, each in a column of 8 squares, and so requires 8 x log2 8 = 24 bits.


Figure 2.10 The genetic algorithm. The initial population in (a) is ranked by the fitness
function in (b), resulting in pairs for mating in (c). They produce offspring in (d), which are
subjected to mutation in (e).

Figure 2.10 shows a population of four 8-digit strings representing 8-queens states.
Fitness function should return higher values for better states.
Cross Over: Each pair to be mated, a cross over point is randomly chosen from the
positions in the string.
Offspring: Created by crossing over the parent strings at the crossover point.

function GENETIC-ALGORITHM(population, FITNESS-FN) returns an individual


input: population, a set of individuals
       FITNESS-FN, a function which determines the quality of the individual
repeat
    new_population ← empty set
    loop for i from 1 to SIZE(population) do
        x ← RANDOM-SELECTION(population, FITNESS-FN)
        y ← RANDOM-SELECTION(population, FITNESS-FN)
        child ← REPRODUCE(x, y)
        if (small random probability) then child ← MUTATE(child)
        add child to new_population
    population ← new_population
until some individual is fit enough or enough time has elapsed
return the best individual

Figure 2.11 Genetic algorithms. The algorithm is the same as the one diagrammed in Figure 2.10.
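A runnable sketch of this loop is shown below, applied to the toy “OneMax” problem (fitness = number of 1s in the string) rather than to 8-queens; the population size, mutation rate, and termination-by-generation-count are assumptions made for the example.

```python
import random

def reproduce(x, y):
    """Single-point crossover of two parent strings."""
    c = random.randrange(1, len(x))          # random crossover point
    return x[:c] + y[c:]

def genetic_algorithm(population, fitness, generations=200, p_mutate=0.05):
    for _ in range(generations):
        new_population = []
        for _ in range(len(population)):
            # Fitness-proportional random selection of two parents
            x, y = random.choices(population,
                                  weights=[fitness(p) for p in population],
                                  k=2)
            child = reproduce(x, y)
            if random.random() < p_mutate:   # small random mutation
                i = random.randrange(len(child))
                child[i] = 1 - child[i]
            new_population.append(child)
        population = new_population
    return max(population, key=fitness)

random.seed(1)
pop = [[random.randint(0, 1) for _ in range(16)] for _ in range(20)]
best = genetic_algorithm(pop, sum)
```

After a couple of hundred generations the fittest individual is at or near the all-ones string, illustrating how crossover and mutation together climb the fitness landscape.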

Local search in continuous spaces:


We have so far considered algorithms that work only in discrete environments, but
real-world environments are continuous. Local search in a continuous space amounts to
maximizing a continuous objective function in a multi-dimensional vector space. This is hard to
do in general and often resists a closed-form solution. One simple remedy is to discretize the
space near each state and then apply a discrete local search strategy (e.g., stochastic hill
climbing or simulated annealing).
Online search interleaves computation and action: compute, act, observe, compute, and so on.
Online search is well suited to dynamic, semi-dynamic, and stochastic domains, where an offline
search would have to prepare for exponentially many contingencies. It is also necessary for
exploration problems, in which the states and actions are unknown to the agent; the agent uses
its actions as experiments to determine what to do next.
5. Explain CSP in detail.
A constraint satisfaction problem is a special kind of problem satisfies some
additional structural properties beyond the basic requirements for problem in general. In a CSP,
the states are defined by the values of a set of variables and the goal test specifies a set of
constraint that the value must obey.
CSP can be viewed as a standard search problem as follows:
Initial state: the empty assignment {}, in which all variables are
unassigned.
Successor function: a value can be assigned to any unassigned variable,
provided that it does not conflict with previously assigned variables.
Goal test: the current assignment is complete.
Path cost: a constant cost for every step.
Varieties of CSP’s:
Discrete variables.
CSPs with continuous domains.
Varieties of constraints :
Unary constraints involve a single variable.
Binary constraints involve pairs of variables.


Higher order constraints involve 3 or more variables.


A Constraint Satisfaction Problem (or CSP) is defined by a set of variables, X1, X2, …, Xn,
and a set of constraints, C1, C2, …, Cm. Each variable Xi has a nonempty domain Di of possible
values. Each constraint Ci involves some subset of the variables and specifies the allowable
combinations of values for that subset.
A State of the problem is defined by an assignment of values to some or all of the
variables,{Xi = vi, Xj = vj,…}.
An assignment that does not violate any constraints is called a consistent or legal
assignment. A complete assignment is one in which every variable is mentioned, and a solution
to a CSP is a complete assignment that satisfies all the constraints.
Some CSPs also require a solution that maximizes an objective function.

Example for Constraint Satisfaction Problem:


The map of Australia shows each of its states and territories. We are given the task of
coloring each region either red, green or blue in such a way that no two neighboring regions
have the same color.
To formulate this as CSP, we define the variable to be the regions: WA, NT, Q, NSW, V,
SA, and T.
The domain of each variable is the set {red, green, blue}. The constraints require
neighboring regions to have distinct colors; for example, the allowable combinations for WA and
NT are the pairs {(red,green),(red,blue),(green,red),(green,blue),(blue,red),(blue,green)}.
The constraint can also be represented more succinctly as the inequality WA ≠ NT, provided
the constraint satisfaction algorithm has some way to evaluate such expressions.
There are many possible solutions such as
{WA = red, NT = green, Q = red, NSW = green, V = red, SA = blue, T = red}.
It is helpful to visualize a CSP as a constraint graph, as shown in Figure 2.13.
The nodes of the graph corresponds to variables of the problem and the arcs correspond
to constraints.


Figure 2.13 The map coloring problem represented as a constraint graph.

CSP can be viewed as a standard search problem as follows:


Initial state: the empty assignment {}, in which all variables are unassigned.
Successor function: a value can be assigned to any unassigned variable, provided that it
does not conflict with previously assigned variables.
Goal test: the current assignment is complete.


Path cost: a constant cost (E.g., 1) for every step.


Every solution must be a complete assignment and therefore appears at depth n if there
are n variables.

Varieties of CSPs
(i) Discrete variables
a. Finite domains:
The simplest kind of CSP involves variables that are discrete and have finite domains.
Map coloring problems are of this kind. The 8-queens problem can also be viewed as a
finite-domain CSP, where the variables Q1, Q2, …, Q8 are the positions of the queens in
columns 1,…,8 and each variable has the domain {1,2,3,4,5,6,7,8}.
If the maximum domain size of any variable in a CSP is d, then the number of possible
complete assignments is O(d^n) – that is, exponential in the number of variables n. Finite-domain
CSPs include Boolean CSPs, whose variables can be either true or false.

b. Infinite domains:
Discrete variables can also have infinite domains – for example, the set of integers or the
set of strings. With infinite domains, it is no longer possible to describe constraints by
enumerating all allowed combination of values.
Instead, a constraint language of algebraic inequalities must be used, such as StartJob1 + 5 ≤ StartJob3.

(ii) CSPs with continuous domains


CSPs with continuous domains are very common in real world. For example, in operation
research field, the scheduling of experiments on the Hubble Telescope requires very precise
timing of observations;
The start and finish of each observation and maneuver are continuous-valued variables
that must obey a variety of astronomical, precedence and power constraints.
The best known category of continuous-domain CSPs is that of linear programming
problems, where the constraints must be linear inequalities forming a convex region.


Varieties of constraints:
(i) Unary constraints involve a single variable.
Example: SA ≠ green
(ii) Binary constraints involve pairs of variables.
Example: SA ≠ WA
(iii) Higher order constraints involve 3 or more variables.
Example: cryptarithmetic puzzles.

Figure 2.14 (a) Cryptarithmetic problem. Each letter stands for a distinct digit; the aim
is to find a substitution of digits for letters such that the resulting sum is arithmetically
correct, with the added restriction that no leading zeros are allowed. (b) The
constraint hypergraph for the cryptarithmetic problem, showing the Alldiff constraint
as well as the column addition constraints. Each constraint is a square box connected
to the variables it contains.
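Such puzzles can also be attacked by brute force, checking the Alldiff, leading-zero, and column-addition constraints over all digit assignments; the specific puzzle TWO + TWO = FOUR used below is an assumption for illustration, since the figure itself is not reproduced here.

```python
from itertools import permutations

def solve():
    """Try every assignment of distinct digits to the letters of
    TWO + TWO = FOUR and return the first consistent one."""
    letters = "TWOFUR"
    for digits in permutations(range(10), len(letters)):
        a = dict(zip(letters, digits))
        if a["T"] == 0 or a["F"] == 0:      # no leading zeros allowed
            continue
        two = 100 * a["T"] + 10 * a["W"] + a["O"]
        four = 1000 * a["F"] + 100 * a["O"] + 10 * a["U"] + a["R"]
        if two + two == four:               # column-addition constraint
            return a
    return None

sol = solve()
```

Enumerating all P(10, 6) = 151,200 assignments is feasible here, but the constraint-graph methods below scale far better.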

5 BACKTRACKING SEARCH FOR CSPS


The term backtracking search is used for depth-first search that chooses values for one
variable at a time and backtracks when a variable has no legal values left to assign.


Figure A simple backtracking algorithm for constraint satisfaction problem.

Figure Part of search tree generated by simple backtracking for the map coloring
problem.

Any backtracking search should answer the following questions:


1. Which variable should be assigned next, and in what order should its values be tried?
2. What are the implications of the current variable assignments for the order unassigned
variables?
3. When a path fails – that is a state is reached in which a variable has no legal values can
the search avoid repeating this failure in subsequent paths?
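The basic backtracking loop can be sketched for the map-coloring CSP of Figure 2.13 as follows; the fixed variable order and fixed value order are simplifying assumptions (the questions above are exactly about improving them).

```python
# Neighbor lists encode the binary ≠ constraints of Figure 2.13.
neighbors = {
    "WA": ["NT", "SA"], "NT": ["WA", "SA", "Q"],
    "SA": ["WA", "NT", "Q", "NSW", "V"], "Q": ["NT", "SA", "NSW"],
    "NSW": ["SA", "Q", "V"], "V": ["SA", "NSW"], "T": [],
}
colors = ["red", "green", "blue"]

def consistent(var, value, assignment):
    """The chosen value must differ from every assigned neighbor."""
    return all(assignment.get(n) != value for n in neighbors[var])

def backtrack(assignment):
    if len(assignment) == len(neighbors):
        return assignment                       # complete assignment
    var = next(v for v in neighbors if v not in assignment)
    for value in colors:
        if consistent(var, value, assignment):
            assignment[var] = value
            result = backtrack(dict(assignment))
            if result is not None:
                return result
            del assignment[var]                 # undo and try next value
    return None                                 # no legal value: backtrack

solution = backtrack({})
```

When no legal value remains for a variable, the `return None` propagates upward, undoing earlier choices – the backtracking step itself.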


Variable and value ordering


Choosing the variable with the fewest legal values is called the minimum remaining values
(MRV) heuristic. It is also called the most-constrained-variable or fail-first heuristic.
Degree Heuristic: if a tie occurs among the most constrained variables, then the most
constraining variable is chosen – that is, choose the variable with the most constraints on
the remaining unassigned variables.
Once a variable has been selected, choose the least constraining value – the one that
rules out the fewest values in the remaining variables.
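A small sketch of MRV with the degree heuristic as tie-breaker is given below; representing the current domains as a dictionary of value lists is an assumption of this example.

```python
def select_unassigned_variable(domains, neighbors, assignment):
    """MRV: fewest remaining legal values; the degree heuristic breaks
    ties by preferring the variable with the most unassigned neighbors."""
    unassigned = [v for v in domains if v not in assignment]
    return min(unassigned,
               key=lambda v: (len(domains[v]),
                              -sum(1 for n in neighbors[v]
                                   if n not in assignment)))

domains = {"WA": ["red"],
           "NT": ["green", "blue"],
           "SA": ["green", "blue", "red"]}
nbrs = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
print(select_unassigned_variable(domains, nbrs, {}))  # → WA
```

WA is picked first because only one legal value remains for it – assigning it now fails fast if that value turns out to be inconsistent.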

PROPAGATING INFORMATION THROUGH CONSTRAINTS:


So far our search algorithm considers the constraints on a variable only at the time that
the variable is chosen by SELECT-UNASSIGNED-VARIABLE. But by looking at some of the
constraints earlier in the search, or even before the search has started, we can drastically reduce
the search space.

1. Forward checking
One way to make better use of constraints during search is called forward checking.
Whenever a variable X is assigned, the forward checking process looks at each unassigned
variable Y that is connected to X by a constraint and deletes from Y ’s domain any value that is
inconsistent with the value chosen for X. the following Figure shows the progress of a map-
coloring search with forward checking.


Figure 2.17 forward checking
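The pruning step of forward checking can be sketched as below; the dictionary-of-domains representation and the restriction to ≠ constraints (as in map coloring) are assumptions of this example.

```python
def forward_check(var, value, domains, neighbors, assignment):
    """After assigning var = value, prune that value from the domains of
    var's unassigned neighbors. Return pruned copies of the domains, or
    None if some neighbor's domain is wiped out (an early failure)."""
    new_domains = {v: list(d) for v, d in domains.items()}
    new_domains[var] = [value]
    for n in neighbors[var]:
        if n not in assignment and value in new_domains[n]:
            new_domains[n].remove(value)
            if not new_domains[n]:       # empty domain: fail early
                return None
    return new_domains

domains = {v: ["red", "green", "blue"] for v in ("WA", "NT", "SA")}
nbrs = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
pruned = forward_check("WA", "red", domains, nbrs, {})
```

After WA = red, the domains of NT and SA each shrink to {green, blue}, matching the first row of the figure; a `None` result would tell the search to backtrack immediately.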


2. Constraint propagation:
Although forward checking detects many inconsistencies, it does not detect all of them.
Constraint propagation is the general term for propagating the implications of a constraint
on one variable onto other variables.

Figure Constraint propagation


5 LOCAL SEARCH FOR CSPS

6 THE STRUCTURE OF PROBLEMS


Problem Structure

Independent Sub problems

Figure Independent Sub problems


Tree-Structured CSPs

Figure Tree-Structured CSPs


QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE


UNIT – II

PART -A (2 Marks)

LOGICAL REASONING
1. What factors determine the selection of forward or backward reasoning
approach for an AI problem?
A search procedure must find a path between the initial and goal states. There are two directions
in which a search process could proceed: forward from the initial states, or backward from the
goal states. To reason forward from the initial states: the initial states form the root of the
search tree; generate the next level of the tree by finding all the rules whose left sides
match the root node, and use their right sides to generate the siblings. Repeat the process until
a configuration that matches the goal state is generated.

2. What are the limitations in using propositional logic to represent the


knowledge base?
Formalise the following English sentences:
Al is small
Ted is small
Someone is small
Everyone is small
No-one is not small
Propositional logic would represent each of these as a different proposition,
so the five sentences might be represented by the propositions P, Q, R, S and T.
What this representation is missing is the similarity between the propositions:
they are all concerned with the relation ’small’.
Predicate logic allows relations and quantification (which allows the
representation of English descriptors like someone, everyone and no-one).

3. What are Logical agents?


Logical agents apply inference to a knowledge base to derive new information and
make decisions.

4.What is first-order logic?


The first-order logic is sufficiently expressive to represent a good deal of our
commonsense knowledge. It also either subsumes or forms the foundation of many other
representation languages.

R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy



5. What is a symbol?
The basic syntactic elements of first-order logic are the symbols. It stands for
objects, relations and functions.

6. What are the types of Quantifiers?


The types of Quantifiers are,
Universal Quantifiers;
Existential Quantifiers.

7. What are the three kinds of symbols?


The three kinds of symbols are,
Constant symbols standing for objects;
Predicate symbols standing for relations;
Function symbols standing for functions.

8. What is Logic?
Logic is one which consists of
A formal system for describing states of affairs, consisting of (a)
Syntax and (b) Semantics;
Proof Theory – a set of rules for deducing the entailments of a set of
sentences.

9. Define a Sentence?
Each individual representation of facts is called a sentence. The sentences are
expressed in a language called as knowledge representation language.

10. Define a Proof.


A sequence of application of inference rules is called a proof. Finding proof is
exactly finding solution to search problems. If the successor function is defined to generate all
possible applications of inference rules then the search algorithms can be applied to find proofs.

11. Define Interpretation


Interpretation specifies exactly which objects, relations and functions are referred
to by the constant predicate, and function symbols.

12. What are the three levels in describing knowledge based agent?
The three levels in describing knowledge based agent
Logical level;
Implementation level;
Knowledge level or epistemological level.

13. Define Syntax?


Syntax is the arrangement of words. Syntax of a knowledge describes the possible
configurations that can constitute sentences. Syntax of the language describes how to make
sentences.
14. Define Semantics
The semantics of the language defines the truth of each sentence with respect to
each possible world. With this semantics, when a particular configuration exists within an agent,
the agent believes the corresponding sentence.

15. Define Modus Ponen’s rule in Propositional logic?


The standard patterns of inference that can be applied to derive chains of
conclusions that lead to the desired goal is said to be Modus Ponen’s rule.

16. Define a knowledge Base.


Knowledge base is the central component of knowledge base agent and it is
described as a set of representations of facts about the world.

17. Define an inference procedure.


Given a knowledge base KB and a sentence α, an inference procedure reports whether or not
α is entailed by the knowledge base. An inference procedure i can be
described by the sentences that it can derive.
If i can derive α from the knowledge base, we write KB ⊢i α – alpha is derived from
KB, or i derives alpha from KB.

18. What are the basic Components of propositional logic?


The basic Components of propositional logic
Logical Constants (True, False)
Propositional symbols (P, Q)
Logical Connectives (∧, ∨, ¬, ⇒, ⇔)

19. Define AND –Elimination rule in propositional logic.


AND-elimination rule states that from a given conjunction it is possible to infer
any of the conjuncts:
from α1 ∧ α2 ∧ … ∧ αn, infer αi.

20. Define AND-Introduction rule in propositional logic.


AND-Introduction rule states that from a list of sentences we can infer their
conjunction:
from α1, α2, …, αn, infer α1 ∧ α2 ∧ … ∧ αn.




21. What is forward chaining?


A deduction to reach a conclusion from a set of antecedents is called forward
chaining. In other words, the system starts from a set of facts, and a set of rules, and tries to find
the way of using these rules and facts to deduce a conclusion or come up with a suitable course
of action.

22. What is backward chaining?


In backward chaining, we start from a conclusion, which is the hypothesis we wish
to prove, and we aim to show how that conclusion can be reached from the rules and facts in the
database.
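To make the contrast concrete, here is a minimal sketch of forward chaining over propositional Horn rules; the (premise list, conclusion) rule representation and the example rules are assumptions made for illustration.

```python
def forward_chain(facts, rules, goal):
    """Repeatedly fire any rule whose premises are all known facts,
    adding its conclusion, until the goal is derived or nothing changes."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in facts and all(p in facts for p in premises):
                facts.add(conclusion)            # fire the rule
                changed = True
                if conclusion == goal:
                    return True
    return goal in facts

rules = [(["raining"], "wet"), (["wet"], "slippery")]
print(forward_chain(["raining"], rules, "slippery"))  # → True
```

Backward chaining would run the same rules in the other direction: to prove "slippery", it would set up "wet" as a subgoal, and then "raining", succeeding when a known fact is reached.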

PART B
1. Explain in detail about knowledge engineering process in FOL.

KNOWLEDGE REPRESENTATION
Intelligent agents need knowledge about the world in order to reach good
decisions.
Knowledge is contained in agents in the form of sentences in a knowledge
representation language that are stored in knowledge base.
Logic is the formal systematic study of the principles of valid inference and
correct reasoning.
A system of inference rules and axioms allows certain formulas to be derived,
called theorems: which may be interpreted as true propositions.
Knowledge representation languages should be declarative, compositional,
expressive, context-independent, and unambiguous.

FIRST ORDER LOGIC:


Syntax
Let us first introduce the symbols, or alphabet, being used. Beware that there are all sorts
of slightly different ways to define FOL.
Alphabet

Logical Symbols: These are symbols that have a standard meaning, like: AND, OR,
NOT, ALL, EXISTS, IMPLIES, IFF, FALSE, =.




Non-Logical Symbols: divided in:

 Constants:

Predicates: 1-ary, 2-ary, n-ary. These are usually just identifiers.

Functions: 0-ary, 1-ary, 2-ary, n-ary. These are usually just identifiers. 0-ary
functions are also called individual constants.

Where predicates return true or false, functions can return any value.

 Variables: Usually an identifier.


One needs to be able to distinguish the identifiers used for predicates, functions, and
variables by using some appropriate convention, for example, capitals for function and predicate
symbols and lower cases for variables.
Terms

A Term is either an individual constant (a 0-ary function), or a variable, or an n-ary


function applied to n terms: F(t1 t2 ..tn)
[We will use both the notation F(t1 t2 ..tn) and the notation (F t1 t2 .. tn)]

Atomic Formulae
An Atomic Formula is either FALSE or an n-ary predicate applied to n terms: P(t1 t2 ..
tn). In the case that "=" is a logical symbol in the language, (t1 = t2), where t1 and t2 are terms, is
an atomic formula.

Literals
A Literal is either an atomic formula (a Positive Literal), or the negation of an atomic
formula (a Negative Literal). A Ground Literal is a variable-free literal.

Clauses
A Clause is a disjunction of literals. A Ground Clause is a variable-free clause. A Horn
Clause is a clause with at most one positive literal. A Definite Clause is a Horn Clause with
exactly one positive Literal.
Notice that implications are equivalent to Horn or Definite clauses:
(A IMPLIES B) is equivalent to ( (NOT A) OR B)




(A AND B IMPLIES FALSE) is equivalent to ((NOT A) OR (NOT B)).


Formulae
A Formula is either:

• an atomic formula, or

• a Negation, i.e. the NOT of a formula, or

• a Conjunctive Formula, i.e. the AND of formulae, or


• a Disjunctive Formula, i.e. the OR of formulae, or

• an Implication, that is a formula of the form (formula1 IMPLIES formula2), or

• an Equivalence, that is a formula of the form (formula1 IFF formula2), or

• a Universally Quantified Formula, that is a formula of the form (ALL variable formula).
We say that occurrences of variable are bound in formula [we should be more precise].

Or

• a Existentially Quantified Formula, that is a formula of the form (EXISTS variable


formula). We say that occurrences of variable are bound in formula [we should be more
precise].

An occurrence of a variable in a formula that is not bound is said to be free. A formula


where all occurrences of variables are bound is called a closed formula, one where all variables
are free is called an open formula.

A formula that is the disjunction of clauses is said to be in Clausal Form. We shall see
that there is a sense in which every formula is equivalent to a clausal form.

Often it is convenient to refer to terms and formulae with a single name. Form or
Expression is used to this end.

Substitutions

Given a term s, the result [substitution instance] of substituting a term t in s for a variable x,
s[t/x], is:

 t, if s is the variable x

 y, if s is the variable y different from x




 F(s1[t/x] s2[t/x] .. sn[t/x]), if s is F(s1 s2 .. sn).

Given a formula A, the result (substitution instance) of substituting a term t in A for a


variable x, A[t/x], is:

 FALSE, if A is FALSE,

 P(t1[t/x] t2[t/x] .. tn[t/x]), if A is P(t1 t2 .. tn),


 (B[t/x] AND C[t/x]) if A is (B AND C), and similarly for the other connectives,

 (ALL x B) if A is (ALL x B), (similarly for EXISTS),

 (ALL y B[t/x]), if A is (ALL y B) and y is different from x (similarly for EXISTS).

The substitution [t/x] can be seen as a map from terms to terms and from formulae to
formulae. We can define similarly [t1/x1 t2/x2 .. tn/xn], where t1 t2 .. tn are terms and x1 x2 .. xn
are variables, as a map, the [simultaneous] substitution of x1 by t1, x2 by t2, .., of xn by tn. [If
all the terms t1 .. tn are variables, the substitution is called an alphabetic variant, and if they are
ground terms, it is called a ground substitution.] Note that a simultaneous substitution is not the
same as a sequential substitution.
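These clauses can be sketched directly in Python over terms represented as nested tuples `("F", t1, …, tn)` with variables and constants as strings; this representation, and treating any string not in the substitution as left unchanged, are assumptions of the example.

```python
def substitute(term, theta):
    """Apply the simultaneous substitution theta (a dict {var: term})
    to a term: variables are replaced, and n-ary function terms are
    rebuilt with each argument substituted."""
    if isinstance(term, str):                    # a variable or constant
        return theta.get(term, term)
    f, *args = term                              # F(t1 .. tn)
    return (f,) + tuple(substitute(t, theta) for t in args)

# Simultaneity matters: [y/x, x/y] swaps the two variables, which a
# sequential substitution (first x by y, then y by x) would not do.
print(substitute(("F", "x", ("G", "y")), {"x": "y", "y": "x"}))
# → ('F', 'y', ('G', 'x'))
```

Applying x by y and then y by x sequentially would instead map both occurrences to x, illustrating the distinction made above.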

3 SYNTAX AND SEMANTICS OF FOL:


FOL have objects in them.
The domain of a model is the set of objects it contains; these objects are sometimes called
as domain elements
SEMANTICS :
Before we can continue in the "syntactic" domain with concepts like Inference Rules and
Proofs, we need to clarify the Semantics, or meaning, of First Order Logic.

An L-Structure or Conceptualization for a language L is a structure M= (U,I), where:

• U is a non-empty set, called the Domain, or Carrier, or Universe of Discourse of M, and

• I is an Interpretation that associates to each n-ary function symbol F of L a map

I(F): UxU..xU -> U


and to each n-ary predicate symbol P of L a subset of UxU..xU.




The set of functions (predicates) so introduced form the Functional Basis (Relational Basis) of
the conceptualization.

Given a language L and a conceptualization (U,I), an Assignment is a map from the variables of
L to U. An X-Variant of an assignment s is an assignment that is identical to s everywhere
except at x where it differs.

Given a conceptualization M=(U,I) and an assignment s it is easy to extend s to map each term t
of L to an individual s(t) in U by using induction on the structure of the term.

Then

• M satisfies a formula A under s if

o A is atomic, say P(t1 .. tn), and (s(t1) ..s(tn)) is in I(P).

o A is (NOT B) and M does not satisfy B under s.

o A is (B OR C) and M satisfies B under s, or M satisfies C under s. [Similarly for all


other connectives.]

o A is (ALL x B) and M satisfies B under all x-variants of s.

o A is (EXISTS x B) and M satisfies B under some x-variants of s.

• Formula A is satisfiable in M iff there is an assignment s such that M satisfies A under s.

• Formula A is satisfiable iff there is an L-structure M such that A is satisfiable in M.

Formula A is valid or logically true in M iff M satisfies A under any s. We then say that M
is a model of A.

• Formula A is Valid or Logically True iff for any L-structure M and any assignment s, M
satisfies A under s.

Some of these definitions can be made relative to a set of formulae GAMMA:

• Formula A is a Logical Consequence of GAMMA in M iff M satisfies A under any s that


also satisfies all the formulae in GAMMA.

• Formula A is a Logical Consequence of GAMMA iff for any L-structure M, A is a


logical consequence of GAMMA in M. At times instead of "A is a logical consequence
of GAMMA" we say "GAMMA entails A".




USING FIRST ORDER LOGIC:

We say that formulae A and B are (logically) equivalent if A is a logical consequence of


{B} and B is a logical consequence of {A}.An Inference Rule is a rule for obtaining a new
formula [the consequence] from a set of given formulae [the premises].

A most famous inference rule is Modus Ponens:

{A, NOT A OR B}
B
For example:
{Sam is tall, if Sam is tall then Sam is unhappy}
Sam is unhappy
When we introduce inference rules we want them to be Sound, that is, we want the consequence
of the rule to be a logical consequence of the premises of the rule. Modus Ponens is sound. But
the following rule, called Abduction, is not:
{B, NOT A OR B}
A
For example:
John is wet
If it is raining then John is wet
It is raining
This gives us a conclusion that is usually, but not always, true [John takes a shower even
when it is not raining].

A Logic or Deductive System is a language, plus a set of inference rules, plus a set of logical
axioms [formulae that are valid].

A Deduction or Proof or Derivation in a deductive system D, given a set of formulae


GAMMA, is a sequence of formulae B1 B2 .. Bn such that:

• for all i from 1 to n, Bi is either a logical axiom of D, or an element of GAMMA, or is


obtained from a subset of {B1 B2 .. Bi-1} by using an inference rule of D.

In this case we say that Bn is Derived from GAMMA in D, and in the case that GAMMA is
empty, we say that Bn is a Theorem of D.
Soundness, Completeness, Consistency, Satisfiability
A Logic D is Sound iff for all sets of formulae GAMMA and any formula A:
• if A is derived from GAMMA in D, then A is a logical consequence of GAMMA

A Logic D is Complete iff for all sets of formulae GAMMA and any formula A:

• If A is a logical consequence of GAMMA, then A can be derived from GAMMA in D.

A Logic D is Refutation Complete iff for all sets of formulae GAMMA and any formula A:

• If A is a logical consequence of GAMMA, then the union of GAMMA and {NOT A} is

inconsistent

Note that if a Logic is Refutation Complete then we can enumerate all the logical
consequences of GAMMA and, for any formula A, we can reduce the question if A is or not a
logical consequence of GAMMA to the question: the union of GAMMA and NOT A is or not
consistent. We will work with logics that are both Sound and Complete, or at least Sound and
Refutation Complete.

KNOWLEDGE ENGINEERING IN FIRST-ORDER LOGIC:-


Knowledge engineering is the general process of knowledge base construction. A
knowledge engineer is someone who investigates a particular domain, learns what
concepts are important in that domain, and creates a formal representation of the objects
and relations in the domain.
Types of knowledge bases:-
Two types:-
Special-purpose
General-purpose
THE KNOWLEDGE ENGINEERING PROCESS:-
1) Identify the task
2) Assemble the relevant knowledge
3) Decide on a vocabulary of predicates, functions and constants
4) Encode general knowledge about the domain
5) Encode a description of the specific problem instance
6) Pose queries to the inference procedure and get answers
7) Debug the knowledge base

1) IDENTIFY THE TASK:-


The knowledge engineer must delineate the range of questions that the knowledge base
will support and the kinds of facts that will be available for each specific problem instance.
For example, will the relevant facts include the agent's current location?
2) ASSEMBLE THE RELEVANT KNOWLEDGE:-
The knowledge engineer might already be an expert in the domain, or might need to work
with real experts to extract what they know, a process called knowledge acquisition.
For real domains, the issue of relevance can be quite difficult. For example, a system for
simulating VLSI designs might or might not need to take into account stray capacitances and skin
effects.
3) DECIDE ON A VOCABULARY OF PREDICATES, FUNCTION AND CONSTANTS:-
The important domain-level concepts are translated into logic-level names.
This involves many questions of knowledge engineering style.
Like programming style, this can have a significant impact on the eventual success of the
project.
Once the choices have been made, the result is a vocabulary that is known as the ontology
of the domain.
Ontology means a particular theory of the nature of being or existence: it determines
what kinds of things exist, but does not determine their specific properties and
interrelationships.

4) ENCODE GENERAL KNOWLEDGE ABOUT THE DOMAIN:-


The knowledge engineer writes down the axioms for all the vocabulary terms.
This pins down the meaning of the terms, enabling the expert to check the content.
Often this step reveals misconceptions or gaps in the vocabulary that must be fixed by
returning to step 3 and iterating through the process.

5) ENCODE A DESCRIPTION OF THE SPECIFIC PROBLEM INSTANCE:-


This involves writing simple atomic sentences about instances of concepts that are
already part of the ontology.
For a logical agent, problem instances are supplied by the sensors, whereas a "disembodied"
knowledge base is supplied with additional sentences in the same way that traditional
programs are supplied with input data.

All gates have one output terminal. Circuits, like gates, have input and output terminals.
To reason about functionality and connectivity, we do not need to talk about the
wires themselves, the paths the wires take, or the junctions where two wires come
together: it suffices to say that one output terminal is connected to another input
terminal, without having to mention the wire that actually connects them.

REPRESENTATION OF GATES:-
A gate must be distinguished from other gates by naming it with a constant: X1, X2,
and so on.
Ways to represent gates:-
Function: Type(X1) = XOR
Binary predicate: Type(X1, XOR)
Several individual type predicates: XOR(X1)
The function Type avoids the need for axioms stating that each individual gate can have
only one type.

REPRESENTATION OF TERMINALS:-
A gate or circuit can have one or more input terminals and one or more output terminals.
Each terminal could be named with a constant; thus a gate X1 could have terminals
named X1In1, X1In2 and X1Out1.
However, this would generate long compound names; it is better to name a terminal
using a function, as in In(1, X1) to denote the first input terminal of gate X1. A similar
function Out is used for output terminals.

REPRESENTATION OF CONNECTIVITY:-

The connectivity between the gates can be represented by the predicate Connected, as in
Connected(Out(1, X1), In(1, X2)).

REPRESENTATION OF SIGNALS:-
To say whether a signal is on or off, use a unary predicate On, which is true when the
signal at a terminal is on.


To answer questions like "What are all the possible values of the signals at the output terminals of
circuit C1?", introduce two signal values, 1 and 0, and a function Signal that takes a terminal as
argument and denotes the signal value at that terminal.
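As an illustration, these representation choices can be written down as ground facts in code. The sketch below is a minimal, assumed encoding (the gate names and tuple spelling of the predicates are illustrative, not from any standard library):

```python
# A toy knowledge base of ground facts for a one-gate circuit, mirroring the
# choices above: Type for gates, In/Out terms for terminals, Connected for
# wiring, Signal for terminal values. Names are illustrative.
facts = {
    ("Type", "X1", "XOR"),                             # gate X1 is an XOR gate
    ("Connected", ("Out", 1, "X1"), ("In", 1, "X2")),  # Out(1,X1) feeds In(1,X2)
    ("Signal", ("In", 1, "X1"), 1),                    # signal value at a terminal
    ("Signal", ("In", 2, "X1"), 0),
}

def gate_type(gate):
    """Look up the (unique) type of a gate, as the function Type would."""
    for f in facts:
        if f[0] == "Type" and f[1] == gate:
            return f[2]
    return None

print(gate_type("X1"))  # -> XOR
```

Using a function-style term like ("In", 1, "X1") avoids inventing a long compound constant name for every terminal, which is exactly the point made above.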

3. Discuss in detail about unification and lifting.

LIFTING:-
Generalized Modus Ponens is a lifted version of Modus Ponens: it raises Modus Ponens
from propositional to first-order logic. The key advantage of lifted inference rules over
propositionalization is that they make only those substitutions that are required to allow
particular inferences to proceed.
UNIFICATION:-
Lifted inference rules require finding substitutions that make different logical expressions
look identical. This process is called unification and is a key component of all first-order
inference algorithms. The UNIFY algorithm takes two sentences and returns a unifier for them
if one exists:
UNIFY(p, q) = θ where SUBST(θ, p) = SUBST(θ, q)

STANDARDIZING APART:-
The problem can be avoided by standardizing apart one of the two sentences being
unified, which means renaming its variables to avoid name clashes. For example, we can rename
x in Knows(x, Elizabeth) to z17 (a new variable name) without changing its meaning. Now the
unification will work:
UNIFY(Knows(John, x), Knows(z17, Elizabeth)) = {x/Elizabeth, z17/John}
MOST GENERAL UNIFIER:-
For every unifiable pair of expressions, there is a single most general unifier (MGU) that is
unique up to renaming of variables.
In this case, it is {y/John, x/z}.
Occur check:-
When matching a variable against a complex term, one must check whether the variable
itself occurs inside the term; if it does, the match fails, because no consistent unifier can be
constructed. This so-called occur check makes the complexity of the entire algorithm quadratic
in the size of the expressions being unified.

THE UNIFICATION ALGORITHM:-


function UNIFY(x, y, θ) returns a substitution to make x and y identical
  inputs: x, a variable, constant, list, or compound
          y, a variable, constant, list, or compound
          θ, the substitution built up so far (optional, defaults to empty)
  if θ = failure then return failure
  else if x = y then return θ
  else if VARIABLE?(x) then return UNIFY-VAR(x, y, θ)
  else if VARIABLE?(y) then return UNIFY-VAR(y, x, θ)
  else if COMPOUND?(x) and COMPOUND?(y) then
      return UNIFY(ARGS[x], ARGS[y], UNIFY(OP[x], OP[y], θ))
  else if LIST?(x) and LIST?(y) then
      return UNIFY(REST[x], REST[y], UNIFY(FIRST[x], FIRST[y], θ))
  else return failure

function UNIFY-VAR(var, x, θ) returns a substitution
  inputs: var, a variable
          x, any expression
          θ, the substitution built up so far
  if {var/val} ∈ θ then return UNIFY(val, x, θ)
  else if {x/val} ∈ θ then return UNIFY(var, val, θ)
  else if OCCUR-CHECK?(var, x) then return failure
  else return add {var/x} to θ
WORKING:-
Recursively explore the two expressions simultaneously, "side by side", building up a
unifier along the way; the algorithm fails if two corresponding points in the structures do not
match. The occur check is the one expensive step.
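The pseudocode above admits a compact rendering in code. The sketch below is one possible Python version, under the assumed conventions that variables are lowercase strings and compound terms are tuples; it is an illustration, not a production unifier:

```python
# A minimal first-order unifier in the spirit of the UNIFY pseudocode above.
# Variables are lowercase strings ("x", "z17"); compound terms are tuples
# like ("Knows", "John", "x"); failure is signalled by None.

def is_variable(t):
    return isinstance(t, str) and t[:1].islower()

def occurs(var, t, theta):
    """Occur check: does var appear inside term t (under substitution theta)?"""
    if t == var:
        return True
    if is_variable(t) and t in theta:
        return occurs(var, theta[t], theta)
    if isinstance(t, tuple):
        return any(occurs(var, a, theta) for a in t)
    return False

def unify(x, y, theta=None):
    """Return a most general unifier of x and y, or None on failure."""
    if theta is None:
        theta = {}
    if x == y:
        return theta
    if is_variable(x):
        return unify_var(x, y, theta)
    if is_variable(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):           # walk the two terms side by side
            theta = unify(xi, yi, theta)
            if theta is None:
                return None
        return theta
    return None

def unify_var(var, x, theta):
    if var in theta:
        return unify(theta[var], x, theta)
    if is_variable(x) and x in theta:
        return unify(var, theta[x], theta)
    if occurs(var, x, theta):              # the occur check
        return None
    new = dict(theta)
    new[var] = x
    return new

print(unify(("Knows", "John", "x"), ("Knows", "z17", "Elizabeth")))
# -> {'z17': 'John', 'x': 'Elizabeth'}
```

Note how the standardizing-apart example from above unifies cleanly once the second sentence uses the fresh variable z17.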

STORAGE AND RETRIEVAL:-


STORE(s) stores a sentence s into the knowledge base.
FETCH(q) returns all unifiers such that the query q unifies with some sentence in the
knowledge base.
Predicate indexing:-
It is a simple scheme: it puts all the Knows facts in one bucket and all the Brother facts
in another. The buckets can be stored in a hash table for efficient access.

Predicate indexing is useful when there are many predicate symbols but only a few
clauses for each symbol.
SUBSUMPTION LATTICE:-
Employs(AIMA.org, Richard): Does AIMA.org employ Richard?
Employs(x, Richard): Who employs Richard?
Employs(AIMA.org, y): Whom does AIMA.org employ?
Employs(x, y): Who employs whom?

These queries form a subsumption lattice. A sentence with repeated constants has a slightly
different lattice.
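Predicate indexing as just described can be sketched with a hash table keyed on the predicate symbol. The `match` helper below is a deliberately simplified unifier that only binds query variables against ground facts (an assumption made to keep the sketch short):

```python
# A sketch of STORE/FETCH with predicate indexing: facts are bucketed by
# predicate symbol in a dict, so FETCH only tries the relevant bucket.
from collections import defaultdict

index = defaultdict(list)  # predicate symbol -> bucket of stored facts

def store(sentence):
    """STORE(s): file a fact (a tuple) under its predicate symbol."""
    index[sentence[0]].append(sentence)

def match(query, fact, theta=None):
    """Bind lowercase variables in the query against a ground fact."""
    if len(query) != len(fact):
        return None
    theta = dict(theta or {})
    for q, f in zip(query, fact):
        if isinstance(q, str) and q[:1].islower():  # q is a variable
            if theta.get(q, f) != f:
                return None
            theta[q] = f
        elif q != f:
            return None
    return theta

def fetch(query):
    """FETCH(q): all unifiers of q against the bucket for q's predicate."""
    return [theta for fact in index[query[0]]
            if (theta := match(query, fact)) is not None]

store(("Employs", "AIMA.org", "Richard"))
store(("Employs", "ACME", "Richard"))
print(fetch(("Employs", "x", "Richard")))
# -> [{'x': 'AIMA.org'}, {'x': 'ACME'}]
```

The second employer, ACME, is a made-up fact added purely so the query returns more than one unifier.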

4. Explain in detail about forward and backward chaining with example.

Efficient forward chaining;


Incremental forward chaining;
Backward chaining;

FORWARD CHAINING:-
A forward chaining algorithm starts with the atomic sentences in the knowledge base and
applies Modus Ponens in the forward direction, adding new atomic sentences, until no further
inferences can be made.

FIRST ORDER DEFINITE CLAUSES:-


First-order definite clauses are disjunctions of literals of which exactly one is positive.
They closely resemble propositional definite clauses.
The following are first-order definite clauses:
King(x) ∧ Greedy(x) ⇒ Evil(x)
King(John)
Greedy(y)

First-order literals can include variables, in which case those variables are assumed to be
universally quantified.
DATALOG:-


The knowledge base contains no function symbols and is therefore an instance of the
class of Datalog knowledge bases, that is, sets of first-order definite clauses with no function
symbols.
SIMPLE FORWARD CHAINING ALGORITHM:-
function FOL-FC-ASK(KB, α) returns a substitution or false
  inputs: KB, the knowledge base, a set of first-order definite clauses
          α, the query, an atomic sentence
  local variables: new, the new sentences inferred on each iteration
  repeat until new is empty
    new ← {}
    for each sentence r in KB do
      (p1 ∧ ... ∧ pn ⇒ q) ← STANDARDIZE-APART(r)
      for each θ such that SUBST(θ, p1 ∧ ... ∧ pn) = SUBST(θ, p1' ∧ ... ∧ pn')
              for some p1', ..., pn' in KB
        q' ← SUBST(θ, q)
        if q' is not a renaming of some sentence already in KB or new then do
          add q' to new
          φ ← UNIFY(q', α)
          if φ is not fail then return φ
    add new to KB
  return false
WORKING:-
Starting from the known facts, it triggers all the rules whose premises are
satisfied, adding their conclusions to the known facts. The process repeats until the query is
answered or no new facts are added. Notice that a fact is not "new" if it is just a renaming of a
known fact: one sentence is a renaming of another if they are identical except for the names of
the variables.

EXAMPLE:-
The crime problem can be used to show how FOL-FC-ASK works.
The implication sentences are rules 3, 6, 7 and 8. Two iterations are required.
On the first iteration, rule 3 has unsatisfied premises.
Rule 6 is satisfied with {x/M1}, and Sells(West, M1, Nono) is added.
Rule 7 is satisfied with {x/M1}, and Weapon(M1) is added.
Rule 8 is satisfied with {x/Nono}, and Hostile(Nono) is added.
On the second iteration, rule 3 is satisfied with {x/West, y/M1, z/Nono}, and
Criminal(West) is added.
Figure: proof tree generated by forward chaining.

FIXED POINT:-
Notice that no new inferences are possible at this point, because every sentence that
could be concluded by forward chaining is already contained explicitly in the KB. Such a
knowledge base is called a fixed point of the inference process.
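The behavior just described, repeatedly firing rules until a fixed point is reached, can be sketched on a propositionalized version of the crime example (the ground instantiations below mirror the rules above; this is an illustration, not FOL-FC-ASK itself):

```python
# A propositionalized sketch of forward chaining on the crime problem.
# Each rule is (premises, conclusion) over ground facts, i.e. the rule
# instances after the substitutions from the example have been applied.
rules = [
    ({"American(West)", "Weapon(M1)", "Sells(West,M1,Nono)", "Hostile(Nono)"},
     "Criminal(West)"),                                  # rule 3, instantiated
    ({"Missile(M1)", "Owns(Nono,M1)"}, "Sells(West,M1,Nono)"),  # rule 6
    ({"Missile(M1)"}, "Weapon(M1)"),                     # rule 7
    ({"Enemy(Nono,America)"}, "Hostile(Nono)"),          # rule 8
]
facts = {"American(West)", "Missile(M1)", "Owns(Nono,M1)", "Enemy(Nono,America)"}

def forward_chain(rules, facts, query):
    """Fire satisfied rules until the query appears or a fixed point is hit."""
    while True:
        new = {concl for prem, concl in rules
               if prem <= facts and concl not in facts}
        if query in new:
            return True
        if not new:                 # fixed point: nothing new can be inferred
            return query in facts
        facts |= new

print(forward_chain(rules, set(facts), "Criminal(West)"))  # -> True
```

As in the worked example, the first pass adds Sells, Weapon and Hostile, and the second pass fires rule 3 to conclude Criminal(West).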

BACKWARD CHAINING:-
These algorithms work backward from the goal, chaining through rules to find
known facts that support the proof.
BACKWARD CHAINING ALGORITHM:-
function FOL-BC-ASK(KB, goals, θ) returns a set of substitutions
  inputs: KB, a knowledge base
          goals, a list of conjuncts forming a query (θ already applied)
          θ, the current substitution, initially the empty substitution {}
  local variables: answers, a set of substitutions, initially empty
  if goals is empty then return {θ}
  q' ← SUBST(θ, FIRST(goals))
  for each sentence r in KB where STANDARDIZE-APART(r) = (p1 ∧ ... ∧ pn ⇒ q)
          and θ' ← UNIFY(q, q') succeeds
    new_goals ← [p1, ..., pn | REST(goals)]
    answers ← FOL-BC-ASK(KB, new_goals, COMPOSE(θ', θ)) ∪ answers
  return answers
WORKING:-
The FOL-BC-ASK algorithm is called with a list of goals containing a single element,
the original query, and returns the set of all substitutions satisfying the query.
The algorithm takes the first goal in the list and finds every clause in the knowledge
base whose positive literal, or head, unifies with the goal.

Each such clause creates a new recursive call in which the premise, or body, of the
clause is added to the goal stack.

TO PROVE THAT WEST IS A CRIMINAL THE FOLLOWING STEPS


ARE FOLLOWED:-
(i) The tree should be read depth-first, left to right.
(ii) To prove Criminal(West), prove the four conjuncts below it.
(iii) Some of these are in the knowledge base, and others require further
backward chaining.
(iv) Bindings for each successful unification are shown next to the corresponding
subgoal.
(v) Thus, by the time FOL-BC-ASK gets to the last conjunct, originally Hostile(z),
z is already bound to Nono.

DISADVANTAGES:-
(i) Since it is clearly a depth-first search algorithm, its space requirements are linear
in the size of the proof.
(ii) It suffers from problems with repeated states and incompleteness.
LOGIC PROGRAMMING:-
Logic programming is a technology that comes fairly close to the declarative ideal: that
systems should be constructed by expressing knowledge in a formal language.
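The goal-directed behavior described above can be sketched for the propositional case (variables and substitutions are omitted for brevity, and a visited set guards against the repeated-state problem just noted):

```python
# A sketch of propositional backward chaining over definite clauses,
# using the crime example: rules map a head to a list of alternative bodies.
rules = {
    "Criminal(West)": [["American(West)", "Weapon(M1)",
                        "Sells(West,M1,Nono)", "Hostile(Nono)"]],
    "Sells(West,M1,Nono)": [["Missile(M1)", "Owns(Nono,M1)"]],
    "Weapon(M1)": [["Missile(M1)"]],
    "Hostile(Nono)": [["Enemy(Nono,America)"]],
}
facts = {"American(West)", "Missile(M1)", "Owns(Nono,M1)", "Enemy(Nono,America)"}

def bc_ask(goal, visited=frozenset()):
    """Prove goal by working backward through rule bodies to known facts."""
    if goal in facts:
        return True
    if goal in visited:            # guard against repeated states (loops)
        return False
    return any(all(bc_ask(g, visited | {goal}) for g in body)
               for body in rules.get(goal, []))

print(bc_ask("Criminal(West)"))  # -> True
```

Each recursive call corresponds to replacing a goal by the body of a clause whose head matches it, exactly as in the proof tree for Criminal(West).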

EFFICIENT FORWARD CHAINING:-


The forward chaining algorithm above was designed for ease of understanding rather than
for efficiency of operation. There are three possible sources of complexity.

First, the "inner loop" of the algorithm involves finding all possible unifiers such that
the premise of a rule unifies with a suitable set of facts in the knowledge base. This is often
called pattern matching and can be very expensive.
Second, the algorithm rechecks every rule on every iteration to see whether its
premises are satisfied, even if very few additions are made to the knowledge base on each
iteration. Finally, the algorithm might generate many facts that are irrelevant to the goal.

5. What is resolution? Explain it in detail.


The resolution inference rule.
RESOLUTION
Completeness Theorem:-
For first-order logic: any entailed sentence has a finite proof.
Incompleteness Theorem:-
The theorem states that a logical system that includes the principle of induction, without
which very little of discrete mathematics can be constructed, is necessarily incomplete.
Conjunctive Normal Form for First-Order Logic (CNF):-
That is, a conjunction of clauses, where each clause is a disjunction of literals; literals
can contain variables, which are assumed to be universally quantified. For example,
∀x American(x) ∧ Weapon(y) ∧ Sells(x, y, z) ∧ Hostile(z) ⇒ Criminal(x)
becomes, in CNF,
¬American(x) ∨ ¬Weapon(y) ∨ ¬Sells(x, y, z) ∨ ¬Hostile(z) ∨ Criminal(x)
Every sentence of first-order logic can be converted into an inferentially equivalent CNF
sentence.
The procedure for conversion to CNF is very similar to the propositional case and is given
below. We will illustrate the procedure by translating the sentence "Everyone who loves all
animals is loved by someone", or
∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]
THE STEPS ARE:-
Eliminate implications:-
∀x ¬[∀y ¬Animal(y) ∨ Loves(x, y)] ∨ [∃y Loves(y, x)]
Move ¬ inwards:-
In addition to the usual rules for negated connectives, we need rules for negated
quantifiers. Thus we have:

¬∀x p becomes ∃x ¬p
¬∃x p becomes ∀x ¬p

Our sentence goes through the following transformations:
∀x [∃y ¬(¬Animal(y) ∨ Loves(x, y))] ∨ [∃y Loves(y, x)]
∀x [∃y ¬¬Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃y Loves(y, x)]
Standardize variables:-
For sentences like (∀x P(x)) ∨ (∃x Q(x)) that use the same variable name twice,
change the name of one of the variables. Thus we have:
∀x [∃y Animal(y) ∧ ¬Loves(x, y)] ∨ [∃z Loves(z, x)]
Skolemize:-
Skolemization is the process of removing existential quantifiers by elimination.
In the simple case, it is just like the Existential Instantiation rule: translate ∃x P(x) into
P(A), where A is a new constant.
If we apply this rule directly, we get
∀x [Animal(A) ∧ ¬Loves(x, A)] ∨ Loves(B, x)
which has the wrong meaning entirely.
Thus we want the Skolem entities to depend on x:
∀x [Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)
Here F and G are Skolem functions.
The general rule is that the arguments of the Skolem function are all the
universally quantified variables in whose scope the existential quantifier
appears.

Drop universal quantifiers:-

At this point, all remaining variables must be universally quantified. Moreover,
the sentence is equivalent to one in which all the universal quantifiers have been moved to
the left. We can therefore drop the universal quantifiers:
[Animal(F(x)) ∧ ¬Loves(x, F(x))] ∨ Loves(G(x), x)
Distributing ∨ over ∧ then gives the CNF clauses:
[Animal(F(x)) ∨ Loves(G(x), x)] ∧ [¬Loves(x, F(x)) ∨ Loves(G(x), x)]
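The first two conversion steps can be sketched in code over a tiny tuple-based formula representation; quantifier handling, standardization and Skolemization are omitted, so this is only a partial, illustrative rendering of the procedure:

```python
# A sketch of "eliminate implications" and "move negation inwards" on a
# tuple-based formula representation: atoms are strings, connectives are
# tuples like ("=>", p, q), ("and", p, q), ("or", p, q), ("not", p).

def eliminate_implications(f):
    """Rewrite (p => q) as (~p | q), recursively."""
    if isinstance(f, str):
        return f
    op, *args = f
    args = [eliminate_implications(a) for a in args]
    if op == "=>":
        return ("or", ("not", args[0]), args[1])
    return (op, *args)

def move_not_inwards(f):
    """Push negations down to literals via double negation and De Morgan."""
    if isinstance(f, str):
        return f
    op, *args = f
    if op == "not":
        (g,) = args
        if isinstance(g, tuple):
            gop, *gargs = g
            if gop == "not":                    # ~~p  ->  p
                return move_not_inwards(gargs[0])
            if gop == "and":                    # ~(p & q) -> ~p | ~q
                return ("or", *[move_not_inwards(("not", a)) for a in gargs])
            if gop == "or":                     # ~(p | q) -> ~p & ~q
                return ("and", *[move_not_inwards(("not", a)) for a in gargs])
        return ("not", g)
    return (op, *[move_not_inwards(a) for a in args])

f = ("=>", ("and", "King(x)", "Greedy(x)"), "Evil(x)")
g = move_not_inwards(eliminate_implications(f))
print(g)
# -> ('or', ('or', ('not', 'King(x)'), ('not', 'Greedy(x)')), 'Evil(x)')
```

The example formula is the King/Greedy definite clause from the forward-chaining section; the output is its clause form, ¬King(x) ∨ ¬Greedy(x) ∨ Evil(x).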
LEARNING HEURISTIC FROM EXPERIENCE:-
A heuristic function h(n) is supposed to estimate the cost of a solution beginning
from the state at node n.

Learning can be done in two ways:-

A relaxed problem can be devised for which an optimal solution can be found
easily; the other way is to learn from experience.
Inductive learning:-
Inductive learning algorithms can be used to construct a function h(n) that can
predict solution costs for other states that arise during search.
The resolution inference rule:-
The resolution rule for first-order clauses is simply a lifted version of the
propositional resolution rule. Propositional literals are complementary if one is the
negation of the other; first-order literals are complementary if one unifies with the
negation of the other. Thus we have:

l1 ∨ ... ∨ lk,   m1 ∨ ... ∨ mn
----------------------------------------------------------------
SUBST(θ, l1 ∨ ... ∨ li-1 ∨ li+1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mj-1 ∨ mj+1 ∨ ... ∨ mn)

where UNIFY(li, ¬mj) = θ.
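The propositional core of this rule can be sketched as follows, with clauses as sets of literals; the lifted first-order rule additionally applies the unifier θ, which is omitted here for brevity:

```python
# A sketch of the (propositional) resolution step: clauses are frozensets of
# literal strings, with "~" marking negation. Resolving two clauses on a
# complementary pair of literals yields the resolvent.
def resolve(c1, c2):
    """Return all resolvents of clauses c1 and c2."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            # drop the complementary pair, union the rest
            out.append((c1 - {lit}) | (c2 - {comp}))
    return out

c1 = frozenset({"~P", "Q"})   # the clause form of P => Q
c2 = frozenset({"P"})
print(resolve(c1, c2))        # -> [frozenset({'Q'})]
```

Resolving the clause form of P ⇒ Q with the unit clause P yields Q, which is just Modus Ponens seen as a special case of resolution.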
Completeness of Resolution:-
Resolution is refutation-complete. The basic structure of the completeness proof is given below:
1. First, we observe that if S is unsatisfiable, then there exists a particular set
of ground instances of the clauses of S such that this set is also
unsatisfiable (Herbrand's theorem).
2. We then appeal to the ground resolution theorem.
3. We then use a lifting lemma.

Structure of a completeness proof for resolution:

Any set of sentences S is representable in clausal form
        |  (assume S is unsatisfiable, and in clausal form)
        v
Herbrand's theorem: some set S' of ground instances is unsatisfiable
        |  (ground resolution theorem)
        v
Resolution can find a contradiction in S'
        |  (lifting lemma)
        v
There is a resolution proof for the contradiction in S

To carry out the first step, we need three new concepts:-
1. Herbrand universe
2. Saturation
3. Herbrand base

QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE


UNIT – III

PART -A (2 Marks)

PLANNING
1. Define partial order planner.
Basic Idea
– Search in plan space and use least commitment, when possible
• Plan Space Search
– Search space is set of partial plans
– Plan is tuple <A, O, B>
• A: Set of actions, of the form (ai : Opj)
• O: Set of orderings, of the form (ai < aj)
• B: Set of bindings, of the form (vi = C), (vi ¹ C), (vi = vj) or (vi ¹ vj)
– Initial plan:
• <{start, finish}, {start < finish}, {}>
• start has no preconditions; Its effects are the initial state
• finish has no effects; Its preconditions are the goals
2. What are the differences and similarities between problem solving and
planning?
We put these two ideas together to build planning agents. At the most abstract level, the task of
planning is the same as problem solving. Planning can be viewed as a type of problem solving in
which the agent uses beliefs about actions and their consequences to search for a solution over the
more abstract space of plans, rather than over the space of situations.
3. Define state-space search.
The most straightforward approach is to use state-space search. Because the
descriptions of actions in a planning problem specify both preconditions and effects, it is
possible to search in either direction: either forward from the initial state or backward from the
goal.
4. What are the types of state-space search?
The types of state-space search are,
Forward state space search;
Backward state space search.
5.What is Partial-Order Planning?
A set of actions that make up the steps of the plan. These are taken from the set of
actions in the planning problem. The “empty” plan contains just the Start and Finish actions.

Start has no preconditions and has as its effect all the literals in the initial state of the planning
problem. Finish has no effects and has as its preconditions the goal literals of the planning
problem.
6. What are the advantages and disadvantages of Partial-Order Planning?
Advantage: Partial-order planning has a clear advantage in being
able to decompose problems into sub problems.
Disadvantage: Disadvantage is that it does not represent states
directly, so it is harder to estimate how far a partial-order plan is
from achieving a goal.
7. What is a Planning graph?
A Planning graph consists of a sequence of levels that correspond to time steps in
the plan where level 0 is the initial state. Each level contains a set of literals and a set of actions.
8. What is Conditional planning?
Conditional planning is also known as contingency planning; it deals with
incomplete information by constructing a conditional plan that accounts for each
possible situation or contingency that could arise.
9. What is action monitoring?
The process of checking the preconditions of each action as it is executed, rather
than checking the preconditions of the entire remaining plan. This is called action monitoring.
10. Define planning.
Planning can be viewed as a type of problem solving in which the agent uses
beliefs about actions and their consequences to search for a solution.
11. List the features of an ideal planner?
The features of an ideal planner are,
The planner should be able to represent the states, goals and
actions;
The planner should be able to add new actions at any time;
The planner should be able to use Divide and Conquer method for
solving very big problems.

12. What are the components that are needed for representing an action?
The components that are needed for representing an action are,
Action description;
Precondition;
Effect.
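These three components can be sketched as a simple data structure (a minimal STRIPS-style illustration; delete lists are omitted, and the `Go` action is a made-up example, not from the syllabus):

```python
# A minimal action representation: description, precondition, effect.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    name: str              # action description, e.g. "Go(Home, Office)"
    precond: frozenset     # literals that must hold before execution
    effect: frozenset      # literals made true (add list only, for brevity)

    def applicable(self, state):
        return self.precond <= state

    def apply(self, state):
        # a full STRIPS action would also remove a delete list here
        return state | self.effect

go = Action("Go(Home, Office)",
            frozenset({"At(Home)"}),
            frozenset({"At(Office)"}))
state = frozenset({"At(Home)"})
print(go.applicable(state))  # -> True
```

Checking `precond <= state` before applying an action is exactly the precondition test that action monitoring (question 9) performs at execution time.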
13. What are the components that are needed for representing a plan?
The components that are needed for representing a plan are,
A set of plans steps;
A set of ordering constraints;
A set of variable binding constraints;

A set of causal link protections.


14. What are the different types of planning?
The different types of planning are,
Situation space planning;
Progressive planning;
Regressive planning;
Partial order planning;
Fully instantiated planning.
15. Define a solution.
A solution is defined as a plan that an agent can execute and that guarantees the
achievement of goal.
16. Define complete plan and consistent plan.
A complete plan is one in which every precondition of every step is achieved by
some other step.
A consistent plan is one in which there are no contradictions in the ordering or
binding constraints.
17. What are Forward state-space search and Backward state-space search?
Forward state-space search: It searches forward from the initial
situation to the goal situation.
Backward state-space search: It searches backward from the goal
situation to the initial situation.
18. What is Induction heuristics? What are the different types of induction heuristics?
Induction heuristics is a method, which enable procedures to learn descriptions
from positive and negative examples.
There are two different types of induction heuristics. They are:
Require-link heuristics.
Forbid-link heuristics.
19. Define Reification.
The process of treating something abstract and difficult to talk about as though it
were concrete and easy to talk about is called as reification.
20. What is reified link?
The elevation of a link to the status of a describable node is a kind of reification.
When a link is so elevated then it is said to be a reified link.
21. Define action monitoring.
The process of checking the preconditions of each action as it is executed, rather
than checking the preconditions of the entire remaining plan. This is called action monitoring.
22. What is meant by Execution monitoring?
Execution monitoring is related to conditional planning in the following way. An
agent that builds a plan and then executes it while watching for errors is, in a sense, taking into
account the possible conditions that constitute execution errors.


PART - B

1. Explain partial order planning.


SIMPLE PLANNING AGENT
The agent first generates a goal to achieve and then constructs a plan to achieve it from the
current state.

PROBLEMSOLVING TO PLANNING

Representation Using Problem Solving Approach

Forward search

Backward search

Heuristic search

Representation Using Planning Approach

STRIPS: the Stanford Research Institute Problem Solver.

Representation for states and goals

Representation for plans

Situation space and plan space

Solutions

Why Planning ?

Intelligent agents must operate in the world. They are not simply passive reasoners (knowledge
representation, reasoning under uncertainty) or problem solvers (search); they must also act on
the world.
We want intelligent agents to act in "intelligent ways": taking purposeful actions, predicting the
expected effects of such actions, and composing actions together to achieve complex goals.
E.g., if we have a robot, we want the robot to decide what to do and how to act to achieve our goals.


Planning Problem
How to change the world to suit our needs.
Critical issue: we need to reason about what the world will be like after doing a few actions, not
just what it is like now.
GOAL: Craig has coffee.
CURRENTLY: robot in mailroom, has no coffee, coffee not made, Craig in office, etc.
TO DO: go to lounge, make coffee.
PARTIAL ORDER PLANNING: Partial-Order Planning Algorithms
Partially Ordered Plan
• Plan
• Steps
• Ordering constraints
• Variable binding constraints
• Causal links
• POP Algorithm
• Make initial plan
• Loop until the plan is complete
– Select a subgoal
– Choose an operator
– Resolve threats
Choose Operator
• Choose-operator(c, Sneeds)

• Choose a step S from the plan, or a new step S, by instantiating an operator that has
c as an effect
• If there is no such step, Fail
• Add causal link S →c Sneeds
• Add ordering constraint S < Sneeds
• Add variable binding constraints if necessary
• Add S to steps if necessary
Nondeterministic choice
• Choose – pick one of the options arbitrarily
• Fail – go back to most recent non-deterministic choice and
try a different one that has not been tried before
Resolve Threats
• A step S threatens a causal link Si →c Sj iff ¬c ∈ effects(S) and it is possible that Si < S < Sj
• For each threat
• Choose
– Promote S: S < Si < Sj
– Demote S: Si < Sj < S
• If the resulting plan is inconsistent, then Fail
Threats with Variables
If c has variables in it, things are kind of tricky.
• S is a threat if there is any instantiation of the variables that makes ¬c ∈ effects(S)
• We could possibly resolve the threat by adding a negative variable binding constraint,
saying that two variables, or a variable and a constant, cannot be bound to one another


• Another strategy is to ignore such threats until the very end, hoping that the variables will
become bound and make things easier to deal with.
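One concrete piece of the POP loop, checking that the ordering constraints O remain consistent after promotion or demotion, can be sketched as a cycle check over the ordering graph (an illustrative helper only, not the full planner):

```python
# The orderings (ai < aj) of a partial plan must form a DAG; promotion and
# demotion add edges, and an inconsistent (cyclic) result means Fail.
def consistent(orderings):
    """Return True iff the set of (before, after) pairs has no cycle."""
    graph = {}
    for a, b in orderings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set())
    state = {}  # node -> "visiting" | "done"
    def dfs(n):
        if state.get(n) == "visiting":
            return False            # back edge: cycle found
        if state.get(n) == "done":
            return True
        state[n] = "visiting"
        ok = all(dfs(m) for m in graph[n])
        state[n] = "done"
        return ok
    return all(dfs(n) for n in graph)

plan = {("start", "S1"), ("S1", "finish"), ("start", "finish")}
print(consistent(plan))                       # -> True
print(consistent(plan | {("finish", "S1")}))  # -> False (S1 < finish < S1)
```

A promotion or demotion that creates a cycle like the second case is exactly the "resulting plan is inconsistent, then Fail" branch of the algorithm above.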
2. Discuss about planning graphs in detail.
Planning graphs for heuristic estimation;
The GRAPHPLAN algorithm;
Termination of GRAPHPLAN.

3. Explain planning with State-Space Search in detail.


LIFTING LEMMA:-
It lifts a proof step from ground clauses up to general first-order clauses. In order to
prove this basic lifting lemma, Robinson had to invent unification and derive all of the
properties of most general unifiers.
Dealing with Equality:-
∀x x = x
∀x,y x = y ⇒ y = x
∀x,y,z x = y ∧ y = z ⇒ x = z
∀x,y x = y ⇒ (P1(x) ⇔ P1(y))
∀x,y x = y ⇒ (P2(x) ⇔ P2(y))
∀w,x,y,z w = y ∧ x = z ⇒ (F1(w,x) = F1(y,z))
∀w,x,y,z w = y ∧ x = z ⇒ (F2(w,x) = F2(y,z))
Demodulation:-
For any terms x, y, and z, where UNIFY(x, z) = θ and m1 ∨ ... ∨ mn[z] is a clause
containing the term z:

x = y,   m1 ∨ ... ∨ mn[z]
----------------------------------
m1 ∨ ... ∨ mn[SUBST(θ, y)]

Paramodulation:-
For any terms x, y, and z, where UNIFY(x, z) = θ:

l1 ∨ ... ∨ lk ∨ x = y,   m1 ∨ ... ∨ mn[z]
----------------------------------------------------
SUBST(θ, l1 ∨ ... ∨ lk ∨ m1 ∨ ... ∨ mn[y])
Equational Unification:-
Equational unification of this kind can be done with efficient algorithms designed for
the particular axioms used.
Resolution Strategies:-
Unit preference
Set of support
Input resolution
Subsumption

THEOREM PROVERS:-
We describe the theorem prover OTTER (Organized Techniques for Theorem-proving
and Effective Research). In preparing a problem for OTTER, the user must divide the
knowledge into four parts:
A set of clauses known as the set of support;
A set of usable axioms;
A set of equations known as rewrites or demodulators;
A set of parameters and clauses that define the control strategy.
SKETCH OF THE OTTER THEOREM PROVER:-
procedure OTTER(sos, usable)
  inputs: sos, a set of support: clauses defining the problem
          usable, background knowledge potentially relevant to the problem
  repeat
    clause ← the lightest member of sos
    move clause from sos to usable
    PROCESS(INFER(clause, usable), sos)
  until sos = [] or a refutation has been found

function INFER(clause, usable) returns clauses
  resolve clause with each member of usable
  return the resulting clauses after applying FILTER

procedure PROCESS(clauses, sos)
  for each clause in clauses do
    clause ← SIMPLIFY(clause)
    merge identical literals
    discard clause if it is a tautology
    sos ← [clause | sos]
    if clause has no literals then a refutation has been found
    if clause has one literal then look for a unit refutation
Extending Prolog:-
An alternative way to build a theorem prover is to start with a Prolog compiler and
extend it to get a sound and complete reasoner.
Theorem Provers as Assistants:-
Proof-checker
Socratic reasoner

(1) What is backward chaining? Explain with an example.


Forward chaining applies a set of rules and facts to deduce whatever conclusions can be derived.

In backward chaining, we start from a conclusion, which is the hypothesis we wish to prove, and we
aim to show how that conclusion can be reached from the rules and facts in the database.

The conclusion we are aiming to prove is called a goal, and reasoning in this way is known as
goal-driven reasoning.

Backward chaining example

Fig: Proof tree constructed by backward chaining to prove that West is a criminal.

Note:

(a) To prove Criminal(West), we have to prove the four conjuncts below it.
(b) Some of these are in the knowledge base, and others require further backward chaining.
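The proof strategy above can be sketched in code. Below is a minimal Python backward chainer over ground Horn clauses; the rule and fact strings are illustrative simplifications of the West example (a full backward chainer would also perform unification over variables):

```python
# A minimal backward-chaining sketch over ground Horn rules.
# Rule and fact names are illustrative, not a complete encoding.

def backward_chain(goal, rules, facts, seen=None):
    """Prove goal by finding a rule whose head matches it and
    recursively proving every premise (the conjuncts below it)."""
    if seen is None:
        seen = set()
    if goal in facts:            # goal is directly in the knowledge base
        return True
    if goal in seen:             # avoid looping on cyclic rules
        return False
    seen = seen | {goal}
    for head, premises in rules:
        if head == goal and all(
                backward_chain(p, rules, facts, seen) for p in premises):
            return True
    return False

# Toy version of the "West is a criminal" proof tree:
rules = [
    ("Criminal(West)", ["American(West)", "Weapon(M1)",
                        "Sells(West,M1,Nono)", "Hostile(Nono)"]),
    ("Weapon(M1)", ["Missile(M1)"]),
    ("Hostile(Nono)", ["Enemy(Nono,America)"]),
    ("Sells(West,M1,Nono)", ["Missile(M1)", "Owns(Nono,M1)"]),
]
facts = {"American(West)", "Missile(M1)",
         "Owns(Nono,M1)", "Enemy(Nono,America)"}

print(backward_chain("Criminal(West)", rules, facts))  # → True
```

Each recursive call corresponds to one level of the proof tree: facts close a branch immediately, while rule heads expand into their conjuncts.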

(2) Explain conjunctive normal form for first-order logic with an example.
Every sentence of first-order logic can be converted into an inferentially equivalent CNF
sentence. In particular, the CNF sentence will be unsatisfiable just when the original sentence
is unsatisfiable, so we have a basis for doing proofs by contradiction on the CNF sentences.
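As a simplified illustration of the rewriting involved, here is a propositional-only CNF converter in Python. The first-order procedure additionally standardizes variables apart and Skolemizes existentials; the tuple encoding of formulas is an assumption of this sketch:

```python
# Minimal propositional CNF conversion (a simplified stand-in for the
# first-order procedure, which additionally handles quantifiers).
# Formulas are nested tuples: ('=>', a, b), ('and', a, b), ('or', a, b),
# ('not', a), or an atom string.

def eliminate_implications(f):
    if isinstance(f, str):
        return f
    op, *args = f
    args = [eliminate_implications(a) for a in args]
    if op == '=>':                       # a => b  becomes  ~a | b
        return ('or', ('not', args[0]), args[1])
    return (op, *args)

def push_negations(f):
    if isinstance(f, str):
        return f
    op, *args = f
    if op == 'not':
        g = args[0]
        if isinstance(g, str):
            return f
        gop, *gargs = g
        if gop == 'not':                 # double negation
            return push_negations(gargs[0])
        if gop == 'and':                 # De Morgan
            return ('or', push_negations(('not', gargs[0])),
                          push_negations(('not', gargs[1])))
        if gop == 'or':
            return ('and', push_negations(('not', gargs[0])),
                           push_negations(('not', gargs[1])))
    return (op, *[push_negations(a) for a in args])

def distribute(f):
    if isinstance(f, str) or f[0] == 'not':
        return f
    op, a, b = f
    a, b = distribute(a), distribute(b)
    if op == 'or':
        if not isinstance(a, str) and a[0] == 'and':   # (x&y)|b
            return ('and', distribute(('or', a[1], b)),
                           distribute(('or', a[2], b)))
        if not isinstance(b, str) and b[0] == 'and':   # a|(x&y)
            return ('and', distribute(('or', a, b[1])),
                           distribute(('or', a, b[2])))
    return (op, a, b)

def to_cnf(f):
    return distribute(push_negations(eliminate_implications(f)))

# (P => Q) => R  rewrites to  (P | R) & (~Q | R)
print(to_cnf(('=>', ('=>', 'P', 'Q'), 'R')))
```

The three passes mirror the standard steps: eliminate implications, move negation inward, then distribute ∨ over ∧.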

Here we have to eliminate existential quantifiers. We will illustrate the procedure by translating the
sentence "Everyone who loves all animals is loved by someone," or
∀x [∀y Animal(y) ⇒ Loves(x, y)] ⇒ [∃y Loves(y, x)]

What is ontological engineering?

Ontology refers to organizing everything in the world into a hierarchy of categories.
Representing abstract concepts such as Actions, Time, Physical Objects, and Beliefs is called
Ontological Engineering.
How are categories useful in knowledge representation?


CATEGORIES AND OBJECTS
The organization of objects into categories is a vital part of knowledge representation. Although

interaction with the world takes place at the level of individual objects, much reasoning
takes place at the level of categories.
What is taxonomy?
Subclass relations organize categories into a taxonomy, or taxonomic hierarchy. Taxonomies
have been used explicitly for centuries in technical fields. For example, systematic
biology aims to provide a taxonomy of all living and extinct species; library science has
developed a taxonomy of all fields of knowledge, encoded as the Dewey Decimal system;
and
tax authorities and other government departments have developed extensive taxonomies of
occupations and commercial products. Taxonomies are also an important aspect of general
commonsense knowledge.
First-order logic makes it easy to state facts about categories, either by relating objects
to categories or by quantifying over their members:
What is physical composition?

Explain the Ontology of Situation calculus.


Situations are logical terms consisting of the initial situation (usually called S0) and
all situations that are generated by applying an action to a situation. The function
Result(a, s) (sometimes called Do) names the situation that results when action a is
executed in situation s. Figure 10.2 illustrates this idea.
Fluents are functions and predicates that vary from one situation to the next, such as
the location of the agent or the aliveness of the wumpus. The dictionary says a fluent
is something that flows, like a liquid. In this use, it means flowing or changing across
situations. By convention, the situation is always the last argument of a fluent. For
example, ¬Holding(G1, S0) says that the agent is not holding the gold G1 in the initial
situation S0. Age(Wumpus, S0) refers to the wumpus's age in S0.
Atemporal or eternal predicates and functions are also allowed. Examples include the
predicate Gold(G1) and the function LeftLegOf(Wumpus).
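A toy Python rendering of these ideas, where a situation is modeled as the sequence of actions applied to S0 and a fluent is evaluated against a situation (the action and fluent names here are assumptions for illustration):

```python
# Illustrative model of situation-calculus terms (names assumed):
# a situation is the initial situation S0 plus the actions applied to it.

S0 = ()                                  # initial situation, no actions yet

def result(action, situation):
    """Result(a, s): the situation after executing action a in s."""
    return situation + (action,)

def holding(obj, situation):
    """Fluent: true if obj was grabbed and not released afterwards."""
    state = False
    for action in situation:
        if action == ("Grab", obj):
            state = True
        elif action == ("Release", obj):
            state = False
    return state

print(holding("Gold", S0))               # False: ¬Holding(Gold, S0)
s1 = result(("Grab", "Gold"), S0)
print(holding("Gold", s1))               # True
```

The situation is effectively the last argument of the fluent, matching the convention stated above.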

(3) What is event calculus?


Time and event calculus
Situation calculus works well when there is a single agent performing instantaneous, discrete
actions. When actions have duration and can overlap with each other, situation calculus
becomes somewhat awkward. Therefore, we will cover those topics with an alternative
formalism known as event calculus, which is based on points in time rather than on
situations.
(The terms "event" and "action" may be used interchangeably. Informally, "event" connotes
a wider class of actions, including ones with no explicit agent. These are easier to handle in
event calculus than in situation calculus.)


In event calculus, fluents hold at points in time rather than at situations, and the calculus
is designed to allow reasoning over intervals of time. The event calculus axiom says that a
fluent is true at a point in time if the fluent was initiated by an event at some time in the past
and was not terminated by an intervening event. The Initiates and Terminates relations
play a role similar to the Result relation in situation calculus; Initiates(e, f, t) means that
the occurrence of event e at time t causes fluent f to become true, while Terminates(e, f, t)
means that f ceases to be true. We use Happens(e, t) to mean that event e happens at time t.
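The axiom just described — a fluent holds at time t if it was initiated by an earlier event and not terminated (clipped) in between — can be sketched as follows; the event names and list-based representation are assumptions of this sketch:

```python
# Sketch of the event-calculus test "fluent f holds at time t".

def holds_at(fluent, t, initiates, terminates):
    """initiates/terminates: lists of (event, fluent, time) facts."""
    starts = [ti for (_, f, ti) in initiates if f == fluent and ti < t]
    if not starts:
        return False
    last_start = max(starts)
    # Clipped(t', f, t): some terminating event between t' and t
    return not any(f == fluent and last_start <= ti < t
                   for (_, f, ti) in terminates)

initiates = [("TurnOn", "LightOn", 1)]
terminates = [("TurnOff", "LightOn", 5)]
print(holds_at("LightOn", 3, initiates, terminates))   # True
print(holds_at("LightOn", 7, initiates, terminates))   # False
```

The Initiates and Terminates fact lists play the role of the relations of the same name in the text.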

(4) What are semantic networks?


Semantic networks are capable of representing individual objects, categories of
objects, and relations among objects. Objects or category names are represented in ovals
and are connected by labeled arcs.
Semantic network example
QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE


UNIT – IV

PART -A (2 Marks)

UNCERTAIN KNOWLEDGE AND REASONING

1. List down two applications of temporal probabilistic models.


Two example applications are medical patient monitoring and speech recognition. A
suitable way to deal with such problems is to identify a temporal causal model that may
effectively explain the patterns observed in the data. Here we will concentrate on
probabilistic models that provide a convenient framework to represent and manage
underspecified information; in particular, we will consider the class of Causal Probabilistic
Networks (CPN).

2. Define Dempster-Shafer theory.


The Dempster–Shafer theory (DST) is a mathematical theory of evidence. It allows one to
combine evidence from different sources and arrive at a degree of belief (represented by a
belief function) that takes into account all the available evidence. The theory was first
developed by Arthur P. Dempster and Glenn Shafer.

3. Define Uncertainty.
Uncertainty means that many of the simplifications that are possible with deductive
inference are no longer valid.

4. State the reason why first order, logic fails to cope with that the mind like medical
diagnosis.
Three reasons:
Laziness: It is too much work to list the complete set of antecedents or consequents
needed to ensure an exceptionless rule.
Theoretical Ignorance: Medical science has no complete theory for the
domain.
Practical ignorance: Even if we know all the rules, we may be uncertain
about a particular item needed.

5. What is the need for probability theory in uncertainty?


Probability provides a way of summarizing the uncertainty that comes from our
laziness and ignorance. Probability statements are made with respect to a state of
knowledge known as the evidence.

6. What is the need for utility theory in uncertainty?
Utility theory says that every state has a degree of usefulness, or utility, to an agent,
and that the agent will prefer states with higher utility. We use utility theory to represent
and reason with preferences.
7. What is called as decision theory?
Preferences, as expressed by utilities, are combined with probabilities in the
general theory of rational decisions called decision theory. Decision Theory =
Probability Theory + Utility Theory.
8. Define conditional probability?
Once the agent has obtained some evidence concerning the previously unknown
propositions making up the domain, conditional or posterior probabilities, written with the
notation P(A|B), are used. It is important to note that P(A|B) can only be used when all we know is B.

9. When probability distribution is used?


If we want the probabilities of all the possible values of a random variable, a
probability distribution is used.
Eg:
P(Weather) = (0.7, 0.2, 0.08, 0.02). This type of notation simplifies many equations.
10. What is an atomic event?
An atomic event is an assignment of particular values to all variables, in other
words, the complete specifications of the state of domain.
11. Define joint probability distribution.
The joint probability distribution completely specifies an agent's probability
assignments to all propositions in the domain. The joint probability distribution
P(X1, X2, ..., Xn) assigns probabilities to all possible atomic events, where X1, X2, ..., Xn
are the variables.

12. What is meant by belief network?


A belief network is a graph in which the following holds:
A set of random variables forms the nodes
A set of directed links or arrows connects pairs of nodes
There is a conditional probability table for each node
The graph has no directed cycles

13. What are called as Poly trees?


Networks in which there is at most one undirected path between any two nodes are
known as polytrees; efficient inference algorithms exist that work only on such singly
connected networks.

14. What is a multiple connected graph?


A multiple connected graph is one in which two nodes are connected by more than
one path.

15. List the three basic classes of algorithms for evaluating multiply connected graphs.
The three basic classes of algorithms for evaluating multiply connected graphs are:
Clustering methods;
Conditioning methods;
Stochastic simulation methods.
16. What is called as principle of Maximum Expected Utility (MEU)?
The basic idea is that an agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the possible outcomes of the action.
This is
known as MEU
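A minimal numeric illustration of the MEU principle; the actions, probabilities, and utilities below are made up:

```python
# Pick the action whose probability-weighted utility over outcomes is
# highest (numbers are illustrative, not from the text).

def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)

actions = {
    "take_umbrella": [(0.3, 60), (0.7, 70)],    # rain / no rain
    "leave_umbrella": [(0.3, 0), (0.7, 100)],
}
best = max(actions, key=lambda a: expected_utility(actions[a]))
print(best, expected_utility(actions[best]))
```

Here EU(take_umbrella) = 0.3·60 + 0.7·70 = 67 and EU(leave_umbrella) = 70, so a rational agent leaves the umbrella.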

17. What is meant by deterministic nodes?


A deterministic node has its value specified exactly by the values of its parents, with
no uncertainty.

18. What are all the uses of a belief network?


The uses of a belief network are,
Making decisions based on probabilities in the network and on the
agent's utilities;
Deciding which additional evidence variables should be observed in
order to gain useful information;
Performing sensitivity analysis to understand which aspects of the
model have the greatest impact on the probabilities of the query
variables (and therefore must be accurate);
Explaining the results of probabilistic inference to the user.

19. What is called as Markov Decision problem?


The problem of calculating an optimal policy in an accessible, stochastic
environment with a known transition model is called a Markov Decision Problem(MDP).

20. Define Dynamic Belief Network.


A belief network with one node for each state and sensor variable for each time
step is called a Dynamic Belief Network (DBN).

21. Define Dynamic Decision Network?


A decision network is obtained by adding utility nodes and decision nodes for actions to a
DBN. A DDN calculates the expected utility of each decision sequence.

PART - B
1. Explain about Probabilistic Reasoning.

• The students should understand the role of uncertainty in knowledge representation


• Students should learn the use of probability theory to represent uncertainty
• Students should understand the basic of probability theory, including
o Probability distributions
o Joint probability
o Marginal probability
o Conditional probability
o Independence
o Conditional independence
• Should learn inference mechanisms in probability theory including
o Bayes rule
o Product rule
• Should be able to convert natural language statements into probabilistic statements
and apply inference rules
• Students should understand Bayesian networks as a data structure to represent
conditional independence
• Should understand the syntax and semantics of Bayes net
• Should understand inferencing mechanisms in Bayes net
• Should understand efficient inferencing techniques like variable ordering
• Should understand the concept of d-separation
• Should understand inference mechanism for the special case of polytrees
• Students should have idea about approximate inference techniques in Bayesian
networks

At the end of this lesson the student should be able to do the following:
• Represent a problem in terms of probabilistic statements
• Apply Bayes rule and product rule for inferencing
• Represent a problem using Bayes net
• Perform probabilistic inferencing using Bayes net.
Probabilistic Reasoning
Using logic to represent and reason, we can represent knowledge about the world with
facts and rules, like the following ones:

bird(tweety).
fly(X) :- bird(X).

We can also use a theorem-prover to reason about the world and deduce new facts about
the world, for e.g.,
?- fly(tweety).
Yes

However, this often does not work outside of toy domains - non-tautologous certain
rules are hard to find.

A way to handle knowledge representation in real problems is to extend logic by using


certainty factors.
In other words, replace

IF condition THEN fact


with
IF condition with certainty x THEN fact with certainty f(x)

Unfortunately, we cannot really adapt logical inference to probabilistic inference, since the
latter is not context-free.

Replacing rules with conditional probabilities makes inferencing simpler.

Replace
smoking -> lung cancer
or
lotsofconditions, smoking -> lung cancer
with
P(lung cancer | smoking) = 0.6

Uncertainty is represented explicitly and quantitatively within probability theory, a


formalism that has been developed over centuries.

A probabilistic model describes the world in terms of a set S of possible states - the
sample space. We don’t know the true state of the world, so we (somehow) come up with
a probability distribution over S which gives the probability of any state being the true
one. The world is usually described by a set of variables or attributes.

Consider the probabilistic model of a fictitious medical expert system. The ‘world’ is
described by 8 binary valued variables:

Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X

We have 2^8 = 256 possible states or configurations and so 256 probabilities to find.

2.Explain about Review of Probability Theory.

The primitives in probabilistic reasoning are random variables, just as the primitives in
propositional logic are propositions. A random variable is not in fact a variable, but a
function from a sample space S to another space, often the real numbers.

For example, let the random variable Sum (representing outcome of two die throws) be
defined thus:
Sum(die1, die2) = die1 +die2

Each random variable has an associated probability distribution determined by the


underlying distribution on the sample space.

Continuing our example: P(Sum = 2) = 1/36, P(Sum = 3) = 2/36, ..., P(Sum = 12) = 1/36.
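These values can be derived by counting over the 36 equally likely outcomes of the sample space:

```python
# Deriving the distribution of the random variable Sum(die1, die2)
# by counting over the 36-point sample space.
from fractions import Fraction
from collections import Counter

counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
p_sum = {s: Fraction(c, 36) for s, c in counts.items()}

print(p_sum[2])    # 1/36
print(p_sum[3])    # 1/18  (= 2/36)
print(p_sum[7])    # 1/6
```

Using exact fractions makes it easy to check that the distribution sums to 1.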

Consider the probabilistic model of the fictitious medical expert system mentioned
before. The sample space is described by 8 binary valued variables.

Visit to Asia? A
Tuberculosis? T
Either tub. or lung cancer? E
Lung cancer? L
Smoking? S
Bronchitis? B
Dyspnoea? D
Positive X-ray? X

There are 2^8 = 256 events in the sample space. Each event is determined by a joint
instantiation of all of the variables.

S = {(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f),


(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t), . . .
(A = t, T = t,E = t,L = t, S = t,B = t,D = t,X = t)}

Since S is defined in terms of joint instantiations, any distribution defined on it is called a
joint distribution. All underlying distributions will be joint distributions in this module. The
variables {A,T,E,L,S,B,D,X} are in fact random variables, which ‘project’ values.

L(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f) = f


L(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t) = f
L(A = t, T = t,E = t,L = t, S = t,B = t,D = t,X = t) = t

Each of the random variables {A,T,E,L,S,B,D,X} has its own distribution, determined by
the underlying joint distribution. This is known as the marginal distribution. For example,
the distribution for L is denoted P(L), and this distribution is defined by the two
probabilities P(L = f) and P(L = t). For example,

P(L = f)
= P(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = f)
+ P(A = f, T = f,E = f,L = f, S = f,B = f,D = f,X = t)
+ P(A = f, T = f,E = f,L = f, S = f,B = f,D = t,X = f)
...
+ P(A = t, T = t,E = t,L = f, S = t,B = t,D = t,X = t)

P(L) is an example of a marginal distribution.

Here’s a joint distribution over two binary valued variables A and B:

We get the marginal distribution over B by simply adding up the different possible values
of A for any value of B (and put the result in the “margin”).

In general, given a joint distribution over a set of variables, we can get the marginal
distribution over a subset by simply summing out those variables not in the subset.
In the medical expert system case, we can get the marginal distribution over, say, A,D by
simply summing out the other variables:
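The sum-out operation just described can be sketched numerically; the joint values below are made up for illustration:

```python
# Marginalizing a joint distribution by summing out a variable:
# P(B = b) = sum over a of P(A = a, B = b). Joint numbers are made up.

joint = {                      # P(A, B) over two binary variables
    (True, True): 0.2, (True, False): 0.1,
    (False, True): 0.4, (False, False): 0.3,
}

def marginal_B(b):
    return sum(p for (a, bb), p in joint.items() if bb == b)

print(marginal_B(True))    # 0.2 + 0.4 = 0.6 (up to float rounding)
```

Summing out A for each value of B is exactly the "put the result in the margin" step described above.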

However, computing marginals is not always an easy task. For example,


P(A = t,D = f)
= P(A = t, T = f,E = f,L = f, S = f,B = f,D = f,X = f)
+ P(A = t, T = f,E = f,L = f, S = f,B = f,D = f,X = t)
+ P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = f)
+ P(A = t, T = f,E = f,L = f, S = f,B = t,D = f,X = t)
...
+ P(A = t, T = t,E = t,L = t, S = t,B = t,D = f,X = t)

This has 64 summands! Each of whose value needs to be estimated from empirical data.
For the estimates to be of good quality, each of the instances that appear in the summands
should appear sufficiently large number of times in the empirical data. Often such a large
amount of data is not available.

However, computation can be simplified for certain special but common conditions. This
is the condition of independence of variables.

Two random variables A and B are independent iff

P(A,B) = P(A)P(B)

i.e. we can get the joint from the marginals

This is quite a strong statement: It means for any value x of A and any value y of B

P(A = x,B = y) = P(A = x)P(B = y)

Note that the independence of two random variables is a property of the underlying
probability distribution.

Conditional probability is defined as:

P(A|B) = P(A,B) / P(B)

It means for any value x of A and any value y of B,

P(A = x|B = y) = P(A = x,B = y) / P(B = y)

If A and B are independent then P(A|B) = P(A).

Conditional probabilities can represent causal relationships in both directions:

From cause to (probable) effects

From effect to (probable) cause

3.Explain about Probabilistic Inference Rules.



Two rules in probability theory are important for inferencing, namely, the product rule
and Bayes' rule.

Here is a simple example, of application of Bayes' rule.

Suppose you have been tested positive for a disease; what is the probability that you
actually have the disease?

It depends on the accuracy and sensitivity of the test, and on the background (prior)
probability of the disease.

Let P(Test=+ve | Disease=true) = 0.95, so the false negative rate,


P(Test=-ve | Disease=true), is 5%.

Let P(Test=+ve | Disease=false) = 0.05, so the false positive rate is also 5%.
Suppose the disease is rare: P(Disease=true) = 0.01 (1%).

Let D denote Disease and "T=+ve" denote the positive Test.

Then,
P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve|D=true) * P(D=true)+ P(T=+ve|D=false) * P(D=false)


0.95 * 0.01
= -------------------------------- = 0.161
0.95*0.01 + 0.05*0.99

So the probability of having the disease given that you tested positive is just 16%. This
seems too low, but here is an intuitive argument to support it. Of 100 people, we expect
only 1 to have the disease, but we expect about 5% of those (5 people) to test positive. So
of the 6 people who test positive, we only expect 1 of them to actually have the disease;
and indeed 1/6 is approximately 0.16.
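The calculation above, reproduced in code:

```python
# Bayes' rule for the disease-test example:
# P(D|T=+ve) = P(T=+ve|D) P(D) / [P(T=+ve|D) P(D) + P(T=+ve|~D) P(~D)]

def posterior(prior, sens, fpr):
    """Posterior P(D=true | T=+ve) from the prior P(D=true),
    the sensitivity P(T=+ve|D=true), and the false-positive rate."""
    num = sens * prior
    return num / (num + fpr * (1 - prior))

p = posterior(prior=0.01, sens=0.95, fpr=0.05)
print(round(p, 3))    # 0.161
```

Raising the prior toward 0.5 drives the posterior toward the 0.95 figure derived in the "objective" variant below.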

In other words, the reason the number is so small is that you believed that this is a rare
disease; the test has made it 16 times more likely you have the disease, but it is still
unlikely in absolute terms. If you want to be "objective", you can set the prior to uniform
(i.e. effectively ignore the prior), and then get

P(T=+ve|D=true) * P(D=true)
P(D=true|T=+ve) = ------------------------------------------------------------
P(T=+ve)

= (0.95 * 0.5) / (0.95*0.5 + 0.05*0.5) = 0.475 / 0.5 = 0.95

This, of course, is just the true positive rate of the test. However, this conclusion relies on
your belief that, if you did not conduct the test, half the people in the world have the
disease, which does not seem reasonable.

A better approach is to use a plausible prior (eg P(D=true)=0.01), but then conduct
multiple independent tests; if they all show up positive, then the posterior will increase.
For example, if we conduct two (conditionally independent) tests T1, T2 with the same
reliability, and they are both positive, we get

P(T1=+ve|D=true) * P(T2=+ve|D=true) * P(D=true)


P(D=true|T1=+ve,T2=+ve) = ------------------------------------------------------------
P(T1=+ve, T2=+ve)

= (0.95 * 0.95 * 0.01) / (0.95*0.95*0.01 + 0.05*0.05*0.99) = 0.009025 / 0.0115 ≈ 0.785
The assumption that the pieces of evidence are conditionally independent is called the
naive Bayes assumption. This model has been successfully used for many applications,
including classifying email as spam (D=true) or not (D=false) given the presence of
various key words (Ti=+ve if word i is in the text, else Ti=-ve). It is clear that the words
are not independent, even conditioned on spam/not-spam, but the model works
surprisingly well nonetheless.
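The two-test calculation in code; note that keeping the numerator unrounded (0.95 × 0.95 × 0.01 = 0.009025, rather than the rounded 0.009 shown above) gives approximately 0.785:

```python
# Combining two conditionally independent positive tests under the
# naive Bayes assumption.

def posterior_two_tests(prior, sens, fpr):
    """P(D=true | T1=+ve, T2=+ve), assuming the tests are
    conditionally independent given D."""
    num = sens * sens * prior
    den = num + fpr * fpr * (1 - prior)
    return num / den

p = posterior_two_tests(prior=0.01, sens=0.95, fpr=0.05)
print(round(p, 2))    # 0.78
```

Each additional independent positive test multiplies the likelihood ratio, so the posterior climbs quickly even from a small prior.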


In many problems, complete independence of variables does not exist, though many of
them are conditionally independent.

X and Y are conditionally independent given Z iff

P(X,Y|Z) = P(X|Z)P(Y|Z)

In full: X and Y are conditionally independent given Z iff for any instantiation x, y, z of
X, Y, Z we have

P(X = x,Y = y|Z = z) = P(X = x|Z = z)P(Y = y|Z = z)
An example of conditional independence:

Consider the following three Boolean random variables:

LeaveBy8, GetTrain, OnTime

Suppose we can assume that:

P(OnTime | GetTrain, LeaveBy8) = P(OnTime | GetTrain)

but NOT P(OnTime | LeaveBy8) = P(OnTime)

Then, OnTime is dependent on LeaveBy8, but independent of LeaveBy8 given GetTrain.

We can represent P(OnTime | GetTrain, LeaveBy8) = P(OnTime | GetTrain)

graphically by: LeaveBy8 -> GetTrain -> OnTime

Inferencing in Bayesian Networks


Exact Inference
The basic inference problem in BNs is described as follows:

Given

1. A Bayesian network BN

2. Evidence e - an instantiation of some of the variables in BN (e can be empty)

3. A query variable Q

Compute P(Q|e) - the (marginal) conditional distribution over Q


Given what we do know, compute the distribution over what we do not. Four categories of
inferencing tasks are usually encountered.


1. Diagnostic Inferences (from effects to causes)
Given that John calls, what is the probability of burglary? i.e. Find P(B|J)
2. Causal Inferences (from causes to effects)
Given Burglary, what is the probability that
John calls, i.e. P(J|B)
Mary calls, i.e. P(M|B)
3. Intercausal Inferences (between causes of a common event)
Given alarm, what is the probability of burglary? i.e. P(B|A)
Now given Earthquake, what is the probability of burglary? i.e. P(B|A,E)
4. Mixed Inferences (some causes and some effects known)
Given John calls and no Earth quake, what is the probability of Alarm, i.e.
P(A|J,~E)

We will demonstrate below the inferencing procedure for BNs. As an example consider
the following linear BN without any apriori evidence.

Consider computing all the marginals (with no evidence). P(A) is given, and
P(B) = Σa P(B|A = a)P(A = a).

We don't need any conditional independence assumption for this.

For example, suppose A, B are binary then we have

Now,

P(B) (the marginal distribution over B) was not given originally. . . but we just computed
it in the last step, so we’re OK (assuming we remembered to store P(B) somewhere).

If C were not independent of A given B, we would have a CPT for P(C|A,B), not
P(C|B). Note that we had to wait for P(B) before P(C) was calculable.

If each node has k values, and the chain has n nodes, this algorithm has complexity
O(nk^2). Summing over the joint has complexity O(k^n).

Complexity can be reduced by more efficient summation by “pushing sums into


products”.

Dynamic programming may also be used for the problem of exact inferencing in the
above Bayes Net. The steps are as follows:

1. We first compute

2. f1(B) is a function representable by a table of numbers, one for each possible value of
B.

3. Here,

4. We then use f1(B) to calculate f2(C) by summation over B

This method of solving a problem (ie finding P(D)) by solving subproblems and storing
the results is characteristic of dynamic programming.
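The chain computation described above — compute P(B) from P(A), store it, and reuse it for P(C), and so on — can be sketched as follows; the CPT numbers are made up, and the same table is reused at every link for brevity:

```python
# Computing all marginals along a chain A -> B -> C -> D by the
# dynamic-programming pass described in the text.

pA = {True: 0.3, False: 0.7}
cpt = {                         # P(child | parent), illustrative values
    (True, True): 0.8, (True, False): 0.2,
    (False, True): 0.4, (False, False): 0.6,
}

def next_marginal(p_parent):
    """P(child=c) = sum over p of P(child=c | parent=p) P(parent=p)."""
    return {c: sum(cpt[(p, c)] * p_parent[p] for p in (True, False))
            for c in (True, False)}

pB = next_marginal(pA)          # stored and reused, as in the text
pC = next_marginal(pB)
pD = next_marginal(pC)
print(round(pB[True], 3))       # 0.52
```

Each marginal costs O(k^2) work here, giving the O(nk^2) total noted above, versus O(k^n) for summing the full joint.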

The above methodology may be generalized. We eliminated variables starting from the
root, but we don't have to. We might have also done the following computation.


The following points are to be noted about the above algorithm. The algorithm computes
intermediate results which are not individual probabilities, but entire tables such as
f1(C,E). It so happens that f1(C,E) = P(E|C) but we will see examples where the
intermediate tables do not represent probability distributions.

Dealing with Evidence


Dealing with evidence is easy. Suppose {A,B,C,D,E} are all binary and we want
P(C|A = t,E = t). Computing P(C,A = t,E = t) is enough—it’s a table of numbers, one for
each value of C. We need to just renormalise it so that they add up to 1.

It was noticed from the above computation that conditional distributions are basically just
normalised marginal distributions. Hence, the algorithms we study are only concerned
with computing marginals. Getting the actual conditional probability values is a trivial
“tidying-up” last step.

Now let us concentrate on computing

It can be done by plugging in the observed values for A and E and summing out B and D.


We don’t really care about P(A = t), since it will cancel out.

Now let us see how evidence-induce independence can be exploited. Consider the
following computation.

Since,

Clever variable elimination would jump straight to (5). Choosing an optimal order of
variable elimination leads to a large amount of computational saving. However, finding
the optimal order is a hard problem.

Variable Elimination


For a Bayes net, we can sometimes use the factored representation of the joint probability
distribution to do marginalization efficiently. The key idea is to "push sums in" as far as
possible when summing (marginalizing) out irrelevant terms, e.g., for the water sprinkler
network


Notice that, as we perform the innermost sums, we create new terms, which need to be
summed over in turn e.g.,

where,

Continuing this way,

where,

In a nutshell, the variable elimination procedure repeats the following steps.

1. Pick a variable Xi

2. Multiply all expressions involving that variable, resulting in an expression f over a


number of variables (including Xi)

3. Sum out Xi, i.e. compute and store

For the multiplication, we must compute a number for each joint instantiation of all
variables in f, so complexity is exponential in the largest number of variables
participating in one of these multiplicative subexpressions.

If we wish to compute several marginals at the same time, we can use Dynamic
Programming to avoid the redundant computation that would be involved if we used
variable elimination repeatedly.

Exact inferencing in a general Bayes net is a hard problem. However, for networks with
some special topologies, efficient inferencing techniques exist. We discuss one such
technique for a class of networks called polytrees.

Inferencing in Poly-Trees
A polytree is a graph where there is at most one undirected path between any pair of
nodes. The inferencing problem in polytrees may be stated as follows.

U: U1 … Um, parents of node X

Y: Y1 … Yn, children of node X

X: Query variable

E: Evidence variables (whose truth values are known)

Objective: compute P(X | E)

EX+ is the set of causal support for X, comprising the evidence variables above X
connected through its parents.

EX- is the set of evidential support for X, comprising the evidence variables below X
connected through its children.

In order to compute P(X | E) we have

P(X|E) = P(X|EX+,EX-)

P(EX-|X,EX+ )P(X|EX+ )
= -------------------------------
P(EX-|EX+ )

Since X d-separates EX+ from EX- we can simplify the numerator as

P(X|E) = α P(EX-|X)P(X|EX+ )

where 1/α is the constant representing the denominator.


These simplifications follow from conditional independence relations: if the parents are
known, X is conditionally independent from all other nodes in the causal support set;
similarly, given the children, X is independent from all other variables in the evidential
support set.

Approximate Inferencing in Bayesian Networks


Many real models of interest have large numbers of nodes, which makes exact inference
very slow. (Exact inference is NP-hard in the worst case.) We must therefore resort to
approximation techniques. Unfortunately, approximate inference is #P-hard, but we can
nonetheless come up with approximations which often work well in practice. Below is a
list of the major techniques.

Variational methods. The simplest example is the mean-field approximation, which


exploits the law of large numbers to approximate large sums of random variables by their
means. In particular, we essentially decouple all the nodes, and introduce a new
parameter, called a variational parameter, for each node, and iteratively update these
parameters so as to minimize the cross-entropy (KL distance) between the approximate
and true probability distributions. Updating the variational parameters becomes a proxy
for inference. The mean-field approximation produces a lower bound on the likelihood.
More sophisticated methods are possible, which give tighter lower (and upper) bounds.


Sampling (Monte Carlo) methods. The most popular family of sampling algorithms is Markov
chain Monte Carlo (MCMC), which includes as special cases Gibbs sampling and the
Metropolis-Hastings algorithm.
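As a sketch of Gibbs sampling, the toy example below estimates a posterior in an invented two-child network (Cloudy → Sprinkler, Cloudy → Rain). The structure, variable names and CPT values are assumptions made up for illustration, not taken from the text.

```python
import random

random.seed(0)

# Toy network: Cloudy -> Sprinkler, Cloudy -> Rain (CPTs invented).
P_C = 0.5
P_S_given_C = {True: 0.1, False: 0.5}
P_R_given_C = {True: 0.8, False: 0.2}

def bernoulli(p):
    return random.random() < p

def cloudy_weight(c, rain):
    """Unnormalized P(C=c | Sprinkler=true, Rain=rain)."""
    prior = P_C if c else 1 - P_C
    p_rain = P_R_given_C[c] if rain else 1 - P_R_given_C[c]
    return prior * P_S_given_C[c] * p_rain

# Evidence: Sprinkler = true. Hidden: Cloudy, Rain. Query: P(Rain | Sprinkler).
cloudy, rain = True, True  # arbitrary initial state
rain_count, samples = 0, 100_000
for _ in range(samples):
    # Resample Cloudy from its full conditional P(C | Sprinkler=true, Rain)
    w_t, w_f = cloudy_weight(True, rain), cloudy_weight(False, rain)
    cloudy = bernoulli(w_t / (w_t + w_f))
    # Resample Rain from its full conditional P(R | Cloudy)
    rain = bernoulli(P_R_given_C[cloudy])
    rain_count += rain

estimate = rain_count / samples
print(estimate)  # the exact posterior P(Rain | Sprinkler=true) is 0.3
```

Each pass resamples every hidden variable conditioned on the current values of the others; the long-run fraction of samples with Rain=true approximates the posterior.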

Bounded cutset conditioning. By instantiating subsets of the variables, we can break loops
in the graph. Unfortunately, when the cutset is large, this is very slow. By instantiating only a
subset of values of the cutset, we can compute lower bounds on the probabilities of interest.
Alternatively, we can sample the cutsets jointly, a technique known as block Gibbs sampling.

Parametric approximation methods. These express the intermediate summands in a simpler
form, e.g., by approximating them as a product of smaller factors. "Minibuckets" and the
Boyen-Koller algorithm fall into this category.

R. Loganathan, AP/CSE. Mahalakshmi Engineering College, Trichy



QUESTION BANK

DEPARTMENT: CSE SEMESTER: VI

SUBJECT CODE / Name: CS2351 – ARTIFICIAL INTELLIGENCE


UNIT – V

PART -A (2 Marks)

LEARNING
1. Explain the concept of learning from example.
Each person will interpret a piece of information according to their level of understanding and their
own way of interpreting things.

2. What is meant by learning?


Learning is a goal-directed process of a system that improves the knowledge or the
Knowledge representation of the system by exploring experience and prior knowledge.

3. How statistical learning method differs from reinforcement learning method?


Reinforcement learning is learning what to do--how to map situations to actions--so as to maximize
a numerical reward signal. The learner is not told which actions to take, as in most forms of machine
learning, but instead must discover which actions yield the most reward by trying them. In the most
interesting and challenging cases, actions may affect not only the immediate reward but also the
next situation and, through that, all subsequent rewards. These two characteristics--trial-and-error
search and delayed reward--are the two most important distinguishing features of reinforcement
learning.

4. Define informational equivalence and computational equivalence.


Informational equivalence: a transformation from one representation to another causes no
loss of information; the two representations can be constructed from each other.
Computational equivalence: the same information and the same inferences are achieved
with the same amount of effort.

5. Define knowledge acquisition and skill refinement.


knowledge acquisition (example: learning physics)—learning new
symbolic information coupled with the ability to apply that information in
an effective manner
skill refinement (example: riding a bicycle, playing the piano)—occurs at
a subconscious level by virtue of repeated practice

6. What is Explanation-Based Learning?


In Explanation-Based Learning, the background knowledge is sufficient to explain the
hypothesis. The agent does not learn anything factually new from the instance. It extracts

general rules from single examples by explaining the examples and generalizing the explanation.

7. Define Knowledge-Based Inductive Learning.


Knowledge-Based Inductive Learning finds inductive hypotheses that explain set
of observations with the help of background knowledge.

8. What is truth preserving?


An inference algorithm that derives only entailed sentences is called sound or truth
preserving.

9. Define Inductive learning. How the performance of inductive learning algorithms can be
measured?
Learning a function from examples of its inputs and outputs is called inductive
learning.
It is measured by their learning curve, which shows the prediction accuracy as a
function of the number of observed examples.

10. List the advantages of Decision Trees


The advantages of Decision Trees are,
It is one of the simplest and successful forms of learning algorithm.
It serves as a good introduction to the area of inductive learning and is
easy to implement.

11. What is the function of Decision Trees?


A decision tree takes as input an object or situation described by a set of properties, and
outputs a yes/no decision. Decision trees represent Boolean functions.

12. List some of the practical uses of decision tree learning.


Some of the practical uses of decision tree learning are,
Designing oil platform equipment
Learning to fly

13.What is the task of reinforcement learning?


The task of reinforcement learning is to use rewards to learn a successful agent
function.

14. Define Passive learner and Active learner.


A passive learner watches the world going by, and tries to learn the utility of being
in various states.
An active learner acts using the learned information, and can use its problem
generator to suggest explorations of unknown portions of the environment.


15. State the factors that play a role in the design of a learning system.
The factors that play a role in the design of a learning system are,
Learning element
Performance element
Critic
Problem generator
16. What is memoization?
Memoization is used to speed up programs by saving the results of computation.
The basic idea is to accumulate a database of input/output pairs; when the function is called, it
first checks the database to see if it can avoid solving the problem from scratch.
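A minimal Python sketch of this idea, using the standard library's cache decorator (Fibonacci is just a stand-in computation):

```python
from functools import lru_cache

# Memoization: cache input/output pairs so repeated calls avoid
# recomputing the answer from scratch.
@lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)

print(fib(40))  # 102334155, fast despite the exponential-looking recursion
```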

17. Define Q-Learning.


The agent learns an action-value function giving the expected utility of taking a
given action in a given state. This is called Q-Learning.

18. Define supervised learning & unsupervised learning.


Any situation in which both inputs and outputs of a component can be perceived is
called supervised learning.
Learning when there is no hint at all about the correct outputs is called
unsupervised learning.

19. Define Bayesian learning.


Bayesian learning simply calculates the probability of each hypothesis, given the
data, and makes predictions on that basis. That is, the predictions are made by using all the
hypotheses, weighted by their probabilities, rather than by using just a single “best” hypothesis.
20. What is utility-based agent?
A utility-based agent learns a utility function on states and uses it to select actions
that maximize the expected outcome utility.

21. What is reinforcement learning?


Reinforcement learning refers to a class of problems in machine learning which
postulate an agent exploring an environment in which the agent perceives its current state and
takes actions. The environment, in return, provides a reward (which can be positive or negative).
Reinforcement learning algorithms attempt to find a policy for maximizing cumulative reward
for the agent over the course of the problem.

22. What is the important task of reinforcement learning?


The important task of reinforcement learning is to use rewards to learn a successful
agent function.


PART - B

1. Explain about Learning From Observations.


FORMS OF LEARNING:

A learning agent can be thought of as containing a ‘Performance element’ that decides


what actions to take and a ‘learning element’ that modifies the performance element so that it
makes better decisions.
The design of a learning element is affected by three major issues:
Which 'components' of the performance element are to be learned?
What 'feedback' is available to learn these components?
What 'representation' is used for the components?

The components of the agents include the following:-


A direct mapping from conditions on the current state to actions.
A means to infer relevant properties of the world from the percept sequence.
Information about the way the world evolves and about the results of possible actions the
agent can take.
Utility information indicating the desirability of world states.
Action-value information indicating the desirability of actions.
Goals that describe classes of states whose achievement maximizes the agent's utility.

Each of the components can be learned from appropriate feedback. Consider, for
example, an agent training to become a taxi driver. Every time the instructor shouts "Brake!" the
agent can learn a condition-action rule for when to brake (component 1).
By seeing many camera images that it is told contain humans, it can learn to recognize
them.
By trying actions and observing the results (for example, braking hard on a wet road) it
can learn the effects of its actions.
Then, when it receives no tip from passengers who have been thoroughly shaken up
during the trip, it can learn a useful component of its overall utility function.

The type of feedback available for learning is usually the most important factor in
determining the nature of the learning problem that the agent faces.


The field of machine learning usually distinguishes three cases: supervised,
unsupervised and reinforcement learning.
The problem of supervised learning involves learning a function from examples of its
inputs and outputs.
Cases (1), (2) and (3) are all instances of supervised learning problems.
In (1), the agent learns a condition-action rule for braking; this is a function from states to
a Boolean output (to brake or not to brake).
In (2), the agent learns a function from images to a Boolean output (whether the image
contains a bus).
In (3), the theory of braking is a function from states and braking actions to, say,
stopping distance in feet.
The problem of 'unsupervised learning' involves learning patterns in the input when no
specific output values are supplied.
For example, a taxi agent might gradually develop a concept of "good traffic days" and
"bad traffic days" without ever being given labeled examples of each.
A purely unsupervised learning agent cannot learn what to do, because it has no
information as to what constitutes a correct action or a desirable state.
The problem of 'reinforcement learning' is the most general of the three categories. Rather
than being told what to do by a teacher, a reinforcement learning agent must learn from
reinforcement. (Reinforcement learning typically includes the subproblem of learning
how the environment works.)
The representation of the learned information also plays a very important role in
determining how the learning algorithm must work.
The last major factor in the design of a learning system is the availability of prior
knowledge. The majority of learning research in AI, computer science, and psychology has
studied the case in which the agent begins with no knowledge at all about what it is trying
to learn.

Inductive learning:
An algorithm for deterministic supervised learning is given as input the correct value of
the unknown function for particular inputs, and must try to recover the unknown function
or something close to it.
We say that an example is a pair (x, f(x)), where x is the input and f(x) is the output of the
function applied to x.
The task of pure inductive inference (or induction) is this:
given a collection of examples of f, return a function h that approximates f.
The function h is called a hypothesis. The reason that learning is difficult, from a
conceptual point of view, is that it is not easy to tell whether any particular h is a good
approximation of f. (A good hypothesis will generalize well, that is, will predict unseen
examples correctly. This is the fundamental problem of induction.)

LEARNING FROM DECISION TREES:


Decision tree induction is one of the simplest and yet most successful forms of
learning algorithm.
Decision trees as performance elements:
A decision tree takes as input an object or situation described by a set of attributes and
returns a 'decision', the predicted output value for the input.
The input attributes can be discrete or continuous. For now, we assume discrete inputs.
The output values can also be discrete or continuous; learning a discrete-valued function
is called classification learning, while learning a continuous function is called regression.
We will concentrate on Boolean classification, wherein each example is classified as true
(positive) or false (negative).
A decision tree reaches its decision by performing a sequence of tests. Each internal node
in the tree corresponds to a test of the value of one of the properties, and the branches
from the node are labeled with the possible values of the test.
Each leaf node in the tree specifies the value to be returned if that leaf is reached.


Example:-
Consider the problem of whether to wait for a table at a restaurant. The aim here is to learn a
definition for the goal predicate 'WillWait'.

[Figure 4.1: Decision tree for the restaurant domain. The root tests Patrons (None -> No;
Some -> Yes; Full -> test WaitEstimate). Under WaitEstimate, >60 -> No and 0-10 -> Yes,
while the 30-60 and 10-30 branches go on to test Alternate, Hungry, Reservation, Fri/Sat,
Bar and Raining before reaching Yes/No leaves.]

Decision tree
We will see how to automate this task; for now, let us suppose we decide on the following list of
attributes:
1. Alternate: Whether there is a suitable alternative restaurant nearby.
2. Bar: Whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: True on Fridays & Saturdays.


4. Hungry: Whether we are hungry.


5. Patrons: How many people are in the restaurant (values are None, Some and Full).
6. Price: The restaurant's price range ($, $$, $$$).
7. Raining: Whether it is raining outside.
8. Reservation: Whether we made a reservation.
9. Type: The kind of restaurant (French, Italian, Thai or Burger).
10. WaitEstimate: The wait estimated by the host (0-10, 10-30, 30-60, >60).
The decision tree usually used by one of us (SR) for this domain is given in Figure 4.1.
The tree does not use the 'Price' and 'Type' attributes, in effect considering them to be
irrelevant. Examples are processed by the tree starting at the root and following the appropriate
branch until a leaf is reached. For instance, an example with Patrons=Full and WaitEstimate=0-
10 will be classified positive.
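This processing can be sketched as a simple tree walk. The dictionary below encodes only a fragment of the tree (the interior subtrees under the 30-60 and 10-30 branches are abbreviated to leaf labels for brevity), so it is illustrative rather than a faithful copy of Figure 4.1.

```python
# Fragment of the restaurant decision tree as nested dictionaries.
tree = {
    "attribute": "Patrons",
    "branches": {
        "None": "No",
        "Some": "Yes",
        "Full": {
            "attribute": "WaitEstimate",
            "branches": {">60": "No", "30-60": "No", "10-30": "No", "0-10": "Yes"},
        },
    },
}

def classify(node, example):
    """Follow branches from the root until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][example[node["attribute"]]]
    return node

print(classify(tree, {"Patrons": "Full", "WaitEstimate": "0-10"}))  # Yes
```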

2. Explain about Ensemble Learning.


The idea of ensemble learning methods is to select a whole collection of hypotheses from
the hypothesis space and combine their predictions.
Methods are:-
1. Boosting
2. Bagging
Motivation:-
The motivation for ensemble learning is simple. Consider an ensemble (collection) of M =
5 hypotheses and suppose that their predictions are combined using simple majority voting.
For the ensemble to misclassify an example, at least three of the five hypotheses have to
misclassify it. The hope is that this is much less likely than a misclassification by a single
hypothesis. Furthermore, suppose we assume that the errors made by each hypothesis are
independent.
1. In that case, if p is small, then the probability of a misclassification occurring is very small.
Obviously, if the hypotheses are at least somewhat different, thereby reducing the correlation
between their errors, then ensemble learning can be very useful.
2. The ensemble idea is a generic way of enlarging the hypothesis space. That is, the ensemble
itself can be thought of as a hypothesis, and the new hypothesis space as the set of all possible
ensembles constructible from hypotheses in the original space.

If the original hypothesis space allows for a simple and efficient learning algorithm, then
the ensemble method provides a way to learn a much more expressive class of hypotheses
without much additional computational or algorithmic complexity.
Boosting:-
Boosting is the most widely used ensemble method. To understand how it works, it is
necessary to understand the concept of a weighted training set.
Weighted training set:-
In a weighted training set, each example has an associated weight wj≥0. The higher the
weight of an example, the higher is the importance attached to it during the learning of a
hypothesis.

Working:-
Boosting starts with wj = 1 for all the examples.
From this set, it generates the first hypothesis h1. This hypothesis will classify some of
the training examples correctly and some incorrectly.
The next hypothesis can be made to do better by increasing the weights of the misclassified
examples while decreasing the weights of the correctly classified examples.
From this new weighted training set, hypothesis h2 is generated.
The process continues until M hypotheses have been generated, where M is an input to
the boosting algorithm.
The final ensemble hypothesis is a weighted-majority combination of all M
hypotheses, each weighted according to how well it performed on the training set.

ADABOOST algorithm:-
ADABOOST algorithm is one among the variants.
ADABOOST is one of the most commonly used boosting algorithms.
It has an important property: if the input learning algorithm L is a weak learning
algorithm, meaning that L always returns a hypothesis with weighted error on the
training set slightly better than random guessing, then ADABOOST will return a
hypothesis that classifies the training data perfectly for large enough M.
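One round of the weight update can be sketched as follows. The β = error/(1 − error) multiplier is the form used in ADABOOST-style algorithms; the weights and the correct/incorrect pattern below are invented for illustration.

```python
# One boosting round: decrease the weight of correctly classified
# examples (beta < 1 for a weak learner), then renormalize, which
# raises the relative weight of the misclassified examples.
def reweight(weights, correct):
    """weights: example weights; correct: whether each was classified right."""
    error = sum(w for w, c in zip(weights, correct) if not c) / sum(weights)
    beta = error / (1 - error)
    new = [w * beta if c else w for w, c in zip(weights, correct)]
    total = sum(new)
    return [w / total for w in new]

w = reweight([0.25, 0.25, 0.25, 0.25], [True, True, True, False])
print(w)  # the misclassified fourth example now carries half the weight
```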


LOGICAL FORMULATION OF LEARNING


Expressiveness of decision trees:
Any particular decision tree hypothesis for the 'WillWait' goal predicate can be seen as
an assertion of the form,
∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ … ∨ Pn(s)),
where each condition Pi(s) is a conjunction of tests corresponding to a path from the root
of the tree to a leaf with a positive outcome.
The decision tree is really describing a relationship between 'WillWait' and some logical
combination of attribute values.
We cannot use a decision tree to represent tests that refer to two or more different objects, for
example,
∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p)
Decision trees are fully expressive within the class of propositional languages; that is, any
Boolean function can be written as a decision tree.
This can be done trivially by having each row in the truth table for the function
correspond to a path in the tree.
This would yield an exponentially large decision tree representation, because the truth
table has exponentially many rows. Clearly, decision trees can represent many functions with
much smaller trees.
For some kinds of functions, however, this is a real problem. For example, if the function is the
parity function, which returns 1 if and only if an even number of inputs are 1, then an
exponentially large decision tree will be needed.
It is also difficult to represent a majority function, which returns ‘1’ if more than half of
its inputs are 1.
The truth table has 2^n rows, because each input case is described by n attributes. We can
consider the 'answer' column of the table as a 2^n-bit number that defines the function.
If it takes 2^n bits to define the function, then there are 2^(2^n) different functions on n
attributes. For example, with just six Boolean attributes, there are 2^(2^6) =
18,446,744,073,709,551,616 different functions to choose from.

Inducing decision tree from example:-


An example for a Boolean decision tree consists of a vector of input attributes X and a
single Boolean output value y. A set of examples (x1,y1), …, (x12,y12) is shown in the Figure.

Example | Alt | Bar | Fri | Hun | Pat  | Price | Rain | Res | Type    | Est   | WillWait
X1      |  Y  |  N  |  N  |  Y  | Some | $$$   |  N   |  Y  | French  | 0-10  | Y
X2      |  Y  |  N  |  N  |  Y  | Full | $     |  N   |  N  | Thai    | 30-60 | N
X3      |  N  |  Y  |  N  |  N  | Some | $     |  N   |  N  | Burger  | 0-10  | Y
X4      |  Y  |  N  |  Y  |  Y  | Full | $     |  Y   |  N  | Thai    | 10-30 | Y
X5      |  Y  |  N  |  Y  |  N  | Full | $$$   |  N   |  Y  | French  | >60   | N
X6      |  N  |  Y  |  N  |  Y  | Some | $$    |  Y   |  Y  | Italian | 0-10  | Y
X7      |  N  |  Y  |  N  |  N  | None | $     |  Y   |  N  | Burger  | 0-10  | N
X8      |  N  |  N  |  N  |  Y  | Some | $$    |  Y   |  Y  | Thai    | 0-10  | Y
X9      |  N  |  Y  |  Y  |  N  | Full | $     |  Y   |  N  | Burger  | >60   | N
X10     |  Y  |  Y  |  Y  |  Y  | Full | $$$   |  N   |  Y  | Italian | 10-30 | N
X11     |  N  |  N  |  N  |  N  | None | $     |  N   |  N  | Thai    | 0-10  | N
X12     |  Y  |  Y  |  Y  |  Y  | Full | $     |  N   |  N  | Burger  | 30-60 | Y

Figure 4.2 Inducing a decision tree from examples (the restaurant training set)

The positive examples are the ones in which the goal 'WillWait' is true (X1, X3, …); the
negative examples are the ones in which it is false (X2, X5, …). The complete set of examples is
called the training set.
Trivial tree:-
The problem of finding a decision tree that agrees with the training set might seem difficult,
but in fact there is a trivial solution:
construct a decision tree that has one path to a leaf for each example, where the path tests
each attribute in turn and follows the classification of the example.
Problem with trivial tree:-
It just memorizes the observations.
It does not extract any pattern from the examples, so it cannot be expected to
extrapolate to examples it has not seen.
Figure 4.2 shows how the algorithm gets started: 12 training examples are
given, classified into positive and negative sets. We then decide which attribute to place at the
first test in the tree. Type is a poor first attribute, because it leaves us with four possible
outcomes, each of which has the same number of positive and negative examples.
On the other hand, Figure 4.2 shows that Patrons is a fairly important attribute.
There are four cases to consider for the recursive problem:
1) If there are some positive and some negative examples, then choose the best attribute to split
them. Figure 4.2 shows 'Hungry' being used to split the remaining examples.
2) If all the remaining examples are positive (or all negative), then we are done and can answer
Yes or No. Figure 4.2 shows examples of this in the None and Some cases.
3) If there are no examples left, it means that no such example has been observed, and a
default value calculated from the majority classification at the node's parent is returned.
4) If there are no attributes left, but both positive and negative examples remain, then these
examples have exactly the same description but different classifications.
The decision-tree-learning algorithm:

function DECISION-TREE-LEARNING(examples, attribs, default) returns a decision tree
    if examples is empty then return default
    else if all examples have the same classification then return the classification
    else if attribs is empty then return MAJORITY-VALUE(examples)
    else
        best <- CHOOSE-ATTRIBUTE(attribs, examples)
        tree <- a new decision tree with root test best
        m <- MAJORITY-VALUE(examples)
        for each value vi of best do
            examplesi <- {elements of examples with best = vi}
            subtree <- DECISION-TREE-LEARNING(examplesi, attribs - best, m)
            add a branch to tree with label vi and subtree subtree
        return tree

The final tree produced by the algorithm applied to the 12-example data set is shown in Figure
4.2.
Choosing attribute test:-
To choose an attribute test, the idea is to pick the attribute that goes as far as possible toward
providing an exact classification of the examples. A perfect attribute divides the examples into
sets that are all positive or all negative.
Measure for 'fairly good' and 'really useless' attributes:
In general, if the possible answers vi have probabilities P(vi), then the information content I of
the actual answer is given by

I(P(v1), ..., P(vn)) = Σi -P(vi) log2 P(vi)

To check this equation for the tossing of a fair coin:

I(1/2, 1/2) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1 bit


Gain(A) = I(p/(p+n), n/(p+n)) - Remainder(A)

Gain(Patrons) = 1 - [(2/12) I(0,1) + (4/12) I(1,0) + (6/12) I(2/6, 4/6)] ≈ 0.541 bits

Gain(Type) = 1 - [(2/12) I(1/2,1/2) + (2/12) I(1/2,1/2) + (4/12) I(2/4,2/4) + (4/12) I(2/4,2/4)] = 0
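These quantities are easy to verify numerically. The sketch below recomputes the entropy formula and the two gains from the positive/negative counts in the 12-example training set.

```python
import math

def information(probs):
    """I(p1,...,pn) = sum_i -p_i * log2(p_i), with 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def remainder(splits, total):
    """splits: (positives, negatives) counts for each attribute value."""
    return sum((p + n) / total * information([p / (p + n), n / (p + n)])
               for p, n in splits if p + n > 0)

# 6 positive and 6 negative examples overall -> I(1/2, 1/2) = 1 bit
total_info = information([0.5, 0.5])
gain_patrons = total_info - remainder([(0, 2), (4, 0), (2, 4)], 12)
gain_type = total_info - remainder([(1, 1), (1, 1), (2, 2), (2, 2)], 12)
print(round(gain_patrons, 3))  # 0.541
```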
Assessing the performance of the learning algorithm:
A learning algorithm is good if it produces hypotheses that do a good job of predicting the
classifications of unseen examples. Prediction quality can be estimated in advance, or it can be
estimated after the fact.
1) Collect a large set of examples.
2) Divide it into two disjoint sets: the training set and the test set.
3) Apply the learning algorithm to the training set, generating a hypothesis h.
4) Measure the percentage of examples in the test set that are correctly classified by h.
5) Repeat steps 1 to 4 for different sizes of training sets and different randomly selected
training sets of each size.
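Steps 1-4 can be sketched as follows. The data is synthetic and the 'learner' is a toy midpoint-threshold rule standing in for a real learning algorithm; everything here is invented for illustration.

```python
import random

random.seed(1)

# Step 1: collect a set of labeled examples (synthetic: label is x > 50)
examples = [(x, x > 50) for x in random.sample(range(100), 60)]

# Step 2: divide into disjoint training and test sets
random.shuffle(examples)
train, test = examples[:40], examples[40:]

# Step 3: apply the 'learning algorithm' to the training set
def learn(train_set):
    """Toy learner: threshold halfway between the class means."""
    pos = [x for x, y in train_set if y]
    neg = [x for x, y in train_set if not y]
    threshold = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > threshold

h = learn(train)

# Step 4: measure accuracy on the held-out test set
accuracy = sum(h(x) == y for x, y in test) / len(test)
print(accuracy)
```

Repeating this for growing training-set sizes (step 5) produces the learning curve mentioned in Part A.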
Noise and overfitting:
The problem of finding meaningless "regularity" in the data whenever there is a large set
of possible hypotheses is called overfitting.
Solutions:
Decision tree pruning
Cross-validation

3. Explain about Knowledge in Learning.


Prior knowledge can help an agent to learn from new experiences.
Logical formulation of learning:-
Inductive learning is the process of finding a hypothesis that agrees with the
observed examples.
The hypothesis is represented by a set of logical sentences.
The logical sentences comprise prior knowledge, example descriptions and classifications.
Thus, logical inference aids learning.
Examples and hypothesis:-

Consider the restaurant learning problem, i.e., learning a rule for deciding whether to wait for a
table. The example objects are described by logical sentences; the attributes are unary predicates.


Example | Alt | Bar | Fri | Hungry | Patrons | Price | Rain | Res | Est  | WillWait
X1      |  Y  |  N  |  N  |   Y    |  Some   | $$$   |  No  | Yes | 0-10 | Yes

Figure 4.3 Knowledge in learning


D1(X1) = Alternate(X1) ∧ ¬Bar(X1) ∧ ¬Fri/Sat(X1) ∧ Hungry(X1) ∧ Patrons(X1, Some) ∧
Price(X1, $$$) ∧ ¬Rain(X1) ∧ Reservation(X1) ∧ …
where Di is a logical expression taking a single argument.
Classification of the object is done by WillWait(X1), or in the generic notation,
Q(X1) if the example is positive,
¬Q(X1) if the example is negative.
The complete training set is just the conjunction of all description and classification
sentences.
The hypothesis proposes an expression, called the candidate definition Ci, of the goal
predicate.
Then hypothesis Hi is a sentence of the form
∀x Q(x) ⇔ Ci(x)
A decision tree predicts the goal predicate of an object to be true if and only if the tests
along some path leading to a true leaf are satisfied.

¥r willwait (r)Pations (r, Some)


˅Pations (r, full) ˄Hungry(r) ˄Type (r, French)
˅Pations (r, full) ˄ Hungry(r) ˄ Type(r, thai)
˄Fri/Sat(r) ˅ Pations (r, full)
˄Hungry(r) ˄ Type(r, Buyer)
If is a function of branches where output on leaf node is true.


Current-best-hypothesis search:

The idea is to maintain a single hypothesis, and to adjust it as new examples arrive in order
to maintain consistency.
This search algorithm dates back to John Stuart Mill (1843).

Generalization:
The hypothesis says the new example should be negative, but it is actually positive. The
extension of the hypothesis must be increased to include it. This is called generalization.
Specialization:
The hypothesis says that the new example is positive, but it is actually negative. The
extension must be decreased to exclude the example. This is called specialization. The current-
best-learning algorithm is used in many machine-learning algorithms.

4. Explain about Explanation-Based Learning (EBL).

Definition:
When an agent can utilize a worked example of a problem as a problem-solving method,
the agent is said to have the capability of explanation-based learning (EBL).
Advantage:-
A deductive mechanism, it requires only a single training example.
EBL algorithm requires all of the following:-
Accepts 4 kinds of inputs:-
1. A training example:-
What are the learning seen in the world.
2. A goal concept
A high level description of what the program is to learn.
An operational criterion:-
A description of which concept are usable.
A domain theory:-
A set of rule that describe relationship b/w object and action is a domain.
Entailment constraint satisfied by EBL is,
Hypothesis ˄ description ≠ classifications


Background ⊨ Hypothesis
Extracting rules from examples:
EBL is a method for extracting general rules from individual observations.
The idea is to construct an explanation of the observation using prior knowledge.
Consider the example Derivative(X², X) = 2X. Suppose the task is to simplify 1 × (0 + X).
The knowledge base here includes the following rules,
Rewrite(u, v) ∧ Simplify(v, w) ⇒ Simplify(u, w)
Primitive(u) ⇒ Simplify(u, u)
ArithmeticUnknown(u) ⇒ Primitive(u)
Number(u) ⇒ Primitive(u)
Rewrite(1 × u, u)
Rewrite(0 + u, u)
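The rewrite rules above can be mirrored in a few lines of code. The sketch below applies them to 1 × (0 + X); the tuple encoding and the helper name are assumptions made for illustration.

```python
# Expressions are ('op', left, right) tuples; numbers and unknowns
# (strings) are primitive, so Simplify(u, u) applies to them directly.
def simplify(expr):
    if isinstance(expr, (int, str)):   # Primitive(u) => Simplify(u, u)
        return expr
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "*" and left == 1:        # Rewrite(1 * u, u)
        return right
    if op == "+" and left == 0:        # Rewrite(0 + u, u)
        return right
    return (op, left, right)

print(simplify(("*", 1, ("+", 0, "X"))))  # X
```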
The EBL process works as follows:
1. Construct a proof that the goal predicate applies to the example, using the available
background knowledge.
2. In parallel, construct a generalized proof tree for the variabilized goal, using the same
inference steps as in the original proof.
3. Construct a new rule whose left-hand side consists of the leaves of the proof tree and whose
right-hand side is the variabilized goal.
4. Drop any conditions that are true regardless of the values of the variables in the goal.
Improving efficiency:
1. Reduce the large number of derived rules; they increase the branching factor in the search
space.
2. Derived rules must offer a significant increase in speed.
3. Derived rules should be as general as possible, so that they apply to the largest possible set of
cases.


5. Explain about Learning Using Relevance Information.

Relevance-based learning uses prior knowledge.
E.g., a traveler in Brazil concludes that all Brazilians speak Portuguese.
The entailment constraints are given by,
Hypothesis ∧ Descriptions ⊨ Classifications
Background ∧ Descriptions ∧ Classifications ⊨ Hypothesis
Express the above example in FOL as,
Nationality(x, n) ∧ Nationality(y, n) ∧ Language(x, l) ⇒ Language(y, l)
If x and y have the same nationality and x speaks language l, then y also speaks it.
Nationality(Fernando, Brazil) ∧ Language(Fernando, Portuguese)
entails the following sentence,
Nationality(x, Brazil) ⇒ Language(x, Portuguese)
Here, given the nationality, the language is fully determined. Such sentences are
called functional dependencies or determinations, written
Nationality(x, n) ≻ Language(x, l)

INDUCTIVE LOGIC PROGRAMMING:-


Knowledge-based inductive learning (KBIL) finds inductive hypotheses that explain sets of
observations with the help of background knowledge.
ILP techniques perform KBIL on knowledge that is expressed in first-order logic.
ILP has gained popularity for three reasons:
1. It offers a rigorous approach to the general knowledge-based inductive learning problem.
2. It offers complete algorithms for inducing general first-order theories from examples.
3. It produces hypotheses that are easy for humans to read.
Ex:-
Example: the problem of learning family relationships. The descriptions consist of the
Mother, Father and Married relations, and the Male and Female properties.
Consider a typical family tree.
The corresponding descriptions are as follows,
Father(Philip, Charles) Father(Philip, Anne) …
Mother(Mum, Margaret) Mother(Mum, Elizabeth) …

Married(Diana, Charles) Married(Elizabeth, Philip) …
Male(Philip) Male(Charles) …
Female(Diana) Female(Elizabeth) …
Grandparent(Mum, Charles) Grandparent(Elizabeth, Beatrice) …
¬Grandparent(Mum, Harry) ¬Grandparent(Spencer, Peter) …
Statistical learning methods:
The key concepts of statistical learning are data and hypotheses.
Data:
Data are evidence, that is, instantiations of some or all of the random variables describing
the domain.

Hypotheses:
Hypotheses are probabilistic theories of how the domain works, including logical
theories as a special case.
Example: candy comes in two flavors, cherry and lime.
Bayesian learning:
It calculates the probability of each hypothesis given the data, and makes predictions
using all the hypotheses.
The Bayesian view of learning is extremely powerful, providing general solutions to the
problems of noise, overfitting and optimal prediction.
Ex:-
h1: 100% cherry
h2: 75% cherry + 25% lime
h3: 50% cherry + 50% lime
h4: 25% cherry + 75% lime
h5: 100% lime
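The Bayesian update for these five candy hypotheses can be sketched directly; note that the prior (0.1, 0.2, 0.4, 0.2, 0.1) used below is an assumed, conventional choice for this example, not stated in the notes:

```python
# Bayesian learning on the candy example: update P(h_i | d) as lime
# candies are observed, then average over all hypotheses to predict
# the flavor of the next candy.

priors = [0.1, 0.2, 0.4, 0.2, 0.1]      # assumed prior P(h1)..P(h5)
p_lime = [0.0, 0.25, 0.5, 0.75, 1.0]    # P(lime | h_i) from h1..h5 above

def posterior(n):
    """P(h_i | d) after observing n lime candies in a row (Bayes' rule)."""
    unnorm = [p * q ** n for p, q in zip(priors, p_lime)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

def predict_lime(n):
    """P(next is lime | d): prediction averaged over all hypotheses."""
    return sum(p * q for p, q in zip(posterior(n), p_lime))

print(posterior(0))      # with no data, the posterior equals the prior
print(predict_lime(10))  # approaches 1 as lime-only evidence mounts
```

As more lime candies are seen, the posterior mass shifts toward h5 (100% lime), which is exactly the behavior Bayesian learning guarantees.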
Learning with complete data:-
The parameter learning task involves finding the numerical parameters of a
probability model whose structure is fixed.
Data are complete when each data point contains values for every variable in
the probability model.
Complete data greatly simplify the problem of learning the parameters of a complex model.
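With complete data, maximum-likelihood parameter learning often reduces to counting. A minimal sketch (the sample draws are made up): for a candy bag whose cherry fraction θ is the single unknown parameter, the ML estimate is the observed fraction of cherry candies.

```python
# Maximum-likelihood parameter learning with complete data: the ML
# estimate of theta = P(cherry) is the observed cherry frequency.

def ml_theta(observations):
    """Return the maximum-likelihood estimate of P(cherry)."""
    return observations.count("cherry") / len(observations)

draws = ["cherry"] * 3 + ["lime"] * 7   # invented sample of 10 draws
print(ml_theta(draws))                  # → 0.3
```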

REINFORCEMENT LEARNING:-
It involves finding a balance between exploration of new knowledge and exploitation
of current knowledge.
Ex:- consider learning to play chess without supervised examples from a teacher.


Without a teacher, the agent can only try some random moves.
The agent needs to know that something good has happened when it wins and
something bad when it loses. This kind of feedback is called a reward, or reinforcement.
In a game like chess, the reinforcement is received only at the end of the
game; in a game like ping-pong, each point scored can be considered a reward.
The task of reinforcement learning is to use observed rewards to learn an optimal policy for
the environment.
It is a sub-area of machine learning.
The basic reinforcement learning model consists of:-
1. A set of environment states S;
2. A set of actions A; and
3. A set of scalar rewards R.
It is well suited for applications like robot control, telecommunications and games.
Various types of agents:-
1. Utility-based agent:-
It learns a utility function on states, and uses it to select actions that maximize
the expected utility of the outcome.
2. Q-learning agent:-
It learns an action-value function, or Q-function, giving the expected utility
of taking a given action in a given state.
3. Reflex agent:-
It learns a policy that maps directly from states to actions.
[Figures: the basic reinforcement learning model; types of reinforcement learning.]
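The Q-learning agent above can be sketched as a small program. This is an assumption-laden toy, not from the notes: a made-up 1-D world with states 0..4, actions right/left, and reward 1 for reaching the goal state 4, learned with an ε-greedy tabular Q-learning loop.

```python
# Minimal tabular Q-learning on a toy 1-D world (all parameters are
# illustrative assumptions): move right from state 0 to the goal 4.
import random

N_STATES, GOAL = 5, 4
ACTIONS = (1, -1)                       # right, left
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment: move, clip to the grid, reward 1 on reaching the goal."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == GOAL else 0.0)

random.seed(0)
for _ in range(500):                    # training episodes
    s = 0
    while s != GOAL:
        # epsilon-greedy choice balances exploration and exploitation
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2, r = step(s, a)
        # Q-learning update: Q(s,a) += alpha*(r + gamma*max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS) - Q[(s, a)])
        s = s2

greedy_policy = [max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)]
print(greedy_policy)   # the learned greedy policy moves right toward the goal
```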

