
Contents

AI
    Goals of AI
    Application Areas of AI
    AI Challenges
    A Framework for Building AI Systems
    Fundamental AI Issues
    Design Methodology and Goals
ELIZA: Dialog with a Machine
    Describing and specifying ELIZA
    Pattern Matching
    Segment Pattern Matching
    ELIZA Program: Rules
Natural Language Processing
    Knowledge in Speech and Language Processing
    Ambiguity
    Models and Algorithms
    History
Expressing Language Constraints
    X-bar Schema
Questions and Commands
    Syntactic Transition Nets
    Semantic Transition Trees
Intelligent Agents
    Characteristics
    Structure
    Behavior/Performance
    Agent Types
        1. Simple Reflex Agent
        2. Reflex Agent with Internal State
        3. Goal-Based Agent
        4. Utility-Based Agent
Stimulus-Response Agents
    Perception and Action
    Boolean Algebra
    Representing and Implementing Action Functions
Neural Networks (TLUs)
    Training Single TLUs
    Neural Networks Motivation
    Generalization, Accuracy and Overfitting
Training Neural Nets
    Back-propagation Procedure
Machine Evolution
    Evolutionary Computation
    Genetic Programming (GP)
        Program Representation
        The GP Process
State Machines
    Representing the Environment by Feature Vectors
    Elman Networks
    Iconic Representations
    Blackboard Systems
Robot Vision
    Introduction
    Steering an Automobile
    Two Stages of Robot Vision
    Image Processing
    Scene Analysis
        Interpreting Lines and Curves in the Image
        Model-Based Vision
    Stereo Vision and Depth Information

AI
A field that focuses on developing techniques to enable computer systems to perform activities that are
considered intelligent (in humans and other animals).
Goals of AI
replicate human intelligence
solve knowledge-intensive tasks
intelligent connection of perception and action
enhance human-human, human-computer and computer-computer interaction/communication
Application Areas of AI
game playing
speech recognition
computer vision (e.g. face recognition programs used by banks)
expert systems: diagnostic systems, system configuration, financial decision making, classification
systems
mathematical theorem proving
natural language understanding
scheduling and planning
AI Challenges
translating telephone
accident-avoiding car
aids for the disabled
smart clothes
intelligent agents that monitor and manage information by filtering, digesting, abstracting
tutors
self-organizing systems
A Framework for Building AI Systems
Perception: intelligent biological systems are physically embodied in the world and experience the
world through their sensors (senses). Includes signal processing and the areas of vision, speech processing and
natural language processing.
Reasoning: inference, decision-making and classification from what is sensed and what the internal
model is of the world. Includes game theory, machine learning, etc.
Action: all behavior is centered around actions in the world. Includes robot actuation, natural language
generation and speech synthesis.
Fundamental AI Issues
Representation: deals with the questions of what to represent and how to do it.
Search: searching for a solution in a very large problem space.
Inference: from some facts others can be inferred. Related to search.
Learning
Planning
Design Methodology and Goals
Engineering Goal: develop concepts, theory and practice of building intelligent machines. Emphasis on
system building.
Science Goal: Develop concepts, mechanisms and vocabulary to understand biological intelligent
behavior. Emphasis on understanding intelligent behavior.
Methodologies can be defined by choosing (1) the goals of the computational model, and (2) the basis for
evaluating the performance of the system.
ELIZA: Dialog with a Machine
Describing and specifying ELIZA
ELIZA was one of the first programs to feature English output as well as input.

The ELIZA algorithm can be described simply as: (1) read an input, (2) find a pattern that matches the input, (3)
transform the input into a response, and (4) print the response. These four steps are repeated for each input.
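These four steps can be sketched as a short read-match-transform-print loop. This is a minimal sketch; the patterns and responses below are invented stand-ins, not Weizenbaum's original rules.

```python
import re

# Ordered list of (pattern, response template) pairs. The last,
# catch-all pattern guarantees that some response is always found.
RULES = [
    (re.compile(r".*\bI am (.*)", re.I), "Why do you say you are {0}?"),
    (re.compile(r".*\bI (.*) you\b.*", re.I), "Why do you {0} me?"),
    (re.compile(r".*", re.I), "Please go on."),
]

def respond(text):
    for pattern, template in RULES:              # (2) find a matching pattern
        m = pattern.match(text)
        if m:
            return template.format(*m.groups())  # (3) transform the input
    return "Please go on."

for line in ["I am sad", "I love you"]:          # (1) read an input
    print(respond(line))                         # (4) print the response
```

Each iteration performs one full cycle of the four steps, which is all the top-level control ELIZA needs.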
Pattern Matching
There are four things to be concerned with: a general pattern and response, and a specific input and
transformation of that input.
Segment pattern Matching
We need to account for variables in any position that match a sequence of items in the input. We will call such
variables segment variables. We will need a notation to differentiate segment variables from normal variables.
The possibilities fall into two classes: either we use atoms to represent segment variables and distinguish them
by some spelling convention or we use a nonatomic construct.
ELIZA Program: rules
We want the patterns to be associated with responses. We can do this by inventing a data structure called a
rule, which consists of a pattern and one or more associated responses. So several rules may all be applicable
to the same input. One possibility would be to choose a rule at random from among the rules having patterns
that match the input. Another possibility is just to accept the first rule that matches. This implies that the rules
form an ordered list, rather than an unordered set.
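A rule can be modeled directly as a pattern paired with a list of responses; taking the first matching rule from an ordered list might look like this (the rules and the crude substring matching are illustrative only):

```python
import random

# Each rule: (pattern, list of candidate responses). Order matters:
# the first rule whose pattern matches the input is the one used.
rules = [
    ("hello",  ["How do you do?", "Hello there."]),
    ("mother", ["Tell me more about your family."]),
    ("",       ["Please go on."]),  # empty pattern matches any input
]

def first_matching_rule(words):
    for pattern, responses in rules:
        if pattern in words:        # crude stand-in for real pattern matching
            return random.choice(responses)

print(first_matching_rule("my mother cooks"))  # → Tell me more about your family.
```

Picking a random response from the matched rule's list is what keeps repeated inputs from producing identical replies.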
Natural Language Processing
Knowledge in Speech and Language Processing
By speech and language processing, we have in mind those computational techniques that process spoken and
written human language, as language. What distinguishes these applications from others is their use of
knowledge of language. The knowledge of language needed to engage in complex language behavior can be
separated into six distinct categories:
Phonetics and Phonology the study of linguistic sounds.
Morphology the study of the meaningful components of words.
Syntax the study of the structural relationships between words.
Semantics the study of meaning.
Pragmatics the study of how language is used to accomplish goals.
Discourse the study of linguistic units larger than a single utterance.
Ambiguity
Most or all tasks in speech and language processing can be viewed as resolving ambiguity. Some input is
ambiguous if there are multiple alternative linguistic structures that can be built for it.
Models and Algorithms
Language processing can be captured through the use of a small number of formal models or theories, which
are drawn from the standard toolkits of Computer Science, Mathematics and Linguistics. Among the most
important elements in this toolkit are state machines, formal rule systems, logic, probability theory and other
machine learning tools. These models lend themselves to a small number of algorithms from well-known
computational paradigms, such as state space search and dynamic programming algorithms.
State machines are formal models that consist of states, transitions among states and an input
representation. Closely related to these are their declarative counterparts: formal rule systems (e.g. regular
grammars and relations). State machines and formal rule systems are the main tools used when dealing with
knowledge of phonology, morphology and syntax.
The algorithms involved in a search through a space of states are the graph algorithms.
Each model can be augmented with probabilities. One major role of probability theory is to solve ambiguity
problems.
History
Historically, speech and language processing has been treated very differently in computer science, electrical
engineering, linguistics and psychology/cognitive science. Because of this diversity, different fields in these
departments have appeared: computational linguistics in linguistics, natural language processing in computer
science, speech recognition in electrical engineering, computational psycholinguistics in psychology.
Expressing language Constraints
A full understanding of natural language lies beyond the present state of scientific knowledge. Nevertheless, it
is possible to achieve engineering goals in limited contexts.

X-bar Schema
The binary-tree hypothesis states that a binary tree is the best kind of tree for representing sentence structure
at all levels. Structure: a maximal projection node at the top, an intermediate connector tying in a phrase from
the right, and a head node at the bottom connected to a word.
A binary X-bar tree is a representation that is a tree in which
There are three general types of nodes: X, X-bar, and XP.
Each node has at most two branches.
Leaves are words.
In English, specifiers enter XP nodes from one side; complements enter X nodes from the other. The
direction from which specifiers and complements enter is language specific.
Verb, prepositional, noun, inflection and complementizer phrases all can be described by the same binary X-bar
schema.
Many linguists believe that the X-bar hypothesis offers the right perspective for determining what is in
universal grammar, because it provides a relatively easy way to express constraints. Constraints on a word's
case illustrate how the X-bar hypothesis helps. More precisely, the way a word fits into the sentence
establishes the word's case assignment. The case assignment, in turn, determines the exact form of the word,
a process called case marking.
One way to describe an X-bar structure is to write down the rules that stipulate what sort of branches
can appear. For example, an NP node can have either an N node or both a determiner and an N node as its
direct descendants. Using standard notation, the following rewrite rules say the same thing:
NP → N
NP → determiner N
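Such rewrite rules are easy to represent directly as data. A minimal sketch (the VP and S rules are extra illustrative examples, not from the text):

```python
# Rewrite rules as a mapping from a phrase type to its possible
# right-hand sides, i.e. its allowed direct descendants in the tree.
rules = {
    "NP": [["N"], ["determiner", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "S":  [["NP", "VP"]],
}

def expansions(symbol):
    """Return the allowed right-hand sides for a phrase type."""
    return rules.get(symbol, [])

print(expansions("NP"))  # → [['N'], ['determiner', 'N']]
```

A parser or generator can then be driven entirely by this table, which is what makes the rule notation convenient.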
Questions and Commands
Syntactic Transition Nets
A syntactic transition-net grammar consists of a sentence net and a collection of supporting nets. To
test a string of words to see whether it constitutes a valid sentence, you try to move along a series of links from
the initial node in the sentence net to a terminal node (one with a circle in the center), using the words as
directions. If a particular string of words enables you to reach a terminal node, then that string is said to be
accepted or recognized by the transition-net grammar, from which you conclude that the string of words is a
valid sentence with respect to the grammar.
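The recognition procedure can be sketched as moving through word-labeled links from the initial node toward a terminal node. The tiny net below is a made-up illustration, not a full grammar:

```python
# A tiny transition net: word-labeled links between states, plus a set
# of terminal states. A word string is accepted if it drives the net
# from the initial state to a terminal state, consuming every word.
LINKS = {
    ("s0", "robots"): "s1",
    ("s1", "dream"):  "s2",
}
TERMINAL = {"s2"}

def accepts(words, state="s0"):
    for word in words:
        nxt = LINKS.get((state, word))
        if nxt is None:          # no link for this word: not a sentence
            return False
        state = nxt
    return state in TERMINAL

print(accepts(["robots", "dream"]))  # → True
```

Real transition-net grammars add subnet links (e.g. a noun-phrase net called from the sentence net), but the accept/reject logic is the same.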
A nested-box diagram is a diagram in which graphical "inside" corresponds to temporal "during".
Semantic transition trees
Relational databases consist of one or more relations, each of which is a table consisting of labeled columns,
called fields, and data-containing rows, called records.
The overall semantic transition-tree approach to translating an English question or command into
database commands is as follows:
Use the question or command to select database-oriented patterns.
Use the question or command to instantiate and combine the selected patterns.
Use the completed pattern to retrieve the database records specified in the question or command.
Use the retrieved database items to respond to the question or command.
Semantic transition-tree grammars differ from the syntactic transition-net grammars in several ways:
Some link transitions require specific words.
Some link transitions specify phrases semantically, rather than syntactically.
There are no nodes with two inputs - because of the change from nets to trees, there is a one-to-one
correspondence between paths and terminal nodes.
Whenever a tree is traversed successfully, the tree's name is considered to be a tree variable, and
is bound to a pattern as the tree variable's binding.
Tree variables, marked by left bracket symbols, <, are replaced by their previously established
bindings.
Note that semantic transition trees have no loops of the sort used in syntactic transition nets, because you can
replace loops with trees that use themselves (recursive tree analysis).
To traverse a transition tree:
Determine whether it is possible to reach a success node, denoted by a double circle, via word and
subtree links.
o If it is not possible, announce failure.
o Otherwise

Instantiate the pattern associated with the double circle. Replace pattern variables,
marked by < prefixes, with bindings established as subtree links are traversed.
Bind the tree's name to the instantiated pattern.
Announce success.

To traverse a link:
If the link is a subtree link, marked by a right bracket, >, and the name of a subtree, try to traverse the
subtree. If the traversal is successful, bind the subtree name to the instantiated pattern found at the
terminal node of the subtree.
If the link is a word link, the next word in the sentence must be that word. The word is consumed as
the link is traversed.
Intelligent Agents
Def.: An agent is a computer software system whose main characteristics are situatedness, autonomy,
adaptivity, and sociability.
Examples: human agents (skin, hands, etc.), robot (camera, lights, etc.), software agent (softbot).
Characteristics:
Situatedness: The agent receives some form of sensory input from its environment, and it performs
some action that changes its environment in some way. Examples of environments: the physical world
and the Internet.
Autonomy: The agent can act without direct intervention by humans or other agents.
Adaptivity: The agent is capable of (1) reacting flexibly to changes in its environment; (2) taking goal-directed initiative, when appropriate; and (3) learning from its own experience, its environment, and
interactions with others.
Sociability: The agent is capable of interacting in a peer-to-peer manner with other agents or humans.
Structure
Agent= architecture + program
Agent program: the implementation of the agent's perception-action mapping.
Architecture: a device that can execute the agent program.
Behavior/Performance
Rationality => Need a performance measure to say how well a task has been achieved. An ideal
rational agent should, for each possible percept sequence, do whatever actions will maximize its
performance measure based on (1) the percept sequence, and (2) its built-in and acquired knowledge.
Hence includes information gathering, not "rational ignorance."
Types of objective performance measures: false alarm rate, false dismissal rate, time taken, resources
required, effect on environment, etc.
Examples: Benchmarks and test sets, Turing test.
Agent Types
1. Simple Reflex Agent
Table lookup of percept-action pairs defining all possible condition-action rules necessary to interact
in an environment
Problems: too big to generate and to store, no knowledge of non-perceptual parts of the current
state, not adaptive to changes in the environment, can't make actions conditional.
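The table-lookup idea can be sketched in a few lines (the percepts and actions here are invented for illustration):

```python
# Simple reflex agent: a table maps each percept directly to an action.
# The table must cover every percept the agent can receive, which is
# exactly why it quickly becomes too big to generate and store.
table = {
    "obstacle-ahead": "turn-left",
    "clear-ahead":    "move-forward",
    "at-goal":        "stop",
}

def simple_reflex_agent(percept):
    return table[percept]

print(simple_reflex_agent("obstacle-ahead"))  # → turn-left
```

Note that the action depends only on the current percept: there is no memory, so two world states producing the same percept are indistinguishable to this agent.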
2. Reflex Agent with Internal State
Encode "internal state" of the world to remember the past as contained in earlier percepts
Needed because sensors do not usually give the entire state of the world at each input, so perception
of the environment is captured over time. "State" used to encode different "world states" that
generate the same immediate percept.
Requires ability to represent change in the world;
Example: Rodney Brooks's Subsumption Architecture
3. Goal-Based Agent
Choose actions so as to achieve a (given or computed) goal = a description of a desirable situation
Keeping track of the current state is often not enough--- need to add goals to decide which situations
are good
Deliberative instead of reactive

May have to consider long sequences of possible actions before deciding if goal is achieved--- involves
consideration of the future, "what will happen if I do...?"
4. Utility-Based Agent
A goal specifies a crude distinction between a happy and unhappy state, but often need a more
general performance measure that describes "degree of happiness"
Utility function U: State --> Reals
indicating a measure of success or happiness when at a given state
Allows decisions comparing choice between conflicting goals, and choice between likelihood of
success and importance of goal (if achievement is uncertain)
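The utility idea can be sketched as choosing the action whose predicted resulting state has the highest utility. All state, action and utility values below are illustrative assumptions:

```python
# Utility-based agent: U maps states to real numbers; the agent picks
# the action whose predicted successor state maximizes U.
U = {"at-goal": 10.0, "near-goal": 5.0, "lost": -1.0}

def predict(state, action):
    # Hypothetical world model: which state results from an action.
    outcomes = {("near-goal", "advance"): "at-goal",
                ("near-goal", "wander"):  "lost"}
    return outcomes.get((state, action), state)

def choose_action(state, actions):
    return max(actions, key=lambda a: U[predict(state, a)])

print(choose_action("near-goal", ["advance", "wander"]))  # → advance
```

Because U is real-valued rather than a binary goal test, conflicting goals and uncertain outcomes can be traded off by comparing numbers.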
Stimulus-Response agents
Stimulus-response (S-R) agents = machines that have no internal state and that simply react to immediate stimuli
in their environments.
Perception and Action
The process of computing an action is divided into two phases: a perceptual processing phase that produces a
vector of features, and an action computation phase that selects an action based on the feature vector. The
values of the features can be either real numbers or categories (whose value is a name or property).
Boolean Algebra
Boolean algebra is a convenient notation for representing Boolean functions. Boolean algebra rules:
1 + 1 = 1, 1 + 0 = 1, 0 + 0 = 0, 1 · 1 = 1, 1 · 0 = 0, 0 · 0 = 0, ~1 = 0, ~0 = 1
Sometimes the arguments and values of Boolean functions are expressed in terms of the constants T and F.
A Boolean formula consisting of a single variable is called an atom. One consisting of either a single variable or
its complement is called a literal.
Boolean functions come in a variety of forms. An important form is l1 · l2 · ... · lk, where the li are literals; this is called
a conjunction of literals or a monomial. Another form is l1 + l2 + ... + lk, called a disjunction of literals.
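These forms are straightforward to evaluate in code. A sketch of a monomial (conjunction of literals) and a disjunction over a binary feature vector:

```python
# A literal is (index, positive?). A monomial is a list of literals and
# is true iff every literal agrees with the input vector x; a
# disjunction is true iff at least one literal does.
def literal(x, index, positive):
    v = x[index]
    return v if positive else 1 - v   # complement a 0/1 value

def monomial(x, literals):
    return all(literal(x, i, pos) for i, pos in literals)

def disjunction(x, literals):
    return any(literal(x, i, pos) for i, pos in literals)

x = [1, 0, 1]
print(monomial(x, [(0, True), (1, False)]))   # x0 AND NOT x1 → True
```

Evaluating a monomial over feature vectors is exactly the kind of Boolean action function an S-R agent might compute.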
Representing and Implementing Action Functions
Production System: comprises an ordered list of rules called production rules or productions. Each rule is
written in the form ci -> ai, where ci is the condition part and ai is the action part.
Networks: e.g. implemented as electronic circuits.
The Subsumption Architecture: the general idea is that an agent's behavior is controlled by a number of
behavior modules. Each module receives sensory information directly from the world.
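A production system of this kind can be sketched as an ordered list of (condition, action) pairs, with the first satisfied condition firing. The conditions and actions below are invented for illustration:

```python
# Production system: ordered rules of the form ci -> ai. The first rule
# whose condition holds on the feature vector determines the action.
productions = [
    (lambda f: f["obstacle"],     "turn"),
    (lambda f: f["goal-visible"], "approach"),
    (lambda f: True,              "wander"),   # default rule, always fires
]

def select_action(features):
    for condition, action in productions:
        if condition(features):
            return action

print(select_action({"obstacle": False, "goal-visible": True}))  # → approach
```

Putting the most specific conditions first and a catch-all last is what makes the ordering of the rule list meaningful.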
Neural Networks(TLUs)
Training Single TLUs
A TLU is defined by its weights and threshold. A TLU has an output of 1 if the vector dot product, s = X · W, of its
input and weight vectors is greater than the threshold, and has an output of 0 otherwise.
Training a TLU is accomplished by adjusting its variable weights.
Comparing Widrow-Hoff with generalized Delta reveals these differences:
1. In Widrow-Hoff, the desired output, d_i, is either 1 or -1, whereas in generalized Delta it is 1 or 0.
2. In Widrow-Hoff, the actual output equals the dot product, whereas in generalized Delta it is the output of
the sigmoid function, f.
3. In generalized Delta, there is the added term f(1 - f) due to the presence of the sigmoid function. With the
sigmoid function, f(1 - f) can vary in value from 0 to 1/4. When f is 0, f(1 - f) is also 0; when f is 1, f(1 - f) is 0; f(1 - f)
obtains its maximum value of 1/4 when f is 1/2 (that is, when the input to the sigmoid is 0). The sigmoid function
can be thought of as implementing a "fuzzy" hyperplane. For an input vector far away from this fuzzy
hyperplane, f(1 - f) has a value close to 0, and the generalized Delta rule makes little or no change to the weight
values regardless of the desired output. Weight changes are made only within the region of "fuzz" surrounding
the hyperplane (the only place where changes have much effect on f), and these changes are in the direction of
correcting the error.
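Training a single TLU by error correction can be sketched as follows. The threshold is folded in as a trainable weight on a constant input of 1, a standard trick; the rate and epoch count are arbitrary illustrative choices:

```python
# Error-correction training for a single TLU: w <- w + r*(d - f)*x,
# where f is the thresholded output and d the desired output (0 or 1).
def tlu(w, x):
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s > 0 else 0

def train(samples, n, rate=0.1, epochs=50):
    w = [0.0] * (n + 1)              # last weight plays the threshold role
    for _ in range(epochs):
        for x, d in samples:
            xa = list(x) + [1.0]     # augmented input with constant 1
            err = d - tlu(w, xa)
            w = [wi + rate * err * xi for wi, xi in zip(w, xa)]
    return w

# AND is linearly separable, so a single TLU can learn it.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train(samples, 2)
print([tlu(w, list(x) + [1.0]) for x, _ in samples])  # → [0, 0, 0, 1]
```

Weights change only when the output is wrong, and each change moves the separating hyperplane in the direction that corrects the error.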
Neural Networks Motivation
It often happens that there are sets of stimuli and responses that cannot be learned by a single TLU. In that
case, it is possible that a network of TLUs can give correct responses. The function implemented by a network
of TLUs depends on its topology as well as on the weights of the individual TLUs. Feedforward networks have
no cycles: in a feedforward network, no TLU's input depends on the TLU's output. Networks that are not
feedforward are called recurrent networks.
Generalization, Accuracy and Overfitting
A network is said to generalize when it appropriately classifies vectors not in the training set. Generalization
ability is measured by the accuracy with which it makes these classifications.

Training Neural Nets


Real neurons consist of synapses, dendrites, axons, and cell bodies. Simulated neurons consist of multipliers,
adders, and thresholds.
A neural net is a representation that is an arithmetic constraint net in which:
Operation frames denote arithmetic constraints modeling synapses and neurons.
Demon procedures propagate stimuli through synapses and neurons. One moves information across
synapses; another moves information from one neuron to another.
When a value is written into a synapse's input slot,
Write the product of the value and the synapse's weight into the synapse's output slot.
When a value is written into a synapse's output slot,
Check the following neuron to see whether all its input synapses' outputs have values.
If they do, add the output values of the input synapses together, compare the sum with the neuron's
threshold, and write the appropriate value into the neuron's output slot.
Otherwise, do nothing.
One way to learn is to train a simulated neural net to recognize regularity in data.
Back-propagation procedure
The back-propagation procedure is a procedure for training neural nets. It is a relatively efficient way to compute
how much performance improves with individual weight changes. Back propagation can be understood
heuristically or by way of a mathematical analysis.
To enable back propagation, you need to perform a simple trick that eliminates nonzero neuron thresholds.
You also need to convert stairstep threshold functions into squashed S (sigmoid) threshold functions.
You can teach a neural net, via back propagation, to recognize several concepts. These concepts can be taught
one at a time or all at once.
You must choose a back-propagation rate parameter carefully. A rate parameter that is too small leads to slow
training; a rate parameter that is too large leads to instability.
The back-propagation equations are incorporated into the following back-propagation procedure:
To do back propagation to train a neural net,
Pick a rate parameter, r.
Until performance is satisfactory,
For each sample input,
Compute the resulting output.
Compute β (β stands for the benefit obtained by changing the output value of a node)
for nodes in the output layer using β_z = d_z − o_z.
Compute β for all other nodes using β_j = Σ_k w_{j→k} o_k (1 − o_k) β_k.
Compute weight changes for all weights using Δw_{i→j} = r o_i o_j (1 − o_j) β_j.
Add up the weight changes for all sample inputs, and change the weights.
Because weight changes are proportional to output errors, the outputs will only approach the 1 and 0 values
used as training targets; they will never reach those values.
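The procedure can be sketched as follows for a small 2-2-1 sigmoid network. The network size, rate parameter, epoch count and training target (the AND function, chosen for brevity) are illustrative assumptions:

```python
import math, random

# Back-propagation sketch for a 2-2-1 sigmoid network. Thresholds are
# eliminated by the usual trick: a constant input of 1 with a
# trainable weight.
random.seed(0)

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def forward(w_hidden, w_out, x):
    xa = x + [1.0]
    h = [sigmoid(sum(w * v for w, v in zip(ws, xa))) for ws in w_hidden]
    o = sigmoid(sum(w * v for w, v in zip(w_out, h + [1.0])))
    return h, o

def train(samples, rate=0.5, epochs=2000):
    w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
    w_out = [random.uniform(-1, 1) for _ in range(3)]
    for _ in range(epochs):
        for x, d in samples:
            h, o = forward(w_hidden, w_out, x)
            beta_o = d - o                       # benefit at the output node
            # benefit propagated back to each hidden node (old weights)
            beta_h = [w_out[j] * o * (1 - o) * beta_o for j in range(2)]
            w_out = [w + rate * v * o * (1 - o) * beta_o
                     for w, v in zip(w_out, h + [1.0])]
            xa = x + [1.0]
            for j in range(2):
                w_hidden[j] = [w + rate * v * h[j] * (1 - h[j]) * beta_h[j]
                               for w, v in zip(w_hidden[j], xa)]
    return w_hidden, w_out

target = [([0.0, 0.0], 0.0), ([0.0, 1.0], 0.0),
          ([1.0, 0.0], 0.0), ([1.0, 1.0], 1.0)]
w_hidden, w_out = train(target)
print([round(forward(w_hidden, w_out, x)[1]) for x, _ in target])
```

Each weight update follows the Δw = r · o_i · o_j(1 − o_j) · β_j pattern, with β computed at the output layer first and then propagated backward.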
Machine Evolution
Evolutionary Computation
Another way in which biological systems adapt is by evolution: generations of descendants are produced that
perform better than do their ancestors.
Biological evolution proceeds by the production of descendants changed from their parents and by the
selective survival of some of these descendants to produce more descendants. These two aspects, change
through reproduction and selective survival, are sufficient to produce generations of individuals that are better
and better at doing whatever contributes to their ability to reproduce. Picture the individuals as points on a fitness landscape: the height of an individual on this
landscape is a measure of how well that individual performs its task relative to the performance of the other
individuals. Those individuals at low elevations cease to exist, with a probability that increases with decreasing
height. Those individuals at high elevations "reproduce" with a probability that increases with increasing
height. Reproduction involves the production of new individuals whose location on the landscape is related to,
but different from, that of the parent(s). The most interesting and efficacious kind of reproduction involves the
production of new individuals jointly by two parents. The location(s) of the offspring on the landscape is a
function of the locations of the parents.
The most straightforward application is in function optimization. There, we attempt to find the
maximum of a function, f. The arguments of the function specify the location of individuals, and the value of f is

the height. The other application is to evolve programs to solve specific problems, for example, programs to
control reactive agents.
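Function optimization by evolutionary computation can be sketched in a few lines. The fitness function (counting 1 bits, a standard toy problem) and all parameters are illustrative assumptions:

```python
import random

# Evolve bit strings to maximize a fitness function f. Fitter
# individuals survive and reproduce; offspring combine two parents
# (crossover) and are slightly perturbed (mutation).
random.seed(1)
N, LENGTH, GENERATIONS = 20, 16, 60

def f(ind):
    return sum(ind)                   # "height" on the fitness landscape

def crossover(a, b):
    cut = random.randrange(1, LENGTH)
    return a[:cut] + b[cut:]

def mutate(ind, p=0.05):
    return [bit ^ 1 if random.random() < p else bit for bit in ind]

population = [[random.randint(0, 1) for _ in range(LENGTH)] for _ in range(N)]
for _ in range(GENERATIONS):
    population.sort(key=f, reverse=True)
    survivors = population[:N // 2]   # selective survival of the fittest
    offspring = [mutate(crossover(random.choice(survivors),
                                  random.choice(survivors)))
                 for _ in range(N - len(survivors))]
    population = survivors + offspring

print(max(f(ind) for ind in population))
```

Change through reproduction (crossover plus mutation) and selective survival (keeping the top half) are the only two mechanisms needed, mirroring the description above.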
Genetic Programming(GP)
Program representation
In GP, we evolve functional programs such as LISP functions. Such programs can be expressed as rooted trees
with labeled nodes. Internal nodes are functions, predicates, or actions that take one or more arguments. Leaf
nodes are program constants, actions, or functions that take no arguments.
For GP, we must ensure that all expressions and subexpressions used in a program have values for all possible
arguments (unless execution of an expression terminates the program).
The GP process
In genetic programming, we start with a population of random programs, built from functions, constants, and sensory inputs that we think may be needed for programs to be effective in the domain of interest. These initial programs are said to constitute generation 0. The size of the population of
programs in generation 0 is one of the parameters of a GP run. Note that several rather arbitrary parameters
must be set in constructing the next generation.
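The tree representation and the creation of generation 0 can be sketched as follows. The function set, terminal set, growth probability, and population size here are illustrative assumptions, not values from the text.

```python
import random
import operator

# Internal nodes: functions with a fixed arity; leaves: constants or a sensory input 'x'.
FUNCTIONS = [('+', operator.add, 2), ('*', operator.mul, 2)]
TERMINALS = ['x', 0, 1]

def random_tree(max_depth):
    """Grow one random program tree for generation 0."""
    if max_depth == 0 or random.random() < 0.3:
        return random.choice(TERMINALS)                # leaf node
    name, fn, arity = random.choice(FUNCTIONS)
    return (name,) + tuple(random_tree(max_depth - 1) for _ in range(arity))

def evaluate(tree, env):
    """Every (sub)expression has a value for all possible arguments."""
    if isinstance(tree, tuple):
        name = tree[0]
        fn = next(f for n, f, _ in FUNCTIONS if n == name)
        return fn(*(evaluate(arg, env) for arg in tree[1:]))
    return env.get(tree, tree)                         # variable lookup, else a constant

# Population size is one of the (rather arbitrary) parameters of a GP run.
generation_0 = [random_tree(max_depth=4) for _ in range(10)]
```

Because every function in the set is total over the terminal values, each random program is guaranteed to evaluate without error, as the text requires.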
State Machines
Representing the Environment by Feature Vectors
The feature vector used by a stimulus-response agent can be thought of as representing the state of
the environment so far as that agent is concerned. From this feature vector, the S-R agent computes an action
appropriate for that environmental state. Sensory limitations of the agent preclude completely accurate representation of environmental state by feature vectors, especially feature vectors that are computed from immediate sensory stimuli. The accuracy can be improved, however, by taking into account previous sensory
history. Important aspects of the environment that cannot be sensed at the moment might have been sensed
before. Such machines that track their environment in this way are called state machines. Besides immediate
sensory inputs, state machines must have memory in which to store a model of the environment. Agents
equipped with stored models of the environment will usually be able to perform tasks that memoryless agents
cannot.
This model can take many forms. Because agent environments might be arbitrarily complex, it is
always the case that the agent can only imperfectly represent its environment by a feature vector.
But it is also the case that agents designed for specific tasks can afford to conflate many environmental states. The agent designer must arrange for a feature vector that provides an adequate representation of the state of the environment, at least insofar as the agent's tasks are concerned.
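The idea that a stored model lets an agent act on information it can no longer sense can be sketched minimally. The corridor scenario, percept names, and actions below are invented for illustration.

```python
class StateMachineAgent:
    """Agent whose action depends on a stored environment model, not just the current percept."""
    def __init__(self):
        self.model = {'door_seen': False}      # memory: an (imperfect) model of the environment

    def step(self, percept):
        # 1. Update the stored model from the immediate sensory input
        if percept == 'door':
            self.model['door_seen'] = True
        # 2. Compute an action from the model, the agent's effective "state of the environment"
        if self.model['door_seen'] and percept == 'wall':
            return 'turn-back'                 # uses an aspect of the world sensed earlier
        return 'forward'

agent = StateMachineAgent()
actions = [agent.step(p) for p in ['corridor', 'door', 'corridor', 'wall']]
```

A memoryless S-R agent seeing only 'wall' at the last step could not choose 'turn-back', because the door is no longer in view.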
Elman Networks
An agent can use a special type of recurrent neural network (called an Elman network ) to learn how to
compute a feature vector and an action from a previous feature vector and sensory inputs. Although an Elman
network is a special case of recurrent neural networks, training can be accomplished by ordinary backpropagation.
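A forward pass of an Elman network can be sketched as follows (training is omitted). The layer sizes and random weights are arbitrary; the essential feature is the context units, which copy back the previous hidden layer so the new feature vector is computed from both the sensory input and the previous feature vector.

```python
import numpy as np

def elman_forward(x_seq, W_in, W_ctx, W_out, hidden_size):
    """Forward pass of an Elman network over an input sequence."""
    context = np.zeros(hidden_size)                # previous feature vector, initially empty
    outputs = []
    for x in x_seq:
        # new feature vector from the current input plus the previous feature vector
        hidden = np.tanh(W_in @ x + W_ctx @ context)
        outputs.append(W_out @ hidden)             # action computed from the feature vector
        context = hidden                           # copy-back: this is the recurrence
    return outputs

rng = np.random.default_rng(0)
W_in = rng.normal(size=(4, 2))     # 2 sensory inputs -> 4 hidden units
W_ctx = rng.normal(size=(4, 4))    # context (previous hidden) -> hidden
W_out = rng.normal(size=(1, 4))    # hidden -> 1 action value
ys = elman_forward([np.ones(2)] * 3, W_in, W_ctx, W_out, hidden_size=4)
```

Because the recurrence is just a copied activation vector, each time step looks like an ordinary feedforward pass, which is why ordinary backpropagation suffices for training.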
Iconic Representations
Feature vectors are just one way of representing the environment. Other data structures may also be used. On
one side, we have a feature-based representation of the world: a vector of features or attributes. On the other side, we have an iconic representation: data structures such as maps, which can usually be thought of as simulations of important aspects of the environment. (Iconic representations are sometimes also called analogical representations.) A crisp and precise distinction between iconic and feature-based representations is
difficult to make.
When an agent uses an iconic representation, it must still compute actions appropriate to its task and
to the present (modeled) state of the environment. Reactive agents react to the data structure in much the
same way that agents without memory react to sensory stimuli: they compute features of the data structure.
The sensory information is first used to update the iconic model as appropriate. Then, operations similar to
perceptual processing are used to extract features needed by the action computation subsystem. The actions
include those that change the iconic model as well as those that affect the actual environment.
Blackboard Systems
Data structures used to model the world do not necessarily have to be iconic, although they often are.
An important style of AI machine is based on a blackboard architecture, which uses a data structure called a
blackboard. The blackboard is read and changed by programs called knowledge sources (KSs). Blackboard
systems are elaborations of the production systems I have already described. Each KS has a condition part and
an action part.

The condition part computes the value of a feature; it can be any condition about the blackboard data structure that evaluates to 1 or 0 (or True or False). The action part can be any program that changes the data structure or takes external action (or both). When two or more KSs evaluate to 1, a conflict resolution program
decides which KSs should act. In addition to changing the blackboard, KS actions can also have external effects.
And the blackboard might also be changed by perceptual subsystems that process sensory data. Often, the
blackboard data structure is organized hierarchically with subordinate data structures occupying various levels
of the hierarchy.
The KSs are supposed to be "experts" about the part(s) of the blackboard that they watch. When they
detect some particular aspect of their part(s) of the blackboard they propose changes to the blackboard, which,
if selected, may evoke other KSs, and so on. Blackboard systems are designed so that as computation proceeds
in this manner, the blackboard ultimately becomes a data structure that contains the solution to some
particular problem and/or the associated external effects change the world in some desired way. The
blackboard architecture has been used in several applications, ranging from speech understanding to signal interpretation and medical patient-care monitoring.
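The condition/action cycle of a blackboard system can be sketched with two hypothetical knowledge sources. The signal data, KS names, and trivial conflict-resolution rule below are invented for illustration; the point is that one KS's change to the blackboard evokes the next KS.

```python
# Shared blackboard data structure, read and changed by the knowledge sources (KSs)
blackboard = {'signal': [1, 1, 2, 3], 'peak': None, 'label': None}

# Each KS has a condition part (True/False on the blackboard) and an action part.
def find_peak_cond(bb): return bb['peak'] is None
def find_peak_act(bb): bb['peak'] = max(bb['signal'])

def label_cond(bb): return bb['peak'] is not None and bb['label'] is None
def label_act(bb): bb['label'] = 'high' if bb['peak'] > 2 else 'low'

knowledge_sources = [(find_peak_cond, find_peak_act), (label_cond, label_act)]

while True:
    runnable = [(c, a) for c, a in knowledge_sources if c(blackboard)]
    if not runnable:
        break                      # computation ends; the blackboard holds the solution
    runnable[0][1](blackboard)     # conflict resolution: here, simply take the first KS
```

Running the first KS writes 'peak' to the blackboard, which makes the second KS's condition true; computation proceeds until no KS can fire.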
Robot Vision
Introduction
Although vision is apparently effortless for humans, it has proved to be a very difficult problem for machines.
Major sources of difficulty include variable and uncontrolled illumination, shadows, complex and hard-to-describe objects such as those that occur in outdoor scenes, nonrigid objects, and objects occluding other objects. Some of these difficulties are lessened in man-made environments, such as the interiors of buildings,
and computer vision has generally been more successful in those environments.
The first step in computer vision is to create an image of a scene on an array of photosensitive devices,
such as the photocells of a TV camera. The image is formed by a camera through a lens that produces a
perspective projection of the scene within the camera's field of view.
The photocells convert the image into a two-dimensional, time-varying matrix of image intensity
values. A vision-guided reactive agent must then process this matrix to create either an iconic model of the
scene surrounding it or a set of features from which the agent can directly compute an action.
The kinds of information to be extracted depend on the purposes and tasks of the agent. To navigate
safely through a cluttered environment, an agent needs to know about the locations of objects, boundaries,
and openings and about the surface properties of its path. To manipulate objects, it needs to know about their
locations, sizes, shapes, compositions, and textures. For other purposes, it may need to know about their color
and to be able to recognize them as belonging to certain classes. Based on how all of this information has
changed over an observed time interval, an agent might need to be able to predict probable future changes.
Steering an Automobile
In certain applications involving S-R agents, neural networks can be used to convert the image
intensity matrix directly into actions.
The input to the network is derived from a low-resolution (30 x 32) television image. The TV camera is mounted on the automobile and looks at the road straight ahead. This image is sampled and produces a stream of 960-dimensional input vectors to the neural network.
Two Stages of Robot Vision
In man-made environments such as the interiors of buildings, objects can be doorways, furniture,
other agents, humans, walls, floors, and so on. In exterior natural environments, objects can be animals, plants,
man-made structures, automobiles, roads, and so on. Man-made environments are usually easier for robot
vision because most of the objects tend to have regular edges and surfaces.
Two computer vision techniques are useful for delineating the parts of images that relate to objects in
the scene. One technique looks for "edges" in the image. An image edge is a part of the image across which the
image intensity or some other property of the image changes abruptly. Another technique attempts to
segment the image into regions. A region is a part of the image in which the image intensity or some other
property of the image changes only gradually. Often, but not always, edges in the image and boundaries
between regions in the image correspond to important object-related discontinuities in the scene that
produced the image.
The discussion of visual processing is divided into two major stages. The image processing stage is
concerned with transforming the original image into one that is more amenable to the scene analysis stage.
Image processing involves various filtering operations that help reduce noise, accentuate edges, and find
regions in the image. Scene analysis routines attempt to create from the processed image either an iconic or a feature-based description of the original scene, providing the task-specific information about it that the agent needs.
Image Processing
Averaging
Certain irregularities in the image can be smoothed by an averaging operation. This operation involves
sliding an averaging window all over the image array. The averaging window is centered at each pixel, and the
weighted sum of all the pixel numbers within the averaging window is computed.
This sum then replaces the original value at that pixel. The sliding and summing operation is called convolution.
If we want the resulting array to contain only binary numbers (say 0 and 1), then the sum is compared with a
threshold. Averaging tends to suppress isolated noise specks but also reduces the crispness of the image and loses small image components.
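The sliding-window convolution described above can be sketched as follows. This is a plain illustrative implementation; the 3x3 uniform weights and the replicate-border policy at the image edges are assumptions, not details from the text.

```python
def smooth(image, weights):
    """Slide an averaging window over the image; replace each pixel with the
    weighted sum of the pixel values inside the window (a convolution)."""
    k = len(weights) // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-k, k + 1):
                for dj in range(-k, k + 1):
                    # clamp indices at the border (replicate edge pixels)
                    ii = min(max(i + di, 0), h - 1)
                    jj = min(max(j + dj, 0), w - 1)
                    total += weights[di + k][dj + k] * image[ii][jj]
            out[i][j] = total
    return out

avg3 = [[1 / 9] * 3 for _ in range(3)]          # uniform 3x3 averaging window
img = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]         # an isolated noise speck of value 9
smoothed = smooth(img, avg3)
```

The speck of intensity 9 is spread into a dim patch of intensity about 1, illustrating both effects the text mentions: noise suppression and loss of small components.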
Edge Enhancement
As previously mentioned, computer vision techniques frequently involve extracting image edges. These edges
are then used to convert the image to a line drawing of some sort. The outlines in the converted image can
then be compared against prototypical (model) outlines characteristic of the sorts of objects that the scene
might contain. One method of extracting outlines begins by enhancing boundaries or edges in the image. An
edge is any boundary between parts of the image with markedly different values of some property, such as
intensity.
Combining edge enhancement with averaging: to be less sensitive to noise, we can combine the two
operations, first averaging and then edge enhancing.
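The average-then-enhance combination can be illustrated on a single scan line. The 1-D neighbor-difference edge operator and the three-pixel average below are simplified stand-ins for the 2-D operators a real system would use.

```python
def edge_enhance(row):
    """1-D edge operator: difference of neighbors, large where intensity changes abruptly."""
    return [abs(row[i + 1] - row[i - 1]) for i in range(1, len(row) - 1)]

def average(row):
    """1-D averaging first, so the edge operator is less sensitive to noise."""
    return [(row[i - 1] + row[i] + row[i + 1]) / 3 for i in range(1, len(row) - 1)]

# a step edge in image intensity between two uniform regions
scanline = [10, 10, 10, 10, 50, 50, 50, 50]
edges = edge_enhance(average(scanline))
```

The enhanced signal peaks where the intensity step occurs, which is the boundary a later stage would fit a line to.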
Region finding
Another method for processing an image attempts to find "regions" in the image within which intensity or some other property such as texture does not change abruptly. In a sense, finding regions is a process that is dual to finding outlines; both techniques segment the image into scene-relevant portions. But since finding outlines and finding regions are both subject to idiosyncrasies due to noise, the two techniques are often used to complement each other.
A region is a set of connected pixels satisfying two main properties:
1. A region is homogeneous.
2. For no two adjacent regions is it the case that the union of all the pixels in these two regions satisfies the homogeneity property.
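One common way to find such regions is region growing, sketched below. The specific homogeneity test used here (intensity within a tolerance of the seed pixel) is an illustrative assumption; many other tests are possible.

```python
def grow_region(image, seed, tol=1):
    """Collect the connected pixels whose intensity stays within tol of the seed's
    intensity (a simple homogeneity test); growth stops where intensity changes abruptly."""
    h, w = len(image), len(image[0])
    base = image[seed[0]][seed[1]]
    region, frontier = set(), [seed]
    while frontier:
        i, j = frontier.pop()
        if (i, j) in region or not (0 <= i < h and 0 <= j < w):
            continue
        if abs(image[i][j] - base) > tol:
            continue                      # pixel belongs to an adjacent, different region
        region.add((i, j))
        # grow into the 4-connected neighbors
        frontier += [(i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)]
    return region

img = [[10, 10, 90],
       [10, 11, 90],
       [10, 10, 90]]
left = grow_region(img, (0, 0))           # grows over the near-uniform left part only
```

The grown region satisfies both properties above: it is homogeneous, and merging it with the bright right-hand column would violate homogeneity.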
Scene Analysis
After the image has been processed, we can attempt to extract from it the needed information about the
scene. This phase of computer vision is called scene analysis. Since the scene-to-image transformation is many-to-one, the scene analysis phase requires either additional images or general information about the kinds of
scenes to be encountered (or both). The required extra knowledge can be very general or quite specific. It can
also be explicit or implicit.
Sometimes an iconic model of the scene is desired, and sometimes certain features of the scene
suffice. Iconic scene analysis usually attempts to build a model of the scene or of parts of the scene. Feature-based scene analysis extracts features of the scene needed by the task at hand. So-called task-oriented or
purposive vision typically employs feature-based scene analysis.
Interpreting Lines and Curves in the image
For scenes that are known to contain rectilinear objects (such as scenes encountered inside buildings and scenes in a grid-space world), an important step in scene analysis is to postulate lines in the image
(which later can be associated with key components of the scene). Lines in the image can be created by various
techniques that fit segments of straight lines to edges or to boundaries of regions. For scenes with curved
objects, curves in the image can be created by attempting to fit conic sections (such as ellipses, parabolas, and
hyperbolas) to the primal sketch or to boundaries of regions. These fitting operations, followed by various
techniques for eliminating small lines and joining line and curve segments at their extremes, convert the image
into a line drawing ready for further interpretation. Interpretation of the lines and curves of a line drawing
yields a great deal of useful information about a scene.
Model-Based Vision
Using a variety of model components, model fitting can be employed until either an iconic model of
the entire scene is produced or sufficient information about the scene is obtained to allow the extraction of
features needed for the task at hand. Model-based methods can test their accuracy by comparing the actual image with a simulated image constructed from the iconic model produced by scene analysis. The simulated image must be rendered from the model using parameters similar to those used by the imaging process.
Stereo Vision and Depth Information
Under perspective projection, a large, distant object might produce the same image as does a similar but smaller and closer one. Thus, estimating the distance to objects from a single image is problematic.
Depth information can be obtained using stereo vision, which is based on triangulation calculations using two
(or more) images.
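For a rectified stereo pair, the triangulation reduces to a simple relation: depth is the focal length times the baseline between the cameras, divided by the disparity (the shift of the point's position between the two images). The parameter values below are arbitrary illustrations.

```python
def stereo_depth(x_left, x_right, focal_length, baseline):
    """Triangulation for a rectified stereo pair: depth = f * b / disparity."""
    disparity = x_left - x_right          # shift of the point between the two images
    if disparity <= 0:
        raise ValueError("point must lie in front of both cameras")
    return focal_length * baseline / disparity

# a nearby object shifts more between the two images than a distant one
near = stereo_depth(120.0, 100.0, focal_length=500.0, baseline=0.1)  # disparity 20
far = stereo_depth(105.0, 100.0, focal_length=500.0, baseline=0.1)   # disparity 5
```

This shows why the distant object appears farther even when the two objects produce similar single images: only the disparity between the two views distinguishes them.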
