
Game Playing

History of Games
• Chess playing proposed for the Analytical Engine [Babbage, 1846]
• Tic-tac-toe [Bowden, 1953]
• Mechanism for a machine to play chess [Claude
Shannon, 1950]
• Description of a chess-playing program [Turing, 1951]
• First significant, operational game-playing program
[Samuel, 1952-57]
 Checkers: learning from its mistakes to improve
evaluation accuracy
Games: A good domain to explore machine
intelligence
• Two reasons:
1. They provide a structured task in which it is very easy to
measure success or failure
2. They did not obviously require large amounts of knowledge.
They were thought to be solvable by straightforward search
from the starting state to a winning state.
- The first reason remains valid
- The second is not true even for a relatively simple game such as chess:
 Average branching factor: 35
 In an average game: 50 moves/player
 A complete game tree would require examining about 35^100 positions
Games vs. Search Problems
• Thus, it is clear that a program that simply does
a straightforward search of the game tree will
not be able to select even its first move during
the lifetime of its opponent.
• Some kind of heuristic search procedure is
necessary
Generate-and-test
• In order to improve the effectiveness of a
search-based problem-solving program, two
things can be done:
1. Improve the generate procedure so that only
good moves (paths) are generated
2. Improve the test procedure so that the best
moves (paths) will be recognized & explored
first
• By incorporating heuristic knowledge into both the
generator & the tester, the performance of the
overall system can be improved.
• This requires a good static evaluation function
• What Search strategy should we use?
 A*: okay for one-player games; inadequate for two-
person games
• Minimax Procedure
• B* Algorithm
Games
• A game can be formally defined as a kind of search problem with four
components:
1. The initial state, which includes the board position & identifies the
player to move
2. A successor function, which returns a list of (move, state) pairs,
each indicating a legal move & resulting state
3. A terminal test, which determines when the game is over. States
where the game has ended are called terminal states
4. A utility function, which gives a numeric value for the terminal
states.
- In chess, the outcome is a win, loss, or draw, with values +1, -1, or 0.
- In backgammon, the outcomes range from +192 to -192.
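As a sketch of these four components, here is a toy game in Python (a tiny take-1-or-2 coin game; the game and all names below are illustrative, not from the slides):

```python
# Toy game (illustrative): a pile of coins; each player removes 1 or 2
# coins, and the player who takes the last coin wins.

def initial_state():
    # Board position plus the player to move (MAX moves first).
    return (4, "MAX")

def successors(state):
    # Returns a list of (move, resulting state) pairs for the legal moves.
    coins, player = state
    other = "MIN" if player == "MAX" else "MAX"
    return [(take, (coins - take, other)) for take in (1, 2) if take <= coins]

def terminal_test(state):
    # The game is over when no coins remain.
    return state[0] == 0

def utility(state):
    # At a terminal state, the player to move did NOT take the last coin,
    # so the other player has won: +1 if MAX won, -1 if MIN won.
    return -1 if state[1] == "MAX" else +1
```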
PLY and MINIMAX
• The possible moves for MAX at the root node are
labeled a1, a2, a3. The possible replies to a1 for MIN
are b1, b2, b3, and so on.
• This particular game ends after one move each by
MAX and MIN
• In game parlance, this tree is one move deep,
consisting of two half-moves, each of which is called
a ply
• The utilities of the terminal states in this game range from 2 to 14
• Given a game tree, the optimal strategy can be determined by
examining the minimax value of each node: MINIMAX-VALUE (n)
• The minimax value of a node is the utility (for MAX) of being in
the corresponding state, assuming that both players play
optimally from there to the end of the game
• Minimax value of a terminal state: utility
• MAX will prefer to move to a state of maximum value
• MIN prefers a state of minimum value
MINIMAX-VALUE(n) =
 UTILITY(n)                                       if n is a terminal state
 max over s in Successors(n) of MINIMAX-VALUE(s)  if n is a MAX node
 min over s in Successors(n) of MINIMAX-VALUE(s)  if n is a MIN node
Minimax
• Depth-first, depth-limited search procedure
• Idea:
• Start at the current position & use the plausible-move generator
to generate the set of possible successor positions
• Apply the static evaluation function to those positions & simply
choose the best one
• Back that value up to the starting position to represent the evaluation
for it
• The starting position is exactly as good for us as the position
generated by the best move we can make next
• Assume that the static evaluation function returns large values to
indicate good situations for us; the goal, then, is to maximize the value
of the static evaluation function of the next board position.
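The depth-limited procedure just described can be sketched as follows; the successor and static-evaluation functions are passed in as parameters, and the tiny "counting game" in the usage line is purely illustrative:

```python
def depth_limited_minimax(state, depth, is_max, successors, static_eval):
    moves = successors(state)
    if depth == 0 or not moves:
        # Cut off the search here and estimate the position statically.
        return static_eval(state)
    values = [depth_limited_minimax(s, depth - 1, not is_max,
                                    successors, static_eval)
              for _, s in moves]
    return max(values) if is_max else min(values)

# Illustrative usage: each move adds 1 or 2; the evaluator is the identity.
succ = lambda s: [(a, s + a) for a in (1, 2)]
print(depth_limited_minimax(0, 2, True, succ, lambda s: s))  # -> 3
```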
Game tree (2-player, deterministic, turns)
Example: Minimax
Minimax algorithm
Properties of Minimax
α-β Cutoff
• The problem with minimax search is that the number of game states it has
to examine is exponential in the number of moves.
• Unfortunately we can't eliminate the exponent, but we can effectively cut
it in half.
• The trick is that it is possible to compute the correct minimax decision
without looking at every node in the game tree.
• That is, we can use the idea of pruning in order to eliminate large parts of
the tree from consideration.
• The particular technique we will examine is called alpha-beta
pruning.
• When applied to a standard minimax tree, it returns the same move as
minimax would, but prunes away branches that cannot possibly
influence the final decision.
α-β Pruning
• Another way to look at this is as a simplification
of the formula for MINIMAX-VALUE.
• Let the two unevaluated successors of node C
in Fig. have values x & y & let z be the minimum
of x & y. The value of the root node is given by

MINIMAX-VALUE(root) = max( min(3, 12, 8), min(2, x, y), min(14, 5, 2) )
                    = max( 3, min(2, z), 2 )   where z = min(x, y)
                    = 3                        since min(2, z) <= 2

• In other words, the value of the root & hence
the minimax decision are independent of the
values of the pruned leaves x & y.
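This independence is easy to check numerically. Using the standard example values for this kind of figure (assumed here: MIN children holding leaves (3, 12, 8), (2, x, y), and (14, 5, 2)), min(2, x, y) can never exceed 2, so the root value stays 3 no matter what x and y are:

```python
import random

# Leaf values assumed for illustration; x & y are the pruned leaves.
for _ in range(1000):
    x = random.uniform(-100.0, 100.0)
    y = random.uniform(-100.0, 100.0)
    root = max(min(3, 12, 8), min(2, x, y), min(14, 5, 2))
    assert root == 3  # independent of the pruned leaves x & y
```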
General Idea
• Alpha-beta pruning can be applied to trees of any
depth
• It is often possible to prune
entire sub-trees rather than
just leaves
• General principle: consider
a node n somewhere in the
tree, such that Player has a
choice of moving to that
node
• If Player has a better choice
m, either at the parent node
of n or at any choice point
further up, then n will never
be reached in actual play
• So once we have found out
enough about n (by
examining some of its
descendants) to reach this
conclusion, we can prune it
 If m is better than n for Player,
we will never get to n in play
• Remember that minimax search is depth-first, so at any one
time we just have to consider the nodes along a single path in
the tree. Alpha-beta pruning gets its name from the following
two parameters that describe bounds on the backed-up values
that appear anywhere along the path:
• α = the value of the best (i.e., highest-value) choice we have
found so far at any choice point along the path for MAX.
• β = the value of the best (i.e., lowest-value) choice we have
found so far at any choice point along the path for MIN.
• Alpha-beta search updates the values of α & β as it goes along &
prunes the remaining branches at a node (i.e., terminates the
recursive call) as soon as the value of the current node is known
to be worse than the current α or β value for MAX or MIN,
respectively.
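The procedure above can be sketched as follows. The nested-list tree mirrors the earlier minimax examples (leaf values are illustrative), and the visited list simply counts evaluated leaves to show that pruning examines fewer of them:

```python
import math

def alphabeta(node, alpha, beta, is_max, visited):
    if not isinstance(node, list):  # terminal state: return its utility
        visited.append(node)
        return node
    if is_max:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            alpha = max(alpha, value)
            if value >= beta:  # MIN above would never let us get here
                break          # beta cutoff: prune the remaining children
        return value
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, alpha, beta, True, visited))
            beta = min(beta, value)
            if value <= alpha:  # MAX above already has a better choice
                break           # alpha cutoff
        return value

visited = []
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, -math.inf, math.inf, True, visited))  # -> 3
print(len(visited))  # -> 7 leaves examined, vs. 9 for plain minimax
```

The same move is returned as plain minimax would give, but the second and third MIN children are cut off early: once a 2 appears under a MIN node while MAX already has a guaranteed 3, the remaining leaves cannot matter.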
Conclusion
• Minimax
• Alpha-beta cutoff