
Complex dynamics in learning complicated games

Tobias Galla¹ and J. Doyne Farmer²,³

January 12, 2011

¹ The University of Manchester, School of Physics and Astronomy, Schuster Building, Manchester M13 9PL, United Kingdom
² Santa Fe Institute, 1399 Hyde Park Road, Santa Fe, NM 87501
³ LUISS Guido Carli, Viale Pola 12, 00198 Roma, Italy

Game theory is the standard tool used to model strategic interactions in evolutionary biology and social science [1, 2]. Traditional game theory studies the equilibria of simple games [3, 4]. But is traditional game theory applicable if the game is complicated, and if not, what is? We investigate this question here, defining a complicated game as one with many possible actions, and therefore many possible payoffs conditional on those actions. We investigate two-person games in which the players learn based on experience [7-10]. By generating games at random [5, 6, 11, 12] we show that under some circumstances the strategies of the two players converge to fixed points, but under others they follow limit cycles or chaotic attractors. The key parameters are the memory loss in the players' learning algorithm and the correlation of the payoffs of the two players, which determines the extent to which the game is zero sum. The dimension of the chaotic attractors can be very high, implying that the dynamics of the strategies are effectively random. In the chaotic regime the payoffs fluctuate intermittently, showing bursts of rapid change punctuated by periods of quiescence, similar to the clustered volatility observed in financial markets [13] and fluid turbulence [14]. Our results suggest that for complicated strategic interactions there is a large parameter regime in which the tools of dynamical systems are more useful than those of standard equilibrium game theory.

INTRODUCTION

Traditional game theory gives a good understanding of simple games with a few players, or with only a few possible actions, characterizing the solutions in terms of their equilibria [3, 4]. The applicability of this approach is not clear when the game becomes more complicated, for example due to more players or a larger strategy space, which can cause an explosion in the number of possible equilibria [5, 6, 11, 12]. This is further complicated if the players are not rational and must learn their strategies [7-10]. In a few special cases it has been observed that the strategies display complex dynamics and fail to converge to equilibrium solutions [15]. Are such games special, or is this typical behavior? More generally, under what circumstances should we expect that games become so hard to learn that their dynamics fail to converge? What kind of behavior should we expect, and how should we characterize the solutions?

Here we show that for complicated games, under a wide variety of circumstances, one should expect complex dynamics in which the players never converge to a fixed strategy. Instead their strategies continually vary as each player responds to past conditions and attempts to do better than the other players. This corresponds to high-dimensional chaotic dynamics, suggesting that for most intents and purposes the behavior is essentially random.

Description of the model

We study 2-player games. For convenience call the two players Alice and Bob. At each time step t player μ ∈ {Alice = A, Bob = B} chooses one of N possible actions, picking the ith action with frequency x_i^μ(t), where i = 1, ..., N. The frequency vector x^μ(t) = (x_1^μ, ..., x_N^μ) is the strategy of player μ. If Alice plays i and Bob plays j, Alice receives payoff Π_{ij}^A and Bob receives payoff Π_{ji}^B. We assume that the players learn their strategies x^μ via a form of reinforcement learning called experience weighted attraction. This has been extensively studied by experimental economists, who have shown that it provides a reasonable approximation for how real people learn in games [7, 9]. Actions that have proved to be successful in the past are played more frequently, and actions that have been less successful are played less frequently. To be more specific, the probability of the actions is

x_i^\mu(t) = \frac{e^{\beta Q_i^\mu(t)}}{\sum_k e^{\beta Q_k^\mu(t)}} ,   (1)

where Q_i^μ is called the attraction of player μ to strategy i.
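Eq. (1) is a standard logit (softmax) choice rule with inverse temperature β. As a minimal illustration (our own sketch in Python; the function name and example values are not from the paper):

```python
import numpy as np

def logit_strategy(Q, beta):
    """Eq. (1): map a vector of attractions Q to a mixed strategy.

    beta is the intensity of choice: beta = 0 gives uniform play,
    large beta concentrates weight on the most attractive action.
    """
    w = np.exp(beta * (Q - Q.max()))  # subtract max for numerical stability
    return w / w.sum()

# A small historical advantage for action 0 matters only at large beta:
Q = np.array([1.1, 1.0, 0.9])
print(logit_strategy(Q, beta=0.1))   # nearly uniform
print(logit_strategy(Q, beta=50.0))  # almost all weight on action 0
```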

Figure 1: An illustration of the complex game dynamics of the strategy x_i^μ(t) for different parameters. There are N = 50 possible actions for each player. The game dynamics ranges from (a) a limit cycle to (b and c) low- and intermediate-dimensional chaotic motion to (d) high-dimensional chaos. The upper panels show three-dimensional projections of the attractors in the 98-dimensional phase space, scaling each coordinate logarithmically. Lower panels depict the time series of the corresponding three coordinates. Note the scale: as the dimension of the attractor increases, so does the range of x_i^μ. For the highest-dimensional case a given action has occasional bursts where it is highly probable, and long periods where its probability is extremely small (as low as 10^{-24}).

Alice's strategy attractions are updated according to

Q_i^A(t+1) = (1 - \alpha)\, Q_i^A(t) + \sum_j \Pi_{ij}^A\, x_j^B(t) ,   (2)

and similarly for Bob.

We choose games at random by drawing the elements of the payoff matrices Π_{ij} from a normal distribution. The mean and the covariance are chosen so that E[Π_{ij}] = 0, E[(Π_{ij})²] = 1/N, and E[Π_{ij}^A Π_{ji}^B] = Γ/N, where E[x] denotes the average of x. The variable Γ is a crucial parameter which measures the deviation from a zero-sum game. When Γ = −1 the game is zero sum, i.e. the amount Alice wins is equal to the amount Bob loses, whereas when Γ = 0 their payoffs are uncorrelated.

The dynamics for updating the strategies x_i^μ of the two players are completely deterministic. This approximates the situation in which the players vary their strategies slowly in comparison to the timescale on which they play the game. The key parameters that characterize the learning strategy are α and β. The parameter β is called the intensity of choice; when β is large a small historical advantage for a given action causes that action to be very probable, and when β = 0 all actions are equally likely. The parameter α specifies the memory in the learning; when α = 1 there is no memory of previous learning steps, and when α = 0 all learning steps are remembered and given equal weight, regardless of how far in the past they occurred. The case α = 0 corresponds to the well-known replicator dynamics used to describe evolutionary processes in population biology [1, 16].
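To make the learning loop concrete, here is a minimal Python sketch of Eqs. (1) and (2) together with one way to draw the correlated payoff matrices (the sampling construction, parameter values, and all names are our own illustration, assuming the moments stated above):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 10_000
alpha, beta, Gamma = 0.01, 0.07, -0.6  # long memory, moderate anticorrelation

# Correlated Gaussian payoffs: E[Pi] = 0, E[Pi^2] = 1/N,
# E[PiA_ij * PiB_ji] = Gamma/N (Gamma = -1 gives a zero-sum game).
Z1 = rng.normal(size=(N, N))
Z2 = rng.normal(size=(N, N))
PiA = Z1 / np.sqrt(N)
PiB = (Gamma * Z1 + np.sqrt(1 - Gamma**2) * Z2).T / np.sqrt(N)

def strategy(Q):
    """Eq. (1): logit map from attractions to a mixed strategy."""
    w = np.exp(beta * (Q - Q.max()))
    return w / w.sum()

QA = 0.1 * rng.normal(size=N)  # small random initial attractions
QB = 0.1 * rng.normal(size=N)
traj = np.empty((T, N))
for t in range(T):
    xA, xB = strategy(QA), strategy(QB)
    # Eq. (2): discount old attractions, then add the expected payoff
    # against the opponent's current mixed strategy (likewise for Bob)
    QA = (1 - alpha) * QA + PiA @ xB
    QB = (1 - alpha) * QB + PiB @ xA
    traj[t] = xA  # record Alice's strategy x^A(t)
```

With these parameter values (good memories, Γ = −0.6) such a run typically wanders on a complicated orbit; pushing Γ toward −1 and increasing α/β typically gives convergence to a fixed point, in line with the stability diagram discussed below.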

Results

When we simulate games with N = 50 pure strategies we observe very different behaviors depending on parameters. In many cases we see stable learning dynamics, in which the strategies of both players evolve to reach a fixed point. For a large section of the parameter space, however, the strategies x^μ(t) do not settle to a fixed point, but rather relax onto a more complicated orbit, either a limit cycle or a chaotic attractor. We characterize the attractors in the unstable regime by numerically computing the Lyapunov exponents λ_i, i = 1, ..., 2N − 2, which characterize the rate of expansion or contraction of nearby points in the state space. The Lyapunov exponents also determine the Lyapunov dimension D, which characterizes the number of degrees of freedom of the motion in the 98-dimensional state space. We give several examples of the game dynamics x_i^μ(t) in Fig. 1, including a limit cycle and chaotic attractors of varying dimensionality. There can also be long transients, in which the trajectory follows a complicated orbit for a long time and then suddenly collapses onto a fixed point. Of course, the behavior that is observed depends on the random draws of the payoff matrices Π_{ij}, but as we move away from the stability boundary we observe fairly consistent behavior.

Simulating games at many different parameter values reveals the stability diagram given in Fig. 2. Roughly speaking, we find that the dynamics are stable¹ when Γ ≈ −1 (zero-sum games) and α/β is large (short memory), i.e. in the lower right of the diagram, and unstable when Γ ≈ 0 (uncorrelated payoffs) and α/β is small (long memory), i.e. in the upper left. Interestingly, for reasons that we do not understand, the highest-dimensional behavior is observed when the payoffs are moderately anticorrelated (Γ ≈ −0.6) and when players have good memories and do not discount past payoffs by a lot (α/β ≈ 0). In this case we often find that D = 2N − 2, i.e. the attractor fills all the dimensions of the phase space.

A good approximation of the boundary between the stable and unstable regions of the parameter space can be computed analytically using techniques from statistical physics. We use path-integral methods from the theory of disordered systems [17] to derive a stochastic process for an effective strategy in the limit of infinite payoff matrices, N → ∞. The stability of fixed points of the representative process can be computed in a continuous-time limit (see Supplementary Material). This also allows us to show that in this limit, at fixed Γ, stability depends only on the ratio α/β, not on α or β separately.
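The Lyapunov dimension D referred to above is conventionally obtained from the ordered exponent spectrum via the Kaplan-Yorke formula. A short sketch of that standard formula (our code, not the authors'):

```python
import numpy as np

def kaplan_yorke_dimension(lyap):
    """Kaplan-Yorke (Lyapunov) dimension from a full spectrum of
    Lyapunov exponents, one per direction in the state space."""
    lam = np.sort(np.asarray(lyap, dtype=float))[::-1]  # descending order
    csum = np.cumsum(lam)
    if csum[0] < 0:    # every direction contracts: a fixed point, D = 0
        return 0.0
    if csum[-1] >= 0:  # expansion never exhausted: attractor fills the space
        return float(len(lam))
    k = np.max(np.nonzero(csum >= 0)[0])  # last index with non-negative sum
    return (k + 1) + csum[k] / abs(lam[k + 1])

print(kaplan_yorke_dimension([0.5, 0.0, -0.4, -2.0]))  # -> 3.05
```

When the learning converges, all exponents are negative and D = 0; in the most chaotic regime described above the formula saturates at the full phase-space dimension 2N − 2.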

Figure 2: Stability diagram showing regions in parameter space where stable learning is found and where chaotic motion occurs. The solid line is obtained from the pathintegral analysis (see Supplementary Material), and indicates the onset of chaos in the continuous system. The coloured squares are data from simulations of the dynamics (1,2) and represent the typical dimension of the attractor (averaged over eight independent payoff matrices per data point).

Another interesting property of this system is the time dependence of the received payoffs. As shown in Fig. 3, when the dynamics are chaotic the payoff varies, with intermittent bursts of large fluctuations punctuated by relative quiescence. This is observed, although to varying degrees, throughout the chaotic part of the parameter space. There is a strong resemblance to the clustered volatility observed in financial markets (which in turn resembles fluctuations observed in fluid turbulence) [14]. We also observe heavy tails in the distribution of the fluctuations, as well as a concentration of power at low frequencies, as described in more detail in the Supplementary Information. This suggests that these properties, which have received a great deal of attention in studies of financial markets, may occur simply because they are generic properties of complicated games².

We have also simulated these games at various values of N. Not surprisingly, we find that if D > 0 at small N, the dimension D tends to increase with N. At this stage we have been unable to tell whether D reaches a finite limit as N → ∞.
¹ Note that the fixed point reached in the stable regime is only a Nash equilibrium at α = 0 and in the limit β → ∞. When α > 0 the players are effectively assuming that their opponent's behavior is non-stationary, and that more recent moves are more useful than moves in the distant past.

² In contrast to financial markets, the correlation function of the clustered volatility and the distribution of the heavy tails decay exponentially (as opposed to following a power law). We hypothesize that this is because the players in financial markets use a variety of different timescales.
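One simple way to quantify the clustered volatility shown in Fig. 3 is the autocorrelation of the absolute payoff changes, which remains positive over many lags when large fluctuations cluster in time. A sketch of such a diagnostic (our own, assuming a scalar payoff series such as Alice's expected payoff recorded at each step of the simulation sketched earlier):

```python
import numpy as np

def volatility_autocorrelation(payoffs, max_lag=200):
    """Autocorrelation of |payoff differences|, normalized to 1 at lag 0.
    Slowly decaying positive values indicate clustered volatility."""
    v = np.abs(np.diff(np.asarray(payoffs, dtype=float)))
    v = v - v.mean()
    acf = np.correlate(v, v, mode="full")[len(v) - 1:]
    return acf[:max_lag] / acf[0]
```

For this system the footnote above suggests the decay should be exponential, rather than the power law seen in market data.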

Figure 3: Chaotic dynamics displays clustered volatility. We plot the difference of payoffs on successive time steps for case (c) in Fig. 1. The amplitude of the fluctuations increases with the dimension of the attractor.

Why is dimensionality relevant?

The fact that the equilibria of a game are unlearnable with any particular learning algorithm, such as reinforcement learning, does not imply that learning is not possible with some other learning algorithm. For example, if the learning dynamics settles into a limit cycle or onto a low-dimensional attractor, a careful observer could collect data and make better predictions about the other player using the method of analogues [18], or refinements based on local approximation [19]. If the dimension of the chaotic attractor is too high, however, the curse of dimensionality makes this impossible with any reasonable amount of data [19]. In this case it is not clear that any algorithm can provide an improvement. The observation of high-dimensional dynamics here leads us to conjecture that there are some games that are inherently unlearnable, in the sense that any learning algorithms the two players use will inevitably result in high-dimensional chaotic learning dynamics (see also Sato et al. [15]).
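The method of analogues is nearest-neighbor forecasting: to predict what follows the current state, find the closest past states and average their successors. A minimal sketch of the idea (our illustration, not the authors' code):

```python
import numpy as np

def predict_by_analogues(history, current, k=5):
    """Forecast the next state by averaging the successors of the k
    past states most similar to the current one (Lorenz's analogues).

    history: observed trajectory, shape (T, d); current: state, shape (d,).
    """
    past, successors = history[:-1], history[1:]
    dists = np.linalg.norm(past - current, axis=1)
    nearest = np.argsort(dists)[:k]
    return successors[nearest].mean(axis=0)
```

The curse of dimensionality enters because the amount of data needed to find close analogues grows exponentially with the attractor dimension D, so for high-dimensional chaos no feasible data set yields good neighbors.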

Conclusions

The approach we have taken here makes it possible to estimate a priori the properties of the learning dynamics of any given complicated 2-player game where the players use reinforcement learning. This is because the payoff matrix of any given game is a possible draw from an ensemble of random games. One can then guess at the behavior of that game under the players' learning algorithm by locating it in the stability diagram of Fig. 2. We have shown that a key property of a game is its zero-sumness, characterized by Γ. Games become harder to learn (in the sense that the strategies do not converge) when they are non-zero-sum, particularly if the players use learning algorithms with long memory. Our approach gives a methodology for classifying the learnability of games by extending this type of analysis to multiplayer games, games on networks, alternative learning algorithms, etc. It suggests that under many circumstances it is more useful to abandon the tools of classic game theory in favor of those of dynamical systems. It also suggests that many behaviors that have attracted considerable interest, such as clustered volatility in financial markets, may simply be examples of a highly generic phenomenon.

References

1. M. A. Nowak, Evolutionary Dynamics, Harvard University Press, Cambridge MA (2006)
2. J. Hofbauer, K. Sigmund, Evolutionary Games and Population Dynamics, Cambridge University Press, Cambridge (1998)
3. J. Nash, Equilibrium points in n-person games, Proceedings of the National Academy of Sciences 36(1), 48-49 (1950)
4. J. von Neumann, O. Morgenstern, Theory of Games and Economic Behaviour, Princeton University Press, Princeton NJ (2007)
5. A. McLennan, J. Berg, The asymptotic expected number of Nash equilibria of two player normal form games, Games and Economic Behavior 51(2), 264-295 (2005)
6. J. Berg, M. Weigt, Entropy and typical properties of Nash equilibria in two-player games, Europhys. Lett. 48(2), 129-135 (1999)
7. T. H. Ho, C. F. Camerer, J.-K. Chong, Self-tuning experience weighted attraction learning in games, J. Econ. Theor. 133, 177-198 (2007)
8. C. Camerer, T. H. Ho, Experience-weighted attraction learning in normal form games, Econometrica 67, 827 (1999)
9. C. Camerer, Behavioral Game Theory: Experiments in Strategic Interaction (The Roundtable Series in Behavioral Economics), Princeton University Press, Princeton NJ (2003)
10. D. Fudenberg, D. K. Levine, Theory of Learning in Games, MIT Press, Cambridge MA (1998)
11. M. Opper, S. Diederich, Phase transition and 1/f noise in a game dynamical model, Phys. Rev. Lett. 69, 1616-1619 (1992)
12. S. Diederich, M. Opper, Replicators with random interactions: A solvable model, Phys. Rev. A 39, 4333-4336 (1989)
13. R. F. Engle, Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation, Econometrica 50(4), 987-1007 (1982)
14. S. Ghashghaie, W. Breymann, J. Peinke, P. Talkner, Y. Dodge, Turbulent cascades in foreign exchange markets, Nature 381, 767-770 (1996)
15. Y. Sato, E. Akiyama, J. D. Farmer, Chaos in learning a simple two-person game, Proc. Natl. Acad. Sci. USA 99, 4748-4751 (2002)
16. Y. Sato, J.-P. Crutchfield, Coupled replicator equations for the dynamics of learning in multiagent systems, Phys. Rev. E 67, 015206(R) (2003)
17. C. De Dominicis, Phys. Rev. B 18, 4913 (1978)
18. E. N. Lorenz, Atmospheric predictability as revealed by naturally occurring analogues, J. Atmos. Sci. 26, 636-646 (1969)
19. J. D. Farmer, J. J. Sidorowich, Predicting chaotic time series, Phys. Rev. Lett. 59, 845-848 (1987)

Acknowledgements We gratefully acknowledge support from National Science Foundation grant 0624351. We would also like to thank Yuzuru Sato and Nathan Collins for useful discussions.

Competing Interests The authors declare that they have no competing financial interests.

Author Contributions Both authors were involved in the design and analysis of the model. TG ran the simulations and carried out the analytical calculations. JDF and TG wrote the paper.

Correspondence Correspondence and requests for materials should be addressed to JDF (jdf@santafe.edu).
