

Systems Analysis Modelling Simulation


Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/gsam20

Simulation of Multi-Agent Prisoners' Dilemmas


Miklos N. Szilagyi
Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, USA
Available online: 29 Oct 2010

To cite this article: Miklos N. Szilagyi (2003): Simulation of Multi-Agent Prisoners' Dilemmas, Systems Analysis Modelling Simulation, 43:6, 829–846
To link to this article: http://dx.doi.org/10.1080/0232929021000055488


Systems Analysis Modelling Simulation, Vol. 43, No. 6, June 2003, pp. 829–846

SIMULATION OF MULTI-AGENT PRISONERS' DILEMMAS


MIKLOS N. SZILAGYI*


Department of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721, USA
(Received 24 May 2002)

This article is a report on computer simulation experiments based on our new agent-based simulation tool to model social situations for the case of large numbers of not necessarily rational decision-makers. It is a simulation of the multiagent Prisoners' Dilemma game for realistic (nondyadic) interactions between agents with various personalities. Our model has a number of user-defined parameters such as the size and shape of the simulation environment, the definition of neighborhood, the payoff (reward/penalty) functions, the learning rules, the agents' personalities, and the initial conditions. We have performed a series of simulation experiments with various combinations of these parameters. For the case of one class of agents we found two distinctly different nontrivial but remarkably regular solutions. For a wide range of initial conditions, the number of cooperators oscillates around a relatively small value. When the initial aggregate cooperation probability is above a certain value, the solutions tend to reach well-defined constant values that are dependent on the initial values. For other types of agents the solutions show interesting chaos-like behavior. The article provides some insight into the conditions of decentralized cooperation in spatially distributed populations of agents.

Keywords: Agent-based simulation; Cooperation; Prisoners' Dilemma

1. INTRODUCTION

Prisoners' Dilemma is usually defined between two players [3] and within game theory, which assumes that the players act rationally. Realistic investigations of collective behavior, however, require a multiperson model of the game [34]. Various aspects of the multiperson Prisoners' Dilemma have been investigated in the literature [2,4–6,13–15,18,19,21,23,35–37,39,41,43]. This topic has great practical importance because its study may lead to a better understanding of the factors stimulating or inhibiting cooperative behavior within social systems.

The participants of a Prisoners' Dilemma game may be persons, organizational units, or any other decision-making entities. They are usually called agents. The individual agents may cooperate with each other for the collective interest of their society or may defect, i.e., pursue their selfish interests. Their decisions to cooperate or defect will accumulate over time to produce a resulting collective order that will determine the success or failure of the society.
*E-mail: mns@ece.arizona.edu
ISSN 0232-9298 print; ISSN 1029-4902 online © 2003 Taylor & Francis Ltd
DOI: 10.1080/0232929021000055488


Formal models have been proposed to simulate collective phenomena [32]. Some of the models include computer simulation. Feinberg and Johnson [8] simulated the effects of alternative strategies on achieving consensus for action. A computer simulation of temporary gatherings was presented by McPhail et al. [29]. Glance and Huberman [11] used a thermodynamical model to investigate outbreaks of cooperation in a social system. Epstein and Axtell [7] demonstrated that it is possible to build complex artificial societies based on simple participating agents.

Thousands of articles have been published about the two-agent iterated Prisoners' Dilemma game [3,26–28]. The interest in investigating various strategies for pair-wise interactions in multiagent Prisoners' Dilemma tournaments is amazing because, as Rapoport [33] rightly noted, these tournaments demonstrated neither evolution nor learning: nothing evolved and nothing was learned in the succession of two-person games. Nevertheless, the obsession with these tournaments continues [17]. Even articles that claim the simulation of multiagent games are based on dyadic interactions between the agents.

A stochastic learning model was developed by Macy [25] to explain critical states where threshold effects may shift the system of agents from a defective equilibrium to a cooperative one. Nowak and May [31] and Lloyd [24] wrote simple computer programs that demonstrate the dynamics of deterministic social behavior based on pair-wise interaction between the participants. Akimov and Soutchansky [1] presented a multiagent simulation (not a succession of two-person games), but their experiment was limited to six agents.

2. THE MODEL

We have developed an agent-based model for the investigation of social dilemmas with a large number of decision-makers operating in a stochastic environment [40]. Our model has three distinctive new features:

1. It is a genuine multiagent model and it has nothing to do with the repeated two-person games.
2. It is a general framework for inquiry in which the properties of the environment as well as those of the agents are user-defined parameters and the number of interacting agents is theoretically unlimited.
3. Although the analysis of rational agents may predict their behavior in some areas (e.g., economics), biological objects and even human beings are not always rational. It seems to us that human behavior can be best described as stochastic but influenced by personality characteristics. In view of this hypothesis, it becomes crucially important to investigate the role of personalities in Prisoners' Dilemma. Our agents have various distinct, user-defined personalities.

The participating agents are described as stochastic learning cellular automata, i.e., as combinations of cellular automata [16,44] and stochastic learning automata [10,30]. The cellular automaton format describes the environment in which the agents interact. In our model, this environment is not limited to the agents' immediate neighbors: the agents may interact with all other agents simultaneously. Stochastic learning rules provide more powerful and realistic results than the deterministic rules usually used in cellular automata.


Stochastic learning means that behavior is not determined but only shaped by its consequences, i.e., an action of the agent will be more probable but still not certain after a favorable response from the environment.

Szilagyi and Szilagyi [40] describe the model in detail. We will only briefly explain its most important features here. A realistic simulation model of a multiperson game must include a number of parameters that define the game to be simulated. Our model in its present form has the following user-defined parameters:

1. Size and shape of the simulation environment (array of agents).
2. Definition of neighborhood: the number of layers of agents around each agent that are considered its neighbors.
3. Payoff (reward/penalty) functions.
4. Updating schemes (learning rules) for the agents' subsequent actions.
5. Personalities.
6. Initial probabilities of cooperation.
7. Initial actions of the agents.

Our simulation environment is a two-dimensional array of the participating agents. Its size is limited only by the computer's virtual memory. The behavior of a few million interacting agents can easily be observed on the computer's screen. There are two actions available to each agent, and each agent must choose between cooperation and defection. Each agent has a probability distribution for the two possible actions. The agents as stochastic learning cellular automata take actions according to their probabilities, updated on the basis of the reward/penalty received from the environment for their previous actions, their neighbors' actions, and the agents' personalities. The updating occurs simultaneously for all agents. The updated probabilities lead to new decisions by the agents that are rewarded/penalized by the environment.

After each iteration, the software tool draws the array of agents in a window on the computer's screen, with each agent in the array colored according to its most recent action. The experimenter can view and record the evolution of the society of agents as it changes with time.

The updating scheme is different for different agents. Agents with completely different personalities can be allowed to interact with each other in the same experiment. Agents with various personalities and various initial states and actions can be placed anywhere in a two-dimensional array. A variety of personality profiles and their arbitrary combinations can be represented in the model, including the following:

1. Pavlovian: its probability of cooperation p changes by an amount proportional to its reward or penalty from the environment (the coefficient of proportionality is the learning rate, a user-defined parameter).
2. Stochastically predictable: p is a constant. For example, angry: always defects (p = 0); benevolent: always cooperates (p = 1); unpredictable: acts randomly (p = 0.35).
3. Accountant: p depends on the average reward for previous actions.
4. Conformist: imitates the action of the majority.
5. Greedy: imitates the neighbor with the highest reward.
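To make the two lists above concrete, here is a minimal sketch in Python of how the user-defined parameters and the per-agent cooperation probabilities could be encoded. All names are our own illustrative choices, not the interface of the actual tool described in [40]; the payoff functions (parameter 3) are treated separately after Fig. 1.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ModelConfig:
    """User-defined parameters 1-7 listed above (illustrative names)."""
    rows: int = 100                 # 1. size and shape of the agent array
    cols: int = 100
    depth: int = 1                  # 2. neighborhood: layers around each agent
    learning_rate: float = 0.1      # 4. coefficient of the Pavlovian update
    personality: str = "pavlovian"  # 5. personality profile
    p0: float = 0.5                 # 6. initial probability of cooperation

def init_society(cfg: ModelConfig, rng: np.random.Generator):
    """Per-agent cooperation probabilities (6) and initial actions (7)."""
    probs = np.full((cfg.rows, cfg.cols), cfg.p0)  # P(cooperate) per agent
    actions = rng.random(probs.shape) < probs      # True = cooperate
    return probs, actions
```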


FIGURE 1 Reward/penalty functions for defectors (D) and cooperators (C). The horizontal axis x represents the ratio of the number of cooperators to the total number of neighbors; the vertical axis is the reward/penalty provided by the environment. The thicknesses of the curves are determined by the stochastic factors. In this figure, D(x) = 2x − 0.5 for the defectors, C(x) = 2x − 1 for the cooperators, and both stochastic factors are zero.
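The straight-line curves of Fig. 1 are easy to put into code. The sketch below assumes the stochastic factor enters as additive noise whose width sets the "thickness" of each curve; the article does not spell out the exact mechanism, so this is one plausible reading (pass a NumPy Generator when width > 0).

```python
import numpy as np

def payoff_C(x, width=0.0, rng=None):
    """Cooperator payoff of Fig. 1: C(x) = 2x - 1, plus optional noise."""
    base = 2.0 * np.asarray(x) - 1.0
    return base if width == 0.0 else base + width * (rng.random(np.shape(x)) - 0.5)

def payoff_D(x, width=0.0, rng=None):
    """Defector payoff of Fig. 1: D(x) = 2x - 0.5, plus optional noise."""
    base = 2.0 * np.asarray(x) - 0.5
    return base if width == 0.0 else base + width * (rng.random(np.shape(x)) - 0.5)
```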

The combination of various personalities with stochastic learning makes it possible to simulate behavior in situations when each member of a group must choose between maximizing selfish interests or collective interests. Aggregate cooperation proportions change in time, i.e., over subsequent iterations. The iterations are repeated until a certain pattern appears to remain stable or oscillating.

The payoff (reward/penalty) functions are given as two curves: one (C) for a cooperator and another (D) for a defector. The payoff to each agent depends on its choice, on the distribution of the other players among cooperators and defectors, and also on the properties of the environment. The payoff curves are functions of the ratio of cooperators to the total number of neighbors (Fig. 1). A stochastic factor can be specified to simulate stochastic responses from the environment. Zero stochastic factors mean a deterministic environment. The freedom of using arbitrary functions for the determination of the reward/penalty system makes it possible to simulate a wide range of social situations, including those where the two curves intersect each other.

We wish to emphasize that this is a genuine multiagent model and it has nothing to do with repeated two-person games [3]. It is well suited for simulating the behavior of artificial societies of large numbers of agents.

We assume that, regardless of what the other agents do, each agent receives a higher payoff for defection than for cooperation (the D curve runs above the C curve). On the other hand, all agents receive a higher payoff if all cooperate than if all defect [C(1) > D(0)]. This is a typical multiagent Prisoners' Dilemma situation [34].

The agents take actions according to probabilities updated on the basis of the reward/penalty received for their previous actions and of their personalities. The updating scheme may be different for different agents. This means that agents with different personalities can be allowed to interact with each other in the same experiment.
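Both defining conditions of the dilemma (D above C everywhere, and C(1) > D(0)) can be verified mechanically for any candidate pair of payoff curves; a small sketch of such a check:

```python
import numpy as np

def is_multiagent_dilemma(C, D, samples=1001):
    """Check D(x) > C(x) on [0, 1] and C(1) > D(0) on a dense grid."""
    xs = np.linspace(0.0, 1.0, samples)
    return bool(np.all(D(xs) > C(xs)) and C(1.0) > D(0.0))

# The Fig. 1 curves qualify:
assert is_multiagent_dilemma(lambda x: 2 * x - 1.0, lambda x: 2 * x - 0.5)
```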


Agents with various personalities and various initial states and actions can be placed anywhere in the array. The response of the environment is influenced by the actions of all participating agents.

The number of neighborhood layers around each agent and the agent's location determine the number of its neighbors. The depth of agent A's neighborhood is defined as the maximum distance, in the horizontal and vertical directions, that agent B can be from agent A and still be in its neighborhood. We do not wrap around the boundaries; therefore, an agent in the corner of the array has fewer neighbors than one in the middle. The neighborhood may extend to the entire array of agents.

It is also very important to properly describe the environment. In our model, even in the almost trivial case when both payoff curves are straight lines and the stochastic factors are both zero, four parameters specify the environment. Attempts to describe it with a single variable [20,31] are certainly too simplistic.
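Since depth is measured separately in the horizontal and vertical directions and the boundaries do not wrap, each neighborhood is a square block clipped at the array edge. A sketch of the resulting neighbor count (our own helper, not code from the paper):

```python
def num_neighbors(row, col, rows, cols, depth):
    """Agents within `depth` steps horizontally and vertically,
    clipped at the array boundary, excluding the agent itself."""
    r0, r1 = max(0, row - depth), min(rows - 1, row + depth)
    c0, c1 = max(0, col - depth), min(cols - 1, col + depth)
    return (r1 - r0 + 1) * (c1 - c0 + 1) - 1

# depth 1: a corner agent has 3 neighbors, a mid-array agent has 8
assert num_neighbors(0, 0, 100, 100, 1) == 3
assert num_neighbors(50, 50, 100, 100, 1) == 8
```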

3. PAVLOVIAN AGENTS

Most biological objects modify their behavior according to Pavlov's experiments and Thorndike's [42] law of conditioning: if an action is followed by a satisfactory state of affairs, then the tendency to produce that particular action is reinforced. We conclude that Pavlovian agents are realistic candidates for the investigation of the evolution of cooperation. These agents are primitive enough not to know anything about their rational choices. However, they have enough intelligence to follow Thorndike's law. Their probability of cooperation changes by an amount proportional to the reward/penalty received from the environment. Kraines and Kraines [22], Macy [26], Flache and Hegselmann [9] and others used Pavlovian agents for the investigation of iterated two-person games. We will show below that it is possible to accurately predict the solutions of the multiperson Prisoners' Dilemma for such agents.

A linear updating scheme is used for the Pavlovian agents: the change in the probability of choosing the previously chosen action again is proportional to the reward/penalty received from the environment (payoff curves). Of course, the probabilities always remain in the interval between 0 and 1.

Let us assume that in a society of N Pavlovian agents there are x cooperators and (N − x) defectors distributed randomly over the lattice at a certain time. Then xC + (N − x)D is the total payoff received by the entire society and [xC + (N − x)D]/N is the average payoff to a single agent, where C and D are the reward/penalty functions as defined earlier. This latter quantity is the so-called production function for the collective action of the society [38].

When the average payoff is zero, it is easy to think that nothing will happen and an equilibrium is reached. This is, however, not true. Indeed, this situation can only happen if either C = D = 0 or C and D have opposite signs. The first case means the two curves are crossing, which contradicts the definition of Prisoners' Dilemma. In the second case evidently D is positive and C is negative; therefore, the defectors are rewarded and the cooperators are punished. As a result, the number of cooperators will decrease and we do not have an equilibrium.

Let us investigate what happens when the cooperators receive the same total payoff as the defectors, i.e.,


xC = (N − x)D.    (1)

This may happen if C and D are both negative or both positive. In the first case, a small number of cooperators are punished severely and a large number of defectors are punished mildly. This leads to a stable equilibrium at this point. In the second case, a large number of cooperators are rewarded slightly and a small number of defectors are rewarded greatly. This point corresponds to an unstable equilibrium.

If C and D are both linear functions of x, then xC = (N − x)D is a quadratic equation; if C and D are quadratic functions, then it is a cubic equation, etc. The equation generally has up to two real solutions. If both solutions are in the interval 0 < x/N < 1, then both equilibria are present. We will denote these equilibrium solutions x1/N = p1 and x2/N = p2, so that 0 < p1 < p2 < 1. The initial cooperation probability (which is set as a constant and uniform across all the agents) is p0.

Let us consider the payoff curves shown in Fig. 1. Suppose first that p0 < p1. Then there are few agents cooperating and many agents defecting. Those agents that happened to cooperate will be heavily punished, and their probability of cooperation will consequently go down substantially. As a result, some of the cooperators will become defectors. The agents that happened to defect will be punished somewhat, and their probability of cooperation will consequently go up. Because there are so many more defectors than cooperators, the aggregate effect is that the overall cooperation probability goes up toward p1. A systematic formal analysis of equilibria of the Pavlovian learning model in N-person games shows that in the case of linear payoff functions and a low initial cooperation rate p0 the value of p1 cannot exceed 50% [37].

If p1 < p0 < p2, three regions must be distinguished. Near the lower limit, the defectors and the cooperators are still both punished but there are more cooperators now who become defectors; therefore, the aggregate effect is that the overall cooperation probability goes down toward p1. If the value of p0 is chosen higher, we are in the region where the cooperators are punished but the defectors are rewarded. As a result, more will defect and the aggregate probability of cooperation again goes down toward p1. When the value of p0 is even higher, the defectors and the cooperators are both rewarded, but the defectors are rewarded more than the cooperators, so the proportion of cooperators will decrease and an equilibrium will be reached in this region or, if the aggregate probability reaches the region of mutual punishment, the equilibrium will occur at p1 again.

The two cases above work together to keep the long-term aggregate cooperation proportion stable at p1. However, since none of the agents are rewarded for always taking the same action (always cooperating or always defecting), the probability of cooperation for an individual agent varies according to the agent's own history of actions (and hence rewards). Over the long term, every single agent acquires a distinct cooperation probability depending on its own history of random actions. The amplitude of the aggregate oscillation depends on the size of the population: the larger the population, the more effectively the oscillation of each agent's actions is compensated for by the oscillation of all the other agents' actions.

When p2 < p0, there are many agents cooperating and a few agents defecting. The agents that cooperated are rewarded for cooperating; at each iteration their cooperation probability tends toward 1.
Since their cooperation probability is high, most of the cooperators continue to cooperate.


After a few iterations their cooperation probability reaches 1, and they continue to be rewarded, so they can never again defect. The few agents that happened to defect are also heavily rewarded; this encourages them to defect. The defectors still have a fairly high probability of cooperation, so at each iteration several of the defectors start to cooperate. (Note that there are defectors with a high probability of cooperation and vice versa. What we cannot have is a defector with probability of cooperation consistently equal to 1 or a cooperator with probability of cooperation consistently equal to 0.) The few defectors that still continue to defect will be rewarded greatly for their defection; they eventually reach a probability of cooperation equal to 0, after which they will never cooperate for the duration of the simulation because they are continuously being rewarded for their defection. After a while, the net result is that most of the agents are cooperating with probability 1 and are being continuously rewarded for doing so, and a few of the agents are always defecting, have cooperation probability 0, and are being continuously rewarded for doing so. Thus, a steady state is reached.

The two solutions are different from each other in three important ways:

1. The solution at p1 is a stable equilibrium (attractor) with respect to the aggregate cooperation proportion, while the solution at p2 is an unstable equilibrium (repulsor).
2. The solution converges toward p1 as an oscillation, while it stabilizes exactly in the p2 < p0 case. This is because around p1 the agents are punished no matter what they do and tend to change their cooperation probabilities over time. Therefore, these probabilities do not converge to zero or one for any individual agent. In the latter case, each agent in the steady state has a probability of cooperation of 0 or 1, and it is just the proportion of agents cooperating that determines the final aggregate cooperation proportion.
3. Initial aggregate cooperation proportions of p0 > p2 do not result in the aggregate cooperation proportion converging to 1, as one would expect if one thinks of p2 as an unstable equilibrium. This is because, for an individual agent that started off as a defector, there is always some likelihood that the agent will continue to defect. This probability is initially small but continues to increase as the agent is always rewarded for defecting. If the number of agents is sufficiently large and p0 is not too close to 1, then there will be some agents that continue to defect until their cooperation probability reaches zero due to the successive rewards they have received, and these agents will defect forever. The exception is if the aggregate cooperation proportion starts equal to 1. Then no agent starts as a defector and there is no chance of any of them defecting in the steady state.

The solutions can be predicted in a similar way for any situation. We have developed an algorithm that accurately predicts the final aggregate outcome for any combination of Pavlovian agents and any payoff functions [41]. The predictions are exact for an infinite number of agents, but the experimental results of the simulation approximate the predictions very closely even for a few hundred agents, and they are in complete agreement with the above qualitative explanation. A numerical sketch of the equilibrium calculation for the Fig. 1 payoffs is given below.

For the experiments reported in this article the simulated societies have 10 000 agents each. The payoff curves for most experiments are given in Fig. 1. The graphics output of Fig. 2 shows the initial configuration for the case when the initial actions of all agents are random and their initial probability of cooperation is equal to 0.5. We see an approximately equal number of black (cooperator) and white (defector) spots.
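For the linear curves of Fig. 1, Eq. (1) with p = x/N becomes p·C(p) = (1 − p)·D(p), i.e. p(2p − 1) = (1 − p)(2p − 0.5), a quadratic whose roots are the two equilibria. A back-of-the-envelope check (not the prediction algorithm of [41]):

```python
import numpy as np

# p(2p - 1) = (1 - p)(2p - 0.5)  =>  4p^2 - 3.5p + 0.5 = 0
p1, p2 = sorted(np.roots([4.0, -3.5, 0.5]).real)
print(round(p1, 3), round(p2, 3))  # 0.18 0.695: the attractor and the repulsor
```

These roots match the thresholds observed in the experiments below: oscillation around 0.18, and a qualitative change of behavior near an initial cooperation probability of 0.695.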


FIGURE 2 Graphics output of the initial configuration for the case when the initial actions of all agents are random and their initial probability of cooperation is equal to 0.5. There is an approximately equal number of black (cooperator) and white (defector) spots.

FIGURE 3 Evolution of a society of agents for the case when all agents are Pavlovian, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation probabilities are shown at the curves.


FIGURE 4 Graphics output of the 100th iteration for the case when all agents are Pavlovian, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep.

The initial state of the system is the decisive factor that determines its future state. This can be clearly seen from Fig. 3, which shows the evolution of a society for the case when all agents are Pavlovian, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep. The graphs show the proportions of cooperating agents for different initial cooperation probabilities as functions of the number of iterations. The initial values of the aggregate cooperation probabilities are shown at the curves. We see that after several hundred iterations the solutions tend to reach mildly oscillating equilibria, the values of which strongly depend on the initial values of the aggregate cooperation probability. When these initial values are in the range between 0 and 0.65, the solutions oscillate around 0.18. Figure 4 shows the graphics output of the 100th iteration for the case when the initial cooperation probability is equal to 0.5.

The situation is even more dramatic when the neighborhood is the entire society. Figure 5 shows that in this case there are two distinctly different solutions. When the initial values are in the range between 0 and 0.69, all solutions oscillate around 0.18. When the initial aggregate cooperation probability is above 0.695, the solutions tend to reach well-defined constant values that are dependent on the initial values. This result certainly satisfies the definition of chaos as sensitive dependence on initial conditions [12]. It means that a perturbation to the initial state of the system will cause the system to evolve into a different future state within a finite period of time.


FIGURE 5 Evolution of a society for the case when all agents are Pavlovian, the payoff curves are given by Fig. 1, and the neighborhood is the entire society. The graphs show the proportions of cooperating agents as functions of the number of iterations. The initial cooperation probabilities are shown at the curves.

FIGURE 6 Graphics output of the initial configuration for the case when the initial actions of all agents are random but the society is equally divided into two parts: agents in the upper half initially defect while those in the lower half initially cooperate.


FIGURE 7 Graphics output of the 500th iteration for the initial case of Fig. 6 when the neighborhood is one layer deep.

FIGURE 8 Graphics output of the 500th iteration for the initial case of Fig. 6 when the neighborhood is ten layers deep.


FIGURE 9 Graphics output of the 63rd iteration for the initial case of Fig. 6 when the neighborhood is the entire society.

Thus, a very small difference in the initial cooperation ratio leads to totally different behaviors. This phenomenon exactly satisfies the discussion of Eq. (1) above.

FIGURE 10 Graphics output of the 100th iteration for the initial case of Fig. 6 when the neighborhood is the entire society.


More realistic simulations must take nonuniform distributions of different agents into account. Figure 6 shows the graphics output of the initial configuration for the case when the initial actions of all agents are random but the society is equally divided into two parts: agents in the upper half initially defect, while those in the lower half initially cooperate. If the neighborhood is one layer deep, the upper half will gradually be infected with cooperators (Fig. 7). As the neighborhood depth is increased, a protective layer is formed where no cooperation occurs (Fig. 8). The situation is completely different when the neighborhood is the entire society. In this case change starts in the lower region (Fig. 9) and spreads into the entire society (Fig. 10).
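The Pavlovian dynamics of this section can be reproduced qualitatively with a few lines of code. The sketch below is a toy re-implementation under stated assumptions: Fig. 1 payoffs, a neighborhood equal to the entire society, and a learning step proportional to the payoff with probabilities clipped to [0, 1]. It is not the authors' simulation tool [40].

```python
import numpy as np

def simulate_pavlov(n_agents=10_000, p0=0.5, iters=500, lr=0.1, seed=0):
    """Pavlovian agents, full-society neighborhood, Fig. 1 payoffs."""
    rng = np.random.default_rng(seed)
    p = np.full(n_agents, float(p0))       # each agent's P(cooperate)
    trace = []
    for _ in range(iters):
        coop = rng.random(n_agents) < p    # draw this iteration's actions
        x = coop.mean()                    # cooperation ratio in the society
        payoff = np.where(coop, 2 * x - 1.0, 2 * x - 0.5)
        # Thorndike's law: the action just taken becomes more probable
        # after a reward and less probable after a penalty
        p = np.where(coop, p + lr * payoff, p - lr * payoff)
        p = np.clip(p, 0.0, 1.0)
        trace.append(x)
    return trace
```

With p0 below roughly 0.695 the trace should settle into an oscillation around 0.18; with higher p0 it should freeze at a constant level set by the initial value, mirroring Fig. 5.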


4. AGENTS WITH OTHER PERSONALITIES

We have simulated the behavior of artificial societies for the case of various personalities of the participating agents. The stochastically predictable agents have, in fact, no personality. Therefore, we will not consider them in this study. For the other personality types, remarkably interesting patterns arise even when all agents have the same personality. Let us now investigate the behavior of societies with all greedy, all accountant, and all conformist agents.

If all agents are greedy and the neighborhood extends to the entire society, they will all defect immediately at the first iteration because they will all imitate the defectors, who received a higher reward for their initial action than the cooperators. The situation is not so hopeless for a one-layer deep neighborhood, but the behavior will stabilize with a very small number of cooperators (Fig. 11).

If all agents are accountants, the interesting case again is when the neighborhood is one layer deep. Then the behavior will stabilize with a small number of the agents cooperating (Fig. 12).

If all agents are conformists and the neighborhood extends to the entire society, then the outcome depends on the exact relationship between the initial numbers of cooperators and defectors: every agent will immediately imitate the majority and stay there. The behavior becomes quite interesting for the one-layer deep neighborhood. In this case, while the proportion of cooperators will not change substantially, their distribution will. Both cooperators and defectors will form mutually intertwined clusters (Fig. 13).

The number of variations is infinitely large. We can change all the parameters simultaneously and mix different personalities in arbitrary ways. Figure 14 shows a situation when a single defector sits in the middle of the society for which C = x and D = 1.65x. If all agents are greedy and the neighborhood is one layer deep, beautiful symmetric fractal patterns arise (Fig. 15) that oscillate around a 29% cooperation rate. Similar patterns were found by Nowak and May [31] in a simulation based on dyadic interactions between the participating agents. Szilagyi [39] reported additional interesting cases.
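For the imitation-based personalities, the update step can be sketched as follows for a one-layer neighborhood. This is a deterministic simplification written by us: greedy copies the best-paid agent in the surrounding 3 × 3 block (agent included), conformist copies the local majority, with ties broken toward cooperation. The `payoffs` array is assumed to hold each agent's most recent reward/penalty.

```python
import numpy as np

def imitation_step(actions, payoffs, kind="greedy"):
    """One synchronous update of a 2-D action array (True = cooperate)."""
    rows, cols = actions.shape
    new = actions.copy()
    for r in range(rows):
        for c in range(cols):
            r0, r1 = max(0, r - 1), min(rows, r + 2)
            c0, c1 = max(0, c - 1), min(cols, c + 2)
            block = actions[r0:r1, c0:c1]
            if kind == "greedy":   # copy the highest-paid nearby agent
                new[r, c] = block.flat[np.argmax(payoffs[r0:r1, c0:c1])]
            else:                  # conformist: copy the local majority
                new[r, c] = block.mean() >= 0.5
    return new
```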


FIGURE 11 Graphics output of the 100th iteration for the case when all agents are greedy, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep.

FIGURE 12 Graphics output of the 100th iteration for the case when all agents are accountants, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep.


FIGURE 13 Graphics output of the 100th iteration for the case when all agents are conformists, the payoff curves are given by Fig. 1, and the neighborhood is one layer deep.

FIGURE 14 Graphics output of the initial configuration for the case when a single defector sits in the middle of the society.


FIGURE 15 Graphics output of the 100th iteration for the initial case of Fig. 14. All agents are greedy, the neighborhood is one layer deep, and the payoff curves are given by C = x and D = 1.65x.

5. CONCLUSION

The multiagent Prisoners' Dilemma game shows nontrivial but remarkably regular solutions. For the case of Pavlovian agents we found two distinctly different solutions. For a wide range of initial conditions, the number of cooperators in the society oscillates around a relatively small value. When the initial aggregate cooperation probability is above a certain value, the solutions tend to reach well-defined constant values that are dependent on the initial values. Universal defection occurs only when p1 = 0 and p0 < p2, or when p2 is the only solution and p0 < p2. For other types of agents the solutions show interesting chaos-like behavior. The article provides some insight into the conditions of decentralized cooperation in spatially distributed populations of agents.

Acknowledgments

This work was partially supported by the University of Arizona Foundation and the Office of the Vice President for Research of the University of Arizona. The author is indebted to Drs. Jeff Besser and David Lambert (Space and Naval Warfare Systems Center, San Diego, CA) for valuable discussions.

References
[1] V. Akimov and M. Soutchansky (1994). Automata simulation of N-person social dilemma games. J. of Conflict Resolution, 38(1), 138–148.


[2] J.M. Anderson (1974). A model for "The Tragedy of the Commons". IEEE Transactions on Systems, Man, and Cybernetics, 103–105.
[3] R. Axelrod (1984). The Evolution of Cooperation. Basic Books, New York.
[4] V.E. Bixenstine, C.A. Levitt and K.V. Wilson (1966). Collaboration among six persons in a Prisoner's Dilemma game. J. of Conflict Resolution, 10(4), 488–496.
[5] P. Bonacich, G.H. Shure, J.P. Kahan and R.J. Meeker (1976). Cooperation and group size in the N-person Prisoners' Dilemma. J. of Conflict Resolution, 20(4), 687–706.
[6] R.M. Dawes (1980). Social dilemmas. Ann. Rev. Psychol., 31, 169–193.
[7] J.M. Epstein and R. Axtell (1996). Growing Artificial Societies. Brookings Institution Press/The MIT Press, Washington, Cambridge, London.
[8] W.E. Feinberg and N.R. Johnson (1990). Radical leaders, moderate followers: effects of alternative strategies on achieving consensus for action in simulated crowds. J. Mathematical Sociology, 15, 91–115.
[9] A. Flache and R. Hegselmann (1999). Rationality vs. learning in the evolution of solidarity networks: a theoretical comparison. Computational and Mathematical Organization Theory, 5(2), 97–127.
[10] A. Flache and M.W. Macy (1996). Weakness of strong ties: collective action failure in a highly cohesive group. J. Mathematical Sociology, 21, 3–28.
[11] N.S. Glance and B.A. Huberman (1993). The outbreak of cooperation. J. of Math. Sociol., 17(4), 281–302.
[12] J. Gleick (1987). Chaos: Making a New Science. Viking, New York.
[13] D.J. Goehring and J.P. Kahan (1976). The uniform N-person Prisoners' Dilemma game. J. of Conflict Resolution, 20(1), 111–128.
[14] H. Hamburger (1973). N-person Prisoners' Dilemma. J. of Mathematical Sociology, 3, 27–48.
[15] D.D. Heckathorn (1988). Collective sanctions and the creation of Prisoners' Dilemma norms. American Journal of Sociology, 94(3), 535–562.
[16] R. Hegselmann and A. Flache (1998). Understanding complex social dynamics: a plea for cellular automata based modeling. J. of Artificial Societies and Social Simulation, 1, 3.
[17] R. Hoffmann (2000). Twenty years on: The Evolution of Cooperation revisited. J. of Artificial Societies and Social Simulation, 3, 2.
[18] B.A. Huberman and N.S. Glance (1993). Evolutionary games and computer simulations. Proc. Natl. Acad. Sci. USA, 90, 7716–7718.
[19] H.H. Kelley and J. Grzelak (1972). Conflict between individual and common interest in an N-person relationship. J. of Personality and Social Psychology, 21(2), 190–197.
[20] S.S. Komorita (1976). A model of the n-person dilemma-type game. Journal of Experimental Social Psychology, 12, 357–373.
[21] S.S. Komorita and C.D. Parks (1994). Social Dilemmas. Brown and Benchmark, Madison, WI.
[22] D. Kraines and V. Kraines (1989). Pavlov and the Prisoner's Dilemma. Theory and Decision, 26, 47–79.
[23] W.B.G. Liebrand, D.M. Messick and H.A.M. Wilke (Eds.) (1992). Social Dilemmas: Theoretical Issues and Research Findings. Pergamon Press, Oxford, New York.
[24] A.L. Lloyd (1995). Computing bouts of the Prisoner's Dilemma. Scientific American, 272(6), 110–115.
[25] M.W. Macy (1991). Chains of cooperation: threshold effects in collective action. Am. Sociol. Rev., 56, 730–747.
[26] M.W. Macy (1995). Pavlov and the evolution of cooperation: an experimental test. Social Psychology Quarterly, 58(2), 74–87.
[27] L. Marinoff (1992). Maximizing expected utilities in the Prisoner's Dilemma. J. of Conflict Resolution, 36(1), 183–216.
[28] D.M. Messick and W.B.G. Liebrand (1995). Individual heuristics and the dynamics of cooperation in large groups. Psychological Review, 102, 131–145.
[29] C. McPhail, W.T. Powers and C.W. Tucker (1992). Simulating individual and collective action in temporary gatherings. Soc. Sci. Computer Rev., 10, 1–28.
[30] K.S. Narendra and M.A.L. Thathachar (1989). Learning Automata (An Introduction). Prentice Hall, Englewood Cliffs, NJ.
[31] M.A. Nowak and R.M. May (1992). Evolutionary games and spatial chaos. Nature, 359, 826–829.
[32] P.E. Oliver (1993). Formal models of collective action. Ann. Rev. Sociol., 19, 271–300.
[33] A. Rapoport (1994). Editorial comments. J. of Conflict Resolution, 38(1), 149–151.
[34] T.C. Schelling (1973). Hockey helmets, concealed weapons, and daylight saving. J. of Conflict Resolution, 17(3), 381–428.
[35] D.A. Schroeder (Ed.) (1995). Social Dilemmas: Perspectives on Individuals and Groups. Praeger, Westport, CT.
[36] U. Schulz, W. Albers and U. Mueller (Eds.) (1994). Social Dilemmas and Cooperation. Springer-Verlag, Berlin, Heidelberg, New York.
[37] F. Szidarovszky and M.N. Szilagyi (2002). An analytical study of the N-person Prisoners' Dilemma. Southwest Journal of Pure and Applied Mathematics.
[38] M.N. Szilagyi (2000). Quantitative relationships between collective action and Prisoners' Dilemma. Systems Research and Behavioral Science, 17, 65–72.


[39] M.N. Szilagyi (2001). Solutions to realistic Prisoners' Dilemma games. Proceedings of the 2001 IEEE Systems, Man, and Cybernetics Conference, TA12/2, pp. 841–846.
[40] M.N. Szilagyi and Z.C. Szilagyi (2000). A tool for simulated social experiments. Simulation, 74(1), 4–10.
[41] M.N. Szilagyi and Z.C. Szilagyi (2002). Nontrivial solutions to the N-person Prisoners' Dilemma. Systems Research and Behavioral Science, 19(3), 281–290.
[42] E.L. Thorndike (1911). Animal Intelligence. Hafner, Darien, CT.
[43] R.L. Weil (1966). The N-person Prisoners' Dilemma: some theory and a computer-oriented approach. Behavioral Science, 11, 227–234.
[44] S. Wolfram (1994). Cellular Automata and Complexity. Addison-Wesley, Reading, MA.

