Cooperation and control in multiplayer social dilemmas

Christian Hilbe^{a,1}, Bin Wu^{b}, Arne Traulsen^{b}, and Martin A. Nowak^{a,c}

^a Program for Evolutionary Dynamics, Harvard University, Cambridge, MA 02138; ^b Department for Evolutionary Theory, Max Planck Institute for Evolutionary Biology, 24306 Plön, Germany; and ^c Department of Organismic and Evolutionary Biology and Department of Mathematics, Harvard University, Cambridge, MA 02138

Edited by Joshua B. Plotkin, University of Pennsylvania, Philadelphia, PA, and accepted by the Editorial Board September 26, 2014 (received for review April 30, 2014)

Direct reciprocity and conditional cooperation are important mechanisms to prevent free riding in social dilemmas. However, in large groups, these mechanisms may become ineffective because they require single individuals to have a substantial influence on their peers. However, the recent discovery of zero-determinant strategies in the iterated prisoner's dilemma suggests that we may have underestimated the degree of control that a single player can exert. Here, we develop a theory for zero-determinant strategies for multiplayer social dilemmas, with any number of involved players. We distinguish several particularly interesting subclasses of strategies: fair strategies ensure that the own payoff matches the average payoff of the group; extortionate strategies allow a player to perform above average; and generous strategies let a player perform below average. We use this theory to describe strategies that sustain cooperation, including generalized variants of Tit-for-Tat and Win-Stay Lose-Shift. Moreover, we explore two models that show how individuals can further enhance their strategic options by coordinating their play with others. Our results highlight the importance of individual control and coordination to succeed in large groups.

evolutionary game theory | alliances | public goods game | volunteer's dilemma | cooperation

Significance

Many of the world's most pressing problems, like the prevention of climate change, have the form of a large-scale social dilemma with numerous involved players. Previous results in evolutionary game theory suggest that multiplayer dilemmas make it particularly difficult to achieve mutual cooperation because of the lack of individual control in large groups. Herein, we extend the theory of zero-determinant strategies to multiplayer games to describe which strategies maintain cooperation. Moreover, we propose two simple models of alliances in multiplayer dilemmas. The effect of these alliances is determined by their size, the strategy of the allies, and the properties of the social dilemma. When a single individual's strategic options are limited, forming an alliance can result in a drastic leverage.

Author contributions: B.W. initiated the project; C.H., B.W., A.T., and M.A.N. designed research; C.H., B.W., A.T., and M.A.N. performed research; A.T. and M.A.N. analyzed data; and C.H. and B.W. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission. J.B.P. is a guest editor invited by the Editorial Board.
Freely available online through the PNAS open access option.
^1 To whom correspondence should be addressed.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1407887111/-/DCSupplemental.
www.pnas.org/cgi/doi/10.1073/pnas.1407887111 | PNAS Early Edition

Cooperation among self-interested individuals is generally difficult to achieve (1-3), but typically the free rider problem is aggravated even further when groups become large (4-9). In small communities, cooperation can often be stabilized by forms of direct and indirect reciprocity (10-17). For large groups, however, it has been suggested that these mechanisms may turn out to be ineffective, as it becomes more difficult to keep track of the reputation of others and because the individual influence on others diminishes (4-8). To prevent the tragedy of the commons and to compensate for the lack of individual control, many successful communities have thus established central institutions that enforce mutual cooperation (18-22).

However, a recent discovery suggests that we may have underestimated the amount of control that single players can exert in repeated games. For the repeated prisoner's dilemma, Press and Dyson (23) have shown the existence of zero-determinant strategies (or ZD strategies), which allow a player to unilaterally enforce a linear relationship between the own payoff and the coplayer's payoff, irrespective of the coplayer's actual strategy. The class of zero-determinant strategies is surprisingly rich: for example, a player who wants to ensure that the own payoff will always match the coplayer's payoff can do so by applying a fair ZD strategy, like Tit-for-Tat. On the other hand, a player who wants to outperform the respective opponent can do so by slightly tweaking the Tit-for-Tat strategy to the own advantage, thereby giving rise to extortionate ZD strategies. The discovery of such strategies has prompted several theoretical studies, exploring how different ZD strategies evolve under various evolutionary conditions (24-30).

ZD strategies are not confined to the repeated prisoner's dilemma. Recently published studies have shown that ZD strategies also exist in other repeated two-player games (29) or in repeated public goods games (31). Herein, we will show that such strategies exist for all symmetric social dilemmas, with an arbitrary number of participants. We use this theory to describe which ZD strategies can be used to enforce fair outcomes or to prevent free riders from taking over. Our results, however, are not restricted to the space of ZD strategies. By extending the techniques introduced by Press and Dyson (23) and Akin (27), we also derive exact conditions when generalized versions of Grim, Tit-for-Tat, and Win-Stay Lose-Shift allow for stable cooperation. In this way, we find that most of the theoretical solutions for the repeated prisoner's dilemma can be directly transferred to repeated dilemmas with an arbitrary number of involved players.

In addition, we also propose two models to explore how individuals can further enhance their strategic options by coordinating their play with others. To this end, we extend the notion of ZD strategies for single players to subgroups of players (to which we refer as ZD alliances). We analyze two models of ZD alliances, depending on the degree of coordination between the players. When players form a strategy alliance, they only agree on the set of alliance members, and on a common strategy that each alliance member independently applies during the repeated game. When players form a synchronized alliance, on the other hand, they agree to act as a single entity, with all alliance members playing the same action in a given round. We show that the strategic power of ZD alliances depends on the size of the alliance, the applied strategy of the allies, and on the properties of the underlying social dilemma. Surprisingly, the degree of coordination only plays a role as alliances become large (in which case a synchronized alliance has more strategic options than a strategy alliance).

To obtain these results, we consider a repeated social dilemma between $n$ players. In each round of the game, players can decide whether to cooperate (C) or to defect (D). A player's payoff depends on the player's own decision and on the decisions of all other group members (Fig. 1A): in a group in which $j$ of the other group members cooperate, a cooperator receives the payoff $a_j$, whereas a defector obtains $b_j$. We assume that payoffs satisfy the following three properties that are characteristic for social dilemmas (corresponding to the individual-centered interpretation of altruism in ref. 32): (i) irrespective of the own strategy, players prefer the other group members to cooperate ($a_{j+1} \ge a_j$ and $b_{j+1} \ge b_j$ for all $j$); (ii) within any mixed group, defectors obtain strictly higher payoffs than cooperators ($b_{j+1} > a_j$ for all $j$); and (iii) mutual cooperation is favored over mutual defection ($a_{n-1} > b_0$). To illustrate our results, we will discuss two particular examples of multiplayer games (Fig. 1B). In the first example, the public goods game (33), cooperators contribute an amount $c > 0$ to a common pool, knowing that total contributions are multiplied by $r$ (with $1 < r < n$) and evenly shared among all group members. Thus, a cooperator's payoff is $a_j = rc(j+1)/n - c$, whereas defectors obtain $b_j = rcj/n$. In the second example, the volunteer's dilemma (34), at least one group member has to volunteer to bear a cost $c > 0$ in order for all group members to derive a benefit $b > c$. Therefore, cooperators obtain $a_j = b - c$ (irrespective of $j$), whereas defectors obtain $b_j = b$ if $j \ge 1$ and $b_0 = 0$. Both examples (and many more, such as the collective risk dilemma) (7, 8, 35) are simple instances of multiplayer social dilemmas.

We assume that the social dilemma is repeated, such that individuals can react to their coplayers' past actions (for simplicity, we will focus here on the case of an infinitely repeated game). As usual, payoffs for the repeated game are defined as the average payoff that players obtain over all rounds. In general, strategies for such repeated games can become arbitrarily complex, as subjects may condition their behavior on past events and on the round number in nontrivial ways. Nevertheless, as in pairwise games, ZD strategies turn out to be surprisingly simple.

Fig. 1. Illustration of the model assumptions for repeated social dilemmas. (A) We consider symmetric n-player social dilemmas in which each player can either cooperate or defect. The player's payoff depends on its own decision and on the number of other group members who decide to cooperate. (B) We will discuss two particular examples: the public goods game (in which payoffs are proportional to the number of cooperators) and the volunteer's dilemma (as the simplest example of a nonlinear social dilemma). (C) In addition to individual strategies, we will also explore how subjects can enhance their strategic options by coordinating their play with other group members. We refer to the members of such a ZD alliance as allies, and we call group members that are not part of the ZD alliance outsiders. Outsiders are not restricted to any particular strategy. Some or all of the outsiders may even form their own alliance.

Panel A of Fig. 1 (payoff table):

  Number of cooperators among co-players   n-1       n-2       ...   2     1     0
  Cooperator's payoff                      a_{n-1}   a_{n-2}   ...   a_2   a_1   a_0
  Defector's payoff                        b_{n-1}   b_{n-2}   ...   b_2   b_1   b_0

Panel B of Fig. 1 plots cooperator and defector payoffs against the number of cooperating co-players for the public goods game and the volunteer's dilemma. Panel C sketches a ZD alliance and the outsiders.

Results

Memory-One Strategies and Akin's Lemma. ZD strategies are memory-one strategies (23, 36); they only condition their behavior on the outcome of the previous round. Memory-one strategies can be written as a vector $\mathbf{p} = (p_{C,n-1}, \ldots, p_{C,0};\ p_{D,n-1}, \ldots, p_{D,0})$. The entries $p_{S,j}$ denote the probability to cooperate in the next round, given that the player previously played $S \in \{C, D\}$ and that $j$ of the coplayers cooperated (in the SI Text, we present an extension in which players additionally take into account which of the coplayers cooperated). A simple example of a memory-one strategy is the strategy Repeat, $\mathbf{p}^{\mathrm{Rep}}$, which simply reiterates the own move of the previous round, $p^{\mathrm{Rep}}_{C,j} = 1$ and $p^{\mathrm{Rep}}_{D,j} = 0$. In addition, memory-one strategies need to specify a cooperation probability $p_0$ for the first round. However, our results will often be independent of the initial play, and in that case we will drop $p_0$.

Let us consider a repeated game in which a focal player with memory-one strategy $\mathbf{p}$ interacts with $n-1$ arbitrary coplayers (who are not restricted to any particular strategy). Let $v_{S,j}(t)$ denote the probability that the outcome of round $t$ is $(S, j)$. Let $\mathbf{v}(t) = [v_{C,n-1}(t), \ldots, v_{D,0}(t)]$ be the vector of these probabilities. A limit distribution $\mathbf{v}$ is a limit point for $t \to \infty$ of the sequence $[\mathbf{v}(1) + \ldots + \mathbf{v}(t)]/t$. The entries $v_{S,j}$ of such a limit distribution correspond to the fraction of rounds in which the focal player finds herself in state $(S, j)$ over the course of the game.

There is a surprisingly powerful relationship between a focal player's memory-one strategy and the resulting limit distribution of the iterated game. To show this relationship, let $q_C(t)$ be the probability that the focal player cooperates in round $t$. By definition of $\mathbf{p}^{\mathrm{Rep}}$ we can write $q_C(t) = \mathbf{p}^{\mathrm{Rep}} \cdot \mathbf{v}(t) = v_{C,n-1}(t) + \ldots + v_{C,0}(t)$. Similarly, we can express the probability that the focal player cooperates in the next round as $q_C(t+1) = \mathbf{p} \cdot \mathbf{v}(t)$. It follows that $q_C(t+1) - q_C(t) = (\mathbf{p} - \mathbf{p}^{\mathrm{Rep}}) \cdot \mathbf{v}(t)$. Summing up over all rounds from 1 to $t$, and dividing by $t$, yields $(\mathbf{p} - \mathbf{p}^{\mathrm{Rep}}) \cdot [\mathbf{v}(1) + \ldots + \mathbf{v}(t)]/t = [q_C(t+1) - q_C(1)]/t$, which has absolute value at most $1/t$. By taking the limit $t \to \infty$ we can conclude that

$$(\mathbf{p} - \mathbf{p}^{\mathrm{Rep}}) \cdot \mathbf{v} = 0. \tag{1}$$

This relation between a player's memory-one strategy and the resulting limit distribution will prove to be extremely useful. Because the importance of Eq. 1 was first highlighted by Akin (27) in the context of the pairwise prisoner's dilemma, we will refer to it as Akin's lemma. We note that Akin's lemma is remarkably general, because it neither makes any assumptions on the specific game being played nor does it make any restrictions on the strategies applied by the remaining $n-1$ group members.

Zero-Determinant Strategies in Multiplayer Social Dilemmas. As an application of Akin's lemma, we will show in the following that single players can gain an unexpected amount of control over the resulting payoffs in a multiplayer social dilemma. To this end, we first need to introduce some further notation. For a focal player $i$, let us write the possible payoffs in a given round as a vector $\mathbf{g}^{i} = (g^{i}_{S,j})$, with $g^{i}_{C,j} = a_j$ and $g^{i}_{D,j} = b_j$. Similarly, let us write the average payoffs of $i$'s coplayers as $\mathbf{g}^{-i} = (g^{-i}_{S,j})$, where the entries are given by $g^{-i}_{C,j} = [j a_j + (n-j-1) b_{j+1}]/(n-1)$ and $g^{-i}_{D,j} = [j a_{j-1} + (n-j-1) b_j]/(n-1)$. Finally, let $\mathbf{1}$ denote the $2n$-dimensional vector with all entries being one. Using this notation, we can write player $i$'s payoff in the repeated game as $\pi^{i} = \mathbf{g}^{i} \cdot \mathbf{v}$, and the average payoff of $i$'s coplayers as $\pi^{-i} = \mathbf{g}^{-i} \cdot \mathbf{v}$. Moreover, by definition of $\mathbf{v}$ as a limit distribution, it follows that $\mathbf{1} \cdot \mathbf{v} = 1$. After these preparations, let us assume player $i$ applies the memory-one strategy

$$\mathbf{p} = \mathbf{p}^{\mathrm{Rep}} + \alpha \mathbf{g}^{i} + \beta \mathbf{g}^{-i} + \gamma \mathbf{1}, \tag{2}$$

with $\alpha$, $\beta$, and $\gamma$ being parameters that can be chosen by player $i$ (with the only restriction that $\beta \ne 0$). Due to Akin's lemma, we can conclude that such a player enforces the relationship

$$0 = (\mathbf{p} - \mathbf{p}^{\mathrm{Rep}}) \cdot \mathbf{v} = (\alpha \mathbf{g}^{i} + \beta \mathbf{g}^{-i} + \gamma \mathbf{1}) \cdot \mathbf{v} = \alpha \pi^{i} + \beta \pi^{-i} + \gamma. \tag{3}$$

Player $i$'s strategy thus guarantees that the resulting payoffs of the repeated game obey a linear relationship, irrespective of how the other group members play. Moreover, by appropriately choosing the parameters $\alpha$, $\beta$, and $\gamma$, the player has direct control on the form of this payoff relation. As in Press and Dyson (23), who were first to discover such strategies for the prisoner's dilemma, we refer to the memory-one strategies in Eq. 2 as zero-determinant strategies or ZD strategies.

For our purpose, it will be convenient to proceed with a slightly different representation of ZD strategies. Using the parameter transformation $l = -\gamma/(\alpha+\beta)$, $s = -\alpha/\beta$, and $\phi = -\beta$, ZD strategies take the form

$$\mathbf{p} = \mathbf{p}^{\mathrm{Rep}} + \phi \left[ (1-s)(l \mathbf{1} - \mathbf{g}^{i}) + \mathbf{g}^{i} - \mathbf{g}^{-i} \right], \tag{4}$$

and the enforced payoff relationship according to Eq. 3 becomes

$$\pi^{-i} = s \pi^{i} + (1-s) l. \tag{5}$$

We refer to $l$ as the baseline payoff of the ZD strategy and to $s$ as the strategy's slope. Both parameters allow an intuitive interpretation: when all players adopt the same ZD strategy $\mathbf{p}$ such that $\pi^{i} = \pi^{-i}$, it follows from Eq. 5 that each player obtains the payoff $l$. The value of $s$ determines how the mean payoff of the other group members $\pi^{-i}$ varies with $\pi^{i}$. The parameter $\phi$ does not have a direct effect on Eq. 5; however, the magnitude of $\phi$ determines how fast payoffs converge to this linear payoff relationship as the repeated game proceeds (37).

The parameters $l$, $s$, and $\phi$ of a ZD strategy cannot be chosen arbitrarily, because the entries $p_{S,j}$ are probabilities that need to satisfy $0 \le p_{S,j} \le 1$. In general, the admissible parameters depend on the specific social dilemma being played. In SI Text, we show that exactly those relations in Eq. 5 can be enforced for which either $s = 1$ (in which case the parameter $l$ in the definition of ZD strategies becomes irrelevant) or for which $l$ and $s < 1$ satisfy

$$\max_{0 \le j \le n-1} \left\{ b_j - \frac{j}{n-1}\,\frac{b_j - a_{j-1}}{1-s} \right\} \le l \le \min_{0 \le j \le n-1} \left\{ a_j + \frac{n-j-1}{n-1}\,\frac{b_{j+1} - a_j}{1-s} \right\}. \tag{6}$$

It follows that feasible baseline payoffs are bounded by the payoffs for mutual cooperation and mutual defection, $b_0 \le l \le a_{n-1}$, and that the slope needs to satisfy $-1/(n-1) \le s \le 1$. With $s$ sufficiently close to 1, any baseline payoff between $b_0$ and $a_{n-1}$ can be achieved. Moreover, because the conditions in Eq. 6 become increasingly restrictive as the group size $n$ increases, larger groups make it more difficult for players to enforce specific payoff relationships.

Important Examples of ZD Strategies. In the following, we discuss some examples of ZD strategies. At first, let us consider a player who sets the slope to $s = 1$. By Eq. 5, such a player enforces the payoff relation $\pi^{-i} = \pi^{i}$, such that $i$'s payoff matches the average payoff of the other group members. We call such ZD strategies fair. As shown in Fig. 2A, fair strategies do not ensure that all group members get the same payoff; due to our definition of social dilemmas, unconditional defectors always outperform unconditional cooperators, no matter whether the group also contains fair players. Instead, fair players can only ensure that they do not take any unilateral advantage of their peers. Our characterization in Eq. 6 implies that all social dilemmas permit a player to be fair, irrespective of the group size. As an example, consider the strategy proportional Tit-for-Tat (pTFT), for which the probability to cooperate is simply given by the fraction of cooperators among the coplayers in the previous round

$$p^{\mathrm{pTFT}}_{S,j} = \frac{j}{n-1}. \tag{7}$$

For pairwise games, this definition of pTFT simplifies to Tit-for-Tat, which is a fair ZD strategy (23). However, also for the public goods game and for the volunteer's dilemma, pTFT is a ZD strategy, because it can be obtained from Eq. 4 by setting $s = 1$ and $\phi = 1/c$, with $c$ being the cost of cooperation.

As another interesting subclass of ZD strategies, let us consider strategies that choose the mutual defection payoff as baseline payoff, $l = b_0$, and that enforce a positive slope $0 < s < 1$. The enforced payoff relation in Eq. 5 becomes $\pi^{-i} = s\pi^{i} + (1-s)b_0$, implying that on average the other group members only get a fraction $s$ of any surplus over the mutual defection payoff. Moreover, as the slope $s$ is positive, the payoffs $\pi^{i}$ and $\pi^{-i}$ are positively related. As a consequence, the collective best reply for the remaining group members is to maximize $i$'s payoffs by cooperating in every round. In analogy to Press and Dyson (23), we call such ZD strategies extortionate, and we call the quantity $\chi = 1/s$ the extortion factor. For games in which $l = b_0 = 0$, Eq. 5 shows that the extortion factor can be written as $\chi = \pi^{i}/\pi^{-i}$. Large extortion factors thus signal a substantial inequality in favor of player $i$. Extortionate strategies are particularly powerful in social dilemmas in which mutual defection leads to the lowest group payoff (as in the public goods game and in the volunteer's dilemma). In that case, they enforce the relation $\pi^{-i} \le \pi^{i}$; on average, player $i$ performs at least as well as the other group members (as also depicted in Fig. 2B). As an example, let us consider a public goods game and a ZD strategy $\mathbf{p}^{\mathrm{Ex}}$ with $l = 0$ and $\phi = n/[(n-r)sc + rc]$, for which Eq. 4 implies

$$p^{\mathrm{Ex}}_{S,j} = \frac{j}{n-1} \left[ 1 - (1-s)\,\frac{n(r-1)}{r + (n-r)s} \right], \tag{8}$$

independent of the player's own move $S \in \{C, D\}$. In the limit $s \to 1$, $\mathbf{p}^{\mathrm{Ex}}$ approaches the fair strategy pTFT. As $s$ decreases from 1, the cooperation probabilities of $\mathbf{p}^{\mathrm{Ex}}$ are increasingly biased to the own advantage. Extortionate strategies exist for all social dilemmas (this follows from condition 6 by setting $l = b_0$ and choosing an $s$ close to 1). However, larger groups make extortion more difficult. For example, in public goods games with $n > r/(r-1)$, players cannot be arbitrarily extortionate any longer as Eq. 6 implies that there is an upper bound on $\chi$ (SI Text).

As the benevolent counterpart to extortioners, Stewart and Plotkin described a set of generous strategies for the iterated prisoner's dilemma (24, 28). Generous players set the baseline payoff to the mutual cooperation payoff $l = a_{n-1}$ while still enforcing a positive slope $0 < s < 1$. This results in the payoff relation $\pi^{-i} = s\pi^{i} + (1-s)a_{n-1}$. In particular, for games in which mutual cooperation is the optimal outcome for the group (as in the public goods game and in the prisoner's dilemma but not in the volunteer's dilemma), the payoff of a generous player satisfies $\pi^{i} \le \pi^{-i}$ (Fig. 2C). For the example of a public goods game, we obtain a generous ZD strategy $\mathbf{p}^{\mathrm{Ge}}$ by setting $l = rc - c$ and $\phi = n/[(n-r)sc + rc]$, such that

$$p^{\mathrm{Ge}}_{S,j} = \frac{j}{n-1} + (1-s)\,\frac{n-j-1}{n-1}\,\frac{n(r-1)}{r + (n-r)s}. \tag{9}$$

For $s \to 1$, $\mathbf{p}^{\mathrm{Ge}}$ approaches the fair strategy pTFT, whereas lower values of $s$ make $\mathbf{p}^{\mathrm{Ge}}$ more cooperative. Again, such generous strategies exist for all social dilemmas, but the extent to which players can be generous depends on the particular social dilemma and on the size of the group.

As a last interesting class of ZD strategies, let us consider players who choose $s = 0$. By Eq. 5, such players enforce the payoff relation $\pi^{-i} = l$, meaning that they have unilateral control over the mean payoff of the other group members (for the prisoner's dilemma, such equalizer strategies were first discovered in ref. 38). However, unlike extortionate and generous strategies, equalizer strategies typically cease to exist once the group size exceeds a critical threshold. For the example of a public goods game this threshold is given by $n = 2r/(r-1)$. For larger groups, single players cannot determine the mean payoff of their peers any longer.

Fig. 2. Characteristic dynamics of payoffs over the course of the game for three different ZD strategies. Each panel depicts the payoff of the focal player $\pi^{i}$ (blue) and the average payoff of the other group members $\pi^{-i}$ (red) by thick lines. Additionally, the individual payoffs of the other group members are shown as thin red lines. (A) A fair player ensures that the own payoff matches the mean payoff of the other group members. This does not imply that all other group members obtain the same payoff. (B) For games in which mutual defection leads to the lowest group payoff, extortionate players ensure that their payoffs are above average. (C) In games in which mutual cooperation is the social optimum, generous players let their coplayers gain higher payoffs. The three graphs depict the case of a public goods game with $r = 4$, $c = 1$, and group size $n = 20$. For the strategies of the other group members, we used random memory-one strategies, where the cooperation probabilities were independently drawn from a uniform distribution. For the strategies of the focal player, we used (A) $\mathbf{p}^{\mathrm{pTFT}}$, (B) $\mathbf{p}^{\mathrm{Ex}}$ with $s = 0.8$, and (C) $\mathbf{p}^{\mathrm{Ge}}$ with $s = 0.8$. Panel titles: (A) Fair strategy, (B) Extortionate strategy, (C) Generous strategy; the horizontal axis of each panel is the round number (0 to 100).

Stable Cooperation in Multiplayer Social Dilemmas. Let us next explore which ZD strategies give rise to a Nash equilibrium with stable cooperation. In SI Text, we prove that such ZD strategies need to have two properties: they need to be generous (by setting $l = a_{n-1}$ and $s > 0$), but they must not be too generous [the slope needs to satisfy $s \ge (n-2)/(n-1)$]. In particular, whereas in the repeated prisoner's dilemma any generous strategy with $s > 0$ is a Nash equilibrium (27, 28), larger group sizes make it increasingly difficult to uphold cooperation. In the limit of infinitely large groups, it follows that $s$ needs to approach 1, suggesting that ZD [...]

[...] cooperation and after mutual defection [i.e., $p_{C,n-1} = p_{D,0} = 1$ and $p_{S,j} = 0$ for all other states $(S, j)$]. We refer to such a behavior as WSLS, because for pairwise dilemmas it corresponds to the Win-Stay, Lose-Shift strategy described by ref. 36. Because of condition 10, WSLS is a Nash equilibrium if and only if the social dilemma satisfies $(b_{n-1} + b_0)/2 \le a_{n-1}$. For the example of a public goods game, this condition simplifies to $r \ge 2n/(n+1)$, which is always fulfilled for $r \ge 2$. For social dilemmas that meet this condition, WSLS provides a stable route to cooperation that is robust to errors.

Zero-Determinant Alliances. In agreement with most of the theoretical literature on repeated social dilemmas, our previous analysis is based on the assumption that individuals act independently. As a result, we observed that a player's strategic options typically diminish with group size. As a countermeasure, subjects may try to gain st

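The payoff relation in Eq. 5 can also be checked numerically for a small group. The sketch below is an illustration under stated assumptions rather than the authors' procedure: it takes a three-player public goods game, lets the focal player use a generous ZD strategy built from Eq. 4, and gives the two co-players arbitrary (hypothetical) memory-one strategies. It then iterates the Markov chain over action profiles and assumes the chain is ergodic, so that the distribution after many rounds approximates the limit distribution $\mathbf{v}$. All function and variable names are invented for this example.

```python
from itertools import product


def pgg_payoff(action, j, n, r, c):
    """Public goods payoff for action 1 (C) or 0 (D) with j cooperating co-players."""
    return r * c * (j + action) / n - c * action


def zd_public_goods(n, r, c, l, s, phi):
    """Memory-one ZD strategy from Eq. 4, keyed by (own action, j)."""
    p = {}
    for own, j in product((1, 0), range(n)):
        g_own = pgg_payoff(own, j, n, r, c)
        # Each cooperating co-player sees j - 1 + own cooperators among the
        # others, each defecting co-player sees j + own of them.
        g_others = (j * pgg_payoff(1, j - 1 + own, n, r, c)
                    + (n - 1 - j) * pgg_payoff(0, j + own, n, r, c)) / (n - 1)
        p[(own, j)] = own + phi * ((1 - s) * (l - g_own) + g_own - g_others)
    return p


def long_run_payoffs(strategies, n, r, c, rounds=5000):
    """Expected payoff of every player under the distribution after `rounds` steps."""
    states = list(product((1, 0), repeat=n))
    transition = {}
    for x in states:
        # Cooperation probability of each player given the previous profile x.
        probs = [strategies[k][(x[k], sum(x) - x[k])] for k in range(n)]
        row = {}
        for y in states:
            weight = 1.0
            for k in range(n):
                weight *= probs[k] if y[k] == 1 else 1 - probs[k]
            row[y] = weight
        transition[x] = row
    v = {x: 1 / len(states) for x in states}          # uniform initial distribution
    for _ in range(rounds):
        v = {y: sum(v[x] * transition[x][y] for x in states) for y in states}
    return [sum(v[x] * pgg_payoff(x[k], sum(x) - x[k], n, r, c) for x in states)
            for k in range(n)]


n, r, c, s = 3, 2.0, 1.0, 0.8
l = r * c - c                                   # generous baseline a_{n-1}
phi = n / ((n - r) * s * c + r * c)
focal = zd_public_goods(n, r, c, l, s, phi)
# Two arbitrary co-players (hypothetical strategies, not from the paper).
coplayer_1 = {(own, j): 0.2 + 0.3 * j - 0.1 * own for own in (1, 0) for j in range(n)}
coplayer_2 = {(own, j): 0.7 - 0.2 * j + 0.1 * own for own in (1, 0) for j in range(n)}

payoffs = long_run_payoffs([focal, coplayer_1, coplayer_2], n, r, c)
pi_focal = payoffs[0]
pi_others = sum(payoffs[1:]) / (n - 1)
residual = pi_others - (s * pi_focal + (1 - s) * l)   # should vanish by Eq. 5
```

Replacing `coplayer_1` and `coplayer_2` with other memory-one strategies should leave `residual` close to zero, which is the content of Eq. 5: the relation is enforced by the focal player alone.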