A Network Analysis of Skill Game Dynamics

1. Introduction

In recent years, online casino games have grown exponentially. In particular, users from different countries can now play and bet money online very easily, by registering on one of the many casino platforms. Some of these games can be classified as pure gambling (e.g. slot machines), since their outcome depends only on luck. Other games, such as Poker and Blackjack, can be classified as “skill games”, since “rational strategies” might influence the success probability of players [1] [2].

Considering the case of Poker, some players show their skills by facing, at the same time, a large number of opponents, i.e. many more than they could face when sitting at real “physical” tables.

This mechanism, often called “multitabling” in the jargon, can also be adopted for other games. The underlying motivation is that a player who is strong enough in applying a “rationality-based” method for moving, i.e. for choosing a strategy, can “almost” safely bet a lot of money playing skill games. In principle this is true, and might apply to bots [3] [4]. However, in the case of humans, a number of external factors can affect performance.

In general, previous investigations on Poker [5] [6] [7] reported that, at least in tournaments, this game can be considered a skill game. In addition, this has been confirmed in real cases by very recent investigations [8] [9]. These findings corroborate the idea that expert players might play at several tables at once. At the same time, the increasing availability of these services is making people more expert, so when two players with the same skills face each other, the outcome might become similar to that of a coin flip (i.e. 50%).

Starting from these considerations, in this work we focus on the dynamics of a simple game, comparing the outcomes on regular lattices and on scale-free networks [10] [11]. In doing so, we can evaluate “if” and “how” a heterogeneous topology might affect the whole dynamics of the proposed model, and also study the potential difference between the gains one achieves in the two structures (i.e. regular and heterogeneous). Moreover, in order to consider a simple “learning process”, we analyze a small variant of the model, allowing transitions from “non-expert” to “expert”.

It is worth highlighting that the approach followed for modeling these scenarios results in a model that is strongly different from those found in evolutionary game theory [12] [13] [14].

Notably, even if we consider “learning processes”, here agents have no dilemma to solve, and there are no imitation processes that modify a strategy, as we might observe in games like the Public Goods Game [15] [16].

The remainder of the paper is organized as follows. Section 2 introduces the proposed model. Section 3 shows the results of numerical simulations. Finally, Section 4 concludes the paper.

2. Model

Before introducing our model, let us briefly summarize the agent-based approach we used. As reported in [17], and as extended in specific investigations (e.g. biology [18] [19], finance [20], language dynamics [21], ecology [22] and optimization), our model considers a population of agents provided with a character and interacting over a network. Our agents have no memory and, according to their character, follow behavioral rules. It is worth highlighting the absence of external influences, e.g. from the environment. However, we consider an adaptive system, i.e. agents might change their character according to some mechanism. Finally, numerical simulations allow us to study the dynamics of the model.

After this brief introduction, we can present the proposed model in more detail. In particular, our investigation aims to represent a simple scenario, i.e. the dynamics of a population with interactions based on a simple game, whose outcome is stochastic and depends on the agent’s strategy. Notably, agents can be “weak” (W) or “strong” (S), so that our population is divided into two species (i.e. W and S). Accordingly, the success probability of the row agent against the column agent in the pair-wise interactions is given in Equation (1)

$\begin{array}{c|cc} & S & W\\ \hline S & 0.5 & K\\ W & 1-K & 0.5\end{array}$ (1)

with K in the range [0, 1], representing the probability that an S agent prevails over a W agent.
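The pair-wise rule can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function names `win_probability` and `row_wins`, the fixed seed and the value K = 0.8 are assumptions made here for the example.

```python
import random

def win_probability(row, col, k):
    """Success probability of the `row` agent against the `col` agent."""
    if row == col:
        return 0.5                      # same strategy: coin flip
    return k if row == "S" else 1.0 - k  # S beats W with probability K

def row_wins(row, col, k, rng):
    """Draw the outcome of one challenge; True when the `row` agent wins."""
    return rng.random() < win_probability(row, col, k)

rng = random.Random(42)   # fixed seed, only for reproducibility
k = 0.8                   # illustrative value (assumption of this sketch)
trials = 20_000
freq = sum(row_wins("S", "W", k, rng) for _ in range(trials)) / trials
print(f"S beats W in {freq:.3f} of the challenges (K = {k})")
```

Over many challenges the empirical frequency should approach K, while two agents of the same kind win about half of the time each.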

At the same time, when two agents of the same kind (i.e. using the same strategy) play each other, the outcome of the challenge is a coin flip, i.e. both agents have equal probability to succeed. Previous investigations on Poker [5] reported a value of K ≈ 0.8 for challenges between “rational” and “irrational” agents. Here, a strong strategy S refers only to the skills agents gain with experience, while a W strategy refers to the lack of experience, i.e. to non-expert agents. At each interaction, an agent receives a payoff of ±1, i.e. +1 when it wins and −1 when it loses. We are interested in analyzing the gain of agents belonging to the two species over time, considering the density of S agents (ρS) as a degree of freedom. Notably, by tuning ρS we can study “if” and “how much” strong agents can win when arranged in two different topologies: regular lattices and scale-free networks.

It is worth highlighting that comparing the results achieved in these two configurations allows us to evaluate the difference between agents playing offline, i.e. with a limited number of interactions, and those playing online. As previously mentioned, online platforms allow users to play with many opponents at the same time. In doing so, strong agents might, in principle, gain a lot of money in little time. The algorithm implementing the dynamics of our model is very simple, and is composed of the following steps:

1) Define a population with N agents, with a fraction ρS of strong players;

2) At each time step, randomly select one agent and let it play with all its neighbors;

3) Repeat from 2) until a given number of time steps has elapsed.
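The three steps above can be sketched as follows for the regular lattice. This is a minimal sketch under stated assumptions rather than the authors' implementation: strong agents are placed at random, each site has four nearest neighbors on a periodic square lattice, and the names `torus_neighbors` and `simulate` are hypothetical.

```python
import random

def torus_neighbors(i, side):
    """Four nearest neighbors of site i on a side x side periodic lattice."""
    r, c = divmod(i, side)
    return [
        ((r - 1) % side) * side + c,   # up
        ((r + 1) % side) * side + c,   # down
        r * side + (c - 1) % side,     # left
        r * side + (c + 1) % side,     # right
    ]

def win_probability(a, b, k):
    """Probability that agent kind a beats agent kind b."""
    if a == b:
        return 0.5
    return k if a == "S" else 1.0 - k

def simulate(side, rho_s, k, steps, seed=0):
    rng = random.Random(seed)
    n = side * side
    # Step 1: N agents, a fraction rho_s of them strong (random placement assumed)
    n_strong = round(rho_s * n)
    kinds = ["S"] * n_strong + ["W"] * (n - n_strong)
    rng.shuffle(kinds)
    payoff = [0] * n
    # Steps 2 and 3: pick a random agent, let it play all its neighbors, repeat
    for _ in range(steps):
        i = rng.randrange(n)
        for j in torus_neighbors(i, side):
            if rng.random() < win_probability(kinds[i], kinds[j], k):
                payoff[i] += 1
                payoff[j] -= 1
            else:
                payoff[i] -= 1
                payoff[j] += 1
    return kinds, payoff

kinds, payoff = simulate(side=20, rho_s=0.1, k=0.6, steps=5000)
for kind in ("S", "W"):
    values = [p for p, s in zip(payoff, kinds) if s == kind]
    print(kind, "mean accumulated payoff:", sum(values) / len(values))
```

Since every challenge moves one unit from the loser to the winner, the total payoff of the population stays at zero; only its split between the two species changes.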

Step 2) of the algorithm entails that both the selected agent and its neighbors accumulate their payoff, i.e. a gain after all the interactions (with the probabilities defined in Equation (1)). Here, it is worth pointing out that our agents are “memory-aware”, i.e. they also accumulate their payoff over time. In order to understand the dynamics of the population, there are two important parameters to analyze: the “volatility” ν and the final average payoff of each kind of agent. The concept of volatility is well known in financial markets. Notably, traders aim to forecast the trend of a particular financial asset, trying to identify those with a high volatility over a reference period (e.g. one hour, one day, one month, and so on). Here, the volatility is defined as shown in Equation (2).

$\nu =\max\left(G\right)-\min\left(G\right)$ (2)

with G the gain (or payoff) achieved by the agents belonging to one species over time. Therefore, we have one volatility for S agents and one for W agents. In addition, since the model described so far is non-adaptive, i.e. agents never change their strategy/nature, we study a small variation that includes a learning process. In particular, we study a population that, at the beginning, is composed only of W agents. Then, the selected agent, after playing with its neighbors, might improve its skills, becoming an S agent. Thus, we implement a simple learning process based on a parameter η that represents the transition probability between the two strategies, i.e. W → S. As a result, the growth of the number of S agents can be analytically described as in Equation (3).

$\frac{dS}{dt}=\eta \cdot W$ (3)

Therefore, after a while and depending on the learning rate η, the whole population becomes composed only of S agents. Beyond the theoretical assumptions, what makes this simple dynamics interesting is that it might represent a common scenario, i.e. usually, after a while, non-expert people improve their skills in a specific game. As in the non-adaptive case, the model is studied both on regular lattices and on scale-free networks.
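As a worked example, Equation (3) can be solved in closed form. This is only a sketch under assumptions not stated in the text: the population size N is constant, W = N − S (with S and W the numbers of strong and weak agents), the left-hand side is read as the rate of change of S in time t, and S(0) = 0.

$\frac{dS}{dt}=\eta \left(N-S\right)\;\Rightarrow\;S\left(t\right)=N\left(1-{e}^{-\eta t}\right),\quad W\left(t\right)=N{e}^{-\eta t}$

Under the same assumptions, the time needed for a fraction f of the population to become strong is

${t}_{f}=-\frac{\ln \left(1-f\right)}{\eta }$

so that, for instance, η = 0.005 and f = 0.99 give t ≈ 921 in the time units of Equation (3). If only the single selected agent can learn at each simulation step, the effective rate per step would be η/N rather than η, which stretches this time by a factor N.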

3. Results

Numerical simulations have been performed on a population composed of N = 2500 agents, arranged in two different configurations: on a regular lattice with periodic (continuous) boundary conditions, and on scale-free networks. The former corresponds to a torus, so that all agents have the same degree, i.e. the same number of neighbors, whereas the latter is intrinsically heterogeneous, i.e. many agents have a low degree and only a few have a high degree (called hubs). The scale-free structure is implemented by using the BA model [10]. We now present the results achieved in both configurations, separating the non-adaptive case from the adaptive one. We recall that in the non-adaptive case the number of S agents does not vary over time, while in the adaptive case agents improve their skills according to a learning rate η = 0.005. Moreover, we set K = 0.6, i.e. the probability that an S agent prevails over a W agent. Finally, each simulation lasts $10^{7}$ time steps, and we recall that more than one interaction occurs at each time step.
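To make the contrast between the two structures concrete, the following sketch grows a network by preferential attachment, in the spirit of the BA model, and prints its degree statistics. The number of links per new node (m = 2) and the initial complete graph of m + 1 nodes are assumptions of this example, since the text does not report them; the function name `barabasi_albert` is likewise hypothetical.

```python
import random

def barabasi_albert(n, m, seed=0):
    """Grow a graph by preferential attachment; returns adjacency sets."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    pool = []  # every node appears here once per incident edge
    # Seed graph: complete graph on the first m + 1 nodes (assumption)
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j)
            adj[j].add(i)
            pool += [i, j]
    # Each new node links to m distinct nodes chosen proportionally to degree
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for target in chosen:
            adj[new].add(target)
            adj[target].add(new)
            pool += [new, target]
    return adj

graph = barabasi_albert(n=2500, m=2)
degrees = [len(neigh) for neigh in graph.values()]
print("min degree:", min(degrees))
print("max degree:", max(degrees))
print("mean degree:", sum(degrees) / len(degrees))
```

On a square lattice with periodic boundaries every agent has the same degree, whereas here most nodes keep a degree close to m and a few hubs collect many more links, which is what lets a selected hub play many opponents in a single time step.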

3.1. Non-Adaptive Case

The first analysis is devoted to studying the agents’ gain over time, considering that at each time t only a fraction of the population is involved in the game, i.e. a randomly selected agent and its neighbors. We start by considering the regular lattice. Figure 1 reports the average agents’ gain G in three different cases: ρS = 0.1, ρS = 0.5 and ρS = 0.9, i.e. starting with few S agents and observing the value of G as their number increases. On the left of Figure 1, only the gain of S agents is reported (red line), while the plots on the right show the gain of both S and W agents. The first observation is that both kinds of agents have a limited gain (positive or negative), in the range ±4, according to the topology of the network. So, limited gain also entails limited risks. The gain of S agents seems quite similar for ρS ≤ 0.8, while for higher values their average gain decreases. As for the W agents, they have limited risks when there are only few S agents and, obviously, they receive negative gains more often when ρS increases up to 0.9.

Figure 1. Agents gain over time in regular lattices. (a) Strong agents, with ρS = 0.1; (b) Both Strong (red) and Weak (blue) agents, with ρS = 0.1; (c) Strong agents, with ρS = 0.5; (d) Both Strong (red) and Weak (blue) agents, with ρS = 0.5; (e) Strong agents, with ρS = 0.9; (f) Both Strong (red) and Weak (blue) agents, with ρS = 0.9. Each plot refers to single and randomly selected realizations.

Then, focusing on the same parameter (i.e. G, or GS when referred only to S agents), we observe the results on scale-free networks (see Figure 2).

Figure 2. Agents gain over time in scale-free networks. (a) Strong agents, with ρS = 0.1; (b) Both Strong (red) and Weak (blue) agents, with ρS = 0.1; (c) Strong agents, with ρS = 0.5; (d) Both Strong (red) and Weak (blue) agents, with ρS = 0.5; (e) Strong agents, with ρS = 0.9; (f) Both Strong (red) and Weak (blue) agents, with ρS = 0.9. Each plot refers to single and randomly selected realizations.

A quick glance at the plots indicates the presence of richer behavior, e.g. the range of gain is now wider than in the regular lattice case. This is clearly due to the high number of neighbors an agent might have. As in the regular lattice configuration, plots on the left represent the average gain of S agents (averaged, at each step, considering only the S agents actually involved in the game), while plots on the right show the gain for both species. It is worth highlighting that, as ρS increases, the range of GS decreases.

The reason is the opportunity to exploit the high number of non-expert agents when ρS is small. Instead, for the W agents, as in the regular lattice, we observe that their actual risks increase as ρS increases. At this point, the previous results show fluctuating gains and, in principle, one might compare them to the fluctuating prices of financial assets. Therefore, we need a macroscopic measure to obtain further information on the population. As previously mentioned, the volatility allows us to observe the system considering the average behavior of the population over different realizations (i.e. according to a Monte Carlo approach). Figure 3 reports a comparison between the two structures, for S and for W agents. A quick glance at both plots reveals the strong difference, in the proposed model, between a regular structure and a heterogeneous one. In particular, on the left of Figure 3 we observe the two volatilities, referred to scale-free networks (black line) and regular lattices (green line). In addition, for each point in the plot, the highest and the smallest values computed (on average) are reported. The same, considering W agents, is reported in the right plot of Figure 3. Looking at the range of gain used to compute the volatility, one might observe the high risks of W agents in scale-free networks when these are populated by too many expert users. At the same time, S agents achieve on average a good positive gain only if their density is smaller than ρS = 0.4, because for higher values the positive and negative gains become almost equal (in absolute value).

Figure 3. Average volatility computed in regular lattices and scale-free networks. (a) Results related to strong agents; (b) Results related to weak agents. As indicated in the legend, the green line indicates the regular lattices result, while the black line indicates those achieved on scale-free networks. Numerical values close to each point indicate the average maximum value and the average minimum value used for computing the related volatility.
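The volatility of Equation (2), averaged over several realizations, can be computed as in the sketch below. The gain series used here is a toy stand-in, not the authors' data: it is the net gain per time step of one strong agent playing `degree` weak neighbors, and the degrees 4 and 50 are assumed values standing for a lattice site and a hub. The function names are hypothetical.

```python
import random

def volatility(gains):
    """Equation (2): spread between the largest and smallest gain in a series."""
    return max(gains) - min(gains)

def toy_gain_series(degree, k, steps, rng):
    """Toy series (assumption): net gain per step of a strong agent
    that plays `degree` weak neighbors, winning each game with probability k."""
    series = []
    for _ in range(steps):
        wins = sum(rng.random() < k for _ in range(degree))
        series.append(2 * wins - degree)   # +1 per win, -1 per loss
    return series

def monte_carlo_volatility(degree, k, steps, runs, seed=0):
    """Average maximum, average minimum and their difference over `runs` realizations."""
    rng = random.Random(seed)
    maxima, minima = [], []
    for _ in range(runs):
        gains = toy_gain_series(degree, k, steps, rng)
        maxima.append(max(gains))
        minima.append(min(gains))
    avg_max = sum(maxima) / runs
    avg_min = sum(minima) / runs
    return avg_max - avg_min, avg_max, avg_min

for degree in (4, 50):
    nu, avg_max, avg_min = monte_carlo_volatility(degree, k=0.6, steps=1000, runs=20)
    print(f"degree {degree}: volatility {nu:.2f} (max {avg_max:.2f}, min {avg_min:.2f})")
```

With four neighbors the gain per step cannot leave the interval from −4 to +4, so the volatility is bounded by 8, while a high-degree agent can reach a much wider range in both directions.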

3.2. Adaptive Case

Now, we briefly study a scenario composed of agents that can improve their skills. In particular, starting with a population composed only of W (i.e. non-expert) agents, a learning process at rate η turns agents into S. The dynamics of the transition process between the two states (or strategies) can be analytically studied by means of Equation (3), so that after a number of time steps (which depends both on η and on N) all agents become expert (i.e. S). Figure 4 shows the results of the numerical simulations, as before, considering the two configurations. The two plots on the top refer to the regular lattice configuration that, as observed in the non-adaptive case, shows two limits, one positive and one negative, in the gain that both kinds of agents can reach. Instead, the situation appears different for the scale-free configuration. In addition, at the beginning the average gain (computed considering only the agents involved in the interaction) is close to zero. Then, with the emergence of expert agents (i.e. S), the gain fluctuates for a while, showing on the right that in scale-free networks W agents risk a lot. At the end of the process, as expected, the average gain goes back to zero, since all the agents have the same strategy, i.e. they are all strong/expert.

Figure 4. Agents gain over time, considering the adaptive mechanism, in regular lattice and scale-free networks. (a) Strong agents in regular lattices; (b) Both Strong (red) and Weak (blue) agents in regular lattices; (c) Strong agents in scale-free networks; (d) Both Strong (red) and Weak (blue) agents in scale-free networks. Each plot refers to single and randomly selected realizations.
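The growth of the expert population in the adaptive case can be illustrated with the sketch below. It isolates the learning step only (the games themselves are omitted) and assumes that, at each time step, only the randomly selected agent may switch from W to S with probability η. The population size N = 100 is a reduced, illustrative value and the function names are hypothetical.

```python
import random

def simulate_learning(n, eta, steps, seed=0):
    """Number of strong agents after each step, starting from an all-weak population."""
    rng = random.Random(seed)
    strong = [False] * n
    count = 0
    history = []
    for _ in range(steps):
        i = rng.randrange(n)                 # randomly selected agent
        if not strong[i] and rng.random() < eta:
            strong[i] = True                 # W -> S transition
            count += 1
        history.append(count)
    return history

def expected_strong(n, eta, t):
    """Expected number of strong agents after t steps under the same rule."""
    return n * (1.0 - (1.0 - eta / n) ** t)

n, eta = 100, 0.005
history = simulate_learning(n, eta, steps=200_000)
for t in (20_000, 50_000, 200_000):
    print(t, history[t - 1], round(expected_strong(n, eta, t), 1))
```

The count of strong agents can only grow and saturates at N, which is why the average gain returns to zero at the end of the process: all remaining challenges are coin flips between experts.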

Finally, it is worth emphasizing that, regardless of the case considered (i.e. non-adaptive or adaptive), the results indicate that (as expected) strong agents earn more than weak ones. However, despite the different volatility we computed in the two topologies, the final average gain of experts is not strongly different between them, i.e. over a sufficiently long time their gain is quite similar, even if a bit higher for those playing in scale-free networks.

4. Discussion and Conclusion

In this work, we study the dynamics of a population whose interactions are based on a simple (stochastic) game, whose outcome allows agents to receive a payoff, i.e. a gain. The latter depends on the strategy adopted by the agents, which can be expert (i.e. Strong) or non-expert (i.e. Weak). Obviously, the expected result of such a process is that expert agents prevail over non-expert ones, i.e. strong agents receive on average a higher gain. Beyond that, here we are interested in evaluating the influence of the interaction topology on the dynamics of the model (see also [22] for further details about the role of complex topologies in dynamical processes). In particular, considering real scenarios where online users might face several opponents at the same time, we aim to analyze the benefits and potential risks of heterogeneous interaction structures. For instance, in the case of online Poker, the emerging trend known as “multitabling” indicates that a number of users are able to play Poker with more than 10 opponents at a time. Since these games involve money, a deeper understanding of the related dynamics might provide useful insights to potential players and to those who manage online platforms, or their rules. Therefore, motivated by this problem, we compare the outcomes of a simple two-strategy game, where the strategy corresponds to the skill of a player, arranging agents in two different configurations: regular lattice and scale-free network. The former aims to represent offline challenges, i.e. those that a player might take on with a restricted number of opponents, while the latter represents the online case previously described. In addition, two different cases are considered: non-adaptive and adaptive. The former entails that agents never change strategy, whereas the latter entails that non-expert agents can improve after a while, becoming experts.

In the proposed model, there are two main points to investigate: are scale-free networks better than regular lattices for some kind of agents? And, since we study the non-adaptive case considering different densities of strong agents, is there a critical density ρS? As for the first question, the results of numerical simulations indicate a higher volatility, i.e. more opportunities for gaining, in the scale-free configuration. On the other hand, the latter entails higher risks for both kinds of agents, in particular when the density of experts exceeds a value close to 0.4. Therefore, as for the second question, real players should be aware that, after a while, a platform can become less convenient than it was at the beginning, because of learning processes through which all agents gain more experience. In addition, we also found a relatively small difference between the average gain that strong agents receive in scale-free networks and in regular lattices. So, even if simulations refer to a very simple game, real players should be aware of this, notably because only very few of them will be successful.

Before concluding, there is a further aspect of our investigation that deserves to be clarified. In the proposed model, our agents have the capability of improving their strategy and becoming experts, i.e. they can learn. So, in principle, one can assume that some of them learn earlier than others. While this aspect would be of strong interest for highlighting the relation between the speed of learning and the equilibrium of the population, we think that it might constitute the topic of a future extension of the present work. To conclude, we deem that some of the achieved results are already known to gamblers from direct experience; however, this work can be helpful for a deeper understanding of gambling and its relation with online platforms.

References

[1] Hannum, R.C. and Cabot, A.N. (2009) Toward Legalization of Poker: The Skill vs. Chance Debate. UNLV Gaming Research & Review Journal, 13, 1-13.

[2] Teofilo, L.F., Reis, L.P. and Lopes Cardoso, H. (2013) Computing Card Probabilities in Texas Hold’em. 2013 8th Iberian Conference on Information Systems and Technologies (CISTI), 19-22 June 2013, Lisbon, 988-993.

[3] Gibney, E. (2017) How Rival Bots Battled Their Way to Poker Supremacy. Nature.

[4] Bessi, A. and Ferrara, E. (2016) Social Bots Distort the 2016 U.S. Presidential Election Online Discussion. First Monday, 21.

[5] Croson, R., Fishman, P. and Pope, D.G. (2013) Poker Superstars: Skill or Luck? Chance, 21, 25-28.

https://doi.org/10.1080/09332480.2008.10722929

[6] Harsanyi, J.C. and Selten, R. (1988) A General Theory of Equilibrium Selection in Games. Vol. 1, 1st Edition, MIT Press Books, The MIT Press, Cambridge.

[7] Suttle, J.P. and Jones, D.A. (1991) Electronic Poker Game. U.S. Patent 5022653.

[8] Bowling, M., Burch, N., Johanson, M. and Tammelin, O. (2015) Heads-Up Limit Hold’em Poker Is Solved. Science, 347, 145-149.

https://doi.org/10.1126/science.1259433

[9] (2017) Oh the Humanity! Poker Computer Trounces Humans in Big Step for AI. The Guardian.

[10] Estrada, E. (2011) The Structure of Complex Networks: Theory and Applications. Oxford University Press, Oxford.

https://doi.org/10.1093/acprof:oso/9780199591756.001.0001

[11] Albert, R. and Barabasi, A.L. (2002) Statistical Mechanics of Complex Networks. Reviews of Modern Physics, 74, 47-97.

https://doi.org/10.1103/RevModPhys.74.47

[12] Perc, M. and Grigolini, P. (2013) Collective Behavior and Evolutionary Games—An Introduction. Chaos, Solitons & Fractals, 56, 1-5.

https://doi.org/10.1016/j.chaos.2013.06.002

[13] Szolnoki, A. and Perc, M. (2015) Conformity Enhances Network Reciprocity in Evolutionary Social Dilemmas. Journal of the Royal Society Interface, 12, Article ID: 20141299.

https://doi.org/10.1098/rsif.2014.1299

[14] Julia, P.C., Gomez-Gardenes, J., Traulsen, A. and Moreno, Y. (2009) Evolutionary Game Dynamics in a Growing Structured Population. New Journal of Physics, 11, 083031.

https://doi.org/10.1088/1367-2630/11/8/083031

[15] Amaral, M.A., Wardil, L., Perc, M. and da Silva, J.K.L. (2016) Stochastic Win-Stay-Lose-Shift Strategy with Dynamic Aspirations in Evolutionary Social Dilemmas. Physical Review E, 94, Article ID: 032317.

[16] Wood, M.W. and Wilson, T.L. (1999) Method for Playing a Poker Game. U.S. Patent No. 5,868,619.

[17] Denizer, C.A., Iyigun, M.F. and Owen, A. (2002) Finance and Macroeconomic Volatility. The B.E. Journal of Macroeconomics, 2.

https://doi.org/10.2202/1534-6005.1048

[18] Gomez-Gardenes, J., Campillo, M., Floria, L.M. and Moreno, Y. (2007) Dynamical Organization of Cooperation in Complex Topologies. Physical Review Letters, 98, Article ID: 108103.

https://doi.org/10.1103/PhysRevLett.98.108103

[19] Kelly, J.M., Dhar, Z. and Verbiest, T. (2007) Poker and the Law: Is It a Game of Skill or Chance and Legally Does It Matter? Gaming Law Review, 11, 190.

[20] Cabot, A. and Hannum, R. (2005) Poker: Public Policy, Law, Mathematics, and the Future of an American Tradition. Cooley Law Review, 22, 443.

[21] Dickinson, J.R. and Faria, A.J. (1996) A Random-Strategy Criterion for Validity of Simulation Game Participation. Simulation and Gaming Yearbook, 4, 49-60.

[22] McInish, T.H. (1980) A Game-Simulation of Stock Market Behavior: An Extension. Simulation & Games, 11, 477-484.

https://doi.org/10.1177/104687818001100407