
Correlated equilibria — some CS perspectives

What will happen in a given strategic situation?  The dogma of game theory — which models strategic situations — is that some kind of equilibrium will be reached.  Taking a wide enough definition of equilibrium, the dogma seems fine.  Taking a narrower view and considering only the Nash equilibrium — either pure (if it exists) or mixed — may be more questionable.  In 1974 Robert Aumann suggested a generalization of Nash equilibrium termed correlated equilibrium.  This type of equilibrium is appealing for several reasons, and this post presents them from a CS-ey perspective.

What is a correlated equilibrium?

For ease of notation let us stick with two players, Alice and Bob (everything does extend to an arbitrary number of players).  A game between Alice and Bob is given by a pair of $n \times m$ matrices $A=(a_{ij})$ and $B=(b_{ij})$, where $1 \le i \le n$ and $1 \le j \le m$, i.e. Alice has $n$ strategies and Bob has $m$ strategies.  $a_{ij}$ denotes Alice's utility when she plays $i$ and Bob plays $j$, and $b_{ij}$ denotes Bob's utility in this case.  We will start with the simplest notions of equilibria and proceed to correlated equilibria.  In all cases equilibrium means that each player is best-replying to the other, so it suffices to define what best-replying means in each notion.

Let us start by defining the pure Nash equilibrium: a pure strategy $i$ of Alice is a best response to a pure strategy $j$ of Bob if for all $i'$ we have that $a_{ij} \ge a_{i'j}$.

The next step allows players to play mixed strategies rather than pure ones, where a mixed strategy is simply a probability distribution over pure strategies. Denote by $\Delta_n$ the set of mixed strategies ($\sum_i x_i = 1$ and for all $i = 1 ... n$, $x_i \ge 0$), then a mixed strategy $x \in \Delta_n$ is a best reply to a mixed strategy $y \in \Delta_m$ if for every mixed strategy $x'$ we have that $\sum_{ij} x_i a_{ij} y_j \ge \sum_{ij} x'_i a_{ij} y_j$.  (This turns out to be equivalent to every pure strategy in the support of $x$ being at least as good against $y$ as every other pure strategy.)

The mixed-Nash equilibrium allows each player to randomize his own actions.  The basic idea of correlated equilibrium is to allow joint randomization between Alice and Bob.  We imagine a third trusted party choosing a pair $(i,j)$ according to some distribution $p$ on pairs, and telling $i$ to Alice (only) and telling $j$ to Bob.  When can we say that $i$ is indeed a best reply to the information that Alice has about Bob's action?  Well, when Alice gets $i$, she may compute the conditional distribution over Bob's strategies: $\Pr[j|i] = p_{ij} / \sum_{j'} p_{ij'}$.  Alice is best responding if $i$ is indeed the best response to this distribution, i.e. if for all $i'$ we have that $\sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j}$.  (Dividing both sides by $\sum_{j'} p_{ij'}$ gives exactly the conditional probabilities.)  The distribution $p$ is called a correlated equilibrium if every $i$ is a best response for Alice, and every $j$ a best response for Bob.
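These incentive constraints are easy to check mechanically.  Here is a minimal sketch in Python (the function name and the use of numpy are my own choices, not anything from the literature): given utility matrices $A$, $B$ and a joint distribution $p$, it verifies Alice's and Bob's best-reply conditions.

```python
import numpy as np

def is_correlated_equilibrium(A, B, p, tol=1e-9):
    """Check the incentive constraints: for every recommendation i to Alice
    and deviation i', sum_j p[i,j]*A[i,j] >= sum_j p[i,j]*A[i',j],
    and symmetrically for Bob over columns."""
    A, B, p = np.asarray(A, float), np.asarray(B, float), np.asarray(p, float)
    n, m = A.shape
    for i in range(n):
        for i2 in range(n):
            if p[i] @ A[i] < p[i] @ A[i2] - tol:
                return False          # Alice prefers deviating from i to i2
    for j in range(m):
        for j2 in range(m):
            if p[:, j] @ B[:, j] < p[:, j] @ B[:, j2] - tol:
                return False          # Bob prefers deviating from j to j2
    return True
```

For example, for the chicken game discussed below, `is_correlated_equilibrium([[0, 4], [1, 3]], [[0, 1], [4, 3]], [[0, 1/3], [1/3, 1/3]])` returns `True`.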

Notice that a Nash equilibrium is a special case: when $(p_{ij})$ is a product distribution $p_{ij}=x_i y_j$.  In this case the previous best-reply constraints become $x_i \sum_j y_j a_{ij} \ge x_i \sum_j y_j a_{i'j}$, which means that each pure strategy $i$ in the support ($x_i>0$) is a best reply to $y$: $\sum_j y_j a_{ij} \ge \sum_j y_j a_{i'j}$.  In particular this implies the existence of a correlated equilibrium in every finite game.

Correlated equilibrium is appealing for several reasons:

1. It can be more efficient than Nash
2. It can be implemented like Nash
3. It can be efficiently computed, unlike Nash
4. It can be efficiently learned, unlike Nash

Efficiency

It is well known that a Nash equilibrium need not give an efficient outcome. The following game of chicken serves as the standard example showing that a correlated equilibrium may be more efficient:

                        Bob
                  Dare       Chicken
Alice  Dare       0, 0       4, 1
       Chicken    1, 4       3, 3

This game has two pure equilibria, (C,D) and (D,C), as well as a single mixed Nash equilibrium where both Alice and Bob choose Dare with probability exactly 1/2.  If we are interested in the obtained sum of utilities (social welfare) as our goal, then each pure equilibrium gets social welfare of 5, while the mixed one gets an expected social welfare of only 4.  The optimal social welfare, 6, is obtained at a non-equilibrium point.  Putting it into a price of anarchy/stability point of view, the price of stability (ratio between the welfare of the best Nash equilibrium and the optimum) is 6/5=1.2, while the price of anarchy (ratio between the welfare of the worst Nash equilibrium and the optimum) is 6/4=1.5 (attained by the mixed equilibrium).
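As a sanity check, these welfare numbers can be reproduced in a few lines (a quick sketch of mine, with the matrices transcribed from the chicken table above):

```python
# Chicken: row = Alice, column = Bob; A and B hold their respective utilities.
A = [[0, 4], [1, 3]]   # Alice's utilities, rows Dare/Chicken
B = [[0, 1], [4, 3]]   # Bob's utilities

# Pure equilibria (Dare, Chicken) and (Chicken, Dare): welfare 4 + 1 = 5.
welfare_pure = A[0][1] + B[0][1]

# Mixed Nash: each player dares with probability 1/2.
x = y = [0.5, 0.5]
welfare_mixed = sum(x[i] * y[j] * (A[i][j] + B[i][j])
                    for i in range(2) for j in range(2))

print(welfare_pure, welfare_mixed)   # 5 4.0
```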

Let us look at the following correlated distribution:

                        Bob
                  Dare       Chicken
Alice  Dare       0          1/3
       Chicken    1/3        1/3

Let us see why this is a correlated equilibrium.  Assume that Alice "gets" dare.  This happens with probability 1/3, and in this case she knows for sure that Bob got "chicken" — so clearly daring is a best reply.  On the other hand, if Alice gets chicken — which happens with probability 2/3 — then her conditional distribution on Bob's strategies is 1/2-1/2.  Under this distribution she is indifferent between dare and chicken, so chicken is still a best reply.  The nice thing is that the expected sum of utilities in this correlated equilibrium is $5 \frac{1}{3}$ — better than that of any Nash equilibrium.  This turns out to be the best correlated equilibrium in terms of social welfare, and the resulting price of correlated stability is thus $6/(5 \frac{1}{3}) = 1.125$.  The set of correlated equilibria includes not only equilibria with higher welfare than any Nash equilibrium, but also ones with lower welfare.  The worst correlated equilibrium turns out to give 1/3 weight to each of (dare,dare), (dare,chicken), and (chicken,dare), for a total expected social welfare of $3 \frac{1}{3}$, giving a correlated price of anarchy of $6/(3\frac{1}{3}) = 1.8$.  The efficiency gap may be arbitrarily large, as the following example shows [corrected on 21/10 thanks to Jochen Konemann who found a bug in the original version] (all missing entries are 0,0):
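The two welfare figures can be checked exactly with rational arithmetic (a small sketch of mine; the distributions are transcribed from the text above):

```python
from fractions import Fraction as F

A = [[0, 4], [1, 3]]   # Alice's utilities in chicken (rows Dare/Chicken)
B = [[0, 1], [4, 3]]   # Bob's utilities

def welfare(p):
    """Expected social welfare of a joint distribution p over outcomes."""
    return sum(p[i][j] * (A[i][j] + B[i][j])
               for i in range(2) for j in range(2))

best  = [[F(0), F(1, 3)], [F(1, 3), F(1, 3)]]   # best correlated equilibrium
worst = [[F(1, 3), F(1, 3)], [F(1, 3), F(0)]]   # worst correlated equilibrium

print(welfare(best), welfare(worst))   # 16/3 10/3
```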

        C1     C2     C3     D0     D1     D2     D3
C1             1,1    1,1    0,1    0,2
C2      1,1           1,1    0,1           0,2
C3      1,1    1,1           0,1                  0,2
D0      1,0    1,0    1,0    ε,ε    ε,ε    ε,ε    ε,ε
D1      2,0                  ε,ε
D2             2,0           ε,ε
D3                    2,0    ε,ε

Inspection will reveal that no Nash equilibrium can have any C* strategy in its support, and thus every Nash equilibrium always gives utility $\le \epsilon$ to both players.  In contrast, a uniform distribution on the 1,1 entries is a correlated equilibrium that gives utility 1 to each player.  Thus while the regular price of stability grows without bound as $\epsilon \to 0$, the correlated price of stability is 1.  So, the optimists (like me) who prefer looking at the price of stability may view the notion of a correlated equilibrium as an opportunity to get better results.  On the other hand, the pessimists who look at the price of anarchy may be concerned that it may bring worse results.  To ease these fears, a very recent paper, "Intrinsic robustness of the price of anarchy" by Tim Roughgarden, shows that for a wide class of games, including the congestion games usually studied in the context of the price of anarchy, this does not happen — the correlated price of anarchy turns out to be no worse than the regular price of anarchy.
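The claim that the uniform distribution on the 1,1 entries is a correlated equilibrium can be verified mechanically.  Below is a sketch of mine that transcribes the table (with an arbitrary small value standing in for $\epsilon$; the game is symmetric, so the column player's matrix is the transpose) and checks all the incentive constraints:

```python
e = 0.1   # stands in for epsilon; any small positive value works here
A = [
    [0, 1, 1, 0, 0, 0, 0],   # C1
    [1, 0, 1, 0, 0, 0, 0],   # C2
    [1, 1, 0, 0, 0, 0, 0],   # C3
    [1, 1, 1, e, e, e, e],   # D0
    [2, 0, 0, e, 0, 0, 0],   # D1
    [0, 2, 0, e, 0, 0, 0],   # D2
    [0, 0, 2, e, 0, 0, 0],   # D3
]
# The game is symmetric: the column player's utilities are A transposed.
B = [list(col) for col in zip(*A)]

# Uniform distribution on the six off-diagonal C-C cells (the 1,1 entries).
p = [[0.0] * 7 for _ in range(7)]
for i in range(3):
    for j in range(3):
        if i != j:
            p[i][j] = 1 / 6

def obeys_ce_constraints(A, B, p, tol=1e-9):
    """Check every recommendation/deviation pair for both players."""
    n, m = len(A), len(A[0])
    ok_rows = all(sum(p[i][j] * A[i][j] for j in range(m)) >=
                  sum(p[i][j] * A[k][j] for j in range(m)) - tol
                  for i in range(n) for k in range(n))
    ok_cols = all(sum(p[i][j] * B[i][j] for i in range(n)) >=
                  sum(p[i][j] * B[i][k] for i in range(n)) - tol
                  for j in range(m) for k in range(m))
    return ok_rows and ok_cols

print(obeys_ce_constraints(A, B, p))   # True
```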

Implementation

The most serious concern with the notion of correlated equilibrium is its implementation — how can the players get a jointly correlated probability distribution?  The definition of the notion assumes a fictional trusted third party termed a mediator.  But how can a correlated equilibrium be reached in normal circumstances, without a mediator?  One may devise various "correlation devices" to replace the fictional mediator.  For example, in the chicken example above a hat and three balls will do the trick: two balls labeled "chicken" and one ball labeled "dare" are thrown into a hat (under the joint supervision of Alice and Bob) and Alice and Bob each take one ball from the hat (without looking).  Notice that each of them draws "dare" with probability 1/3 — and they never do so simultaneously — exactly our correlated equilibrium!

Can such a correlation device be implemented in general?  Can an equivalent device that works over computer communication networks be implemented?  Both the game-theory community and the cryptography community worked on these questions using somewhat different notions and terminology.  Game theorists considered adding a pre-round of cheap talk to the game where the parties may communicate with each other.  From the computer science perspective this is just a special case of secure function evaluation: there are general cryptographic techniques that enable any number of parties to emulate any trusted mediator without really having such a mediator and without trusting each other.  There are a variety of models and results here, but the basic ones in our setting state that (1) under standard cryptographic assumptions, any correlated equilibrium in any game with any number of players can be implemented securely in a computational sense, and (2) any correlated equilibrium in any game with at least 3 (or 4, depending on the exact model) players can be implemented securely in an information-theoretic sense (but not in 2-party games).  For more information see chapter 8 of the algorithmic game theory book.

Computation

The first nice thing about correlated equilibria is that they can be efficiently found.  Just look at the inequalities that define them above: they are all linear inequalities in the $p_{ij}$'s: for every $i,i'$: $\sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j}$, similar inequalities for Bob, and further linear inequalities specifying that $p$ is a probability distribution.  Thus we can find a correlated equilibrium using linear programming and even optimize an arbitrary linear function over correlated equilibria, e.g. find the one with highest social welfare.  This is in stark contrast to Nash equilibria, where finding an arbitrary one is PPAD-complete and optimizing social welfare over them is NP-hard.
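This LP is short enough to write out in full.  Below is a sketch of mine using scipy's LP solver (the function name and encoding choices are my own): the variables are the $n \cdot m$ joint probabilities, the inequality rows encode both players' incentive constraints, and the objective maximizes social welfare.

```python
import numpy as np
from scipy.optimize import linprog

def max_welfare_ce(A, B):
    """Find a correlated equilibrium maximizing social welfare via LP.
    Variables: the n*m joint probabilities p[i,j], flattened row-major."""
    A, B = np.asarray(A, float), np.asarray(B, float)
    n, m = A.shape
    rows = []
    # Alice: for recommendation i and deviation i',
    #   sum_j p[i,j] * (A[i',j] - A[i,j]) <= 0
    for i in range(n):
        for i2 in range(n):
            if i2 == i:
                continue
            c = np.zeros((n, m))
            c[i] = A[i2] - A[i]
            rows.append(c.ravel())
    # Bob: symmetric constraints over columns.
    for j in range(m):
        for j2 in range(m):
            if j2 == j:
                continue
            c = np.zeros((n, m))
            c[:, j] = B[:, j2] - B[:, j]
            rows.append(c.ravel())
    res = linprog(-(A + B).ravel(),                    # maximize total welfare
                  A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
                  A_eq=np.ones((1, n * m)), b_eq=[1.0],  # p sums to 1
                  bounds=[(0, None)] * (n * m))          # p is nonnegative
    return res.x.reshape(n, m), -res.fun

# On chicken this recovers the best correlated equilibrium of welfare 5 1/3.
p, w = max_welfare_ce([[0, 4], [1, 3]], [[0, 1], [4, 3]])
print(round(w, 3))   # 5.333
```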

The LP algorithm for correlated equilibria directly applies to games with any number of players, and its running time is polynomial in the size of the game (the description size of the players' utility functions, which is the product of the players' strategy-space sizes).  The situation is actually even better, and we can often find correlated equilibria of exponential-size games in polynomial time.  Specifically, using the Ellipsoid algorithm, there are cases where it is possible to find a correlated equilibrium in time that is polynomial in the length of the representation of the strategies of the game.  For this to make sense there needs to be a succinct representation of the game, where given a representation of the players' strategies, their utilities can be easily determined.

Learning

How can an equilibrium of a game be practically reached?  Is there a natural process that will lead to a state of equilibrium?  A process where the players will “learn” what they should play?  There have been many different attempts and models to define processes that converge to equilibria.  A simple setting considers players repeatedly playing a game, where at each stage they act according to some given “learning” rule (which is intuitively “somewhat rational” but not really strategic) that depends on the history of the game as well as on their own utility function (but usually not on the other players’ utility functions — this is called uncoupled dynamics).

A simple example is repeated-best-reply: in every round you play your best reply to the strategies played by the other players in the previous round.  If this process converges, then it converges to a pure Nash equilibrium.  However, it may in general fail to converge even if a pure Nash equilibrium exists, and certainly will not converge if the game lacks pure Nash equilibria.  There are however interesting special cases where this process does converge, specifically potential games and games solvable by iterated elimination of dominated strategies (in the latter case even in a strong asynchronous sense).  A natural attempt for converging to a mixed Nash equilibrium is called fictitious play: in every round you play your best response to the other players' mixed strategies defined according to the empirical historical distribution of their strategies.  This process again is not guaranteed to converge and may loop indefinitely.  In fact, there is no natural process that is known to converge to a Nash equilibrium that is not essentially equivalent to exhaustive search; this shouldn't come as a surprise since such a process would likely imply an efficient algorithm whose existence we doubt.  There are however natural processes that do converge to correlated equilibria.  These are based on regret minimization procedures.  Below is a minimal description of these types of learning algorithms; for a comprehensive exposition as well as historical context see chapter 4 in the algorithmic game theory book, or this talk by Avrim Blum.
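To see repeated-best-reply fail, consider matching pennies, which has no pure Nash equilibrium.  A toy sketch of mine (not from the post) shows the simultaneous best-reply dynamic cycling forever:

```python
def best_reply_dynamics(steps=8):
    """Simultaneous best-reply dynamics in matching pennies.
    The row player wants to match, the column player wants to mismatch."""
    A = [[1, -1], [-1, 1]]    # row player's utilities
    B = [[-1, 1], [1, -1]]    # column player's utilities
    i, j = 0, 0
    history = [(i, j)]
    for _ in range(steps - 1):
        # Both players simultaneously best-reply to the previous round
        # (the right-hand side is evaluated before assignment).
        i, j = (max(range(2), key=lambda k: A[k][j]),
                max(range(2), key=lambda k: B[i][k]))
        history.append((i, j))
    return history

h = best_reply_dynamics()
print(h[:4] == h[4:])   # True: the play cycles with period 4, never settling
```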

Here is the basic framework: we need to design a "learning algorithm" $A$ that chooses one out of $N$ possible actions, and does this repeatedly for $T$ steps.  For each possible action $i \in \{1...N\}$ and time $0 < t \le T$ there is some utility $0 \le u_t(i) \le 1$.  At time $t$ the algorithm must choose an action $i_t$ without seeing the current utilities $u_t(i)$, based only on the previous utilities.  The goal is to maximize the total utility obtained: $U_A = \sum_t u_t(i_t)$, where $i_t$ is the action $A$ took at time $t$.  The main challenge here is the total lack of information regarding the utilities $u_t(i)$, which need not have any relation to the utilities at previous times.  I.e. the model allows $u_t(i)$ to be chosen by an adversary, and furthermore to depend arbitrarily on the actions chosen previously by our learning algorithm, i.e. on $i_1 ... i_{t-1}$.  Since competing with the best sequence of actions in hindsight is hopeless in such a setting, we will instead compare the performance of our algorithm to some family of alternative algorithms (a family that may be defined as possible variants of the original $A$).  What we need in this context is called low swap (or internal) regret.  Let $F : \{1...N\} \rightarrow \{1...N\}$; we say that $B$ is obtained from $A$ by the swap $F$ if whenever $A$ takes action $i$, $B$ takes action $F(i)$.

Definition: $A$ that runs for $T$ steps has swap regret $R$ if for every algorithm $B$ that is obtained from $A$ by some swap, we have that $U_A \ge U_B - R$.

It is not difficult to see that deterministic algorithms cannot have "low" swap regret, but it turns out that randomized algorithms can.  A randomized algorithm may choose its action $i_t$ according to a probability distribution $p^t$ (where each $i$ is chosen with probability $p^t_i$).  There has been much work on designing learning algorithms with low regret, with the end result being efficient randomized algorithms that almost surely achieve swap regret $R=O(\sqrt{NT\log N})$; the important point is that $R \ll T$.  (For the weaker notion of external regret, the dependence on $N$ can be made logarithmic, but this is not known for internal regret.)  This post will not go into how these algorithms are constructed, but just into why they are what we need.
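Concretely, the swap regret of a realized play sequence can be computed in hindsight: since a swap $F$ can remap each action independently, the best $F$ is found by optimizing each $F(i)$ over exactly the steps where $i$ was played.  A small sketch (the function name is mine):

```python
def swap_regret(actions, utilities):
    """Swap regret of a realized play sequence.  actions[t] is the action
    chosen at step t; utilities[t][k] is what action k would have earned at
    step t.  The best swap F is optimized one action at a time."""
    N = len(utilities[0])
    regret = 0.0
    for i in set(actions):
        rounds = [u for a, u in zip(actions, utilities) if a == i]
        # Utility actually collected on the steps where we played i ...
        earned = sum(u[i] for u in rounds)
        # ... versus the best single replacement on those same steps.
        best = max(sum(u[k] for u in rounds) for k in range(N))
        regret += best - earned
    return regret

# Always playing action 0 while action 1 pays 1 every step: regret T = 3.
print(swap_regret([0, 0, 0], [[0, 1]] * 3))   # 3.0
```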

So far all this regret-minimization stuff was in a decision-theoretic framework — just a single decision maker.  Now comes the key switch to games: what if each player in a game plays such a regret minimization algorithm?  Let us be more specific and, as above, consider two players, even though everything holds in general.  Suppose that Alice and Bob play the game for $T$ rounds, each time choosing their strategy (action) in the game according to a regret minimization algorithm.  At each time $t$ Alice plays $i_t$ and Bob plays $j_t$, and thus Alice sees utility vector $u_t(i)=a_{i,j_t}$ and Bob sees utility vector $u^{Bob}_t(j)=b_{i_t,j}$.

Lemma: Assume that Alice and Bob both play an algorithm with swap regret $R$ for $T$ steps.  Then the uniform distribution on $(i_t,j_t)$ over all $1 \le t \le T$ is $\epsilon$-close to a correlated equilibrium (in $L_1$ norm), where $\epsilon=R/T$.  I.e., let $p$ denote this distribution; then for every $i$ there exists $\epsilon_i$ such that for all $i'$ we have that $\sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j} - \epsilon_i$, with $\sum_i \epsilon_i \le \epsilon$ (and similarly for Bob).

To see why this is true, let us look at Alice's optimal strategy when the trusted mediator tells her that $i$ was drawn according to this distribution.  Certainly, Alice's best response is to play the strategy $i'$ that maximizes $\sum_j p_{ij} a_{i'j}$ — we can denote the mapping from each $i$ to its optimal $i'$ by the function $F(i)$, and we define $\epsilon_i$ to be the gap attained by $i'=F(i)$: $\epsilon_i = \sum_j p_{ij} a_{F(i)j} - \sum_j p_{ij} a_{ij}$.  Now let us look at the expression for the regret of $A$ relative to the swap $F$: $U_B - U_A = \sum_t u_t(F(i_t)) - \sum_t u_t(i_t)$, which we know is bounded by $R$.  Given that by definition $u_t(i) = a_{i,j_t}$, and that $p_{ij}$ is simply $1/T$ times the number of times $t$ for which $i_t=i$ and $j_t=j$, we see that this expression is exactly $T$ times the sum of the $\epsilon_i$'s, so $\sum_i \epsilon_i \le R/T$, which completes the proof of the lemma.
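The bookkeeping in this proof can be verified numerically: for any play history whatsoever (even random, non-equilibrium play), Alice's total correlated-equilibrium violation $\sum_i \epsilon_i$ of the empirical distribution equals her swap regret divided by $T$.  A sketch of mine (all names are my own) checking this identity on random play in the chicken game:

```python
import random

def empirical_violation(history, A):
    """Alice's total CE violation (sum of eps_i) of the empirical joint
    distribution of a play history [(i_t, j_t)], as in the lemma."""
    T = len(history)
    n, m = len(A), len(A[0])
    p = [[0.0] * m for _ in range(n)]
    for i, j in history:
        p[i][j] += 1.0 / T
    total = 0.0
    for i in range(n):
        stay = sum(p[i][j] * A[i][j] for j in range(m))
        best = max(sum(p[i][j] * A[k][j] for j in range(m)) for k in range(n))
        total += best - stay
    return total

def alice_swap_regret(history, A):
    """Alice's swap regret over the same history, divided by T."""
    T = len(history)
    n = len(A)
    regret = 0.0
    for i in range(n):
        cols = [j for (a, j) in history if a == i]
        earned = sum(A[i][j] for j in cols)
        best = max(sum(A[k][j] for j in cols) for k in range(n))
        regret += best - earned
    return regret / T

random.seed(0)
A = [[0, 4], [1, 3]]   # Alice's utilities in chicken
hist = [(random.randrange(2), random.randrange(2)) for _ in range(1000)]
assert abs(empirical_violation(hist, A) - alice_swap_regret(hist, A)) < 1e-9
```

So driving the swap regret to $o(T)$ drives the empirical distribution toward the set of correlated equilibria, exactly as the lemma states.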

11 Responses

1. There is another interesting aspect regarding correlated equilibrium. You can ask if there is some separate dynamics for the players (when the game is played repeatedly) so that when each player of a game behaves according to this dynamics it will collectively lead to an equilibrium point. Sergiu Hart and Andreu Mas-Colell have several general results showing that such dynamics are possible for correlated equilibrium and impossible for Nash equilibrium. See e.g. http://www.ma.huji.ac.il/~hart/abs/adaptdyn.html

• Gil, this is exactly what the last section of this post, on learning, discusses. Hart and Mas-Colell show that you can’t get any “memory-less” convergence to Nash, but they do give general uncoupled dynamics that do converge to Nash — although after essentially performing a random exhaustive search (and taking exponential time).

Their dynamics that converge to correlated equilibria are exactly the regret-minimization ones described in this post (they actually build the regret minimization learning algorithm which I didn’t describe.)

2. Right, and there are several related works by Hart and Mas-Colell, including a result that in certain cases uncoupled dynamics that converge to Nash do not exist at all. Anyway, Sergiu’s presentation and homepage are a good link to this topic. (Strangely, “Mas-Colell” in Hebrew means “comprehensive tax”, a nice name for an economist.)

3. For what it’s worth, I worked out some code in Mathematica to solve for correlated equilibrium in an N-player game using linear programming. From a computational point of view, the variables and constraints needed to find a correlated equilibrium by linear programming grow rapidly with the number of strategies available to each player. The problem size increases at a polynomial rate, but it is a pretty ugly polynomial.

http://www.economicenquiry.com/blog/2009/4/19/computer-program-to-calculate-correlated-equilibria.html

4. [...] and thus only a tiny fraction of the private information needs to be transferred. You can see a previous post of mine for background on correlated [...]

5. Since I had a bug in the original version, here’s the “inspection” of the 7x7 game above that shows that no Nash equilibrium has any C* strategy in its support.

D0 weakly dominates any C* strategy, and is strictly better than them as a reply to any D* strategy. Thus unless the column player’s support is limited to the C* strategies, the row player must play just on the D’s and then the column player must do so too. It follows that any Nash equilibrium whose support includes any C* strategy must be completely contained in the C*-by-C* sub-matrix.

However, if some Ci is both in the support of the row player and of the column player, then, again, D0 is a strictly better reply. Thus at most one of the players can use each Ci, and wlog, the column player concentrates ALL his probability on the C1. But then D1 is a strictly better reply for the row player. Contradiction.

6. [...] depend on the distribution and not on the chosen recommendation.  This is strictly weaker than correlated equilibrium in which players should prefer following suggestions to choosing strategies whose identity may [...]


8. [...] is well known that a correlated equilibrium can be computed in polynomial time in the size of the game.  In a beautiful paper published several years ago, Papadimitriou and [...]

9. [...] correlated equilibrium is defined by a linear program which, in the context under question, has exponentially many variables and polynomially many [...]
