What will happen in a given strategic situation? The dogma of game theory — which models strategic situations — is that some kind of equilibrium will be reached. Taking a wide enough definition of equilibrium, the dogma seems fine. Taking a narrower look and just considering the Nash equilibrium — either pure (if it exists) or mixed — may be more questionable. In 1974 Robert Aumann suggested a generalization of the Nash equilibrium termed correlated equilibrium. This type of equilibrium is appealing for several reasons, and this post presents them from a CS-ey perspective.
What is correlated equilibrium?
For ease of notation let us stick with two players, Alice and Bob (everything does extend to an arbitrary number of players). A game between Alice and Bob is given by a pair of matrices $A$ and $B$, where $A, B \in \mathbb{R}^{n \times m}$, i.e. Alice has $n$ strategies and Bob has $m$ strategies. $A_{ij}$ denotes Alice's utility when she plays $i$ and Bob plays $j$, and $B_{ij}$ denotes Bob's utility in this case. We will start with the simplest notions of equilibria and proceed to correlated equilibria. In all cases equilibrium means that each player is best-replying to the other, so it suffices to define what best-replying means in each notion.
Let us start by defining the pure Nash equilibrium: a pure strategy $i$ of Alice is a best response to a pure strategy $j$ of Bob if for all $i'$ we have that $A_{ij} \ge A_{i'j}$.
The next step allows players to play mixed strategies rather than pure ones, where a mixed strategy is simply a probability distribution over pure strategies. Denote by $\Delta_n$ the set of mixed strategies ($x \in \mathbb{R}^n$ with $x_i \ge 0$ for all $i$ and $\sum_i x_i = 1$); then a mixed strategy $x \in \Delta_n$ is a best reply to a mixed strategy $y \in \Delta_m$ if for every mixed strategy $x' \in \Delta_n$ we have that $\sum_{i,j} x_i y_j A_{ij} \ge \sum_{i,j} x'_i y_j A_{ij}$. (This turns out to be equivalent to every pure strategy in the support of $x$ being at least as good against $y$ as every other pure strategy.)
The mixed-Nash equilibrium allows each player to randomize his own actions. The basic idea of correlated equilibrium is to allow joint randomization between Alice and Bob. We imagine a third trusted party choosing a pair $(i, j)$ according to some distribution $p$ on pairs of strategies, and telling $i$ to Alice (only) and telling $j$ to Bob (only). When can we say that $i$ is indeed a best reply to the information that Alice has about Bob's action? Well, when Alice gets $i$, she may assume the conditional distribution over Bob's strategies: $\Pr[j \mid i] = p_{ij} / \sum_{j'} p_{ij'}$. Alice is best responding if $i$ is indeed the best response to this distribution, i.e. if for all $i'$ we have that $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j}$. (Dividing both sides by $\sum_{j'} p_{ij'}$ gives exactly the conditional probabilities.) The distribution $p$ is called a correlated equilibrium if every $i$ in its support is a best response for Alice, and every $j$ in its support a best response for Bob.
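These inequalities can be checked mechanically. Below is a small illustrative sketch (the function name and the prisoner's-dilemma payoffs are my choices, not from the post) that verifies all best-reply inequalities for a given joint distribution:

```python
def is_correlated_equilibrium(A, B, p, tol=1e-9):
    """Check every best-reply inequality for the joint distribution p:
    on recommendation i, Alice must not gain by deviating to any i',
    and symmetrically for Bob."""
    n, m = len(A), len(A[0])
    for i in range(n):
        for i2 in range(n):
            if sum(p[i][j] * (A[i2][j] - A[i][j]) for j in range(m)) > tol:
                return False
    for j in range(m):
        for j2 in range(m):
            if sum(p[i][j] * (B[i][j2] - B[i][j]) for i in range(n)) > tol:
                return False
    return True

# A point mass on a pure Nash equilibrium is the simplest example:
# in the prisoner's dilemma, all mass on (defect, defect) passes,
# while all mass on (cooperate, cooperate) does not.
A = [[3, 0], [5, 1]]  # Alice's utilities (action 1 = defect)
B = [[3, 5], [0, 1]]  # Bob's utilities
print(is_correlated_equilibrium(A, B, [[0, 0], [0, 1]]))  # True
print(is_correlated_equilibrium(A, B, [[1, 0], [0, 0]]))  # False
```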
Notice that a Nash equilibrium is a special case: the one where $p$ is a product distribution $p_{ij} = x_i y_j$. In this case the previous best-reply constraints become $x_i \sum_j y_j A_{ij} \ge x_i \sum_j y_j A_{i'j}$, which means that each pure strategy $i$ in the support ($x_i > 0$) is a best reply to $y$, exactly the mixed-Nash condition. In particular this implies the existence of a correlated equilibrium in every finite game.
So what are the good things about this notion of equilibrium relative to Nash equilibrium?
- It can be more efficient than Nash
- It can be implemented like Nash
- It can be efficiently computed unlike Nash
- It can be efficiently learned unlike Nash
It is well known that the Nash equilibrium need not give an efficient outcome. The following game of chicken serves as the standard example showing that a correlated equilibrium may be more efficient (rows are Alice's strategies, columns Bob's):
|         | Dare | Chicken |
|---------|------|---------|
| Dare    | 0, 0 | 4, 1    |
| Chicken | 1, 4 | 3, 3    |
This game has two pure equilibria, (C,D) and (D,C), as well as a single mixed Nash equilibrium in which Alice and Bob each choose Dare with probability exactly 1/2. If we are interested in the obtained sum of utilities (social welfare) as our goal, then each pure equilibrium gets social welfare of 5, while the mixed one gets an expected social welfare of 4. The optimal social welfare, 6, is obtained at a non-equilibrium point. Putting it into a price of anarchy/stability point of view, the price of stability (the ratio between the optimal welfare and that obtained in the best Nash equilibrium) is 6/5 = 1.2, while the price of anarchy (the ratio between the optimal welfare and that obtained in the worst Nash equilibrium, here the mixed one) is 6/4 = 1.5.
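These welfare numbers are easy to verify directly (a quick arithmetic check; the payoff encoding and variable names are mine):

```python
from fractions import Fraction as F

# Chicken payoffs: index 0 = Dare, 1 = Chicken (rows Alice, cols Bob).
A = [[0, 4], [1, 3]]
B = [[0, 1], [4, 3]]

# Each pure equilibrium, (C,D) or (D,C), gives welfare 1+4 = 4+1 = 5.
pure_welfare = A[1][0] + B[1][0]

# In the mixed equilibrium each player dares with probability 1/2,
# so every one of the four outcomes has probability 1/4.
mixed_welfare = sum(F(1, 4) * (A[i][j] + B[i][j])
                    for i in range(2) for j in range(2))

opt = max(A[i][j] + B[i][j] for i in range(2) for j in range(2))
print(pure_welfare, mixed_welfare)   # 5 4
print(F(opt, pure_welfare))          # price of stability 6/5
print(opt / mixed_welfare)           # price of anarchy 3/2
```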
Let us look at the following correlated distribution:
|         | Dare | Chicken |
|---------|------|---------|
| Dare    | 0    | 1/3     |
| Chicken | 1/3  | 1/3     |
Let us see why this is a correlated equilibrium. Assume that Alice “gets” dare. This happens with probability 1/3, and in this case she knows for sure that Bob got “chicken” — so clearly daring is a best reply. On the other hand, if Alice gets chicken — which happens with probability 2/3 — then her conditional distribution on Bob is 1/2-1/2. Under this distribution she is indifferent between dare and chicken, so chicken is still a best reply. The nice thing is that the expected sum of utilities in this correlated equilibrium is $(5 + 5 + 6)/3 = 16/3 \approx 5.33$ — better than that of any Nash equilibrium. This turns out to be the best correlated equilibrium in terms of social welfare, and the resulting price of correlated stability is thus $6/(16/3) = 9/8$. The set of correlated equilibria includes not only ones with higher welfare than any Nash equilibrium, but also ones with lower welfare. The worst correlated equilibrium turns out to give 1/3 weight to each of (dare,dare), (dare,chicken), and (chicken,dare), for a total expected social welfare of $10/3$, giving a correlated price of anarchy of $6/(10/3) = 9/5 = 1.8$. The efficiency gap may be arbitrarily large, as the following example shows [corrected on 21/10 thanks to Jochen Konemann who found a bug in the original version] (all missing entries are 0,0):
Inspection will reveal that no Nash equilibrium can have any of the C* strategies in its support, and thus every Nash equilibrium gives utility 0 to both players. In contrast, a uniform distribution on the 1,1 entries is a correlated equilibrium that gives utility 1 to each player. Thus while the regular price of stability is infinite, the correlated price of stability is 1. So, the optimists (like me) who prefer looking at the price of stability may view the notion of a correlated equilibrium as an opportunity to get better results. On the other hand, the pessimists who look at the price of anarchy may be concerned that it may bring worse results. To ease these fears, a very recent paper, “Intrinsic robustness of the price of anarchy” by Tim Roughgarden, shows that for a wide class of games, including the congestion games usually studied in the context of the price of anarchy, this does not happen: the correlated price of anarchy turns out to be no worse than the regular price of anarchy.
The most serious concern with the notion of correlated equilibrium is its implementation — how can the players obtain a joint correlated probability distribution? The definition of the notion assumes a fictional trusted third party termed a mediator. But how can the correlated equilibrium be reached in normal circumstances, without a mediator? One may devise various “correlation devices” to replace the fictional mediator. For example, in the chicken example above a hat and three balls will do the trick: two balls labeled “chicken” and one ball labeled “dare” are thrown into a hat (under the joint supervision of Alice and Bob) and Alice and Bob each take one ball from the hat (without looking). Notice that each of them draws “dare” with probability 1/3 — and they never do so simultaneously — exactly our correlated equilibrium!
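The hat device is easy to simulate (an illustrative sketch; the function name and sample size are mine):

```python
import random
from collections import Counter

def hat_draw(rng):
    """Two 'chicken' balls and one 'dare' ball go into the hat;
    Alice and Bob each draw one without looking."""
    balls = ["dare", "chicken", "chicken"]
    rng.shuffle(balls)
    return balls[0], balls[1]  # (Alice's ball, Bob's ball)

rng = random.Random(0)
N = 60000
counts = Counter(hat_draw(rng) for _ in range(N))
# (dare, dare) never occurs; each of the three other pairs gets ~1/3,
# exactly the correlated-equilibrium distribution from the table above.
for pair in sorted(counts):
    print(pair, round(counts[pair] / N, 3))
```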
Can such a correlation device be implemented in general? Can an equivalent device that works over computer communication networks be implemented? Both the game-theory community and the cryptography community worked on these questions using somewhat different notions and terminology. Game theorists considered adding a pre-round of cheap talk to the game where the parties may communicate with each other. From the computer science perspective this is just a special case of secure function evaluation: there are general cryptographic techniques that enable any number of parties to emulate any trusted mediator without really having such a mediator and without trusting each other. There are a variety of models and results here, but the basic ones in our setting are: (1) under standard cryptographic assumptions, any correlated equilibrium in any game with any number of players can be implemented securely in a computational sense; (2) any correlated equilibrium in any game with at least 3 (or 4, depending on the exact model) players can be implemented securely in an information-theoretic sense (but not in 2-party games). For more information see chapter 8 of the algorithmic game theory book.
The first nice thing about correlated equilibria is that they can be found efficiently. Just look at the inequalities that define them above: they are all linear in the $p_{ij}$'s — for every $i$ and $i'$: $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j}$, similar inequalities for Bob, and further linear inequalities specifying that $p$ is a probability distribution ($p_{ij} \ge 0$ and $\sum_{i,j} p_{ij} = 1$). Thus we can find a correlated equilibrium using linear programming, and even optimize an arbitrary linear function over the set of correlated equilibria, e.g. find the one with highest social welfare. This is in stark contrast to Nash equilibria, where finding an arbitrary one is PPAD-complete and optimizing social welfare over them is NP-hard.
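As a concrete sketch of this LP, here is one way to find the welfare-maximizing correlated equilibrium of the chicken game using `scipy.optimize.linprog` (the library choice and variable names are mine, not from the post):

```python
import numpy as np
from scipy.optimize import linprog

# Chicken payoffs: index 0 = Dare, 1 = Chicken (rows Alice, cols Bob).
A = np.array([[0, 4], [1, 3]])
B = np.array([[0, 1], [4, 3]])
n, m = A.shape

# Variables: the n*m probabilities p[i][j], flattened row-major.
# Alice's incentive constraints, for all i, i':
#     sum_j p[i][j] * (A[i'][j] - A[i][j]) <= 0
# and symmetrically for Bob.
rows = []
for i in range(n):
    for i2 in range(n):
        row = np.zeros((n, m))
        row[i, :] = A[i2, :] - A[i, :]
        rows.append(row.ravel())
for j in range(m):
    for j2 in range(m):
        row = np.zeros((n, m))
        row[:, j] = B[:, j2] - B[:, j]
        rows.append(row.ravel())

# Probabilities are non-negative (linprog's default bound) and sum to 1;
# minimizing -(A+B).p maximizes the social welfare.
res = linprog(c=-(A + B).ravel(),
              A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, n * m)), b_eq=[1.0])
p = res.x.reshape(n, m)
print(p)         # ~ [[0, 1/3], [1/3, 1/3]]
print(-res.fun)  # 16/3 ~ 5.333, the best correlated welfare
```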
The LP algorithm for correlated equilibria directly applies to games with any number of players, and its running time is polynomial in the size of the game (the description size of the utility functions of the players, which is proportional to the product of the players' strategy-space sizes). The situation is actually even better: we can often find correlated equilibria of exponential-size games in polynomial time. Specifically, using the Ellipsoid algorithm, there are cases where it is possible to find a correlated equilibrium in time that is polynomial in the length of the representation of the strategies of the game. For this to make sense there needs to be a succinct representation of the game, where given a representation of the players' strategies, their utilities can be easily determined.
How can an equilibrium of a game be practically reached? Is there a natural process that will lead to a state of equilibrium? A process where the players will “learn” what they should play? There have been many different attempts and models to define processes that converge to equilibria. A simple setting considers players repeatedly playing a game, where at each stage they act according to some given “learning” rule (which is intuitively “somewhat rational” but not really strategic) that depends on the history of the game as well as on their own utility function (but usually not on the other players’ utility functions — this is called uncoupled dynamics).
A simple example is repeated best reply: in every round you play your best reply to the strategies played by the other players in the previous round. If this process converges, then it converges to a pure Nash equilibrium. However, it may in general fail to converge even if a pure Nash equilibrium exists, and it certainly will not converge if the game lacks pure Nash equilibria. There are however interesting special cases where this process does converge, specifically potential games and games solvable by iterated elimination of dominated strategies (in the latter case even in a strong asynchronous sense). A natural attempt for converging to a mixed Nash equilibrium is called fictitious play: in every round you play your best response to the other players' mixed strategies defined according to the empirical historical distribution of their strategies. This process again is not guaranteed to converge and may loop indefinitely. In fact, there is no natural process that is known to converge to a Nash equilibrium that is not essentially equivalent to exhaustive search; this shouldn't come as a surprise, since such a process would likely imply an efficient algorithm, whose existence we doubt. There are however natural processes that do converge to correlated equilibria. These are based on regret minimization procedures. Below is a minimal description of these types of learning algorithms; for a comprehensive exposition as well as historical context see chapter 4 in the algorithmic game theory book, or this talk by Avrim Blum.
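The non-convergence of repeated best reply is easy to see in a game with no pure Nash equilibrium; matching pennies is my choice of example here (a sketch, not from the post):

```python
def best_reply(utilities):
    """Index of the action maximizing the given utilities."""
    return max(range(len(utilities)), key=lambda k: utilities[k])

# Matching pennies: Alice (rows) wants to match, Bob wants to mismatch.
# The game has no pure Nash equilibrium, so repeated best reply cannot
# converge; starting from (0, 0) it cycles through all four profiles.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
i, j = 0, 0
history = []
for _ in range(8):
    history.append((i, j))
    # both players simultaneously best-reply to last round's play
    i, j = (best_reply([A[k][j] for k in range(2)]),
            best_reply([B[i][k] for k in range(2)]))
print(history)  # [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0), ...]
```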
Here is the basic framework: we need to design a “learning algorithm” $\mathcal{A}$ that chooses one out of $N$ possible actions, and does this repeatedly for $T$ steps. For each possible action $i$ and time $t$ there is some utility $u^t_i$. At time $t$ the algorithm must choose an action $a_t$ before seeing the current utilities, based only on previous utilities. The goal is to maximize the total utility obtained, $\sum_{t=1}^{T} u^t_{a_t}$, where $a_t$ is the action $\mathcal{A}$ took at time $t$. The main challenge here is the total lack of information regarding the current utilities $u^t_i$, which need not have any relation to the utilities at previous times. I.e., the model allows each $u^t$ to be chosen by an adversary, and furthermore to depend arbitrarily on the actions chosen previously by our learning algorithm, i.e. on $a_1 \ldots a_{t-1}$. Since such a setting is hopeless in absolute terms, we will not attempt comparing the performance of our algorithm to the posterior optimum, but just to some family of alternative algorithms (a family that may be defined as possible variants of the original $\mathcal{A}$). What we need in this context is called low swap (or internal) regret. Let $F : \{1 \ldots N\} \to \{1 \ldots N\}$; we say that $\mathcal{B}$ is obtained from $\mathcal{A}$ by the swap $F$ if whenever $\mathcal{A}$ takes action $i$, then $\mathcal{B}$ takes action $F(i)$.
Definition: An algorithm $\mathcal{A}$ that runs for $T$ steps has swap regret $R$ if for every algorithm $\mathcal{B}$ that is obtained from $\mathcal{A}$ by some swap, we have that $\sum_{t=1}^{T} u^t_{a_t} \ge \sum_{t=1}^{T} u^t_{b_t} - R$, where $a_t$ and $b_t$ are the actions taken at time $t$ by $\mathcal{A}$ and $\mathcal{B}$ respectively.
It is not difficult to see that deterministic algorithms cannot have “low” swap regret, but it turns out that randomized algorithms can. A randomized algorithm may choose its action at time $t$ according to a probability distribution $p^t$ (where each action $i$ is chosen with probability $p^t_i$). There has been much work on designing learning algorithms with low regret, with the end result being efficient randomized algorithms that almost surely achieve swap regret $R = O(\sqrt{TN \log N})$, the important point being that $R = o(T)$. (For the weaker notion of external regret, the dependence on $N$ can be made logarithmic, but this is not known for internal regret.) This post will not go into how these algorithms are constructed, but just into why they are what we need.
Lemma: Assume that Alice and Bob both play for $T$ steps using algorithms with swap regret $R$; then the empirical distribution of the joint play (the uniform distribution over the $T$ played pairs $(i_t, j_t)$) is $\epsilon$-close to a correlated equilibrium, where $\epsilon = R/T$. I.e., let $p$ denote this distribution; then for every $i$ there exists an $\epsilon_i \ge 0$ such that for all $i'$ we have that $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j} - \epsilon_i$, with $\sum_i \epsilon_i \le \epsilon$ (and similarly for Bob).
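To illustrate the lemma, here is a sketch of one internal-regret-minimizing procedure, the regret-matching rule of Hart and Mas-Colell (my choice of procedure; the parameterization and names are mine), with both players running it on the chicken game. The empirical joint play approximately satisfies all the correlated-equilibrium inequalities:

```python
import random

def play(A, B, T=50000, mu=10.0, seed=1):
    """Both players follow the regret-matching rule: switch from the
    current action to an alternative with probability proportional to
    the positive average regret for not having used that alternative.
    mu must exceed the largest possible one-step regret (here <= 4)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    RA = [[0.0] * n for _ in range(n)]  # Alice's cumulative swap regrets
    RB = [[0.0] * m for _ in range(m)]  # Bob's cumulative swap regrets
    counts = {}
    i, j = 0, 0
    for t in range(1, T + 1):
        counts[(i, j)] = counts.get((i, j), 0) + 1
        for k in range(n):  # regret for having played i instead of k
            RA[i][k] += A[k][j] - A[i][j]
        for k in range(m):
            RB[j][k] += B[i][k] - B[i][j]
        i = step(rng, i, RA, t, mu)
        j = step(rng, j, RB, t, mu)
    return {a: c / T for a, c in counts.items()}  # empirical distribution

def step(rng, cur, R, t, mu):
    # switch to k with probability (positive average regret)/mu
    probs = [max(R[cur][k], 0.0) / (t * mu) for k in range(len(R))]
    probs[cur] = 1.0 - sum(probs)  # stay put with the leftover mass
    r, acc = rng.random(), 0.0
    for k, q in enumerate(probs):
        acc += q
        if r <= acc:
            return k
    return cur

# Learn in chicken, then measure the largest violation of the
# correlated-equilibrium inequalities in the empirical play.
A, B = [[0, 4], [1, 3]], [[0, 1], [4, 3]]
p = play(A, B)
eps = max(
    [sum(p.get((i, j), 0) * (A[i2][j] - A[i][j]) for j in range(2))
     for i in range(2) for i2 in range(2)] +
    [sum(p.get((i, j), 0) * (B[i][j2] - B[i][j]) for i in range(2))
     for j in range(2) for j2 in range(2)])
print(round(eps, 4))  # close to 0: an approximate correlated equilibrium
```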