What will happen in a given strategic situation? The dogma of game theory — which models strategic situations — is that some kind of equilibrium will be reached. Taking a wide enough definition of equilibrium, the dogma seems fine. Taking a narrower look and just considering the Nash equilibrium — either pure (if it exists) or mixed — may be more questionable. In 1974 Robert Aumann suggested a generalization of the Nash equilibrium termed correlated equilibrium. This type of equilibrium is appealing for several reasons, and this post presents them from a CS-ey perspective.
What is correlated equilibrium?
For ease of notation let us stick with two players, Alice and Bob (everything does extend to an arbitrary number of players). A game between Alice and Bob is given by a pair of matrices $A$ and $B$, where $A, B \in \mathbb{R}^{n \times m}$, i.e. Alice has $n$ strategies and Bob has $m$ strategies. $A_{ij}$ denotes Alice's utility when she plays $i$ and Bob plays $j$, and $B_{ij}$ denotes Bob's utility in this case. We will start with the simplest notions of equilibria and proceed to correlated equilibria. In all cases equilibrium means that each player is best-replying to the other, so it suffices to define what best-replying means in each notion.
Let us start by defining the pure Nash equilibrium: a pure strategy $i$ of Alice is a best response to a pure strategy $j$ of Bob if for all $i'$ we have that $A_{ij} \ge A_{i'j}$.
The next step allows players to play mixed strategies rather than pure ones, where a mixed strategy is simply a probability distribution over pure strategies. Denote by $\Delta_n$ the set of mixed strategies ($x \in \mathbb{R}^n$ with $x_i \ge 0$ for all $i$ and $\sum_i x_i = 1$); then a mixed strategy $x \in \Delta_n$ is a best reply to a mixed strategy $y \in \Delta_m$ if for every mixed strategy $x' \in \Delta_n$ we have that $\sum_{i,j} x_i y_j A_{ij} \ge \sum_{i,j} x'_i y_j A_{ij}$. (This turns out to be equivalent to every pure strategy in the support of $x$ being at least as good against $y$ as every other pure strategy.)
The mixed-Nash equilibrium allows each player to randomize his own actions. The basic idea of correlated equilibrium is to allow joint randomization between Alice and Bob. We imagine a third trusted party choosing a pair $(i, j)$ according to some distribution $p$ on pairs of strategies, and telling $i$ to Alice (only) and telling $j$ to Bob (only). When can we say that $i$ is indeed a best reply to the information that Alice has about Bob's action? Well, when Alice gets $i$, she may assume the conditional distribution over Bob's strategies: $\Pr[j \mid i] = p_{ij} / \sum_{j'} p_{ij'}$. Alice is best responding if $i$ is indeed the best response to this distribution, i.e. if for all $i'$ we have that $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j}$. (Dividing both sides by $\sum_{j'} p_{ij'}$ gives exactly the conditional probabilities.) The distribution $p$ is called a correlated equilibrium if every $i$ in its support is a best response for Alice, and every $j$ in its support a best response for Bob.
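These inequalities can be checked mechanically. Below is a small illustrative sketch (the function name and the prisoner's-dilemma payoffs are my choices, not from the post) that verifies all best-reply inequalities for a given joint distribution:

```python
def is_correlated_equilibrium(A, B, p, tol=1e-9):
    """Check every best-reply inequality for the joint distribution p:
    on recommendation i, Alice must not gain by deviating to any i',
    and symmetrically for Bob."""
    n, m = len(A), len(A[0])
    for i in range(n):
        for i2 in range(n):
            if sum(p[i][j] * (A[i2][j] - A[i][j]) for j in range(m)) > tol:
                return False
    for j in range(m):
        for j2 in range(m):
            if sum(p[i][j] * (B[i][j2] - B[i][j]) for i in range(n)) > tol:
                return False
    return True

# A point mass on a pure Nash equilibrium is the simplest example:
# in the prisoner's dilemma, all mass on (defect, defect) passes,
# while all mass on (cooperate, cooperate) does not.
A = [[3, 0], [5, 1]]  # Alice's utilities (action 1 = defect)
B = [[3, 5], [0, 1]]  # Bob's utilities
print(is_correlated_equilibrium(A, B, [[0, 0], [0, 1]]))  # True
print(is_correlated_equilibrium(A, B, [[1, 0], [0, 0]]))  # False
```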
Notice that a Nash equilibrium is a special case: the one where $p$ is a product distribution $p_{ij} = x_i y_j$. In this case the previous best-reply constraints become $x_i \sum_j y_j A_{ij} \ge x_i \sum_j y_j A_{i'j}$, which means that each pure strategy $i$ in the support ($x_i > 0$) is a best reply to $y$, exactly the mixed-Nash condition. In particular this implies the existence of a correlated equilibrium in every finite game.
So what are the good things about this notion of equilibrium relative to Nash equilibrium?
- It can be more efficient than Nash
- It can be implemented like Nash
- It can be efficiently computed unlike Nash
- It can be efficiently learned unlike Nash
It is well known that the Nash equilibrium need not give an efficient outcome. The following game of chicken serves as the standard example showing that a correlated equilibrium may be more efficient (rows are Alice's strategies, columns Bob's):
|         | Dare | Chicken |
|---------|------|---------|
| Dare    | 0, 0 | 4, 1    |
| Chicken | 1, 4 | 3, 3    |
This game has two pure equilibria, (C,D) and (D,C), as well as a single mixed Nash equilibrium in which Alice and Bob each choose Dare with probability exactly 1/2. If we are interested in the obtained sum of utilities (social welfare) as our goal, then each pure equilibrium gets social welfare of 5, while the mixed one gets an expected social welfare of 4. The optimal social welfare, 6, is obtained at a non-equilibrium point. Putting it into a price of anarchy/stability point of view, the price of stability (the ratio between the optimal welfare and that obtained in the best Nash equilibrium) is 6/5 = 1.2, while the price of anarchy (the ratio between the optimal welfare and that obtained in the worst Nash equilibrium, here the mixed one) is 6/4 = 1.5.
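These welfare numbers are easy to verify directly (a quick arithmetic check; the payoff encoding and variable names are mine):

```python
from fractions import Fraction as F

# Chicken payoffs: index 0 = Dare, 1 = Chicken (rows Alice, cols Bob).
A = [[0, 4], [1, 3]]
B = [[0, 1], [4, 3]]

# Each pure equilibrium, (C,D) or (D,C), gives welfare 1+4 = 4+1 = 5.
pure_welfare = A[1][0] + B[1][0]

# In the mixed equilibrium each player dares with probability 1/2,
# so every one of the four outcomes has probability 1/4.
mixed_welfare = sum(F(1, 4) * (A[i][j] + B[i][j])
                    for i in range(2) for j in range(2))

opt = max(A[i][j] + B[i][j] for i in range(2) for j in range(2))
print(pure_welfare, mixed_welfare)   # 5 4
print(F(opt, pure_welfare))          # price of stability 6/5
print(opt / mixed_welfare)           # price of anarchy 3/2
```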
Let us look at the following correlated distribution:
|         | Dare | Chicken |
|---------|------|---------|
| Dare    | 0    | 1/3     |
| Chicken | 1/3  | 1/3     |
Let us see why this is a correlated equilibrium. Assume that Alice “gets” dare. This happens with probability 1/3, and in this case she knows for sure that Bob got “chicken” — so clearly daring is a best reply. On the other hand, if Alice gets chicken — which happens with probability 2/3 — then her conditional distribution on Bob is 1/2-1/2. Under this distribution she is indifferent between dare and chicken, so chicken is still a best reply. The nice thing is that the expected sum of utilities in this correlated equilibrium is $(5 + 5 + 6)/3 = 16/3 \approx 5.33$ — better than that of any Nash equilibrium. This turns out to be the best correlated equilibrium in terms of social welfare, and the resulting price of correlated stability is thus $6/(16/3) = 9/8$. The set of correlated equilibria includes not only ones with higher welfare than any Nash equilibrium, but also ones with lower welfare. The worst correlated equilibrium turns out to give 1/3 weight to each of (dare,dare), (dare,chicken), and (chicken,dare), for a total expected social welfare of $10/3$, giving a correlated price of anarchy of $6/(10/3) = 9/5 = 1.8$. The efficiency gap may be arbitrarily large, as the following example shows [corrected on 21/10 thanks to Jochen Konemann who found a bug in the original version] (all missing entries are 0,0):
Inspection will reveal that no Nash equilibrium can have any of the C* strategies in its support, and thus every Nash equilibrium gives utility 0 to both players. In contrast, a uniform distribution on the 1,1 entries is a correlated equilibrium that gives utility 1 to each player. Thus while the regular price of stability is infinite, the correlated price of stability is 1. So, the optimists (like me) who prefer looking at the price of stability may view the notion of a correlated equilibrium as an opportunity to get better results. On the other hand, the pessimists who look at the price of anarchy may be concerned that it may bring worse results. To ease these fears, a very recent paper, “Intrinsic robustness of the price of anarchy” by Tim Roughgarden, shows that for a wide class of games, including the congestion games usually studied in the context of the price of anarchy, this does not happen: the correlated price of anarchy turns out to be no worse than the regular price of anarchy.
The most serious concern with the notion of correlated equilibrium is its implementation — how can the players obtain a joint correlated probability distribution? The definition of the notion assumes a fictional trusted third party termed a mediator. But how can the correlated equilibrium be reached in normal circumstances, without a mediator? One may devise various “correlation devices” to replace the fictional mediator. For example, in the chicken example above a hat and three balls will do the trick: two balls labeled “chicken” and one ball labeled “dare” are thrown into a hat (under the joint supervision of Alice and Bob) and Alice and Bob each take one ball from the hat (without looking). Notice that each of them draws “dare” with probability 1/3 — and they never do so simultaneously — exactly our correlated equilibrium!
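The hat device is easy to simulate (an illustrative sketch; the function name and sample size are mine):

```python
import random
from collections import Counter

def hat_draw(rng):
    """Two 'chicken' balls and one 'dare' ball go into the hat;
    Alice and Bob each draw one without looking."""
    balls = ["dare", "chicken", "chicken"]
    rng.shuffle(balls)
    return balls[0], balls[1]  # (Alice's ball, Bob's ball)

rng = random.Random(0)
N = 60000
counts = Counter(hat_draw(rng) for _ in range(N))
# (dare, dare) never occurs; each of the three other pairs gets ~1/3,
# exactly the correlated-equilibrium distribution from the table above.
for pair in sorted(counts):
    print(pair, round(counts[pair] / N, 3))
```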
Can such a correlation device be implemented in general? Can an equivalent device that works over computer communication networks be implemented? Both the game-theory community and the cryptography community worked on these questions using somewhat different notions and terminology. Game theorists considered adding a pre-round of cheap talk to the game where the parties may communicate with each other. From the computer science perspective this is just a special case of secure function evaluation: there are general cryptographic techniques that enable any number of parties to emulate any trusted mediator without really having such a mediator and without trusting each other. There are a variety of models and results here, but the basic ones in our setting are: (1) under standard cryptographic assumptions, any correlated equilibrium in any game with any number of players can be implemented securely in a computational sense; (2) any correlated equilibrium in any game with at least 3 (or 4, depending on the exact model) players can be implemented securely in an information-theoretic sense (but not in 2-party games). For more information see chapter 8 of the algorithmic game theory book.
The first nice thing about correlated equilibria is that they can be found efficiently. Just look at the inequalities that define them above: they are all linear in the $p_{ij}$'s — for every $i$ and $i'$: $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j}$, similar inequalities for Bob, and further linear inequalities specifying that $p$ is a probability distribution ($p_{ij} \ge 0$ and $\sum_{i,j} p_{ij} = 1$). Thus we can find a correlated equilibrium using linear programming, and even optimize an arbitrary linear function over the set of correlated equilibria, e.g. find the one with highest social welfare. This is in stark contrast to Nash equilibria, where finding an arbitrary one is PPAD-complete and optimizing social welfare over them is NP-hard.
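As a concrete sketch of this LP, here is one way to find the welfare-maximizing correlated equilibrium of the chicken game using `scipy.optimize.linprog` (the library choice and variable names are mine, not from the post):

```python
import numpy as np
from scipy.optimize import linprog

# Chicken payoffs: index 0 = Dare, 1 = Chicken (rows Alice, cols Bob).
A = np.array([[0, 4], [1, 3]])
B = np.array([[0, 1], [4, 3]])
n, m = A.shape

# Variables: the n*m probabilities p[i][j], flattened row-major.
# Alice's incentive constraints, for all i, i':
#     sum_j p[i][j] * (A[i'][j] - A[i][j]) <= 0
# and symmetrically for Bob.
rows = []
for i in range(n):
    for i2 in range(n):
        row = np.zeros((n, m))
        row[i, :] = A[i2, :] - A[i, :]
        rows.append(row.ravel())
for j in range(m):
    for j2 in range(m):
        row = np.zeros((n, m))
        row[:, j] = B[:, j2] - B[:, j]
        rows.append(row.ravel())

# Probabilities are non-negative (linprog's default bound) and sum to 1;
# minimizing -(A+B).p maximizes the social welfare.
res = linprog(c=-(A + B).ravel(),
              A_ub=np.array(rows), b_ub=np.zeros(len(rows)),
              A_eq=np.ones((1, n * m)), b_eq=[1.0])
p = res.x.reshape(n, m)
print(p)         # ~ [[0, 1/3], [1/3, 1/3]]
print(-res.fun)  # 16/3 ~ 5.333, the best correlated welfare
```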
The LP algorithm for correlated equilibria directly applies to games with any number of players, and its running time is polynomial in the size of the game (the description size of the utility functions of the players, which is proportional to the product of the players' strategy-space sizes). The situation is actually even better: we can often find correlated equilibria of exponential-size games in polynomial time. Specifically, using the Ellipsoid algorithm, there are cases where it is possible to find a correlated equilibrium in time that is polynomial in the length of the representation of the strategies of the game. For this to make sense there needs to be a succinct representation of the game, where given a representation of the players' strategies, their utilities can be easily determined.
How can an equilibrium of a game be practically reached? Is there a natural process that will lead to a state of equilibrium? A process where the players will “learn” what they should play? There have been many different attempts and models to define processes that converge to equilibria. A simple setting considers players repeatedly playing a game, where at each stage they act according to some given “learning” rule (which is intuitively “somewhat rational” but not really strategic) that depends on the history of the game as well as on their own utility function (but usually not on the other players’ utility functions — this is called uncoupled dynamics).
A simple example is repeated best reply: in every round you play your best reply to the strategies played by the other players in the previous round. If this process converges, then it converges to a pure Nash equilibrium. However, it may in general fail to converge even if a pure Nash equilibrium exists, and it certainly will not converge if the game lacks pure Nash equilibria. There are however interesting special cases where this process does converge, specifically potential games and games solvable by iterated elimination of dominated strategies (in the latter case even in a strong asynchronous sense). A natural attempt for converging to a mixed Nash equilibrium is called fictitious play: in every round you play your best response to the other players' mixed strategies defined according to the empirical historical distribution of their strategies. This process again is not guaranteed to converge and may loop indefinitely. In fact, there is no natural process that is known to converge to a Nash equilibrium that is not essentially equivalent to exhaustive search; this shouldn't come as a surprise, since such a process would likely imply an efficient algorithm, whose existence we doubt. There are however natural processes that do converge to correlated equilibria. These are based on regret minimization procedures. Below is a minimal description of these types of learning algorithms; for a comprehensive exposition as well as historical context see chapter 4 in the algorithmic game theory book, or this talk by Avrim Blum.
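The non-convergence of repeated best reply is easy to see in a game with no pure Nash equilibrium; matching pennies is my choice of example here (a sketch, not from the post):

```python
def best_reply(utilities):
    """Index of the action maximizing the given utilities."""
    return max(range(len(utilities)), key=lambda k: utilities[k])

# Matching pennies: Alice (rows) wants to match, Bob wants to mismatch.
# The game has no pure Nash equilibrium, so repeated best reply cannot
# converge; starting from (0, 0) it cycles through all four profiles.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
i, j = 0, 0
history = []
for _ in range(8):
    history.append((i, j))
    # both players simultaneously best-reply to last round's play
    i, j = (best_reply([A[k][j] for k in range(2)]),
            best_reply([B[i][k] for k in range(2)]))
print(history)  # [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0), ...]
```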
Here is the basic framework: we need to design a “learning algorithm” $\mathcal{A}$ that chooses one out of $N$ possible actions, and does this repeatedly for $T$ steps. For each possible action $i$ and time $t$ there is some utility $u^t_i$. At time $t$ the algorithm must choose an action $a_t$ before seeing the current utilities, based only on previous utilities. The goal is to maximize the total utility obtained, $\sum_{t=1}^{T} u^t_{a_t}$, where $a_t$ is the action $\mathcal{A}$ took at time $t$. The main challenge here is the total lack of information regarding the current utilities $u^t_i$, which need not have any relation to the utilities at previous times. I.e., the model allows each $u^t$ to be chosen by an adversary, and furthermore to depend arbitrarily on the actions chosen previously by our learning algorithm, i.e. on $a_1 \ldots a_{t-1}$. Since such a setting is hopeless in absolute terms, we will not attempt comparing the performance of our algorithm to the posterior optimum, but just to some family of alternative algorithms (a family that may be defined as possible variants of the original $\mathcal{A}$). What we need in this context is called low swap (or internal) regret. Let $F : \{1 \ldots N\} \to \{1 \ldots N\}$; we say that $\mathcal{B}$ is obtained from $\mathcal{A}$ by the swap $F$ if whenever $\mathcal{A}$ takes action $i$, then $\mathcal{B}$ takes action $F(i)$.
Definition: An algorithm $\mathcal{A}$ that runs for $T$ steps has swap regret $R$ if for every algorithm $\mathcal{B}$ that is obtained from $\mathcal{A}$ by some swap, we have that $\sum_{t=1}^{T} u^t_{a_t} \ge \sum_{t=1}^{T} u^t_{b_t} - R$, where $a_t$ and $b_t$ are the actions taken at time $t$ by $\mathcal{A}$ and $\mathcal{B}$ respectively.
It is not difficult to see that deterministic algorithms cannot have “low” swap regret, but it turns out that randomized algorithms can. A randomized algorithm may choose its action at time $t$ according to a probability distribution $p^t$ (where each action $i$ is chosen with probability $p^t_i$). There has been much work on designing learning algorithms with low regret, with the end result being efficient randomized algorithms that almost surely achieve swap regret $R = O(\sqrt{TN \log N})$, the important point being that $R = o(T)$. (For the weaker notion of external regret, the dependence on $N$ can be made logarithmic, but this is not known for internal regret.) This post will not go into how these algorithms are constructed, but just into why they are what we need.
Lemma: Assume that Alice and Bob both play for $T$ steps using algorithms with swap regret $R$; then the empirical distribution of the joint play (the uniform distribution over the $T$ played pairs $(i_t, j_t)$) is $\epsilon$-close to a correlated equilibrium, where $\epsilon = R/T$. I.e., let $p$ denote this distribution; then for every $i$ there exists an $\epsilon_i \ge 0$ such that for all $i'$ we have that $\sum_j p_{ij} A_{ij} \ge \sum_j p_{ij} A_{i'j} - \epsilon_i$, with $\sum_i \epsilon_i \le \epsilon$ (and similarly for Bob).
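To illustrate the lemma, here is a sketch of one internal-regret-minimizing procedure, the regret-matching rule of Hart and Mas-Colell (my choice of procedure; the parameterization and names are mine), with both players running it on the chicken game. The empirical joint play approximately satisfies all the correlated-equilibrium inequalities:

```python
import random

def play(A, B, T=50000, mu=10.0, seed=1):
    """Both players follow the regret-matching rule: switch from the
    current action to an alternative with probability proportional to
    the positive average regret for not having used that alternative.
    mu must exceed the largest possible one-step regret (here <= 4)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    RA = [[0.0] * n for _ in range(n)]  # Alice's cumulative swap regrets
    RB = [[0.0] * m for _ in range(m)]  # Bob's cumulative swap regrets
    counts = {}
    i, j = 0, 0
    for t in range(1, T + 1):
        counts[(i, j)] = counts.get((i, j), 0) + 1
        for k in range(n):  # regret for having played i instead of k
            RA[i][k] += A[k][j] - A[i][j]
        for k in range(m):
            RB[j][k] += B[i][k] - B[i][j]
        i = step(rng, i, RA, t, mu)
        j = step(rng, j, RB, t, mu)
    return {a: c / T for a, c in counts.items()}  # empirical distribution

def step(rng, cur, R, t, mu):
    # switch to k with probability (positive average regret)/mu
    probs = [max(R[cur][k], 0.0) / (t * mu) for k in range(len(R))]
    probs[cur] = 1.0 - sum(probs)  # stay put with the leftover mass
    r, acc = rng.random(), 0.0
    for k, q in enumerate(probs):
        acc += q
        if r <= acc:
            return k
    return cur

# Learn in chicken, then measure the largest violation of the
# correlated-equilibrium inequalities in the empirical play.
A, B = [[0, 4], [1, 3]], [[0, 1], [4, 3]]
p = play(A, B)
eps = max(
    [sum(p.get((i, j), 0) * (A[i2][j] - A[i][j]) for j in range(2))
     for i in range(2) for i2 in range(2)] +
    [sum(p.get((i, j), 0) * (B[i][j2] - B[i][j]) for i in range(2))
     for j in range(2) for j2 in range(2)])
print(round(eps, 4))  # close to 0: an approximate correlated equilibrium
```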