How do markets and other games reach equilibrium? Or at least, how can they do so? A nice answer will have each participant in the market or game repeatedly performing some simple update step that only depends on information available to him and that makes sense from his local point of view, with the whole system then magically converging to equilibrium. The meanings of "simple", "update", "information available", "makes sense", and "converging" may vary, but the idea is to capture the basic phenomenon of a global equilibrium arising from simple local optimizations.
In this post I will describe some recent work that shows this kind of phenomenon, where local "regret-minimization" optimizations lead to global convergence in a natural market model.
Regret and Online Convex Optimization
We start by looking at an online optimization strategy for a single decision maker who interacts with an adversarial environment. It will be nicest to use the general and clean model of online convex optimization due to Zinkevich. This model considers a sequence of functions $f_1, \ldots, f_T : K \rightarrow \mathbb{R}$ for some set $K$. The rules of the game require the algorithm at every time $t$, for $t = 1 \ldots T$, to pick a point $x_t \in K$ with the aim of maximizing $f_t(x_t)$, and to do so before seeing the function $f_t$, based only on the history, i.e. on the functions $f_1, \ldots, f_{t-1}$. Given that we are not assuming any connection between the different $f_t$'s, not even a stochastic one, this may seem hopeless, but our yardstick for success will be a bit modest as well: we only want to minimize our regret relative to the best constant point. I.e.
Definition: A sequence of points $x_1, \ldots, x_T$ has regret $R$ on the sequence of functions $f_1, \ldots, f_T$ if for every $x \in K$ we have that $\sum_{t=1}^{T} f_t(x_t) \ge \sum_{t=1}^{T} f_t(x) - R$.
The main point is that there exist online algorithms that always find such low-regret sequences when $K$ is any compact convex subset of a Euclidean space and each $f_t$ is concave (hill shaped). Zinkevich presents a simple online algorithm that ensures $R = O(\sqrt{T})$ for any sequence of such functions, and others can do even better. Zinkevich's learning algorithm is simple gradient ascent: you start with an arbitrary $x_1 \in K$ and then each $x_{t+1}$ is obtained by taking a small step of size $\eta$ from $x_t$ in the direction of steepest ascent of the function $f_t$ (and, if needed, projecting back into $K$). Intuitively, you prepare for the future by only slightly modifying what you did in the past, in the direction that would have been best then. Amazingly, by choosing $\eta = \Theta(1/\sqrt{T})$, one gets $R = O(\sqrt{T})$ for any sequence of concave functions (where the hidden constant in the big-O notation is quadratic in the diameter of $K$ and in the maximum steepness of the $f_t$'s). In particular, the average regret, $R/T = O(1/\sqrt{T})$, approaches 0. This model and these results generalize the well-studied case where the $f_t$'s are linear functions. In particular, a leading application to have in mind is where the convex set $K$ is the set of mixed strategies, i.e. distributions, over a finite set of $m$ pure strategies, and the utility is the linear function giving the expectation of the utility of the chosen pure strategy.
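To make the update rule concrete, here is a minimal sketch of this gradient-ascent algorithm in Python, assuming (as in the leading application just mentioned) that $K$ is the simplex of mixed strategies over $m$ pure strategies and that the algorithm can query the gradient of $f_t$ at the point it played. The names `project_to_simplex` and `OnlineGradientAscent` are illustrative, not from any particular library.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

class OnlineGradientAscent:
    def __init__(self, m, step_size):
        self.x = np.ones(m) / m      # arbitrary starting point x_1 in K
        self.eta = step_size         # eta ~ 1/sqrt(T) gives regret O(sqrt(T))

    def play(self):
        return self.x

    def update(self, gradient):
        # take a small step in the direction of steepest ascent of f_t,
        # then project back onto the simplex K
        self.x = project_to_simplex(self.x + self.eta * gradient)
```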
Coarse Equilibria in Games
So far we were in a decision-theoretic model of a single decision maker. Now, what happens when several different decision makers use such a regret-minimization procedure together? Formally, we have an $n$-player game where players have convex strategy sets $X_1, \ldots, X_n$ and utility functions $u_i : X_1 \times \cdots \times X_n \rightarrow \mathbb{R}$. The players play this game repeatedly: at each time $t$, each player $i$ chooses a strategy $x^t_i \in X_i$. Looking at this from player $i$'s point of view, at each time $t$ he is presented with a function of his own choice $x_i$, where the function is determined by the other players' choices at time $t$, i.e. for $x^t_{-i}$ he sees the function $f_t(x_i) = u_i(x_i, x^t_{-i})$. If each of the players uses a regret-minimization algorithm then together they will generate some sequence of strategy profiles $x^1, \ldots, x^T$, where each $x^t = (x^t_1, \ldots, x^t_n)$. Will these approach some kind of equilibrium from a game-theoretic perspective? I heard from Eva Tardos the following term for this:
Definition: Let $D$ be a distribution on the strategy profiles $X_1 \times \cdots \times X_n$. We say that $D$ is a coarse equilibrium (or Hannan consistent) if for every player $i$ and every $x^*_i \in X_i$ we have that $E_{x \sim D}[u_i(x)] \ge E_{x \sim D}[u_i(x^*_i, x_{-i})]$, where $E$ denotes expectation. We say that it is an $\epsilon$-coarse-equilibrium if $E_{x \sim D}[u_i(x)] \ge E_{x \sim D}[u_i(x^*_i, x_{-i})] - \epsilon$.
Intuitively, we can think of a trusted coordinator who picks a profile of strategies according to the distribution $D$ and "suggests" it to the players. $D$ is a coarse equilibrium if all players prefer to follow these suggestions rather than play any fixed strategy that may depend only on the distribution $D$ and not on the chosen recommendation. This is strictly weaker than a correlated equilibrium, in which players should prefer following the suggestions to choosing strategies whose identity may depend on their own suggested strategy $x_i$. This type of equilibrium is what the definition of low regret implies directly:
Proposition: If all players have regret $R$ (i.e. for each $i$ and each $x^*_i \in X_i$ we have that $\sum_{t=1}^{T} u_i(x^t) \ge \sum_{t=1}^{T} u_i(x^*_i, x^t_{-i}) - R$), then the uniform distribution on $x^1, \ldots, x^T$ is an $\epsilon$-coarse equilibrium for $\epsilon = R/T$.
Let us just look at what this implies for finite games in which players have a finite set of pure strategies. To apply the results from online concave optimization, the regret algorithms will be run over the set of mixed strategies, which is a compact convex set. In this case the utility for a profile of mixed strategies is the expected utility of the chosen pure strategy profile. Thus once $x_{-i}$ is fixed, the utility function on $x_i$ is linear and thus concave, so the regret-minimization algorithms apply here and yield an $\epsilon$-coarse equilibrium within time $T = O(1/\epsilon^2)$. By direct definition this is an $\epsilon$-coarse equilibrium of the mixed game, i.e. a distribution over (the $T$) profiles of mixed strategies. However, this distribution may also be viewed as a distribution over the underlying pure strategies, so we have also achieved an $\epsilon$-coarse equilibrium of the pure game. It follows that for every finite game, if each player runs a regret-minimization algorithm on the set of mixed strategies then we get a coarse equilibrium.
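As a sanity check on the proposition, here is a sketch of how one could measure, for a finite two-player game, the $\epsilon$ for which the empirical play is an $\epsilon$-coarse equilibrium: compare each player's average realized utility with the best fixed pure strategy against the opponent's average play (pure deviations suffice, since a mixed deviation's payoff is a convex combination of pure ones). The function name and interface are hypothetical.

```python
import numpy as np

def coarse_equilibrium_eps(xs, ys, A, B):
    """xs, ys: arrays of shape (T, m) and (T, m') with the mixed strategies played;
    A, B: payoff matrices of players 1 and 2. Returns the smallest epsilon for which
    the uniform distribution on the played profiles is an epsilon-coarse equilibrium."""
    T = xs.shape[0]
    realized_1 = np.einsum('ti,ij,tj->', xs, A, ys) / T   # average u_1(x^t, y^t)
    realized_2 = np.einsum('ti,ij,tj->', xs, B, ys) / T   # average u_2(x^t, y^t)
    best_fixed_1 = np.max(A @ ys.mean(axis=0))            # best pure row vs. average column play
    best_fixed_2 = np.max(xs.mean(axis=0) @ B)            # best pure column vs. average row play
    return max(best_fixed_1 - realized_1, best_fixed_2 - realized_2, 0.0)
```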
Socially Concave Games
Just the fact that we have called what is reached (in the sense of average over time) an equilibrium does not mean that we have to like it in a game-theoretic sense. Could we get a "normal" equilibrium from such a coarse equilibrium? Even-Dar, Mansour, and Nadav point out a family of games, termed socially concave games, where this is the case.
Definition: An $n$-player game is called socially concave if it satisfies the following two properties:
- For each $i$ and each fixed value of $x_i \in X_i$, the function $u_i(x_i, x_{-i})$ is convex in $x_{-i}$.
- The sum $\sum_i u_i(x)$ is a concave function of $x$.
Theorem: Let $D$ be a coarse equilibrium of a socially concave game with convex strategy sets, and denote the average value of elements drawn according to this distribution by $\bar{x} = E_{x \sim D}[x]$ (so by convexity of the $X_i$'s also $\bar{x} \in X_1 \times \cdots \times X_n$); then $\bar{x}$ is a (pure) Nash equilibrium. If $D$ is an $\epsilon$-coarse equilibrium then $\bar{x}$ is an $n\epsilon$-Nash equilibrium.
This theorem ensures a pure equilibrium in the game with convex strategy sets. In the case where the game is the mixed extension of a finite game, this translates to a mixed equilibrium of the finite game. Before we prove the theorem, let's look at a simple application: two-player zero-sum finite games. Looking at the mixed extension we have $u_1(x, y) = x^T A y = -u_2(x, y)$, where $x$ and $y$ are the mixed strategies of the two players and $A$ is the matrix of utilities of player 1. This game satisfies the two conditions since for any fixed $x$ we have that $x^T A y$ is a linear function (hence convex) of $y$, and dually for player 2, and the sum $u_1 + u_2 = 0$ is constant and thus concave. This implies that if the two players in a zero-sum game run a regret-minimization algorithm (on the mixed strategies) then the average of their play over time is a ($2\epsilon$-approximate) mixed Nash equilibrium.
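Here is a small self-contained sketch of this zero-sum application, using multiplicative weights (another standard regret-minimizing rule, swapped in here for brevity instead of gradient ascent) for both players; the game (rock-paper-scissors) and the step size are only illustrative.

```python
import numpy as np

A = np.array([[0.0, -1.0, 1.0],      # rock-paper-scissors payoff matrix of player 1
              [1.0, 0.0, -1.0],
              [-1.0, 1.0, 0.0]])
T = 5000
eta = np.sqrt(np.log(3) / T)          # standard step size for multiplicative weights
w1, w2 = np.ones(3), np.ones(3)       # weights over the 3 pure strategies
avg_x, avg_y = np.zeros(3), np.zeros(3)

for t in range(T):
    x, y = w1 / w1.sum(), w2 / w2.sum()
    avg_x += x / T
    avg_y += y / T
    w1 *= np.exp(eta * (A @ y))       # player 1 maximizes  x^T A y
    w2 *= np.exp(eta * (-A.T @ x))    # player 2 maximizes -x^T A y

print(avg_x, avg_y)   # the time averages approach the uniform distribution (1/3, 1/3, 1/3)
```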
Proof: We will prove the exact version; the quantitative one is identical, only taking care of the $\epsilon$'s everywhere. What we need to show is that for every $i$ and every $x^*_i \in X_i$, it holds that $u_i(\bar{x}) \ge u_i(x^*_i, \bar{x}_{-i})$. We will show this by showing the same inequality for the sum: $\sum_i u_i(\bar{x}) \ge \sum_i u_i(x^*_i, \bar{x}_{-i})$. Notice that for the optimal $x^*_i$ (the one maximizing $u_i(x^*_i, \bar{x}_{-i})$) we trivially have $u_i(x^*_i, \bar{x}_{-i}) \ge u_i(\bar{x})$, and thus the inequality for the sum at the optimal $x^*_i$'s implies the inequality (in fact equality) for each term separately. So we start with $\sum_i u_i(\bar{x})$ and use the concavity of the sum and the definition of $\bar{x}$ to conclude that $\sum_i u_i(\bar{x}) \ge E_{x \sim D}[\sum_i u_i(x)]$ (using Jensen's inequality). At this point we invoke the definition of coarse equilibrium for each $i$, getting $E_{x \sim D}[u_i(x)] \ge E_{x \sim D}[u_i(x^*_i, x_{-i})]$, and now, using the convexity of $u_i(x^*_i, x_{-i})$ in $x_{-i}$ (once $x^*_i$ is fixed), we get $E_{x \sim D}[u_i(x^*_i, x_{-i})] \ge u_i(x^*_i, \bar{x}_{-i})$ (Jensen again). Chaining these inequalities we get the required inequality on the sums.
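For reference, the chain of inequalities just described can be written in one line (the first inequality uses concavity of the sum and Jensen, the second uses the coarse-equilibrium condition for each player, the third uses convexity in $x_{-i}$ and Jensen again):

$$\sum_i u_i(\bar{x}) \;\ge\; E_{x \sim D}\Big[\sum_i u_i(x)\Big] \;\ge\; \sum_i E_{x \sim D}\big[u_i(x^*_i, x_{-i})\big] \;\ge\; \sum_i u_i(x^*_i, \bar{x}_{-i}).$$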
A Market Equilibrium
We can now take this general result and apply it to a rather general market model. This is due to Sergiu Hart and Andreu Mas-Colell. (When I asked Sergiu how I should refer to this result, he said something like "had we been computer scientists this would have already been published in a conference; since we are economists these are preliminary thoughts that maybe will find their way into a paper within a few years".)
Our market has $k$ divisible goods and $n$ players. Each player $i$ has a concave valuation function $v_i(x_{i1}, \ldots, x_{ik})$ that specifies his value for owning the bundle of goods with an $x_{ij}$ fraction of each good $j$. Each player $i$ has an initial endowment of $e_{ij}$ of each good $j$, and they may trade between themselves, reaching some final allocation, where each player $i$ receives a fraction $x_{ij}$ of each good $j$, where for each good $j$ we have that $\sum_i x_{ij} = \sum_i e_{ij}$. Significantly, we assume the existence of money that can be transferred between the players, formally "transferable utilities" or "quasi-linearity". In this model the notion of a market equilibrium is:
Definition: The allocation $(x_{ij})$ is at a (transferable utility) equilibrium with prices $p_1, \ldots, p_k$ if for all goods $j$ we have that $\sum_i x_{ij} = \sum_i e_{ij}$, and for every player $i$ his net utility, $v_i(x_{i1}, \ldots, x_{ik}) - \sum_j p_j (x_{ij} - e_{ij})$, is maximized (over all possible bundles $(x_{i1}, \ldots, x_{ik})$).
We now turn this market into a game. For this, we will require the minor technical assumption that the partial derivatives of the $v_i$'s are bounded from above by $c$ and from below by $1/c$ for some constant $c > 1$. The game will have $n + k$ players: $n$ players that correspond to the original players in the market, with the strategy set of player $i$ being the set of vectors $(x_{i1}, \ldots, x_{ik})$ with $0 \le x_{ij} \le 1$, and $k$ players corresponding to the goods, where the strategies of "good-player" $j$ are the possible prices $0 \le p_j \le 2c$. The utility of an "original player" $i$ is, as expected, $v_i(x_{i1}, \ldots, x_{ik}) - \sum_j p_j (x_{ij} - e_{ij})$, while that of "good-player" $j$ is $p_j \cdot \sum_i (x_{ij} - e_{ij})$, i.e. his price times the excess demand for his good. The point is that this new game captures, as its Nash equilibria, the equilibria of the market above.
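In code, the payoffs of the constructed game look as follows; this is a sketch that assumes the reconstruction above (demand vectors in $[0,1]^k$ for the original players, a price per good for the good-players), and the names are illustrative only.

```python
import numpy as np

def original_player_utility(i, x, p, e, v):
    """Value of player i's bundle minus the net payment sum_j p_j * (x_ij - e_ij).
    x, e: (n, k) arrays of demands and endowments; p: length-k price vector;
    v: list of concave valuation functions, one per player."""
    return v[i](x[i]) - np.dot(p, x[i] - e[i])

def good_player_utility(j, x, p, e):
    """Price of good j times the excess demand for good j."""
    return p[j] * (x[:, j] - e[:, j]).sum()
```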
Proposition: The allocation $(x_{ij})$ with prices $(p_j)$ is an equilibrium of the market if and only if it is a pure Nash equilibrium of the game defined above.
Proof: First notice that being a market equilibrium directly implies that all "original" players best-reply in the associated game. It also implies that $\sum_i x_{ij} = \sum_i e_{ij}$, and thus any price $p_j$ is a best reply for "good-player" $j$ in the game. The opposite direction is identical once we show that any pure Nash equilibrium of the game must have $\sum_i x_{ij} = \sum_i e_{ij}$ for every good $j$. Suppose that this was not the case, say $\sum_i x_{ij} > \sum_i e_{ij}$ for some good $j$; then the only best response for "good-player" $j$ would be to declare the maximum price $p_j = 2c$, to which all original players could only best-respond with $x_{ij} = 0$ due to the bound from above on the derivatives of the $v_i$'s, contradicting $\sum_i x_{ij} > \sum_i e_{ij}$. Similarly, if $\sum_i x_{ij} < \sum_i e_{ij}$ then the only best response for "good-player" $j$ would be $p_j = 0$, to which all original players could only best-respond with $x_{ij} = 1$ due to the bound from below on the derivatives of the $v_i$'s, contradicting $\sum_i x_{ij} < \sum_i e_{ij}$.
At this point it is easy to see that the game we just defined is socially concave: for an "original" player $i$ and every fixed $x_i$, his utility is just a linear function of the prices $p_j$ and thus convex in the strategies of the others; for a "good-player" $j$ and every fixed value of $p_j$, his utility is just a linear function of the $x_{ij}$'s, so again it is convex in the strategies of the others. Finally, summing up the utilities of all players, the $p_j (x_{ij} - e_{ij})$ terms cancel out and we are left with $\sum_i v_i(x_{i1}, \ldots, x_{ik})$, which is a concave function, being a sum of the concave $v_i$'s.
So we now close the loop: to effectively find an $\epsilon$-equilibrium of a given market, we let each player in the market, as well as the artificially created "good-players", run a regret-minimization algorithm for a while. This is a very simple decentralized process: the original players keep optimizing their demands against the sequence of ever-changing price vectors that they see, while the "good-players" keep optimizing their prices against the changing aggregate demands that they see; in both cases not by a simple best response but rather by the regret-minimization response. We then have that the long-term averages of demands and prices will be the desired market equilibrium point.
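To illustrate, here is a sketch of this decentralized process under purely illustrative assumptions: two players with concave valuations $v_i(x_i) = \sum_j a_{ij} \log(1 + x_{ij})$ (whose partial derivatives are bounded above and below, as required), half a unit of each good endowed to each player, and projected gradient ascent as the regret-minimization rule for both the original players and the "good-players". All names and parameters are made up for the example.

```python
import numpy as np

np.random.seed(0)
n, k, T = 2, 3, 20000                     # players, goods, rounds
eta = 1.0 / np.sqrt(T)                    # step size of the gradient-based regret minimizers
a = np.random.uniform(1.0, 2.0, (n, k))   # v_i(x_i) = sum_j a_ij * log(1 + x_ij)
e = np.full((n, k), 0.5)                  # endowments: half of each good per player

x = np.full((n, k), 0.5)                  # current demands of the original players
p = np.ones(k)                            # current prices of the good-players
avg_x, avg_p = np.zeros((n, k)), np.zeros(k)

for t in range(T):
    avg_x += x / T
    avg_p += p / T
    # original player i ascends  v_i(x_i) - sum_j p_j * (x_ij - e_ij)  in x_i
    grad_x = a / (1.0 + x) - p
    # good-player j ascends  p_j * sum_i (x_ij - e_ij)  in p_j (gradient = excess demand)
    grad_p = (x - e).sum(axis=0)
    x = np.clip(x + eta * grad_x, 0.0, 1.0)   # project back onto the demand box [0, 1]^k
    p = np.clip(p + eta * grad_p, 0.0, 4.0)   # project back onto the price range

print("average prices:", avg_p)
print("excess demand at the average allocation:", (avg_x - e).sum(axis=0))
```

The printed excess demands should come out close to zero and the average prices close to market-clearing levels (for these valuations, roughly $(a_{1j} + a_{2j})/3$ when the demands are interior), illustrating the convergence of the long-term averages to the market equilibrium.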