Feeds:
Posts

## Regret in Markets

How do markets and other games reach equilibrium?  Or at least, how can they do so? A nice answer will have each participant in the market or game repeatedly performing some simple update step that only depends on information available to him and that makes sense from his local point of view, with the whole system then magically converging to equilibrium.  The meanings of “simple”, “update”, “information available”,  “makes sense”, and “converging” may vary, but the idea is to capture the basic phenomena of a global equilibrium arising from simple local optimizations.

In this post I will describe some recent work that shows this kind of phenomena where local “regret-minimization” optimizations lead to global convergence in a natural market model.

## Regret and Online Convex Optimization

We start by looking at an online optimization strategy for a single decision maker that interacts with an adversarial environment.  It will be nicest to use the general and clean model of  online convex optimization due to Zinkevich.  This model considers an sequence of functions $u^t : S \rightarrow \Re$ for some set $S$. The rules of the game require the algorithm at every time $t$, for $t =1,2,...T$, to pick a point $x^t \in S$ with the aim of  maximizing $u^t(x^t)$, and do so before seeing the function $u^t$, but only based on the history, i.e. on the functions $u^1 ... u^{t-1}$.  Given that we are not assuming any connection between the different $u^t$‘s, even not a stochastic one, this may seem hopeless, but our yardstick for success will be a bit modest as well: we only want to minimize our regret relative to the best constant point.  I.e.

Definition: A sequence of points $x^1 ... x^T$ has regret $R$ on the sequence of functions $u^1 ... u^T$ if for every $x^* \in S$ we have that $\sum_{t=1}^T u^t(x^t) \ge \sum_{t=1}^T u^t(x^*) - R$.

The main point is that there exist online algorithms that always find such low-regret sequences when $S$ is any compact convex subspace of a Euclidean space and each $u^t$ is concave (hill shaped).  Zinkevich presents a simple online algorithm that ensure $R = O(\sqrt{T})$ for any sequence of such functions, and other ones can do even better.  Zinkevich’s learning algorithm is simple gradient ascent: You start with an arbitrary $x^1$ and then each $x^t$ is obtained by taking a small step of size $\delta_t$ from $x^{t-1}$ in the direction of steepest ascent of the function $u^{t-1}$ (and , if needed, projecting back to $S$).  Intuitively you prepare for the future by only slightly modifying what you did in the past, in the direction that would have been best then.  Amazingly, by choosing $\delta_t = 1/\sqrt{t}$, one gets $R = O(\sqrt{t})$ for any sequence of concave functions (where the hidden constant in the big-O notation is quadratic in the diameter of $S$ and in the maximum steepness of the $u^t$‘s).  In particular, the average loss, $R/T$ approaches 0.  This model and these results generalize the well-studied case where the $u^t$‘s are linear functions. In particular, a leading application to have in mind is where the convex set $S$ is the set of mixed strategies, i.e. distributions, over a finite set of $m$ pure strategies, and the utility is the linear function giving the expectation of the utility of the chosen pure strategy.

## Coarse Equilibria in Games

So far we were in a decision theoretic model of a single decision maker.  Now, what happens when several different decision makers are using such a regret-minimization procedure together?  Formally, we have an $n$-player  game where players have convex strategy sets $S_i$ and utility functions  $u_i : S_1 \times ... \times S_n \rightarrow \Re$.  The players play this game repeatedly, at each time $t$, each player $i$ chooses a strategy $x_i^t$.  Looking at this from player $i$‘s point of view, at each time $t$ he is presented with function of his $x_i$, where the function is determined by the values of the other players’ choices at time $t$, i.e. for $z \in S_i$ we see the function $u_i^t(z) =u_i(z, x_{-i}^t)$.  If each of the players uses a regret minimization algorithm then they will generate together some sequence of strategy profiles $x^1 ... x^T$, where each $x^t=(x_1^t ... x_n^t)$.  Will these approach some kind of equilibrium from a game theoretic perspective?  I heard from Eva Tardos the following term for this:

Definition: Let $D$ be a distribution on the strategy profiles $S_1 \times ... \times S_n$.  We say that $D$ is a coarse equilibrium (or Hannan consistent) if for every player $i$ and every $x^*_i \in S_i$ we have that $E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})]$, where $E$ denotes expectation.  We say that it is an $\epsilon$-coarse-equilibrium if $E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})] - \epsilon$.

Intuitively, we can think about a trusted coordinator who picks a profile of strategies $x_1 .. x_n$ according to the distribution $D$ and “suggests” it to the players.  $D$ is a coarse equilibrium if all players prefer to follow these suggestions, rather than play any fixed strategy that may only depend on the distribution $D$ and not on the chosen recommendation.  This is strictly weaker than correlated equilibrium in which players should prefer following suggestions to choosing strategies whose identity may depend on  their suggested strategy $x_i$.  This type of equilibrium is what the definition of low regret implies directly:

Proposition: If all players have regret $R$, (i.e. for each $i$ and each $x^*_i$, we have that $\sum_{t=1}^T u_i(x_i^t,x_{-i}^t) \ge \sum_{t=1}^T u_i(x^*_i,x_{-i}^t) - R$,) then the uniform distribution on $\{x^1 ... x^T\}$ is an $\epsilon$-coarse equilibrium for $\epsilon = R/T$.

Let us just look what this implies for finite games in which players have a finite set of pure strategies.  To apply the results form online concave optimization, the regret algorithms will be run over the set of mixed strategies, which is a compact convex set.  In this case the utility for a profile of mixed strategies is the expected utility of the chosen pure strategy profile. Thus once $x^t_{-i}$ is fixed, the utility function on $x_i$ is linear and thus concave so the regret minimization algorithms apply here and yield a $\epsilon = O(1/\sqrt{T})$-coarse equilibrium in time $T$.  By direct definition this is a $\epsilon$-coarse equilibrium of the mixed game, i.e a distribution over (the $T$) profiles of mixed strategies.  However, this distribution may also be viewed as a distribution on the underlying pure strategies so we have also achieved an $\epsilon$-coarse equilibrium of the pure game.   It follows that for every finite game, if each player runs a regret minimization algorithm on the set of mixed strategies then we get a coarse equilibrium.

## Socially Concave Games

Just the fact that we have called what is reached (in the sense of average over time) an equilibrium does not mean that we have to like it from a game theoretic sense.  Could we get a “normal” equilibrium from such corse equilibrium?  Even-Dar, Mansour, and Nadav point out a family of games, termed socially concave games, where this is the case.

Definition: An $n$ player game is called socially concave if it satisfies the following two properties:

• For each $i$ and each fixed value of $x_i$, the function $u_i(x_i, x_{-i})$ is convex in $x_{-i}$.
• The sum $\sum_{i=1}^n u_i(x)$ is a concave function of $x$.

Theorem: Let $D$ be a coarse equilibrium of a socially concave game with convex strategy sets, and denote the average value of of elements drawn according to this distribution by $\hat{x} = E_{x \sim D} [x]$ (so by convexity of the $S_i$‘s also $\hat{x}_1 \in S_1 ... \hat{x}_n \in S_n$) then $(\hat{x}_1 ... \hat{x}_n)$ is a (pure) Nash equilibrium.  If $D$ is $\epsilon$-coarse equilibrium then $(\hat{x}_1 ... \hat{x}_n)$ is an $n\epsilon$-Nash equilibrium.

This theorem ensures a pure equilibrium in the game with convex strategy sets.  Looking at the case that the game is the mixed extension of a finite game, this is translated to a mixed equilibrium of the finite game.  Before we prove the theorem let’s look at a simple application: two-player zero-sum finite games.  Looking at the mixed extension we have $u_1(y,z) = -u_2(y,z) = y^TAz$, where, $y$ and $z$ are the mixed strategies of the two players and $A$ is the $m_1 \times m_2$ matrix of utilities of player 1.  This game satsifies the two conditions since for any fixed $y$ we have that $u_1$ is a linear function (hence convex) of $z$, dualy for player 2, and the sum $u_1(y,z)+u_2(y_z)=0$ is constant and thus concave.  This implies that if the two players in a zero-sum game run a regret minimization algorithm (on the mixed strategies) then taking the average of the distribution over time is a ($\epsilon=O(1\sqrt{T})$-)mixed Nash equilibrium.

Proof: We will prove the exact version, the quantative is identical, only taking care of the $\epsilon$‘s everywhere.  What we need to show is that for every $x^*_i$, it holds that $u_i(\hat{x}_i,\hat{x}_{-i}) \ge u_i(x^*_i,\hat{x}_{-i})$.  We will show this by showing the same inequality for the sum: $\sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge \sum_i u_i(x^*_i,\hat{x}_{-i})$.  Notice that for the optimal $x^*_i$ (the one maximizing $u_i(x^*_i,\hat{x}_{-i}$) we trivially have $u_i(\hat{x}_i,\hat{x}_{-i}) \le u_i(x^*_i,\hat{x}_{-i})$, and thus the inequality of the sum at the optimal $x^*_i$‘s implies the inequality (in fact equality) for each term seperately.  So we start with $\sum_i u_i(\hat{x}_i,\hat{x}_{-i})$ and make use of the concavity of the sum and the definition of $\hat{x} = E[ x]$  to conclude $\sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge E[\sum_i u_i(x_i, x_{-i})]=\sum_i E[u_i(x_i, x_{-i})]$ (using Jensen’s inequality.)  At this point we invoke the definition of coarse equilibrium for each $i$ getting $E[u_i(x_i, x_{-i})] \ge E[u_i(x^*_i,x_{-i})]$, and now using the convexity of $u_i$ in $x_{-i}$ (once $x^*_i$ is fixed) we get $E[u_i(x^*_i,x_{-i})] \ge u_i(x^*_i,\hat{x}_{-i})$, chaining these inequalities we get the required inequality on the sums.

## A Market Equilibrium

We can now take this general result and apply it to a rather general market model.  This  is due to Sergiu Hart and Andreu Mas-Colell.  (When I asked Sergiu how should I refer to this result, he said something like “had we been computer scientists this would have already been published in a conference; since we are economists these are preliminary thoughts that maybe will find their way into a paper within a few years”.)

Our market has $m$ divisible goods and $n$ players.  Each player $i$ has  a concave valuation function $v_i(x_1 ... x_m)$ that specifies his value for owning the bundle of goods with $x_j$ fraction of each good $j$.  Each player $i$ has an initial endowment of $w_{ij}$ of each good $j$, and they may trade between themselves, reaching some final allocation, where each player $i$ receives a fraction $x_{ij}$ of each good $j$, where for each good $j$ we have that $\sum_{i} w_{ij} = \sum_{i} x_{ij}$.  Significantly, we assume the existence of money that can be transferred between the players, formally “transferable utilities” or “quasi-linearity”.   In this model the notion of a market equilibrium is:

Definition: The allocation $(x_{ij})$  is at (transferable utility) equilibrium with prices $(p_1 ... p_m)$ if for all goods $j$ we have that $\sum_{i} w_{ij} = \sum_{i} x_{ij}$ and for every player $i$, his net utility, $v_i(x_{i1}...x_{im})-\sum_{j} (x_{ij}-w_{ij})p_j$ is maximized (over all possible bundles $(x_{i1}...x_{im})$).

We now turn this market into a game.  For this, we will require the minor technical assumption that the partial derivatives of the $u_i$‘s are bounded  from above by $M$ and from below by $1/M$ for some constant $M$.  The game will have $n+m$ players: $n$ players that correspond to the original players in the market, with the strategy set of player $i$  being the set of  vectors $(x_{i1}...x_{im})$ with $0 \le x_{ij} \le \sum_i w_{ij}$, and $m$ players corresponding to the goods, where the strategies of “good-player” $j$ are the possible prices $0 \le p_j \le M$.  The utility of an “original-player” $i$ is, as expected,  $u_i(x,p)=v_i(x_{i1}...x_{im})- \sum_{j} p_j(x_{ij} - w_{ij})$, while those of  “good-player” $j$ is: $u_j(x,p) = p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij})$.  The point is that this new game captures, as its Nash equilibrium, the equilibrium of the previous market.

Proposition: The allocation $(x_{ij})$ with prices $(p_1 ... p_m)$ is an equilibrium of the market if and only if  it is a pure Nash equilibrium of the game defined above.

Proof: First notice that being a market equilibrium directly implies that all “original” players best-reply in the associated game.  It also implies that  $\sum_{i} w_{ij} = \sum_{i} x_{ij}$ and thus any $p_j$ is a best-reply for “good-player” $j$ in the game.  The opposite direction is identical once we show that any pure Nash equilibrium of the game must have $\sum_{i} w_{ij} = \sum_{i} x_{ij}$.  Suppose that this was not the case, say $\sum_{i} x_{ij} > \sum_{i} w_{ij}$ for some good $j$, then the only best response for “good-player” $j$ would be to declare $p_j=M$, to which all other players could only best-respond with $x_{ij}=0$ due to the bound from above on the derivative of the $u_i$‘s, contradicting  $\sum_{i} x_{ij} > \sum_{i} w_{ij}$.  Similarly if $\sum_{i} x_{ij} < \sum_{i} w_{ij}$ then the only best response for the “good-player” $j$ would be $p_j=0$, to which all other players could only best-respond with $x_{ij}=1$ due to the bound from below on the derivative the the $u_i$‘s, contradicting  $\sum_{i} x_{ij} < \sum_{i} w_{ij}$.

At this point it is easy to see that the game we just defined is socially concave: for an “original” player $i$, and every fixed $(x_{i1}...x_{im})$, his utility is just a linear function of the $p_j$‘s and thus convex in the strategies of the others; for a “good-player” $j$ and every fixed value of $p_j$, his utility is just a linear function in the $x_{ij}$‘s so again is convex in the strategies of the others.  Finally, summing up the utilities of all players, the $p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij})$ terms cancel out and we are left with $\sum_i v_i(x_{i1}...x_{im})$ which is a concave function being the sum of concave $v_i$‘s.

So we now close the loop: to effectively find an $\epsilon$-equilibrium of a given market, we let each player in the market as well as artificially created  “good-players” run a regret-minimization algorithm for a while.  This is a very simple decentralized process: the original players attempt optimizing their demands to the sequence of ever-changing price vectors that they see, while the “good-players” optimize their prices to the changing aggregated demands that they see.  In both cases, not by a simple best-response but rather by the regret minimization response.  We then have that the long-term averages of demands and prices will be the desired market equilibrium point.