Feeds:
Posts
Comments

Posts Tagged ‘technical’

How do markets and other games reach equilibrium?  Or at least, how can they do so? A nice answer will have each participant in the market or game repeatedly performing some simple update step that only depends on information available to him and that makes sense from his local point of view, with the whole system then magically converging to equilibrium.  The meanings of “simple”, “update”, “information available”,  “makes sense”, and “converging” may vary, but the idea is to capture the basic phenomena of a global equilibrium arising from simple local optimizations.

In this post I will describe some recent work that shows this kind of phenomena where local “regret-minimization” optimizations lead to global convergence in a natural market model.

Regret and Online Convex Optimization

We start by looking at an online optimization strategy for a single decision maker that interacts with an adversarial environment.  It will be nicest to use the general and clean model of  online convex optimization due to Zinkevich.  This model considers an sequence of functions u^t : S \rightarrow \Re for some set S. The rules of the game require the algorithm at every time t, for t =1,2,...T, to pick a point x^t \in S with the aim of  maximizing u^t(x^t), and do so before seeing the function u^t, but only based on the history, i.e. on the functions u^1 ... u^{t-1}.  Given that we are not assuming any connection between the different u^t‘s, even not a stochastic one, this may seem hopeless, but our yardstick for success will be a bit modest as well: we only want to minimize our regret relative to the best constant point.  I.e.

Definition: A sequence of points x^1 ... x^T has regret R on the sequence of functions u^1 ... u^T if for every x^* \in S we have that \sum_{t=1}^T u^t(x^t) \ge \sum_{t=1}^T u^t(x^*) - R.

The main point is that there exist online algorithms that always find such low-regret sequences when S is any compact convex subspace of a Euclidean space and each u^t is concave (hill shaped).  Zinkevich presents a simple online algorithm that ensure R = O(\sqrt{T}) for any sequence of such functions, and other ones can do even better.  Zinkevich’s learning algorithm is simple gradient ascent: You start with an arbitrary x^1 and then each x^t is obtained by taking a small step of size \delta_t from x^{t-1} in the direction of steepest ascent of the function u^{t-1} (and , if needed, projecting back to S).  Intuitively you prepare for the future by only slightly modifying what you did in the past, in the direction that would have been best then.  Amazingly, by choosing \delta_t = 1/\sqrt{t}, one gets R = O(\sqrt{t}) for any sequence of concave functions (where the hidden constant in the big-O notation is quadratic in the diameter of S and in the maximum steepness of the u^t‘s).  In particular, the average loss, R/T approaches 0.  This model and these results generalize the well-studied case where the u^t‘s are linear functions. In particular, a leading application to have in mind is where the convex set S is the set of mixed strategies, i.e. distributions, over a finite set of m pure strategies, and the utility is the linear function giving the expectation of the utility of the chosen pure strategy.

Coarse Equilibria in Games

So far we were in a decision theoretic model of a single decision maker.  Now, what happens when several different decision makers are using such a regret-minimization procedure together?  Formally, we have an n-player  game where players have convex strategy sets S_i and utility functions  u_i : S_1 \times ... \times S_n \rightarrow \Re.  The players play this game repeatedly, at each time t, each player i chooses a strategy x_i^t.  Looking at this from player i‘s point of view, at each time t he is presented with function of his x_i, where the function is determined by the values of the other players’ choices at time t, i.e. for z \in S_i we see the function u_i^t(z) =u_i(z, x_{-i}^t).  If each of the players uses a regret minimization algorithm then they will generate together some sequence of strategy profiles x^1 ... x^T, where each x^t=(x_1^t ... x_n^t).  Will these approach some kind of equilibrium from a game theoretic perspective?  I heard from Eva Tardos the following term for this:

Definition: Let D be a distribution on the strategy profiles S_1 \times ... \times S_n.  We say that D is a coarse equilibrium (or Hannan consistent) if for every player i and every x^*_i \in S_i we have that E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})], where E denotes expectation.  We say that it is an \epsilon-coarse-equilibrium if E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})] - \epsilon.

Intuitively, we can think about a trusted coordinator who picks a profile of strategies x_1 .. x_n according to the distribution D and “suggests” it to the players.  D is a coarse equilibrium if all players prefer to follow these suggestions, rather than play any fixed strategy that may only depend on the distribution D and not on the chosen recommendation.  This is strictly weaker than correlated equilibrium in which players should prefer following suggestions to choosing strategies whose identity may depend on  their suggested strategy x_i.  This type of equilibrium is what the definition of low regret implies directly:

Proposition: If all players have regret R, (i.e. for each i and each x^*_i, we have that \sum_{t=1}^T u_i(x_i^t,x_{-i}^t) \ge \sum_{t=1}^T u_i(x^*_i,x_{-i}^t) - R,) then the uniform distribution on \{x^1 ... x^T\} is an \epsilon-coarse equilibrium for \epsilon = R/T.

Let us just look what this implies for finite games in which players have a finite set of pure strategies.  To apply the results form online concave optimization, the regret algorithms will be run over the set of mixed strategies, which is a compact convex set.  In this case the utility for a profile of mixed strategies is the expected utility of the chosen pure strategy profile. Thus once x^t_{-i} is fixed, the utility function on x_i is linear and thus concave so the regret minimization algorithms apply here and yield a \epsilon = O(1/\sqrt{T})-coarse equilibrium in time T.  By direct definition this is a \epsilon-coarse equilibrium of the mixed game, i.e a distribution over (the T) profiles of mixed strategies.  However, this distribution may also be viewed as a distribution on the underlying pure strategies so we have also achieved an \epsilon-coarse equilibrium of the pure game.   It follows that for every finite game, if each player runs a regret minimization algorithm on the set of mixed strategies then we get a coarse equilibrium.

Socially Concave Games

Just the fact that we have called what is reached (in the sense of average over time) an equilibrium does not mean that we have to like it from a game theoretic sense.  Could we get a “normal” equilibrium from such corse equilibrium?  Even-Dar, Mansour, and Nadav point out a family of games, termed socially concave games, where this is the case.

Definition: An n player game is called socially concave if it satisfies the following two properties:

  • For each i and each fixed value of x_i, the function u_i(x_i, x_{-i}) is convex in x_{-i}.
  • The sum \sum_{i=1}^n u_i(x) is a concave function of x.

Theorem: Let D be a coarse equilibrium of a socially concave game with convex strategy sets, and denote the average value of of elements drawn according to this distribution by \hat{x} = E_{x \sim D} [x] (so by convexity of the S_i‘s also \hat{x}_1 \in S_1 ... \hat{x}_n \in S_n) then (\hat{x}_1 ... \hat{x}_n) is a (pure) Nash equilibrium.  If D is \epsilon-coarse equilibrium then (\hat{x}_1 ... \hat{x}_n) is an n\epsilon-Nash equilibrium.

This theorem ensures a pure equilibrium in the game with convex strategy sets.  Looking at the case that the game is the mixed extension of a finite game, this is translated to a mixed equilibrium of the finite game.  Before we prove the theorem let’s look at a simple application: two-player zero-sum finite games.  Looking at the mixed extension we have u_1(y,z) = -u_2(y,z) = y^TAz, where, y and z are the mixed strategies of the two players and A is the m_1 \times m_2 matrix of utilities of player 1.  This game satsifies the two conditions since for any fixed y we have that u_1 is a linear function (hence convex) of z, dualy for player 2, and the sum u_1(y,z)+u_2(y_z)=0 is constant and thus concave.  This implies that if the two players in a zero-sum game run a regret minimization algorithm (on the mixed strategies) then taking the average of the distribution over time is a (\epsilon=O(1\sqrt{T})-)mixed Nash equilibrium.

Proof: We will prove the exact version, the quantative is identical, only taking care of the \epsilon‘s everywhere.  What we need to show is that for every x^*_i, it holds that u_i(\hat{x}_i,\hat{x}_{-i}) \ge u_i(x^*_i,\hat{x}_{-i}).  We will show this by showing the same inequality for the sum: \sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge \sum_i u_i(x^*_i,\hat{x}_{-i}).  Notice that for the optimal x^*_i (the one maximizing u_i(x^*_i,\hat{x}_{-i}) we trivially have u_i(\hat{x}_i,\hat{x}_{-i}) \le u_i(x^*_i,\hat{x}_{-i}), and thus the inequality of the sum at the optimal x^*_i‘s implies the inequality (in fact equality) for each term seperately.  So we start with \sum_i u_i(\hat{x}_i,\hat{x}_{-i}) and make use of the concavity of the sum and the definition of \hat{x} = E[ x]  to conclude \sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge E[\sum_i u_i(x_i, x_{-i})]=\sum_i E[u_i(x_i, x_{-i})] (using Jensen’s inequality.)  At this point we invoke the definition of coarse equilibrium for each i getting E[u_i(x_i, x_{-i})] \ge E[u_i(x^*_i,x_{-i})], and now using the convexity of u_i in x_{-i} (once x^*_i is fixed) we get E[u_i(x^*_i,x_{-i})] \ge u_i(x^*_i,\hat{x}_{-i}), chaining these inequalities we get the required inequality on the sums.

A Market Equilibrium

We can now take this general result and apply it to a rather general market model.  This  is due to Sergiu Hart and Andreu Mas-Colell.  (When I asked Sergiu how should I refer to this result, he said something like “had we been computer scientists this would have already been published in a conference; since we are economists these are preliminary thoughts that maybe will find their way into a paper within a few years”.)

Our market has m divisible goods and n players.  Each player i has  a concave valuation function v_i(x_1 ... x_m) that specifies his value for owning the bundle of goods with x_j fraction of each good j.  Each player i has an initial endowment of w_{ij} of each good j, and they may trade between themselves, reaching some final allocation, where each player i receives a fraction x_{ij} of each good j, where for each good j we have that \sum_{i} w_{ij} = \sum_{i} x_{ij}.  Significantly, we assume the existence of money that can be transferred between the players, formally “transferable utilities” or “quasi-linearity”.   In this model the notion of a market equilibrium is:

Definition: The allocation (x_{ij})  is at (transferable utility) equilibrium with prices (p_1 ... p_m) if for all goods j we have that \sum_{i} w_{ij} = \sum_{i} x_{ij} and for every player i, his net utility, v_i(x_{i1}...x_{im})-\sum_{j} (x_{ij}-w_{ij})p_j is maximized (over all possible bundles (x_{i1}...x_{im})).

We now turn this market into a game.  For this, we will require the minor technical assumption that the partial derivatives of the u_i‘s are bounded  from above by M and from below by 1/M for some constant M.  The game will have n+m players: n players that correspond to the original players in the market, with the strategy set of player i  being the set of  vectors (x_{i1}...x_{im}) with 0 \le x_{ij} \le \sum_i w_{ij}, and m players corresponding to the goods, where the strategies of “good-player” j are the possible prices 0 \le p_j \le M.  The utility of an “original-player” i is, as expected,  u_i(x,p)=v_i(x_{i1}...x_{im})- \sum_{j} p_j(x_{ij} - w_{ij}), while those of  “good-player” j is: u_j(x,p) = p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij}).  The point is that this new game captures, as its Nash equilibrium, the equilibrium of the previous market.

Proposition: The allocation (x_{ij}) with prices (p_1 ... p_m) is an equilibrium of the market if and only if  it is a pure Nash equilibrium of the game defined above.

Proof: First notice that being a market equilibrium directly implies that all “original” players best-reply in the associated game.  It also implies that  \sum_{i} w_{ij} = \sum_{i} x_{ij} and thus any p_j is a best-reply for “good-player” j in the game.  The opposite direction is identical once we show that any pure Nash equilibrium of the game must have \sum_{i} w_{ij} = \sum_{i} x_{ij}.  Suppose that this was not the case, say \sum_{i} x_{ij} > \sum_{i} w_{ij} for some good j, then the only best response for “good-player” j would be to declare p_j=M, to which all other players could only best-respond with x_{ij}=0 due to the bound from above on the derivative of the u_i‘s, contradicting  \sum_{i} x_{ij} > \sum_{i} w_{ij}.  Similarly if \sum_{i} x_{ij} < \sum_{i} w_{ij} then the only best response for the “good-player” j would be p_j=0, to which all other players could only best-respond with x_{ij}=1 due to the bound from below on the derivative the the u_i‘s, contradicting  \sum_{i} x_{ij} < \sum_{i} w_{ij}.

At this point it is easy to see that the game we just defined is socially concave: for an “original” player i, and every fixed (x_{i1}...x_{im}), his utility is just a linear function of the p_j‘s and thus convex in the strategies of the others; for a “good-player” j and every fixed value of p_j, his utility is just a linear function in the x_{ij}‘s so again is convex in the strategies of the others.  Finally, summing up the utilities of all players, the p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij}) terms cancel out and we are left with \sum_i v_i(x_{i1}...x_{im}) which is a concave function being the sum of concave v_i‘s.

So we now close the loop: to effectively find an \epsilon-equilibrium of a given market, we let each player in the market as well as artificially created  “good-players” run a regret-minimization algorithm for a while.  This is a very simple decentralized process: the original players attempt optimizing their demands to the sequence of ever-changing price vectors that they see, while the “good-players” optimize their prices to the changing aggregated demands that they see.  In both cases, not by a simple best-response but rather by the regret minimization response.  We then have that the long-term averages of demands and prices will be the desired market equilibrium point.

Advertisements

Read Full Post »

Perhaps the first complexity question regarding equilibria that a Computer Scientist can ask is how difficult — computationally — is it to find a pure Nash equilibrium of a given game, or report that none exists.

In the simple setting, the n-player game is given by it’s n utility functions, each of them given as a table of size m^n of numbers (where we assume for simplicity that each player has a strategy space of size m.)  By going over all possible m^n possible strategy profiles and checking for each of them whether some player can gain by deviating we get an efficient algorithm.  (Easy exercise: The running time of this algorithm is O(n^2 m^n) — improve it to linear in the input size, i.e to O(nm^n).)

This, however, is not the end of the story.  In many cases, we really have a “very large” game, and we would like to find an equilibrium in time which is significantly sub-linear in the size of the game’s utility tables.   How can we even talk about getting sub-linear time algorithms?  We need to read the input — don’t we?  There are two standard ways in Computer Science to talk about this question.  The concrete complexity approach assumes that the utility tables are given by some of “black box” which can answer queries of a certain type (e.g. given a strategy profile s, can return the value of u_i(s) as a single atomic operation).  The set of queries that the black box supports will of course determine what can be done efficiently in this model.  The Turing-Machine complexity approach will consider concise representations of the utility functions, whose length is significantly less than the full size of the tables, and expect the algorithm to find an answer in time that is polynomial in the size of the concise input.  This of course gives different algorithmic problems for different concise representations.

In the rest of this post I will present the basic study of the complexity of finding a pure Nash equilibrium in these models (for general games as well as some special cases.)   These basics are often overlooked as they are shadowed by the meaty PLS-completeness results and PPAD-completeness results, but I prefer  talking about the basics even though these contains no “significant” proofs.

Black Box Complexity

In the variant here we assume that we get n “black boxes”, one for each player, that can answer utility queries: given a strategy profile s return u_i(s).  Another variant we can consider would also allow asking best-reply queries: given a strategy profile of the others s_{-i}, return the best-reply s_i of player i.  (Where we would need to specify how ties are handled.)  We will concentrate on cases where m is not too large and we allow the running time to be polynomial in m, even though in some cases we could think about running in time which is logarithmic in m — the length needed to represent a strategy of a player. In these situations, as a best-reply query can easily be simulated by m utility queries to all possible values of s_i, adding a best-reply query doesn’t significantly change the power of the model.  (One can imagine even more queries to ask from the black-box, and the ultimate in this direction would be t allow asking any question about u_i, but only about a single u_i for each query.  This model turns out to be equivalent to the communication complexity model.)

We will consider not only general games, but also two sub-classes of games that are known to always have a pure-Nash equilibrium: Dominance-solvable games (those where iterated elimination of strictly dominated strategies leaves a single strategy profile), and Potential Games.  For motivation it is best to start with a “good” algorithm, that finds a pure Nash equilibrium in all Dominance-solvable games:

  1. Let s be an arbitrary initial strategy profile
  2. Repeat mn times
    1. For all players i = 1 ... n do
      1. s_{i} = best reply of i to s_{-i}.

The total running time of this algorithm is O(m^2 n^2) operations (which include utility queries to the black box, where each best-reply calculation is implemented via m utility queries.)  The interesting fact is that this algorithm always ends with a pure Nash equilibrium — as long as the game is dominance-solvable.  (Idea of easy proof: a strategy that was eliminated in the t‘th step of the strategy elimination process can never be used after t iterations of the outer loop of this algorithm.)  What is remarkable about this running time is that it is exponentially smaller than the “input size”, as each black box contains m^n numbers in it.

Can such an efficient algorithm — whose running time is polynomial in n and m — be found for other games?  For potential games?  Maybe even this algorithm — just repeated best-replying — will do the trick?  It is not difficult to see for general games repeated-best replying may loop infinitely even if some pure Nash equilibrium exists.  For potential games, almost by definition, every sequence of best-replies will indeed terminate with a pure Nash equilibrium.  (In fact, ordinal potential games are exactly those games where every sequence of better-replies terminates.)  But how long can this take?  Unfortunately, exponential time.  (Exercise: find an exact potential game with n players each having two strategies, where starting from some strategy profile s, every sequence of best-replies requires exponential time to reach a Nash equilibrium.)

So, can there be another efficient algorithm for general games or for potential games?  A simple observation is that in a potential game where the potential function has polynomial-sized range, any best-reply sequence will terminate withing this polynomial number of steps.  Can this be generalized even further?  One can formally prove a negative answer — a lower bound.  One of the simplest formalisms for this is using an “adversary argument”: instead of first choosing a game and then letting the algorithm access the utility functions via the black boxes, the adversary will make-up the game as he goes along.  Each time the hypothetical algorithm makes a query to the black-box, the adversary will answer it in some way that he thinks will be least helpful to the algorithm — making sure that for as long as possible the algorithm does not have sufficient information as to provide an answer.  The idea about this technique is that if the adversary can pull this off — i.e. keep answering in a consistent way such that a single answer is still not guaranteed — then there must be some fixed game for which the hypothetical algorithm was not able to produce a correct answer in time — exactly the game that is still consistent with the adversary’s answers.

For the case of general games it is easy to provide such an adversary.  Let the adversary fix in his mind some game without any pure Nash equilibrium. (For concreteness, say, 2 players playing matching pennies, and another n-2 players not caring at all, but still having two strategies each.  This gives a game with 2^n strategy profiles, whose structure is just many multiple copies of the 2×2 game of matching pennies.)  Whenever the algorithm makes a query, the adversary will just answer with the value of the fixed game.  Clearly the algorithm can never find a pure Nash equilibrium since none exists.  The point is that the algorithm cannot decide that no Nash equilibrium exists before making at least 2^n queries.  If even a single strategy profile was not queried, then it could be that at that strategy profile all players get high utility (where “high” is anything higher than any value in the original game) and thus it is a Nash equilibrium.

For the case of potential games the adversary must be somewhat more clever.  We will consider an n player game where each player has two strategies and thus the space of strategy profiles is the n-dimensional Boolean hypercube. Our adversary will produce an exact potential game, as will be evident by the fact that all players will have exactly the same utility at each strategy profile.  In such cases a pure Nash equilibrium is simply a local maximum of this utility function.  Let us start with the basic idea, that does not quite do the job: At the t‘th query of the algorithm, s^t, the adversary will answer u_1(s^t) = ... = u_n(s^t) = u(s^t)=t.  The idea is that the algorithm cannot find any Nash equilibrium this way since a strategy profile can be declared to be such only if all of its n neighbors (in the Boolean hypercube — corresponding to the possible n deviations by a single player) were already queried to have a lower utility than the vertex itself.  Since the adversary keeps answering with increasing values, the only way that this can happen is if the algorithm first “closes up” a strategy profile (vertex in the hypercube) by first asking all of its neighbors and only then the enclosed vertex.  So to fix it, we’ll improve the adversary: when asked a query s^t, the adversary will look at the set of un-accessed vertices of the hypercube and see whether any vertices are left disconnected from the “rest of the hypercube”.  Specifically, let us call a vertex “compromised” if its connected component (after removing all queries s^1 ... s^t from the hypercube) has size strictly less than 2^{n-1} (half the hypercube).  What our adversary will do is go over all newly compromised vertices, fixing their values in an order that increases from the farthest away from s^t to s^t itself.  This way a vertex is never assigned a value after all of its neighbors have, and thus no vertex can be guaranteed to be a Nash equilibrium until all vertices of the hypercube are either queried or compromised.  The only question is how many queries does this take?  Let Q be the set of queries and C be the set of all vertices compromised by Q, how large does Q have to be for the union to be at least half of the whole hypercube?  Combinatorially, Q must contain all of C‘s neighbors in the hypercube.  This leads us to ask for smallest possible size of the set of neighbors of a set of vertices in the hypercube — known as the isoperimetric problem for the Boolean hypercube.  This question is completely solved (see chapter 16 of Bollobas), and in particular it is known that for any subset of at most half of the hypercube, the size of the set of its neighbors is at least \Omega(1/\sqrt{n}) fraction of original set’s  size.  This means that |Q| \ge \Omega(|C|/\sqrt{n}), which implies that |Q| \ge \Omega(2^n/\sqrt{n}) — an exponential lower bound for the worst case running time of any algorithm that finds a pure Nash equilibrium for all potential games.  An immediate corollary is that there may be best-reply paths of this exponential length.

Turing-Machine complexity

We now move to the alternative way of accessing a “large” game, that of representing it concisely.  Clearly not every game has such a concise representation, but can we at least find and equilibrium efficiently for those that do?  Also, different representations may require different complexities — which concise representation should we consider?  The first one we will look at is the general representation of a black box by the code of a procedure, equivalently by a description of Turing machine. However,  we only want to consider TMs that run efficiently, e.g. in polynomial time, or alternatively count the running time of the TM as part of the input size.  The usual formalism for giving such a TM as input, is the equivalent requirement of a Boolean circuit that computes the black-box queries, and this will be our first concise representation:

Input: n boolean circuits, each computing a function u_i : (\{0,1\}^{\log m})^n \rightarrow \{0,1\}^t, where the input of each circuit is the index of each player’s strategy and  the output of each circuit is interpreted as a t-bit long integer specifying the utility.

Output (Decision version): does this game have a pure Nash equilibrium?

We would expect the running time to be polynomial in the input size: n,t, and the total size of the circuits.  As before let us concentrate on cases where m is not too large (i.e. also polynomial in these parameters).

So what is the complexity of this problem?  Generally speaking, the state of complexity theory is such that we get essentially no algorithmically-useful information from the circuit representation.  Thus, we would expect a polynomial time algorithm to exist for this generic representation only if such existed in the black-box model — which is doesn’t.  In this world we can easily prove an NP-completeness result.

Theorem: Existence of a pure Nash equilibrium in a game given by boolean circuits (as above) is NP-complete.

Proof: Containment in NP is by guessing the equilibrium point, and then verifying it by checking all m possible deviations for every player.  (This is the only place where we used m being small, otherwise the problem seems to be \Sigma_2-complete).  To prove NP-hardness we will provide a reduction from 3-SAT (I lost the reference for this reduction).  All players will have two strategies: 0 and 1.  For each variable we will have a player whose utility function is a constant 0.  For each clause we will have an additional player whose utility function gives him 1 if his strategy is the OR of its literals.  We will then have two additional “top” players whose utilities are as follows: if all clause players play 1, then both of these players get utility 1 whatever they play; otherwise, these players play “matching pennies” between themselves.  Note that in a pure Nash equilibrium, the variable players can play anything, the clause players must faithfully compute the clause values, and then the two “top” players can be in equilibrium if and only if all clause values are 1.

The next succinct representation we consider is one that aims to take advantage of the fact that in many games while there are many players, each player’s utility only depends on the strategies of a few others and not on everyone’s strategies.  In such cases, we can represent  the utility of each player as a full table of only the relevant players’ strategies.  In such games an important structure is obtained by looking at the graph  (or digraph) specifying for each two players whether the utility of one depends on  the strategy played by the other.  So here is the equilibrium problem using this “graphical games” succinct representation:

Input: A graph G and for each vertex i in it, a table u_i : \{1...m\}^{d_i+1} \rightarrow\mathbb{Q}, where d_i is the degree of vertex i, and the table specifies the utility of player i as a function of his strategy as well as those of his neighbors.

Output (Decision version): does this game have a pure Nash equilibrium?

We would expect the running time to be polynomial in the the total size of the tables, i.e polynomial in n and in m^d, where d = \max_i d_i.

However just a tiny variation of the NP-completeness proof above shows that this problem in NP-complete too.  The only players in the reduction above who had degree more than 3 are the two top players who were “computing” the “AND” of all the clauses.  To get a reduction where all players have degree at most 3, we will add an additional player for each clause who will compute the “and” of this clause with all preceding ones.  This will be done by giving him utility 1 if his strategy is indeed the “and” of the original clause-player of his clause and the new-clause-player of the preceding clause (and 0 otherwise).  Now the two “top” players will just need to depend on the new-clause-player of the last clause as they did before (either always wining or playing matching pennies).  Thus we get:

Theorem: Existence of a pure Nash equilibrium in graphical games (as above) is NP-complete.

A nice exercise is to exhibit a polynomial time algorithm for the special case that when the graph is in fact a tree.

Now let us move to the special case of exact potential games.  The problem specification (input and output) is exactly as defined above where the utility functions are given as Boolean circuits, except that we are guaranteed that the utility functions are those of an exact potential game.  (This condition may not be easy to verify in general, but our algorithm may take it as an assumption.)   We know that a pure equilibrium is exactly a local maximum of the potential function.  This potential function is uniquely defined, up to an additive constant, by the given utility functions: once the value of the potential function is known for some profile, P(s^0), we can change, in turn each s^0_i to an arbitrary s_i and the difference is completely specified by u_i.  Thus the potential function is defined by P(s)-P(s^0) = \sum_{i=1}^n [ P(s^0_1 ... s^0_i, s_{i+1} ... s_n) - P(s^0_1 ... s^0_{i-1}, s_i, ..., s_n)] =   \sum_{i=1}^n [u_i(s^0_1 ... s^0_i, s_{i+1} ... s_n) - u_i(s^0_1 ... s^0_{i-1}, s_i, ..., s_n)].  We see that the problem of finding a pure Nash equilibrium for potential games is equivalent to that of finding a local maximum of a given Boolean circuit, where “locality” is given by switching the values of one of the n sets of \log m bits in our problem definition. (Proof: the expression above is efficiently computable and thus our problem is no harder, and on the other hand a special case of our problem is where all u_i‘s are identical in which case our problem is equivalent to finding a local maximum.)

Now let us say a few words about the complexity of this local-maximum problem.  First, syntactically speaking we have an “NP search problem”: on input x (the utility functions) the task is to find some solution y (strategy profile) that satisfies R(x, y) (y is a local maximum of x), where R is computable in polynomial time.  The general class of such search problems is termed FNP,  and in our case we are talking about problems with a guarantee that such a y always exists, a subclass termed TFNP.  Now, semantically, there are various other problems that have a similar flavor of finding a local maximum (or minimum), and this class of problems, a subset of TFNP, was termed PLS (polynomial local search).  Some examples of such problems come from attempting to formalize what various local-improvement search heuristics obtain: suppose that you attempt finding a short traveling salesman route by gradually trying to switch the locations of pairs of cities along the route.  What do you get?  How long will this take?  The general class of problems in PLS is defined as those having polynomial-time computable procedures specifying a neighborhood relation and a quality score over solutions where the problem is to define a solution which is a local minimum (or maximum).  By these definitions, our problem is in this class PLS, and in fact the generic nature of our problem makes it PLS-complete.  So what does this PLS-completeness mean?  It means that the complexity of our problem is similar to the hardest problems in this class which seem to have intermediate difficulty between P and NP.  We do not know a polynomial time algorithm for them (and such an algorithm will require significantly new techniques since it doesn’t “relativize”), nor or they likely to be NP-complete (since that would also imply NP=coNP).

A special case of potential games is that of network congestion games.  In these we are given a (directed) graph, with n pairs of vertices (a_i,b_i) in it and, for each edge e, a sequence of n+1 costs: c_e(0) .... c_e(n).  This is a succinct representation of a congestion game: player i‘s set of strategies is the set of paths in the graph from a_i to b_i, and the cost (negative utility) of a player is the sum of c_e(k_e) over all edges e in his path, where k_e is the number of players whose strategy (path) contains the edge e.  Network congestion games are exact potential games, and have a pure Nash equilibrium.  Finding such an equilibrium turns out to be PLS-complete too, although this requires a complicated reduction.

Read Full Post »

In a previous post I discussed the communication complexity of reaching a pure Nash equilibrium.  The communication complexity model aims to capture the basic information transfer bottleneck between the different players in a games, abstracting away the question of incentives, and focusing on the need of communicating the different preferences (utilities) of the players that are assumed to initially be privately known.  The previous post introduced the  model of communication complexity and applied it to the question of finding a pure Nash equilibrium (if it exists).  The bottom line was that, in the general case, essentially all information about the utilities must be transferred and no “shortcuts” are possible.  In a multi-player game this amount of information is exponential in the number of players, implying that convergence to equilibrium, in general, is impractical, and special properties of the game must be used in order to reach equilibrium in reasonable time (one such property was demonstrated: dominance-solvable games).

This post discusses the issue of convergence to a mixed Nash equilibrium, as studied by Sergiu Hart and Yishay Mansour in How Long to Equilibrium? The Communication Complexity of Uncoupled Equilibrium Procedures.   The setting, as in my previous post, has each one of n players holding his utility function u_i : S_1 \times ... \times S_n \rightarrow \Re, where each S_j is the set of strategies of player j, and for ease of notation lets have |S_j|=m for all j.  We will assume that all utilities are finitely represented, i.e. are rational numbers, say with k-bit numerator and denominator.  Can a mixed Nash equilibrium be found by communicating significantly less than m^n \cdot k bits — the size of each utility function?  Maybe it can be done by communicating only a polynomial (in n, m, and k) bits?  This would be a necessary condition for efficient convergence of any  (uncoupled) dynamics between the players.

Before we look deeper into at this question, let us look at the similar problem of finding a correlated  equilibrium.  In this case it turns out that a polynomial amount of communication suffices and thus only a tiny fraction of the private information needs to be transferred. You can see a previous post of mine for background on correlated equilibrium.

The Communication Complexity of Correlated Equilibrium

Let us recall that a correlated equilibrium is a probability distribution on the strategy profiles p : S_1 \times ... \times S_n \rightarrow \Re that satisfies linear inequalities of the following form:  for every player i, and every two strategies of i, s_i, s'_i : \sum_{s_{-i}} p(s_i,s_{-i}) u_i(s_i,s_{-i}) \ge \sum_{s_{-i}} p(s_i,s_{-i}) u_i(s'_i,s_{-i}), where the sum ranges over all strategy profiles s_{-i} of the other players.  The interpretation is that s_i is indeed a best-reply to the conditional distribution p(s_i,\cdot) on s_{-i}.

[Edited on Oct 2nd: Albert Xin Jiang was kind enough to point out that the next paragraph is, well, wrong, so I’m striking it out.  In order to show that finding a correlated equilibrium can be done with low communication complexity, one must run the Ellipsoid algorithm on the dual LP (which has polynomial many variables) rather than on the primal (which has exponentially many variables).  How to do so effectively is shown in a paper by Papadimitriou and Roughgarden, and Hart and Mansour show that the algorithm can be implemented with low communication.]

As being a correlated equilibrium is defined by a set of linear inequalities, we can find a correlated equilibrium using linear programming.  Let us see how the players can jointly run the Ellipsoid LP algorithm while keeping the amount of communication in check.  The Ellipsoid algorithm runs in time polynomial in m, n, and k as long as it has access to a separation oracle.  Such an oracle must be able to answer queries of the following form: given an unfeasible point, in our case a candidate distribution p, it must find a constraint, in our case a player i and strategies s_i, s'_i, where the corresponding inequality is violated by p.  The main point is that each of the inequalities can be checked by a single player since it depends only on a single utility function.  Thus to implement the separation oracle, each player must either report a violated constraint or a single bit specifying that all his constraints are satisfied — all together taking O(n + \log m) bits of communication.  When the algorithm terminates — after a polynomial number of steps (polynomial in n, m, and k) — all players know the answer, which is a correlated equilibrium.   (Note that even though a correlated equilibrium is an exponential-sized object, it will turn out to have a polynomial-sized support.)

While this low-communication algorithm can not be viewed as natural dynamics that efficiently converge to a correlated equilibrium, it does point out that some dynamics that converge efficiently do exist.   The question of finding natural dynamics then gets more pressing and indeed it turns out that natural dynamics do exist as well: dynamics based on regret minimization (see my previous post.)

Mixed-Nash Equilibria

Now, once we have what to aim for — a similarly low-communication way of reaching a mixed-Nash equilibrium — let us look at the problem carefully again.  First, we must recall that Nash equilibria may be irrational even if all utilities are rational, so a Nash equilibrium can not be “printed” as binary numbers in any finite time.  Still, in our communication complexity formulation this shouldn’t be a problem since any representation of the equilibrium is in principle acceptable as long as it is uniquely determined by the communication.  Closely related to this is the fact that the representation of a Nash equilibrium in an n-player game may require exponentially many bits of precision, even if the utilities themselves have only short descriptions.  As before, while the required precision itself is not a lower bound on the communication complexity, Hart and Mansour do show how to use this precision to obtain a lower bound.  Below I give a different, somewhat stronger proof.

Let us start with the following 2-player bi-strategy game, where 0<r<1 is an arbitrary parameter.

r, 0                0, r

0, 1-r            1-r, 0

It is easy to see that the only Nash equilibrium of this game has each player choosing his second strategy with probability exactly r and his first with probability 1-r.  This family of games suffices for giving a tight lower bound for the special case of two-player (n=2) bi-strategy (m=2) games.

Lemma: The communication complexity of finding a Nash equilibrium in two-player bi-strategy games, where all utilities are given by k-bit integers, is \Theta(k).

The upper bound is trivial and the lower bound is implied by games of the previous form where r ranges over all fractions of the form r = x/2^k, for integer 0 <x < 2^k since each of these games has a different (unique) answer giving 2^k different such answers, which thus requires at least k bits of communication just to get all possibilities.

The dependence of the communication complexity on m, the number of strategies of each player, is still unclear.  However, Hart and Mansour show that the dependence on the number of players, n, is exponential.  The basic idea is that with n players we can get doubly-exponential  many different answers.  A clean way to show this is to “simulate” utilities of representation length k=2^{\Theta(n)}, and then invoke the previous bound.

Win-Lose Simulation Lemma: For every n-player game with utilities that are k-bit integers, there exists an (n + 3\log k)-player game, with the following properties:

  • The new game is a win-lose game, i.e. all utilities are 0 or 1.
  • Each of the first n players has the same set of strategies as in the original game.  The utility of each of these players is fully determined (in an easy manner) by his utility in the original game.
  • Each of the final 3 \log k players has 2 strategies.  The utilities of each of these are constants not depending on the original game at all.
  • The Nash equilibria of the new game are in 1-1 correspondence with those of the original game: the mixed strategies of each of the first n players are identical, while the strategies of the final 3\log k players are fixed constants independent of the game.

From this we immediately get our theorem.

Theorem: The communication complexity of finding a Nash equilibrium in n-player bi-strategy win-lose games is 2^{\Theta(n)}.

The upper bound is trivial and the lower bound is implied by the lower bound on 2-player games with k-bit utilities via this reduction using n=2+3\log k players.

Proof of Win-Lose Simulation lemma

It remains to prove the simulation lemma.  We will do so in three steps: first construct a fixed game whose unique equilibrium has exponentially low probabilities, then use this game to simulate games with utilities that are all powers of two, and finally use these to simulate general games with integer utilities.

The first step of the reduction is the following construction:

Construction: for each t there exist (2t)-player win-lose bi-strategy games such that the unique Nash equilibrium has, for each 1 \le i \le t, players 2i-1 and 2i choosing their first strategy with probability exactly 2^{-2^{i-1}}.

Proof: for t=1, this is exactly “matching pennies”, i.e. the 2 \times 2 game described above for r=1/2 (after scaling by a factor of 2).  Now for the induction step we start with t-1 pairs of players as promised by the induction hypothesis, and we keep these players’ utilities completely independent of the soon to be introduced 2 new players, so we know that in every Nash equilibrium of the currently constructed 2t-player game these will still mix exactly as stated by the fact above for t-1.  In particular the last pair of players in the (2t-2)-player game choose their first strategy with probability 2^{-2^{t-2}}.  The point is that with these players in place, we can easily simulate the 2 \times 2 game above for r=2^{-2^{t-1}}.  To simulate a “r entry”, we define the utility as 1 when the last pair of players in the (2t-2)-player game play their first strategy (which happens with probability 2^{-2^{t-2}} \times 2^{-2^{t-2}} = 2^{-2^{t-1}} = r) and 0 otherwise, while to define a 1-r entry we define the opposite.

Our next step is to simulate (in the sense of the lemma) games where all utilities are powers of two in the range 1, 2, 4 , ..., 2^k, equivalently, after scaling, in the range 1, 1/2, 1/4, ..., 2^{-k}.  This is done by adding to the original players t=2\log k players as defined in the construction above.  We can now replace each utility of the original players of the form 2^{-x} where x‘s binary representation is x = \sum_{i=1}^{t} x_i 2^{i-1} with a utility of 1 whenever, for every i with x_i=1, new player 2i plays his first strategy,(and 0 otherwise).  This happens with probability \Pi_{i|x_i=1} 2^{-2^{i-1}} = 2^{-\sum_{i |x_i=1} 2^{i-1}} = 2^{-x}, as needed.

Our final step is simulating games with utilities that are general k-bit integers by games that have all their utilities powers of two, as in the previous step.  This is done by adding another \log k bi-strategy players, paired into \log k/2 independent “matching pennies” games, and thus each of them always evenly mixes between his two strategies.  To simulate a utility of y=\sum_{i=1}^k y_i 2^{i-1} in the original game, we give, in the new game, a utility of y_i 2^{i-1} whenever the \log k new players played a sequence of strategies which is the binary representation of i (and 0 otherwise). The expected value is exactly y/k, completing the simulation (as the scaling by a factor of k does not matter).

What remains?

The previous lower bound leaves much to be desired: while it does give an exponential lower bound in the input parameters, this is done at the cost of having an exponentially long output.  If we also take the output length into account (in any representation) then the bound is no more than linear — trivial.  Indeed, in the formal description of the computational Nash equilibrium problem, one formally has an input parameter \epsilon that specifies how close to an equilibrium do we demand the output to be, and the algorithm is allowed to run in time that is polynomial in \log \epsilon.  (See a previous post on the subtleties of this approximation parameter \epsilon.)  Taking this approach, the lower bound is trivial too as it is bounded from above by \log\epsilon.

Thus, the main open problem that remains is that of determining the communication complexity of finding a mixed-Nash equilibrium when the precision required at the output is only polynomial in the other parameters.

Read Full Post »

Motivation

The dogma of game theory (and economics) is that strategic situations will reach equilibrium — one way or another. How this will happen is often left un-answered, but from a computational point of view this is quite a central question. Indeed, a great amount of work has been done regarding the computational complexity of reaching an equilibrium of a game, reaching a conclusion that an infeasible amount of computation might be needed. In this post I want to discuss another aspect of the complexity of reaching equilibrium: the amount of information that needs to be transferred between the different players. The point of view here is that initially each player “knows” only his own utility function u_i and then the players somehow interact until they find an equilibrium of the game (u_1 .... u_n) defined by their joint utilities. The analysis here abstracts away the different incentives of the players (the usual focus of attention in game theory) and is only concerned with the amount of information transfer required between the players. I.e., we treat this as a purely distributed computation question where the input (u_1 ... u_n) is distributed between the players who all just want to jointly compute an equilibrium point. We assume perfect cooperation and prior coordination between the players, i.e. that all follow a predetermined protocol that was devised to compute the required output with minimum amount of communication. The advantage of this model is that it provides lower bounds for all sorts of processes and dynamics and that eventually should reach equilibria: if even under perfect cooperation much communication is required for reaching equilibrium, then certainly whatever individual strategy each player follows, convergence to equilibrium cannot happen quickly.

The main result that we shall give in this post is that finding a pure Nash equilibrium (or determining that none exists) requires essentially communicating everything about the utility functions — no shortcuts are possible. This is the basic result that appears in two papers that studied the issue: Communication complexity as a lower bound for learning in games by Vincent Conitzer and Tuomas Sandholm that considered two-player games and How Long to Equilibrium? The Communication Complexity of Uncoupled Equilibrium Procedures by Sergiu Hart and Yishay Mansour whose focus is multi-player games. See also the presentation by Sergiu Hart. A future post will discuss the much more complicated case of finding a mixed Nash equilibrium, a question which is still mostly open, despite results given in the H&M paper.

The Communication Complexity model

The basic communication complexity model considers n players, where each player i holds part of the input which we assume is an N-bit string, x^i \in \{0,1\}^N. The usual focus is on the two-player case, n=2, in which case we simplify the notation to use x and y rather than x^1 and x^2. The joint goal of these players is to compute the value of f(x^1 ... x^n) where f : (\{0,1\}^N)^n \rightarrow Z is some predetermined function. As f(x^1 ... x^n) depends on all values of x^i, the players will need to communicate with each other in order to do so (we will use “broadcasting” rather than player-to-player channels), and they will do so according to a predetermined protocol. Such a protocol specifies when each of them speaks and what he says, where the main point is that each player’s behavior must be a function of his own input as well as what was broadcast so far. We will assume that all communication is in bits (any finite alphabet can be represented in bits). The communication complexity of f is the minimum amount of bits of communication that a protocol for f uses in the worst case. This model has been introduced in a seminal paper by Andy Yao, the standard reference is still the (just slightly dated) book Communication Complexity I wrote with Eyal Kushilevtiz, and online references include the wikipedia article, and the chapter from the Barak-Arora computational complexity book.

.

The Disjointness Function

Our interest will be in lower bounds on the communication complexity. There are several techniques known for proving such lower bounds, the easiest of which is the “fooling set method” which we will use on the “disjointness function” for two players. The disjointness function, introduced in Yao’s original paper, plays a key role in communication complexity, and indeed it will turn out to be useful to us too: once we have a lower bound for it, it will be easy to easily deduce the lower bound for the function of our interest, finding a pure Nash equilibrium, by simple reduction.

The disjointness function: Alice holds x \in \{0,1\}^N and Bob holds y \in \{0,1\}^N. They need to determine whether there exists an index i where they both have 1, x_i=1=y_i. If we view x as specifying a subset S \subseteq \{1...N\} and y as specifying T \subseteq \{1 ... N\}, then we are asking whether S \cap T = \emptyset.

Lemma: Any deterministic protocol for solving the disjointness problem requires at least N bits of communication in the worst case.

Before proving this lemma, notice that it is almost tight, as Alice can always send all her input to Bob (N bits) who then has all the information for determining the answer, who then sends it back to Alice (1 more bit). It is known that a \Omega(N) lower bound also applies to randomized protocols, but the proof for the randomized case is harder, while the proof for the deterministic case is easy.

Proof: Let us look at the 2^N different input pairs of the form \{<x,y> | \forall i,\: x_i+y_i=1\} (these are the “maximal disjoint” inputs). We will show that for no two of these input pairs can the protocol produce the same communication transcript (i.e. the exact same sequence of bits is sent throughout the protocol). It follows that the protocol must have at least 2^N different possible transcripts, and since these are all binary sequences, at least one of them must be of length at least N bits.

Now comes the main point: why can’t two maximally disjoint pairs <x,y> and <x',y'> have the same transcript? Had we been in a situation that f(x,y) \ne f(x',y') then we would know that the transcripts must be different since the protocol must output a different answer in the two cases; however in our case all “maximal disjoint” input pairs give the same answer: “disjoint”. We use the structure of communication protocols — that each player’s actions can only depend on his own input (and on the communication so far) — to show that if <x,y> and <x',y'> have the same transcript then <x,y'> has the same transcript too (and so does <x',y>). Consider the players communicating on inputs <x,y'>: Alice, that only sees x, cannot distinguish this from the case of <x,y> and Bob, that only sees y' cannot distinguish it from the case <x',y'>. Thus when Alice communicates she will not deviate from what she does on of <x,y> and Bob will not deviate from what he does on <x',y'>. Thus, none of them can be the first to deviate from the joint transcript, thus none of them ever does deviate and the joint transcript is also followed on <x,y'>. We now get our contradiction since either x and y' are not disjoint or x' and y are not disjoint, and thus at least one of these pairs cannot not share the same transcript, a contradiction.   (The reason for non disjointness is that since x \ne x' there must be an index i with x_i \ne x'_i. If x_i = 1 and x'_i=0 then y_i = 0 and y'_i=1 in which case x and y' are not disjoint. Similarly, if x'_i = 1 and x_i=0 then y'_i = 0 and y_i=1 in which case x' and y are not disjoint.)

Finding a Pure Nash Equilibrium

Back to our setting where each player i holds his own utility function u_i : S_1 \times ... S_n \rightarrow \Re, where the S_i‘s are the commonly known strategy sets, which we will assume have size m each. Thus each player’s input consists of m^n real numbers, which we will always assume are in some small integer range. Perhaps it is best to start with a non-trivial (just) upper bound.  Let us consider the class of games that are (strictly) dominance solvable, i.e. those that iterated elimination of strictly dominated strategies leaves a single strategy profile, which is obviously a pure Nash equilibrium.  A simple protocol to find this equilibrium is the following:

  • Repeat mn times:
    • For i = 1 ...n do
      • If player i has a strategy that is dominated after the elimination of all previously removed strategies, then announce it, else say “pass”.
  • Output the (single) profile of non-eliminated strategies.

What is remarkable about this protocol is that number of bits communicated is polynomial in n and m, which may be exponentially smaller than  the total size of the input which is O(m^n) numbers for each player.  Can some protocol like this be designed for all games that have a pure Nash equilibrium?  The basic answer in no:

Theorem: Every deterministic (or randomized) protocol that recognizes whether the given game has a pure Nash equilibrium requires \Omega(m^n) bits of communication.

Note that this theorem implies in particular that one cannot design a more efficient protocol that finds an equilibrium just for games that are guaranteed to have one, as adding a verification stage (where each player checks whether the strategy suggested to him is indeed a best reply to those suggested to the others) would convert it to a protocol that recognizes the existence of a pure Nash.

Proof: The proof is by a reduction from the disjointness problem.  We will show that a protocol for determining the existence of a pure Nash equilibrium for games with n players each having m strategies can be used to solve the disjointness problem on strings of length N = \Omega(m^n).

Let us start with the case n=2 and solve disjointness for length N=(m-2)^2.  The idea is very simple: Alice will build a utility matrix u_A by filling it with the values of its bit string x in some arbitrary but fixed order, and Bob will build a matrix u_B from y using the same order.  Now, any cell of the two matrices where x_i=y_i=1 will turn out to be a Nash equilibrium since both players get the highest utility possible in this matrix.  For this idea to formally work, we just need to make sure that no other cell is a Nash equilibrium.  This can be done in one of several ways (and different ones are used in the C&S and H&M papers), the simplest of which is adding two more rows and columns as follows:

CodeCogsEqn (1)

With the addition of these rows and columns, each player’s best-reply always obtains a value of 1, and thus only a (1,1) entry may be a Nash equilibrium.

Now for general n.  We add to Alice and Bob n-2 players that have utilities that are identically 0.  Thus any strategy will be a best reply for these new players, and a pure Nash equilibrium is determined solely by Alice and Bob.  The main change that the new players bring is that the utility functions of Alice and Bob are now n-dimensional matrices of total size m^n.  Thus Alice and Bob can fold their length N bit string into this larger table (as before keeping two strategies from their own strategy space to ensure no “none-(1,1)” equilibrium points), allowing to answer disjointness on N=(m-2)^2 \cdot m^{n-2}-bit long vectors.

Variants

There are several interesting variants on the basic question:

  1. The lower bound uses heavily the fact that the utilities of different strategies may be equal and thus there are many possible best replies.  Would finding a Nash equilibrium be easier for games in “general position” — specifically if there was always a single best reply for each player?  For the case of n=2 players, C&S show that the complexity does go down to \Theta(m \log m).  (Exercise: show that the randomized communication complexity in this case is \Theta(m).)  As n grows, the savings are not exponential though and H&M show a 2^{\Omega(n)} lower bound for constant m.  A small gap remains.
  2. We have seen that finding an equilibrium of a dominance solvable game is easy.  Another family that is guaranteed to posses a pure equilibrium is potential games.  However, H&M show that finding a Nash equilibrium in an ordinal potential game may require exponential communication.  I don’t know whether things are better for exact potential games.
  3. The efficient protocol for dominance solvable games was certainly artificial.  However, an efficient natural protocol exists as well: if players just best reply in round robin fashion then they will quickly converge to a pure Nash equilibrium.  (This seems to be well known, but the only link that I know of is to one of my papers.)

Read Full Post »

Undergraduate algorithms courses typically discuss the maximum matching problem in bipartite graphs and present algorithms that are based on the alternating paths (Hungarian) method.  This is true in the standard CLR book as well as in the newer KT book (and implicitly in the new DPV book that just gives the reduction to max-flow.)  There is an alternative auction-like algorithm originally due to Demange, Gale, and Sotomayor that is not well known in the CS community despite being even simpler.  The algorithm naturally applies also to the weighted version, sometimes termed the assignment problem, and this is how we will present it.

Input: A weighted bipartite graph, with non-negative integer weights.  We will denote the vertices on one side of the graph by B (bidders) and on the other side by G (goods).  The weight between a bidder i and a good j is denoted by w_{ij}.  We interpret w_{ij} as quantifying the amount that bidder i values good j.

Output: A matching M with maximum total weight \sum_{(i,j) \in M} w_{ij}.  A matching is a subset of B \times G such that no bidder and no good appear more than once in it.

The special case where w_{ij} \in \{0,1\} is the usual maximum matching problem.

Algorithm:

Initialization:

  1. For each good j, set p_j \leftarrow 0 and owner_j \leftarrow null.
  2. Initialize a queue Q to contain all bidders i.
  3. Fix \delta = 1/(n_g+1), where n_g is the number of goods.

While Q is not empty do:

  1. i \leftarrow Q.deque().
  2. Find j that maximizes w_{ij} - p_j.
  3. If w_{ij} - p_j \ge 0 then
    1. Enque current owner_j into Q.
    2. owner_j \leftarrow i.
    3. p_j \leftarrow p_j + \delta.

Output: the set of (owner_j, j) for all j.

Correctness: The proof of correctness is based on showing that the algorithm gets into an “equilibrium”, a situation where all bidders “are happy”.

Definition: We say that bidder i is \delta-happy if one of the following is true:

  1. For some good j, owner_j=i and for all goods j’ we have that \delta + w_{ij}-p_j \ge w_{ij'}-p_{j'}.
  2. For no good j does it hold that owner_j=i and  for all goods j we have that that w_{ij} \le p_{j}.

The key loop invariant is that all bidders, except those that are in Q, are \delta-happy.  This is true at the beginning since Q is initialized to all bidders.  For the bidder i dequeued in an iteration, the loop exactly chooses the j that makes him happy, if such j exists, and the \delta-error is due to the final increase in p_j.  The main point is that this iteration cannot hurt the invariant for any other i’: any increase in p_j for j that is not owned by i’ does not hurt the inequality while an increase for the j that was owned by i’ immediately enqueues i’.

The running time analysis below implies that the algorithm terminates, at which point Q must be happy and thus all bidders must be \delta-happy.

Lemma: if all bidders are \delta-happy then for every matching M’ we have that n\delta + \sum_{i=owner_j} w_{ij} \ge \sum_{(i,j) \in M'} w_{ij}.

Before proving this lemma, we notice that this implies the correctness of the algorithm since by our choice of \delta, we have that n\delta < 1, and as all weights are integers, this implies that our matching does in fact have maximum weight.

We now prove the lemma.  Fix a bidder i, let j denote the good that he got from the algorithm and let j’ be the good he gets in M’ (possibly j=null or j’=null).  Since i is happy we have that \delta + w_{ij}-p_j \ge w_{ij'}-p_{j'} (with a notational convention that w_{i,null}=0 and p_{null}=0, which takes care also of case 2 in the definition of happy)  Summing up over all i we get \sum_{i=owner_j} (\delta + w_{ij}-p_j) \ge \sum_{(i,j') \in M'} (w_{ij'}-p_{j'}).  Now notice that since both the algorithm and M’ give matchings, each j appears at most once on the left hand side and at most once on the right hand side.  More over if some j does not appear on the left hand side then it was never picked by the algorithm and thus p_j=0.  Thus when we subtract \sum_j p_j from both sides of the inequality, the LHS becomes the LHS of the inequality in the lemma and the RHS becomes at most the RHS of the inequality in the lemma.  QED.

Running Time Analysis:

Each time the main loop is repeated, some p_j is increased by \delta or some bidder is removed from Q forever.  No p_j can ever increase once its value is above C = max_{i,j} w_{ij}.  It follows that the total number of iterations of the main loop is at most Cn/\delta = O(Cn^2) where n is the total number of vertices (goods+bidders).  Each loop can be trivially implemented in O(n) time, giving total running time of O(Cn^3), which for the unweighted case, C=1, matches the running time of the basic alternating paths algorithm on dense graphs.

For non-dense graphs, with only m=o(n^2) edges (where an edge is a non-zero w_{ij}), we can improve the running time by using a better data structure.  Each vertex maintains a priority que of goods ordered according to the value of w_{ij} - p_j.  Whenever some p_j is increased, all bidders that have an edge to this j need to update the value in the priority queue.  Thus an increase in p_j requires d_j priority queue operations, where d_j is the degree of j. Since each p_j is increased at most C/\delta = O(Cn) times, and since \sum_j d_j =m we get a total of O(Cmn) priority queue operations.  Using a heap to implement the priority queue takes O(\log n) per operation.  However, for our usage, an implementation using an array of linked lists gives O(1) amortized time per operation: entry t of the array contains all j such that w_{ij} - p_j = t\delta, updating the value of j requires moving it down one place in the array, and finding the maximum w_{ij} - p_j is done by marching down the array to find the next non empty entry (this is the only amortized part).  All in all, the running time for the unweighted case is O(mn).

Additional comments:

  • As shown by DGS, a similar procedure terminates with close to VCG prices, which are also the point-wise minimum equilibrium prices.
  • The algorithm was presented for the assignment problem where bidders never desire more than a single item.  It does work more generally as long as bidders are “gross substitutes”.
  • The algorithm, like many auctions, can be viewed as a primal-dual algorithm for the associated linear program.
  • Choosing a small fixed value of, say, \delta=0.01 gives a linear time 1.01-approximation for the maximum matching.
  • Choosing the value \delta = 1/\sqrt{n} gives a matching that misses at most \sqrt{n} edges, that can then be added using \sqrt{n} alternating path computations, for a total running time of O(m \sqrt{n}).
  • Many algorithmic variants were studied by Dimitry Bertsekas.
  • A wider economic context appears in this book.

Read Full Post »

The basic computational problem in algorithmic game theory is that of computing a (mixed) Nash equilibrium of a given (non-cooperative) game. The computational complexity status of this problem has been recently settled by showing that it is “PPAD-complete” and thus “unlikely” to be efficiently computable (see this link for a nice overview). This may be considered as not very satisfying answer due to our incomplete understanding of the class PPAD, but at least the problem is now reduced to be a purely computational complexity one with no game theory aspects to it and no special role for the Nash problem. The main related problem remaining is whether approximating a Nash equilibrium may be done efficiently. This post will describe the current status of the approximation problem.

The approximate-Nash problem

Let us first settle on some notation. We are talking about a n-player game, where each player i has m_i possible strategies, each player has a utility function u_i :  \{1..m_1\} \times \{1..m_2\} \times ... \times \{1..m_n\} \rightarrow  \Re. We will denote by \Delta_i the set of probability distributions on \{ 1 ... m_i\} and for x^1 \in \Delta_1 ... x^n \in \Delta_n we use u_i(x^1 ... x^n) as a shorthand for \sum_{j_1, ... , j_n} \Pi_i x^i_{j_i} u_i(j_1 ... j_n), the expected value of u_i(j_1 ... j_n) where each j_i is chosen at random according to the probability x^i.

We know by Nash’s theorem that a mixed Nash equilibrium of the game exists and the computational problem is to find one. Unfortunately, it is also known that when n \ge 3 then the Nash equilibria may all be irrational numbers. We thus cannot just ask our algorithm to output a Nash equilibrium since it may not have a finite representation and so we need to settle for approximation in order to even define the “exact” problem. So here is the carefully stated common definition of the Nash equilibrium computation problem:

Input: The input is given by n tables u_1 ... u_n each of dimensions m_1 \times m_2 \times ... \times m_n. For finiteness of representation, each of the n \Pi_i m_i numbers in the input is a rational number given by a d-digit numerator and denominator. We are also give a rational number \epsilon>0 (with d-digit numerator and denominator).

Output: n rational vectors x^1 \in \Delta_1, ..., x^n \in \Delta_n  that are an \epsilon-Nash equilibrium.

Definition: x^1, ..., x^n are \epsilon-Nash of the game defined by u_1 ... u_n if for every i and every x^{i*} \in \Delta_i we have that u_i(x^i, x^{-i}) \ge u_i(x^{i*},x^{-i}) - \epsilon.

Running time: we want the running time to be polynomial in the input size, i.e. polynomial in n, d, and \Pi_i m_i.

It is quite useful to look carefully at various choices made here, and what we would get with slightly different choices:

  1. Allowing any Nash equilibrium: our definition allowed any (\epsilon)-Nash rather than specifying which one we desire in case multiple ones exist. It is known that most ways of attempting to require a specific Nash point make the problem harder, NP-complete.
  2. Approximation: as mentioned above when n \ge 3 we need to settle for a \epsilon-approximation just to have a finite output. For n=2 there always exists an equilibrium which has rational numbers with poly(d, n, \Pi_i m_i)-digit long numerator and denominator, and thus the exact version is equivalent to the problem as stated.
  3. Approximation of best-response condition: our notion for approximation relaxed the best-response condition by \epsilon rather than asking for some rational point that is \epsilon-close to an exact Nash equilibrium point. The latter condition seems to be much harder and in not even known to be in NP (or in fact even in the polynomial time hierarchy) and is treated at length by this 57-page paper by Mihalis Yannakakis and Kousha Etessami. The crux of the difficulty is that getting some grip on an exact Nash point seems to require “infinite precision” arithmetic — the same type of problem encountered in trying to determine whether \sum_i \sqrt{x_i} > T (discussed by Richard Lipton and see also this related post of his). (Thanks to Kousha Etessami for explaining these delicate issues to me.)
  4. Having \epsilon be given in binary: in our definition \epsilon was given by d-digit numerator and denominator and the running time was required to be polynomial in d, i.e. in \log \epsilon^{-1}. An alternative version would be to require \epsilon to be “given in unary” i.e. allow the algorithm to run in time polynomial in \epsilon^{-1}. This version is asking for a fully polynomial approximation scheme (FPTAS) and could be easier. However, it turns out that getting a FPTAS is also PPAD-hard and thus this is actually not the case.
  5. Additive approximation: Our condition demands u_i(x^i, x^{-i}) \ge u_i(x^{i*},x^{-i}) - \epsilon. We can not use the more natural relative error u_i(x^i, x^{-i}) \ge u_i(x^{i*},x^{-i}) (1 - \epsilon) as the Nash equilibrium point u_i may be arbitrarily close to zero which may again require an exponential-length answer.

At this point we can start talking about the approximate version of this problem. For this we will allow an arbitrary dependence of the running time on \epsilon and even allow \epsilon be fixed (e.g. \epsilon=0.1) . As the original problem scales, for the approximation to make sense, we need to normalize \epsilon by the sizes of the numbers in the probem.

\epsilon-Approximation Problem: This is the same problem as above but scaled so that the utilities all satisfy 0 \le u_i(j_1 ... j_n) \le 1.

A Quasi-polynomial-time Algorithm

The key result by Richrad Lipton, Evangelos Marakakis, and Aranyak Metha is that the approximation problem can be solved in “quasi-polynomial” time — in this case time N^{\log N} where N is the input size. The algorithm is very simple and is based on the existence of approximate equilibria with strategies that have only small support.  In many “non-uniform” models, randomization over a large domain can be replaced by randomization over a much smaller domain.  (One nice example is the simulation of public coins by private ones in communication complexity.) This is the situation here and some background is given in a recent blog post by Richard Lipton.

Definition: Let k be an integer, then a distribution \tilde{x}^i \in \Delta_i is k-simple if all its coordinates are integer multiples of 1/k. I.e. it is a uniform distribution on a multi-set of size k of pure strategies. A random k-simplification of x^i is obtained by choosing at random k elements (with repetitions) according to the distribution specified by x^i and then taking the uniform distribution on this multiset.

Proposition: For every x^1 \in \Delta_1 and fixed pure strategies j_2 ... j_n, if we chose a random k-simplification \tilde{x}^1 of x^1 then Pr[|u_i(\tilde{x}^1,j_2,...,j_n) - u_i(x^1,j_2,...,j_n)| > \epsilon] \le e^{-\Omega(k\epsilon^2)}.

The proof is by observing that u_i(x^1,j_2,...,j_n) is the expected value of  u_i(j_1,j_2,...,j_n)  when j_1 is chosen according to x_1, while u_i(\tilde{x}^1,j_2,...,j_n) is the average value of  u_i(j_1,j_2,...,j_n) over k random choices of j_1 according to x_1, and thus the proposition is just a direct use of the Hoeffding tail inequalities.

Now, if we choose k = O(\log N / \epsilon^2), where N=\Pi_i m_i, then, with high probability (due to the union bound), the previous proposition holds for all pure strategies j_2 ... j_n and all u_i for i = 1 ...n .  In this case, by averaging, it also holds for all mixed strategies x^2 ... x^n:

Definition: \tilde{x^1} is an \epsilon-approximation of x^1 (relative to u_1 ... u_n) if for all x^{2} \in \Delta_2 .... x^{n} \in \Delta_n and all i we have |u_i(\tilde{x}^1,x^{2},...,x^{n}) - u_i(x^1,x^{2},...,x^{n})| \le \epsilon.

Corollary: For every x^1 \in \Delta_1 there exists an \epsilon-approximation \tilde{x}^1 which is k-simple, for k = O(\log N / \epsilon^2).

The main lemma now follows:

Lemma: Every game has an \epsilon-Nash equilibrium where all player strategies are k-simple, with k = O(\log N / \epsilon^2).

The proof starts with any Nash equilibrium x^1 ... x^n and chooses a k-simple \epsilon'-approximation \tilde{x}^i for each x^i (where \epsilon'=\epsilon/(2n)).  Since x^1 ... x^n are an exact Nash equilibrium, in order to show that \tilde{x^1} ... \tilde{x^n} are an approximate equilibrium, it suffices to show that changing the x^i‘s to \tilde{x}^i‘s changes things by at most \epsilon — but this is exactly what being an \epsilon'-approximation means for replacing a single x^i by \tilde{x}^i, and the approximation errors \epsilon'‘s simply add up.

Once the existence of a k-simple \epsilon-equilibrium is known, an O(n^k) algorithm follows simply by exhaustive search, and thus we get our algorithm.

Notice that we have actually shown a stronger theorem: not only can we find some \epsilon-approximation but in fact we can find an approximation to any equilibrium, and our approximated equilibrium maintains also player’s utilities.  This for example also provides an approximate equilibrium with approximately optimal social welfare.

A polynomial time algorithm?

The existence of a quasi-polynomial time algorithm may often be viewed as a hint that a polynomial-time algorithm exists as well.  In our case the main open problem is whether such a polynomial time algorithm exists for every fixed value of \epsilon>0.  The upper bounds for the approximation problem are rater weak and the best that is known is a polynomial time approximation for \epsilon=0.3393....

One possible approach is to see whether the parameter k above may be reduced to being constant when \epsilon is constant.  This however is not the case.  Consider a zero-sum two-player game where each utility is chosen at random to be 0 or 1.  The following statements can be easily shown to hold with high probability over the random choice of the game: (a) for each row or column, the fraction of 1-entries is very close to 1/2 and thus a value of close to 1/2 can be obtained by any player by using a uniform distribution on his rows or columns (b) for any k=o(\log n) rows there exists a single column which has all 0 entries in these rows.  Thus any mixed strategy that is o(\log n)-simple cannot ensure positive value.  It follows that there does not exists a \epsilon-approximate Nash which is k-simple for any \epsilon<1/2 and k = o(\log n).  A recent paper of Constantinos Daskalakis and Christos Papadimitriou generalizes this impossibility not only to simples strategies but to more general “oblivious” algorithms.

Another difficulty is shown by a recent paper of Elad Hazan and Robert Krauthgamer.  They consider the harder problem of approximating a specific Nash equilibrium — specifically the one with highest social welfare — for which the technique above also provides a quasi-polynomial-time algorithm.  They show that this problem is at least as hard as the known problem of finding a small randomly planted clique in a graph, providing some evidence for its difficulty.


Read Full Post »

What will happen in a given strategic situation?  The dogma of game theory — which models strategic situations — is that some kind of equilibrium will be reached.  Taking a wide enough definition of equilibrium, the dogma seems fine.  Taking a narrower look and just considering the Nash equilibrium — either pure (if it exists) or mixed — may be more questionable.  In 1974 Robert Aumann suggested a generalization of Nash equilibrium termed correlated equilibrium.  This this type of equilibrium is very appealing due to several reasons, and this post presents these from a CS-ey perspective.

What is Correlated equilibrium?

For ease of notation let us stick with two players, Alice and Bob (everything does extend to an arbitrary number of players).  A game between Alice and Bob is given by a pair of n \times m matrices A=(a_{ij}) and B=(b_{ij}),  where 1 \le i \le n and 1 \le j \le m, i.e. Alice has n strategies and Bob has m strategies.  a_{ij} denotes Alice’s utility when she plays i and Bob plays j, and b_{ij} denotes Bob’s utility in this case.  We will start with the simplest notions of equilibria and proceed to correlated equilibria.  In all cases equilibrium means that each player is best-replying to the other, so it suffices to define what does best-replying mean in each notion.

Let us start by defining the pure Nash equilibrium: a pure strategy of  Alice i is a best response to a pure strategy j of Bob if for all i' we have that a_{ij} \ge a_{i'j}.

The next step allows players to play mixed strategies rather than pure ones, where a mixed strategy is simply a probability distribution over pure strategies. Denote by \Delta_n the set of mixed strategies (\sum_i x_i = 1 and for all i = 1 ... n, x_i \ge 0), then a mixed strategy x \in \Delta_n is a best reply to a mixed strategy y \in \Delta_m if for every mixed strategy x' we have that \sum_{ij} x_i a_{ij} y_j \ge \sum_{ij} x'_i a_{ij} y_j.  (This turns out to be equivalent to every pure strategy in the support of x being at least as good against y as every other pure strategy.)

The mixed-Nash equilibrium allows each player to randomize his own actions.  The basic idea of correlated equilibrium is to allow joint randomization between Alice and Bob.  We imagine a third trusted party choosing a pair (i,j) according to some distribution p on pairs, and telling i to Alice (only) and telling j to Bob.    When can we say that i is indeed a best reply to the information that Alice has about Bob’s action?   Well, when Alice gets i, she may assume the conditional distribution over Bob’s strategies: Pr[j|i] = p_{ij} / (\sum_j p_{ij}).  Alice is best responding if i is indeed the best response to this distribution, i.e. if for all i' we have that \sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j}.  (Dividing both sides by \sum_j p_{ij} gives exactly the conditional probabilities).  The distribution p is called a correlated equilibrium if every i is a best response for Alice, and every j a best response for Bob.

Notice that a Nash equilibrium is a special case: when (p_{ij}) is a product distribution p_{ij}=x_i y_j.   In this case the previous best-reply constraints become x_i \sum_j y_j a_{ij} \ge x_i \sum_j y_j a_{i'j}, which means that each pure strategy i in the support (x_i>0) it is a best reply to y\sum_j y_j a_{ij} \ge \sum_j y_j a_{i'j}.  In particular this implies existence of a correlated equilibrium in every finite game.

So what are the good things about this notion of equilibrium relative to Nash equilibrium?

  1. It can be more efficient than Nash
  2. It can be implemented like Nash
  3. It can be efficiently computed unlike Nash
  4. It can be efficiently learned unlike Nash

Efficiency

It is well known that the Nash equilibrium need not give an efficient outcome. The following game of chicken serves as the standard example that the correlated equilibrium may be more efficient:

.                                  Bob

.                        Dare          Chicken

.      Dare          0, 0             4, 1

Alice

.      Chicken    1, 4              3, 3

This game has two pure equilibria (C,D) and (D,C) as well as a single mixed Nash equilibrium where both Alice and Bob choose to Dare with probability of exactly 1/2.   If we are interested in the obtained sum of utilities (social welfare) as our goal, then each pure equilibria gets social welfare of 5, while the mixed one gets an expected social welfare of 4. The optimal social welfare, 6, is obtained at a non-equilibrium point. Putting it into a price of anarchy/stability point of view, the price of stability (ratio between welfares obtained in best Nash equilibrium and the optimimum) is 6/5=1.2, while the price of anarchy (ratio between welfares obtained in worst Nash equilibrium and the optimum) is 6/4=1.5 (for the mixed case).

Let us look at the following correlated distribution:

.                            Bob

.                   Dare          Chicken

.     Dare          0                  1/3

Alice

.     Chicken     1/3               1/3

Let us see why this is a correlated equilibrium.  Assume that Alice “gets” dare. This happens with probability 1/3, and in this case she knows for sure that Bob got “chicken” — so clearly daring is a best reply.  On the other hand, if Alice gets Chicken — which happens with probability 2/3 — then her conditional distribution on Bob is 1/2-1/2.  Under this distribution she is indifferent between daring and chicken, so chicken is still a best reply.  The nice thing is that the expected sum of utilities in this correlated equilibrium is 5 \frac{1}{3} — better than that of any Nash equilibrium.  This turns out to be the best correlated equilibrium in terms of social welfare, and the resulting price of correlated stability is thus 6/(5 \frac{1}{3}) = 1.125.  The set of correlated equilibria does not only includes ones with higher welfare than any Nash equilibrium, but also ones with lower welfare.  The worst correlated equilibrium turns out to give 1/3 weight to each of (dare,dare), (dare,chicken), and (chicken,dare), for a total expected social welfare of 3 \frac{1}{3}, giving a correlated price of anarchy of  6/(3\frac{1}{3}) = 1.8.  The efficiency gap may be arbitrarily large as the following example shows [corrected on 21/10 thanks to Jochen Konemann who found a bug in the original version] (all missing entries are 0,0):

.          C1       C2        C3       D0      D1      D2     D3
C1                  1,1       1,1       0,1      0,2
C2      1,1                   1,1       0,1                0,2
C3      1,1       1,1                   0,1                        0,2
D0      1,0       1,0       1,0       \epsilon,\epsilon     \epsilon,\epsilon     \epsilon,\epsilon    \epsilon,\epsilon
D1      2,0                               \epsilon,\epsilon
D2                  2,0                   \epsilon,\epsilon
D3                              2,0       \epsilon,\epsilon

Inspection will reveal that any Nash equilibrium cannot have any C* strategies in its support and thus gives always utility \le \epsilon to both players.   In contrast, a uniform distribution on the 1,1 entries is a correlated equilibrium that gives utility 1 to each player.  Thus while the regular price of stability is infinite, the correlated price of stability is 1.  So, the optimists (like me) who prefer looking at the price of stability may view the notion of  a correlated equilibrium as an opportunity to get better results.  On the other hand, the pessimists who look at the price of anarchy may be concerned that it may bring worse results.  To ease these fears, a very recent paper, “Intrinsic robustness of the price of anarchy” by Tim Roughgarden  shows that for a wide class of games, including the congestion games usually studied in the context of the price of anarchy, this does not happen — the correlated price of anarchy turns out to be no worse than the regular price of anarchy.

Implementation

The most serious concern with the notion of correlated equilibrium is its implementation — how can the players get a joint correlated probability distribution.  The definition of the notion assumes a fictional trusted third party termed a mediator.  But how can the correlated equilibrium be reached in normal circumstances, without a mediator?  One may devise various “correlation devices” to replace the fictional mediator.  For example, in the chicken example above a hat and three balls will do the trick: two balls labeled “chicken” and one ball labeled “dare” are thrown into a hat (under the joint supervision of Alice and Bob) and Alice and Bob each takes one ball from the hat (without looking).  Notice that each of them draws “dare” with probability 1/3 — and they never do so simultaneously — exactly our correlated equilibrium!

Can such a correlation device be implemented in general?  Can an equivalent device that works over computer communication networks be implemented?  Both the game-theory community and the cryptography community worked on these questions using somewhat different notions and terminology.  Game theorists considered adding a pre-round of cheap talk to the game where the parties may communicate with each other.  From the computer science perspective this is just a special case of secure function evaluation: there are general cryptographic techniques that enable any number of parties to emulate any trusted mediator without really having such a mediator and without trusting each other.  There are a variety of models and results here, but the basic ones in our setting allow (1) making standard cryptographic assumptions, any correlated equilibrium in any game with any number of players can be implemented securely in a computational sense (2) any correlated equilibrium in any game with at least 3 (or 4, depending on the exact model) players can be implemented securely in an information-theoretic sense (but not in 2-party games).  For more information see chapter 8 of the algorithmic game theory book.

Computation

The first nice thing about correlated equilibria is that they can be efficiently found.  Just look at the inequalities that define them above: they are all linear inequalities in the p_{ij}‘s: for every i,i'\sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j},  similar inequalities for Bob, and further linear inequalities specifying that p is a probability distribution.  Thus we can find a correlated equilibrium using linear programming and even optimize an arbitrary linear function over correlated equilibria, e.g. find the one with highest social welfare.  This is in stark contrast to Nash equilibria where  finding an arbitrary one is PPAD-complete and optimizing social welfare over them is NP-hard.

The LP algorithm for correlated equilibria directly applies to games with any number of players and its running time is polynomial in the size of the game (the descriptions size of the utility functions of the players, which is the product of the players’ strategy-space sizes).  The situation is actually even better and we can often find correlated equilibria of exponential-size games in polynomial time.  Specifically, using the Ellipsoid algorithm, there are cases where it is possible to find a correlated equilibrium in time that is polynomial in the length of the representation of the strategies of the game.  For this to make sense there needs to be a succinct representation of the game, where given a representation of players’ strategies, their utilities can be easily determined.

Learning

How can an equilibrium of a game be practically reached?  Is there a natural process that will lead to a state of equilibrium?  A process where the players will “learn” what they should play?  There have been many different attempts and models to define processes that converge to equilibria.  A simple setting considers players repeatedly playing a game, where at each stage they act according to some given “learning” rule (which is intuitively “somewhat rational” but not really strategic) that depends on the history of the game as well as on their own utility function (but usually not on the other players’ utility functions — this is called uncoupled dynamics).

A simple example is repeated-best-reply: in every round you play your best reply to the strategies played by the other players in the previous round.   If this process converges, then it will be to a pure Nash equilibrium.  However, it may in general fail to converge even if a pure Nash equilibrium exists and certainly will not converge if the game lacks pure Nash equilibria.  There are however interesting special cases where this process does converge, specifically potential games and games solvable by iterated elimination of dominated strategies (in the latter case even in a strong asynchronous sense).  A natural attempt for converging to a mixed Nash equilibria is called fictitious play: in every round you play your best response to the other players’ mixed strategies defined according to the empirical historical distribution of their strategies. This process again is not guaranteed to converge and may loop indefinitely.  In fact, there is no natural process that is known to converge to a Nash equilibrium that is not essentially equivalent to exhaustive search; this shouldn’t come as a surprise since such a process would likely imply an efficient algorithm whose existence we doubt.  There are however natural processes that do converge to correlated equilibria.  These are based on regret minimization procedures. Below is a minimal description of these types of learning algorithms; for a comprehensive exposition as well as historical context see chapter 4 in the algorithmic game theory book, or this talk by Avrim Blum.

Here is the basic framework: we need to design a “learning algorithm” A that chooses one out of N possible actions, and does this repeatedly for T steps.  For each possible action i \in \{1...N\} and time 0 < t \le T there is some utility 0 \le u_t(i) \le 1.  At time t the algorithm must choose an action i_t before seeing the current utilities u_t(i) but rather only based on previous utilities.  The goal is to maximize the total utility obtained: U_A = \sum_t u_t(i_t), where i_t is the action A took at time t.  The main challenge here is the total lack of information regarding the utilities u_t(i) which need not have any relation to the utilities at previous times.  I.e. the model allows u_t(i) to be chosen by an adversary, and furthermore to depend arbitrarily on the actions chosen previously by our learning algorithm, i.e. on i_1 ... i_{t-1}.   Since such a setting is hopeless we will not attempt comparing the performance of our algorithm to the posterior optimum, but just to some family of alternative algorithms (a family that may be defined as possible variants of the original A).  What we need in this context is called low swap (or internal) regret.   Let F : \{1...N\} \rightarrow \{1...N\}, we say that B is obtained from A by swap F if whenever A takes action i, then B takes action F(i).

Definition: A that runs for T steps has swap regret R if for for every algorithm B that is obtained from A by some swap, we have that U_A \ge U_B - R.

It is not difficult to see that deterministic algorithms cannot have “low” swap regret, but it turns out that randomized algorithms may. A randomized algorithm may choose its action i_t according to a probability distribution p^t, (where each i is chosen with probability p^t_i).   There has been much work on designing learning algorithms with low regret with the end result being efficient randomized algorithms that almost surely achieve swap regret R=O(\sqrt{NT\log N}), the important point being that R << T.  (For a weaker notion, external regret, the dependence on N can be made logarithmic, but this is not known for internal regret).   This post will not go into how these algorithms are constructed, but just why they are what we need.

So far all this regret-minimization stuff was in a decision-theoretic framework — just a single decision maker.  Now comes the key switch to games: what if each player in a game plays such a regret minimization algorithm?   Let us be more specific, and as above, consider two players, even though everything holds in general.  Suppose that Alice and Bob play the game for T rounds, each time choosing their strategy (action) in the game according to a regret minimization algorithm.  At each time t Alice plays i_t and Bob plays j_t and thus Alice sees utility vector u_t(i)=a_{i,j_t} and Bob sees utility vector u^{Bob}_t(j)=a_{i_t,j}.

Lemma: Assume that Alice and Bob both play an algorithm for T steps with swap regret R then the uniform distribution on (i_t,j_t) over all 1 \le t \le T is \epsilon-close to a correlated equilibrium (in L_1 norm), where \epsilon=R/T.   I.e., let p denote this distribution then for every i there exists \epsilon_i such that for all i', we have that \sum_j p_{ij} a_{ij} \ge \sum_j p_{ij} a_{i'j} - \epsilon_i, with \sum_i \epsilon_i = \epsilon (and similarly for Bob).

To see why this is true, let us look at Alice’s optimal strategy when the trusted mediator tells her that i was drawn according to this distribution. Certainly, Alice’s best response is to play the strategy i' that maximizes \sum_j p_{ij} a_{i'j} — we can denote the mapping from each i to its optimal i' by the function F(i), and we define \epsilon_i to be the gap needed for i'=F(i) , \epsilon_i = \sum_j p_{ij} a_{F(i)j} - \sum_j p_{ij} a_{ij}.  Now let us look at the expression for the regret of A relative to the swap F:   U_B - U_A = \sum_t u_t(F(i_t)) - \sum_t u_t(i_t) which we know is bounded by R.   Given that by definition u_t(i) = a_{i,j_t}, and that p_{ij} is simply 1/T times the number of times t for which i_t=i and j_t=j, we see that this expression is exactly equal to the the sum of the \epsilon_i‘s which completes the proof of the lemma.

Read Full Post »

Older Posts »