Feeds:
Posts

CACM article on Mechanism Design and CS

The August 2010 issue of CACM published an article by Gary Anthens on Mechanism Design Meets Computer Science.

Yu Jiajin reports on Shanghai AGT Summer School

Yu Jaijin reports on the GAME 2010 summer school in AGT that was held in Shangai early this month:

The theory group at Fudan University hosted a Summer School on Algorithmic Game Theory (GAME 2010) in Shanghai from July 4th to July 15th. This was a greatly successful event that attracted more than sixty participants from China, Europe, other Asian countries, and the US. Most of them were students, but some postdocs and researchers attended as well.  Nine invited speakers gave great lectures about various aspects of algorithmic game theory, and several participants presented their own research results. In addition to the enlightening lectures, the school organized an exciting excursion to EXPO 2010 and a nice banquet in a gorgeous Chinese garden restaurant. The complete schedule can be found at http://www.tcs.fudan.edu.cn/game10/program.html.

The following topics were covered in the summer school:

• Avrim Blum and Rudolf Fleischer started the summer school with an introduction into game theory and mechanism design, introducing the main concepts and giving some first examples of games and mechanisms.
• Andrei Border talked about the fascinating world of computational advertising by presenting some of its key features and comparing it to traditional advertising. His presentation included many vivid examples and figures from real world applications (like retrieving good ads in the sponsored search auction and the guaranteed delivery of display ads), and various kinds of research behind it (IR, optimization, etc.).
• Stefano Leonardi’s lectures were mostly related to the design of approximation algorithms for truthful mechanisms (like cost sharing mechanism). He taught how to adapt classic approximation algorithms to truthful ones in the utilitarian mechanism design, and presented his recent work about Pareto-optimal combinatorial auctions whose motivation came from Google TV ads.
• Kamal Jain first talked about Internet Economics. He believed search engines collected intentions from users and sold them to advertisers, which were on an atomic scale, while in traditional advertising aggregate interests are sold. He then presented two of his interesting recent research results. The first one was how to decide the existence of a matching in some lopsided bipartite graph, coming from the context of matching targets for display ads. The other one was about market equilibria in a market where agents are both consumers and producers of the content that belongs to one of several categories.
• Avrim Blum gave lectures about game theory, mechanism design, machine learning, and pricing problems. He first introduced the concept of internal/swap regret and showed that if players use no internal-regret strategies, the empirical distribution of the game will be a correlated equilibrium. He then talked about two interesting applications of machine learning techniques for designing incentive compatible mechanisms for revenue maximization in unlimited supply settings. The first one concerned the online digital good auction: how to adapt pricing for a single digital good to achieve performance matching the best fixed price in hindsight. He also showed how to use ideas from machine learning to convert batch optimization algorithms into incentive-compatible mechanisms with performance close to the best pricing function in some class of pricing functions. Finally, we also talked about about some interesting algorithmic pricing problems in this setting as well.
• Eva Tardos’s lectures were mostly related to the price of anarchy. She first gave a very gentle introduction of congestion games in atomic and non-atomic settings. Then she taught a technique called (λ, μ) – smoothness that can prove stronger bounds of PoA for non-atomic games. She also presented the work about some multiplicative update learning process that will converge to a weakly Nash equilibrium in polynomial time. At last she talked about a recent work of PoA in GSP auction. Using some combinatorial structures in the auction, they could show a constant bound of PoA of pure Nash, mixed Nash, and Bayesian-Nash in GSP auctions.
• Xiaotie Deng gave a proof sketch of the celebrated result concerning the PPAD-completeness of finding mixed Nash equilibria in 2-player game. After introducing the PPAD class, he showed the reduction in the following way: 2D Another End of Lines => 2D Sperner Problem => 2D Discrete Fixed Point Problem => 2-player Nash Equilibrium Problem.
• Ning Chen covered several important known results on stable matchings and its Pareto-optimal extensions, including the GS algorithm, the Roth-Vande Vate Algorithm that solved Knuth’s local search problem, and some extensions like having partial orders or ties in the preference lists. Ning Chen and Xiaotie Deng then presented together their recent work about market equilibria in the advertising business.  They gave a polynomial time algorithm that found the minimum market equilibrium price where agents had budget constraints.

Reputation for Human Computation

Like many, I am fascinated by the notion of “Human Computation“: algorithms that use black-boxes that are implemented by actual humans.  The fascination is probably mostly due to the “inverse”  interface between humans and machines: what we are used to is humans using black-boxes that are implemented by computers; now the roles are switched.   Add to this the unexpected effectiveness of this practice and one can certainly start musing about practical possibilities, various philosophical-leaning directions, as well as actual research questions.

Amazon’s Mechanical Turk brings this possibility into the hands of many, and in particular I have been hearing more and more researchers who are attempting to use it for running experiments on human subjects.  Many questions regrading the validity, ethics, and administration of such experiments suggest themselves, and these questions are starting to be addressed.

I have been following with interest Panos Ipeirotis’s stream of blog posts about the Mechanical Turk , the last of which suggests that the lack of a sufficient reputation mechanism is leading to the failure of the Mechanical Turk labor market.  In other words, “spammers” abound and are hurting the proper functioning of the market.  Building a sufficiently good reputation mechanism would seem to require the proper blend of dealing with the humans and the computers, a blend that may be especially interesting due to the “inverse” roles of humans and machines in this setting.

AGT Workshop in Tehran

On July 29, 2010, there will be an AGT workshop in Tehran.

Online Dating

Hat tip to Al Roth for pointing out an interesting clip of Dan Ariely talking about online dating.

The world of online dating is one of the most significant demonstrations of how much the Internet has changed our lives.  Finding a spouse (or even just sex) is one of the most basic human activities and every culture has put much effort into structuring it just so.  Yet, within a very short time, online dating has captured a huge share of this deeply socially rooted “market”.  I would say that this is more due to the deficiencies of the current “offline” models than due to the strength of  the current online systems. Maybe it is time for some new models of online dating?

IPAM workshop on AGT

The institute of Pure and Applied Math (IPAM) at UCLA is having a workshop on Algorithmic Game Theory on January 10-14, 2011 with an impressive list of confirmed speakers.  Registration by November 15th is recommended.

Regret in Markets

How do markets and other games reach equilibrium?  Or at least, how can they do so? A nice answer will have each participant in the market or game repeatedly performing some simple update step that only depends on information available to him and that makes sense from his local point of view, with the whole system then magically converging to equilibrium.  The meanings of “simple”, “update”, “information available”,  “makes sense”, and “converging” may vary, but the idea is to capture the basic phenomena of a global equilibrium arising from simple local optimizations.

In this post I will describe some recent work that shows this kind of phenomena where local “regret-minimization” optimizations lead to global convergence in a natural market model.

Regret and Online Convex Optimization

We start by looking at an online optimization strategy for a single decision maker that interacts with an adversarial environment.  It will be nicest to use the general and clean model of  online convex optimization due to Zinkevich.  This model considers an sequence of functions $u^t : S \rightarrow \Re$ for some set $S$. The rules of the game require the algorithm at every time $t$, for $t =1,2,...T$, to pick a point $x^t \in S$ with the aim of  maximizing $u^t(x^t)$, and do so before seeing the function $u^t$, but only based on the history, i.e. on the functions $u^1 ... u^{t-1}$.  Given that we are not assuming any connection between the different $u^t$‘s, even not a stochastic one, this may seem hopeless, but our yardstick for success will be a bit modest as well: we only want to minimize our regret relative to the best constant point.  I.e.

Definition: A sequence of points $x^1 ... x^T$ has regret $R$ on the sequence of functions $u^1 ... u^T$ if for every $x^* \in S$ we have that $\sum_{t=1}^T u^t(x^t) \ge \sum_{t=1}^T u^t(x^*) - R$.

The main point is that there exist online algorithms that always find such low-regret sequences when $S$ is any compact convex subspace of a Euclidean space and each $u^t$ is concave (hill shaped).  Zinkevich presents a simple online algorithm that ensure $R = O(\sqrt{T})$ for any sequence of such functions, and other ones can do even better.  Zinkevich’s learning algorithm is simple gradient ascent: You start with an arbitrary $x^1$ and then each $x^t$ is obtained by taking a small step of size $\delta_t$ from $x^{t-1}$ in the direction of steepest ascent of the function $u^{t-1}$ (and , if needed, projecting back to $S$).  Intuitively you prepare for the future by only slightly modifying what you did in the past, in the direction that would have been best then.  Amazingly, by choosing $\delta_t = 1/\sqrt{t}$, one gets $R = O(\sqrt{t})$ for any sequence of concave functions (where the hidden constant in the big-O notation is quadratic in the diameter of $S$ and in the maximum steepness of the $u^t$‘s).  In particular, the average loss, $R/T$ approaches 0.  This model and these results generalize the well-studied case where the $u^t$‘s are linear functions. In particular, a leading application to have in mind is where the convex set $S$ is the set of mixed strategies, i.e. distributions, over a finite set of $m$ pure strategies, and the utility is the linear function giving the expectation of the utility of the chosen pure strategy.

Coarse Equilibria in Games

So far we were in a decision theoretic model of a single decision maker.  Now, what happens when several different decision makers are using such a regret-minimization procedure together?  Formally, we have an $n$-player  game where players have convex strategy sets $S_i$ and utility functions  $u_i : S_1 \times ... \times S_n \rightarrow \Re$.  The players play this game repeatedly, at each time $t$, each player $i$ chooses a strategy $x_i^t$.  Looking at this from player $i$‘s point of view, at each time $t$ he is presented with function of his $x_i$, where the function is determined by the values of the other players’ choices at time $t$, i.e. for $z \in S_i$ we see the function $u_i^t(z) =u_i(z, x_{-i}^t)$.  If each of the players uses a regret minimization algorithm then they will generate together some sequence of strategy profiles $x^1 ... x^T$, where each $x^t=(x_1^t ... x_n^t)$.  Will these approach some kind of equilibrium from a game theoretic perspective?  I heard from Eva Tardos the following term for this:

Definition: Let $D$ be a distribution on the strategy profiles $S_1 \times ... \times S_n$.  We say that $D$ is a coarse equilibrium (or Hannan consistent) if for every player $i$ and every $x^*_i \in S_i$ we have that $E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})]$, where $E$ denotes expectation.  We say that it is an $\epsilon$-coarse-equilibrium if $E_{x \sim D} [u_i(x)] \ge E_{x_{-i} \sim D_{-i}} [u_i(x^*_i,x_{-i})] - \epsilon$.

Intuitively, we can think about a trusted coordinator who picks a profile of strategies $x_1 .. x_n$ according to the distribution $D$ and “suggests” it to the players.  $D$ is a coarse equilibrium if all players prefer to follow these suggestions, rather than play any fixed strategy that may only depend on the distribution $D$ and not on the chosen recommendation.  This is strictly weaker than correlated equilibrium in which players should prefer following suggestions to choosing strategies whose identity may depend on  their suggested strategy $x_i$.  This type of equilibrium is what the definition of low regret implies directly:

Proposition: If all players have regret $R$, (i.e. for each $i$ and each $x^*_i$, we have that $\sum_{t=1}^T u_i(x_i^t,x_{-i}^t) \ge \sum_{t=1}^T u_i(x^*_i,x_{-i}^t) - R$,) then the uniform distribution on $\{x^1 ... x^T\}$ is an $\epsilon$-coarse equilibrium for $\epsilon = R/T$.

Let us just look what this implies for finite games in which players have a finite set of pure strategies.  To apply the results form online concave optimization, the regret algorithms will be run over the set of mixed strategies, which is a compact convex set.  In this case the utility for a profile of mixed strategies is the expected utility of the chosen pure strategy profile. Thus once $x^t_{-i}$ is fixed, the utility function on $x_i$ is linear and thus concave so the regret minimization algorithms apply here and yield a $\epsilon = O(1/\sqrt{T})$-coarse equilibrium in time $T$.  By direct definition this is a $\epsilon$-coarse equilibrium of the mixed game, i.e a distribution over (the $T$) profiles of mixed strategies.  However, this distribution may also be viewed as a distribution on the underlying pure strategies so we have also achieved an $\epsilon$-coarse equilibrium of the pure game.   It follows that for every finite game, if each player runs a regret minimization algorithm on the set of mixed strategies then we get a coarse equilibrium.

Socially Concave Games

Just the fact that we have called what is reached (in the sense of average over time) an equilibrium does not mean that we have to like it from a game theoretic sense.  Could we get a “normal” equilibrium from such corse equilibrium?  Even-Dar, Mansour, and Nadav point out a family of games, termed socially concave games, where this is the case.

Definition: An $n$ player game is called socially concave if it satisfies the following two properties:

• For each $i$ and each fixed value of $x_i$, the function $u_i(x_i, x_{-i})$ is convex in $x_{-i}$.
• The sum $\sum_{i=1}^n u_i(x)$ is a concave function of $x$.

Theorem: Let $D$ be a coarse equilibrium of a socially concave game with convex strategy sets, and denote the average value of of elements drawn according to this distribution by $\hat{x} = E_{x \sim D} [x]$ (so by convexity of the $S_i$‘s also $\hat{x}_1 \in S_1 ... \hat{x}_n \in S_n$) then $(\hat{x}_1 ... \hat{x}_n)$ is a (pure) Nash equilibrium.  If $D$ is $\epsilon$-coarse equilibrium then $(\hat{x}_1 ... \hat{x}_n)$ is an $n\epsilon$-Nash equilibrium.

This theorem ensures a pure equilibrium in the game with convex strategy sets.  Looking at the case that the game is the mixed extension of a finite game, this is translated to a mixed equilibrium of the finite game.  Before we prove the theorem let’s look at a simple application: two-player zero-sum finite games.  Looking at the mixed extension we have $u_1(y,z) = -u_2(y,z) = y^TAz$, where, $y$ and $z$ are the mixed strategies of the two players and $A$ is the $m_1 \times m_2$ matrix of utilities of player 1.  This game satsifies the two conditions since for any fixed $y$ we have that $u_1$ is a linear function (hence convex) of $z$, dualy for player 2, and the sum $u_1(y,z)+u_2(y_z)=0$ is constant and thus concave.  This implies that if the two players in a zero-sum game run a regret minimization algorithm (on the mixed strategies) then taking the average of the distribution over time is a ($\epsilon=O(1\sqrt{T})$-)mixed Nash equilibrium.

Proof: We will prove the exact version, the quantative is identical, only taking care of the $\epsilon$‘s everywhere.  What we need to show is that for every $x^*_i$, it holds that $u_i(\hat{x}_i,\hat{x}_{-i}) \ge u_i(x^*_i,\hat{x}_{-i})$.  We will show this by showing the same inequality for the sum: $\sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge \sum_i u_i(x^*_i,\hat{x}_{-i})$.  Notice that for the optimal $x^*_i$ (the one maximizing $u_i(x^*_i,\hat{x}_{-i}$) we trivially have $u_i(\hat{x}_i,\hat{x}_{-i}) \le u_i(x^*_i,\hat{x}_{-i})$, and thus the inequality of the sum at the optimal $x^*_i$‘s implies the inequality (in fact equality) for each term seperately.  So we start with $\sum_i u_i(\hat{x}_i,\hat{x}_{-i})$ and make use of the concavity of the sum and the definition of $\hat{x} = E[ x]$  to conclude $\sum_i u_i(\hat{x}_i,\hat{x}_{-i}) \ge E[\sum_i u_i(x_i, x_{-i})]=\sum_i E[u_i(x_i, x_{-i})]$ (using Jensen’s inequality.)  At this point we invoke the definition of coarse equilibrium for each $i$ getting $E[u_i(x_i, x_{-i})] \ge E[u_i(x^*_i,x_{-i})]$, and now using the convexity of $u_i$ in $x_{-i}$ (once $x^*_i$ is fixed) we get $E[u_i(x^*_i,x_{-i})] \ge u_i(x^*_i,\hat{x}_{-i})$, chaining these inequalities we get the required inequality on the sums.

A Market Equilibrium

We can now take this general result and apply it to a rather general market model.  This  is due to Sergiu Hart and Andreu Mas-Colell.  (When I asked Sergiu how should I refer to this result, he said something like “had we been computer scientists this would have already been published in a conference; since we are economists these are preliminary thoughts that maybe will find their way into a paper within a few years”.)

Our market has $m$ divisible goods and $n$ players.  Each player $i$ has  a concave valuation function $v_i(x_1 ... x_m)$ that specifies his value for owning the bundle of goods with $x_j$ fraction of each good $j$.  Each player $i$ has an initial endowment of $w_{ij}$ of each good $j$, and they may trade between themselves, reaching some final allocation, where each player $i$ receives a fraction $x_{ij}$ of each good $j$, where for each good $j$ we have that $\sum_{i} w_{ij} = \sum_{i} x_{ij}$.  Significantly, we assume the existence of money that can be transferred between the players, formally “transferable utilities” or “quasi-linearity”.   In this model the notion of a market equilibrium is:

Definition: The allocation $(x_{ij})$  is at (transferable utility) equilibrium with prices $(p_1 ... p_m)$ if for all goods $j$ we have that $\sum_{i} w_{ij} = \sum_{i} x_{ij}$ and for every player $i$, his net utility, $v_i(x_{i1}...x_{im})-\sum_{j} (x_{ij}-w_{ij})p_j$ is maximized (over all possible bundles $(x_{i1}...x_{im})$).

We now turn this market into a game.  For this, we will require the minor technical assumption that the partial derivatives of the $u_i$‘s are bounded  from above by $M$ and from below by $1/M$ for some constant $M$.  The game will have $n+m$ players: $n$ players that correspond to the original players in the market, with the strategy set of player $i$  being the set of  vectors $(x_{i1}...x_{im})$ with $0 \le x_{ij} \le \sum_i w_{ij}$, and $m$ players corresponding to the goods, where the strategies of “good-player” $j$ are the possible prices $0 \le p_j \le M$.  The utility of an “original-player” $i$ is, as expected,  $u_i(x,p)=v_i(x_{i1}...x_{im})- \sum_{j} p_j(x_{ij} - w_{ij})$, while those of  “good-player” $j$ is: $u_j(x,p) = p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij})$.  The point is that this new game captures, as its Nash equilibrium, the equilibrium of the previous market.

Proposition: The allocation $(x_{ij})$ with prices $(p_1 ... p_m)$ is an equilibrium of the market if and only if  it is a pure Nash equilibrium of the game defined above.

Proof: First notice that being a market equilibrium directly implies that all “original” players best-reply in the associated game.  It also implies that  $\sum_{i} w_{ij} = \sum_{i} x_{ij}$ and thus any $p_j$ is a best-reply for “good-player” $j$ in the game.  The opposite direction is identical once we show that any pure Nash equilibrium of the game must have $\sum_{i} w_{ij} = \sum_{i} x_{ij}$.  Suppose that this was not the case, say $\sum_{i} x_{ij} > \sum_{i} w_{ij}$ for some good $j$, then the only best response for “good-player” $j$ would be to declare $p_j=M$, to which all other players could only best-respond with $x_{ij}=0$ due to the bound from above on the derivative of the $u_i$‘s, contradicting  $\sum_{i} x_{ij} > \sum_{i} w_{ij}$.  Similarly if $\sum_{i} x_{ij} < \sum_{i} w_{ij}$ then the only best response for the “good-player” $j$ would be $p_j=0$, to which all other players could only best-respond with $x_{ij}=1$ due to the bound from below on the derivative the the $u_i$‘s, contradicting  $\sum_{i} x_{ij} < \sum_{i} w_{ij}$.

At this point it is easy to see that the game we just defined is socially concave: for an “original” player $i$, and every fixed $(x_{i1}...x_{im})$, his utility is just a linear function of the $p_j$‘s and thus convex in the strategies of the others; for a “good-player” $j$ and every fixed value of $p_j$, his utility is just a linear function in the $x_{ij}$‘s so again is convex in the strategies of the others.  Finally, summing up the utilities of all players, the $p_j \cdot (\sum_i x_{ij} - \sum_i w_{ij})$ terms cancel out and we are left with $\sum_i v_i(x_{i1}...x_{im})$ which is a concave function being the sum of concave $v_i$‘s.

So we now close the loop: to effectively find an $\epsilon$-equilibrium of a given market, we let each player in the market as well as artificially created  “good-players” run a regret-minimization algorithm for a while.  This is a very simple decentralized process: the original players attempt optimizing their demands to the sequence of ever-changing price vectors that they see, while the “good-players” optimize their prices to the changing aggregated demands that they see.  In both cases, not by a simple best-response but rather by the regret minimization response.  We then have that the long-term averages of demands and prices will be the desired market equilibrium point.