Feeds:
Posts

## Bayesian Computer Scientists

I spent most of last week in the Bertinoro workshop on Frontiers in Mechanism Design organized by Jason Hartline and Tim Roughgarden.  Such a small focused event is really a winning format (when done right — of course): almost every talk was very close to my interests and was really good (since the speakers presented their preferred recent work which usually was also accepted to some other top conference).

One of my take-homes from this event was the strengthening of the realization that computer scientists are doing more and more Bayesian analysis  in Algorithmic Mechanism Design, in this respect getting closer to the economists’ way of thinking.  It is not that computer scientists have lost their basic dislike of “average case analysis”, distributional priors, or especially, common priors, it is just that we have started reaching in more and more places the limits of “worst case” analysis.  It seems that computer scientists are being very careful with “how much” they rely on Bayesian analysis, obtaining various “hybrid” results that are more robust, in various senses, than straight Bayesian ones.

An extreme good example is the classic result of economists Jeremy Bulow and Paul Klemperer who show that revenue of the straight 2nd price auction (for a single item) with $n+1$ bidders always dominates the revenue of Myerson’s optimal auction with $n$ bidders.  Only the analysis is Bayesian: the resulting 2nd price $n+1$-bidder auction is completely “worst-case” — neither the auctioneer, nor the bidders must have any knowledge or agreement on the underlying distribution.   In spirit, such a result is similar to Valiant’s prior-free learning where the analysis is with respect to an underlying distribution even though the learning algorithms cannot depend on it.  A recent paper Revenue Maximization with a Single Sample by Dhangwatnotai, Roughgarden and Yan (to appear in EC 2010) gets more general results in this prior-independent vein, although in an approximation sense.

A weaker type of result, but still better than full-blown Bayesian, is the 2nd price version of Myerson’s auction.  In this version, the auctioneer must “know” the underlying distribution in order to set the optimal reserve price, but once that is done, the bidders see a “worst-case” situation in front  of them(2nd price with reserve) and should bid truthfully in the dominant strategy sense without needing to know or agree about the underlying prior distribution.  (This is opposed to the revenue-equivalent-in-principle 1st price version in which bidders must know and agree on a common prior for them to have any chance of reaching he desired equilibrium.)  A recent paper Multi-parameter mechanism design and sequential posted pricing by Chawla,  Hartline,  Malec, and Sivan (to appear in STOC 2010) gets similar types of results in a unit-demand heterogeneous auction setting where the auctioneer needs to know the distribution in order to set prices (in this case, posted prices) but the resulting mechanism is very simple and truthful in the dominant-strategy sense (again the approximation guarantee is in an approximation sense).

A yet weaker version of a similar type of result appears in the paper Bayesian Algorithmic Mechanism Design by Jason D. Hartline and Brendan Lucier (to appear in STOC 2010).  In this paper again the auctioneer does need to know the underlying distribution, and then he creates a mechanism that is incentive compatible, but here only in the Bayesian sense.  I.e. the the bidders need not know the underlying distribution (as they should just act truthfully) but they should still agree that the auctioneer knows the prior distribution.  This type of result is more fragile than the previously mentioned ones since the  truthfulness of the auction depends on the auctioneer correctly knowing the underlying distribution, rather than just the optimality of it.  On the plus side, the paper shows that the auctioneer’s derivation of the auction can be done effectively just by using a “black box” for sampling the underlying distribution (as is the case for he derivation of Myerson’s optimal reserve price).

A someone dual situation is presented in the paper Price of Anarchy for Greedy Auctions by Brendan Lucier and Allan Borodin (SODA 2010).  In that paper, auctions are presented in which the auctioneer need not know the underlying auction and acts in “detail-free” way, i.e. the auction is independent of the underlying distribution.  However,  the approximate-optimality holds when the bidders are in a Bayesian equilibrium, i.e the bidders must know and agree on a common prior for the analysis to hold.

The last example of “somewhat-Bayesian” results that comes to mind has nothing to do with incentives but is just algorithmic.  The paper The Adwords Problem: Online Keyword Matching with Budgeted Bidders under Random Permutations by Nikhil Devanur and Thomas Hayes (in EC 2009) considers an online model of repeated auctions, where no distributional assumptions are made on the bidder values that are assumed to be “worst case”, but a distributional assumption is made on the order or arrival which is assumed to be uniform.  This allows them to get arbitrarily good approximations, circumventing the known lower bounds for the completely worst case.

While economists too have looked at weakening of the fully Bayesian assumptions, as computer scientists are doing now, I see a difference in the two approaches.  Harsanyi‘s influence on economic thinking seems to be so great that the Bayesian point of view seems to be taken as the correct model by economists, and even its criticisms (cf. the Wilson critique) and results that weaken the assumptions are simply taken as “2nd order” improvements.  Computer Scientists, on the other hand, seem to have their basic intellectual foundation in a non-Bayesian setting, and as they move toward Bayesian models they do not seem to have a single model in mind but are rather quite happy to explore the “Pareto frontier ” between the strength of the model and the strength of the results obtained.

Finally, as a side note, let me observe that while the gap between computer scientists and economists is narrowing on the Bayesian question, the other great gap — the issue of approximation — seems to be as great as ever.  E.g., all the results by computer scientists mentioned in this posts only provide approximate results, while all those by economists provide exact ones.

Facebook has announced a pretty nice graduate student fellowship program.  The first three reserach areas they mention are:

• Internet Economics: auction theory and algorithmic game theory relevant to online advertising auctions.
• Cloud Computing: storage, databases, and optimization for computing in a massively distributed environment.
• Social Computing: models, algorithms and systems around social networks, social media, social search and collaborative environments.

(Hat tip to TechCrunch.)

Google has its own graduate Fellowship program (Nicolas Lambert got the 2009 Fellowship in “Market Algorithms”.)

## My $1/day Adwords Account Google gives its employees$1/day of free adwords advertising.  Beyond an employee perk, this gives Google’s employees the experience of being an Internet advertiser, i.e. experiencing the point of view of Google’s paying customers.  I have been using my $1/day account to advertise the divorce-consulting business of my sister in law (In Hebrew) and did indeed find this experience to be quite illuminating. The first thing I learned is that the ad auction itself is just a small part of the whole thing. Choosing the right text for the ads (all ten words of it), choosing the right keywords to target, etc, is much more prominent than setting the right bids. ”Tiny” issues come up everywhere, e.g. when I needed to choose keywords to target, it turns out that there are four ways to write divorce in Hebrew: גירושים, גירושין, גרושין, גרושים. I’m not really sure whether these are all “kosher” spellings, but they are all searched for an do need to be taken into account. Even more, the whole advertising campaign is peripheral, in principle, to the business itself, and frankly the auction logic is not the first concern of the advertiser. This is obvious, of course, but is easy to forget for one whose work focuses on the auction logic. Now for the auction itself: all together I spent several hours setting up the campaign, making up the ads, choosing keywords, looking at reports, and trying a bit of optimization. The adwords user interface was very easy and convenient to start with, but it didn’t take long until I was attempting things that confused me (e.g. splitting my single campaign into two different ones), at which point I gave up, and stayed with what I achieved, which is quite fine actually. I was especially impressed with Google’s automatic suggestions of keywords which were cleverer than what I came up with (I know that not really, just some learning algorithm, but they were eerily good.) I was surprised and disappointed (as an advertiser, but frankly delighted as a Google employee) by the pretty high prices on the keywords that I targeted: my average cost per click is 88 cents, and this is for pretty low slots, on the average. (Divorce is expensive, it seems, also on the Internet.) This means that my$1 per day suffices for a single click per day, and no more.  I do get this single click almost every day, but have so far been unable to ever get two clicks in one day: neither optimizing by hand, nor letting Google’s automatic bidder do it.  I was professionally insulted by not being able to beat the automatic bidder, but have still not given up on getting an average of more than one click per day for my \$1.  My click through rate is pretty high (compared to my expectations):  0.79%, so I usually get about 150 impressions every day of which one is clicked and results in a visit to the website.  Now these visitors are probably really good leads: not only have they searched for relevant keywords, but they also clicked on a pretty specific ad.  I suppose that if even 1% of them become clients (this is not so little: we are not talking about buying a sweatshirt; this is about handling divorce), then the advertising would be considered quite  profitable even had my sister in law paid for it.  Unfortunately, it is quite hard to gauge whether this is the case: getting and keeping the required statistics is easier to imagine theoretically than to do when you have to handle a small business.  In other words, I haven’t a clue what my valuation of a click is.   (The lack of knowledge of one own’s valuation has been discussed in AGT, but frankly I have not seen really convincing treatment of this issue.)

The set of reports about the performance of the campaign that adwords makes available  is quite impressive, and they are really nicely and simply done, but somehow I still don’t really know how how to optimize my campaign as to get the most and the best customers to the site.  I’m sure that more time on my part, as well as a more data-centric handling of my sister-in-law’s business would improve things, but the difficulty of getting and handling the right data is another lesson that I got from this exercise (again, I knew this theoretically, but now I feel it too).

## Nemmers conference in honor of Paul Milgrom

A few days ago, Nortrhwestern hosted a conference in honor of  Paul Milgrom winning the Nemmers Prize.  (As reported here and here.) The speakers seem to be the A-list of Economists in Mechanism Design, and their talks slides, as well as a video of Milgrom’s talk, are on the web page.  Computer Scientists were not invited, but it seems that Lance Fortnow was still able to tweet from there gems like:

Preston McAfee: Left to their own devices computer scientists would recreate the Soviet Union

Susan Athey: CS still needs to catch up in auction theory (on why Microsoft hires economists)

A large number of AGT researchers work in just three large companies: Google, Microsoft, and Yahoo, mostly working on ad auctions. From two or three data points, it seems that Facebook is now attempting to hire such researchers too (although maybe not in “research” roles). Looking in Facebook’s “careers” page, they are indeed looking for an “advertising auction expert” whose requirements include “Expert knowledge of algorithmic game theory, auction theory, mechanism design and their applications to online advertising” as well as a Ph.D. in CS or OR.

I can well see the potential for Algorithmic Mechanism Design research in Facebook. Facebook’s information about its users is enormous, detailed, and lives within the context of their complex social network. Making good use of this complex information seems to require another level of sophistication relative to today’s ad auctions. Privacy issues are always there, and while Facebook has already had problems in that department, it seems that its users do not consider their “identity” to be too private. I wonder if within a year or two we’ll start seeing research papers coming from Facebook too.

## New papers by Brendan Lucier

In the last month, Brendan Lucier has uploaded three conceptually related papers to the arXiv (two of them with co-authors):

All three papers aim to improve the approximation ratios obtained by computationally efficient mechanisms.  Each of them gives pretty strong and general results, but at a price: relaxing the notion of implementation from the usual one (in CS) of dominant-strategies.  Specifically, the notions used in these three papers are, respectively:

1. “We consider models of agent behaviour in which they either apply common learning techniques to minimize the regret of their bidding strategies, or apply short-sighted best-response strategies.”
2. “We consider relaxing the desideratum of (ex post) incentive compatibility (IC) to Bayesian incentive compatibility (BIC), where truthtelling is a Bayes-Nash equilibrium (the standard notion of incentive compatibility in economics).”
3. “We study the price of anarchy for such mechanisms, which is a bound on the approximation ratio obtained at any mixed Nash equilibrium.”

In general, I like the strategy of relaxing the required notion of implementation or equilibrium in order to get strong and general results.  This is especially true in the context of algorithmic mechanism design for a few reasons (1) The concept used in CS seems to be so strict that simply not enough is achievable (2) The CS point of view is anyway much stricter than that of Bayesian-Nash commonly used by economists (3) much less seems to be needed in practice in order to ensure cooperative behavior.  (E.g.  people just use TCP truthfully despite the clear opportunities for beneficial manipulation.)  The difficulty with the relaxation strategy is that there are many possible ways of relaxing the game theoretic requirements, and it takes some time to really be convinced which notions are most useful, and in which situations.  I would love to see more community discussions on these as well as other notions of relaxed implementation and equilibrium.

## SIEPR & Microsoft Conference On Internet Economics

The Stanford Institute for Economic and Policy Research, jointly with Microsoft, is holding a conference on Internet Economics on September 24-25.  Many papers are available on the web site.

The DoubleClick Ad Exchange is a real-time marketplace to buy and sell display advertising space. By establishing an open marketplace where prices are set in a real-time auction, the Ad Exchange enables display ads and ad space to be allocated much more efficiently and easily across the web. It’s just like a stock exchange, which enables stocks to be traded in an open way.

There are some existing competitors, but Google’s entry may be game-changing.  Lots of research problems beg themselves.

## An Observation, Truthful Approximation, and the ArXiv

Suppose you have an interesting result that has an easy, almost trivial proof. What is the best way to publish it? Writing a full, formal paper takes too much energy. Besides, a travel to a conference just to give a 5 minutes presentation is an overkill, and journals are just too slow (who reads them anyways?)

The result in question regards the basic issue in algorithmic mechanism design of to what extent does incentive compatibility penalize computationally efficient approximation algorithms.  Shahar observed that known techniques imply that, at least for artificial problems, incentive compatibility may result in an unbounded degradation.

I talked Shahar (who is my just-graduating student) into writing it up and uploading it to the arxiv, here.

I think that the question that Shahar raises (how to “publish” easy stuff), as well as the answer he gives (unbounded price of incentive compatibility), are both interesting (though not really related) — so here they are.

## Auction Algorithm for Bipartite Matching

Undergraduate algorithms courses typically discuss the maximum matching problem in bipartite graphs and present algorithms that are based on the alternating paths (Hungarian) method.  This is true in the standard CLR book as well as in the newer KT book (and implicitly in the new DPV book that just gives the reduction to max-flow.)  There is an alternative auction-like algorithm originally due to Demange, Gale, and Sotomayor that is not well known in the CS community despite being even simpler.  The algorithm naturally applies also to the weighted version, sometimes termed the assignment problem, and this is how we will present it.

Input: A weighted bipartite graph, with non-negative integer weights.  We will denote the vertices on one side of the graph by B (bidders) and on the other side by G (goods).  The weight between a bidder i and a good j is denoted by $w_{ij}$.  We interpret $w_{ij}$ as quantifying the amount that bidder i values good j.

Output: A matching M with maximum total weight $\sum_{(i,j) \in M} w_{ij}$.  A matching is a subset of $B \times G$ such that no bidder and no good appear more than once in it.

The special case where $w_{ij} \in \{0,1\}$ is the usual maximum matching problem.

Algorithm:

Initialization:

1. For each good j, set $p_j \leftarrow 0$ and $owner_j \leftarrow null$.
2. Initialize a queue Q to contain all bidders i.
3. Fix $\delta = 1/(n_g+1)$, where $n_g$ is the number of goods.

While Q is not empty do:

1. $i \leftarrow Q.deque()$.
2. Find j that maximizes $w_{ij} - p_j$.
3. If $w_{ij} - p_j \ge 0$ then
1. Enque current $owner_j$ into Q.
2. $owner_j \leftarrow i$.
3. $p_j \leftarrow p_j + \delta$.

Output: the set of $(owner_j, j)$ for all j.

Correctness: The proof of correctness is based on showing that the algorithm gets into an “equilibrium”, a situation where all bidders “are happy”.

Definition: We say that bidder i is $\delta$-happy if one of the following is true:

1. For some good j, $owner_j=i$ and for all goods j’ we have that $\delta + w_{ij}-p_j \ge w_{ij'}-p_{j'}$.
2. For no good j does it hold that $owner_j=i$ and  for all goods j we have that that $w_{ij} \le p_{j}$.

The key loop invariant is that all bidders, except those that are in Q, are $\delta$-happy.  This is true at the beginning since Q is initialized to all bidders.  For the bidder i dequeued in an iteration, the loop exactly chooses the j that makes him happy, if such j exists, and the $\delta$-error is due to the final increase in $p_j$.  The main point is that this iteration cannot hurt the invariant for any other i’: any increase in $p_j$ for j that is not owned by i’ does not hurt the inequality while an increase for the j that was owned by i’ immediately enqueues i’.

The running time analysis below implies that the algorithm terminates, at which point Q must be happy and thus all bidders must be $\delta$-happy.

Lemma: if all bidders are $\delta$-happy then for every matching M’ we have that $n\delta + \sum_{i=owner_j} w_{ij} \ge \sum_{(i,j) \in M'} w_{ij}$.

Before proving this lemma, we notice that this implies the correctness of the algorithm since by our choice of $\delta$, we have that $n\delta < 1$, and as all weights are integers, this implies that our matching does in fact have maximum weight.

We now prove the lemma.  Fix a bidder i, let j denote the good that he got from the algorithm and let j’ be the good he gets in M’ (possibly j=null or j’=null).  Since i is happy we have that $\delta + w_{ij}-p_j \ge w_{ij'}-p_{j'}$ (with a notational convention that $w_{i,null}=0$ and $p_{null}=0$, which takes care also of case 2 in the definition of happy)  Summing up over all i we get $\sum_{i=owner_j} (\delta + w_{ij}-p_j) \ge \sum_{(i,j') \in M'} (w_{ij'}-p_{j'})$.  Now notice that since both the algorithm and M’ give matchings, each j appears at most once on the left hand side and at most once on the right hand side.  More over if some j does not appear on the left hand side then it was never picked by the algorithm and thus $p_j=0$.  Thus when we subtract $\sum_j p_j$ from both sides of the inequality, the LHS becomes the LHS of the inequality in the lemma and the RHS becomes at most the RHS of the inequality in the lemma.  QED.

Running Time Analysis:

Each time the main loop is repeated, some $p_j$ is increased by $\delta$ or some bidder is removed from Q forever.  No $p_j$ can ever increase once its value is above $C = max_{i,j} w_{ij}$.  It follows that the total number of iterations of the main loop is at most $Cn/\delta = O(Cn^2)$ where n is the total number of vertices (goods+bidders).  Each loop can be trivially implemented in O(n) time, giving total running time of $O(Cn^3)$, which for the unweighted case, C=1, matches the running time of the basic alternating paths algorithm on dense graphs.

For non-dense graphs, with only $m=o(n^2)$ edges (where an edge is a non-zero $w_{ij}$), we can improve the running time by using a better data structure.  Each vertex maintains a priority que of goods ordered according to the value of $w_{ij} - p_j$.  Whenever some $p_j$ is increased, all bidders that have an edge to this j need to update the value in the priority queue.  Thus an increase in $p_j$ requires $d_j$ priority queue operations, where $d_j$ is the degree of j. Since each $p_j$ is increased at most $C/\delta = O(Cn)$ times, and since $\sum_j d_j =m$ we get a total of O(Cmn) priority queue operations.  Using a heap to implement the priority queue takes $O(\log n)$ per operation.  However, for our usage, an implementation using an array of linked lists gives O(1) amortized time per operation: entry t of the array contains all j such that $w_{ij} - p_j = t\delta$, updating the value of j requires moving it down one place in the array, and finding the maximum $w_{ij} - p_j$ is done by marching down the array to find the next non empty entry (this is the only amortized part).  All in all, the running time for the unweighted case is O(mn).

• As shown by DGS, a similar procedure terminates with close to VCG prices, which are also the point-wise minimum equilibrium prices.
• The algorithm was presented for the assignment problem where bidders never desire more than a single item.  It does work more generally as long as bidders are “gross substitutes”.
• The algorithm, like many auctions, can be viewed as a primal-dual algorithm for the associated linear program.
• Choosing a small fixed value of, say, $\delta=0.01$ gives a linear time 1.01-approximation for the maximum matching.
• Choosing the value $\delta = 1/\sqrt{n}$ gives a matching that misses at most $\sqrt{n}$ edges, that can then be added using $\sqrt{n}$ alternating path computations, for a total running time of $O(m \sqrt{n})$.
• Many algorithmic variants were studied by Dimitry Bertsekas.
• A wider economic context appears in this book.