
## Internet Advertising: Theory and Practice

For those AGT/E researchers working on questions related to ad auctions (like most of those employed by Google/Yahoo/Microsoft) and writing nice theoretical papers for workshops like the upcoming “Ad Auctions Workshop”, it is interesting to see an overview of the messy real world, as can be found in the keynote talk given by Terrence Kawaja at a recent event organized by the Interactive Advertising Bureau.

## Bayesian Computer Scientists

I spent most of last week at the Bertinoro workshop on Frontiers in Mechanism Design, organized by Jason Hartline and Tim Roughgarden. Such a small, focused event is really a winning format (when done right, of course): almost every talk was very close to my interests and was really good (since the speakers presented their preferred recent work, which usually had also been accepted to some other top conference).

One of my take-homes from this event was a strengthened realization that computer scientists are doing more and more Bayesian analysis in Algorithmic Mechanism Design, in this respect getting closer to the economists’ way of thinking. It is not that computer scientists have lost their basic dislike of “average case analysis”, distributional priors, or, especially, common priors; it is just that we have started hitting the limits of “worst case” analysis in more and more places. Computer scientists seem to be very careful about “how much” they rely on Bayesian analysis, obtaining various “hybrid” results that are more robust, in various senses, than straight Bayesian ones.

An extremely good example is the classic result of economists Jeremy Bulow and Paul Klemperer, who showed that (for i.i.d. bidders with a regular value distribution) the revenue of the straight 2nd price auction for a single item with $n+1$ bidders always dominates the revenue of Myerson’s optimal auction with $n$ bidders. Only the analysis is Bayesian: the resulting 2nd price $n+1$-bidder auction is completely “worst-case”, and neither the auctioneer nor the bidders need have any knowledge of, or agreement on, the underlying distribution. In spirit, such a result is similar to Valiant’s prior-free learning, where the analysis is with respect to an underlying distribution even though the learning algorithms cannot depend on it. A recent paper, Revenue Maximization with a Single Sample, by Dhangwatnotai, Roughgarden and Yan (to appear in EC 2010) obtains more general results in this prior-independent vein, although in an approximation sense.
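To make the comparison concrete, here is a small Monte Carlo sketch for one special case, assuming i.i.d. values drawn uniformly from $[0,1]$. This distribution is regular, and for it Myerson’s optimal single-item auction is a 2nd price auction with reserve price $1/2$. The function names are illustrative, bidders are assumed to bid truthfully (their dominant strategy here), and a simulation of a single distribution illustrates the theorem rather than proving it.

```python
import random

def second_price_revenue(values, reserve=0.0):
    """Seller revenue in a single-item 2nd price auction with an optional reserve."""
    ranked = sorted(values, reverse=True)
    if ranked[0] < reserve:
        return 0.0                          # nobody meets the reserve: no sale
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return max(runner_up, reserve)          # winner pays max(2nd bid, reserve)

def average_revenue(n_bidders, reserve, trials=100_000, seed=0):
    """Monte Carlo estimate of expected revenue, assuming i.i.d. U[0,1] values."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        values = [rng.random() for _ in range(n_bidders)]   # truthful bids = values
        total += second_price_revenue(values, reserve)
    return total / trials

for n in range(1, 4):
    myerson = average_revenue(n, reserve=0.5)       # optimal auction, n bidders
    plain = average_revenue(n + 1, reserve=0.0)     # no reserve, n + 1 bidders
    print(f"n={n}: Myerson={myerson:.3f}  2nd price with n+1={plain:.3f}")
```

For this distribution the exact expected revenues are $n/(n+2)$ for the plain 2nd price auction with $n+1$ bidders, and $1/4$, $5/12$ and $17/32$ for Myerson’s auction with $n=1,2,3$ bidders, so the estimates should land near those numbers.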

A weaker type of result, but still better than full-blown Bayesian, is the 2nd price version of Myerson’s auction. In this version, the auctioneer must “know” the underlying distribution in order to set the optimal reserve price, but once that is done, the bidders face a “worst-case” situation (a 2nd price auction with a reserve) and should bid truthfully in the dominant-strategy sense, without needing to know or agree on the underlying prior distribution. (This is as opposed to the revenue-equivalent-in-principle 1st price version, in which bidders must know and agree on a common prior to have any chance of reaching the desired equilibrium.) A recent paper, Multi-parameter mechanism design and sequential posted pricing, by Chawla, Hartline, Malec, and Sivan (to appear in STOC 2010) obtains similar types of results in a unit-demand heterogeneous auction setting, where the auctioneer needs to know the distribution in order to set prices (in this case, posted prices), but the resulting mechanism is very simple and truthful in the dominant-strategy sense (again, the optimality guarantee holds only in an approximation sense).
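The reserve price that requires the auctioneer to “know” the distribution is, for a regular distribution $F$ with density $f$, the zero $r^*$ of Myerson’s virtual value $\phi(v) = v - \frac{1-F(v)}{f(v)}$. The sketch below finds it by bisection. It assumes regularity (so that $\phi$ is increasing) and a bracket with $\phi(lo) < 0 < \phi(hi)$; the two example distributions are arbitrary choices for illustration.

```python
import math

def virtual_value(v, cdf, pdf):
    """Myerson's virtual value phi(v) = v - (1 - F(v)) / f(v)."""
    return v - (1.0 - cdf(v)) / pdf(v)

def myerson_reserve(cdf, pdf, lo, hi, tol=1e-9):
    """Find r with phi(r) = 0 by bisection.

    Assumes a regular distribution (phi increasing) and phi(lo) < 0 < phi(hi).
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if virtual_value(mid, cdf, pdf) < 0:
            lo = mid          # the root lies to the right of mid
        else:
            hi = mid          # the root lies at or to the left of mid
    return (lo + hi) / 2.0

# Values uniform on [0, 1]: phi(v) = 2v - 1, so the reserve is 1/2.
uniform_reserve = myerson_reserve(lambda v: v, lambda v: 1.0, 0.0, 1.0)

# Values exponential with rate 1: phi(v) = v - 1, so the reserve is 1.
exp_reserve = myerson_reserve(lambda v: 1.0 - math.exp(-v),
                              lambda v: math.exp(-v), 0.0, 20.0)

print(round(uniform_reserve, 6), round(exp_reserve, 6))
```

Once $r^*$ is fixed, the bidders just face a 2nd price auction with reserve $r^*$, which is where the dominant-strategy truthfulness described above comes from.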

A yet weaker version of a similar type of result appears in the paper Bayesian Algorithmic Mechanism Design by Jason D. Hartline and Brendan Lucier (to appear in STOC 2010). In this paper, again, the auctioneer does need to know the underlying distribution, and he then creates a mechanism that is incentive compatible, but here only in the Bayesian sense. I.e., the bidders need not know the underlying distribution (as they should just act truthfully), but they should still agree that the auctioneer knows the prior distribution. This type of result is more fragile than the previously mentioned ones, since the truthfulness of the auction, rather than just its optimality, depends on the auctioneer correctly knowing the underlying distribution. On the plus side, the paper shows that the auctioneer’s derivation of the auction can be done effectively just by using a “black box” for sampling the underlying distribution (as is the case for the derivation of Myerson’s optimal reserve price).
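For the simpler Myerson case mentioned at the end of the paragraph, here is a sketch of how a reserve price can be derived from a sampling black box alone: draw samples, then pick the price $r$ maximizing the empirical revenue $r \cdot \widehat{\Pr}[v \ge r]$ of a take-it-or-leave-it offer, whose true maximizer for a regular distribution coincides with Myerson’s reserve. This is plain empirical revenue maximization, not Hartline and Lucier’s construction, and the uniform sampler is an assumption standing in for the unknown distribution.

```python
import random

def empirical_reserve(samples):
    """Price r maximizing r * (fraction of samples >= r), searched over the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    best_price, best_revenue = 0.0, 0.0
    for i, price in enumerate(ordered):
        revenue = price * (n - i) / n    # samples i..n-1 are >= price (distinct values)
        if revenue > best_revenue:
            best_price, best_revenue = price, revenue
    return best_price

# Black box: here simply a U[0,1] sampler, standing in for the unknown distribution.
rng = random.Random(1)
samples = [rng.random() for _ in range(100_000)]
print(round(empirical_reserve(samples), 2))
```

Searching only over the sample points loses nothing: between two consecutive samples the empirical acceptance probability is constant, so the empirical revenue is maximized at a sample. With this many uniform samples the result should be near the true reserve of $1/2$.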

A somewhat dual situation is presented in the paper Price of Anarchy for Greedy Auctions by Brendan Lucier and Allan Borodin (SODA 2010). In that paper, auctions are presented in which the auctioneer need not know the underlying distribution and acts in a “detail-free” way, i.e. the auction is independent of the underlying distribution. However, the approximate optimality holds only when the bidders are in a Bayesian equilibrium, i.e. the bidders must know and agree on a common prior for the analysis to hold.

The last example of “somewhat-Bayesian” results that comes to mind has nothing to do with incentives but is purely algorithmic. The paper The Adwords Problem: Online Keyword Matching with Budgeted Bidders under Random Permutations by Nikhil Devanur and Thomas Hayes (EC 2009) considers an online model of repeated auctions in which no distributional assumptions are made on the bidder values, which are assumed to be “worst case”, but a distributional assumption is made on the order of arrival, which is assumed to be uniformly random. This allows them to get arbitrarily good approximations, circumventing the known lower bounds for the completely worst case.
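The random-permutation model can be illustrated with a toy simulation. The bids and budgets below are hand-picked adversarial numbers, and the allocation rule is the simple greedy one (give each arriving query to the highest bidder who can still afford it), not Devanur and Hayes’s algorithm; the sketch only shows how much the arrival order alone can matter. An offline assignment (bidder B takes every x query and A every y query) would earn 1990.

```python
import random

def greedy_revenue(queries, bids, budgets):
    """Serve each arriving query greedily: highest bid among bidders who can still pay it."""
    remaining = dict(budgets)
    revenue = 0.0
    for q in queries:
        eligible = [b for b in bids if q in bids[b] and remaining[b] >= bids[b][q]]
        if not eligible:
            continue                         # query goes unserved
        winner = max(eligible, key=lambda b: bids[b][q])
        remaining[winner] -= bids[winner][q]
        revenue += bids[winner][q]
    return revenue

# Hypothetical instance: A bids slightly more than B on x, and only A bids on y.
N = 1000
bids = {"A": {"x": 1.0, "y": 1.0}, "B": {"x": 0.99}}
budgets = {"A": N, "B": N}

worst_order = ["x"] * N + ["y"] * N          # adversarial: all x queries first
random_order = worst_order[:]
random.Random(0).shuffle(random_order)       # the random-permutation model

print(greedy_revenue(worst_order, bids, budgets))   # 1000.0
print(greedy_revenue(random_order, bids, budgets))
```

In the worst order, A spends its whole budget on x queries that B could have bought, and every y query goes unserved. In a uniformly random order, A’s budget is spread over both query types and B picks up roughly half of the x queries, for about 1500 in total.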

While economists have also looked at weakening the fully Bayesian assumptions, as computer scientists are doing now, I see a difference between the two approaches. Harsanyi’s influence on economic thinking seems to be so great that economists take the Bayesian point of view as the correct model, and even its criticisms (cf. the Wilson critique) and results that weaken its assumptions are taken as mere “2nd order” improvements. Computer scientists, on the other hand, seem to have their basic intellectual foundation in a non-Bayesian setting, and as they move toward Bayesian models they do not seem to have a single model in mind, but are rather quite happy to explore the “Pareto frontier” between the strength of the model and the strength of the results obtained.

Finally, as a side note, let me observe that while the gap between computer scientists and economists is narrowing on the Bayesian question, the other great gap, the issue of approximation, seems to be as wide as ever. E.g., all the results by computer scientists mentioned in this post provide only approximate results, while all those by economists provide exact ones.

## My \$1/day Adwords Account

Google gives its employees \$1/day of free adwords advertising. Beyond being an employee perk, this gives Google’s employees the experience of being an Internet advertiser, i.e. of seeing things from the point of view of Google’s paying customers. I have been using my \$1/day account to advertise the divorce-consulting business of my sister-in-law (in Hebrew), and did indeed find this experience quite illuminating.

The first thing I learned is that the ad auction itself is just a small part of the whole thing. Choosing the right text for the ads (all ten words of it), choosing the right keywords to target, etc., is much more prominent than setting the right bids. “Tiny” issues come up everywhere: e.g., when I needed to choose keywords to target, it turned out that there are four ways to write divorce in Hebrew: גירושים, גירושין, גרושין, גרושים. I’m not really sure whether these are all “kosher” spellings, but they are all searched for and do need to be taken into account. Even more, the whole advertising campaign is, in principle, peripheral to the business itself, and frankly the auction logic is not the advertiser’s first concern. This is obvious, of course, but easy to forget for one whose work focuses on the auction logic.

Now for the auction itself: all together I spent several hours setting up the campaign, writing the ads, choosing keywords, looking at reports, and trying a bit of optimization. The adwords user interface was very easy and convenient to start with, but it didn’t take long until I was attempting things that confused me (e.g. splitting my single campaign into two different ones), at which point I gave up and stayed with what I had achieved, which is actually quite fine. I was especially impressed with Google’s automatic keyword suggestions, which were cleverer than what I came up with (I know they are not really clever, just the output of some learning algorithm, but they were eerily good).
I was surprised and disappointed (as an advertiser, but frankly delighted as a Google employee) by the pretty high prices of the keywords I targeted: my average cost per click is 88 cents, and this is for pretty low slots, on average. (Divorce is expensive, it seems, on the Internet too.) This means that my \$1 per day suffices for a single click per day, and no more. I do get this single click almost every day, but have so far been unable to ever get two clicks in one day, whether optimizing by hand or letting Google’s automatic bidder do it. I was professionally insulted by not being able to beat the automatic bidder, but have still not given up on getting an average of more than one click per day for my \$1. My click-through rate is pretty high (compared to my expectations), 0.79%, so I usually get about 150 impressions every day, of which one is clicked and results in a visit to the website. Now, these visitors are probably really good leads: not only have they searched for relevant keywords, but they also clicked on a pretty specific ad. I suppose that if even 1% of them become clients (this is not so little: we are not talking about buying a sweatshirt, this is about handling a divorce), then the advertising would be considered quite profitable even had my sister-in-law paid for it. Unfortunately, it is quite hard to gauge whether this is the case: getting and keeping the required statistics is easier to imagine theoretically than to do when you have a small business to run. In other words, I haven’t a clue what my valuation of a click is. (The lack of knowledge of one’s own valuation has been discussed in AGT, but frankly I have not seen a really convincing treatment of this issue.)
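The numbers above fit together in a small back-of-the-envelope calculation. The 1% conversion rate is the hypothetical from the text, not measured data; dividing the cost per click by it gives the ad spend per acquired client, i.e. the break-even value of a client above which the campaign would pay for itself.

```python
budget_per_day = 1.00        # dollars per day
cost_per_click = 0.88        # average cost per click, dollars
click_through_rate = 0.0079  # 0.79%
impressions_per_day = 150
conversion_rate = 0.01       # hypothetical: 1% of visitors become clients

# How many clicks the daily budget can buy at the average price.
affordable_clicks = int(budget_per_day // cost_per_click)

# How many clicks the daily impressions generate at the observed CTR.
expected_clicks = impressions_per_day * click_through_rate

# Ad spend per acquired client: the break-even value of one client.
cost_per_client = cost_per_click / conversion_rate

print(affordable_clicks)          # 1
print(expected_clicks)
print(round(cost_per_client))     # 88
```

So the budget buys one click, the impressions produce a little over one click, and under the 1% assumption the campaign breaks even if a client is worth more than about 88 dollars, which is exactly the valuation the text admits is unknown.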

The set of campaign-performance reports that adwords makes available is quite impressive, and they are really nicely and simply done, but somehow I still don’t really know how to optimize my campaign so as to get the most and the best customers to the site. I’m sure that more time on my part, as well as more data-centric handling of my sister-in-law’s business, would improve things, but the difficulty of getting and handling the right data is another lesson I got from this exercise (again, I knew this theoretically, but now I feel it too).

## Experiments on Mechanical Turk

Crowdsourcing has just been given a visibility boost by DARPA’s Red Balloon contest, which was won by the MIT team. At the same time, Amazon’s well-established (since 2005!) crowdsourcing platform, the Mechanical Turk, is gaining attention as a platform for conducting behavioral experiments, especially in behavioral economics and game theory.

Named after the 18th-century human-powered chess-playing “machine” called “the Turk”, this platform allows “requesters” to submit “Human Intelligence Tasks” (HITs) to a multitude of human “workers” who solve them for money. A sample of recent typical tasks includes tagging pictures (for 3 cents), writing (and posting on the web) a short essay with a link (for 20 cents), and correcting spelling (for 2 cents, in Japanese). This allows brief and cheap employment of hundreds or thousands of people on simple, low-level knowledge tasks that can be done over the Internet. You may require various “qualification exams” of these workers, and design such quals of your own. Obviously workers are in it for the money, but apparently not just for that.

Recently, the Mechanical Turk has been used to conduct behavioral experiments. Gabriele Paolacci is methodically repeating experiments of Kahneman and Tversky and reporting on them in his blog. Panos Ipeirotis reports on his blog about studies of some aspects of the Mechanical Turk, as well as results of various behavioral game-theory experiments on it. I’ve seen others report on such experiments too. Markus Jakobsson from PARC gives general tips for conducting such human experiments using the Mechanical Turk.

Turk-based behavioral experimentation has the immense appeal of being cheap, fast, and easy to administer. There are obviously pitfalls, such as getting a good grasp of the population, but every experimental setup has its pitfalls. Such a platform may be especially appropriate for Internet-related behavioral experiments, such as figuring out bidding behavior in online auctions or how best to frame a commercial situation on a web page. Could this be a tool for the not-quite-yet-existent field of experimental AGT?

## Economists and Complexity

One of the main culture clashes between computer scientists and economists on the CS-econ frontier is whether the “complexity of equilibria” matters. The CS-y view of the matter is captured in Kamal Jain’s quote: “If your laptop can’t find it then neither can the market”. Economists mostly don’t care, since they see equilibria reached every day, contrary to what CS says. As Jeff Ely quips: “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians. How do those rocks-for-brains planets manage to pull it off?” TCS folks, who see complexity as the map of the world, can’t really understand this indifference, as Lance Fortnow tweets: “I’m an economist so I can ignore computational constraints / I’m a computer scientist, so I can ignore gravity.”

[Figure: Computational complexity map of the world]

The beautiful thing about studying systems at equilibrium is precisely that this abstracts away the gory details of the process (aka dynamics, aka algorithm) of reaching the equilibrium. In principle, there are many different, difficult-to-understand dynamics that all reach the same easy-to-understand equilibrium point. This is all very nice as long as the equilibrium is indeed somehow magically reached by the “real” dynamics. The obvious crucial question is whether this is the case in practice. The general tendency among economists seems to be to claim “yes”, and the natural next question in line is “why?” As computer scientists have shown, this is not a general characteristic of large games or markets. Understanding the properties of interesting games and markets that make them actually reach equilibrium should be enlightening. Maybe it is because economists choose to consider only those that do turn out to converge quickly, ignoring another large and interesting class of strategic scenarios? Maybe it is because economists are thinking about “smallish” games, and so their insights will not carry over to more complex realistic scenarios? Maybe there is some deeper interesting structure that guarantees fast convergence? Distinguishing between these possibilities is especially crucial as we aim to analyze the new artificial games and markets that are to be found (and designed) on the Internet as well as elsewhere. Which economic and game-theoretic sensibilities will still hold in these complex, unnatural circumstances?

Complexity is all about understanding such processes. While the foremost question dealt with by computational complexity is that of “time”, i.e. how long a computational process needs in order to find the solution (in our case, to reach a close-to-equilibrium point), this is not the only type of question or insight provided by complexity theory. As one can see in the map above, there is a stunning variety of complexity classes, each attempting to capture a different facet of the challenge of finding the solution: How much space (memory) is needed? Can we even tell when we have reached a solution? Does randomization help? Does parallelism help? Are approximations easier? Does the solution have this or that particular structure? In the case of finding equilibria, the classes PPAD and PLS give a very nuanced explanation of what is involved. There are also “concrete” models that explicitly study specific parameters such as communication or queries. One may dislike the fact that this complexity analysis does not restrict attention to natural dynamics but allows arbitrary and unnatural algorithms. The kind of natural dynamics that economists usually have in mind are some kind of best-replying in the case of games and some kind of Walrasian auctioneer in markets. The problem is that there are many variants of these that make sense: fictitious play, various forms of population dynamics, more sophisticated types of learning such as regret minimization, and all of these can be enhanced with various orderings, smoothing attempts, tie-breaking rules, strategic look-ahead, restarts, actions of the central planner, not to mention other more or less complex optimization and learning attempts. The strength of complexity analysis is that it applies to all of these. Any “lower bounds” are definitive: any practical system can be simulated by a computer, and thus no dynamics can succeed in general.
(Emphasis on “in general”: as mentioned above, the problems that you are interested in may have special structure, so what is it?) A statement of an “upper bound” may be less interesting as stated, but it immediately raises the challenge of either finding a natural algorithm=process=dynamic or pinpointing the complexity reason explaining why this is impossible.
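As a toy illustration of why the choice of dynamics matters, here is alternating best-reply dynamics in matching pennies, a game with no pure equilibrium. The game and the strict alternation of moves are arbitrary choices for this sketch; it shows one natural dynamic cycling forever, which is a much weaker statement than the complexity lower bounds discussed above.

```python
def row_payoff(r, c):
    """Matching pennies: the row player wins (+1) when the coins match."""
    return 1 if r == c else -1

def col_payoff(r, c):
    """Zero-sum: the column player wins when the coins differ."""
    return -row_payoff(r, c)

def best_reply_dynamics(start, steps):
    """Players alternate, each switching to a best reply to the other's current coin."""
    r, c = start
    history = [(r, c)]
    for t in range(steps):
        if t % 2 == 0:   # row player's turn
            r = max((0, 1), key=lambda a: row_payoff(a, c))
        else:            # column player's turn
            c = max((0, 1), key=lambda a: col_payoff(r, a))
        history.append((r, c))
    return history

print(best_reply_dynamics((0, 0), 8))
# [(0, 0), (0, 0), (0, 1), (1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (1, 0)]
```

After the first move, play visits all four strategy profiles in a fixed cycle of length four and never settles.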

This is a good point to refute several irrelevant objections to the applicability of computational complexity to analyzing the dynamics of games and markets. The first is the notion that humans or markets can undergo processes that cannot be computed. Frankly, there is no evidence of this; there is certainly much about the world that we do not understand well enough to simulate, but there is no indication of any natural process that is inherently more powerful than computers. This is the modern version of the Church-Turing thesis. (It seems that some quantum phenomena cannot be simulated efficiently by usual computers, but that doesn’t change the principle; it would just replace classical computers with quantum ones in the few places where it matters.) Even if you do subscribe to metaphysical dualism, do you want to base economics on it? The second type of objection concerns the standard technical notions used in complexity, which obviously leave much wiggle room: “polynomial time”, with its hidden constants, is not synonymous with efficient; worst-case analysis is not always the right notion; etc. The point is that these are simply concise notions that usually seem to capture the issues well. There are always cases where more refined notions are needed, and in such cases complexity theory can provide more precise answers: for example, in the analysis of basic problems like sorting or matrix multiplication, very exact results are obtained (with no hidden constants); similarly, cryptography is not based on worst-case analysis; etc. There is no indication so far that the usual, somewhat coarse notions of computational complexity miss something significant when applied to games or markets; quite the contrary, in fact. If such evidence emerges, complexity will not become irrelevant; more refined analysis will simply be used.

Take, for example, the stunning difference between reaching a Nash equilibrium and reaching a correlated equilibrium. While the latter is reached by various natural regret-minimization dynamics, there are no “non-trivial” dynamics that, in general, reach the former. Let me say a word about this “non-triviality” by giving an example of what I would consider a trivial process: suppose that each player chooses a strategy at random every “turn”, unless his last strategy was already a best reply to those of the others (up to some $\epsilon$ indifference). At some point, the players will happen to “hit” an ($\epsilon$-)equilibrium. This type of dynamics, which simply searches over the whole space of strategy profiles, provides no insight and is not useful in most practical situations. What makes it trivial is not the random choices but the fact that it essentially tries every possibility. Many other proposed “dynamics” for Nash equilibrium or for market equilibrium are similarly trivial: in some cases they resort to simply trying all possibilities (in some approximate sense). The dynamics for correlated equilibrium are not like this at all: they only look at a tiny, “small-dimensional” fraction of the space of possibilities. Why is that? Complexity theory explains this phenomenon clearly: correlated equilibria can be found in polynomial time, but finding a Nash equilibrium (or many types of market equilibrium) is PPAD-complete.
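The “trivial” process described above can be written in a few lines. This is a sketch under simplifying assumptions: pure strategies only (so it presumes the game has a pure $\epsilon$-equilibrium to hit), simultaneous re-draws by every player who is not $\epsilon$-best-replying, a cap on the number of turns, and a hypothetical 3-action coordination game as the example. Nothing in the loop uses the structure of the game; it is blind search over strategy profiles, which is what makes it trivial in the sense above.

```python
import random

def eps_best_replying(payoff, profile, i, n_actions, eps):
    """Is player i's current action within eps of the best payoff it could get?"""
    current = payoff(tuple(profile))[i]
    best = max(payoff(tuple(profile[:i] + [a] + profile[i + 1:]))[i]
               for a in range(n_actions))
    return current >= best - eps

def random_search_dynamics(payoff, n_actions, eps=0.0, max_turns=10_000, seed=0):
    """Each turn, every player not eps-best-replying re-draws an action at random.

    payoff(profile) returns a tuple with one payoff per player.
    Returns the first profile where all players eps-best-reply, or None.
    """
    rng = random.Random(seed)
    profile = [rng.randrange(k) for k in n_actions]
    for _ in range(max_turns):
        unhappy = [i for i, k in enumerate(n_actions)
                   if not eps_best_replying(payoff, profile, i, k, eps)]
        if not unhappy:
            return tuple(profile)          # an eps-equilibrium has been "hit"
        for i in unhappy:
            profile[i] = rng.randrange(n_actions[i])
    return None                            # gave up after max_turns

def coordination(profile):
    """Hypothetical 2-player game: both get 1 if they pick the same action, else 0."""
    return (1, 1) if profile[0] == profile[1] else (0, 0)

print(random_search_dynamics(coordination, [3, 3]))
```

In this tiny game a hit comes quickly, but in general the expected waiting time of such blind search grows with the number of strategy profiles, which is exponential in the number of players.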

## Grading a game theory course

A few days ago I heard again the story of how the game-theory course at the Technion was graded. I’ve heard versions where the professor was Ran Smorodinsky and other versions with Dov Monderer, but I haven’t checked what “really” happened. (Anyone with the real facts is welcome to send in a comment…) Here is the story.

On the last day of class, the professor gathered all the students and made the following offer: he wants an average grade of X (say 80) in the course, with a standard deviation of Y (say 15). If all the students agree on everybody’s grades in a way that conforms to these constraints, then this is how they will be graded, and the exam will be canceled. Otherwise, the scheduled exam will take place as usual and will determine the grades. The professor then left the class and let the students try to reach a joint decision.

Some versions of the story continue thus: the best student in the class immediately got up, said that she would not accept anything less than 100, and left the room too.