Feeds:
Posts

## Internet Advertising: Theory and Practice

For those AGT/E researchers working on questions related to Ad-auctions (like most of those employed by Google/Yahoo/Microsoft), and writing nice theoretical papers for workshops like the coming “Ad auctions workshop“, it is interesting to see an overview of the messy real word, as can be found in keynote talk given by Terrence Kawaja in a recent event organized by the Interactive Advertising Bureau.

## Bayesian Computer Scientists

I spent most of last week in the Bertinoro workshop on Frontiers in Mechanism Design organized by Jason Hartline and Tim Roughgarden.  Such a small focused event is really a winning format (when done right — of course): almost every talk was very close to my interests and was really good (since the speakers presented their preferred recent work which usually was also accepted to some other top conference).

One of my take-homes from this event was the strengthening of the realization that computer scientists are doing more and more Bayesian analysis  in Algorithmic Mechanism Design, in this respect getting closer to the economists’ way of thinking.  It is not that computer scientists have lost their basic dislike of “average case analysis”, distributional priors, or especially, common priors, it is just that we have started reaching in more and more places the limits of “worst case” analysis.  It seems that computer scientists are being very careful with “how much” they rely on Bayesian analysis, obtaining various “hybrid” results that are more robust, in various senses, than straight Bayesian ones.

An extreme good example is the classic result of economists Jeremy Bulow and Paul Klemperer who show that revenue of the straight 2nd price auction (for a single item) with $n+1$ bidders always dominates the revenue of Myerson’s optimal auction with $n$ bidders.  Only the analysis is Bayesian: the resulting 2nd price $n+1$-bidder auction is completely “worst-case” — neither the auctioneer, nor the bidders must have any knowledge or agreement on the underlying distribution.   In spirit, such a result is similar to Valiant’s prior-free learning where the analysis is with respect to an underlying distribution even though the learning algorithms cannot depend on it.  A recent paper Revenue Maximization with a Single Sample by Dhangwatnotai, Roughgarden and Yan (to appear in EC 2010) gets more general results in this prior-independent vein, although in an approximation sense.

A weaker type of result, but still better than full-blown Bayesian, is the 2nd price version of Myerson’s auction.  In this version, the auctioneer must “know” the underlying distribution in order to set the optimal reserve price, but once that is done, the bidders see a “worst-case” situation in front  of them(2nd price with reserve) and should bid truthfully in the dominant strategy sense without needing to know or agree about the underlying prior distribution.  (This is opposed to the revenue-equivalent-in-principle 1st price version in which bidders must know and agree on a common prior for them to have any chance of reaching he desired equilibrium.)  A recent paper Multi-parameter mechanism design and sequential posted pricing by Chawla,  Hartline,  Malec, and Sivan (to appear in STOC 2010) gets similar types of results in a unit-demand heterogeneous auction setting where the auctioneer needs to know the distribution in order to set prices (in this case, posted prices) but the resulting mechanism is very simple and truthful in the dominant-strategy sense (again the approximation guarantee is in an approximation sense).

A yet weaker version of a similar type of result appears in the paper Bayesian Algorithmic Mechanism Design by Jason D. Hartline and Brendan Lucier (to appear in STOC 2010).  In this paper again the auctioneer does need to know the underlying distribution, and then he creates a mechanism that is incentive compatible, but here only in the Bayesian sense.  I.e. the the bidders need not know the underlying distribution (as they should just act truthfully) but they should still agree that the auctioneer knows the prior distribution.  This type of result is more fragile than the previously mentioned ones since the  truthfulness of the auction depends on the auctioneer correctly knowing the underlying distribution, rather than just the optimality of it.  On the plus side, the paper shows that the auctioneer’s derivation of the auction can be done effectively just by using a “black box” for sampling the underlying distribution (as is the case for he derivation of Myerson’s optimal reserve price).

A someone dual situation is presented in the paper Price of Anarchy for Greedy Auctions by Brendan Lucier and Allan Borodin (SODA 2010).  In that paper, auctions are presented in which the auctioneer need not know the underlying auction and acts in “detail-free” way, i.e. the auction is independent of the underlying distribution.  However,  the approximate-optimality holds when the bidders are in a Bayesian equilibrium, i.e the bidders must know and agree on a common prior for the analysis to hold.

The last example of “somewhat-Bayesian” results that comes to mind has nothing to do with incentives but is just algorithmic.  The paper The Adwords Problem: Online Keyword Matching with Budgeted Bidders under Random Permutations by Nikhil Devanur and Thomas Hayes (in EC 2009) considers an online model of repeated auctions, where no distributional assumptions are made on the bidder values that are assumed to be “worst case”, but a distributional assumption is made on the order or arrival which is assumed to be uniform.  This allows them to get arbitrarily good approximations, circumventing the known lower bounds for the completely worst case.

While economists too have looked at weakening of the fully Bayesian assumptions, as computer scientists are doing now, I see a difference in the two approaches.  Harsanyi‘s influence on economic thinking seems to be so great that the Bayesian point of view seems to be taken as the correct model by economists, and even its criticisms (cf. the Wilson critique) and results that weaken the assumptions are simply taken as “2nd order” improvements.  Computer Scientists, on the other hand, seem to have their basic intellectual foundation in a non-Bayesian setting, and as they move toward Bayesian models they do not seem to have a single model in mind but are rather quite happy to explore the “Pareto frontier ” between the strength of the model and the strength of the results obtained.

Finally, as a side note, let me observe that while the gap between computer scientists and economists is narrowing on the Bayesian question, the other great gap — the issue of approximation — seems to be as great as ever.  E.g., all the results by computer scientists mentioned in this posts only provide approximate results, while all those by economists provide exact ones.

## My $1/day Adwords Account Google gives its employees$1/day of free adwords advertising.  Beyond an employee perk, this gives Google’s employees the experience of being an Internet advertiser, i.e. experiencing the point of view of Google’s paying customers.  I have been using my $1/day account to advertise the divorce-consulting business of my sister in law (In Hebrew) and did indeed find this experience to be quite illuminating. The first thing I learned is that the ad auction itself is just a small part of the whole thing. Choosing the right text for the ads (all ten words of it), choosing the right keywords to target, etc, is much more prominent than setting the right bids. “Tiny” issues come up everywhere, e.g. when I needed to choose keywords to target, it turns out that there are four ways to write divorce in Hebrew: גירושים, גירושין, גרושין, גרושים. I’m not really sure whether these are all “kosher” spellings, but they are all searched for an do need to be taken into account. Even more, the whole advertising campaign is peripheral, in principle, to the business itself, and frankly the auction logic is not the first concern of the advertiser. This is obvious, of course, but is easy to forget for one whose work focuses on the auction logic. Now for the auction itself: all together I spent several hours setting up the campaign, making up the ads, choosing keywords, looking at reports, and trying a bit of optimization. The adwords user interface was very easy and convenient to start with, but it didn’t take long until I was attempting things that confused me (e.g. splitting my single campaign into two different ones), at which point I gave up, and stayed with what I achieved, which is quite fine actually. I was especially impressed with Google’s automatic suggestions of keywords which were cleverer than what I came up with (I know that not really, just some learning algorithm, but they were eerily good.) I was surprised and disappointed (as an advertiser, but frankly delighted as a Google employee) by the pretty high prices on the keywords that I targeted: my average cost per click is 88 cents, and this is for pretty low slots, on the average. (Divorce is expensive, it seems, also on the Internet.) This means that my$1 per day suffices for a single click per day, and no more.  I do get this single click almost every day, but have so far been unable to ever get two clicks in one day: neither optimizing by hand, nor letting Google’s automatic bidder do it.  I was professionally insulted by not being able to beat the automatic bidder, but have still not given up on getting an average of more than one click per day for my \$1.  My click through rate is pretty high (compared to my expectations):  0.79%, so I usually get about 150 impressions every day of which one is clicked and results in a visit to the website.  Now these visitors are probably really good leads: not only have they searched for relevant keywords, but they also clicked on a pretty specific ad.  I suppose that if even 1% of them become clients (this is not so little: we are not talking about buying a sweatshirt; this is about handling divorce), then the advertising would be considered quite  profitable even had my sister in law paid for it.  Unfortunately, it is quite hard to gauge whether this is the case: getting and keeping the required statistics is easier to imagine theoretically than to do when you have to handle a small business.  In other words, I haven’t a clue what my valuation of a click is.   (The lack of knowledge of one own’s valuation has been discussed in AGT, but frankly I have not seen really convincing treatment of this issue.)

The set of reports about the performance of the campaign that adwords makes available  is quite impressive, and they are really nicely and simply done, but somehow I still don’t really know how how to optimize my campaign as to get the most and the best customers to the site.  I’m sure that more time on my part, as well as a more data-centric handling of my sister-in-law’s business would improve things, but the difficulty of getting and handling the right data is another lesson that I got from this exercise (again, I knew this theoretically, but now I feel it too).

## Experiments on Mechanical Turk

Crowd sourcing has just been given a recent visibility boost with DARPA’s Red Balloon contest that was won by the MIT team.  At the same time, Amazon’s well-established (since 2005!) platform for crowd sourcing, the Mechanical Turk, is gaining attention as a platform for conducting behavioral experiments, especially in behavioral economics and game-theory.

Named after the 18th century human-powered chess-playing “machine” called “the Turk”, this platform allows “requesters” to submit “Human Intelligence Tasks” (HITs) to a multitude of human “workers” who solve them for money.  A sample of recent typical tasks include tagging pictures (for 3 cents), writing (and posting on the web) a short essay with a link (for 20 cents), or correcting spellings (for 2 cents, in Japanese).  This allows brief and cheap employment of hundreds and thousands of people to work on simple Internet-doable low-level knowledge tasks.  You may demand various “qualification exams” from these workers, and design such quals of your own.  Obviously workers are in it for the money, but apparently not just that.

Recently, the Mechanical Turk is being used to conduct behavioral experiments.  Gabriele Paolacci is methodologically repeating experiments of Kahneman and Tversky and reporting on them in his blog. Panos Ipeirotis reports on his blog studies of some aspects of the Mechanical Turk as well as results of various behavioral game-theory experiments on it.  I’ve seen others report on such experiments too.  Markus Jacobsson from PARC gives general tips for conducting such human experiments using the Mechanical Turk.

Turk-based behavioral experimentation has the immense appeal of being cheap, fast, and easy to administer.  There are obviously pitfalls such as getting a good grasp on the population, but so does any experimental setup.   Such a platform may be especially appropriate for Internet-related behavioral experiments such as figuring out bidding behavior in online auctions, or how to best frame a commercial situation on a web-page.  Could this be a tool for the yet not-quite-existent experimental AGT?

## Economists and Complexity

One of main culture clashes between computer scientists and economists on the CS-econ frontier is whether “complexity of equilibria” matters.  The  CS-y view of the matter is captured in Kamal Jain’s quote: “If your laptop can’t find it then neither can the market“.  Economists mostly don’t care since they see equilibrium reached everyday, contrary to what CS says.  As Jeff Ely quips:  “Solving the n-body problem is beyond the capabilities of the world’s smartest mathematicians.  How do those rocks-for-brains planets manage to do pull it off?”  TCS folks who see complexity as the map of the world can’t really understand this indifference, as Lance Fortnow tweets: “I’m an economist so I can ignore computational constraints / I’m a computer scientist, so I can ignore gravity.

Computational Complexity map of the world

The beautiful thing about studying systems at equilibrium is precisely the fact that this abstracts away the gory details of the process (aka dynamics aka algorithm) of reaching this equilibrium.  In principle, there are many different difficult-to-understand dynamics that all reach the same easy-to-understand equilibrium point.   This is all very nice as long as the equilibrium is indeed magically reached somehow by the “real” dynamics.  The obvious crucial question is whether this is the case in practice. It seems that the general tendency among economists is to claim “yes”, to which the natural next question in line is “why”?  As computer scientists show, this is not a general characteristic of large games or markets.  Understanding the properties of interesting games and markets that make them actually reach equilibrium should be enlightening.  Maybe it is because economists choose to consider only those that do turn out to converge quickly, ignoring another large and interesting class of strategic scenarios? Maybe it is because economists are thinking about “smallish” games and so their insights will not carry over to more complex realistic scenarios?  Maybe there is some deeper interesting structure that guarantees fast convergence?  Distinguishing between these possibilities is especially crucial as we aim to analyze the new artificial games and markets that are to be found — and designed — on the Internet as well as elsewhere.  Which economic and game-theoretic sensibilities will still hold in these complex unnatural  circumstances?

Complexity is all about understanding such processes.  While the foremost question dealt by computational complexity is that of “time” — how long does a computational process need in order to find the solution — in our case to reach (close-to) equilibrium — this is not the only type of questions and insights provided by complexity theory.  As one can see in the “map above”, there are a stunning variety of complexity classes, each attempting to capture a different facet of the challenge of finding the solution: how much space (memory) is needed? Can we even tell when we reach a solution?  Does randomization help?  Is it helped parallelism?  Are approximations easier?  Does the solution have this or that particular structure? In the case of finding equilibria, the classes PPAD and PLS give a very nuanced explanation of what is involved.  There are also “concrete” models that study explicitly specific parameters such as communication or queries.  One may dislike the fact that this complexity analysis does not restrict attention to natural dynamics but allows arbitrary and unnatural algorithms.  The kind of natural dynamics that economists usually have in mind are some kind of best-replying in case of games and some kind of a Walrasian auctioneer in markets.  The problem is that there are many variants of these that make sense: fictitious play, various forms of population dynamics, more sophisticated types of learning such as regret-minimization, and all these can be enhanced with various orderings, smoothing attempts, tie-breaking rules, strategic look-ahead, re-starts, actions of the central planner, not to mention other more or less complex optimization and learning  attempts.  The strength of complexity analysis is that it applies to all of these.  Any “lower bounds” are definitive: any practical system can be simulated by a computer, and thus no dynamics can succeed in general. (Emphasis on “in general” — as mention above, the problems that you may be interested in may have special structure — so what is it?)   A statement of  an “upper bound” may be less interesting as stated, but immediately raises the challenge of either finding a natural algorithm=process=dynamic or pinpointing the complexity reason explaining why this is impossible.

This is a good point to refute several irrelevant objections to the applicability of computational complexity for analyzing dynamics of games and markets.  The first is the notion that humans or markets can undergo processes that cannot be computed.  Frankly, there is no evidence of this; there is certainly much about the world that we do not understand well enough to simulate, but there is no indication of any natural process that is inherently more powerful than computers.  This is the modern version of the Church-Turing thesis.  (It seems that some quantum phenomena cannot be simulated by usual computers, but that doesn’t change the principle — it just would replace classical computers with quantum ones in the few places that it matters.)  Even if you do subscribe to metaphysical dualism, do you want to base economics on it?  The second types of objections concern the standard technical notions used in complexity which obviously leave much wiggle-room: “polynomial time”, with its hidden constants, is not synonymous with efficient; worst-case analysis is not always the right notion, etc.  The point is that these are simply concise notions that usually seem to capture the issues well.  There always are cases where more refined notions are needed, and in such cases complexity theory can provide more precise answers: for example in analysis of basic problems like sorting or matrix multiplication, very exact results are obtained (with no hidden constants) similarly, cryptography is not based on worst-case analysis, etc.  There is no indication so far that the usual somewhat coarse notions of computational complexity miss something significant when applied to games or markets — quite the contrary in fact.  If such evidence emerges then complexity will not become irrelevant; simply more refined analysis will be used.

Take for example the stunning difference between reaching a Nash equilibrium and reaching a Correlated equilibrium.  While the latter is reached by various natural regret-minimization dynamics, there are no “non-trivial” dynamics that, in general, reach the former.  Let me say a word about this “non-triviality” by giving an example of what I would consider a trivial process: suppose that each player chooses a strategy at random every “turn”, unless his last strategy was already a best-reply to those of the others (up-to some $\epsilon$ indifference).  At some time, the players will happen to “hit” an ($\epsilon$-) equilibrium.  This type of dynamics that simply search over the whole space of strategy profiles provides no insight and is not useful in most practical situations.  The point of the triviality is not that of the random choices but rather that of essentially trying every possibility.  Many other proposed “dynamics” for Nash equilibrium or for market equilibrium are similarly trivial — in some cases they resort to simply trying all possibilities (in some approximate sense).  The dynamics for correlated-Nash are not like this at all — they only look at a tiny “small-dimensional” fraction of the space of possibilities.  Why is that? Complexity theory explains this phenomena clearly: correlated equilibria can be found in polynomial time, but finding a Nash equilibrium (or many types of market-equilibrium) is PPAD-complete.

## Grading a game theory course

A few days ago I re-heard the story of how the game-theory course in the Technion was graded. I’ve heard versions where the professor was Ran Smorodinsky and other versions with Dov Monderer, but I haven’t checked out what “really” happened. (Anyone with the real facts is welcome to send in a comment…) Here is the story.

In the last day of class, the Professor gathered all the students and made the following offer: He wants to have an average grade of X (say 80) in the course with a standard deviation of Y (say 15). If all students agree upon everybody’s grades in a way that conforms to these constraints, then this is how they will be graded, and the exam will be canceled. Otherwise, the scheduled exam will take place as usual and will determine the grades. The professor then left the class and let the students try to reach a joint decision.

Some versions of the story continue thus: the best student in class immediately got up; said that she will not accept anything less than 100; and immediately left the room too.

A large number of AGT researchers work in just three large companies: Google, Microsoft, and Yahoo, mostly working on ad auctions. From two or three data points, it seems that Facebook is now attempting to hire such researchers too (although maybe not in “research” roles). Looking in Facebook’s “careers” page, they are indeed looking for an “advertising auction expert” whose requirements include “Expert knowledge of algorithmic game theory, auction theory, mechanism design and their applications to online advertising” as well as a Ph.D. in CS or OR.

I can well see the potential for Algorithmic Mechanism Design research in Facebook. Facebook’s information about its users is enormous, detailed, and lives within the context of their complex social network. Making good use of this complex information seems to require another level of sophistication relative to today’s ad auctions. Privacy issues are always there, and while Facebook has already had problems in that department, it seems that its users do not consider their “identity” to be too private. I wonder if within a year or two we’ll start seeing research papers coming from Facebook too.

## Theoretical Economics as Math

Eran Shmaya, in a beautiful recent blog post, adds some sociological angles to the economists’ never-ending introspection of whether their field is a science, where the starting point of such discussion is Popper‘s falsifiability.  The discussion about whether economic theory has any significant prediction power has certainly picked up lately due to the financial crisis that convinced many people that the answer is “No”.  (With a variant taken by many micro-economists claiming that micro is fine and only macro-economics is a superstition.)

Whenever I hear these types of discussions about the prediction power economic theory, I always think about the irony that, as an algorithmic game-theorist, I am mostly interested in using economic theory as a branch of mathematics…  Arrow’s theorem is simply true, not falsifiable.  So is Meyrson’s expression for optimal auctions, as well as many other mathematical tidbits that we, computer scientists, pick up from economic theory.  We don’t really care how well these theorems relate to human situations, since we use them in different circumstances, those related to computer systems.  I often wish that theoretical economists would allow themselves to stray even farther away from the realities of their human subjects.

This is not to say that we, in algorithmic game theory, need not worry about how well our theories correspond with reality.  We should.  We have not really started doing so.   And, it’s getting high time we do.  But our concerns may be more like those of engineers rather than those of scientists.  From economics, I’m happy to just take the theory…

## Supremum and mathematical education

My nephew just started his undergraduate degree in Math at the Technion.  When his grandfather, a high-powered engineer (whom I sometimes turn to for solving a differential equation that I need solved) , asked him what they learned in their very first calculus lesson, the answer was “supremum“.  The grandfather was mystified, claiming that “supremum” was not part of the math curriculum at all when he got his engineering degree half a century ago (also in the Technion, it turns out.)  While this can hardly be literally true, it does highlight the difference between the “abstract” approach to math and the “applied approach” which he got as an engineer.  From an abstract point of view, it makes much sense to start  with the notion of supremum, the ubiquitous existence of which essentially defines the real numbers, while I suppose that a more applied approach will hardly emphasize such abstract technical hurdles.  When I studied math at the Hebrew university, we spent an enormous amount of time on such abstract notions up to the point of ending the first year with hardly any practical abilities of figuring out a simple Integral but having a pretty good grasp of properties of zero-measure (yes, first year).

In my department, we have an on-going debate about the type of math that our CS students should get.  Traditionally, we have followed the extremely abstract tradition of our math department which emphasizes the basics rather than applications (even in the “differential equations” course, students rarely solve them, but more often prove existence and such).   In the last decade or so there has been a push in my department to shift some focus to “useful” math too (like being able to actually use singular value decomposition or Fourier transform, as needed in computer vision, learning theory, and other engineering-leaning disciplines).  I’m on the loosing side of this battle and am against this shift away from the “non-useful” fundamentals.   The truth is that most computer scientists will rarely need to use any piece of “useful” math.  What they will constantly need to use is “mathematical maturity”: being comfortable with formal definitions and statements, being able to tell when a proof is correct, knowing when and how to apply a theorem, and so on.

## New papers by Brendan Lucier

In the last month, Brendan Lucier has uploaded three conceptually related papers to the arXiv (two of them with co-authors):

All three papers aim to improve the approximation ratios obtained by computationally efficient mechanisms.  Each of them gives pretty strong and general results, but at a price: relaxing the notion of implementation from the usual one (in CS) of dominant-strategies.  Specifically, the notions used in these three papers are, respectively:

1. “We consider models of agent behaviour in which they either apply common learning techniques to minimize the regret of their bidding strategies, or apply short-sighted best-response strategies.”
2. “We consider relaxing the desideratum of (ex post) incentive compatibility (IC) to Bayesian incentive compatibility (BIC), where truthtelling is a Bayes-Nash equilibrium (the standard notion of incentive compatibility in economics).”
3. “We study the price of anarchy for such mechanisms, which is a bound on the approximation ratio obtained at any mixed Nash equilibrium.”

In general, I like the strategy of relaxing the required notion of implementation or equilibrium in order to get strong and general results.  This is especially true in the context of algorithmic mechanism design for a few reasons (1) The concept used in CS seems to be so strict that simply not enough is achievable (2) The CS point of view is anyway much stricter than that of Bayesian-Nash commonly used by economists (3) much less seems to be needed in practice in order to ensure cooperative behavior.  (E.g.  people just use TCP truthfully despite the clear opportunities for beneficial manipulation.)  The difficulty with the relaxation strategy is that there are many possible ways of relaxing the game theoretic requirements, and it takes some time to really be convinced which notions are most useful, and in which situations.  I would love to see more community discussions on these as well as other notions of relaxed implementation and equilibrium.