
From Arrow to Fourier

Social Choice Theory is a pretty mature field that deals with the question of how to combine the preferences of different individuals into a single preference or a single choice.  This field may serve as a conceptual foundation in many areas: political science (how to organize elections), law (how to set commercial laws), economics (how to allocate goods), and computer science (networking protocols, interaction between software agents).  Unsurprisingly, there are interesting computational aspects to this field, and indeed a workshop series on computational social choice already exists.  The starting point of this field is Arrow's theorem, which shows that there are unexpected inherent difficulties in performing this preference aggregation.  There have been many different proofs of Arrow's impossibility theorem, all of them combinatorial.  In this post I'll explain a basic observation of Gil Kalai that allows quantifying the level of impossibility using analytical tools (the Fourier transform) on Boolean functions commonly used in theoretical computer science.  At first Gil's introduction of these tools in this context seemed artificial to me, but in this post I hope to show you that it is the natural thing to do.

Arrow’s Theorem

Here is our setting and notation:

1. There is a finite set of participants numbered $1...n$.
2. There is a set of three alternatives over which the participants have preferences: $A=\{a,b,c\}$.  (Arrow's theorem, as well as everything here, actually applies to any set with $|A|\ge 3$.)
3. We denote by $L$ the set of preferences over $A$, i.e. $L$ is the set of total orders on the elements of $A$.

The point of view here is that each participant $1 \le i \le n$ has his own preference $x_i \in L$, and we are concerned with functions that reach a common conclusion as a function of the $x_i$‘s.  In principle, the common conclusion may be a joint preference, or a single alternative; Arrow’s theorem concerns the former, i.e. it deals with functions $F:L^n \rightarrow L$ called social welfare functions.  Arrow’s theorem points out in a precise and very general way that there are no “natural non-trivial” social welfare functions when $|A|\ge 3$.  (This is in contrast to the case $|A|=2$, where taking the majority of all preferences is natural and non-trivial.)  Formal definitions will follow.

A preference $x \in L$ really specifies, for each two alternatives $a,b \in A$, which of them is preferred over the other, and we denote by $x^{a,b}$ the bit specifying whether $x$ prefers $a$ to $b$.  We view each $x$ as composed of three bits $x=(x^{a,b},x^{b,c},x^{c,a})$ (where it is implied that $x^{b,a}=-x^{a,b}$, $x^{c,b}=-x^{b,c}$, and $x^{a,c}=-x^{c,a}$).  Note that under this representation only 6 of the possible 8 three-bit sequences correspond to elements of $L$, where the bad combinations are 000 and 111.
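To make the three-bit encoding concrete, here is a quick sketch (mine, not from the original post) that enumerates the six total orders on $\{a,b,c\}$ and computes their bit triples, confirming that exactly the combinations 000 and 111 never arise:

```python
from itertools import permutations

# Encode a total order on {a, b, c} as three bits (x^{a,b}, x^{b,c}, x^{c,a}),
# where bit x^{u,v} = 1 iff u is preferred to (i.e. ranked before) v.
def bits(order):
    rank = {alt: i for i, alt in enumerate(order)}  # lower rank = more preferred
    pref = lambda u, v: int(rank[u] < rank[v])
    return (pref('a', 'b'), pref('b', 'c'), pref('c', 'a'))

triples = sorted(bits(p) for p in permutations('abc'))
print(len(set(triples)))                           # 6: each order gets a distinct triple
print((0, 0, 0) in triples, (1, 1, 1) in triples)  # False False: the two bad combinations
```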

The formal meaning of natural is  the following:

Definition: A social welfare function $F$ satisfies the IIA (independence of irrelevant alternatives) property if for any two alternatives $a,b \in A$ the aggregate preference between $a$ and $b$ depends only on the preferences of the participants between $a$ and $b$ (and not on any preferences with another alternative $c$).  In the notation introduced above it means that $F^{a,b}$ is in fact just a function of the $n$ bits $x_1^{a,b}, ..., x_n^{a,b}$ (rather than of all the bits in the $x_i$‘s).

One may have varying opinions of whether this is indeed a natural requirement, but the lack of it does turn out to imply what may be viewed as inconsistencies.  For example, when we use the aggregate preference to choose a single alternative (hence obtaining a social choice function), then lack of IIA is directly tied to the possibility of strategic manipulation of the derived social choice function.

We can take the following definition of non-trivial:

Definition: A social welfare function $F$ is a dictatorship if for some $i$, $F$ is the identity function on $x_i$ or the exact opposite order (in our coding, the bitwise negation of $x_i$).

It is easy to see that any dictatorship satisfies IIA.  Arrow’s theorem states that, with another minor assumption, these are the only functions that do so.  Kalai’s quantitative proof requires an assumption that is more restrictive than that required by Arrow’s original statement.  Recently Elchanan Mossel published a paper without this additional assumption, but we’ll continue with Kalai’s elementary variant.

Definition: A social welfare function $F$ is neutral if it is invariant under changing the names of alternatives.

This basically means that as a voting method it does not discriminate between candidates.

Arrow’s Theorem (for neutral functions): Every neutral social welfare function that satisfies IIA is a dictatorship.

Quantification for Arrow’s Theorem

In CS, we are often quite happy with approximation: we know that sometimes things can’t be perfect, but can still be pretty good.  Thus if IIA is “perfect”, then the following quantitative version comes up naturally: can there be a function that is “almost” an IIA social welfare function and yet not even close to a dictatorship? Let us define closeness first:

Definition: Two social welfare functions $F, G$ are $\epsilon$-close if $Pr[F(x_1...x_n) \ne G(x_1 ... x_n)] \le \epsilon$.  The probability is over independent uniformly random choices of $x_i \in L$.

The choice of the uniform probability distribution over $L^n$ (strangely termed the impartial culture hypothesis) cannot really be justified as a model of the empirical distribution in reasonable settings, but is certainly natural for proving impossibility results, which then extend to other reasonably flat distributions. Once we have a notion of closeness of functions, being $\epsilon$-almost a dictatorship is well defined by being $\epsilon$-close to some dictatorship function.  But what is being "almost an IIA social welfare function"?  The problem is that the IIA condition involves a relation between different values of $F$.  This is similar to the situation in the field of property testing, and one natural approach is to follow that field's lead and quantify how often the desired property is violated:

Definition A: A function $F : L^n \rightarrow L$ is an $\epsilon$-almost IIA social welfare function if for all $a,b \in A$ we have that $Pr[F^{a,b}(x_1...x_n) \ne F^{a,b}(y_1...y_n) | \forall i:x^{a,b}_i=y^{a,b}_i] \le \epsilon$.  I.e. for a random input, a random change of the non $(a,b)$-bits is unlikely to change the value of $F^{a,b}$.

A second approach is simpler, although surprising at first: instead of relaxing the IIA requirement, it relaxes the social welfare one.  I.e. we can allow $F$ to have in its range all possible 8 values in $\{0,1\}^3$ rather than just the 6 in $L$.  We call the bad values, 000 and 111, irrational outcomes, as they do not correspond to a consistent preference on $A$.

Definition B: A function $F : L^n \rightarrow \{0,1\}^3$  is an $\epsilon$-almost IIA social welfare function if it is IIA and there exists a social welfare function $G : L^n \rightarrow L$ such that $F$ and $G$ are $\epsilon$-close.

Note that we implicitly extended the definitions of closeness and of being IIA from functions having a range of $L$ to those with a range of $\{0,1\}^3$.

We can now state Kalai’s quantitative version of Arrow’s theorem:

Theorem (Kalai): For every $\epsilon>0$ there exists a $\delta>0$ such that every neutral function that is $\delta$-almost an IIA social welfare function is $\epsilon$-almost a dictatorship.

Following Kalai, we will prove the theorem for definition B of "almost", but the same theorem for definition A directly follows: take a function $F$ that is a $\delta$-almost IIA social welfare function according to definition A, and define a new function $F'$ by letting $F'^{a,b}(x_1...x_n)$ be the majority value of $F^{a,b}(y_1...y_n)$ where the $y_i$'s agree with the $x_i$'s on their $(a,b)$-bit and range over all possibilities on the other bits.  By definition $F'$ is IIA, and it is not difficult to see that since $F$ satisfies definition A, the majority vote in the definition is almost always overwhelming and thus $F'$ is $\delta'$-close to $F$ (where $\delta' = O(\sqrt{\delta})$).  Since we defined each bit of $F'$ separately, its range may contain irrational outcomes, but since $F'$ is close to $F$, this happens on at most a $\delta'$ fraction of inputs, and thus $F'$ satisfies definition B.

The main ingredient of the proof will be an analysis of the correlation between two bits from the output of $F$.

Main Lemma (social choice version): For every $\epsilon>0$ there exists a $\delta>0$ such that if a neutral $F$ is not $\epsilon$-almost a dictatorship then $Pr[F^{a,b}(x_1...x_n)=1\:and\:F^{b,c}(x_1...x_n)=0] \le 1/3-\delta$.

Before we proceed, let us look at the significance of the $1/3-\delta$: If the values of $F^{a,b}$ and $F^{b,c}$ were completely independent, then the joint probability would be exactly $1/4$ (since each of the two bits is unbiased due to the neutrality of $F$).  For a “random” $F$ the value obtained would be almost $1/4$.  In contrast, if $F$ is a dictatorship then the joint probability would be exactly $1/3$ since $Pr[x_i^{a,b}=1\:and\:x_i^{b,c}=0]=1/3$ as $x_i$ is uniform in $L$.  The point here is that if $F$ has non-negligible difference from being a dictatorship then the probability is non-negligibly smaller than $1/3$.
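The $1/3$ for a dictatorship can be checked by direct enumeration.  A small sketch (mine, not from the post): for a uniformly random order in $L$, count the orders whose encoding has $x_i^{a,b}=1$ and $x_i^{b,c}=0$:

```python
from fractions import Fraction
from itertools import permutations

# Same encoding as before: bit x^{u,v} = 1 iff u is ranked before v.
def bits(order):
    rank = {alt: i for i, alt in enumerate(order)}
    pref = lambda u, v: int(rank[u] < rank[v])
    return (pref('a', 'b'), pref('b', 'c'), pref('c', 'a'))

orders = list(permutations('abc'))
# Orders that prefer a to b but not b to c: exactly acb and cab, i.e. 2 of 6.
hits = [o for o in orders if bits(o)[0] == 1 and bits(o)[1] == 0]
print(Fraction(len(hits), len(orders)))  # 1/3
```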

The theorem is directly implied from this lemma, as the event $F(x_1 ... x_n) \in L$ is the disjoint union of the three events $F^{a,b}(x_1...x_n)=1\:and\:F^{b,c}(x_1...x_n)=0$, $F^{b,c}(x_1...x_n)=1\:and\:F^{c,a}(x_1...x_n)=0$, and $F^{c,a}(x_1...x_n)=1\:and\:F^{a,b}(x_1...x_n)=0$, whose total probability, if $F$ is not $\epsilon$-almost a dictatorship, is bounded by the lemma by $1-3\delta$, and thus the probability of an irrational outcome is at least $3\delta$.

So let us inspect this lemma.  We have two Boolean functions $F^{a,b}$ and $F^{b,c}$, each operating on $n$-bit strings.  Since $F$ is neutral, these are actually the same Boolean function, so $F^{a,b}=F^{b,c}=f$.  What we are asking is for the probability that $f(z)=1\:and\:f(w)=0$ where $w$ and $z$ are each an $n$-bit string: $w=(x^{a,b}_1....x^{a,b}_n)$ and $z=(x^{b,c}_1...x^{b,c}_n)$.  The main issue is how $w$ and $z$ are distributed.   Well, $z$ is uniformly distributed over $\{0,1\}^n$ and so is $w$. They are not independent though: $Pr[w_i=z_i]=1/3$. We say that the distributions on $w$ and $z$ are (anti-1/3-)correlated.
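This correlation structure is again easy to verify by enumeration.  In the sketch below (my own check, not from the post), each coordinate pair $(w_i, z_i) = (x_i^{a,b}, x_i^{b,c})$ is read off a uniformly random order: each marginal is an unbiased bit, yet the two bits agree with probability only $1/3$:

```python
from fractions import Fraction
from itertools import permutations

def bits(order):
    rank = {alt: i for i, alt in enumerate(order)}
    pref = lambda u, v: int(rank[u] < rank[v])
    return (pref('a', 'b'), pref('b', 'c'), pref('c', 'a'))

# One coordinate of (w, z) per order: (x^{a,b}, x^{b,c}).
pairs = [bits(p)[:2] for p in permutations('abc')]
print(Fraction(sum(w for w, _ in pairs), len(pairs)))       # 1/2: w_i is unbiased
print(Fraction(sum(z for _, z in pairs), len(pairs)))       # 1/2: z_i is unbiased
print(Fraction(sum(w == z for w, z in pairs), len(pairs)))  # 1/3: agreement probability
```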

So our main lemma is equivalent to the following version of the lemma which now talks solely of Boolean functions:

Main Lemma (Boolean version): For every $\epsilon>0$ there exists a $\delta>0$ such that if an odd Boolean function $f:\{ 0,1 \}^n \rightarrow \{ 0,1 \}$ is not $\epsilon$-almost a dictatorship then $Pr[f(z)=1\:and\:f(w)=0] \le 1/3-\delta$, where $z$ is chosen uniformly at random in $\{0,1\}^n$ and $w$ is chosen so that, independently for each $i$, $Pr[w_i = z_i]=1/3$ and $Pr[w_i=-z_i]=2/3$.

An odd Boolean function just means that $f(-z_1...-z_n)=-f(z_1...z_n)$ (where $-$ denotes bit negation), which follows from the neutrality of $F$ by switching the roles of $a$ and $b$.  (We really only need that $f$ is "balanced", i.e. takes value 1 on exactly 1/2 of the inputs, but let's keep the more specific requirement for compatibility with the previous version of this lemma.)

Fourier Transform on the Boolean Cube

At this point using the Fourier transform is quite natural.  While I do not wish to give a full introduction to the Fourier transform on the Boolean cube here, I do want to give the basic flavor in one paragraph, convincing those who haven’t looked at it yet to do so.  1st year algebra and an afternoon’s work should suffice to fill in all the holes.

The basic idea is to view Boolean functions $f:\{0,1\}^n \rightarrow \{0,1\}$ as special cases of the real-valued functions on the Boolean cube: $f:\{0,1\}^n \rightarrow \Re$.  This is a real vector space of dimension $2^n$, and has a natural inner product $\langle f,g \rangle = 2^{-n} \sum_x f(x) g(x)$.  The $2^{-n}$ factor is just for convenience and allows viewing the inner product as the expectation over a random choice of $x$: $\langle f,g \rangle = E[f(x)g(x)]$.  As usual the choice of a "nice" basis is helpful, and our choice is the "characters": functions of the form $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, where $S$ is some subset of the $n$ bits.  $\chi_S$ takes values -1 and 1 according to the parity of the bits of $x$ in $S$.  There are $2^n$ such characters and they turn out to be an orthonormal basis.  The Fourier coefficients of $f$, denoted $\hat{f}(S)$, are simply the coefficients under this basis: $f=\sum_S \hat{f}(S)\chi_S$, where we have that $\hat{f}(S) = \langle f, \chi_S \rangle$.
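For readers who want to play with this, here is a small numerical sketch (not from the post) that computes Fourier coefficients by brute force and checks Parseval's identity, $\sum_S \hat{f}(S)^2 = \langle f,f \rangle$, for the 3-bit majority function (a balanced 0/1 function, so both $\hat{f}(\emptyset)=E[f]$ and the total squared weight equal $1/2$):

```python
from itertools import product

n = 3
cube = list(product([0, 1], repeat=n))
# All 2^n subsets S of {0, ..., n-1}, decoded from bitmasks.
subsets = [tuple(i for i in range(n) if (m >> i) & 1) for m in range(2 ** n)]

def chi(S, x):                      # character: (-1)^(sum of the bits of x in S)
    return (-1) ** sum(x[i] for i in S)

def fourier(f):                     # \hat f(S) = <f, chi_S> = E[f(x) chi_S(x)]
    return {S: sum(f(x) * chi(S, x) for x in cube) / 2 ** n for S in subsets}

maj = lambda x: int(sum(x) >= 2)    # 3-bit majority: balanced, E[maj] = 1/2
fhat = fourier(maj)
print(fhat[()])                              # 0.5  (= E[f])
print(sum(c ** 2 for c in fhat.values()))    # 0.5  (Parseval: = <f,f> = E[f^2])
```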

One reason why this vector space is appropriate here is that the correlation operation we needed between $w$ and $z$ in the lemma above is easily expressible as a linear transformation $T$ on this space, defined by $(Tf)(x)=E[f(y)]$ where $y$ is chosen at random with $Pr[y_i = x_i]=1/3$ and $Pr[y_i=-x_i]=2/3$.  Using this it is possible to elementarily evaluate the probability in the lemma as $\sum_S 3^{-|S|} \hat{f}(S)^2$, where the sum ranges over all $2^n$ subsets $S$ of the $n$ bits.  To get a feel of what this means we need to compare it with the following property of the Fourier transform of a balanced Boolean function: $\sum_S \hat{f}(S)^2 = ||f||_2^2 = 1/2$.  The difference between this sum and our sum is the factor of $3^{-|S|}$ for each term.   The "first" term in this sum is easily evaluated for a balanced function: $\hat{f}(\emptyset)^2 =(E[f])^2 = 1/4$, so in order to prove the main lemma it suffices to show that $\sum_{|S| \ge 1} 3^{-|S|} \hat{f}(S)^2 < (1/3 - \delta) - 1/4 = 1/12 - \delta$. This is so as long as a non-negligible part of $\sum_{|S| \ge 1} \hat{f}(S)^2 = 1/4$ is multiplied by a factor strictly smaller than $3^{-1}$, i.e. is on sets $|S|>1$. Thus the main lemma boils down to showing that if $\sum_{|S| \ge 2} \hat{f}(S)^2 \le \delta'$  then $f$ is $\epsilon$-almost a dictatorship.  This is not elementary, but is proved by E. Friedgut, G. Kalai, and A. Naor, and completes our journey from Arrow to Fourier.
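As a sanity check on the benchmark values mentioned earlier ($1/3$ for a dictatorship, strictly less for other functions), the weighted sum $\sum_S 3^{-|S|}\hat{f}(S)^2$ can be computed exactly for small $n$.  In this sketch (mine, using exact rational arithmetic) a dictator gives exactly $1/3$, while 3-bit majority already gives $17/54 < 18/54 = 1/3$:

```python
from fractions import Fraction
from itertools import product

n = 3
cube = list(product([0, 1], repeat=n))
subsets = [tuple(i for i in range(n) if (m >> i) & 1) for m in range(2 ** n)]

def chi(S, x):
    return (-1) ** sum(x[i] for i in S)

def weighted_sum(f):                # sum over S of 3^{-|S|} * \hat f(S)^2
    total = Fraction(0)
    for S in subsets:
        c = Fraction(sum(f(x) * chi(S, x) for x in cube), 2 ** n)
        total += c * c / 3 ** len(S)
    return total

dictator = lambda x: x[0]           # a dictatorship of participant 0
majority = lambda x: int(sum(x) >= 2)
print(weighted_sum(dictator))       # 1/3
print(weighted_sum(majority))       # 17/54, strictly below 1/3
```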

More results

There seems much promise in using these analytical tools for addressing quantitative questions in social choice theory.  I’d like to mention two results of this form.  The first is the celebrated MOO paper (by E. Mossel, R. O’Donnell and K. Oleszkiewicz) that proves, among other things, that among functions that do not give any player unbounded influence, the closest to being an IIA social welfare function is the majority function.  The second, is a paper of mine with E. Friedgut and G. Kalai that obtains a quantitative version of the Gibbard–Satterthwaite theorem showing that every voting method that is far from being a dictatorship can be strategically manipulated.  Our version shows that this can be done on a non-negligible fraction of preferences, but is limited to neutral voting methods between 3 candidates.

EC 2009: get ready for reviewing your reviewers

Those who submitted papers to EC 2009 will get a chance, starting this coming Sunday, to comment upon the reviews of their submissions (essentially reviewing the reviews of their papers), at which point I suppose that the program committee (co-chaired by blogger Lance Fortnow) will review these comments (reviewing the reviews of the reviews of the submissions).

In the good-old-days we only got a single bit (accept/reject) from the program committee.   I have to admit that my natural tendency is to view getting anything more as being too user-friendly, not to mention this further round of interaction.  I can’t really argue for this point of view though, so let me leave it for readers’ comments.

On this note, let me mention that while I usually find the quality of conference reviews in CS too low to be useful, in the last STOC (chaired by blogger Michael Mitzenmacher), I got extremely useful reviews (for my rejected paper), and in general got the impression that the reviews were of high quality, at least for AGT submissions.  I don't know whether others got the same impressions.  Anyone?

AGT and practice

What exactly do all those AGT researchers do to increase profitability?

Shouldn’t we stop pretending that AGT has any relation to the real world?

The theory vs. practice debate in every field is interesting, but for now just a few quick comments:

1) Actually it seems to me that ignoring the relation of AGT to the real world is pretending.  It is a fact that many companies are interested in this work, whether or not justifiably so.

2) At least what I did for a while in Google can be seen in this paper.

3) While there are a few “deep” theoretical problems in AGT, most of the interest in this young field is actually figuring out the right questions and models.  An early split from the “real world” would make sure that we end up with the wrong questions and models.

4) It seems to me that this comment is part of a game we theoreticians like to play: "we don't do anything useful — we just pretend so in order to get grant money".  I don't buy it.

5) However, I can't resist participating a bit in this game and sharing a strategy of pretending to be useful: when I started working on the econ/CS border I observed a difference in the way economists and computer scientists gave examples.  While in CS examples would look like "Suppose Alice bids $3 for a pair of socks", economists would give the same example as "Suppose ATT bids $3B for a spectrum license", which is so much more practical.  My new suggestion for computer scientists is to move to "Suppose Alice the plumber bids $0.003 for an ad slot", trumping the $3B, I would say.  (Which makes me wonder how many other human inventions function over a range of twelve orders of magnitude like auctions do?)

Internet advertising is perhaps the number one current application (revenue-wise) for algorithmic game theory, and is paying the salaries of many AGT researchers in Microsoft, Yahoo, Google, and elsewhere.    Beyond the basic “generalized second price auction” used for ad-auctions, there are many other issues and questions involved.

About a week ago, Google announced a new beta program for interest-based targeting (a form of behavioral targeting).  The basic idea is very simple: allow advertisers to target ads to segments of people according to their previous Internet behavior and not just the current web-page that they are on.   E.g. if you visit many sports web-sites, then it makes sense to show you sports-related ads even on a general news page rather than, say, detergent ads.  In general, this should result in less annoying (or even actually useful) ads for the viewer, higher effectiveness for the advertiser, and higher revenue for the web-site "publisher" (and of course, for my own current employer, Google).

Technically, this is quite easy to do using browser cookies, and is already heavily used by most advertising middlemen on the web.   The catch is of course in the privacy issues.  It seems that the main innovation here was in how Google managed to handle the privacy issues, as explained on its public policy blog.  The main technical ingredient here is the ads preferences manager that gives viewers full control of their data.

The attention economy for scientific work

In a recent blog post, Lance raised the question of whether a conference should be judged (?) by its best or its worst papers.  To me, this is part of a broader issue that views the conference (and journal) system as mostly playing a role in allocation of attention.  As there are "too many" scientific papers, the question of which ones should get the attention of "the community" is a critical one.  This view of attention as a resource is a general phenomenon in our information-overload world, as explained by the nice introduction to the Wikipedia article on Attention Economy:

Herbert Simon was perhaps the first person to articulate the concept of attention economics when he wrote:

“…in an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients. Hence a wealth of information creates a poverty of attention and a need to allocate that attention efficiently among the overabundance of information sources that might consume it”(Simon 1971, p. 40-41).

He noted that many designers of information systems incorrectly represented their design problem as information scarcity rather than attention scarcity, and as a result they built systems that excelled at providing more and more information to people, when what was really needed were systems that excelled at filtering out unimportant or irrelevant information (Simon 1996, p. 143-144).

In recent years, Simon’s characterization of the problem of information overload as an economic one has become more popular. Business strategists have adopted the term “attention economy” (Davenport & Beck 2001), and some writers have even speculated that “attention transactions” will replace financial transactions as the focus of our economic system (Goldhaber 1997, Franck 1999). Information systems researchers have also adopted the idea, and are beginning to investigate mechanism designs which build on the idea of creating property rights in attention (see Applications).

My natural tendency is indeed to view the issue of allocation of attention (between scientific papers, as well as elsewhere) as a mechanism design problem.  Looking from afar, the journal and conference system is a pretty impressive mechanism — better than what humanity has for most resource allocation purposes.  Looking more closely, this mechanism was designed under a completely different set of technological constraints from those we have now.  In particular, the original main motivation of journals and conferences was dissemination of information, which today is better handled by the web, e.g. the ECCC or arXiv.  Personally, I can clearly see the need for conferences as providing direct human interaction, but how journals still survive escapes me.

I believe that the near future will bring significant changes in how science uses conferences and journals.  Blogs are already making a real impact: notice for example the attention (here, here, here, here, here, and here) that Mark Braverman (a young postdoc) got for his last result, before any journal or conference has assessed it.

Today, it is hard to separate the social issue of how science should organize itself, from the direct effect that the current mechanism has on our own job offers, tenure decisions, and promotions.  It should be no surprise that the same mechanism that is used to allocate attention among scientists is also used to allocate other resources (our salaries) among them: the criteria are probably quite similar.

The AGT Blogosphere

The theoretical computer science community has a pretty active blogosphere that indeed “carries a conversation” for the field.  Lance Fortnow’s Computational Complexity blog (now co-written by Bill Gasarch) has been running since 2002.  Luca Trevisan’s blog is focused on complexity, Scott Aaronson’s on quantum computation, Michael Mitzenmacher’s is general, Richard Lipton’s has an historical point of view, and they all, as well as a few others, are really interacting with each other to a level that gives a feeling of a community [Added after a well deserved snark at me:  Suresh Venkatasubramanian’s (and a few others’) focus on computational geometry, and Mihai Pătraşcu’s on data structures.]  I think that this mode of community-building is good for science and we are still exploring the possibilities.  (In this vein, it’s hard not to mention the recent spectacular polymath project on Gowers’s blog.)

Part of the motivation for this blog is a desire for the creation of a similar community for algorithmic game theory.  Some blogs in this community do exist: Muthu’s, Paul Goldberg’s, and David Pennock’s is also related.  Lance also writes on AGT stuff in his complexity blog.  A community is in the making and I want to be part of it.

What is “Algorithmic Game Theory”?

What is a blog called “Algorithmic Game Theory” attempting to cover?  It seems that there is some rather ill-defined research area that now goes under this name, but also under various other names.

As the Internet was growing around the mid-to-late 1990’s, several threads of research emerged that somehow connected three key elements: Computer Science, Economics & Game-Theory, and the Internet.    Within the networking community, several researchers started worrying about the non-cooperative nature of the Internet, and started considering the issue of incentives.  Within the AI community, the coordination between multiple software “agents” became an area of study.  Electronic commerce started flourishing, and the underlying basic questions started emerging. A few economists started studying Internet-related economic questions.  Around the late 1990’s the different groups of people with these different motivations started talking to each other, and a joint field started emerging.

There was never a clear name for this emerging field.  The first conference (that I know) that started connecting these people was called “Information and Computation Economies”, emphasizing the computation-economics interplay (the conference web page is no longer available on the web — the link here is to the wayback machine copy).   ACM opened up a SIG (special interest group) under the name of Electronic Commerce, emphasizing the application, with a conference series under that name.  The Multiagent angle was emphasized by AI-leaning conferences, a workshop taking the name Internet and Network Economies was later started, and other names for closely related — yet not just the same — fields were used.  My own choice when teaching courses on this topic was either the un-committed Topics on the border of CS and economics or Foundations of electronic commerce (where “foundations” was meant to imply that we won’t be doing anything too practical).  Christos Papadimitriou at Berkeley called his course Algorithmic aspects of game theory, while Tuomas Sandholm at CMU used Foundations of electronic marketplaces (wayback machine, again).

When Eva Tardos, Vijay Vazirani, Tim Roughgarden and myself were editing our book on Algorithmic game theory we knew that we wanted to cover this general area (from a somewhat focused perspective due to our theoretical-CS background), but we were not really sure what to call it — neither the book nor the area.  Christos had a strong opinion that the field should be called Algorithmic Game Theory, so we decided to stick with that for our book as well.  While the name itself is somewhat imprecise (I personally am missing “Internet”, “Markets”, and “Computation” in the name, but you can’t have it all in a snappy name), it seems that Christos’s naming for the field is sticking, so I’m going with it.