## From Arrow to Fourier

Social Choice Theory is a pretty mature field that deals with the question of how to combine the preferences of different individuals into a single preference or a single choice.  This field may serve as a conceptual foundation in many areas: political science (how to organize elections), law (how to set commercial laws), economics (how to allocate goods), and computer science (networking protocols, interaction between software agents).  Unsurprisingly, there are interesting computational aspects to this field, and indeed a workshop series on computational social choice already exists.  The starting point of this field is Arrow’s theorem, which shows that there are unexpected inherent difficulties in performing this preference aggregation.  There have been many different proofs of Arrow’s impossibility theorem, all of them combinatorial.  In this post I’ll explain a basic observation of Gil Kalai that allows quantifying the level of impossibility using analytical tools (the Fourier transform on Boolean functions) commonly used in theoretical computer science.  At first Gil’s introduction of these tools in this context seemed artificial to me, but in this post I hope to show you that it is the natural thing to do.

## Arrow’s Theorem

Here is our setting and notation:

1. There is a finite set of participants numbered $1...n$.
2. There is a set of three alternatives over which the participants have preferences: $A=\{a,b,c\}$.  (Arrow’s theorem, as well as everything here, actually applies to any set $A$ with $|A|\ge 3$.)
3. We denote by $L$ the set of preferences over $A$, i.e. $L$ is the set of full orders on the elements of $A$.

The point of view here is that each participant $1 \le i \le n$ has his own preference $x_i \in L$, and we are concerned with functions that reach a common conclusion as a function of the $x_i$’s.  In principle, the common conclusion may be a joint preference, or a single alternative; Arrow’s theorem concerns the former, i.e. it deals with functions $F:L^n \rightarrow L$ called social welfare functions.  Arrow’s theorem points out in a precise and very general way that there are no “natural non-trivial” social welfare functions when $|A|\ge 3$.  (This is in contrast to the case $|A|=2$, where taking the majority of all preferences is natural and non-trivial.)  Formal definitions will follow.

A preference $x \in L$ really specifies for each two alternatives $a,b \in A$ which of them is preferred over the other, and we denote by $x^{a,b}$ the bit specifying whether $x$ prefers $a$ to $b$ (1 if it does, 0 otherwise).  We view each $x$ as composed of three bits $x=(x^{a,b},x^{b,c},x^{c,a})$, where it is implied that $x^{b,a}=-x^{a,b}$, $x^{c,b}=-x^{b,c}$, and $x^{a,c}=-x^{c,a}$ (here and below, $-$ applied to a bit denotes its negation).  Note that under this representation only 6 of the possible 8 three-bit sequences correspond to elements of $L$; the bad combinations are 000 and 111.
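A minimal Python sketch of this encoding, assuming bit value 1 means that the first alternative of the pair is preferred (the names `PAIRS`, `encode` and the list `L` are ours, for illustration only). It checks that the six orders map to six distinct triples and that exactly 000 and 111 are left over:

```python
from itertools import permutations, product

# The three coordinates of the encoding: x = (x^{a,b}, x^{b,c}, x^{c,a}).
PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    """Encode a full order (most preferred first) as three bits;
    bit k is 1 iff the first alternative of PAIRS[k] is ranked above the second."""
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

# The set L of preferences, in this bit representation.
L = [encode(order) for order in permutations("abc")]

# Three-bit sequences that do not come from any order.
irrational = [bits for bits in product((0, 1), repeat=3) if bits not in L]

print(len(set(L)), irrational)  # 6 [(0, 0, 0), (1, 1, 1)]
```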

The formal meaning of “natural” is the following:

Definition: A social welfare function $F$ satisfies the IIA (independence of irrelevant alternatives) property if for any two alternatives $a,b \in A$ the aggregate preference between $a$ and $b$ depends only on the preferences of the participants between $a$ and $b$ (and not on any preferences with another alternative $c$).  In the notation introduced above it means that $F^{a,b}$ is in fact just a function of the $n$ bits $x_1^{a,b}, ..., x_n^{a,b}$ (rather than of all the bits in the $x_i$’s).
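For small $n$ the IIA property can be checked by brute force.  The sketch below (hypothetical helper names; bit 1 means the first alternative of the pair is preferred) groups all $6^n$ profiles by the voters' bits on one pair and checks that the output bit on that pair is constant within each group.  Borda count with alphabetical tie-breaking is used only as an illustrative rule that fails the test:

```python
from itertools import permutations, product

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    """Bits (x^{a,b}, x^{b,c}, x^{c,a}) of a full order given most preferred first."""
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

def is_iia(F, n):
    """True iff, for every pair k, F(profile)[k] depends only on the voters' k-th bits."""
    for k in range(3):
        seen = {}  # voters' k-th bits -> output bit seen first for that key
        for profile in product(L, repeat=n):
            key = tuple(x[k] for x in profile)
            value = F(profile)[k]
            if seen.setdefault(key, value) != value:
                return False
    return True

def dictator(profile):
    return profile[0]  # voter 1's preference is the outcome

def borda(profile):
    """Borda count (2, 1, 0 points) with alphabetical tie-breaking."""
    score = dict.fromkeys("abc", 0)
    for x in profile:
        for (p, q), bit in zip(PAIRS, x):
            score[p if bit else q] += 1  # winner of each pairwise comparison gets a point
    return encode(sorted("abc", key=lambda alt: (-score[alt], alt)))

print(is_iia(dictator, 2), is_iia(borda, 2))  # True False
```

Counting pairwise wins per voter gives exactly the Borda points 2, 1, 0, which keeps the rule expressible directly on the bit encoding.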

One may have varying opinions on whether this is indeed a natural requirement, but its absence does turn out to imply what may be viewed as inconsistencies.  For example, when we use the aggregate preference to choose a single alternative (hence obtaining a social choice function), the lack of IIA is directly tied to the possibility of strategic manipulation of the derived social choice function.

We can take the following definition of non-trivial:

Definition: A social welfare function $F$ is a dictatorship if for some $i$, $F$ is the identity function on $x_i$ or the exact opposite order (in our coding, the bitwise negation of $x_i$).

It is easy to see that any dictatorship satisfies IIA.  Arrow’s theorem states that, with another minor assumption, these are the only functions that do so.  Kalai’s quantitative proof requires an assumption that is more restrictive than that required by Arrow’s original statement.  Recently Elchanan Mossel published a paper that removes this additional assumption, but we’ll continue with Kalai’s elementary variant.

Definition: A social welfare function $F$ is neutral if it is invariant under changing the names of alternatives.

This basically means that as a voting method it does not discriminate between candidates.

Arrow’s Theorem (for neutral functions): Every neutral social welfare function that satisfies IIA is a dictatorship.

## Quantification for Arrow’s Theorem

In CS, we are often quite happy with approximation: we know that sometimes things can’t be perfect, but they can still be pretty good.  Thus, since a neutral social welfare function that satisfies IIA perfectly must be a dictatorship, the following quantitative question comes up naturally: can there be a function that is “almost” an IIA social welfare function and yet not even close to a dictatorship?  Let us define closeness first:

Definition: Two social welfare functions $F, G$ are $\epsilon$-close if $Pr[F(x_1...x_n) \ne G(x_1 ... x_n)] \le \epsilon$.  The probability is over independent uniformly random choices of $x_i \in L$.

The choice of the uniform probability distribution over $L^n$ (strangely termed the impartial culture hypothesis) cannot really be justified as a model of the empirical distribution in reasonable settings, but it is certainly natural for proving impossibility results, which then extend to other reasonably flat distributions.  Once we have a notion of closeness of functions, being $\epsilon$-almost a dictatorship is well defined: it means being $\epsilon$-close to some dictatorship function.  But what does it mean to be “almost an IIA social welfare function”?  The problem is that the IIA condition involves a relation between different values of $F$.  This is similar to the situation in the field of property testing, and one natural approach is to follow property testing and quantify how often the desired property is violated:

Definition A: A function $F : L^n \rightarrow L$ is an $\epsilon$-almost IIA social welfare function if for all $a,b \in A$ we have that $Pr[F^{a,b}(x_1...x_n) \ne F^{a,b}(y_1...y_n) \mid \forall i:x^{a,b}_i=y^{a,b}_i] \le \epsilon$.  I.e. for a random input, a random change of the non-$(a,b)$ bits is unlikely to change the value of $F^{a,b}$.
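For small $n$, the smallest $\epsilon$ in Definition A can be computed exactly.  Inside each group of profiles that share the voters' $(a,b)$-bits, $x$ and $y$ are independent and uniform, so the disagreement probability in a group where a fraction $q$ of the outputs is 1 is $2q(1-q)$.  A sketch under that reading, with our own helper names and bit 1 meaning the first alternative of the pair is preferred (Borda count with alphabetical tie-breaking is again only an illustrative input):

```python
from fractions import Fraction
from itertools import permutations, product

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

def iia_violation(F, n, k):
    """Exact Pr[F(x)[k] != F(y)[k]] for x uniform on L^n and y uniform among
    the profiles whose k-th bits agree with those of x (Definition A, pair k)."""
    profiles = list(product(L, repeat=n))
    groups = {}  # voters' k-th bits -> list of output bits F(profile)[k]
    for profile in profiles:
        key = tuple(x[k] for x in profile)
        groups.setdefault(key, []).append(F(profile)[k])
    total = Fraction(0)
    for outputs in groups.values():
        m, ones = len(outputs), sum(outputs)
        # Pr[x lands in this group] * Pr[two independent draws from it disagree]
        total += Fraction(m, len(profiles)) * Fraction(2 * ones * (m - ones), m * m)
    return total

def almost_iia_epsilon(F, n):
    """Smallest epsilon for which F is epsilon-almost IIA under Definition A."""
    return max(iia_violation(F, n, k) for k in range(3))

def dictator(profile):
    return profile[0]

def borda(profile):
    score = dict.fromkeys("abc", 0)
    for x in profile:
        for (p, q), bit in zip(PAIRS, x):
            score[p if bit else q] += 1
    return encode(sorted("abc", key=lambda alt: (-score[alt], alt)))

print(almost_iia_epsilon(dictator, 2), almost_iia_epsilon(borda, 2) > 0)  # 0 True
```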

A second approach is simpler, although surprising at first: instead of relaxing the IIA requirement, it relaxes the social welfare one.  I.e. we allow $F$ to have in its range all 8 possible values in $\{0,1\}^3$ rather than just the 6 in $L$.  We call the bad values, 000 and 111, irrational outcomes, as they do not correspond to a consistent preference on $A$.

Definition B: A function $F : L^n \rightarrow \{0,1\}^3$ is an $\epsilon$-almost IIA social welfare function if it is IIA and there exists a social welfare function $G : L^n \rightarrow L$ such that $F$ and $G$ are $\epsilon$-close.

Note that we implicitly extended the definitions of closeness and of being IIA from functions having a range of $L$ to those also with a range of $\{0,1\}^3$.
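Pairwise majority with an odd number of voters is the standard example of a function that is IIA by construction but can produce irrational outcomes (the Condorcet paradox).  For an IIA $F$, the $\epsilon$ of Definition B is exactly the probability of an irrational outcome, since the closest $G$ with range $L$ agrees with $F$ on every rational output and must differ on every irrational one.  A brute-force sketch with our own helper names, assuming bit 1 means the first alternative of the pair is preferred:

```python
from fractions import Fraction
from itertools import permutations, product

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

def pairwise_majority(profile):
    """Bit k of the outcome is the majority of the voters' k-th bits (n odd, so no ties)."""
    n = len(profile)
    return tuple(int(2 * sum(x[k] for x in profile) > n) for k in range(3))

def irrational_probability(F, n):
    """Pr[F(x) is 000 or 111] for x uniform on L^n: the Definition B epsilon of an IIA F."""
    profiles = list(product(L, repeat=n))
    bad = sum(1 for profile in profiles if F(profile) not in L)
    return Fraction(bad, len(profiles))

print(irrational_probability(pairwise_majority, 3))  # 1/18
```

For three voters this counts the 12 cyclic profiles among the 216.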

We can now state Kalai’s quantitative version of Arrow’s theorem:

Theorem (Kalai): For every $\epsilon>0$ there exists a $\delta>0$ such that every neutral function that is $\delta$-almost an IIA social welfare function is $\epsilon$-almost a dictatorship.

Following Kalai, we will prove the theorem for definition B of “almost”, but the same theorem for definition A follows directly: take a function $F$ that is a $\delta$-almost IIA social welfare function according to definition A, and define a new function $F'$ by letting $F'^{a,b}(x_1...x_n)$ be the majority value of $F^{a,b}(y_1...y_n)$, where the $y_i$’s agree with the $x_i$’s on their $(a,b)$-bit and range over all possibilities on the other bits.  By definition $F'$ is IIA, and it is not difficult to see that since $F$ satisfies definition A, the majority vote in the definition is almost always overwhelming, and thus $F'$ is $\delta'$-close to $F$ (where $\delta' = O(\sqrt{\delta})$).  Since we defined each bit of $F'$ separately, its range may contain irrational outcomes, but since $F'$ is $\delta'$-close to $F$, whose range is $L$, at most a $\delta'$ fraction of its outputs are irrational, and thus it satisfies definition B.
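The majority construction of $F'$ can be sketched directly for small $n$.  The function below (hypothetical name `repair`; bit 1 means the first alternative of the pair is preferred) counts, for each pair, how many profiles in each group sharing the voters' bits on that pair have output bit 1, and takes the majority.  The result is IIA by construction, and its range may include 000 or 111.  Borda count with alphabetical tie-breaking serves only as an example input:

```python
from fractions import Fraction
from itertools import permutations, product

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

def repair(F, n):
    """F'(x)[k] = majority of F(y)[k] over the 3**n profiles y that agree with x
    on the voters' k-th bits."""
    ones = [{} for _ in range(3)]  # ones[k][key] = #profiles in the group with F(y)[k] == 1
    for profile in product(L, repeat=n):
        out = F(profile)
        for k in range(3):
            key = tuple(x[k] for x in profile)
            ones[k][key] = ones[k].get(key, 0) + out[k]

    def F_prime(profile):
        return tuple(int(2 * ones[k][tuple(x[k] for x in profile)] > 3 ** n)
                     for k in range(3))
    return F_prime

def distance(F, G, n):
    """Pr[F(x) != G(x)] for x uniform on L^n."""
    profiles = list(product(L, repeat=n))
    return Fraction(sum(F(p) != G(p) for p in profiles), len(profiles))

def dictator(profile):
    return profile[0]

def borda(profile):
    score = dict.fromkeys("abc", 0)
    for x in profile:
        for (p, q), bit in zip(PAIRS, x):
            score[p if bit else q] += 1
    return encode(sorted("abc", key=lambda alt: (-score[alt], alt)))

F2 = repair(borda, 2)
print(distance(repair(dictator, 2), dictator, 2))  # 0
```

Each group contains $3^n$ profiles (three orders per voter with a given bit), an odd number, so the majority is never tied.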

The main ingredient of the proof will be an analysis of the correlation between two bits from the output of $F$.

Main Lemma (social choice version): For every $\epsilon>0$ there exists a $\delta>0$ such that if a neutral $F$ is not $\epsilon$-almost a dictatorship then $Pr[F^{a,b}(x_1...x_n)=1\:and\:F^{b,c}(x_1...x_n)=0] \le 1/3-\delta$.

Before we proceed, let us look at the significance of the $1/3-\delta$: If the values of $F^{a,b}$ and $F^{b,c}$ were completely independent, then the joint probability would be exactly $1/4$ (since each of the two bits is unbiased due to the neutrality of $F$).  For a “random” $F$ the value obtained would be close to $1/4$.  In contrast, if $F$ is a dictatorship then the joint probability would be exactly $1/3$, since $Pr[x_i^{a,b}=1\:and\:x_i^{b,c}=0]=1/3$ as $x_i$ is uniform in $L$ (and likewise for the opposite order of $x_i$).  The point here is that if $F$ is non-negligibly far from every dictatorship, then the probability is non-negligibly smaller than $1/3$.
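These values can be checked by exhaustive enumeration for small $n$.  The sketch below (our own helper names; bit 1 means the first alternative of the pair is preferred) computes $Pr[F^{a,b}=1\:and\:F^{b,c}=0]$ exactly.  Both a dictatorship and its opposite give $1/3$, while pairwise majority over three voters, a neutral IIA function that is not a dictatorship, gives $17/54 \approx 0.315$, strictly between $1/4$ and $1/3$:

```python
from fractions import Fraction
from itertools import permutations, product

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

def joint_probability(F, n):
    """Exact Pr[F^{a,b}(x) = 1 and F^{b,c}(x) = 0] for x uniform on L^n."""
    profiles = list(product(L, repeat=n))
    hits = sum(1 for p in profiles if F(p)[0] == 1 and F(p)[1] == 0)
    return Fraction(hits, len(profiles))

def dictator(profile):
    return profile[0]

def anti_dictator(profile):
    return tuple(1 - bit for bit in profile[0])  # the exact opposite order of voter 1

def pairwise_majority(profile):
    n = len(profile)
    return tuple(int(2 * sum(x[k] for x in profile) > n) for k in range(3))

print(joint_probability(dictator, 3), joint_probability(anti_dictator, 3))  # 1/3 1/3
print(joint_probability(pairwise_majority, 3))                              # 17/54
```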

The theorem follows directly from this lemma, since the event $F(x_1 ... x_n) \in L$ is the disjoint union of the three events $F^{a,b}(x_1...x_n)=1\:and\:F^{b,c}(x_1...x_n)=0$, $F^{b,c}(x_1...x_n)=1\:and\:F^{c,a}(x_1...x_n)=0$, and $F^{c,a}(x_1...x_n)=1\:and\:F^{a,b}(x_1...x_n)=0$.  If $F$ is not $\epsilon$-almost a dictatorship, the lemma (applied, by neutrality, to each of the three pairs) bounds each of these probabilities by $1/3-\delta$, so their total is at most $1-3\delta$, and thus the probability of an irrational outcome is at least $3\delta$.
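The partition claim is a statement about the eight possible output triples $(F^{a,b},F^{b,c},F^{c,a})$, so it can be checked exhaustively.  A tiny sketch (the function name `events_hit` is ours):

```python
from itertools import product

def events_hit(t):
    """Number of the three events hit by an outcome t = (t^{a,b}, t^{b,c}, t^{c,a})."""
    ab, bc, ca = t
    return sum([ab == 1 and bc == 0,   # F^{a,b}=1 and F^{b,c}=0
                bc == 1 and ca == 0,   # F^{b,c}=1 and F^{c,a}=0
                ca == 1 and ab == 0])  # F^{c,a}=1 and F^{a,b}=0

for t in product((0, 1), repeat=3):
    print(t, events_hit(t))  # 0 for (0, 0, 0) and (1, 1, 1), otherwise 1
```

Every rational outcome lies in exactly one event and neither irrational outcome lies in any, which is all the counting argument needs.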

So let us inspect this lemma.  We have two Boolean functions $F^{a,b}$ and $F^{b,c}$, each operating on $n$-bit strings.  Since $F$ is neutral, these are actually the same Boolean function, so $F^{a,b}=F^{b,c}=f$.  What we are asking for is the probability that $f(z)=1\:and\:f(w)=0$, where $z=(x^{a,b}_1...x^{a,b}_n)$ and $w=(x^{b,c}_1...x^{b,c}_n)$ are $n$-bit strings.  The main issue is how $z$ and $w$ are distributed.  Well, $z$ is uniformly distributed over $\{0,1\}^n$ and so is $w$.  They are not independent though: the pairs $(z_i,w_i)$ are independent across $i$, and $Pr[w_i=z_i]=1/3$.  We say that the distributions on $w$ and $z$ are (anti-1/3-)correlated.
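The distribution of a single coordinate pair can be read off from the six orders.  In this sketch we take $z_i = x_i^{a,b}$ and $w_i = x_i^{b,c}$, with bit 1 meaning the first alternative of the pair is preferred (helper names are ours).  It shows that both bits are unbiased, that $Pr[w_i=z_i]=1/3$, and that $Pr[z_i=1\:and\:w_i=0]=1/3$, the dictatorship value from the lemma's discussion:

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

PAIRS = (("a", "b"), ("b", "c"), ("c", "a"))

def encode(order):
    rank = {alt: pos for pos, alt in enumerate(order)}
    return tuple(1 if rank[p] < rank[q] else 0 for p, q in PAIRS)

L = [encode(order) for order in permutations("abc")]

# Joint distribution of (z_i, w_i) = (x_i^{a,b}, x_i^{b,c}) for x_i uniform on L.
joint = Counter((x[0], x[1]) for x in L)
pr = {pair: Fraction(count, len(L)) for pair, count in joint.items()}

pr_equal = pr[(0, 0)] + pr[(1, 1)]
print(pr_equal)                 # 1/3
print(pr[(1, 0)])               # 1/3
print(pr[(1, 0)] + pr[(1, 1)])  # 1/2
```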

So our main lemma is equivalent to the following version of the lemma which now talks solely of Boolean functions:

Main Lemma (Boolean version): For every $\epsilon>0$ there exists a $\delta>0$ such that if an odd Boolean function $f:\{ 0,1 \}^n \rightarrow \{ 0,1 \}$ is not $\epsilon$-almost a dictatorship then $Pr[f(z)=1\:and\:f(w)=0] \le 1/3-\delta$, where $z$ is chosen uniformly at random in $\{0,1\}^n$ and $w$ is chosen so that, independently for each $i$, $Pr[w_i = z_i]=1/3$ and $Pr[w_i=-z_i]=2/3$.

An odd Boolean function just means that $f(-z_1...-z_n)=-f(z_1...z_n)$, which follows from the neutrality of $F$ by switching the roles of $a$ and $b$.  (We really only need that $f$ is “balanced”, i.e. takes value 1 on exactly 1/2 of the inputs, but let’s keep the more specific requirement for compatibility with the previous version of this lemma.)
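For small $n$ the probability in the Boolean version can be evaluated exactly by summing over all pairs $(z,w)$, each with weight $2^{-n}(1/3)^{a}(2/3)^{n-a}$, where $a$ is the number of coordinates on which they agree.  In this sketch (helper names are ours; bits are 0/1 and $-z$ is bitwise negation) the three-bit majority, an odd function, gives $17/54$, the same value as pairwise majority over three voters in the social choice version, while a dictator gives $1/3$:

```python
from fractions import Fraction
from itertools import product

def correlated_probability(f, n):
    """Exact Pr[f(z) = 1 and f(w) = 0]: z uniform on {0,1}^n and, independently
    for each i, w_i = z_i with probability 1/3 and w_i = 1 - z_i otherwise."""
    total = Fraction(0)
    for z in product((0, 1), repeat=n):
        if f(z) != 1:
            continue
        for w in product((0, 1), repeat=n):
            if f(w) != 0:
                continue
            agree = sum(zi == wi for zi, wi in zip(z, w))
            total += Fraction(1, 2 ** n) * Fraction(1, 3) ** agree * Fraction(2, 3) ** (n - agree)
    return total

def is_odd(f, n):
    """f(-z) = -f(z), where negation of a 0/1 value is 1 - value."""
    return all(f(tuple(1 - b for b in z)) == 1 - f(z) for z in product((0, 1), repeat=n))

def majority(z):
    return int(2 * sum(z) > len(z))

def dictator(z):
    return z[0]

print(correlated_probability(dictator, 3), correlated_probability(majority, 3))  # 1/3 17/54
```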

## Fourier Transform on the Boolean Cube

At this point using the Fourier transform is quite natural.  While I do not wish to give a full introduction to the Fourier transform on the Boolean cube here, I do want to give the basic flavor in one paragraph, convincing those who haven’t looked at it yet to do so.  1st year algebra and an afternoon’s work should suffice to fill in all the holes.

The basic idea is to view Boolean functions $f:\{0,1\}^n \rightarrow \{0,1\}$ as special cases of the real-valued functions on the Boolean cube: $f:\{0,1\}^n \rightarrow \Re$.  This is a real vector space of dimension $2^n$, and has a natural inner product $\langle f,g \rangle = 2^{-n} \sum_x f(x) g(x)$.  The $2^{-n}$ factor is just for convenience and allows viewing the inner product as the expectation over a random choice of $x$: $\langle f,g \rangle = E[f(x)g(x)]$.  As usual the choice of a “nice” basis is helpful, and our choice is the “characters”: functions of the form $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$, where $S$ is some subset of the $n$ bits.  $\chi_S$ takes values -1 and 1 according to the parity of the bits of $x$ in $S$.  There are $2^n$ such characters and they turn out to be an orthonormal basis.  The Fourier coefficients of $f$, denoted $\hat{f}(S)$, are simply the coefficients under this basis, $f=\sum_S \hat{f}(S)\chi_S$, where we have that $\hat{f}(S) = \langle f,\chi_S \rangle$.
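A brute-force sketch of these definitions for small $n$ (the helper names `chi`, `inner`, `subsets`, `fourier` are ours).  It computes every $\hat{f}(S)=\langle f,\chi_S\rangle$ for the three-bit majority function with 0/1 values, and the checks confirm orthonormality, the expansion $f=\sum_S \hat{f}(S)\chi_S$, and $\sum_S \hat{f}(S)^2 = E[f^2] = 1/2$:

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    """Character chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def inner(f, g, n):
    """<f, g> = 2^{-n} sum_x f(x) g(x) = E[f(x) g(x)]."""
    return Fraction(sum(f(x) * g(x) for x in product((0, 1), repeat=n)), 2 ** n)

def subsets(n):
    return [S for r in range(n + 1) for S in combinations(range(n), r)]

def fourier(f, n):
    """All Fourier coefficients hat f(S) = <f, chi_S>, by brute force over the cube."""
    return {S: inner(f, lambda x, S=S: chi(S, x), n) for S in subsets(n)}

def majority(x):
    return int(2 * sum(x) > len(x))

coeffs = fourier(majority, 3)
print(coeffs[()], coeffs[(0,)], coeffs[(0, 1)], coeffs[(0, 1, 2)])  # 1/2 -1/4 0 1/4
```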

One reason why this vector space is appropriate here is that the correlation operation we needed between $w$ and $z$ in the lemma above is easily expressible as a linear transformation $T$ on this space, defined by $(Tf)(x)=E[f(y)]$ where $y$ is chosen at random with, independently for each $i$, $Pr[y_i = x_i]=1/3$ and $Pr[y_i=-x_i]=2/3$.  Each character is an eigenvector of $T$: $T\chi_S = (-1/3)^{|S|}\chi_S$.  Since $f$ is odd, $f(w)=0$ exactly when $f(-w)=1$, so the probability in the lemma equals $E[f(z)f(-w)] = \langle f, Tg \rangle$ with $g(x)=f(-x)$; as $\hat{g}(S)=(-1)^{|S|}\hat{f}(S)$, this evaluates elementarily to $\sum_S 3^{-|S|} \hat{f} (S)^2$, where the sum ranges over all $2^n$ subsets $S$ of the $n$ bits.  To get a feel for what this means we need to compare it with the following property of the Fourier transform of a balanced Boolean function: $\sum_S \hat{f}(S)^2 = ||f||_2^2 = 1/2$.  The difference between this sum and our sum is the factor of $3^{-|S|}$ for each term.  The “first” element in this sum is easily evaluated for a balanced function: $\hat{f}(\emptyset)^2 =(E[f])^2 = 1/4$, so in order to prove the main lemma it suffices to show that $\sum_{|S| \ge 1} 3^{-|S|} \hat{f} (S)^2 \le (1/3 - \delta) - 1/4 = 1/12 - \delta$.  This is so as long as a non-negligible part of $\sum_{|S| \ge 1} \hat{f}(S)^2 = 1/4$ is multiplied by a factor strictly smaller than $3^{-1}$, i.e. lies on sets with $|S|>1$.  Thus the main lemma boils down to showing that if $\sum_{|S| \ge 2} \hat{f} (S)^2 \le \delta'$ then $f$ is $\epsilon$-almost a dictatorship.  This is not elementary, but it was proved by E. Friedgut, G. Kalai, and A. Naor, and it completes our journey from Arrow to Fourier.
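The identity between the probability and $\sum_S 3^{-|S|}\hat{f}(S)^2$ can be sanity-checked numerically for small odd functions.  This sketch (helper names are ours) computes the weighted Fourier sum, the same probability by direct enumeration over correlated $(z,w)$, and the weight $\sum_{|S|\ge 2}\hat{f}(S)^2$ above level 1.  The dictator has no weight above level 1 and attains $1/3$; the three-bit majority has weight $1/16$ there and falls to $17/54$:

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, x):
    return -1 if sum(x[i] for i in S) % 2 else 1

def fourier(f, n):
    cube = list(product((0, 1), repeat=n))
    subsets = [S for r in range(n + 1) for S in combinations(range(n), r)]
    return {S: Fraction(sum(f(x) * chi(S, x) for x in cube), 2 ** n) for S in subsets}

def weighted_sum(f, n):
    """sum_S 3^{-|S|} hat f(S)^2, the claimed value of Pr[f(z)=1 and f(w)=0] for odd f."""
    return sum(Fraction(1, 3) ** len(S) * c * c for S, c in fourier(f, n).items())

def high_weight(f, n):
    """sum over |S| >= 2 of hat f(S)^2, the quantity the main lemma reduces to."""
    return sum(c * c for S, c in fourier(f, n).items() if len(S) >= 2)

def direct(f, n):
    """Pr[f(z)=1 and f(w)=0] by enumeration, with Pr[w_i = z_i] = 1/3 independently."""
    total = Fraction(0)
    for z in product((0, 1), repeat=n):
        for w in product((0, 1), repeat=n):
            if f(z) == 1 and f(w) == 0:
                agree = sum(a == b for a, b in zip(z, w))
                total += Fraction(1, 2 ** n) * Fraction(1, 3) ** agree * Fraction(2, 3) ** (n - agree)
    return total

def majority(x):
    return int(2 * sum(x) > len(x))

def dictator(x):
    return x[0]

for f in (dictator, majority):
    print(f.__name__, weighted_sum(f, 3), direct(f, 3), high_weight(f, 3))
# dictator 1/3 1/3 0
# majority 17/54 17/54 1/16
```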

## More results

There seems to be much promise in using these analytical tools for addressing quantitative questions in social choice theory.  I’d like to mention two results of this form.  The first is the celebrated MOO paper (by E. Mossel, R. O’Donnell and K. Oleszkiewicz) that proves, among other things, that among functions that do not give any player large influence, the closest to being an IIA social welfare function is the majority function.  The second is a paper of mine with E. Friedgut and G. Kalai that obtains a quantitative version of the Gibbard–Satterthwaite theorem, showing that every voting method that is far from being a dictatorship can be strategically manipulated.  Our version shows that this can be done on a non-negligible fraction of preferences, but is limited to neutral voting methods between 3 candidates.

### 18 Responses

1. Thanks for the long entry.

I found O’Donnell’s survey paper especially useful for getting an overview of Fourier analysis applications in social choice theory:

http://www.cs.cmu.edu/~odonnell/papers/analysis-survey.pdf

It also covers applications of Boolean function analysis to prove inapproximability of important problems.

2. I like the idea of including irrational preferences (as well as simplicity) in Definition B of e-almost IIA sw functions. But before allowing those two bad preferences 000 and 111, I would extend the range to include the preferences that contain indifferences. To me, ruling out indifferences on the domain is okay, but ruling them out on the range can be too restrictive.

Usually taking this (minor) point into consideration just complicates the proof, though. Hopefully the same is true for this approach based on Boolean functions (simple games).

—From a social choice guy that works on the boundary of Algorithmic Game Theory and something else.

3. The link to Elchanan Mossel’s paper is broken.

4. Haris: thanks for the link — I added it to the post.

Reiju: I don’t know whether handling indifferences is just a technical nuisance using the Fourier approach or whether it opens up new issues — there has been only little work on this so far.

Preyas: Thanks. I fixed the link.

5. This is a lovely post. I could not have written it so well, but perhaps I can write about it, less well, from some additional angles. I will try it some time.

6. Very nice post. However, it’s not quite true that all the proofs of Arrow’s theorem are combinatorial. You might be interested in the paper “Unifying Impossibility Theorems: A Topological Approach” by Baryshnikov. This proof could only be called combinatorial insofar as it relies on the topology of simplicial complexes, which are combinatorial objects.

7. […] own called “Algorithmic game theory” of with interesting posts on the econ-CS interface. One post is about Fourier theoretic proof of Arrow’s theorem and the recent paper by Noam, Ehud Friedgut […]

8. There is an electronic version to another paper by Yuliy Baryshnikov: http://www.springerlink.com/content/l6byjxktnm1n3358/fulltext.pdf

Topological social choice is a fairly developed area and it is related to the issue of “averaging” in mathematics. A recent paper by Shmuel Weinberger with links to earlier papers is here:
http://www.ratio.huji.ac.il/dp_files/dp282.pdf

9. Thanks for the links, Gil, Boris, and Noah.

10. […] blogs: Michael Nielsen’s, the Geomblog, Black belt Bayesian, Secret blogging seminar, and Algorithmic game theory, to name a few. Possibly related posts: (automatically generated)Reading Notes: Chapter 4: Group […]

11. […] New paper: average-case manipulation of elections The classic Gibbard-Satterthwaite theorem states that every nontrivial voting method can be manipulated.   In the late 1980’s an approach to circumvent this impossibility was suggested by Bartholdi, Tovey, and Trick: maybe a voting method can be found where the manipulation is computationally hard, and thus effective manipulation would be practically impossible?  Indeed voting methods whose manipulation problem was NP-complete were found by them as well as in later works.  However, this result is not really satisfying: the NP-completeness only ensures that the manipulation cannot be solved in the worst case; it is quite possible that for most instances manipulation is easy, and thus the voting method can still be effectively manipulated.   What would be desired is a voting method where manipulation would be computationally hard everywhere, except for a negligible fraction of inputs.  In the last few years several researchers have indeed attempted “amplifying” the NP-hardness of manipulation in a way that may get closer to this goal. The new paper of Marcus Isaksson, Guy Kindler, and Elchanan Mossel titled The Geometry of Manipulation – a Quantitative Proof of the Gibbard-Satterthwaite Theorem shatters these hopes.  Solving the main open problem in a paper by Ehud Friedgut, Gil Kalai, and myself, they show that every nontrivial neutral voting method can be manipulated on a non-negligible fraction of inputs.  Moreover, a random flip of the order of 4 (four) alternatives will be such a manipulation, again with non-negligible probability.  This provides a quantitative version of the Gibbard-Satterthwaite theorem, just like Gil Kalai previously obtained a quantitative version of Arrow’s theorem. […]

12. […] starts with the classic KKL work on influences and reaches some of his modern results on approximation in computational social choice.  Gil also writes on ICS in his […]

14. […] to a Fourier-theoretic proof of Arrow theorem (in the balanced case). You can find it discussed in this blog post by Noam Nisan.  Lecture 10 mentioned a few further application of the Fourier method related to […]

15. […] Most proofs of Arrow’s theorem are combinatorial in nature. In 2002 Gil Kalai gave a clever proof based on Boolean Fourier analysis. Noam Nisan goes over this proof in a 2009 blog post. […]