The Netflix $1M collaborative filtering competition has ended with an outcome that is doubly interesting. First, there was a nice “photo-finish”, with two separate groups passing the prize-winning mark and the winner not yet announced. More interesting is the fact that both contenders at this point are rather large coalitions, each composed of several groups that were previously competing separately. In fact, once the first such coalition passed the prize-winning mark, the second coalition formed, combined forces, and reached first place within the 30 days mandated by the rules.
Many have written about the significance of the fact that it is not one “magic bullet” that lets you win such a competition but rather the combination of many “small” ideas and much work. What I am intrigued by is how such a fruitful combination of separately written pieces of software happens. In the case of the first group, who have been working together for a while, one might take the point of view that it is just the usual miracle seen in any large software effort (open source or not), where a sound software architecture with well-defined interfaces lets many people contribute to a project. (Just like Linux or Google Search keep getting better through the combined efforts of many programmers.) However, in the case of the second group, there seems to be something more interesting going on, in the sense that the pieces were joined ex-post, seemingly with a very simple combination architecture. How can this be done? Can it be done for other challenges as well? (A competition to beat Google search?) Can competitions be designed with collaboration built in? (Maybe the second Netflix competition?)
Let us look at an easy example of simple software collaboration: optimization given a set of constraints (such as integer programming). Here the input consists of a set of constraints on n variables, as well as an optimization goal defined on these variables. The output should be an assignment of values to the variables that satisfies the constraints and achieves as high a value as possible for the optimization goal. The nice thing about such problems is that collaboration is essentially built into the problem definition. Once clear input and output formats are defined, combining several optimizers is easy: just run all of them on the input instance and output the best assignment returned by any of the component optimizers. If the total measure of success of an algorithm is the average (say) value of the returned assignment over some distribution of problem instances, then the collaborative optimizer is certainly at least as good as any single component used, and if the different components have distinct “strengths” then the collaborative optimizer is actually better than any single one. This architecture naturally supports specialization: each optimizer focuses on some subset of instances and tries to do as good a job as possible on it, leaving other inputs for other optimization efforts.
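To make this concrete, here is a minimal sketch in Python, assuming each component optimizer exposes a callable returning a feasible assignment together with its objective value (an illustrative interface, not any particular solver's API):

```python
def combined_solve(instance, optimizers):
    """Run every component optimizer on the instance and return the best
    feasible assignment found. Each element of `optimizers` is assumed to
    be a callable returning (assignment, objective_value)."""
    best_assignment, best_value = None, float("-inf")
    for solve in optimizers:
        assignment, value = solve(instance)
        if value > best_value:               # keep the highest-objective solution
            best_assignment, best_value = assignment, value
    return best_assignment, best_value
```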
Now back to Netflix-like machine learning challenges, where the optimization goal is to produce a single predictor p that agrees as much as possible with the given target function t. The natural ML tendency would be, I suppose, to combine different p's using some kind of weighted voting or averaging, as indeed the second team seems to have done according to the Wired blog post:
To combine competitors’ algorithms, The Ensemble didn’t have to cut and paste much code together. Instead, they simply ran hundreds of algorithms from their 30-plus members (updated) and combined their results into a single set, using a variation of weighted averaging that favored the more accurate algorithms.
Thus the collaboration architecture is not much different from the one suggested by the problem statement itself, with the addition of a module that chooses the best weights for averaging. When the aim is to minimize the sum-of-squares error, as was the case in the Netflix challenge, the best weighting coefficients can be obtained efficiently by linear regression, which should be doable with data sets of this size (I think).
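As an illustration (not the team's actual code), here is a minimal least-squares blending sketch in Python/NumPy, where the matrix P of per-model predictions on a held-out set and the vector t of true ratings are assumed inputs:

```python
import numpy as np

def blend_weights(P, t):
    # P: (examples x models) matrix of individual predictions on a held-out
    # set; t: vector of true ratings. Ordinary least squares solves
    # min_w ||P w - t||^2, giving the sum-of-squares-optimal blend weights.
    w, *_ = np.linalg.lstsq(P, t, rcond=None)
    return w

def blend(P, w):
    # The combined prediction is just the weighted average of the components.
    return P @ w
```

In practice one might add an intercept column or regularize the regression, but the point stands that the combination step is tiny compared to the component models themselves.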
I can see many interesting theoretical and practical research issues in generalizing the scope of collaboration:
- How can we easily allow more sophisticated combinations? E.g., a component that specializes in figuring out “who to ask” among the other algorithms? Or branch-and-bound algorithms that use partial results from other optimizers to control branching? In the Netflix case, one researcher found a way to take temporal dynamics into consideration; can this be used effortlessly as a subroutine in other components? What kind of meta-data will be useful here?
- Can we get organic-like growth, with sub-algorithms constantly being added that utilize and then improve upon the previous state of affairs? How do we best mimic natural selection? How do we avoid evolutionary dead-ends?
- How do we incentivize partial algorithms? How do we determine the contribution of a single component relative to others? I can immediately think of two types of models here: one based on a cooperative-games formalism, where for each subset S of components we have the value v(S) that its combination can bring, and another taking a sequential approach, where each contribution gets credited according to the marginal improvement that it brings relative to the previous combination (see the sketch after this list). The latter is somewhat like an intellectual property model, where the first one to come up with an idea gets the credit for it but others can then build upon it, a model that presumably encourages rapid progress.
- What if we don’t have the resources (say, CPU time) to run all components? How do we choose which ones to use? Obviously we want to take their cost-performance trade-off into account. Can we have components that specialize in such resource allocation?
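Here is a hypothetical sketch of the sequential crediting model from the third bullet above; evaluate(components) is an assumed black box returning the score of the combined predictor built from a given list of components (higher is better):

```python
def marginal_credits(components, evaluate):
    # Credit each component by the improvement it added when it arrived;
    # `components` is the list of contributions in arrival order.
    credits = {}
    used = []
    prev_score = evaluate(used)          # baseline: the empty combination
    for c in components:
        used.append(c)
        score = evaluate(used)
        credits[c] = score - prev_score  # this contribution's marginal improvement
        prev_score = score
    return credits
```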
Boosting is an elegant way to implement a more sophisticated collaboration scheme; it is interesting in that it ‘breeds’ algorithms that complement each other’s strengths.
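For concreteness, a toy AdaBoost-style sketch (phrased for +/-1 classification rather than the Netflix rating task); fit_weak is an assumed routine that trains a weak classifier on a weighted sample:

```python
import numpy as np

def adaboost(X, y, fit_weak, rounds=10):
    # y is a vector of +/-1 labels; fit_weak(X, y, w) is assumed to return
    # a weak classifier h with h(X) in {-1, +1}.
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start with uniform example weights
    models, alphas = [], []
    for _ in range(rounds):
        h = fit_weak(X, y, w)
        pred = h(X)
        err = np.sum(w[pred != y])           # weighted error of this round's model
        alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)       # up-weight the examples it got wrong
        w /= w.sum()
        models.append(h)
        alphas.append(alpha)
    return lambda Xq: np.sign(sum(a * h(Xq) for a, h in zip(alphas, models)))
```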
Regarding the quantification of the contribution of each component, it can be argued that the Netflix competition is a *simple* cooperative game: a coalition is either winning or losing, that is, v(S)=1 or v(S)=0. Therefore, one can measure the contribution of a component using a power index such as the Banzhaf index or the Shapley index.
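A brute-force sketch of this computation, assuming v is a user-supplied characteristic function on frozensets of components returning 0 or 1 (it enumerates all orderings, so it is only feasible for a handful of components):

```python
from itertools import permutations
from math import factorial

def shapley_index(players, v):
    # v(S) = 1 iff the blend of the components in S would pass the prize
    # threshold, 0 otherwise (an assumed, user-supplied function).
    totals = {p: 0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = v(frozenset(coalition))
            coalition.add(p)
            totals[p] += v(frozenset(coalition)) - before  # 1 exactly when p is pivotal
    n_orders = factorial(len(players))
    return {p: t / n_orders for p, t in totals.items()}
```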
A similar approach works well in SAT solving competitions:
http://www.cs.ubc.ca/labs/beta/Projects/SATzilla/