Posts Tagged ‘Academia 2.0’

Heeding a call from Suresh, let me advertise the new mathOverflow-like, stackOverflow-like site for Theoretical Computer Science that is being constructed now using a newer platform.  Great potential.


Read Full Post »

Constructive Economics is a rather new blog on the border of Economics and CS written by Abe Othman.  The blog seems to also be interested in AI at large as well as real-world finance.

Read Full Post »

There are two main reasons why researchers publish papers in conferences and journals.  A “real” reason and a “strategic” reason.  The strategic reason is an artifact of the current academic situation where researchers are judged according to their publication list, when considered for positions, promotions, grants, etc.  Given this state of affairs, we have a strong personal incentive to “publish” whether or not our research is worthwhile and whether or not anyone will ever read it.   The “real” reason for publication is dissemination: let other researchers learn about our work, so they can use it, continue it, be influenced by it, etc.  This is what  the whole “scientific/academic establishment” should aim to promote.  Inherent here is the competition for the attention of other researchers who have to decide to spend time and effort reading your paper rather than the countless others vying for their attention.

In my view the main real service that journals and conferences provide in this day and age is exactly the arbitration of this competition for attention: the editors and reviewers of journals and the program committee of conferences  look at lots of papers and suggest a few of them for me to look at.  When chosen right, this is indispensable: there is no way that I could spot by myself  the new important paper of a young graduate student among the hundreds of non-important ones out there.  The main reason why I prefer conferences to journals in CS is that the former seem to be doing a much better job (although still far from perfect) of this identification of new important stuff.

The Internet has revolutionized the mechanics of dissemination of scientific work.   I can still remember when scientific dissemination worked by putting reprints of our papers in envelopes and sending them in (real) mail.  This was replaced by photocopying from conference proceedings, then by sending email attachments, and today we just “put it on the web”.  The standard that seems to be emerging now is to put it on the arXiv.  In comparison, the “social-scientific” structure surrounding “publication” has hardly changed at all, and putting your paper on the arXiv is not “counted as a publication” , provides no signal of your paper’s quality or correctness, and usually does not suffice for getting much attention for your work.    I think that the main ingredient missing from having “putting your paper on the web” be the main form of publication is a surrounding mechanism that can provide a signal of quality and that will help readers focus their attention on the more important work.  How exactly this can work still remains to be seen, but I would like to run an experiment in this direction on this blog.

Experiment: Recommend interesting AGT/E papers on the Web

I am asking readers of this blog to put forward — as a comment to this blog post — recommendations for interesting papers in the field of Algorithmic Game Theory/Economics.  Here are the rules of this experiment:

  • Eligible recommenders: Anyone from the “AGT/E research community”.  I will take this in a wide sense: anyone who has published a paper related to AGT in a recognized scientific forum (conference or journal in CS, AI, GT, economics, …)
  • Eligible papers: Papers must be (1) Available openly on the web. (2) Not already have been published in a journal or conference with proceedings.  It is OK if they were submitted to or accepted  by a conference or journal as long as they have not yet appeared yet.  (3) Related to Algorithmic Game Theory/Economics, taken in a wide sense.
  • What to include: (1) Name of the recommender and a link to their academic home-page — no anonymous recommendations (2) A link to the paper (3) A short explanation of what the paper is about and why you think it is interesting.   There is no implicit assumption of having refereed the paper in any sense.
  • Conflicts: The recommender should follow the usual rules of avoiding conflict of interest followed in program committees: do not recommend (1) your own papers, (2) papers of someone in own department, (3) papers of a frequent collaborator (4) papers of a family member/lover, etc.

[Update: following a suggestion, to start things off, I entered my own recommendation in the format that I was thinking of — as a comment.]

Read Full Post »

I have been teaching the course Topics on the  border of CS and economics with Michal Feldman in the fall semester.   One of the course assignments was to write a Wikipedia entry on a topic of the students’ choice that is related to the course.  This was the first time that any of us tried this, so we left this assignment pretty open and only made sure that the class’s choice was injective and more or less in range.  We were not aware at the time that there are suggested ways on how to organize such a thing, e.g. this entry in English or this one in Hebrew (we allowed both English and Hebrew entries, and most students chose the latter.)

We saw various reasons why this assignment is a good idea: First, we figured that having the “world” reading your work is an added incentive to write well. (This is especially so given the fact that, due to the terrible budget cuts in Israeli universities in the last few years, we had no TA.) Second, we wanted to ensure that the effort that students put into their assignment is not wasted but rather put to a good use.  Third, there was the idea of doing a public service.  Fourth, there was a somewhat vague notion of immersion in the main motivator of the course: what goes on the Internet.

As the course has ended, we compiled a list of the entries written for the course.  I would say that the experiment has ended with mixed results: the quality of entries varies.  Some of them are really good, others are OK — a reasonable start and hopefully will be improved, but some are quite badly written (in various senses: format, writing style, and even technical content.) It seems that adding incentives for what is usually done by volunteering (i.e. giving course credit for writing an entry that is usually done voluntarily) introduces problems.  While normally Wikipedia writers only do so when they want to and are able to do a god job, here we incentivized people to do so even if they could not do a good job or did not want to put enough effort into doing so.  (This is somewhat similar to the observation that paying for blood donation reduces the willingness to do so.)  I can’t really tell if the students that did a really bad job are incapable of doing significantly better or just didn’t put enough effort into it.

Michal and me now feel somewhat responsible for doing some harm to the (mostly Hebrew language) Wikipedia.  We are not really able to go over all the entries ourselves and “fix” them (again, no TA, and many dozens of entries.) We are making small changes here and there as well as adding a few comments in discussion pages, but this is small in magnitude, and doesn’t help much for the worst entries, so we have decided to hire a TA/RA for a while to “clean after this course”.

One of the interesting new entries was the Algorithmic Game Theory entry in the English Wikipedia.  It turns out that there was no such entry previously, despite a call to write one by the “committee to improve Wikipedia’s arguably-somewhat-sketchy coverage of theoretical computer science” over a year ago, which mentioned AGT together with a list of other desired entries, many of which still remain unwritten.   (Looking at the history page of AGT, it seems that an entry for AGT was previously written, but the entry was not appropriate and focused on Algorithmic Mechanism Design and so was “moved” there about two years ago.)

My (preliminary) conclusion: I would do it again, but next time, only with a smaller class size and with a devoted TA.

Read Full Post »

Not a Journal

A recent blog post by Jeff Elly brought to my attention “NAJ Economics“, an overlay journal devoted to Economics that has been in operation for about eight years, although at pretty low volume.  “NAJ” stands for “Not a Journal” (or the geek-wannabe GNU-like “NAJ aint a journal”), and the idea is that the set of (rather distinguished) editors choose papers that they like on the web, peer-review them, and publish links to the “reviewed” papers.  The way it works is that the editors pick what they want to review:  you can not submit your paper to NAJ, nor can you ask them not to review your paper once you’ve put it openly on the web.  The idea is that this gives a peer-reviewed publication: the author takes care of publication on the web, and NAJ provides the peer-review.    (I started thinking about “NAJ AGT” which could handle publication more elegantly by relying on the arXiv.  But then, it doesn’t seem that “NAJ economics” is a success story, so maybe not.)

At the same time, Daniel Lemire posts a 1987 “EWD” by Dijkstra going against the whole notion of trusting peer review too much.  Indeed, Dijkstra rarely published his work in the usual sense but rather mailed out photocopies of his hand-written writings, termed EWDs, to colleagues. While my sympathies (like those of Lemire)  are with this mode of publication, I’m afraid that few non-Turing-Award-winners will get their work noticed this way, so some mechanism for allocation of attention is still needed.

Read Full Post »

An author writes something, and then someone else reads it.  There are some costs involved — who should pay them?  The writer or the reader?  Obviously, the one who gains from the “transaction”: if we are talking about useful or interesting information, then the reader; if we are talking about some form of advertising, then the writer.

The same principle should be true in academic publishing as well.  It used to be that the readers paid to read academic journals.  Publisher abuses of the system, together with the new possibilities opened by the Internet, caused academics to talk about open access journals that do not charge readers.  The main associated costs are those of producing the research and these are anyway paid by government grants as well as professor salaries.   Unfortunately, it seems that the tide is turning towards “open access” journals like PloS ONE in which the authors pay for their results to be published.

By definition, papers published in such journals are equivalent to advertising: you pay for others to notice you.  The dynamics of such journals cannot maintain quality: they have strong incentives to increase quantity and have weak (or no) incentives to increase quality.  Indeed, PLoS ONE does not hide it:

PLoS ONE will rigorously peer-review your submissions and publish all papers that are judged to be technically sound. Judgments about the importance of any particular paper are then made after publication by the readership (who are the most qualified to determine what is of interest to them).

From my point of view, this reduction in the role of referees makes the journal superflous: information that is unfiltered for quality, becomes useless since it is impossible to find the good stuff there — you might as well just put your paper on the arXiv (see also my post on the attention economy).  I personally also don’t quite trust the referees for correctness, not in other journals and certainly not here — correctness is established after the “community” has chewed on the paper for a while (which will only happen if it is interesting).  I find it hard to believe that any favorable reputation will be attached in the long run to publication in such places — they’ll mostly be a sink for readerless publications.

Let me just be clear: in no way am I defending the expensive “closed access” journals.  It is not so difficult to just put your work openly on the web.   If you also need a journal publication for your own promotion/grants, then do what you need to.  But don’t forget: the “science” part is the making it available; the “journal” part is a bureaucratic hurdle.

Read Full Post »

The Netflix $1M collaborative filtering competition has ended with an outcome that is doubly-interesting.  First, there was a nice “photo-finish” with two separate groups passing the prize-winning mark and with the winner not yet announced.  More interesting is the fact that both contenders at this point are rather large coalitions each composed of several groups who were previously separately competing.  In fact, once the first such coalition passed the prize-wining mark, the second coalition formed, combined forces, and reached first place within the 30 days mandated by the rules.

Many have written about the significance of the fact that it is not one “magic bullet” that lets you win such a competition but rather the combination of many “small” ideas and much work.  What I am intrigued by is how such a fruitful combination of separately written pieces of software happens.  In the case of the first group, who have been working together for a while, one might take the point of view that it is just the usual miracle seen in any large software effort (open source or not) where a sound software architecture with well-defined interfaces lets many people contribute to a project.  (Just like Linux or Google Search keep getting better by the combined efforts of many programmers.)  However, in case of the second group, there seems to be something more interesting going on in the sense that the pieces were joined ex-post, seemingly with a very simple combination architecture. How can this be done?  Can it be done for other challenges as well?  (A competition to beat Google search?) Can competitions be designed with collaboration built in? (Maybe the second Netfilx competition?)

Let us look at an easy example for simple software collaboration: optimization given a set of constraints (such as Integer programming).  Here the input consists of a set of constraints on n variables, as well as an optimization goal defined on these variables.  The output should be an assignment of values to the variables that satisfies the constraints and with as high as possible a value for the optimization goal.  The nice thing about such problems is that collaboration is essentially built in by the problem definition.  Once clear input and output formats are defined, combining several optimizers is easy: just run all of them on the input instance, and output the best assignment returned by any of the component optimizers.  If the total measure of success of an algorithm is the average (say) value of the returned assignment over some distribution of problem instances, then the collaborative optimizer is certainly at least as good as any single component used, and if the different components used have distinct “strengths” then the collaborative optimizer is actually better than any single one.  This architecture naturally supports specialization: each optimizer focuses on some subset of instances and tries to do as good a job as possible on it, leaving other inputs for other optimization efforts.

Now back to Netflix-like machine learning challenges where the optimization goal is to produce a single predictor p that agrees as much as possible with the given target function t.  The natural ML tendencies would be, I suppose, to combine different p‘s using some kind of weighted voting or averaging, as indeed the second team seems to have done according to the Wired blog post:

To combine competitors’ algorithms, The Ensemble didn’t have to cut and paste much code together. Instead, they simply ran hundreds of algorithms from their 30-plus members (updated) and combined their results into a single set, using a variation of weighted averaging that favored the more accurate algorithms.

Thus the collaboration architecture is not different from the problem statement with the addition of a module that chooses the best weights for averaging.  When the aim is to minimize the sum-squares error, as was the case in the Netflix challenge, then the best coefficients for weighting the average can be efficiently obtained by linear regression which should be doable with these data set sizes (I think).

I can see many interesting theoretical and practical research issues in generalizing the scope of collaboration:

  • How can we easily allow more sophisticated combinations?  E.g. a component that specializes in figuring out “who to ask” among other algorithms? Or branch-n-bound algorithms that use partial results from other optimizers to control branching?  In the Netflix case, one researcher found a way to take temporal dynamics into consideration — can this be effortlessly used as a subroutine in other components?  What kind of meta-data will be useful here?
  • Can we get Organic-like growth with sub-algorithms constantly being added that utilize and then improve the previous state of affairs?  How do we best mimic natural-selection?  How do we avoid evolutionary dead-ends?
  • How do we incentivize partial algorithms?  How do we determine the contribution of a single component relative to others?  I can immediately think of two types of models here: one based on a cooperative games formalism where for each subset S of components we have the value v(S) that its combination can bring, and another taking a sequential approach where each contribution gets credited according to the marginal improvement that it brings relative to the previous combination.  This is somewhat like an intellectual property model when the first one to come up with an idea gets the credit for it, but then others can build upon it — a model that presumably encourages rapid progress.
  • What if we don’t have resources (say CPU time) to run all components, how do we choose which to use?  Obviously we want to take their cost-performance trade off into account.  Can we have components that specialize in such resource allocation?

Read Full Post »

Older Posts »