Looking in my rearview mirror
Guest post by Reshef Meir
Once upon a time (or so I’m told), the important publication venues were journals. Sure, there were conferences, but their main purpose was to present new findings to the community and to trigger discussions. A conference paper did not really “count”, and results were only considered valid after being published in a respectable journal, verified by its reviewers. Indeed, this is still the situation in some fields.
I have no intention to revive the discussion on the pros and cons of journals, but conference proceedings in computer science, and in AGT in particular, are nowadays treated as publications for all purposes. They are considered reliable, are highly cited, and their theorems are used as building blocks for newer results. We also want institutions and promotion committees to consider conference papers when evaluating candidates.
The next sentence should be “…this progress was made possible by great improvements in the review process of conferences”. But was it?
Almost half of the conference submissions I have personally reviewed [1] contained severe technical errors, and many of the erroneous submissions came from EC. All of the EC submissions, I should say, were worthy, and would have made at least a reasonable case for acceptance if not for the technical errors.
Somewhat surprisingly, I discovered that there is no consensus about rejecting papers once a technical error is spotted [2]. Often authors reply with a corrected version (sometimes weakening the propositions), a proof sketch, or a promise that they will fix the proof. For some committee members this is a satisfactory answer; others assert that the paper should be refereed based on the originally submitted version, and that technical correctness is a threshold condition for acceptance.
I am not arguing that technical correctness should be the only or the primary criterion for acceptance, but it is one criterion with which I think there is currently a problem. To initiate a discussion, here are the main arguments for acceptance and for rejection as I perceive them.
Toward acceptance:
1) It is not the reviewers’ job but the authors’ responsibility to verify the correctness of their results (an old debate, see e.g. here, p.3).
2) Proofs are sometimes replaced with sketches or even omitted, so it is unreasonable (and perhaps impossible) to verify correctness anyway.
3) Even in journals, errors are no big deal: if the results are important, the error will eventually “float” to the surface.
4) We should trust authors, as no one wants an embarrassing error under their name. [3]
5) There is an opportunity cost incurred by the community when the publication of interesting results is delayed.
Toward rejection:
a) If errors are found, a corrected version can be submitted either to a different conference or to the next meeting of the same conference. Authors should not be given credit in advance for correcting the paper and resubmitting it, since we know the corrected version cannot be properly reviewed.
b) Accepting revised versions from some authors may be unfair toward others. Also, if revisions are allowed, why not accept a revised version with better motivation, more references, or added results?
c) If an error is found, it is an indication that there might be other hidden errors, and that the authors should have prepared their paper better for publication.
d) There are many non-refereed or lightly refereed venues (arXiv, workshops) for propagating results quickly. It is hard to claim that results in CS do not propagate fast enough.
e) While authors doubtlessly prefer to publish error-free papers, other tasks and priorities may come before verification. Papers usually do not undergo major changes between acceptance and the camera-ready version; from my experience as an author, though, papers often improve significantly between submissions.
f) Low tolerance for errors will incentivize authors to invest effort in finding their own mistakes prior to submission.
All in all, I agree that the best verification is indeed by the authors themselves, and that it is the authors’ responsibility to publish error-free papers. However, I do think there is a problem. I will fully admit that my own papers are not free of errors, and unfortunately some of those errors were only found after publication.
One possible solution is a revision of the reviewing process that puts more emphasis on verifying correctness, taking another step in the hybridization of journals and conferences. For example, allowing more time for reviews and letting authors submit revisions in particular cases. As such changes are costly, a simpler solution is to make it clear that non-trivial errors will result in rejection unless there are unusual circumstances (such as a ground-breaking result that can be easily fixed). The point of this strict line is not to be an adversarial reviewer, but rather to ensure that authors have not just the capability and the responsibility, but also the incentive, to properly verify their own work.
So, should the review process change? Should an article with errors be accepted or rejected?
[1] Aggregating 20 submissions over the period 2009-2013, from AAMAS, SAGT, WINE, AAAI, IJCAI, and ACM-EC. Clearly this sample is not statistically significant, but it may still indicate a problem. Of course, errors are common even in published journal papers, and I have seen estimates that between 10% and 30% of published papers contain non-trivial errors. Unfortunately I could not find an authoritative source, but see e.g. here and here.
[2] By “severe technical error” I mean either that a proposition is wrong, or that large parts of the proof need to be rewritten.
[3] Indeed, some people are quite embarrassed if an error is discovered even before publication. In contrast, Lamport (in Sec. 4.4 and in general) seems skeptical about the attitude of computer scientists towards their published errors.
This is fairly interesting.
Is your sense that people are rushing to publish at conventions precisely because a) the field tolerates errors, and b) the “points” for being an invited speaker are too hard to turn down?
Clearly the pressure to publish, and the fact that it is relatively easier (and faster) to publish in most conferences than in journals, make them attractive.
People probably submit papers they believe to be correct. However, you can believe after checking every detail, knowing that even a small error may result in rejection; or you can believe because the proofs seem more or less OK and the deadline is tomorrow.
I actually think that the fact that this is even a question is one of the most convincing indictments of “conferences as publications” that I’ve come across.
Note that (as far as I understand) Reshef is not asking “should we accept conference papers that are known to have errors?” (as the title of the post may seem to suggest), but rather “should we accept conference papers where errors were identified and the authors say they are fixable?”.
Thanks Michael, that really highlights my concerns. The third “solution” (which I did not mention) was that researchers will gradually lose confidence in conference publications. Personally, I really don’t want this to happen.
Actually, the identification of an error is also nontrivial. If a reviewer claims to have found a bug in a proof, should the PC double-check with the authors, just in case the reviewer is unfortunately wrong?
EC and the main AI conferences (AAAI, IJCAI, AAMAS, UAI, etc.) do have an author response phase whereby (in particular) the authors can provide a rebuttal to reviews that claim to have found technical errors. Reshef is asking: what should we do if in the author response phase the authors acknowledge the errors but claim that they are fixable?
Thanks, I understand that. The issue is that this is *not* standard in typical theoretical computer science conferences such as STOC/FOCS. Note that I’m not asking for a “big” change of adding a rebuttal phase, but only for a chance for the author(s) to clarify/explain.
@anon, this is exactly the intention of a rebuttal phase. To quote from the AAAI instructions:
“The response must focus on any factual errors in the reviews and any questions posed by the reviewers. It must not provide new research results or reformulate the presentation.”
Sometimes an issue that seems like a significant error may actually be just a typo or a badly worded phrase. The problem is when there is a real error, but it seems that (or the authors claim that) it can be fixed.
In my experience, unfortunately, authors’ rebuttals get neglected and have almost zero effect on the reviewers’ comments and scores. And I am talking about the same main AI conferences.
One more argument toward rejection, and from my experience a really big one: once an incorrect paper gets published, it is not at all obvious that the community will ever realize that there are errors in it. I have seen it happen. This very effectively stops any progress on the problem, since publishing a weaker result becomes almost impossible, and publishing while at the same time pointing out that there is a mistake in the original paper is often not easy either.
Another version of this phenomenon is when a paper gets published that looks incorrect but is so complicated that it is almost impossible to be sure. Then you do not even have the option of saying in your own paper that it is incorrect, because you simply do not know.
Here is something related (but independently observed):
http://cheaptalk.org/2013/04/16/i-move-that-the-aea-stop-publishing-papers-and-proceedings/
The sad truth is that many of the more empirical papers get rejected solely because reviewers want *more* comparisons with other methods (even if the paper already demonstrates its significance against a handful of methods). However, more theoretical work gets reviewed very sloppily, just because the reviewer is too busy or lazy to read it carefully, or even worse, in single-blind conferences, because a well-known name appears as one of the authors.
This may well be true, but here I am less concerned about the fairness or work ethic of the referees.
We can look at it as a mechanism design problem: reviewers can (or are willing to) exert X amount of effort, which seems insufficient to properly verify correctness of technical proofs (and may or may not be sufficient for other types of papers).
So we should either find a way to increase X (which is hard); accept that the review is insufficient; or lower the tolerance bar for errors (to give authors an incentive to exert more effort).
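As a toy illustration (with entirely made-up quantities, only to show the direction of the incentive): suppose that verifying a proof before submission costs the author c, that an unverified proof contains an error with probability q, that reviewers catch an existing error with probability r, and that acceptance is worth V to the author. Under a strict policy where a caught error means rejection, skipping verification loses q*r*V in expectation, so authors will verify whenever c < q*r*V; under a lenient policy where claimed-fixable errors are tolerated, skipping verification loses almost nothing, and the effort will typically not be spent.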
If the referees see how to fix the paper, then it can be accepted, in the hope that the authors will fix things. Otherwise, I think that we should NOT trust the authors just because they claim it can be corrected. There are opportunities to publish a correct version later. We would not trust an author who claims that a flawed “proof” of P vs NP is fixable, so we should not do so for less significant results either.
Conferences in other areas have used shepherded acceptance for such cases: for example, if the bugs seem fixable, a reviewer can track the authors’ progress toward implementing the changes for the camera-ready version.