Sunday, May 10, 2015

Retraction in the age of computation

tl;dr: I’ve been wondering recently whether we need to reexamine our internal barometers for retraction now that computational analyses are a bigger part of our work in the biomedical sciences, where the line between data and interpretation is blurrier. I’m not sure what the answer is, but I would definitely lean toward not retracting, because of the stigma of “shady data” associated with retractions.

I started thinking about this because I saw Yoav Gilad’s reanalysis of some previously published expression profile data, which showed that the “interesting” finding went away after correcting for batch effects. Someone on Twitter asked whether the paper should be retracted. Should it?

I grew up with the maxim “Flawed data, retract; flawed interpretation, don’t retract”. I think that made a lot of sense. If the data themselves are not reproducible (fraudulent or otherwise), then that’s of course grounds for retraction. Flawed interpretations come in a couple of varieties. Some are only visible in hindsight. For example, “I thought that this band on the gel showed proof of XYZ effect, but actually it’s a secondary effect due to ABC that I didn’t realize at the time” is a flaw, yes, but at the time, the author would have been justified in believing that the interpretation was right. Not really retraction worthy, in my opinion. Especially because all theories and interpretations are wrong on some level or another: should we retract Newton’s gravitation because of Einstein?

Now, there’s another sort of interpretational flaw, which comes from a logical error. These can also come in a number of types. Some are just plain old interpretational flaws, like claiming something that your data don’t fully support. This can be subtle, like failing to consider a reasonable alternative explanation, which is a common problem. (Flawed experimental design also falls under this heading, I think.) Certainly overclaiming and so forth are rampant and generally considered relatively benign.

Where it gets more interesting is if there is a flaw in the analysis, an issue that is becoming more prevalent as complex computational analyses become more common (and where many authors have to essentially trust that someone did something right). Is data processing part of the analysis or part of the data? I think that puts us squarely in the grey zone. What makes it complex is the interplay between the biological interpretation and the nature of the technical flaw. Here are some examples:
  1. The one that got me thinking about this was when Yoav Gilad reanalyzed some existing expression profiles from human and mouse tissues. The conclusion of the original paper was that samples clustered by species rather than by tissue (surprise!), but upon removing batch effects, one finds that samples cluster by tissue more tightly than by species (whoops!; a toy illustration of how batch correction can flip a clustering result follows this list). Retraction? Is this an obvious flaw in methodology? Would it matter whether people figured out the importance of batch effects before or after the paper was published? If so, how long after? I would say this should not be retracted, because these lines seem rather arbitrarily drawn.
  2. Furthermore, if we were to retract papers because the analysis method was not right, then we would go down a slippery slope. What if I analyze my RNA-seq data using an older aligner that doesn’t do quite as good a job as a newer one? Is that grounds for retraction? I’m pretty sure most people would say no. But how is that really so different from the above? One could say that in this case, there is little change in the biological conclusion. But there are very few biological conclusions that stand the test of time, so I’m less swayed by that argument.
  3. Things may seem more complicated depending on where the error arises. Take the controversial paper from a few years back that reported widespread RNA/DNA differences based on sequencing. Many people provided rebuttals with evidence that many of the differences were in fact sequencing artifacts. I’m no expert, but on the face of it, it seems as though the artifact people have a point. Should this paper be retracted? Here, the issue is allegedly a flaw in the early stages of the analysis. Does this count as data or interpretation? To many, it feels like it should be retracted, but where’s the real difference from the two previous examples?
  4. I know of a very nice and influential paper in which there is a minor mathematical error in a formula in part of the analysis method (I am not associated with this paper). This changes literally all the results, but only by a small amount, and none of the main conclusions of the paper are affected. Here, the analysis is wrong, but the interpretation is right. I believe the authors were contacted by a theorist who pointed out the error and asked, “When will you retract the paper?”. Should they retract? I would say no, as would most people in this case. An erratum? Maybe that’s the way to go. But I am somewhat sympathetic to the fact that a stated mathematical result is wrong, which is bad. And this is a case in which I’m saying that the biological conclusion should trump the analysis flaw.
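To make the batch-effect point in example 1 concrete, here is a toy sketch in Python (using numpy and scipy). To be clear, this is not the Lin et al. data or Gilad’s actual re-analysis: the data are simulated, and the “correction” is just naive per-batch mean-centering rather than something like ComBat or limma’s removeBatchEffect, which is what a real analysis would use. It simulates a situation in which each species is (hypothetically) profiled in its own batch, so the batch effect is confounded with species; clustering then groups samples by species before correction and by tissue after it.

```python
# Toy illustration (all numbers invented): a batch effect confounded with
# species can make samples cluster by species; removing it flips the
# clustering to group by tissue.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
n_genes = 500
tissues = ["brain", "liver", "heart"]
species = ["human", "mouse"]

# Each tissue has its own expression signature, shared across species.
tissue_signature = {t: rng.normal(0, 1, n_genes) for t in tissues}
# Each species is hypothetically profiled in its own batch, so the batch
# effect is perfectly confounded with species.
batch_effect = {s: rng.normal(0, 2, n_genes) for s in species}

samples, labels = [], []
for s in species:
    for t in tissues:
        expr = tissue_signature[t] + batch_effect[s] + rng.normal(0, 0.5, n_genes)
        samples.append(expr)
        labels.append((s, t))
X = np.array(samples)  # samples x genes

def cluster_report(X, labels, k):
    """Average-linkage hierarchical clustering; print which samples group together."""
    Z = linkage(pdist(X, metric="correlation"), method="average")
    assignments = fcluster(Z, k, criterion="maxclust")
    for cluster_id in sorted(set(assignments)):
        members = [labels[i] for i in range(len(labels)) if assignments[i] == cluster_id]
        print(f"  cluster {cluster_id}: {members}")

print("Before batch correction (samples group by species/batch):")
cluster_report(X, labels, k=2)

# Naive "correction": subtract each batch's mean profile. Enough to show the
# flip in this toy setting; real analyses use proper batch-correction methods.
X_corrected = X.copy()
for s in species:
    idx = [i for i, (sp, _) in enumerate(labels) if sp == s]
    X_corrected[idx] -= X[idx].mean(axis=0)

print("After batch correction (samples group by tissue):")
cluster_report(X_corrected, labels, k=3)
```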
Overall, I think the issue of how to deal with problematic papers in which the errors involve sometimes murky computational and analytical methods is a difficult one, and it’s worth figuring out what our standards should be. I think the real question is whether computational processing of data is part of the data or part of the interpretation, and I think there are reasonable cases to be made either way. It’s tricky and slightly different from the situation with experiments. If someone does a crappy experiment (like using the wrong buffer), then those data would be marked as irreproducible, and thus could be subject to retraction. If a computational pipeline is documented but has a bug, then technically it’s replicable, if not reproducible. So maybe one way forward is to say that bugs are retractable but methodological flaws are not?

I realize this is a pretty high bar for retraction. For me, that’s fine because, practically speaking, I think it’s far better to just leave flawed papers in the literature. Retractions in biomedical science come with the association of fraud, and I think that lumping non-fraudulent but flawed papers in with examples of fraud is very harmful. Also, perhaps the data are useful to someone else down the road. We wouldn’t want the data to be designated as “retracted” just because of some mistake in the analysis, right? But this also depends on at what point the data are considered data. For instance, let’s say I used the wrong annotations to quantify transcript abundance per gene and reported those values. Then those data are flawed, but the raw reads are probably fine. Hmm. Retract one and not the other?
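As a (completely made-up) illustration of that last point, here is a tiny Python sketch: the same raw reads are counted against two hypothetical annotation versions, and the per-gene counts change even though the reads themselves never do. The gene names, coordinates, and the toy counting rule are all invented for illustration, not taken from any real annotation or quantification tool.

```python
# Hypothetical example: identical raw reads, two annotation versions.
# Only the derived per-gene counts differ; the reads are untouched.
reads = [(105, 150), (210, 260), (480, 530), (610, 660)]  # (start, end) on one contig

annotation_v1 = {"geneA": (100, 300), "geneB": (450, 700)}
annotation_v2 = {"geneA": (100, 300), "geneB": (450, 600)}  # geneB truncated in v2

def count_per_gene(reads, annotation):
    """Count reads falling entirely within each gene's interval."""
    counts = {gene: 0 for gene in annotation}
    for start, end in reads:
        for gene, (g_start, g_end) in annotation.items():
            if start >= g_start and end <= g_end:
                counts[gene] += 1
    return counts

print(count_per_gene(reads, annotation_v1))  # {'geneA': 2, 'geneB': 2}
print(count_per_gene(reads, annotation_v2))  # {'geneA': 2, 'geneB': 1}
```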

Anyway, I think it’s something worth thinking about.

Update, 5/12/2015: Lots of interesting commentary around this, especially in the case of the Gilad reanalysis of the PNAS paper. Leonid Kruglyak had a nice point:



Sounds reasonable in this case, right? I still think there are many situations in which this distinction is pretty arbitrary, though. In this case, the issue was that they didn’t watch out for batch effects. Now, once people realized that batch effects were a thing, how long does it take before correcting for them is considered standard procedure? 1 year? 2 years? A consensus of 90% of the community? 95%? And what if it turns out 10 years from now that the batch effect thing is not actually a problem after all and the original conclusion was valid? These all sound less relevant in this instance, but I think the principle still applies.

Great point from Joe Pickrell:



I really like the idea of just marking papers as wrong in the comments, perhaps accompanied by making comments more visible. (A more involved version of this could be paper versioning.) In this case, the data were fine, and had the paper been retracted, nobody could have done a reanalysis showing that the opposite conclusion actually holds (which is also useful information).

4 comments:

  1. I think the best option is for journals to allow different versions, the way archives do. I see no good reason not to do so in this digital age. Then the bar for retraction can be very high, all other cases can simply be corrected by a subsequent version, and all versions remain part of the official record.
    1. Yes, I think that's the best idea. Maybe we can approximate that in the comments section.

  2. I generally agree, but the PNAS paper by Snyder's group (Lin et al., http://www.pnas.org/content/111/48/17224.full), the one that Yoav Gilad re-analyzed, specifically considered batch effects and claimed to have corrected for them. Their entire result, and the only reason for publishing the paper, is that they found this counter-intuitive effect that contradicted previous results. (Previous results showed clearly that gene expression patterns cluster by tissue; i.e., human and mouse brain tissue show similar patterns. The flawed new study claims that human tissues are *more* similar to each other, across tissues, than they are to mouse.)
    Thus the paper *would not have been published* without this result. So they used flawed methods to get a result that got them a paper. In this case I argue they should retract it; otherwise they are rewarded for their sloppy methods.

    1. I can see the argument that this paper should be retracted because the alleged error is egregious, though I can't say I know remotely enough about the methods here to judge. That said, I'm somewhat worried about applying the criterion that the paper would not have been published without this result. Consider Pauling's DNA structure: also based on some flawed assumptions, the paper would not have been published otherwise, and it was just plain wrong, yet it's still in the literature to this day. Does it really hurt anything by being there, other than Pauling's reputation?

      On the other hand, if their methods contained bugs or other problems (i.e., they claimed to have de-batched their results but somehow screwed up the execution of the de-batching), then I think I would consider that retractable. I am looking forward to seeing the details of Gilad's re-analysis and the response from the authors.
