Friday, January 31, 2014

Some further interpretations of Hedia’s lincRNA data

I’ve been giving some talks on Hedia’s non-coding RNA work at some non-coding RNA meetings, and I’ve realized that we can actually make some further claims that we didn’t think about at the time. Quick recap: we show (by RNA FISH, natch) that knocking down the non-coding RNA linc-Hoxa1 at its site of transcription results in transcriptional upregulation of the nearby Hoxa1 gene. We also show that the RNA binds the protein PurB in vitro and in vivo, and that knocking down PurB destroys the anti-correlation in expression between linc-Hoxa1 and Hoxa1. Normally, the two are strongly anti-correlated, which is what led us to think that linc-Hoxa1 negatively regulates Hoxa1 in the first place.

One question in the lincRNA field is whether it’s the RNA itself that regulates transcription, or just the underlying DNA sequence. For instance, if you knock out the DNA sequence encoding the RNA, can you be sure that it’s the RNA per se that causes the effect, or is the DNA instead acting like a conventional enhancer (with the RNA itself being just an unimportant byproduct)? Our result that knockdown at the site of transcription changes expression shows that the RNA itself is important: the DNA sequence remains intact, but removal of the RNA causes the change in transcription.

The next question is whether it’s just the act of transcribing the lincRNA that affects Hoxa1 transcription, or whether the RNA sequence itself is important. The clearest way to show this would be to replace the lincRNA sequence in the cell line with different ones and see what happens, but we didn’t do that. However, I think our PurB result argues that the sequence itself matters. If it were merely the act of transcription, then knocking down a protein cofactor that binds the RNA should have no effect, since the RNA molecule itself would be dispensable–yet knocking down PurB abolishes the regulation. PurB is a purine-binding protein and seems to bind a hundreds-of-bases-long stretch of Gs and As in the RNA, so it seems very likely that the specific sequence of the lincRNA is important for the regulation.

In summary, I think the important points are that, in the case of linc-Hoxa1, the RNA itself (and not the DNA) is responsible for the regulation, and that it’s not just the act of transcription but the presence of RNA with a specific sequence that is necessary for the regulation to occur.  Whether this is the case in general is of course an area for further study.

Sunday, January 26, 2014

Simple tips to improve your presentations

Lots of stuff out there on how to give a good talk, but here are the three tips I think would most improve academic presentations (courtesy of Uri Alon):
  1. Your presentation should be centered around you, not your slides. The slides are your props, but you are delivering the material.
  2. Make sure every slide has a title that is a complete sentence. Subject, verb, object. This was transformative for me. It ensures that every slide has a point.
  3. Remove all other text from the slide (other than axes, etc.). Text distracts. The audience should not be reading text, they should be listening to you.

Wednesday, January 22, 2014

Uri Alon's "cloud"


I’m sure many of you have already heard of Uri Alon’s “cloud”, which is that time when you’re working on a science project and everything is unclear, all your original hypotheses no longer make sense, and you have the occasional existential crisis. I’ve gone through that myself several times with virtually every project I’ve worked on, perhaps with one exception. If you haven’t heard of the cloud, though, check out this video:


Monday, January 20, 2014

TurboFISH live demo video

So we finally got around to making a video of one of the coolest things in the lab, namely Sydney's turboFISH protocol.  In this video, she literally goes from cells growing in the incubator to RNA FISH images on the microscope in 12 minutes total!  Check it out:

Thursday, January 16, 2014

Storytelling in science

There were a couple of interesting pieces on storytelling in science in Nature Methods recently (point, counterpoint). Basically, one article says that storytelling is an important part of communicating science, and the rebuttal points out many of the dangers of the storytelling viewpoint. I think these pieces lay out the basic contours of the debate: storytelling makes things simpler and more compelling for the reader, increasing interest in and broadening the audience for your work, but it also works against alternative hypotheses and unbiased presentation of the data (i.e., non-conforming data = supplementary figure 38), and so forth. All valid points, and I’m sort of on the fence about storytelling.

Here are some thoughts on the matter that these articles didn’t really touch upon, though. Firstly, although I haven’t read nearly as many papers in other fields, I think that biology has a lot more storytelling than other fields. For instance, you might read a chemistry paper about the photostability of fluorescent dyes. No real story, but the work can be extremely high impact. I wonder if the reason we have more stories in biology is that, unlike in other fields, the universe of things you could study is so large that you have to devote a pretty significant portion of your thinking to explaining why you chose to study what you studied; i.e., why is this interesting? I could be completely wrong about this, but in physics, I think there are just fewer distinct (non-applied) phenomena to study, and so you don’t have to have a story–nobody would question why you study dark matter or the Higgs boson. In principle, I like the idea that one could just measure things in biology for the sake of filling in our knowledge of the universe, but in reality, I think we’ve all seen those “so what’s the point of this?” papers. On the other hand, it’s hard to really know the point of anything; we just make stories about how it could be important because of evolution, which is sort of a cop-out. Anyway, I’m not sure where I stand on this.

One other argument I have against storytelling is fundamental to the very idea of a story, namely that it has a beginning, a middle, and an end. It’s the last part that’s the problem. By definition, the end means that it’s all wrapped up. Where do you go from there? What avenues do these results open to further study? In a way, the more satisfying the end, the less need for a follow-up. Take, for example, our 2010 C. elegans paper with the title “Variability in gene expression underlies incomplete penetrance”. Well, the title pretty much sums it up. So what do we do from there? I guess you can find more examples of it to test the generality of the conclusion and so forth, but that’s definitely diminishing returns and not exactly how you want to start a new lab. Don’t get me wrong, I like that work a lot, but to me, it felt like that story had an end and I didn’t know where to go from there. Maybe that’s my own lack of imagination, though–any thoughts? On the other hand, our PLoS Bio 2006 paper was much more open-ended, and I think it had a much broader impact. There, we characterized a phenomenon (transcriptional bursting and cell-to-cell variability) and made some measurements of it, which raised some interesting possibilities.

Which leads me to a few more observations about biology and storytelling. In physics, you can make measurements of something important (like the g-factor) and the results are meaningful without a story. In biology, it is very hard to make absolute measurements, so everything must be relative, like LGR5 being 5-fold more abundant in condition A than in condition B. Even with RNA FISH, which does give absolute numbers, the context could be different (“oh, you used this media prepared on this day, which is why you got a different answer”). So there are no biological “constants” to measure. Maybe you could argue that solving crystal structures is like measuring biological constants? Certainly not the endless compendiums of high-throughput RNA abundance data, which are not really very good resources, for a variety of reasons.

Another thing about biology is that the tools and approaches render negative results far less meaningful than positive results. The storytelling approach (as Gautham pointed out) is one in which you experiment based on “Wouldn’t it be cool if X?”, whereas the opposite result of “not X” would be considerably less cool. In principle, shouldn’t the negative be equally informative? In some cases, yes. But what about a genetic screen? What is the meaning of a negative result? Not much, I think. But Gautham’s point is well-taken. I think it’s a lot better to go about asking questions for which the answer is cool either way. Or, let’s say, where the answer is interesting either way. I never really was good at knowing what was cool to begin with.

Tuesday, January 14, 2014

On the practice of turning off all Matlab warnings, and how to avoid our most common one.

- Gautham

warning('off', 'all')

It is common practice to turn off all Matlab warnings, because programs are often written that produce warnings even under normal usage.

Warnings are there to alert the user that the program is doing something that the user may not have intended, or that the requested operation relies on some Matlab feature that is scheduled to change or disappear in a future version.

If you turn off warnings it means you don't want any help from any Matlab program. 

You don't want it to tell you that perhaps you may not have wanted the result it just gave you, or that the operation you just requested is unusual and that you may have made a mistake.

Think twice! 

Would you imagine turning off all warnings in your car because you are annoyed by one of them? Suppose you're one of the daring few who likes to drive without a seat belt. Are you equally sure you'll never leave the keys in the ignition?

If a warning is not of concern, there is almost always a way to tell the program that yes, you do intend that result, or that yes, you are prepared for that eventuality.

As an example, the most common cause in our lab of warnings billowing forth is running imshow:

>> figure
>> imshow(rand(1024,1024))
Warning: Image is too big to fit on screen; displaying at 50%
> In imuitools/private/initSize at 72
  In imshow at 283 

Why is it doing this? Matlab thinks you may assume that the image is being displayed at 100% magnification (the default), which it would be if the screen were big enough, and it wants you to know that you are looking at a scaled-down version. That is very nice of Matlab.


Here is the simple fix for imshow, quickly obtained by searching the web, and clearly stated in the documentation for the function:

>> figure
>> imshow(rand(1024,1024), 'InitialMagnification', 'fit')
>> 

That's it! Now it knows that you'll be happy with the scaled result because you told it specifically that you'll be ok with it.

If you cannot find a way to write your program so that Matlab is clear that it is giving you what you intended, you can turn off the specific warning you want to ignore. In the rare case that even that is not possible, you can turn off all warnings temporarily and then return warning reporting behavior back to what it was before you turned them off, which is a trickier task than one might think. (Scenario: you turn warnings off inside of a function and before your function had a chance to turn them back on some statement in between returns an error. What happens?)
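
If you're curious what this looks like in practice, here's a minimal sketch of the escalating options. The warning identifier below is the one imshow's size warning uses on our machines–if yours differs, trigger the warning and check it with [msg, id] = lastwarn. The function showManyImages is just a made-up example.

% Option 1 (preferred): silence only the specific warning you have
% decided you can safely ignore.
warning('off', 'images:initSize:adjustingMag');

% Option 2 (last resort): silence everything, but restore the previous
% state afterwards. onCleanup runs even if an error is thrown partway
% through, which handles the scenario above.
function showManyImages(imgs)
    prevState = warning('off', 'all');                    % save old state, silence all
    restoreWarnings = onCleanup(@() warning(prevState));  % restores on exit OR error
    for k = 1:numel(imgs)
        figure;
        imshow(imgs{k});
    end
end    % restoreWarnings goes out of scope here, restoring the old warning state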

See this reference on how to temporarily disable some or all warnings:

http://blogs.mathworks.com/loren/2006/10/18/controlling-warning-messages-and-state/

How many ways can you be right?

I was reading a little bit today about the Reinhart-Rogoff saga, and it got me thinking about robust conclusions. For those of you who don’t know, Reinhart and Rogoff are a couple of macroeconomists who wrote an influential paper on the relationship between debt and growth. Basically, they argue in their paper that debt in excess of 90% of GDP is associated with significantly lower economic growth, like there’s some sort of threshold effect. Pro-austerity politicians then seized upon this research as a key piece of evidence in favor of austerity measures worldwide (causality be damned!). Now *that* is a high impact paper! The brouhaha began when a UMass graduate student (Thomas Herndon) attempted to replicate the results for a class project. Turns out there were a few issues with the original paper. One of the funniest/scariest was an error in their spreadsheet that omitted some data from their calculations. I don’t know which is funnier or scarier: 1. that nobody thought to double-check the spreadsheet before, you know, shaping global economic policy, or 2. that serious academics are actually using Excel (Excel!) for this sort of “quantitative” work. Whatever, that’s a whole other can of worms that has been written about to death elsewhere.

To me, a more interesting issue had to do with the particulars of how they treated their data. The question is whether to average by country or by country-year–some of the countries in the data set had a lot more data points than others. Basically, if you average the R&R way (by country), you see a sharp decrease in GDP growth above a 90% debt load. Averaging by country-year (which seems to make more sense to me), this effect disappears, which is one of the points of the Herndon re-analysis. Note that R&R always took care to emphasize the median and not the mean, presumably because of outliers or something. Here’s the R&R data. It’s the -0.1 number at 90% and above that goes away in the Herndon paper.



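To make the weighting issue concrete, here’s a toy Matlab example with made-up numbers (emphatically not the actual R&R dataset): one country contributes 19 high-debt years of decent growth, another contributes a single terrible year, and the two averaging schemes give answers with opposite signs.

% Synthetic example: growth (%) for country-years in the >90% debt bin.
growth  = [2.5 * ones(19,1); -7.6];   % country 1: 19 years; country 2: 1 year
country = [ones(19,1); 2];            % country label for each observation

% Weight by country: average within each country, then across countries.
perCountry    = accumarray(country, growth, [], @mean);   % gives [2.5; -7.6]
byCountry     = mean(perCountry)                          % = -2.55

% Weight by country-year: pool all observations equally.
byCountryYear = mean(growth)                              % = +2.00 (roughly)

Same twenty data points, and a seemingly innocuous choice flips the sign of the answer.
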
Now, I suppose that R&R had strong reasons for doing the averaging their way (and have said so in public), although it sounds to me like reasonable people could disagree. And I suppose that reasonable people could argue about whether you should use the median or the mean. The point I think is interesting is that the conclusions can change remarkably depending on which seemingly reasonable thing you do. What got me thinking about it is this line from Tyler Cowen’s blog on the subject:
In the paper by the critics, the pp.7-9 discussion of “weighting by country” vs. “weighting by country-year” is very interesting, but the fact that it matters as much as it does makes me more skeptical about the entire enterprise.
Indeed, it does make me question whether these results have any merit at all in either direction! Usually, when we see this sort of stuff in our data, the first things we check are the number of data points and the error bars–honestly, I’m amazed that they can get away without putting any sort of error estimates, or even a discussion of confidence intervals, on their data. See this link for a much more honest portrayal of the data (I got the link from an article by Paul Krugman).

Makes me wonder about similar sorts of problems in molecular biology, particularly in the age of deep sequencing. I’ve definitely become very suspicious whenever I hear reports of something that relies very heavily on some newfangled analysis of otherwise run-of-the-mill data, and there have been a number of such high-profile reports in recent years that ended up being bogus (which shall of course remain nameless…). To be clear, there are also plenty of non-sequencing examples, such as how people quantify alternative splicing, etc. I just feel like robust results should be fairly clear and not particularly dependent on some weird normalization or what have you.

Uri Alon’s network motifs are a great example of something very robust to the particulars. For those of you who are not familiar with it, here’s the idea: in the transcriptional network of E. coli, particular subnetwork “motifs” are highly overrepresented compared to what a random network would give you. An example is negative autoregulation, where a transcription factor downregulates its own transcription. Now, one sticky point is what one means by a “random” network. There are many ways to construct random networks–do you maintain the degree distribution? The number of edges? Whatever. The point is that the results were so significant that the p-values were something like 10^-20 or less no matter what sort of random network you chose as a null model. So I believe it! I think it also illustrates a good general practice when you are faced with decisions in analysis: if you have to make a choice upon which reasonable (or even moderately unreasonable) people could disagree, just do both and see what happens (see the sketch below). If the results are consistent with your interpretation either way, all the better. If not, well, you better make a strong case for your particular analysis method... or perhaps it’s time to revisit your model.
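
To show what “do both” might look like, here’s a little Matlab sketch: count a motif (feedforward loops) in a directed network and compute an empirical p-value against two different null models. The network here is purely synthetic, so don’t expect any enrichment–with the real E. coli network, the punchline is that the p-value stays vanishingly small under either null.

% Count feedforward loops (FFLs): edges i->j and j->k, plus the shortcut i->k.
countFFL = @(A) sum(sum((A*A) .* A));

n = 200;
A = double(rand(n) < 0.02);       % toy "observed" network (adjacency matrix)
A(1:n+1:end) = 0;                 % remove self-loops
nObs = countFFL(A);
m = nnz(A);

nRand = 200;
cER  = zeros(nRand, 1);           % null 1: same number of edges, random placement
cDeg = zeros(nRand, 1);           % null 2: degree-preserving edge swaps
for r = 1:nRand
    % Null 1: Erdos-Renyi style, ignoring the degree distribution.
    B = zeros(n);
    B(randperm(n*n, m)) = 1;      % (may create a few self-loops; fine for a sketch)
    cER(r) = countFFL(B);

    % Null 2: shuffle edges while preserving each node's in- and out-degree.
    C = A;
    [src, dst] = find(C);
    for s = 1:10*m                % repeated double-edge swaps
        e = randi(m, 1, 2);       % pick edges a->b and c->d at random
        a = src(e(1)); b = dst(e(1)); c = src(e(2)); d = dst(e(2));
        if a ~= d && c ~= b && ~C(a,d) && ~C(c,b)   % avoid self-loops/duplicate edges
            C(a,b) = 0; C(c,d) = 0; C(a,d) = 1; C(c,b) = 1;
            dst(e(1)) = d; dst(e(2)) = b;
        end
    end
    cDeg(r) = countFFL(C);
end

pER  = mean(cER  >= nObs);        % empirical p-value under each null
pDeg = mean(cDeg >= nObs);
fprintf('FFLs: %d, p(edge-count null) = %.2f, p(degree null) = %.2f\n', nObs, pER, pDeg);

If a conclusion survives every null model you can throw at it, that’s when you can start believing it.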

For instance, in Marshall’s paper on chromosome structure, we saw one really strong gene interaction pair and several more weak interactions. There were multiple ways of generating a p-value for each pair, and they all gave slightly different answers. But only one pair really stuck no matter what we did with the data, and that’s the one we reported.

Of course, things like chromosome structure are fairly esoteric, and so flimsy claims there are considerably less influential in the world at large than the Reinhart-Rogoff paper. Now, I don’t think that austerity politics, be it right or wrong, has its roots in this one paper, so it’s perhaps a bit unfair to hold these two solely responsible for austerity measures globally. But it’s disingenuous for them to say that they had no role in the debate, since they were literally in the room while the political types were deciding on policy. Honestly, I personally would not have been comfortable or confident giving advice in such a situation based on the data they show in their paper. Then again, I’ve never been in such a position. And we’re all publicity hounds in this line of work. Consider the following line from the same Tyler Cowen blog post:
My own view, as you can read in The Great Stagnation, is that the primary mechanism is slow growth causing high debt/gdp ratios, not vice versa. In any case this is by far the most important issue, whether or not you agree with my take on it.
Shameless book plug! Is that an economist thing? Anyway, can I tell you some more about Marshall’s paper now? ;)

Friday, January 10, 2014

Our lab's most highly read work


Paul just remarked the other day that this blog is probably the lab’s highest-impact publication. I’m not 100% sure, but I think he’s probably right. Should I be making a sad face or doing the happy dance? Or both at the same time?

Tuesday, January 7, 2014

Kid biomechanics

Tonight I was sitting on the bed with my baby daughter, and when I got up, I helped her down by holding her hand and basically lifting her off the bed and setting her down. Got me thinking: what if someone did that to me? Like a gigantic hand coming down, grabbing my hand and just lifting me up by one arm? I would be totally messed up! But my daughter didn't mind at all. Just another example of how kids are put together a little bit differently. Don't believe me? Try going on some monkey bars sometime–I did that a little while ago, and it was painful in ways both cruel and unusual. Probably some interesting biophysics there.

Oh, and I wish I had a bunch of pictures of all the weird positions my kids have fallen asleep in. Here's one I found on the internet that captures the spirit.


Kent Beck on Ease at Work

- Gautham

This is a straight link to a YouTube video by a programming legend on a topic that is just as relevant to scientists as it is to programmers.

I've always been suspicious of scientists who expound on their martyrdom, because they are usually the same folks who swing the pendulum all the way the other way, to wizard, when the time comes. And what a waste of energy it is to keep up that differential between perception and reality.


Saturday, January 4, 2014

Online textbooks are all about the execution

So I'm gearing up to teach the Biotransport class in my department this spring, and for the past three years, we've taught it using a book from Wiley called "Fundamentals of Fluid Mechanics" or something like that (author is Munson, decent book, I suppose). We also enabled a feature called WileyPLUS, in which there's an online site that accompanies the book. In theory, this is a great thing. For the professor, you can assign problems online, complete with grading and evaluation, even with each student getting slight variations of the problem. For the student, you can get instant feedback on the problem, knowing whether you got it right or wrong, and even get a link to the section of the book that corresponds to the problem. What's not to love?

Apparently, according to the students (and to us too), there is a lot not to love–so much, in fact, that we're dropping it this year. Why? Because while the idea is nice in theory, the execution is so incredibly poor that we all end up wasting more time than we save. Even now that we've stopped using it, it's still wasting our time!

Let me explain: all the problems we assigned last year were still in the system, and in order to get the list of problems out of there, you have to log in. Which I did, but somehow the previous class didn't appear. So I got onto the online chat help, where a (very helpful, I should say) fellow helped me out over the course of 40 minutes (!). First problem: my Wiley Rep somehow keyed in my e-mail incorrectly, so I couldn't log in to see my stuff. Here's the transcript of how to fix that:

Arjun Raj: I'm trying to login with arjunra@seas.upenn.edu, and it's getting hung up on the login.
Arjun Raj: Can it be that my Wiley Rep transferred everything over to a new account or something?
Sean F.: yes...
Arjun Raj: Can you change the e-mail back?
Sean F.: Ok if you got logged in you can change the email
Sean F.: in the upper right corner click on my account
Sean F.: then click on instructor profile
Sean F.: finally click on maintain e-mail address to update your email.
Arjun Raj: Okay, I got it.
Sean F.: great now that that is take care of we can work on what you contacted us for

Now why in the world would you click "Maintain e-mail" to actually change your e-mail? I understand this is probably some sort of computer engineer-speak that crept its way in, but still. Anyway, then came another 10 minutes of indecipherable button clicking at the behest of Sean F., with endless confusing options at every turn. Then to actually get a simple printout of the problems:

Arjun Raj: Yes, or (even better) all the questions from a particular assignment bundled together. Best is even with answers, but whatever.
Sean F.: the easiest thing is would probably be to print out the assignment.
Arjun Raj: How do I do that?
Arjun Raj: I couldn't find the option.
Sean F.: select properties from the actions column then press go
Sean F.: then scroll down to the bottom of the page and you will see a button to print assignment
Arjun Raj: Ah, okay, got it, thanks! That's just what I was looking for.
Sean F.: Great.

Here's what it actually looks like:



Now why on earth would anyone click there to print out an assignment? Anyway, whatever, maybe I'm not making a compelling case for how annoying this system is to work with, but believe me, it is. Now, to be clear, the tech support person really knew her/his stuff (I suspect the person's name was not actually "Sean F.") and was super helpful at navigating through all the crazy options. But it shouldn't take a 40-minute support chat to do some basic things, nor should it require this much hand-holding just to figure out how to print out an assignment.

And for students, who are the end users, things are not much better. For example, to submit their answer for the homework, they have to use this complex system of codes to enter in their answer symbolically. This proved to be so frustrating for the students in past years–often requiring 10 minutes or more just to key in the answer right–that they basically just gave up, and there's no way that we could in good conscience grade homework that they had to submit this way.

The bottom line is that the point of online anything is to boost people's productivity by saving time and providing a richer experience. That is the promise of online textbooks, but it's one that WileyPLUS, at least, simply does not deliver. It's a shame, too, because it wouldn't be that hard to make it a whole lot better–we're not talking about building Google Docs here, just some common-sense design improvements. Design people, help!

Thursday, January 2, 2014

To share or not to share?

I was just talking with a friend the other day about a new method his lab has developed, and about whether it’s better to have a method that is so easy that anyone can do it (and so you lose your competitive advantage), or a method that’s so hard that you’ve effectively cornered the market on those results for the foreseeable future. For the record, both my friend and I are in the former camp, but I’ve met many people who are in the latter.

I think the “keep it for myself” attitude is a natural response to a competitive scientific world, and one in which tenure in some ways hinges on your having developed a particular scientific niche for yourself. I think scientists often think to themselves “This new method is a goldmine! All I need to do is turn the crank and I’ll get all these awesome papers, one after another!” The corollary is, of course, that you’d have to be nuts to get everyone else up and running with your method.

But in my experience, the novelty of a new method wears off a lot faster than you think. It’ll get you some papers, sure, but within a year or two, your reviews will go from saying “This paper shows an interesting application of an exciting new method from XYZ” to stuff like “… even if some people would find this approach already too old-fashioned for systems biology.” [Latter is a quote from an actual (generally positive, thanks!) review just a couple years after publishing the method.]

In our case with RNA FISH, we never really get anything in our reviews about the method being cool in and of itself anymore, and haven’t for a long time. Maybe if you hold on to the method for a while, you might get another paper or so. But pretty shortly thereafter, you really need to do some new science, and it better be good.

The alternative is to try and get your method out there and into as many hands as possible. Publish complete protocols, answer e-mails, work on commercialization, etc. That’s the tack that we’ve used with RNA FISH, and I think in the end it’s been very good for the lab. Yes, others can do RNA FISH now, but that’s the point! Instead of being the ones with some niche, narrowly defined by the work our lab produces, we can say that we in some small way helped shape a method that has had a real impact. Cool! It helps also that the method is simple, robust and commercially viable. The flip side of that is that now some papers don’t even cite our methods paper, they just say “We used Stellaris probes (Biosearch Technologies)”. But whatever. The point is that our work is out there in people’s hands, and that benefits us, in some cases quite directly. Consider, for example, the following quote from a (nicely positive, thanks!) review of one of our recent papers: "Many labs around the world are already using the Raj probes, in multiple systems, so the methods are there for quick translation.”

Actually, the ultimate compliment for methods stuff is to be essentially a required tool in the toolkit. Like qRT-PCR: a reviewer can easily ask you to do qRT-PCR and that would be considered a reasonable request. Looking forward, I think using CRISPR-based genome editing is likely to have a similar fate–in a couple years, people might just ask you to CRISPR a gene in or something like that, and it would similarly seem like a reasonable request. The case for this is certainly helped by the fact that folks like Feng Zhang have made excellent resources for using this tool widely available to all. Thanks! Meanwhile, time to get busy CRISPRing in the lab…