Following the noble example of Jonathan Eisen, and in the spirit of open science, here is the story behind our recent paper (not so recent anymore, because it took me a while to get around to writing this, but here it is):
Piasecka B, Lichocki P, Moretti S, Bergmann S, Robinson-Rechavi M (2013) The Hourglass and the Early Conservation Models—Co-Existing Patterns of Developmental Constraints in Vertebrates. PLoS Genet 9(4): e1003476. doi:10.1371/journal.pgen.1003476
We’ve been interested in the hourglass pattern (the idea that mid-development is more evolutionarily conserved than early or late development) for a while, but our early efforts to investigate it using genomics and bioinformatics were not very successful. We did find a significant impact of development on molecular evolution, but nothing like an hourglass (mostly an « early conservation » pattern – Roux et al 2008, Comte et al 2010). And then in December 2010, out came a zebrafish hourglass paper by Domazet-Lošo and Tautz, which shared the cover of Nature with a fly hourglass paper. Domazet-Lošo and Tautz reported a very fine-grained microarray experiment over zebrafish development, and an original analysis indicating that genes expressed in mid-development would be older, thus more conserved.
I was very excited by this paper, and looked forward to building on it to analyse this elusive hourglass pattern in more detail. But when we dug deeper into it, we found some problems with the analysis. Indeed, re-analysing the data using standard microarray procedures produced very different results, with older genes in early development. At this point, we re-contacted Tomislav Domazet-Lošo and Dieter Tautz to discuss our findings. Of note, they were very generous and open, both about sharing their data and about discussing it, from the moment we first contacted them after reading the paper, for which I thank them sincerely.
First, I need to explain how the original study was performed. The authors performed an incredibly detailed microarray measurement of gene expression at 60 stages of zebrafish ontogeny. Separately, they calculated the age of each gene, as the age of the common ancestor between zebrafish and the furthest species with a sequenced genome in which a homolog could be detected (sounds complicated but is straightforward if you think about it). The ages were not taken in millions of years, but in ranks along the phylogeny: if my ancestor of interest is three nodes from the root of the tree of species-with-genomes-sequenced, it gets a value of 3. These values were then combined in the « transcriptome age index », or TAI, which is defined for each sample (here, each stage of ontogeny) as: the sum of ages of the genes weighted by their level of expression at this stage. E.g., a gene with age « rank 3 » contributes 3*expression to the TAI of each stage. Thus older genes contribute smaller ranks, and genes that are more expressed at a stage contribute more to the TAI of this stage. The authors thus interpret a strong dip in TAI around pharyngula as evidence for a stronger contribution of older genes at this ontogenic period.
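To make the definition concrete, here is a minimal Python sketch of the index as described above. It assumes the weighted-mean form (the weighted sum divided by total expression at the stage), which the description above does not spell out. The function name, the three genes, their ranks and their expression values are all invented for illustration and are not taken from the paper.

```python
import numpy as np

def transcriptome_age_index(ranks, expression):
    """Expression-weighted mean gene age rank, one value per stage.

    ranks:      shape (n_genes,), small rank = old gene (assumed convention)
    expression: shape (n_genes, n_stages), raw expression levels
    """
    ranks = np.asarray(ranks, dtype=float)
    expression = np.asarray(expression, dtype=float)
    # Each gene contributes rank * expression at each stage.
    weighted = ranks[:, None] * expression
    # Assumption: normalise by total expression so stages are comparable.
    return weighted.sum(axis=0) / expression.sum(axis=0)

# Hypothetical toy data: 3 genes (rows) measured at 3 stages (columns).
ranks = np.array([1, 3, 7])
expression = np.array([
    [10.0, 50.0, 10.0],  # old gene, peaks at the middle stage
    [10.0, 10.0, 10.0],  # intermediate gene, flat
    [10.0, 10.0, 50.0],  # young gene, peaks at the last stage
])

tai = transcriptome_age_index(ranks, expression)
print(tai)
```

In this toy case the index is lowest at the middle stage, where the old gene is most highly expressed, which is the kind of dip the original study reads as an hourglass.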
Our observations were:
- the original microarray data, like most transcriptome data, is log-normal, i.e. there are many low values and a few extremely high values. Log-transformation recovers something pretty close to a Normal distribution, and is common and recommended practice in transcriptome analysis (including in other « hourglass » papers).
- when the TAI is computed after log-transformation, the original pattern is lost, and a pattern of older genes in early development is recovered.
- using other metrics of the relation between gene age and expression over development always recovers this pattern of older genes in early development: correlation between gene age and expression level at each stage; calling genes present/absent and calculating the average gene age at each stage; ratio of expression of oldest to youngest genes.
- also, alternative ways of treating the data recover adult male biased genes as younger than female biased genes, consistent with the literature but in contradiction with the original paper.
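The effect of log-transformation described in the list above can be illustrated with a small Python sketch. This is not our actual analysis: the four genes, their ranks and their expression values are invented, the base-10 logarithm is chosen only to keep the numbers readable, and the TAI is again assumed to be an expression-weighted mean. Real data would also need a rule for zero or negative intensities, which is skipped here.

```python
import numpy as np

def tai(ranks, expression):
    """Expression-weighted mean age rank per stage (assumed weighted-mean form)."""
    ranks = np.asarray(ranks, dtype=float)
    expression = np.asarray(expression, dtype=float)
    return (ranks[:, None] * expression).sum(axis=0) / expression.sum(axis=0)

# Hypothetical toy data: 4 genes (rows) at 3 stages (columns), small rank = old.
ranks = np.array([1, 1, 5, 5])
raw = np.array([
    [100.0, 10000.0, 100.0],  # old gene with one extreme value mid-development
    [1000.0, 100.0, 10.0],    # old gene, expression decreasing over development
    [10.0, 100.0, 1000.0],    # young gene, expression increasing
    [100.0, 100.0, 100.0],    # young gene, flat
])

raw_tai = tai(ranks, raw)

# Log-transform first (all toy values are strictly positive).
log_expr = np.log10(raw)
log_tai = tai(ranks, log_expr)

# Alternative metric: correlation between age rank and log expression per stage.
# Negative means older genes are more highly expressed at that stage.
corr = np.array([
    np.corrcoef(ranks, log_expr[:, s])[0, 1] for s in range(raw.shape[1])
])

print("raw TAI:", raw_tai)
print("log TAI:", log_tai)
print("rank/expression correlation:", corr)
```

On the raw values the single extreme measurement pulls the index down at the middle stage, while after log-transformation the lowest index is at the first stage, and the correlation goes from negative early to positive late.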
We also noticed that the paper discussed in detail variations in the TAI which were included within its confidence interval, and even in one case a variation entirely due to one outlier probe (whose effect is removed by log transformation).
The answer from Tomislav Domazet-Lošo and Dieter Tautz can be summarized as follows: the TAI in its original form is intuitive and has the nice property of always adding up to the same value for different microarray measures, which is not the case for a TAI based on log-transformed data. There was also some discussion on the proper role of mathematics in biology.
In coordination with them, we submitted a letter to Nature, which was reviewed and rejected as being « too technical ». I wanted to submit this elsewhere immediately, but Barbara Piasecka (the student who took the lead on this re-analysis) wanted to improve much more on our previous work and on this one.
Now I made a mistake which I regret, which was not making our original correspondence for Nature available on arXiv. By not doing it we needlessly delayed public discussion of any issues with the TAI, which was later used in other papers.
Anyway, Barbara analyzed the microarray data with a modular approach, finding nice modules of gene expression specific to groups of ontogenic stages. Her first analyses of these modules were a bit disappointing, since we either found no pattern, or an early conservation pattern which we already knew. Then we had the idea to look at non-coding conservation, and there was a striking « hourglass » type pattern. Interestingly, this is found not only in sequence conservation (conserved non-coding elements), but also in transposon-free regions, which I find fascinating because they provide an orthogonal view of some type of constraint on a genomic area, and in conserved micro-synteny.
Once we had a nice paper (and a nice poster, here at ECCB) (UPDATE: the poster in FigShare), Evo-Devo colleagues encouraged us to submit it to a journal with a wide readership. But both PLOS Biology and PLOS Genetics turned it down. I have found that it can be very difficult to get this type of interdisciplinary paper published (BTW, in this case, I did submit it to arXiv). It contains evolution, development, bioinformatics, genomics, molecular evolution. Where does it belong? Our 2008 paper was turned down by Mol Biol Evol before « falling up » to PLOS Genetics. The latter journal saved the day again, while we were shopping for other open access alternatives. After the publication of the plant hourglass paper, and some very constructive discussion with Greg Barsh, Editor in Chief, the paper went through the submission process and was accepted with minor changes. Wow! End of a long story.
What can we take home from this story? First, that biology is complicated, and insisting on answers such as « the hourglass exists (and explains diverse data) » or « it doesn’t » may not be the best strategy. Second, that the technical details are very important. In fact, I would say that they are an essential characteristic of science. And related to that, third, that the emphasis of journals such as Nature on « broad impact » or whatever it is can cause them to simply ignore the « technicalities » on which the correctness of the conclusions depends. Fourth, that the refusal of Nature to publish such a letter considerably delayed a discussion of the limitations of a widely discussed paper. Fifth, that next time I have remarks on a Nature or Science paper, I’ll first relay them on blogs and on arXiv, rather than keep them on my hard disk. Sixth, that open-minded and interactive editors in chief are very important to publishing interdisciplinary science.
And my last point will be on the casualties of the impact factor cult. Not only is a paper widely assumed to be important because of where it is published, but some of the reviewers of our correspondence with Nature, while abstaining from judging the content of our analysis, wrote that we were probably doing this just to get a Nature paper. No, we were not; we were following the proper and indicated procedure when there is an issue with a published paper. This Nature/Science effect is so strong that it twists all normal scientific discourse. And that is truly a pity.