What is a Scientific Paper?

David Bradley reporting from Science Online London 2009 (#solo09)

The “modern” form of scientific publishing began in the 17th century, when gentlemen (rarely, until very recently, a lady) with an inquisitive bent decided it would be a good idea to share the results of their endeavours among their peers for assessment, confirmation and debate. The august bodies that published these seeds of enlightenment, as well as the occasional monstrous calf (as Robert Boyle reported in 1665), grew into the learned societies we still know and love today. Moreover, on the back of scientific industry, these organisations and countless commercial concerns since have built vast empires to publish, and profit from, the growing piles of scientific information.

From Boyle’s “monstrous calf” in 1665 to Watson and Crick’s seminal, single-page “paper” in Nature in 1953, humbly announcing that they might have unlocked the secret of life, the status quo has remained…well…the same. There were innovations over the decades, mainly in the arcane areas of typesetting and lithography. With the emergence of the personal computer and the internet, however, things began to change ever so slightly. Research papers remained flat and static, but they were copied from treeware to PDFs and online archives.

With the emergence of the age of digital media, social networking, online collaborative tools, and new business models for publishing, however, the late 90s saw the first waves of a sea change that would, dot.com froth aside, be the first swell of a revolution whose full impact science has yet to observe.

At #solo09, Lee-Ann Coleman of the British Library asked “where next?” There are millions of research papers out there now, so how does science use new technology to mine this rich seam of information? More urgently, though, what format should the modern scientific paper take? It is obvious from the way many pioneering scientists are working today, among them conference delegates such as Richard Grant, Cameron Neylon and Peter Murray-Rust, that things are changing significantly.

With biology papers and particle physics research carrying vast author lists and mounds of technical data, Coleman asked, should the science blog become the narrative resource, the results and discussion, of a research paper, with annotated databases and repositories of processes standing in for the old methods and supplementary data sections?

Katherine Barnes of Nature Protocols explained how NPG is already taking steps towards such a view of the scientific paper. The journal is unusual in that it publishes protocols, recipes for how to do an experiment complete with a review of the method and a detailed description, rather than primary research papers. These “Protocols” differ from traditional papers in that they are not peer reviewed in the conventional sense but are almost instantaneously critiqued by the community. What is left unsaid, of course, is who the critics should be, whether they are anonymous, and who pays for their services.

The digital video journal JoVE has taken this protocol approach almost to its logical conclusion: each “paper” is a video presentation of a protocol. Such is its credibility in this age that its articles are indexed in PubMed. It is presumably relatively expensive, but really useful nevertheless, Barnes suggested, although one audience comment had it that making a video would be among the least costly parts of a research project overall.

Like arXiv and the ChemWeb chemistry preprint server before it, NPG is also touting Nature Precedings as a preprint outlet for biology, and it is offering innovations in Nature Chemistry, such as 3D structures, links to data, citations and download data. All apparently very innovative, but something Henry Rzepa and Peter Murray-Rust were proving way back in the mid-1990s with the ECTOC and ECHET virtual conferences. The pioneering efforts of those conferences, and of the likes of ChemWeb and BioMedNet, which were web 2.0 years before the web 2.0 of reflective logos and so-called social media, seem to be neglected in discussions today. But I digress.

Barnes said that NPG is always thinking of ways to improve articles, handle big data sets, add movies, and make the traditional paper (invented way back when) as useful to scientists today as possible. The publisher still maintains a traditional view of what a basic paper is, but it is nevertheless asking how it can move forward to help researchers in the future.

Theodora Bloom, Chief Editor at PLoS Biology, did what she described as a whistle-stop tour of what’s wrong with scientific papers at the moment. We’ve come a long way since the 1953 paper by Watson and Crick, she said, but she asserted that “papers” don’t really work now. There are some pressing problems that must be addressed, not least how to preserve a master copy for the record.

One problem that papers present when they meet the digital world head on, and one that is probably more important to authors than to publishers, is that the likes of PubMed rarely index the complete author lists of papers with huge numbers of authors, such as those that emerge from genomics programmes. The inclusion of complete methods is another point of contention: some publishers include them, some don’t, and some reserve those details for the supplementary information. However they’re handled, no optimal, standard approach seems to have been reached that allows other scientists to quickly ascertain the protocols and attempt to reproduce them, an essential part of the validation of science, of course.
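
As an aside, it is easy enough to see exactly what PubMed does expose for any given paper: NCBI’s public E-utilities will return the indexed author list for a PMID, so a consortium paper with a truncated tail of authors shows up immediately. A minimal sketch in Python, using only the standard library; the PMID here is purely illustrative:

```python
import json
import urllib.request

# NCBI E-utilities summary endpoint; returns the metadata PubMed has indexed.
ESUMMARY = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
            "?db=pubmed&retmode=json&id={pmid}")

def indexed_authors(pmid: str) -> list[str]:
    """Return the author names PubMed has indexed for one article."""
    with urllib.request.urlopen(ESUMMARY.format(pmid=pmid)) as response:
        record = json.load(response)["result"][pmid]
    return [author["name"] for author in record.get("authors", [])]

if __name__ == "__main__":
    # Hypothetical PMID: compare the indexed list against the paper itself
    # to see whether a long consortium author list has been cut short.
    names = indexed_authors("20081222")
    print(f"{len(names)} authors indexed, starting with: {names[:5]}")
```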

Moreover, a paper may have 1,500 genes or 25,000 images; where and how should those be published or archived, asked Bloom? How would they be date stamped? And if, as some studies claim, many scientists cannot trace the originals of published images and data, how do we preserve the provenance of a published scientific result and so allow technology to detect fraud and inappropriate manipulation?
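
One partial answer to the date-stamping and provenance question is simply to fingerprint every data file at the moment of publication: a cryptographic hash recorded alongside a timestamp lets anyone later verify that an image or data set is byte-for-byte the one that was published. A minimal sketch follows; the file name and registry record format are invented for illustration:

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def fingerprint(path: Path) -> dict:
    """Hash a data file and pair the digest with a UTC publication timestamp."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    return {
        "file": path.name,
        "sha256": digest,
        "stamped": datetime.now(timezone.utc).isoformat(),
    }

def verify(path: Path, record: dict) -> bool:
    """Check a file against its published fingerprint; a mismatch flags manipulation."""
    return hashlib.sha256(path.read_bytes()).hexdigest() == record["sha256"]

if __name__ == "__main__":
    # Hypothetical supplementary image published alongside a paper.
    image = Path("figure_3b.tif")
    record = fingerprint(image)
    print(json.dumps(record, indent=2))
    print("unchanged since publication:", verify(image, record))
```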

We need a snapshot database of these “big” papers, suggested Bloom. Again, though, who pays, and who provides the storage, virtual and offline? It could be argued that complex 3D protein structures and the like are the essence of a paper anyway, and that the narrative description is just an access point that could be handled independently in the electronic lab book or through a blog. Indeed, if a “paper” is entirely machine readable and digital, then where do the authors express their views? Perhaps we could go back to the one-page narrative epitomised by Crick and Watson’s humble publication of 1953, with that component becoming an aside to the “paper”. What is the primary data? asked Bloom.

She also suggested that the time is ripe for integrating the good old reference section with database information for real-time analysis of author activity and results. Publishing has moved on since 1953, but Bloom asserted that we have come a long way in the intervening fifty years without going quite far enough: there are already multiple versions of papers available online, and making the rich semantic material work requires free and open access to the articles, as provided by the PLoS model. Other interested parties might disagree.

The final speaker in the session, Enrico Balli, pointed out that some organisations are already working in the new age of scientific publishing, in which the definition of a scientific paper has evolved significantly already. SISSA has published a small number of particle physics papers in its journal and in the last year has started to run “non-printable” types of papers in Proceedings of Science (pos.sissa.it): proceedings, regular papers, educational activities, conference notes and so on.

SISSA is also working alongside the UK’s Institute of Physics to publish normal papers alongside other kinds of content attached to those papers. The strange ones include a manual for the software used by a part of the physics community, for instance. It’s not a paper in the strict sense, Balli explained, it’s a manual, although it is written in the style of a paper to conform to what reviewers expect. He also highlighted the CERN/LHC Atlas project, the biggest particle physics experiment, with some 5,000 people and correspondingly enormous author lists. The “paper” here is the manual for Atlas, published in a journal and exploring every single nut and every single bolt. The author list alone covers twenty screens, and if it were printed as a traditional paper the stack might stand a metre and a half tall.

Balli also asked: what of the data sets in particle physics needed for recreating experiments? The LHC will create unbelievable amounts of data, and everyone will need to cite those data, but of what will the papers that emerge from this vast international collaboration consist? A comment from the floor pointed out that reproducibility of these data might be nice but is probably irrelevant for such enormous experiments.

Balli further pointed out that they are creating a new un-journal, the Journal of Stuff, which will be nothing like a traditional journal but will give physicists the opportunity, perhaps with peer review, to publish the stuff they need to publish: data sets, manuals and so on.

Nevertheless, to mix a metaphor, the seeds are being sown. After several centuries of evolution, perhaps it’s time Boyle’s monstrous calf was put out to pasture.

Pictured left to right: Katherine Barnes, Lee-Ann Coleman, Theodora Bloom, Enrico Balli

Incidentally, there was quite a lively Q&A after the speakers had done their set pieces, with Cameron Neylon, Peter Murray-Rust and others pitching in with their views on how various parties in the publishing industry are to blame for different limitations on innovation when it comes to research papers, while others, such as Nature’s Maxine Clarke, defended the publishers’ corner to some extent.

Martin Fenner has aggregated many of the excellent posts from #solo09 that have already been published, covering the prequel, the conference itself, and the various breakout sessions. Although I wrote this post on the train home on Saturday, I didn’t want to publish it until today, so these folks were ahead of me with their reports: