PaperID – An Open Source Identifier for Research Papers

As a journalist, I receive a lot of press releases that cite “forthcoming” papers. Depending on the publisher, one can usually find the paper in a pre-press state on their website. However, the DOI often does not go live at the same time as the embargo on the press release expires, so although I might legitimately publish an article about the research, I cannot use the DOI as the reference and must use the direct URL for the paper instead. Unfortunately, some publishers then move the paper when it is formally published, so the link I used ends up broken.

Moreover, this cannot be useful for authors themselves: a paper that does not make the grade at the International Journal of Good Stuff and ends up being resubmitted to the Parochial Bulletin of Not So Good Stuff will gain a different identification code along the way.

Will Griffiths on ChemSpider was recently discussing the possibility of an OpenURL system. I think we could go one step further.

A simple, standardized way of generating a unique identifier for each and every paper, one that would be transportable between the different phases of the publication process from submission to acceptance and publication, or rejection and resubmission elsewhere, would be a much better way of registering papers. The identifier would be created at the point when the final draft is ready to be mailed to the first editorial office in the chain, perhaps based on timestamp, lead author initials, and standard institution abbreviation. It could be the scientific literature's equivalent of an InChIKey for each research paper.
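To make the idea concrete, here is a minimal sketch in Python of how such an identifier might be assembled from the three ingredients mentioned above. The function name `make_paper_id`, the `TIMESTAMP-INITIALS-INSTITUTION` layout, and the example author and institution are all assumptions made for illustration; nothing here is a defined standard.

```python
from datetime import datetime, timezone


def make_paper_id(initials: str, institution: str, when: datetime) -> str:
    """Build a hypothetical PaperID of the form TIMESTAMP-INITIALS-INSTITUTION."""
    if when.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    # Normalise to UTC so the same moment gives the same ID in any time zone.
    stamp = when.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Keep only letters and digits, upper-cased, so "J.Q.S." becomes "JQS".
    clean_initials = "".join(ch for ch in initials if ch.isalnum()).upper()
    clean_institution = "".join(ch for ch in institution if ch.isalnum()).upper()
    if not clean_initials or not clean_institution:
        raise ValueError("initials and institution must not be empty")
    return f"{stamp}-{clean_initials}-{clean_institution}"


# Hypothetical author "J.Q.S." at a hypothetical institution abbreviated "UoX".
submitted = datetime(2008, 3, 1, 9, 30, 0, tzinfo=timezone.utc)
print(make_paper_id("J.Q.S.", "UoX", submitted))  # 20080301T093000Z-JQS-UOX
```

Because the identifier depends only on the author, the institution, and the moment the draft was finalized, it would not need to change if the paper moved from one journal to another.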

There would have to be a standardized validation system, so that authors could be sure they were using the right system, but that could be established relatively painlessly through the big institutions, networked, and cross-checked to avoid duplicates. And, of course, it would be open source and open access.
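As a rough illustration of the validation and duplicate check, the sketch below keeps identifiers in a single in-memory registry. The class `PaperIDRegistry`, the assumed identifier shape (UTC timestamp, initials, institution abbreviation), and the example titles are hypothetical; a real system would presumably be a shared, networked service rather than one Python object.

```python
import re

# Assumed shape: UTC timestamp, initials, institution, e.g. 20080301T093000Z-JQS-UOX
PAPER_ID_PATTERN = re.compile(r"\d{8}T\d{6}Z-[A-Z0-9]+-[A-Z0-9]+")


class PaperIDRegistry:
    """Hypothetical in-memory registry that refuses malformed or duplicate IDs."""

    def __init__(self) -> None:
        self._titles = {}  # maps PaperID -> title of the registered paper

    def register(self, paper_id: str, title: str) -> bool:
        """Return True if the ID was accepted, False if it is already taken."""
        if not PAPER_ID_PATTERN.fullmatch(paper_id):
            raise ValueError(f"not a well-formed PaperID: {paper_id!r}")
        if paper_id in self._titles:
            return False  # duplicate: someone has already claimed this ID
        self._titles[paper_id] = title
        return True


registry = PaperIDRegistry()
print(registry.register("20080301T093000Z-JQS-UOX", "A paper about good stuff"))  # True
print(registry.register("20080301T093000Z-JQS-UOX", "A different paper"))  # False
```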

The possibilities are endless. PaperID would create an electronic paper trail from author through preprint, in press, to online, and final publication. It might even be back-extended into the area of Open Notebook Science and equally usefully into archival, review, and cross-referencing.
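One way to picture that paper trail is as a list of stage changes attached to a single, unchanging identifier. The sketch below is only an assumption about how it might look: the `Stage` names, the `PaperTrail` class, and the example identifier are invented for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SUBMITTED = "submitted"
    REJECTED = "rejected"
    PREPRINT = "preprint"
    IN_PRESS = "in press"
    ONLINE = "online"
    PUBLISHED = "published"


@dataclass
class PaperTrail:
    """Hypothetical record of every stage a paper passes through under one ID."""

    paper_id: str
    events: list = field(default_factory=list)  # (Stage, venue) pairs, oldest first

    def record(self, stage: Stage, venue: str) -> None:
        self.events.append((stage, venue))

    @property
    def current(self):
        """Most recent (Stage, venue) pair, or None if nothing is recorded yet."""
        return self.events[-1] if self.events else None


trail = PaperTrail("20080301T093000Z-JQS-UOX")
trail.record(Stage.SUBMITTED, "International Journal of Good Stuff")
trail.record(Stage.REJECTED, "International Journal of Good Stuff")
trail.record(Stage.SUBMITTED, "Parochial Bulletin of Not So Good Stuff")
trail.record(Stage.PUBLISHED, "Parochial Bulletin of Not So Good Stuff")
print(trail.current[0].value)  # published
```

The identifier stays the same through rejection and resubmission, so a link made at the first submission would still point at the same record after publication elsewhere.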

DOI is useful most of the time, OpenURL sounds intriguing, but PaperID could be revolutionary.

Author: David Bradley

Freelance science journalist, author of Deceived Wisdom. Sharp-shooting photographer and wannabe rock god.

8 thoughts on “PaperID – An Open Source Identifier for Research Papers”

  1. Now I see what you mean. It is a great idea! This is a service
    to the authors, although the publishers must still be involved. It
    sounds quite Web 2.0. The credibility of a paper would be directly
    tied to the author(s), or to a web account. And the ethics of the
    whole science community would rely on a Web 2.0 interactive system! At
    that point the username of G. M. Whitesides will be more famous than
    his real name.

    And if it is not the DOI, will it be an improved Connotea?

    Will it be easier to cheat in such a spontaneous system? Must it be
    adopted and censored by an official organization of any kind? This is
    equal to saying that the current science community has totally accepted
    the web as their main way of communication. At that time we don’t need
    to talk about DOI and Connotea anymore.


  2. Yes, it is quite Web 2.0. I wonder whether it might somehow be tied in with the whole peer-review process and whether it could actually become a way to sidestep publishers altogether. Of course, that’s opening a whole new can of worms, but what if publishing simply became the process of acquiring a PaperID: the paper is timestamped and version history is maintained, peer reviewing becomes universal, and papers are never rejected or accepted; they simply stay online whatever happens. Publishers would then become aggregators of PaperIDs and might run journals carrying abstracts in much the same way that PubMed does now.


  3. baoilleach, you can probably tell that I haven’t actually devised an enabling strategy for this concept. I was just throwing the idea into the ring for discussion and imagining some kind of universal system that would allow researchers, right from a paper’s first draft, to label it with a standards-compliant and unique tag, which I flippantly called PaperID, and which I imagined would act for an individual paper analogously to an InChIKey for a compound.

    Do any other ChemSpy readers have any thoughts on whether a PaperID is (a) a good idea (b) possible (c) pointless?


  4. The OpenRef system described on my blog specifically doesn’t do anything for prepublication papers, but the thinking behind it may inform a possible PaperID.

    First of all, at what stage would a PaperID come into play as a useful entity? I would say, after acceptance, because the content of the paper might change if resubmitted to the same, or a different, journal. For example, the authors might change if the reviewers requested additional experimental studies. However, you suggest applying a PaperID before that.

    The OpenRef is only for published papers, and is based on the information contained in a typical journal citation after removing redundant information. For a system that includes prepublication papers, what information can we use? The authors, the title, the journal, the date of first submission. Including both the title and the authors would (probably) be redundant. The date of submission is a bit of a problem as you may not know it, and it is not included in typical citation information (that is, in a list of references in a paper).

    OpenRef builds on the existing system for citing papers. How does that system handle “In press” articles? Well, it doesn’t really, except to include “In press” instead of the year, volume and pages. In other words, a lookup is required to actually find the paper. In terms of programmatic use, this would mean using an API to search for the paper and return a result. In terms of having something to link to on a web site, a link to a search engine that would return the correct paper seems the most appropriate solution, be it Google Scholar, PubMed or the journal’s own website.


  5. I have read your post and still don’t quite understand the
    difference between the PaperID and an improved DOI. You said a rejected
    paper may also have an ID. I think this may create a huge number of
    idle IDs in the database. If only accepted papers (those the journal
    has promised to publish) have an ID, this is no different
    from the DOI. The problem with the DOI is that it doesn’t always become
    available in time for the newest in-press papers (especially from the
    John Wiley database). But I think this can be solved by better
    cooperation between the DOI organization and those publishers.

    Another idea behind the PaperID is that you can use it to trace the
    state of a paper. This idea is similar to the Electronic Product Code
    (EPC). From the very beginning of a product’s life, its
    code is recorded in a database with all its properties and states. And
    every time its state changes, e.g. on shelf, sold, disposed,
    recycled, etc., its data are changed correspondingly, until the
    end of its life cycle. This requires all the suppliers and retailers to
    adopt a system (a code standard, the RFID technology, and software)
    that can trigger an event whenever a product is handled. For
    research papers, I think the situation is similar. All editorial
    offices would have to upgrade their systems so that their handling of
    a paper is automatically reflected in a PaperID database. This is
    really a revolution. I support it.

    But this could also be realized on the basis of the current DOI organization.

    — Cheers, Andrew Sun

  6. To be honest Andrew, I haven’t worked through all the ins and outs.

    I think the main difference is that a PaperID would be created by a system accessed by the author at the time of writing the paper, not by the publishers, and it would be open source rather than a proprietary system like DOI.

    As to the dead IDs produced by rejected papers: not all rejected papers end up on the scrapheap. Many are resurrected and resubmitted, although admittedly this might be a double-edged sword if they keep the same PaperID.

    The DOI doesn’t kick in for Wiley, RSC, and several other publishers until the papers are formally published. It would be much more effective if it were triggered as soon as the paper is online in some form. However, each publisher may have valid reasons for not wanting to do that.

    I agree that an extended DOI system might be just as worthy, but I am thinking that, in an age when authors want to take back control of their papers from the publishers, an open source system that they use to create the PaperID at the point of writing the paper would be much more effective and useful to science.

    As to the issue of running out of PaperIDs: if they are based on the author’s initials, institution acronym, and timestamp, then the number of possible variations is effectively unlimited.

