Identifying Digital Gems – David Bradley

Sciencebase readers will likely be aware that when I cite a research paper, I usually use the DOI system, the Digital Object Identifier. This acts like a redirect service: it takes a unique identifier, assigned to each research paper by its publisher, and passes it to a server that works out where the actual paper is on the web.

The DOI system has several handlers, and indeed, that’s one of its strengths: it is distributed. So, as long as you have the DOI, you can use any of the handlers (http://dx.doi.org, http://hdl.handle.net, http://hdl.nature.com/, etc.) to look up a paper of interest, e.g. http://dx.doi.org/10.1504/IJGENVI.2008.018637 will take you to a paper on water supplies on which I reported recently.
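To make the handler idea concrete, here is a minimal sketch in Python of how a DOI can be turned into a resolver link. The three handler addresses are the ones mentioned above; the function name doi_to_url, the RESOLVERS table and the simple "starts with 10. and contains a slash" check are my own illustrative assumptions, not part of any official DOI tooling. The sketch only builds the link; it does not contact the resolver.

```python
from urllib.parse import quote

# Handler prefixes as listed in the post. The dictionary keys are
# hypothetical labels chosen for this example.
RESOLVERS = {
    "doi": "http://dx.doi.org/",
    "handle": "http://hdl.handle.net/",
    "nature": "http://hdl.nature.com/",
}


def doi_to_url(doi: str, resolver: str = "doi") -> str:
    """Build a resolver link for a DOI (illustrative helper, not an official API)."""
    doi = doi.strip()
    # Assumed sanity check: a DOI has a "10." prefix and a "/" before its suffix.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode any unusual characters, but keep the "/" separator intact.
    return RESOLVERS[resolver] + quote(doi, safe="/")


if __name__ == "__main__":
    # The same DOI can be handed to any of the handlers.
    for name in RESOLVERS:
        print(doi_to_url("10.1504/IJGENVI.2008.018637", name))
```

The point of the sketch is that the DOI itself never changes; only the handler prefix in front of it does, which is what makes the system distributed.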

The DOI is a kind of hard-wired redirect to the actual URL of the object itself, which at the moment will be a research paper. It could, however, be any other digital object: an astronomical photograph, a chemical structure, or a genome sequence, for instance. In fact, thinking about it, a DOI could be used as a shorthand, a barcode, if you like, for whole genomes, protein libraries, databases, and molecular depositions.

I’m not entirely sure why we will also need the Library of Congress permalinks, the National Institutes of Health simplified web links, as well as the likes of PURL and all those URL shortening systems like tinyURL and snipurl. A unified approach, perhaps one that worked at the point of origin with the creator of the digital object, would seem so much more straightforward. I’ve suggested such a system previously and coined the term PaperID for it.

One critical aspect of the DOI is that it ties to hard, unchanging, non-dynamic links (URLs) for any given paper, or other object. Over on the CrossTech blog, Tony Hammond raises an interesting point regarding one important difference between hard and soft links and the rank that material at the end of such a link will receive in the search engines. His post discusses DOI and related systems, such as PURL (the Persistent URL system), which also uses an intermediate resolution system to find a specific object at the end of a URL. There are other systems emerging such as OpenURL and LCCN permalinks, which seek to do something similar.

However, while Google still dominates online search, hard links will be the only way for a specific digital object to be given any weight in its results pages. Dynamic or soft links are discounted, or not counted at all, and so never rank in the way that material at the end of a hard link will.

Perhaps this doesn’t matter, as those scouring the literature will have their own databases to trawl that require their own ranking algorithms based on keywords chosen. But, I worry about serendipity. What of the student taking a random walk on the web for recreation or perhaps in the hope of finding an inspirational gem? If that gem is, to mix a metaphor, a moving target behind a soft link, then it is unlikely to rank in the SERPs and may never be seen.

Perhaps I’m being naive; maybe students never surf the web in this way, looking for research papers of interest. However, with multidisciplinary work increasingly necessary in many fields, it seems unlikely that gems are going to be unearthed through conventional literature searching of a parochial database that covers a limited range of journals and other resources.