PubChem Statistics – David Bradley

In March 2006, I interviewed PubChem’s Steve Bryant for the Reactive Reports chemistry webzine, and he revealed some of the inner workings and aims of the PubChem chemistry database. Ever since, I’ve been rather curious about the growth of the site and how many scientists are using it. Unfortunately, Bryant tells me, getting a handle on that kind of data is difficult. “It’s a very tricky business to accurately condense all the raw log info on hits and IP addresses into an accurate summary of who’s using a given resource and how,” he explains.

However, there are a few tricks you can use to extract some useful information from the site. There is an easy way to look at the current contents of the databases, for instance. The best trick is to go to the “global query” page:

http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi

Then enter “all[filter]” (no quotes) in the search box. This gives a count of the records in each database, e.g. 10,358,219 PubChem compounds, 552 assays, etc. There is also a summary of contributors to PubChem, which lists the number of substances or assays by organization:

http://pubchem.ncbi.nlm.nih.gov/sources/sources.cgi
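For readers who would rather script the all[filter] count than type it into the search box, here is a minimal Python sketch. It rests on an assumption that goes beyond what Bryant described: it uses NCBI’s E-utilities ESearch endpoint with the Entrez database names pccompound, pcsubstance and pcassay, rather than the global query page itself. The function names are my own, purely for illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# NCBI E-utilities search endpoint (an assumption of this sketch;
# the article itself only mentions the "global query" web page).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez names for the three PubChem databases.
PUBCHEM_DBS = ("pccompound", "pcsubstance", "pcassay")


def build_count_url(db, term="all[filter]"):
    """Build an ESearch URL that asks only for the number of matching records."""
    query = urllib.parse.urlencode({"db": db, "term": term, "rettype": "count"})
    return f"{ESEARCH_URL}?{query}"


def parse_count(xml_text):
    """Extract the integer held in the <Count> element of an ESearch XML reply."""
    root = ET.fromstring(xml_text)
    count = root.findtext("Count")
    if count is None:
        raise ValueError("no <Count> element in response")
    return int(count)


def fetch_count(db, term="all[filter]"):
    """Query NCBI over the network and return the record count for one database."""
    with urllib.request.urlopen(build_count_url(db, term), timeout=30) as resp:
        return parse_count(resp.read().decode("utf-8"))
```

Calling fetch_count(db) for each name in PUBCHEM_DBS should return the current totals, which will differ from the figures quoted above. As with the web page, this only tells you what is in the databases, not who is using them.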

Now, obviously that doesn’t provide usage stats, but it does highlight a newsworthy aspect of developments at PubChem. Over the past year, the number (and diversity) of screening assay results has been increasing. “We’re now up to over 10 million substance test results (sum of the number of substances tested in each assay, across all assays),” says Bryant. “We’ve also put some work into structure-activity analysis tools. For example, from the first assay answering the all[filter] query (AID 728, Factor XIIa Dose Response Confirmation), try ‘Related BioAssays | Related BioAssays, by Target Similarity’, then ‘Structure Activity Analysis’.”

Bryant points out that this “heatmap” display isn’t useful to all users. However, screeners who want to check on the selectivity of their “hits” are using these tools more and more, he says.