Back in June 2001, I reviewed an intriguing site that allows you to compare “stuff”. At the time, the review focused on how the site could be used to find out in how many research papers archived by PubMed two words or phrases coincided. I spent hours entering various terms hoping to turn up some revelationary insights about the nature of biomedical research, but to no avail.
I assumed the site would have become a WWW cobweb by now, but no! compare-stuff is alive and kicking and has just been relaunched with a much funkier interface and a whole new attitude. And as of fairly recently, the site now has a great blog associated with it in which site creator Bob compares some bizarre stuff such as pollution levels versus torture and human rights abuses in various capital cities. Check out the correlation that emerges when these various parameters are locked on to the current Olympic city. It makes for very interesting reading.
Since the dawn of the search engine age people have been playing around with the page total data they return. Comparing the totals for “Company X sucks” and “Company Y sucks”, for example, is an obvious thing to try. Two surviving examples of websites which make this easy for you are SpellWeb and Google Fight, in case you missed them the first time around.
compare-stuff took this a stage further with a highly effective enhancement: normalisation. This means that a comparison of “Goliath Inc” with “David and Associates” is not biased in favour of David or Goliath.
Compare-stuff with its new, cleaner interface now takes this normalisation factor to the logical extreme and allows you to carry out a trend analysis and so follow the relative importance of any word or phrase. For example, “washed my hair”, with respect to a series of related words or phrases, for example “Monday”, “Tuesday”, “Wednesday”…”Sunday”. The site retrieves all the search totals (via Yahoo’s web services), does the calculations and presents you with a pretty graph of the result (the example below also includes “washed my car” for comparison).
Both peak at the weekend but hair washing’s peak is broader and includes Friday, as you might expect. It’s a bit like doing some expensive market research for free, and the cool thing is that you can follow the trends of things that might be difficult to ask in an official survey, for example:
You can analyse trends on other timescales (months, years, time of day, public holidays), or across selected non-time concepts (countries, cities, actors). Here are a few more examples:
As you can see, compare-stuff provides some fascinating sociological insights into how the world works. It’s not perfect though. Its creator, Bob MacCallum, is at pains to point out that it can easily produce unexpected results. The algorithm doesn’t know when words have multiple meanings or when their meaning depends on context. A trivial example would be comparing the trends of “ruby” and “diamond” vs. day of the week.
The result shows a big peak for “ruby” on “Tuesday”, not because people like to wear, buy or write about rubies on Tuesday, but because of the numerous references to the song “Ruby Tuesday” of course.
However, since accurate computer algorithms for natural language processing are still a long way off, MacCallum feels that a crude approach like this is better than nothing, particularly when used with caution. Help is at hand though, the pink and purple links below the plot take you to the web search results, where you can check that your search terms are found in the desired context; in the top 10 or 20 hits that is. On the whole it does seem to work, and promises to be an interesting, fast and cheap preliminary research tool for a wide range of interest areas.
With summer well under way, Independence Day well passed, and thoughts of Thanksgiving and Christmas coming to the fore already (at least in US shops), I did a comparison on the site of E coli versus salmonella for various US holidays. You can view the results live here, as well as tweaking the parameters to compare your own terms.
Originally posted June 4, 2007, updated August 19, 2008