Aug 20, 2010
What’s the point of the semantic web?
I was scanning journal tables of contents as usual this week and it occurred to me that there must be a better way to find relevant and timely research information that would be of interest to Sciencebase readers…and, of course, out pops the following title:
Technically approaching the semantic web bottleneck
Sounded perfect…kind of…but what’s the semantic web, why’s there a bottleneck, and what can be done to lube the tube?
Tim Berners-Lee’s original vision for the semantic web was that information would be just as readable (and understandable) to a person or to a machine. Digital objects, whether web page, image, video, or some other file, would have embedded within them meta data that would provide context to the content and allow software to extract meaning from the file.
Some software currently has a limited understanding of simple meta data, although any SEO will tell you that Google largely ignores web page meta data these days. That point aside, there is so much that might be done if the web were effectively self-aware (not talking notions of the singularity here, just making it all more useful and easier to use). So, I asked the paper’s author, Nikolaos Konstantinou, for a few examples of how the semantic web, often referred to as Web 3.0 (although you might call it Web 2.1 or Web 2.0++), might benefit us. The first benefit would be more intelligent searches, he told me, either across the web or in large-scale data repositories, where “intelligent” is meant in contrast to the conventional keyword-based search methods employed by the search engines.
“For instance, performing a search in Google for e.g. ‘renaissance paintings’ you will notice that among the first pages of the results returned, the vast majority contains the keywords ‘renaissance paintings’ in the respective page text (or HTML image ‘alt’ tag),” he says. “That is because the search engine does not process the content available semantically and therefore the results, although they will be accurate, will be far from being complete. This will cause an arts student, for instance, to spend too much time finding relevant content. She would probably have to visit certain museum pages and collect the results on her own.”
This is where the semantic web would come into play, Konstantinou adds. “The vision is to get a list of what you asked for even in the case when your keyword does not exist in the web page. In the example above, a page with Leonardo da Vinci’s paintings will not be considered relevant if the words ‘renaissance paintings’ do not exist in the page. In the semantic web world the system would ‘know’ that Leonardo da Vinci is an artist of the Renaissance and therefore his works would be returned to the user performing the query.”
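To make the contrast concrete, here is a minimal sketch in Python of keyword matching versus a lookup that consults a few stated facts. Everything in it is a made-up assumption for illustration: the triples, the page texts, the URLs and the function names are hypothetical, and real semantic web systems use standards such as RDF and SPARQL rather than a Python set.

```python
# Hypothetical mini knowledge base of (subject, predicate, object) triples.
TRIPLES = {
    ("Leonardo da Vinci", "is_a", "Renaissance artist"),
    ("Pablo Picasso", "is_a", "Cubist artist"),
    ("Mona Lisa", "painted_by", "Leonardo da Vinci"),
    ("The Last Supper", "painted_by", "Leonardo da Vinci"),
    ("Guernica", "painted_by", "Pablo Picasso"),
}

# Hypothetical pages; none of them contains the phrase "renaissance paintings".
PAGES = {
    "louvre/mona-lisa": "The Mona Lisa, a portrait by Leonardo da Vinci.",
    "milan/last-supper": "The Last Supper is a mural in Milan.",
    "madrid/guernica": "Guernica hangs in the Museo Reina Sofia.",
}


def keyword_search(query):
    """Return pages whose text literally contains the query string."""
    return sorted(url for url, text in PAGES.items()
                  if query.lower() in text.lower())


def semantic_search(artist_class):
    """Return pages mentioning any work painted by an artist of the class."""
    artists = {s for s, p, o in TRIPLES if p == "is_a" and o == artist_class}
    works = {s for s, p, o in TRIPLES if p == "painted_by" and o in artists}
    return sorted(url for url, text in PAGES.items()
                  if any(work.lower() in text.lower() for work in works))


if __name__ == "__main__":
    print(keyword_search("renaissance paintings"))
    print(semantic_search("Renaissance artist"))
```

The keyword search finds nothing because no page uses the phrase, while the second lookup reaches the two Leonardo pages through the stated facts about who painted what and who belongs to which period.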
A second benefit would be new knowledge inferred from what is already known. A system built using semantic web technologies, with the support of reasoning procedures, could logically deduce information, explains Konstantinou.
“The most classic example about inference is that from the statements ‘all men are mortal’ and ‘Socrates is a man’, we can deduce that ‘Socrates is mortal’. This property (transitive property) in combination with a wider set of properties can augment the knowledge inserted in a system, without requiring human insertion of each and every fact, which avoids errors and reduces the workload.”
Put simply, by stating, say, 5 facts to a system, using an ontology (a glossary) and a reasoner, the system might be able to deduce 15 facts by applying logic rules (reasoning). This is in fact what allows the intelligent queries mentioned in the Renaissance example. Such a system, when asked “Is Socrates mortal?”, will return a YES, while without reasoning the answer would be NO (or UNKNOWN in other cases). Similarly, Socrates would be included in a search like “tell me all the mortals in the system”. “This is, in fact, what is meant by ‘machine understandable’ information, the ability for a machine to process information,” adds Konstantinou.
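The Socrates example can be sketched as a toy forward-chaining reasoner in Python. This is only an illustration under stated assumptions: the predicate names `type` and `subclass_of`, the two rules, and the extra fact about Plato are hypothetical choices of mine, not the ontology language or reasoner discussed in the paper.

```python
# Facts a human typed in (hypothetical vocabulary, for illustration only).
STATED = {
    ("Socrates", "type", "Man"),
    ("Plato", "type", "Man"),
    ("Man", "subclass_of", "Mortal"),
}


def infer(facts):
    """Apply two rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for a, p1, b in known:
            for c, p2, d in known:
                if b != c:
                    continue
                # Rule 1: subclass_of is transitive.
                if p1 == "subclass_of" and p2 == "subclass_of":
                    new.add((a, "subclass_of", d))
                # Rule 2: a member of a class is a member of its superclass.
                if p1 == "type" and p2 == "subclass_of":
                    new.add((a, "type", d))
        if new <= known:
            return known
        known |= new


def mortals(facts):
    """Answer 'tell me all the mortals in the system'."""
    return sorted(s for s, p, o in facts if p == "type" and o == "Mortal")


INFERRED = infer(STATED)

if __name__ == "__main__":
    question = ("Socrates", "type", "Mortal")
    print("without reasoning:", question in STATED)
    print("with reasoning:", question in INFERRED)
    print("all mortals:", mortals(INFERRED))
```

Nobody told the system that Socrates is mortal. The fact appears only after the rules are applied to the stated facts, which is why the same question gets a different answer with and without the reasoner.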
Now…how do I apply that logic to scanning tables of contents for worthy news items?
Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Periklis Stavrou, & Nikolas Mitrou (2010). Technically approaching the semantic web bottleneck. Int. J. Web Engineering and Technology, 6 (1), 83-111.
@lylebot It was only a short taster post not meant as a full-blown introduction.
Yes, you’re right, not all searches are logical, but how many hours have you spent on searches to no avail, only to learn a tip many months down the line that would’ve given you a sharper search string and helped you home in on the information you needed with much greater precision and in a fraction of the time?
It’s that kind of aspect of the semantic web that brings its power to bear. Moreover, who wants to manually search a billion-entry database on the off-chance that a serendipitous string brings up one useful entry, when a machine might find a dozen that are even more relevant in a tiny fraction of the time?
But human language is not strictly logical. Think about some of your recent searches, the pages that you found relevant, and the information presented in them. Can you formulate that information as precise logical statements? Can you formulate a strictly inferential path from your query to that information? It’s much harder than it sounds, and in my opinion it is not something that can be done in a way that preserves the utility of the web.
That’s not to denigrate the semantic web as a whole. I actually don’t know a lot about it, but I do know some very smart people working on it, so I assume there’s something there. I’m skeptical about what’s presented in this post, however.
One idea would be to make this sort of thing possible:
TocAlert @sciencebase New TOC: Journal of Library Metadata http://tinyurl.com/y2petl3
Sounds interesting, Roddy, definitely like to see this kind of semanticToC moving into the mainstream soon ;-)
The Gold Dust project http://www.hull.ac.uk/golddust/index.html investigated ways to do clever things with RSS metadata and current awareness, but due to delays in the related ticTOCs project achieved only some of its objectives.
So now, we’re still working on making JournalTOCs http://www.journaltocs.hw.ac.uk/ better, but in different ways – most likely to connect it with social media, etc.