What’s the point of the semantic web?

This post was chosen as an Editor's Selection for ResearchBlogging.org

I was scanning journal tables of contents as usual this week and it occurred to me that there must be a better way to find relevant and timely research information that would be of interest to Sciencebase readers…and, of course, out pops the following title:

Technically approaching the semantic web bottleneck

Sounded perfect…kind of…but what’s the semantic web, why’s there a bottleneck, and what can be done to lube the tube?

Tim Berners-Lee’s original vision for the semantic web was that information would be just as readable (and understandable) to a machine as to a person. Digital objects, whether a web page, image, video, or some other file, would have metadata embedded within them that would provide context for the content and allow software to extract meaning from the file.

Some software currently has a limited understanding of simple metadata, although any SEO will tell you that Google largely ignores web page metadata these days. That point aside, there is so much that might be done if the web were effectively self-aware (not talking notions of the singularity here, just making it all more useful and easier to use). So, I asked the paper’s author, Nikolaos Konstantinou, for a few examples of how the semantic web, often referred to as Web 3.0 (although you might call it Web 2.1 or Web 2.0++), might benefit us. The first benefit, he told me, would be more intelligent searches, either across the web or in large-scale data repositories. “Intelligent” here stands in contrast to the conventional keyword-based search methods employed by the search engines.

“For instance, performing a search in Google for e.g. ‘renaissance paintings’ you will notice that among the first pages of the results returned, the vast majority contains the keywords ‘renaissance paintings’ in the respective page text (or image HTML image ‘alt’ tag),” he says. “That is because the search engine does not process the content available semantically and therefore, the results although they will be accurate, will be far from being complete. This will cause an arts student, for instance, to spend too much time finding relevant content. She would probably have to visit certain museum pages and collect the results on her own.”

This is where the semantic web would come into play, Konstantinou adds. “The vision is to get a list of what you asked for even in the case when your keyword does not exist in the web page. In the example above, a page with Leonardo da Vinci’s paintings will not be considered relevant if the words ‘renaissance paintings’ do not exist in the page. In the semantic web world the system would ‘know’ that Leonardo da Vinci is an artist of the Renaissance and therefore his works would be returned to the user performing the query.”
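To make the contrast concrete, here is a minimal Python sketch of the two kinds of search. Everything in it is made up for illustration: the example URLs, the "artist" annotation on each page, and the little artist-to-period lookup table are assumptions standing in for real embedded metadata and a real ontology. It is a toy, not how any actual search engine or semantic web system works.

```python
# Hypothetical toy data: page text plus a made-up "artist" metadata annotation.
PAGES = [
    {"url": "museum.example/renaissance",
     "text": "A guide to renaissance paintings in our collection",
     "artist": None},
    {"url": "museum.example/leonardo",
     "text": "Mona Lisa and The Last Supper by Leonardo da Vinci",
     "artist": "Leonardo da Vinci"},
    {"url": "museum.example/monet",
     "text": "Water Lilies by Claude Monet",
     "artist": "Claude Monet"},
]

# Hypothetical background knowledge linking artists to periods.
ARTIST_PERIOD = {
    "Leonardo da Vinci": "Renaissance",
    "Claude Monet": "Impressionism",
}


def keyword_search(query, pages):
    """Return URLs of pages whose text contains every word of the query."""
    words = query.lower().split()
    return [p["url"] for p in pages
            if all(w in p["text"].lower() for w in words)]


def semantic_search(period, pages, artist_period):
    """Return keyword hits plus pages annotated with an artist of the period."""
    hits = keyword_search(f"{period} paintings", pages)
    for p in pages:
        if artist_period.get(p["artist"]) == period and p["url"] not in hits:
            hits.append(p["url"])
    return hits


keyword_hits = keyword_search("renaissance paintings", PAGES)
semantic_hits = semantic_search("Renaissance", PAGES, ARTIST_PERIOD)
print(keyword_hits)   # only the page that literally contains the words
print(semantic_hits)  # also includes the Leonardo da Vinci page
```

The keyword search misses the Leonardo page because the words "renaissance paintings" never appear in it. The second search finds it only because the system has been told, separately, that Leonardo da Vinci belongs to the Renaissance.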

A second benefit would be new knowledge inferred from what is already known. A system built using semantic web technologies, with the support of reasoning procedures, could logically deduce information, explains Konstantinou.

“The most classic example about inference is that from the statements ‘all men are mortal’ and ‘Socrates is a man’, we can deduce that ‘Socrates is mortal’. This property (transitive property) in combination with a wider set of properties can augment the knowledge inserted in a system, without requiring human insertion of each and every fact, which avoids errors and reduces the workload.”

Put simply, if you state, for example, 5 facts to a system that uses an ontology (a glossary) and a reasoner, the system will be able to deduce perhaps 15 facts by applying logic rules (reasoning). This is in fact what allows the intelligent queries mentioned in the Renaissance example. Such a system, when asked “is Socrates mortal?”, will return a YES, while without reasoning the answer would be NO (or UNKNOWN in other cases). Similarly, Socrates would be included in a search like “tell me all the mortals in the system”. “This is, in fact, what is meant by ‘machine understandable’ information, the ability for a machine to process information,” adds Konstantinou.
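The Socrates example can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the facts are stored as (subject, predicate, object) triples, the predicate names "is_a" and "subclass_of" are invented for the sketch, and the "reasoner" is a naive loop that applies two rules until nothing new turns up. Real semantic web reasoners are far more sophisticated.

```python
# Stated facts as (subject, predicate, object) triples.
# The predicate names are hypothetical, chosen for this sketch only.
STATED = {
    ("Socrates", "is_a", "man"),
    ("Plato", "is_a", "man"),
    ("man", "subclass_of", "mortal"),
}


def infer(facts):
    """Apply two rules repeatedly until no new facts appear.

    Rule 1: X is_a A        and A subclass_of B  =>  X is_a B
    Rule 2: A subclass_of B and B subclass_of C  =>  A subclass_of C
    """
    known = set(facts)
    while True:
        new = set()
        for (s, p, o) in known:
            for (s2, p2, o2) in known:
                # Chain only through a subclass_of link that starts at o.
                if o == s2 and p2 == "subclass_of":
                    new.add((s, p, o2))
        if new <= known:  # nothing new was derived, so stop
            return known
        known |= new


def ask(facts, subject, cls):
    """Answer 'is <subject> a <cls>?' by simple lookup."""
    return (subject, "is_a", cls) in facts


inferred = infer(STATED)
without_reasoning = ask(STATED, "Socrates", "mortal")
with_reasoning = ask(inferred, "Socrates", "mortal")
mortals = sorted(s for (s, p, o) in inferred if p == "is_a" and o == "mortal")

print(without_reasoning)  # the fact was never stated directly
print(with_reasoning)     # the fact was deduced by Rule 1
print(mortals)            # "tell me all the mortals in the system"
```

Nobody ever told the system that Socrates is mortal. That fact, and the same one for Plato, is added by the reasoning step, which is why the plain lookup fails on the stated facts but succeeds on the inferred ones.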

Now…how do I apply that logic to scanning tables of contents for worthy news items?

Nikolaos Konstantinou, Dimitrios-Emmanuel Spanos, Periklis Stavrou, & Nikolas Mitrou (2010). Technically approaching the semantic web bottleneck. Int. J. Web Engineering and Technology, 6 (1), 83-111

8 thoughts on “What’s the point of the semantic web?”

  1. @Nikolaos Ultimately, you will have a query with which you can filter all news; picture it as your own window to the internet which shows only the news items you want to see. This query would pick out all the “newsworthy” items for you by putting your criteria into “machine-readable” form. Since there are no semantically annotated (e.g. RDFa) news feeds at this point, it is a little early to create this query.

    @Jess Semantic Web is a vision/ideology of what the web can become. Semantic Technology is the technological component that has the ability to realize the potential of this semantic web. How we can or should implement this vision is up to us. So feel free to start a study to explore the requirements for the semantic web, and the pitfalls, etc…. :-)

  2. 30 years ago I became interested in AI and Machine Natural Language Translation. I found much to my chagrin that symbolists, who believe in the arbitrariness of signs, and structuralists, who think that the combinatory nature of syntax is all important, ruled the roost, taking their marching orders from Chomskyan-style linguistics which dominated at the time.

    Because of this ideological bias both linguistics and related computer-oriented sciences have been blindered and hogtied for many decades, with no clear end in sight. The Semantic Web suffers from the same adherence to received wisdom at its very core.

    In fact, human languages vary enormously with regard to degrees and types of motivational pressures that continually reshape their workings, but the distributions themselves appear to be lawful. Much of ‘linguistic typology’ has been concerned with working out these distributions. But there is much more yet to be done. There is often information buried within the actual forms of word roots, stems, inflections, phrase and clause structures and so on that is NOT arbitrarily assigned; other information is statistical in nature.

    Until we figure out how the different subsystems interact with each other, exchanging membership or areas of influence over time, etc., any creation such as the Semantic Web, based as it is upon incomplete knowledge and inadequate theory, is bound to be very imperfect. On the up side, there are still fortunes to be made by those who realize this.

  3. @lylebot It was only a short taster post not meant as a full-blown introduction.

    Yes, you’re right, not all searches are logical, but how many hours have you spent on searches to no avail, only to learn a tip many months down the line that would’ve given you a sharper search string and helped you home in on the information you needed with much greater precision and in a fraction of the time?

    It’s that kind of aspect of the semantic web that brings its power to bear. Moreover, who wants to manually search a billion-entry database on the off-chance that a serendipitous string brings up one useful entry, when a machine might find a dozen that are even more relevant in a tiny fraction of the time?

  4. But human language is not strictly logical. Think about some of your recent searches, the pages that you found relevant, and the information presented in them. Can you formulate that information as precise logical statements? Can you formulate a strictly inferential path from your query to that information? It’s much harder than it sounds, and in my opinion it is not something that can be done in a way that preserves the utility of the web.

    That’s not to denigrate the semantic web as a whole. I actually don’t know a lot about it, but I do know some very smart people working on it, so I assume there’s something there. I’m skeptical about what’s presented in this post, however.

