The Semantic Web is an appealing concept that has been around for more than a decade. Some people say that there can never be an economically viable large-scale application for the Semantic Web, because its applications are too knowledge-intensive and therefore hard to set up and maintain at scale. Others say that we will run out of computing power long before we have gathered enough data and knowledge to compute something useful. While these are very general claims, they do expose the weak spot of the Semantic Web: its dependence on carefully laid out knowledge, expressed in a computer-readable format. But that’s also the strength of the idea.
If we look at reality, we could indeed say that the other approach, which doesn’t care about the exact knowledge structure but simply applies a fuzzy algorithm to a huge amount of data to infer knowledge, has far more followers and business use cases. Just look at Google. I talked about this with David Weiss, and he mentioned the example of the spam algorithm that Google uses to shield your mailbox. It’s really good at classification. Why? Because of the sheer number of users who click ‘Mark as spam’ every time they see a spam message.
Today, I want to introduce what very well could be an important step in the Semantic Web story, especially for Life Sciences. No, I’m not talking about the long-awaited ConceptWiki. I mean the SADI framework, to which I was pointed by the myGrid guys at ECCB10 – they co-developed a SADI Taverna plugin.
So what’s this SADI about? Let’s start with some basics. I’m not much of an expert in the field myself yet, so that should help keep things short. Semantic Web data can be stored in an RDF triple store, in which every fact is a triple consisting of a subject, a predicate (relation), and an object, such as Bob hasAge 35. The most naive implementation would be an XML file describing your triples in RDF, but there are much more sophisticated and efficient implementations out there, such as Virtuoso or Neo4j. There is a special query language for RDF called SPARQL, which is a bit like SQL but aimed at RDF. If SQL is table multiplication, then SPARQL is graph traversal.
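To make the triple idea a bit more concrete, here is a minimal sketch in plain Python. It is not a real RDF library and it is not SPARQL: the `TripleStore` class, the `knows` predicate, and the people Alice and Carol are made up for illustration; only Bob hasAge 35 comes from the text above. The `match` method mimics a single SPARQL-style triple pattern, with `None` playing the role of a variable.

```python
from typing import List, NamedTuple, Optional, Union

Value = Union[str, int]


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: Value


class TripleStore:
    """Toy in-memory triple store (illustrative only, not a real RDF store)."""

    def __init__(self) -> None:
        self._triples: List[Triple] = []

    def add(self, subject: str, predicate: str, obj: Value) -> None:
        triple = Triple(subject, predicate, obj)
        if triple not in self._triples:  # a triple is a fact; store it once
            self._triples.append(triple)

    def match(self,
              subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[Value] = None) -> List[Triple]:
        """Return all triples that fit the pattern; None acts as a wildcard."""
        return [
            t for t in self._triples
            if (subject is None or t.subject == subject)
            and (predicate is None or t.predicate == predicate)
            and (obj is None or t.obj == obj)
        ]


store = TripleStore()
store.add("Bob", "hasAge", 35)       # the example from the text
store.add("Bob", "knows", "Alice")   # hypothetical extra facts
store.add("Alice", "hasAge", 31)
store.add("Alice", "knows", "Carol")

# "How old are the people Bob knows?" Follow the 'knows' edge from Bob,
# then the 'hasAge' edge from each person found: two hops through the graph.
friend_ages = [
    (friend.obj, age.obj)
    for friend in store.match("Bob", "knows")
    for age in store.match(friend.obj, "hasAge")
]
print(friend_ages)
```

Chaining two patterns like this is the graph traversal part: each pattern follows one edge, and the result of the first feeds the second.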
However, there are many more data sources that do not contain RDF triples initially, but can, when asked, spit out knowledge in RDF form. These data sources cannot be queried with SPARQL directly. And that’s where SADI kicks in. SADI employs the genius idea of defining a new type of RDF triple, which does not have the typical subject-predicate-object form, but a subject-manipulation-subject form that actually describes a web service. In that way, you can also include potential knowledge, such as ‘I can compute a person’s BMI if you give me their height and weight’, or even ‘I can provide you with a person’s height in RDF if you want’. SADI makes it possible to include that type of knowledge in your SPARQL query as well. It computes the shortest path to your answer, calling any web services with the right information where necessary. See the informative presentation of David Wilkinson for a more elaborate introduction and examples on this idea, and have a look at the CardioSHARE demo.
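The BMI example can be sketched as a toy resolver in plain Python. This is emphatically not SADI's actual API or algorithm: the `Service` and `Resolver` classes, the predicate names (`hasWeightKg`, `hasHeightM`, `hasBMI`), and the numbers are all assumptions made up for illustration. It only shows the general idea that a service advertises what it needs and what it provides, and that missing facts are computed on demand. It does not attempt any shortest-path planning; it simply tries services in the order they are listed.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Inputs = Dict[str, float]
Key = Tuple[str, str]  # (subject, predicate)


class Service:
    """Hypothetical service description: what it needs and what it provides."""

    def __init__(self, name: str, needs: List[str], provides: str,
                 compute: Callable[[str, Inputs], Optional[float]]) -> None:
        self.name = name
        self.needs = needs
        self.provides = provides
        self.compute = compute


class Resolver:
    """Answers (subject, predicate) lookups, calling services for missing facts."""

    def __init__(self, facts: Dict[Key, float], services: List[Service]) -> None:
        self.facts = dict(facts)
        self.services = services
        self.calls: List[str] = []        # log of services that were invoked
        self._pending: Set[Key] = set()   # guards against circular dependencies

    def get(self, subject: str, predicate: str) -> Optional[float]:
        key = (subject, predicate)
        if key in self.facts:             # already known, no service needed
            return self.facts[key]
        if key in self._pending:          # already trying to resolve this
            return None
        self._pending.add(key)
        try:
            for service in self.services:
                if service.provides != predicate:
                    continue
                inputs: Inputs = {}
                for need in service.needs:
                    value = self.get(subject, need)  # may call other services
                    if value is None:
                        break             # an input is unavailable
                    inputs[need] = value
                else:
                    result = service.compute(subject, inputs)
                    if result is not None:
                        self.calls.append(service.name)
                        self.facts[key] = result     # cache the new fact
                        return result
            return None
        finally:
            self._pending.discard(key)


# A stand-in for a non-RDF data source that can report heights on request.
HEIGHTS_M = {"Bob": 2.0}

services = [
    Service("height_lookup", [], "hasHeightM",
            lambda subject, _inputs: HEIGHTS_M.get(subject)),
    Service("bmi_calculator", ["hasWeightKg", "hasHeightM"], "hasBMI",
            lambda _subject, inputs: inputs["hasWeightKg"] / inputs["hasHeightM"] ** 2),
]

resolver = Resolver({("Bob", "hasWeightKg"): 80.0}, services)
bob_bmi = resolver.get("Bob", "hasBMI")
print(bob_bmi, resolver.calls)
```

Only Bob's weight is stored up front. Asking for his BMI makes the resolver notice that the height is missing, fetch it from the lookup service, and then call the BMI service with both values.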