“Freedom and accountability are two sides of the same coin.” ― Frédéric Laloux
SPARQL is the new King of all Data Scientist’s tools (Andreas Blumauer, Sept 9th, 2015 at 11:23pm)
Inspired by the development of semantic technologies in recent years, in the field of statistical analysis the traditional methodology of designing, publishing and consuming statistical datasets is evolving into “Linked Statistical Data”, the associatiin of semantics with dimensions, attributes and observation values based on Linked Data design principles.
The representation of datasets is no longer a combination of magic words and numbers. Everything is becoming meaningful when URIs replace their positions as dereferencable resources, which further establishes the relations between resources implicitly and automatically. Different datasets are no longer isolated and all datasets share a globally, uniquely and uniformly defined structure.
With the “RDF Data Cube Vocabulary” there is already a W3C Recommendation available for linked statistical data . At this point, it is time to start building data-oriented applications and services with the traditional statistical computing languages such as R, while benefiting from the omnipotent semantic power of the SPARQL query language.
Most of the statistical analysis functions are set operations performed on subsets of a dataset (i.e., a slice, a facet, etc.). Calculation is dull machine work but how to group and create a subset is actually the innovation point to produce analytics. Thanks to SPARQL, now this subset can be created from a semantic perspective instead of mathematics and statistics.
Compared to the traditional filtering way using SQL queries, SPARQL queries eliminate the boundaries of data among datasets and among databases.
An example query can be “list the bestsellers of a supermarket for category science fiction movie in year 2014”. Someone may point out that it is also feasible with SQL if the database schema consists of all relevant fields. Well, it is absolutely correct. But what if there are more conditions such as “during weekends, directed by an American director, casted by European actors”? Is it necessary for a supermarket to maintain such data sets? Assume that there is a supermarket of which the boss is a movie fans and he would like to maintain such data and SQL is working perfectly so far. Can we reuse this query, and accordingly the Web application for another supermarket? Here we have good reasons why we use SPARQL. Any accessible resource can be used to construct the query results.
SPARQL is the new King of all Data Scientist’s tools because …