A Little Bit of History
  (Timeline rows: Data Engineering, Programming)
  RDF: 1999; DAML started: 2000; RDFS: 2004; OWL: 2004; SKOS first version: 2008; N3 first version: 2008; SPARQL 1.0: 2008; OWL 2: 2009; GeoSPARQL: 2012; SPARQL 1.1: 2013; JSON-LD: 2014
Programming the Semantic Web
  Linyun Fu
  June 3, 2014
  Adapted from Steffen Staab’s keynote at ESWC 2014
This presentation contains program code, engineering, speculation and worse. It may be viewed as offensive by some viewers.
Semantic Web Programming
  (Diagram: Linked Data, SPARQL endpoint, RDF file, Ontology)
  Hypothesis: Semantic Web data is wonderful, but programming with the Semantic Web has changed too little since 2000 and is still a mess. We promise flexibility, but the code is still hard to maintain.
(Diagram: Linked Data, SPARQL endpoint, RDF file, Ontology; „Inside“ Data Mgmt; „Outside“ Data Mgmt; Eclipse, Visual Studio, ...)
  There is a mismatch between data engineering and programming approaches.
(Diagram: „Inside“ Data Mgmt; „Outside“ Data Mgmt; Eclipse, Visual Studio, ...)
  „Outside“ understanding by the developer; „Search + Code“; „Browse + Code“; Triple-object mapping; Code generation
  Process of federation; LD vs endpoint; Models of federation
  Abstraction layers that facilitate a developer's life: searching, finding and reusing all the complex data structures that we have
Programming with Data: What does it cost?
  C_total = t*C_tool + d*t*C_learn + s*C_deu + s*C_map + n*C_code
  C_tool: Costs for building t many tools, shared; almost free
  C_learn: Costs for learning how to use technology per developer d
  C_deu: Costs for data engineering/understanding s sources
  C_map: Costs for mapping data structure for s sources to objects
  C_code: Actual costs for accessing/manipulating data n times
  (Legend: „Inside“, Both, „Outside“)
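To make the cost formula above concrete, here is a minimal Python sketch that evaluates C_total for given inputs. The class name `CostModel`, the field names, and every numeric value are assumptions chosen purely for illustration; the slides give no actual figures or units.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostModel:
    """Unit costs from the formula; the unit (e.g. person-hours) is arbitrary."""
    c_tool: float   # cost of building one tool (shared, so almost free)
    c_learn: float  # cost for one developer to learn one tool
    c_deu: float    # cost of data engineering/understanding per source
    c_map: float    # cost of mapping one source's data structures to objects
    c_code: float   # cost of one data access/manipulation

    def total(self, t: int, d: int, s: int, n: int) -> float:
        """C_total = t*C_tool + d*t*C_learn + s*C_deu + s*C_map + n*C_code."""
        return (t * self.c_tool
                + d * t * self.c_learn
                + s * self.c_deu
                + s * self.c_map
                + n * self.c_code)


# Hypothetical numbers, only to exercise the formula.
model = CostModel(c_tool=0.0, c_learn=40.0, c_deu=16.0, c_map=8.0, c_code=0.5)
setup_only = model.total(t=2, d=3, s=4, n=0)      # setup terms only
with_coding = model.total(t=2, d=3, s=4, n=1000)  # setup plus n data accesses
print(setup_only, with_coding)
```

With n = 0 only the setup terms remain, so the two results separate the setup share from the per-access coding share.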
SWOT of Semantic Web Programming
  C_total = t*C_tool + d*t*C_learn + s*C_deu + s*C_map + n*C_code
  (Annotations on the cost terms: threat; opportunity; weakness; Not a strength, yet!; strength/weakness)
  Strong in flexibility. Somewhat weak in performance.
  „Outside“ as good as RelDB is not good enough!
Intermediate conclusion
  Minimize costs for setup:
    C_learn: Costs for learning how to use technology per developer d
    C_deu: Costs for data engineering/understanding s sources
    C_map: Costs for mapping data structure for s sources to objects
  Minimize costs for core programming:
    C_code: Actual costs for accessing/manipulating data n times
  Costs for learning and understanding constitute a threat! They need to be overcome!
We need flexible code to match flexible data structures, i.e., domain-specific languages for Semantic Web programming
  XML programming example
  Why Jena is not good enough
XML programming example: LINQ to XML
    // Requires: using System.Xml.Linq;
    // Builds a <Contacts> tree by nesting XElement constructors.
    XElement contacts =
        new XElement("Contacts",
            new XElement("Contact",
                new XElement("Name", "Patrick Hines"),
                new XElement("Phone", "206-555-0144"),
                new XElement("Address",
                    new XElement("Street1", "123 Main St"),
                    new XElement("City", "Mercer Island"),
                    new XElement("State", "WA"),
                    new XElement("Postal", "68042")
                )
            )
        );
The Jena Approach
  Task: List all records for each music artist
  The Jamendo ontology
From artists to songs
  Observations:
    SPARQL queries are strings
    Results are strings
    Requires good understanding of the data source
    RDF typing is lost
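The observations above can be illustrated without any RDF library. The sketch below uses a tiny in-memory triple list as a stand-in for a SPARQL endpoint; the `ex:` identifiers, the use of `foaf:made` to link artists to records, and the helper `objects_of` are assumptions for illustration, not real Jamendo data or Jena API calls.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data in the spirit of the Jamendo example.
TRIPLES: List[Triple] = [
    ("ex:artist1", "rdf:type", "mo:MusicArtist"),
    ("ex:artist1", "foaf:made", "ex:record1"),
    ("ex:artist1", "foaf:made", "ex:record2"),
    ("ex:record1", "rdf:type", "mo:Record"),
    ("ex:record2", "rdf:type", "mo:Record"),
]


def objects_of(subject: str, predicate: str) -> List[str]:
    """String-in, string-out lookup, standing in for a SPARQL SELECT."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


records = objects_of("ex:artist1", "foaf:made")
# Misspelt predicate: no compiler error and no run-time error, just no rows.
typo = objects_of("ex:artist1", "foaf:maed")

print(records)
print(typo)
```

Both the query terms and the results are plain strings, so the fact that `ex:record1` is a `mo:Record` is invisible to the host language, and the misspelling only shows up as a silently empty result.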
Related Work on RDF Access
  Static typing
    Errors detected before execution
    Misspelling discovered by compiler! Anecdote: 2nd place because of a misspelt variable.
    Static types are a form of documentation
    Less knowledge about data source required
    Better IDE integration / autocompletion
    Code generation: Sommer, Winter, OntoMDE
  Dynamic typing
    E.g. ActiveRDF (Oren et al. 2007)
    “convention over configuration”
    Dynamic metaprogramming allows for slick code
    Criticism
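The trade-off between the two styles can be sketched in a few lines of Python. This is not ActiveRDF (a Ruby library) nor any of the code generators named above: `DynamicResource`, `MusicArtist`, and the naming convention that maps the attribute `foaf_made` to the property `foaf:made` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


class DynamicResource:
    """Dynamic style: attribute names are resolved to properties at run time."""

    def __init__(self, uri: str, properties: Dict[str, List[str]]) -> None:
        self.uri = uri
        self._properties = properties

    def __getattr__(self, name: str) -> List[str]:
        # Only called when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Hypothetical convention: "foaf_made" means the property "foaf:made".
        prefix, _, local = name.partition("_")
        return self._properties.get(f"{prefix}:{local}", [])


@dataclass
class MusicArtist:
    """Static style: the fields are declared, e.g. by a code generator."""
    uri: str
    made: List[str]


props = {"foaf:made": ["ex:record1"]}
dynamic = DynamicResource("ex:artist1", props)
static = MusicArtist("ex:artist1", props["foaf:made"])

print(dynamic.foaf_made)         # resolved by convention, no declaration needed
print(dynamic.foaf_maed)         # misspelling goes unnoticed: empty list
print(hasattr(static, "maed"))   # the declared class has no such field
```

The dynamic variant needs no mapping code, but a misspelt attribute silently yields an empty result. With the declared class, an editor or an external type checker can flag `static.maed` before the program runs, which is the benefit the static-typing column refers to.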
SEMANTIC WEB PROGRAMMING: LITEQ – OUR OUTSIDE APPROACH
Node Path Query Language Using Autocompletion
  Exploration of classes
  Exploration of relations
  Querying for instances
Node Path Query Language Using Autocompletion
  Exploration of classes
  Exploration of relations
Node Path Query Language: Query Formulation
  Exploration of classes
  Exploration of relations
  Querying for instances
  Type: set of mo:MusicArtist
  No definition or declaration needed
Node Path Query Language for Code Development
  Exploration of classes
  Exploration of relations
  Querying for instances
  Developing code with queries
  All translated into SPARQL queries at development time
  Type inference at compile time (but also as part of the IDE)
  Querying again at run time
  One language to bind them all
Node Path Query Language for Code Development
  Exploration of classes
  Exploration of relations
  Querying for instances
  Developing code with queries
  Developing code with new classes
  All translated into SPARQL queries at development time
  Run-time update
  Persistence!
NPQL
  NPQL (Node Path Query Language)
    Intensional queries: describing RDF classes and properties for reuse in the IDE and in host-language metaprogramming
    Extensional queries: class instances and property instances
    Compilation to SPARQL for reuse of existing endpoints
  Ongoing discussion about details of NPQL
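The slides do not show NPQL's concrete syntax, so the following Python sketch only illustrates the general idea of compiling a node path into a SPARQL query string. The path representation (a start class plus a list of properties to follow), the function `compile_path`, and the variable naming scheme are assumptions, not the actual NPQL or LITEQ translation.

```python
from typing import Sequence


def compile_path(start_class: str, properties: Sequence[str]) -> str:
    """Compile 'instances of start_class, then follow each property' to SPARQL."""
    # One triple pattern selects the instances of the start class ...
    patterns = [f"?n0 a {start_class} ."]
    # ... and each property step adds one pattern that introduces a new variable.
    for i, prop in enumerate(properties):
        patterns.append(f"?n{i} {prop} ?n{i + 1} .")
    target = f"?n{len(properties)}"  # the node reached at the end of the path
    body = "\n  ".join(patterns)
    return f"SELECT DISTINCT {target} WHERE {{\n  {body}\n}}"


# Extensional query: everything reachable from a mo:MusicArtist via foaf:made.
query = compile_path("mo:MusicArtist", ["foaf:made"])
print(query)
```

Before such a string is sent to a real endpoint, the PREFIX declarations for `mo:` and `foaf:` would have to be prepended. An empty property list yields the query for the class instances themselves.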
LITEQ
  NPQL (Node Path Query Language)
    Intensional queries
    Extensional queries
    Compilation to SPARQL
  LITEQ (Language Integrated Types, Extensions and Queries)
    Implementation of NPQL as an F# Type Provider in Visual Studio
    Autocompletion using NPQL queries
    Automatic typing of extensional query results by intensional queries
Cost savings
  C_total = t*C_tool + d*t*C_learn + s*C_deu + s*C_map + n*C_code
  C_tool: open source
  C_learn: not free, though autocompletion reduces cognitive load
  C_deu: not free; understanding the RDF schema from your IDE
  C_map: 0
  C_code: a lot less than for dotNetRDF (Apache Jena?!), a little bit more than for a fictitious perfect object model
Halstead metrics for different tasks: Conventional Semantic Web programming approaches waste up to 50% of your efforts!
Speculation 1
  (Chart: C_total plotted against n, as in n*C_code; curves: SemWeb coding efforts, RelDB/XML coding efforts; annotations: applies to small n, diff. costs for learning)
  Speculation 1: Using ontologies and RDF schemata, we can develop more efficiently using the right tools!
Halstead metrics for different tasks: If someone gives me a perfect RDF-to-OO mapping for free then I will not care about whether it is RDF or RelDB underneath!
Speculation 2
  (Chart: C_total plotted against n, as in n*C_code; curves: SemWeb coding efforts, RelDB/XML coding efforts; annotations: „Perfect“ object model shields developer from database idiosyncrasies, diff. costs for setup)
  Speculation 2: For large programmes, our tools need to offer better support to reduce setup costs!
Semantic Web: Make Developers More Productive
  (Diagram: Linked Data, SPARQL endpoint, RDF file, Ontology; „Inside“ Data Mgmt; „Outside“ Data Mgmt; Eclipse, Visual Studio, ...)
What I Think
  New programming languages vs. code generation + existing PLs
    – Mismatches between RDF and OO: Oren et al. 2007, Saathoff et al. 2009
    – Only some ontologies can be perfectly mapped to OO classes
    – A killer PL to come
  “Data programmability”
    – Model
    – Format
    – …?
References
  C. Saathoff, S. Scheglmann, S. Schenk. Winter: Mapping RDF to POJOs revisited.
  E. Oren, R. Delbru, S. Gerke, A. Haller, S. Decker. ActiveRDF: object-oriented semantic web programming. WWW 2007: 817-824.
  S. Scheglmann, A. Scherp, S. Staab. Declarative Representation of Programming Access to Ontologies. In: 9th Extended Semantic Web Conference (ESWC 2012), Heraklion, Greece, May 27-31, 2012.
  W. Cook, A. Ibrahim. Integrating Programming Languages & Databases: What’s the Problem? http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.66.7169&rep=rep1&type=pdf