RDF as a Lingua Franca: Key Architectural Strategies
David Booth, Ph.D.
Cleveland Clinic (contractor)
Semantic Technology Conference, 15-June-2009
Latest version of these slides: http://dbooth.org/2009/stc/
About the speaker
Senior Software Architect, Cleveland Clinic's SemanticDB project
Senior research architect, HP Software
– W3C GRDDL standard
W3C Fellow 2002-2005
– W3C Web Services Architecture document
– W3C WSDL 2.0 standard
AT&T Bell Labs
Ph.D. Computer Science, UCLA
3 Outline
Part 1: The Problem
– Babelization
– SOA and RDF
Part 2: Architectural Strategies
1. RDF message semantics
2. GRDDL transformations from XML to RDF
3. REST-based SPARQL endpoints
4. Semantic Data Federation
5. Named graphs
6. Monotonicity
Part 3: Example: Cleveland Clinic SemanticDB
5 Problem 1: Babelization
Proliferation of data models (XML schemas, etc.)
Parsing issues influence data models
No consistent semantics
Data chaos
Image: Tower of Babel, Abel Grimmer (1570-1619)
6 Problem 2: Integration complexity
Many data producers, many data consumers
Producers and consumers interact in complex ways
Tight coupling hampers independent versioning...
7 Problem 3: Client/service versioning
Need to version clients and services independently
Data models evolve
No such thing as "the" data model:
– There are several, slightly different but related models
Diagram: Client v1, v2, v3 and Service v1, v2, v3
8 RDF and SOA
RDF can help:
– Bridge vocabularies / data formats
– Looser data coupling
– Consistent semantics across applications
SOA can help:
– Looser process coupling
How?
10 1. RDF message semantics
Interface contract can specify RDF, regardless of serialization
RDF pins the semantics
Diagram: Client and Service communicating via RDF
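As a minimal illustration of pinning semantics independently of syntax, the Python sketch below (standard library only, not rdflib or any RDF toolkit) parses the same message from two serializations: a tiny subset of N-Triples and a hypothetical JSON layout. The namespace, property names and JSON layout are invented for the example. What a contract built this way would check is the resulting set of triples, not the bytes on the wire.

```python
import json

EX = "http://example.org/"  # hypothetical namespace, for illustration only

def parse_simple_ntriples(text):
    """Parse a tiny N-Triples subset: one '<s> <p> <o> .' per line, URIs only."""
    triples = set()
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        s, p, o = line.strip().rstrip(" .").split()
        triples.add((s.strip("<>"), p.strip("<>"), o.strip("<>")))
    return triples

def parse_custom_json(text):
    """Parse a hypothetical JSON layout: {subject: {predicate: [object, ...]}}."""
    return {(s, p, o)
            for s, props in json.loads(text).items()
            for p, objects in props.items()
            for o in objects}

AS_NTRIPLES = f"""
<{EX}patient123> <{EX}hasCondition> <{EX}hypertension> .
<{EX}patient123> <{EX}admittedTo> <{EX}ward7> .
"""
AS_JSON = json.dumps({f"{EX}patient123": {
    f"{EX}hasCondition": [f"{EX}hypertension"],
    f"{EX}admittedTo": [f"{EX}ward7"],
}})

# Different serializations, same RDF: either one satisfies the contract.
print(parse_simple_ntriples(AS_NTRIPLES) == parse_custom_json(AS_JSON))  # True
```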
11 But Web services use XML!
XML is well known and widely used
Existing apps may require specific XML (or other) formats that cannot be changed
How can we gain the benefits of RDF message semantics while still accommodating XML?
12 Custom XML serializations of RDF
Recall: RDF is syntax independent
– Specifies an information model, not a syntax!
– Can be serialized in any agreed-upon way
Therefore:
– Can view existing XML formats as custom serializations of RDF!
How? GRDDL...
13 What is GRDDL?
"Gleaning Resource Descriptions from Dialects of Languages"
W3C standard
Permits RDF to be "gleaned" from XML
XML document or schema specifies a GRDDL transformation
GRDDL transformation produces RDF from the XML document
– Transformation is typically written in XSLT
14 2. GRDDL transformations from XML to RDF
Therefore:
Same XML document can be consumed by:
– Legacy XML app
– RDF app
App interface contract can specify RDF
– Serializations can vary
– Semantics are pinned by RDF
Helps bridge XML and RDF worlds
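GRDDL transformations are usually written in XSLT. As a stand-in, this Python sketch uses xml.etree.ElementTree to show the same idea: a legacy XML document, left unchanged for existing XML apps, is transformed into RDF-style triples for RDF apps. The XML format, the vocabulary URIs and the `glean` function are hypothetical; in real GRDDL the document (or its namespace document) would point to the XSLT, e.g. via the grddl:transformation attribute.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical vocabulary namespace

# A hypothetical legacy XML format, still consumed unchanged by existing XML apps.
LEGACY_XML = """
<patients>
  <patient id="123">
    <bp systolic="150" diastolic="96" time="2007-08-23T12:22"/>
    <bp systolic="155" diastolic="97" time="2007-08-23T18:32"/>
  </patient>
</patients>
"""

def glean(xml_text):
    """Glean RDF-style (subject, predicate, object) triples from the legacy XML."""
    triples = []
    root = ET.fromstring(xml_text.strip())
    for patient in root.iter("patient"):
        subject = f"{EX}patient{patient.get('id')}"
        for i, bp in enumerate(patient.iter("bp"), start=1):
            # Each reading gets its own URI here (a simplification; a blank node would also work).
            reading = f"{subject}/reading{i}"
            triples.append((subject, f"{EX}hasReading", reading))
            # Literals are plain Python values here rather than typed RDF literals.
            triples.append((reading, f"{EX}systolic", int(bp.get("systolic"))))
            triples.append((reading, f"{EX}diastolic", int(bp.get("diastolic"))))
            triples.append((reading, f"{EX}time", bp.get("time")))
    return triples

for triple in glean(LEGACY_XML):
    print(triple)
```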
15 Bridging XML and RDF
Diagram: Client ⇄ Service (XML/other); inside the Service: Normalize to RDF → Core App Processing → Serialize as XML/other/RDF
Input: Accept whatever formats are required
– Use GRDDL to transform XML to RDF
Output: Serialize to whatever formats are required
– Generate XML/other directly (or even RDF!), or
– SPARQL query can generate a specific view first
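The output side can be sketched the same way. Assuming the core processing has already produced normalized triples (the data and names below are made up), a query-like selection first picks out a specific view, which is then serialized in whatever XML format a particular client requires. `select_view` is only a stand-in for a SPARQL query, and the XML layout is hypothetical.

```python
import xml.etree.ElementTree as ET

EX = "http://example.org/"  # hypothetical namespace

# Normalized RDF produced by core app processing (made-up example data).
TRIPLES = {
    (f"{EX}patient123", f"{EX}name", "Pat Doe"),
    (f"{EX}patient123", f"{EX}hasCondition", f"{EX}hypertension"),
    (f"{EX}patient456", f"{EX}name", "Sam Roe"),
}

def select_view(triples, predicate):
    """Pick out one view of the data (a stand-in for a SPARQL query)."""
    return sorted((s, o) for s, p, o in triples if p == predicate)

def to_legacy_xml(pairs):
    """Serialize the view in a hypothetical legacy XML format that a client requires."""
    root = ET.Element("patients")
    for subject, name in pairs:
        patient_id = subject.rsplit("patient", 1)[1]
        ET.SubElement(root, "patient", id=patient_id).text = name
    return ET.tostring(root, encoding="unicode")

print(to_legacy_xml(select_view(TRIPLES, f"{EX}name")))
# <patients><patient id="123">Pat Doe</patient><patient id="456">Sam Roe</patient></patients>
```

Another client could be served from the same triples by swapping in a different serializer, without touching the core processing.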
17 What is REST?
REST: Representational State Transfer
Architectural style
Identified by Roy Fielding in his Ph.D. thesis
Based on a uniform interface
– HTTP GET, PUT, POST, DELETE
18 Why REST?
HTTP is ubiquitous
Simpler than SOAP-based Web services (WS-*)
Looser process coupling
– Easier to change/version the process flow
19 What is SPARQL?
W3C standard
Query language for RDF
Modeled after SQL: SELECT... WHERE...
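The heart of a SELECT ... WHERE query is matching a basic graph pattern (the triple patterns in the WHERE clause) against the data and returning bindings for the selected variables. The toy matcher below sketches only that part (no FILTER, OPTIONAL, etc.) and is not a SPARQL engine; the data and the `match`/`select` helpers are hypothetical.

```python
EX = "http://example.org/"  # hypothetical namespace

DATA = {
    (f"{EX}patient123", f"{EX}hasCondition", f"{EX}hypertension"),
    (f"{EX}patient456", f"{EX}hasCondition", f"{EX}asthma"),
    (f"{EX}hypertension", f"{EX}label", "Hypertension"),
    (f"{EX}asthma", f"{EX}label", "Asthma"),
}

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every triple pattern (a basic graph pattern)."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break  # constant term does not match
        else:
            yield from match(rest, triples, new)

def select(variables, where, triples):
    """SELECT ?vars WHERE { patterns } over an in-memory set of triples."""
    return [tuple(b[v] for v in variables) for b in match(where, triples)]

# Roughly: SELECT ?p WHERE { ?p ex:hasCondition ?c . ?c ex:label "Hypertension" }
rows = select(["?p"], [("?p", f"{EX}hasCondition", "?c"),
                       ("?c", f"{EX}label", "Hypertension")], DATA)
print(rows)  # [('http://example.org/patient123',)]
```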
20 Why SPARQL?
RDF gives looser data coupling
Insulates consumers from internal model changes
– Inferencing can transform data to the consumer's desired model
One endpoint supports multiple consumer needs
– Each consumer gets what it wants
Simpler interface for consumers
– Uniform SPARQL interface instead of a different set of parameters for each REST endpoint
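A REST-based SPARQL endpoint gives every consumer the same interface: one URL, with the query passed as the `query` parameter of an HTTP GET, as in the SPARQL Protocol. This sketch only builds such a request with the Python standard library; the endpoint URL and query are placeholders, and nothing is sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute a real SPARQL endpoint.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?p WHERE { ?p ex:hasCondition ex:hypertension }
"""

def sparql_get_request(endpoint, query):
    """Build a SPARQL Protocol query-via-GET request: the query goes in the 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})

req = sparql_get_request(ENDPOINT, QUERY)
print(req.full_url)
# Against a live endpoint, urllib.request.urlopen(req) would execute the query.
```

Each consumer changes only the query text, not a per-endpoint parameter set.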
21 4. Semantic Data Federation
Get data from multiple sources
Provide data to consumers
Model transformation, caching, etc.
Conceptual component, not necessarily a separate service
Diagram: Semantic Data Federation (Ontologies & Rules, SPARQL, Adapters) connecting A1, A2, A3, B1, B2, C1, C2 and X, Y, Z
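As a sketch of the conceptual component only (not a separate service, and not Cleveland Clinic's implementation), the hypothetical Python class below pulls triples from several adapters, merges them, and returns just the part a consumer asks for. With no blank nodes involved, merging RDF graphs is simply a set union; caching, inferencing and a real SPARQL interface are left out.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, object]
EX = "http://example.org/"  # hypothetical namespace

# Hypothetical adapters: each fetches from one source and returns RDF-style triples.
def lab_adapter() -> Iterable[Triple]:
    return [(f"{EX}patient123", f"{EX}labResult", f"{EX}result9")]

def admissions_adapter() -> Iterable[Triple]:
    return [(f"{EX}patient123", f"{EX}admittedTo", f"{EX}ward7"),
            (f"{EX}patient456", f"{EX}admittedTo", f"{EX}ward2")]

class SemanticDataFederation:
    """Conceptual hub: pulls from adapters, merges the RDF, answers consumer requests."""

    def __init__(self, adapters: Iterable[Callable[[], Iterable[Triple]]]):
        self.adapters = list(adapters)

    def merged(self) -> Set[Triple]:
        # Without blank nodes, merging RDF graphs is set union; duplicates collapse.
        graph: Set[Triple] = set()
        for adapter in self.adapters:
            graph.update(adapter())
        return graph

    def about(self, subject: str) -> Set[Triple]:
        """Return just the triples a consumer asked for (a stand-in for a SPARQL query)."""
        return {t for t in self.merged() if t[0] == subject}

fed = SemanticDataFederation([lab_adapter, admissions_adapter])
print(sorted(fed.about(f"{EX}patient123")))
```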
22 Key features of semantic data federation
REST-based SPARQL endpoint
– Client gets just the data it wants
Support for a variety of data sources
– E.g., SQL, SPARQL(!), etc.
– Easy to add a new data source adapter, e.g., HTTP
Caching
– Not multiple masters
Inferencing
Provides loose coupling at both data and process levels
23 Why inferencing?
Allows new data sources to be more readily connected to existing data
Allows new output vocabularies to be more readily supported in response to client needs
Easier versioning with both clients and data sources
– Inferencing can help bridge across versions
24 Data source adapters
Diagram: Semantic Data Federation with Ontologies & Rules, SPARQL and Adapters
Responsible for:
– Mechanics of getting the data
– Transforming from native format to RDF
May involve custom code or reusable tools
– E.g., Gloze performs XML ↔ RDF lift/drop
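A minimal adapter, assuming a relational source: each row of a SQL table becomes one subject and each non-null column becomes one triple, using a source-specific vocabulary derived mechanically from the table and column names rather than any hub ontology. The sketch uses Python's sqlite3 module with an in-memory database; the table, the namespace and `sql_table_adapter` are hypothetical.

```python
import sqlite3

# Hypothetical source-specific vocabulary, derived from table/column names.
SRC = "http://example.org/labdb/"

def sql_table_adapter(conn, table, key):
    """Turn each row of a SQL table into triples: one subject per row, one predicate per column."""
    cur = conn.cursor()
    cur.row_factory = sqlite3.Row  # access columns by name
    triples = []
    # The table name is interpolated directly; only use trusted, known table names here.
    for row in cur.execute(f"SELECT * FROM {table}"):
        subject = f"{SRC}{table}/{row[key]}"
        for column in row.keys():
            if column != key and row[column] is not None:
                triples.append((subject, f"{SRC}{table}#{column}", row[column]))
    return triples

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bp (id INTEGER PRIMARY KEY, patient TEXT, systolic INTEGER, diastolic INTEGER)")
conn.execute("INSERT INTO bp VALUES (1, 'P123', 150, 96)")

triples = sql_table_adapter(conn, "bp", "id")
for t in triples:
    print(t)
```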
25 Adding a new data source
Strategy:
1. Adapter transforms native format to corresponding RDF
– Not directly to the hub ontology!
2. Bridging rules transform to the hub ontology
Diagram: a new Data Source and Adapter plugged into the Semantic Data Federation (Ontologies & Rules, SPARQL)
26 Adding a new output vocabulary
Strategy:
1. Bridging rules transform from hub ontologies to the new output vocabulary
2. Client can query using the desired vocabulary
Diagram: Data Source, Adapter, Semantic Data Federation (Ontologies & Rules, SPARQL), Client
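Both bridging strategies (source to hub, hub to output vocabulary) can be sketched with simple predicate-mapping rules applied by forward chaining until nothing new is derived. A real system would express these in an ontology or rule language; the three vocabularies and `apply_rules` below are hypothetical.

```python
SRC = "http://example.org/labdb/"    # hypothetical source-specific vocabulary
HUB = "http://example.org/hub#"      # hypothetical hub ontology
OUT = "http://example.org/clientX#"  # hypothetical client output vocabulary

# Bridging rules as (from_predicate, to_predicate): "?s from ?o" implies "?s to ?o".
SOURCE_TO_HUB = [(f"{SRC}bp#systolic", f"{HUB}systolicPressure")]
HUB_TO_OUTPUT = [(f"{HUB}systolicPressure", f"{OUT}sbp")]

def apply_rules(triples, rules):
    """Forward-chain predicate-mapping rules until no new triples appear."""
    graph = set(triples)
    while True:
        new = {(s, to_p, o) for s, p, o in graph for from_p, to_p in rules if p == from_p}
        if new <= graph:
            return graph
        graph |= new

# Output of the source's adapter, still in the source's own vocabulary.
source_rdf = {(f"{SRC}bp/1", f"{SRC}bp#systolic", 150)}
graph = apply_rules(source_rdf, SOURCE_TO_HUB + HUB_TO_OUTPUT)

# The client queries in its own vocabulary; the adapter and source are untouched.
print([(s, o) for s, p, o in graph if p == f"{OUT}sbp"])  # [('http://example.org/labdb/bp/1', 150)]
```

Because the rules only add triples, the source-level and hub-level data remain available to other consumers.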
27 5. Named graphs
Different queries require different subsets of data
Entire dataset may be too big to process all at once
So...
Sets of RDF data can be bundled as named graphs
Query strategy can pull in only the named graphs that are needed, i.e., a working set
– Graphs can be freely merged
– Contents can overlap
28 Using named graphs for data subsets
Examples:
– Specific longitudinal data across patients
– Detailed data for each surgical event
– Data on a particular group of patients
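A named-graph dataset can be modeled as a map from graph names to sets of triples; a query's working set is then the union of just the graphs it needs. The graph names below loosely follow the three examples above and, like the triples, are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/"  # hypothetical namespace

# Dataset of named graphs: graph name -> triples. Contents may overlap between graphs.
DATASET: Dict[str, Set[Triple]] = {
    f"{EX}graph/longitudinal/bp": {
        (f"{EX}patient123", f"{EX}hasReading", f"{EX}reading1"),
        (f"{EX}patient456", f"{EX}hasReading", f"{EX}reading2"),
    },
    f"{EX}graph/surgery/event42": {
        (f"{EX}event42", f"{EX}patient", f"{EX}patient123"),
        (f"{EX}patient123", f"{EX}inGroup", f"{EX}groupA"),
    },
    f"{EX}graph/group/groupA": {
        (f"{EX}patient123", f"{EX}inGroup", f"{EX}groupA"),
    },
}

def working_set(dataset: Dict[str, Set[Triple]], names: Iterable[str]) -> Set[Triple]:
    """Merge only the named graphs a query needs; overlapping triples appear once."""
    merged: Set[Triple] = set()
    for name in names:
        merged |= dataset[name]
    return merged

ws = working_set(DATASET, [f"{EX}graph/surgery/event42", f"{EX}graph/group/groupA"])
print(len(ws))  # 2: the shared inGroup triple appears once; longitudinal data is not loaded
```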
29 6. Monotonicity
Monotonicity: Old conclusions remain true when new facts are added
System design choice
– Not automatic
Without monotonicity:
– Data change invalidates everything downstream
– System is more tightly coupled
– Different components must be versioned in lock step
With monotonicity:
– New data can be added freely
– Easier versioning
– More robust
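The definition can be checked mechanically: a design is monotonic when every conclusion drawn from some facts is still drawn after more facts are added. In this Python sketch the first rule (any high reading implies hadHighBP) has that property, while the second, closed-world rule (no high reading seen implies normalBP) does not. The 140 mmHg threshold, the patient URIs and both rules are assumptions for illustration.

```python
EX = "http://example.org/"  # hypothetical namespace
THRESHOLD = 140             # assumed systolic threshold, for illustration only

def monotonic_rule(facts):
    """Any high systolic reading implies hadHighBP: more facts can only add conclusions."""
    return {(s, f"{EX}hadHighBP", True) for s, p, o in facts
            if p == f"{EX}systolic" and o >= THRESHOLD}

def nonmonotonic_rule(facts):
    """Closed-world rule: no high reading seen implies normalBP. New facts can retract it."""
    patients = {s for s, _, _ in facts}
    high = {s for s, _, _ in monotonic_rule(facts)}
    return {(s, f"{EX}normalBP", True) for s in patients - high}

old = {(f"{EX}patient123", f"{EX}systolic", 150), (f"{EX}patient456", f"{EX}systolic", 120)}
new = old | {(f"{EX}patient456", f"{EX}systolic", 160)}

print(monotonic_rule(old) <= monotonic_rule(new))        # True: old conclusions survive
print(nonmonotonic_rule(old) <= nonmonotonic_rule(new))  # False: "patient456 normalBP" is lost
```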
30 Monotonicity is valuable, but not free!
Data models can be simpler without monotonicity
– Engineering trade-off
Non-monotonic design:
– “Patient123 highBloodPressure true”
Monotonic design:
– “Patient123 highBloodPressure true at 12:22PM 23-Aug-2007”
– “Patient123 highBloodPressure false at 04:05PM 24-Aug-2007”
How to get the best of both worlds?
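The two designs on this slide, sketched in Python: the non-monotonic one overwrites a single value, while the monotonic one only appends time-qualified assertions, so an earlier assertion stays true and a "current" answer is computed at query time. The data layout and the `value_at` helper are hypothetical.

```python
from datetime import datetime

# Non-monotonic design: one mutable value per patient; an update replaces the old fact.
nonmonotonic = {}
nonmonotonic["Patient123"] = True    # "Patient123 highBloodPressure true"
nonmonotonic["Patient123"] = False   # the earlier fact is simply gone

# Monotonic design: append-only, time-qualified assertions; nothing is retracted.
monotonic = [
    ("Patient123", "highBloodPressure", True, datetime(2007, 8, 23, 12, 22)),
    ("Patient123", "highBloodPressure", False, datetime(2007, 8, 24, 16, 5)),
]

def value_at(assertions, patient, when):
    """Latest highBloodPressure value asserted at or before `when`, or None."""
    relevant = [(t, v) for p, prop, v, t in assertions
                if p == patient and prop == "highBloodPressure" and t <= when]
    return max(relevant)[1] if relevant else None

print(value_at(monotonic, "Patient123", datetime(2007, 8, 23, 23, 0)))  # True
print(value_at(monotonic, "Patient123", datetime(2007, 8, 25)))         # False
```

The cost is visible in `value_at`: simple questions now need a time-aware query, which is the trade-off the slide names.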
31 Distilling data to simplify queries
Detailed raw data can be distilled into simpler assertion sets
– Easier for specific queries
Example raw data:
– “Patient123 BP: 150/96 at 12:22PM 23-Aug-2007”
– “Patient123 BP: 155/97 at 06:32PM 23-Aug-2007”
Distilled for “23-Aug-2007”:
– “Patient123 highBloodPressure true”
Meaning: “Patient123 had high blood pressure at some time on 23-Aug-2007”
32 Using named graphs for distilled data
Distilled data:
– Easier for specific queries
– Less general than raw data
– May involve information loss
Named graph can act as context
– Semantics are qualified (or loosened)
– E.g., named graph for 23-Aug-2007 indicates “Patient123 had high blood pressure at some time”
SPARQL update language (SPARUL) will make named graphs easy to create from queries
Raw data should also be kept (in separate named graphs)
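Distillation into a named graph might look like this sketch: raw readings stay in their own graph, and a per-date graph holds the simpler assertion, with the graph name supplying the "at some time on that day" context. The thresholds (systolic ≥ 140 or diastolic ≥ 90) and the graph names are assumptions, not from the slides; with a store that supports SPARQL Update, the same step would be an INSERT ... WHERE into a GRAPH.

```python
from datetime import date, datetime

HIGH_SYSTOLIC, HIGH_DIASTOLIC = 140, 90  # assumed thresholds, for illustration only

# Named graphs (hypothetical names): raw data stays in its own graph.
dataset = {
    "raw/bp": {
        ("Patient123", "bp", (150, 96, datetime(2007, 8, 23, 12, 22))),
        ("Patient123", "bp", (155, 97, datetime(2007, 8, 23, 18, 32))),
    }
}

def distill_high_bp(raw, day):
    """Assert 'highBloodPressure true' for patients with any high reading on `day`."""
    return {
        (patient, "highBloodPressure", True)
        for patient, _, (systolic, diastolic, when) in raw
        if when.date() == day and (systolic >= HIGH_SYSTOLIC or diastolic >= HIGH_DIASTOLIC)
    }

day = date(2007, 8, 23)
# The graph name is the context: inside it, the triple means "had high BP at some time that day".
dataset[f"distilled/highBP/{day.isoformat()}"] = distill_high_bp(dataset["raw/bp"], day)
print(sorted(dataset))  # ['distilled/highBP/2007-08-23', 'raw/bp']
```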
33 Adding named graphs for distilled data
Diagram: Raw data plus named graphs of distilled data: “Is obese”, “Had high blood pressure prior to admission”, “Has condition X”
34 Abandoning unneeded named graphs
Unneeded named graphs can be ignored
– And eventually discarded
Diagram: Raw data plus named graphs of distilled data
35 Summary of monotonicity strategy
Don't change data!
– Create new named graphs instead
– Use named graphs to compartmentalize data
But if you must change data:
– Use named graphs to limit downstream impact
– Regenerate only the named graphs that are affected
Retain both raw data and distilled data (in separate named graphs)
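Regenerating only the affected named graphs requires knowing which graphs were derived from which. Assuming that dependency map is recorded (the graph names below are hypothetical), a breadth-first walk over the inverted map finds every derived graph, direct or indirect, that needs regenerating when one graph changes; everything else is left alone.

```python
from collections import deque

# Hypothetical dependency map: derived named graph -> named graphs it was computed from.
DEPENDS_ON = {
    "distilled/highBP/2007-08-23": {"raw/bp/2007-08-23"},
    "distilled/highBP/2007-08-24": {"raw/bp/2007-08-24"},
    "distilled/hadHighBPPriorToAdmission": {"distilled/highBP/2007-08-23",
                                            "distilled/highBP/2007-08-24"},
    "distilled/isObese": {"raw/weight"},
}

def affected_graphs(changed, depends_on):
    """Return every derived graph that directly or transitively depends on `changed`."""
    # Invert the map: source graph -> graphs derived from it.
    dependents = {}
    for derived, sources in depends_on.items():
        for source in sources:
            dependents.setdefault(source, set()).add(derived)
    affected, queue = set(), deque([changed])
    while queue:
        for derived in dependents.get(queue.popleft(), ()):
            if derived not in affected:
                affected.add(derived)
                queue.append(derived)
    return affected

print(sorted(affected_graphs("raw/bp/2007-08-24", DEPENDS_ON)))
# ['distilled/hadHighBPPriorToAdmission', 'distilled/highBP/2007-08-24']
```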
36 Summary of architectural strategies
1. RDF message semantics
2. GRDDL transformations from XML to RDF
3. REST-based SPARQL endpoints
4. Semantic Data Federation
5. Named graphs
6. Monotonicity