High-level Data Access Based on Query Rewritings Ekaterina Stepalina Higher School of Economics
High-Level Data Access: concentration on application domain tasks; abstraction from data sources; efficient work. Research: this problem is actively discussed at recent scientific conferences on knowledge representation and ontologies – OWLED (2009), (ICDE IIMAS, 2008) – and in the Semantic Web journal (2011, the Mastro system). W3C developed OWL 2 and OWL 2 QL (2008), among others.
Ontology-Based Data Access (OBDA): large amounts of data (distributed, inconsistent). Main task: query answering (domain-oriented and efficient).
What is an Ontology? An ontology is a description of a knowledge domain in some knowledge representation language. Entity-Relationship and UML class diagrams can be seen as ontology languages.
Logic-Based Knowledge Representation: enables semantic processing of data; enables inference of implicit knowledge; well studied and actively developed – description logics (Baader, 1999), esp. DL-Lite; standardized – OWL 2 profiles.
DL-Lite Best Suits OBDA: highly expressive yet computationally efficient; allows delegating query answering to DBMSs and using all the advantages of modern relational technologies; supported by the W3C standard OWL 2 QL.
Query Answering Problem: given a knowledge base $K = (T, A)$, a query $q(\vec{x})$ and an n-tuple $\vec{a}$ of objects from $A$, decide whether $K \models q(\vec{a})$, i.e. whether the n-tuple is an answer to $q$ with respect to $K$. For knowledge represented in DL-Lite, we can formulate queries in terms of domain concepts, translate them into ordinary SQL queries, and execute them over separate databases.
OBDA System Architecture: Ontology Editor, OBDA-Enabled Reasoner, Mapping Processor, Data Source Manager, Consistency Checker.
Query Rewritings: the OBDA-Enabled Reasoner rewrites the initial ontology query into a UCQ (union of conjunctive queries). The Mapping Processor builds an SQL query from the UCQ and the given mappings. The syntax of the initial query may vary (SPARQL, Datalog, etc.).
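The mapping step described on this slide can be sketched as follows. This is a minimal illustration, not the actual Mapping Processor: the `Atom` class, the `MAPPINGS` dictionary, and the table and column names are all hypothetical, and each ontology predicate is assumed to map to exactly one table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    predicate: str  # ontology concept or role name
    args: tuple     # variable names, e.g. ("x",) or ("x", "y")


# Hypothetical mappings: ontology predicate -> (table, columns in argument order).
MAPPINGS = {
    "Professor": ("professors", ("id",)),
    "Lecturer": ("lecturers", ("id",)),
    "teaches": ("courses", ("teacher_id", "course_id")),
}


def cq_to_sql(head_vars, atoms, mappings):
    """Translate one conjunctive query into a SELECT statement."""
    from_items, conditions, first_seen = [], [], {}
    for i, atom in enumerate(atoms):
        table, columns = mappings[atom.predicate]
        alias = f"t{i}"
        from_items.append(f"{table} {alias}")
        for var, col in zip(atom.args, columns):
            ref = f"{alias}.{col}"
            if var in first_seen:
                # A repeated variable becomes an equality (join) condition.
                conditions.append(f"{first_seen[var]} = {ref}")
            else:
                first_seen[var] = ref
    select = ", ".join(f"{first_seen[v]} AS {v}" for v in head_vars)
    sql = f"SELECT {select} FROM {', '.join(from_items)}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


def ucq_to_sql(head_vars, ucq, mappings):
    """A UCQ is a list of conjunctive queries; their SQL forms are UNIONed."""
    return " UNION ".join(cq_to_sql(head_vars, cq, mappings) for cq in ucq)


# q(x) <- Professor(x)   UNION   q(x) <- teaches(x, y)
ucq = [[Atom("Professor", ("x",))], [Atom("teaches", ("x", "y"))]]
sql = ucq_to_sql(("x",), ucq, MAPPINGS)
print(sql)
```

Each conjunctive query becomes one SELECT with join conditions for shared variables, and the disjuncts of the UCQ are combined with UNION, so the whole rewriting can be handed to the DBMS as a single statement.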
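TBox and ABox in DL: a TBox is a finite set of concept and role inclusion axioms, $C_1 \sqsubseteq C_2$ and $R_1 \sqsubseteq R_2$. An ABox is a finite set of assertions, $A(a_i)$ and $P(a_i, a_j)$, where $a_i$, $a_j$ are object names, $A$ is a concept name, $P$ is a role name, and $q$ is a positive integer (used in number restrictions $\geq q\,R$).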
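Interpretation: an interpretation $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$ (a particular instance of the KB) is a pair of a non-empty domain $\Delta^{\mathcal{I}}$ and an interpretation function $\cdot^{\mathcal{I}}$: $a_i^{\mathcal{I}} \in \Delta^{\mathcal{I}}$, $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, and $P^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$. UNA (unique name assumption): $a_i^{\mathcal{I}} \neq a_j^{\mathcal{I}}$ for all $i \neq j$.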
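A finite interpretation can be written down directly as sets, which makes the definitions easy to check by hand. The sketch below is only an illustration under assumed data: the object names, concepts, and roles are invented, and only atomic concept inclusions are checked.

```python
# A small, hypothetical finite interpretation.
domain = {1, 2, 3}
names = {"ann": 1, "bob": 2}                      # object names -> domain elements
concepts = {"Manager": {1}, "Employee": {1, 2}}   # concept names -> subsets of the domain
roles = {"manages": {(1, 2)}}                     # role names -> sets of domain pairs


def is_interpretation(domain, names, concepts, roles):
    """Check that names, concepts and roles are interpreted inside a non-empty domain."""
    return (
        len(domain) > 0
        and all(d in domain for d in names.values())
        and all(ext <= domain for ext in concepts.values())
        and all(x in domain and y in domain for ext in roles.values() for x, y in ext)
    )


def satisfies_una(names):
    """Unique name assumption: distinct object names denote distinct elements."""
    return len(set(names.values())) == len(names)


def satisfies_inclusion(concepts, sub, sup):
    """The interpretation satisfies 'sub is included in sup' if the extensions are nested."""
    return concepts[sub] <= concepts[sup]


def satisfies_concept_assertion(names, concepts, concept, obj):
    return names[obj] in concepts[concept]


def satisfies_role_assertion(names, roles, role, subj, obj):
    return (names[subj], names[obj]) in roles[role]


print(is_interpretation(domain, names, concepts, roles))
print(satisfies_inclusion(concepts, "Manager", "Employee"))
```

An interpretation that satisfies every TBox axiom and every ABox assertion in this sense is a model of the knowledge base.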
OWL 2 QL: the UNA is dropped; (in)equality must be stated explicitly. Expressive power is reduced to $DL\text{-}Lite_{core}^{\mathcal{H}}$ (also denoted $DL\text{-}Lite_{\mathcal{R}}$). Basic conceptual modeling relations are available: (A)sym, (Ir)Ref, Tran. Main constraints of $DL\text{-}Lite_{core}^{\mathcal{H}}$: – functional relations cannot be defined – a role cannot be restricted to specific concepts; every role applies to all concepts – disjunctive covering of the knowledge domain cannot be defined.
Query Rewriting Sample. RDB tables: Person(name, age), Lives(person, city), Manages(boss, employee). Query: get the names and ages of all people living in the same city as their boss. UCQ: Simplified UCQ: SQL query: SELECT P.name, P.age FROM Person P, Manages M, Lives L1, Lives L2 WHERE P.name=L1.person AND P.name=M.employee AND M.boss=L2.person AND L1.city=L2.city
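The SQL query from this slide can be tried out on an in-memory SQLite database. The rows below are invented for illustration only. Read off the SQL, the query corresponds to a conjunctive query of the form q(n, a) <- Person(n, a), Manages(b, n), Lives(n, c), Lives(b, c); that form is an assumption, since the slide's own UCQ formulas are not available here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (name TEXT, age INTEGER);
CREATE TABLE Lives (person TEXT, city TEXT);
CREATE TABLE Manages (boss TEXT, employee TEXT);
-- Invented sample data
INSERT INTO Person VALUES ('ann', 30), ('bob', 45), ('carl', 28);
INSERT INTO Lives VALUES ('ann', 'Moscow'), ('bob', 'Moscow'), ('carl', 'Perm');
INSERT INTO Manages VALUES ('bob', 'ann'), ('bob', 'carl');
""")

# The SQL query exactly as given on the slide.
QUERY = """
SELECT P.name, P.age
FROM Person P, Manages M, Lives L1, Lives L2
WHERE P.name = L1.person
  AND P.name = M.employee
  AND M.boss = L2.person
  AND L1.city = L2.city
"""

rows = conn.execute(QUERY).fetchall()
print(rows)  # only ann shares a city with her boss in this sample data
```

Here ann and her boss bob both live in Moscow, carl lives in a different city from bob, and bob has no boss, so only ann is returned.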
Query Rewriting Algorithms. CGLLR (Calvanese et al., 2007): – applies all applicable TBox axioms to the atoms of the query – replaces each axiom containing qualified existential quantification with three other axioms, which increases the number of conjunctive queries in the UCQ. RQR (Pérez-Urbina, Horrocks, Motik, 2009): – generates clauses from the TBox axioms and then resolves the clauses with the query – potentially supports more expressive DLs.
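The two CGLLR steps can be illustrated with a heavily simplified sketch. It is not the CGLLR algorithm itself: `rewrite` handles only atomic concept inclusions of the form A ⊑ B (no roles, no atom unification), and the concept names are hypothetical. `encode_qualified_existential` shows the standard replacement of A ⊑ ∃R.B by three axioms over a fresh auxiliary role, written here as plain strings.

```python
def rewrite(cq, inclusions):
    """Expand a conjunctive query into a UCQ using atomic concept inclusions.

    cq: frozenset of (predicate, args) atoms.
    inclusions: list of (sub, sup) pairs meaning 'sub is included in sup'.
    """
    result = {cq}
    frontier = [cq]
    while frontier:
        current = frontier.pop()
        for atom in current:
            pred, args = atom
            for sub, sup in inclusions:
                # An axiom sub ⊑ sup applies to a unary atom sup(x): replace it by sub(x).
                if sup == pred and len(args) == 1:
                    new_cq = (current - {atom}) | {(sub, args)}
                    if new_cq not in result:
                        result.add(new_cq)
                        frontier.append(new_cq)
    return result


def encode_qualified_existential(a, r, b, fresh_role):
    """Replace A ⊑ ∃R.B by: A ⊑ ∃P, ∃P^- ⊑ B, P ⊑ R, for a fresh role P."""
    return [
        (a, f"exists {fresh_role}"),
        (f"exists {fresh_role}^-", b),
        (fresh_role, r),
    ]


# Hypothetical TBox: FullProfessor ⊑ Professor ⊑ Teacher.
inclusions = [("Professor", "Teacher"), ("FullProfessor", "Professor")]
query = frozenset({("Teacher", ("x",))})
ucq = rewrite(query, inclusions)
for cq in sorted(ucq, key=sorted):
    print(sorted(cq))
```

Because every extra axiom is another candidate rewriting step, replacing one axiom by three tends to enlarge the resulting UCQ, which is the effect the slide attributes to CGLLR.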
Query Rewriting Benchmark: 9 ontologies with axioms containing existential quantification: – Vicodi (V) – Stock exchange (S) – University (U, UX) – Adolena (A, AX) – Synthetic (P1, P5, P5X)
Comparison Results: RQR is preferable to CGLLR for implementation in OBDA-enabled reasoners: – it generates fewer conjunctive queries in the UCQ, especially for ontologies with many existential quantifications – it may be further optimized and extended to DLs more expressive than DL-Lite
Running Time, ms
Current Work: preparing an ontology for a real application, an interactive television platform (IPTV), to test the algorithms on real data; optimizing RQR by reducing the number of generated clauses. Main idea: rather than extending RQR itself, support greater expressiveness and all OWL 2 QL constructs through powerful mappings.
References
F. Baader, D. Calvanese, D. McGuinness, D. Nardi, P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge University Press.
F. Baader. Logic-Based Knowledge Representation. In M. J. Wooldridge and M. Veloso, editors, Artificial Intelligence Today: Recent Trends and Developments, number 1600 in Lecture Notes in Computer Science, pages 13–41. Springer Verlag, 1999.
A. Artale, D. Calvanese, R. Kontchakov, and M. Zakharyaschev. The DL-Lite Family and Relations. Journal of Artificial Intelligence Research 36 (1), 2009.
H. Pérez-Urbina, I. Horrocks, and B. Motik. Efficient Query Answering for OWL 2. In Proceedings of the 8th International Semantic Web Conference (ISWC 2009), Chantilly, Virginia, USA, 2009.
H. Pérez-Urbina, B. Motik, and I. Horrocks. Tractable Query Answering and Rewriting under Description Logic Constraints. Journal of Applied Logic, 2009.
High-level Data Access Based on Query Rewritings Questions?