Presentation is loading. Please wait.

Presentation is loading. Please wait.

Unifying Data and Domain Knowledge Using Virtual Views IBM T.J. Watson Research Center Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007 2008. 1. 4 Summarized.

Similar presentations


Presentation on theme: "Unifying Data and Domain Knowledge Using Virtual Views IBM T.J. Watson Research Center Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007 2008. 1. 4 Summarized."— Presentation transcript:

1 Unifying Data and Domain Knowledge Using Virtual Views IBM T.J. Watson Research Center Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007 2008. 1. 4 Summarized by Gong GI Hyun, IDS Lab., Seoul National University

2 Copyright  2008 by CEBT Background  DBMS originally designed for transaction data.  Current DBMSs are not ready to manipulate data in connection with knowledge.  An unending quest Database or Knowledge-base? New applications: the Semantic web, etc.  New extensions are required to bridge the gap between data representation and knowledge representation/inferencing IDS Lab. Seminar - 2Center for E-Business Technology

3 Copyright  2008 by CEBT A Motivating Example  RDBMS allows us to query wines through attributes ID, Type, Origin, Maker, Price.  Human intelligence operates in a quite different way. Humans have ability to combine data with the domain Knowledge. IDS Lab. Seminar - 3Center for E-Business Technology

4 Copyright  2008 by CEBT A Motivating Example  Query 1 To find wines that originate from the US. RDB Case : – Select W.ID From Wine as W Where W.Origin = ‘US’; – Result : No result Human’s answer : Zinfandel – Zinfandel’s Origin EdnaValley is located in California. IDS Lab. Seminar - 4Center for E-Business Technology

5 Copyright  2008 by CEBT A Motivating Example  Query 2 Which wine is a red wine? RDB Case : – Select W.ID From Wine as W Where W.hasColor = ‘red’; – Result : ERROR! – HasColor is not in the schema of the wine table. Human’s Answer : Zinfandel – Zinfandel is red.  Both the user and the DBMS must know what HasColor stands for when it appears in a query, and how to derive the value for HasColor for any given wine. IDS Lab. Seminar - 5Center for E-Business Technology

6 Copyright  2008 by CEBT Domain Knowledge from OWL Ontology  Wine Ontology from the web ontology language OWL (W3C)  Extract class hierarchies, (transitive) properties, implications from OWL IDS Lab. Seminar - 6Center for E-Business Technology

7 Copyright  2008 by CEBT Challenges  How to incorporate domain knowledge (ontology) into a RDBMS? Relational Data model remains ill-suited for semi-structured data.  How to integrate relational data with domain knowledge?  How to query relational data with meaning ?  How to process such queries ? IDS Lab. Seminar - 7Center for E-Business Technology

8 Copyright  2008 by CEBT Overview of our solution  Create a relational virtual view on top of the data and the domain knowledge. Data and knowledge can be queried together. New knowledge can be derived  The virtual view is an interface through which users can query data, domain knowledge, and derived knowledge in a seamlessly unified manner.  Rewrite query on virtual view. IDS Lab. Seminar - 8Center for E-Business Technology

9 Copyright  2008 by CEBT The Virtuality of the View  Users create virtual views over the relational data and the ontology.  Virtual columns/attributes not in original data.  Virtual columns not materialized -- inferred from the ontology. IDS Lab. Seminar - 9Center for E-Business Technology

10 Copyright  2008 by CEBT Creating the Vitual View CREATE VIEW WineView(Id, Type, Origin, Maker, Price, LocatedIn) AS SELECT W.*, R.Regions FROM Wine AS W, RegionKnowledge AS R WHERE W.Origin = R.region IDS Lab. Seminar - 10Center for E-Business Technology

11 Copyright  2008 by CEBT Queries  Query 1 : To find wines that originate from the US Select ID from WineView Where ‘US’ in LocatedIn  Query 2 : Which wine is a red wine?  Select ID from WineView Where hasColor = ‘red’; IDS Lab. Seminar - 11Center for E-Business Technology

12 Copyright  2008 by CEBT Physical Storage Layer  The ontology is modeled as semi-structured data. Traditional RDMSs cannot handle directly.  Hybrid Relational-XML DBMS IBM DB2 9 PureXML supports XML IDS Lab. Seminar - 12Center for E-Business Technology

13 Copyright  2008 by CEBT Ontology Repository  Ontology repository extracts several types of information from the ontology files including class hierarchies, implication rules, transitive properties.  Class Hierarchies subClassOf isA IDS Lab. Seminar - 13Center for E-Business Technology  Transitive Property TransitiveProperty  Implication rules Implication graph

14 Copyright  2008 by CEBT Query Expanding  SELECT V.Id FROM WineView AS V WHERE.hasColor=White; (Type=WhiteWine) → (hasColor=white) (Type=Riesling) → (hasColor=white)  SELECT V.Id FROM Wine AS W WHERE W.type=WhiteWine OR W.type=Riesling; IDS Lab. Seminar - 14Center for E-Business Technology

15 Copyright  2008 by CEBT Experiment  Investigate time to rewrite the queries on virtual views.  Measurement: rewriting time averaged over 5 randomly generated data sets.  Some tweaks : Remove dead nodes Memoization techniques Pre-computation of predicate re-writing IDS Lab. Seminar - 15Center for E-Business Technology

16 Copyright  2008 by CEBT Experiment  Implication Graph Density  Size of transitive property trees IDS Lab. Seminar - 16Center for E-Business Technology

17 Copyright  2008 by CEBT Related Works  Ontology Tools OntoEdit : Use a file system to store ontology. RStar, KAON : Allow the ontology data to be stored in a RDB  Two Limitations of this loosely coupled approach DBMS users cannot reference ontology data directly. Ontology related query processing cannot leverage the query processing and optimization power of a DBMS.  A recent advance in ontology management in DBMSs was introduced by Oracle. IDS Lab. Seminar - 17Center for E-Business Technology

18 Copyright  2008 by CEBT Conclusion  Framework for putting a little semantics into relational SQL systems.  Users register ontologies in DBMS and links them with relational data by creating virtual views.  Virtual columns in the virtual views are not materialized.  Queries on the virtual columns are rewritten to predicates on base table columns.  Future work: performance issues IDS Lab. Seminar - 18Center for E-Business Technology


Download ppt "Unifying Data and Domain Knowledge Using Virtual Views IBM T.J. Watson Research Center Lipyeow Lim, Haixun Wang, Min Wang, VLDB2007 2008. 1. 4 Summarized."

Similar presentations


Ads by Google