Model-Based Mediation: Framework and Challenges Bertram Ludäscher Data and Knowledge Systems San Diego Supercomputer Center U.C. San.


1 Model-Based Mediation: Framework and Challenges Bertram Ludäscher LUDAESCH@SDSC.EDU Data and Knowledge Systems San Diego Supercomputer Center U.C. San Diego

2 2 Outline Information Integration from a DB Perspective Part I: XML-Based Mediation –wrapper/mediator approach –based on querying semistructured data & XML Part II: Model-Based Mediation –basic ideas & architecture, lifting data to knowledge sources –“glue maps” (domain maps, process maps) –formal framework: Description Logic, Frame-Logic –ongoing/future research: mix of DB & KR techniques Summary

3 An Online Shopper’s Information Integration Problem El Cheapo: “Where can I get the cheapest copy (including shipping cost) of Wittgenstein’s Tractatus Logico-Philosophicus within a week?” ? Information Integration addall.com “One-World” Mediation amazon.com A1books.com half.com barnes&noble.com WWW public library

4 A Home Buyer’s Information Integration Problem What houses for sale under $500k have at least 2 bathrooms, 2 bedrooms, a nearby school ranking in the upper third, in a neighborhood with below-average crime rate and diverse population? ? Information Integration Realtor Demographics School Rankings Crime Stats “Multiple-Worlds” Mediation

5 A Geoscientist’s Information Integration Problem What are the distribution and U/Pb zircon ages of A-type plutons in VA? How about their 3-D geometry? How does it relate to host rock structures? ? Information Integration Geologic Map (Virginia) GeoChemical GeoPhysical (gravity contours) GeoChronologic (Concordia) Foliation Map (structure DB) “Complex Multiple-Worlds” Mediation

6 A Neuroscientist’s Information Integration Problem What is the cerebellar distribution of rat proteins with more than 70% homology with human NCS-1? Any structure specificity? How about other rodents? ? Information Integration protein localization (NCMIR) neurotransmission (SENSELAB) sequence info (CaPROT) morphometry (SYNAPSE) “Complex Multiple-Worlds” Mediation

7 7 Information Integration from a DB Perspective Information Integration Challenge –Given: data sources S_1,..., S_k (DBMS, web sites,...) and user questions Q_1,...,Q_n that can be answered using the S_i –Find: the answers to Q_1,..., Q_n The Database Perspective: source = “database” => S_i has a schema (relational, XML, OO,...) => S_i can be queried => define virtual (or materialized) integrated views V over S_1,...,S_k using database query languages => questions become queries Q_i against V(S_1,...,S_k) Why a Database Perspective? –scalability, efficiency, reusability (declarative queries),...
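
The database perspective above can be sketched in a few lines of Python. This is only an illustration under assumptions: the two sources, their field names, and the prices are made up, and the "view" is a plain Python generator rather than a view in a real query language. It shows a virtual integrated view V over S_1 and S_2, and a user question turned into a query against V.

```python
from typing import Iterator

# Two hypothetical sources with different schemas (illustrative data only).
source_1 = [{"title": "Tractatus", "price": 12.0, "shipping": 3.0}]
source_2 = [{"name": "Tractatus", "total_cost": 14.5}]

def integrated_view() -> Iterator[dict]:
    """Virtual view V(S_1, S_2): maps both sources to one schema (title, total)."""
    for row in source_1:
        yield {"title": row["title"],
               "total": row["price"] + row["shipping"],
               "source": "S_1"}
    for row in source_2:
        yield {"title": row["name"],
               "total": row["total_cost"],
               "source": "S_2"}

def cheapest(title: str) -> dict:
    """A user question expressed as a query Q against V."""
    offers = [offer for offer in integrated_view() if offer["title"] == title]
    return min(offers, key=lambda offer: offer["total"])

print(cheapest("Tractatus"))
```

The view is never stored; it is recomputed from the sources each time the query runs, which corresponds to the virtual (on-demand) option rather than the materialized one.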

8 8 Technical Issues and Challenges Integration Method and Architecture –federated DBs, wrapper-mediator approach, GAV/LAV, warehouse/on-demand,... Suitable KRDB Formalisms and Frameworks –XML, DTDs/XML Schema, XPath, XQuery,... –RDF(S), Ontologies, Description Logics, DAML+OIL,... –querying, deduction, subsumption, classification,... Algorithms and Implementation –query composition, rewriting, reasoning, source capabilities,... Information Integration Scenario and Scope –simple/complex, single/multiple worlds,...

9 9 DB mediation techniques Ontologies KR formalisms Model-Based Mediation Information Integration Landscape conceptual distance one-world multiple-worlds conceptual complexity/depth low high addall book-buyer BLAST EcoCyc Cyc WordNet GO home-buyer 24x7 consumer UMLS MIA Entrez RiboWeb Tambis Bioinformatics Geoinformatics

10 10 PART I: XML-Based Mediation

11 11 Abstract (XML-Based) Mediator Architecture S_1 MEDIATOR XML Queries & Results USER/Client Wrapper XML View S_2 Wrapper XML View S_k Wrapper XML View Integrated XML View V Integrated View Definition IVD(S_1,...,S_k) Query Q o V (S_1,...,S_k)

12 12 XMAS: XML Matching And Structuring language Integrated View Definition: “Find publications from amazon.com and DBLP, join on author, group by authors and title” CONSTRUCT $a1 $t $p { $p } { $a1, $t } WHERE $a1 : $t : IN WRAP(“amazon.com”) AND $a2 : $p : IN WRAP(“www...DBLP…”) AND value( $a1 ) = value( $a2 ) XMAS XMAS Algebra

13 13 Some Technical Challenges... Uniform Data Model: Semistructured Databases –flexible mix of data and schema –labeled directed graph/tree, ordered/unordered, ranked/unranked –XML = labeled ordered trees Query Languages –DB community: QLs for semistructured data, e.g., TSIMMIS/MSL, Lorel, Yatl,..., Florid/F-logic [InfSystems98] –CSE/SDSC: XMAS [SSD99,WebDB99,EDBT00] –W3C: XPath, XSLT, XQuery (Working Draft, June 2001) DB Theory: Expressiveness/Complexity Trade-Off –querying: FO, (WF/S-)Datalog, FO(LFP), FO(PFP),..., all –reasoning: query satisfiability, containment, equivalence

14 14... Some More Technical Challenges... DB Practice: Query Composition –compute Q o V(S_1,...,S_k) w/o computing all of V => “push Q through V into S_i” –in Datalog: view unfolding (resolution, unification) + simplification ~ top-down evaluation ~ magic sets –in XML: some solutions (Papakonstantinou,...) Navigation-Driven Evaluation of Integrated View V: –V materialized => warehousing approach –V virtual => mediator approach –V virtual & driven by user-navigation => VXD approach [EDBT00] (w/ Papakonstantinou, Velikhov)
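
A rough Python sketch of the "push Q through V into S_i" idea follows. It is a toy under stated assumptions: the `Source` class, its `select` method, and the sample rows are hypothetical, V is just a union of the sources, and Q is a single selection. Real composition works on query plans, not Python lambdas.

```python
class Source:
    """Hypothetical source S_i that can evaluate a selection locally."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_shipped = 0  # number of rows sent to the mediator

    def select(self, predicate):
        result = [row for row in self.rows if predicate(row)]
        self.rows_shipped += len(result)
        return result

def answer_naive(sources, author):
    # Compute all of V (a plain union here), then apply Q at the mediator.
    view = [row for s in sources for row in s.select(lambda row: True)]
    return [row for row in view if row["author"] == author]

def answer_composed(sources, author):
    # Q o V: the selection is pushed through the union into each S_i.
    return [row for s in sources
            for row in s.select(lambda row: row["author"] == author)]

def make_sources():
    return [
        Source([{"author": "Wittgenstein", "title": "Tractatus"},
                {"author": "Russell", "title": "The Problems of Philosophy"}]),
        Source([{"author": "Wittgenstein", "title": "Philosophical Investigations"}]),
    ]

naive_sources, composed_sources = make_sources(), make_sources()
naive = answer_naive(naive_sources, "Wittgenstein")
composed = answer_composed(composed_sources, "Wittgenstein")
```

Both strategies return the same answer, but the composed one ships only the matching rows from each source instead of every row of V.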

15 15 XML (XMAS) Query Processing Translator Rewriter/Optimizer composed plan optimized plan XMAS Query Q Composition (Q o V) XMAS View Definition V algebraic plans Plan Execution Compile-time Run-time: lazy VXD evaluation

16 16 A Concrete (Future) XML-Based Mediator System S1 S2 S3 XML (Integrated View) MEDIATOR Engine XQuery Processor Integrated View Definition IVD XML Queries & Results XQuery XPATH XQuery XSLT XQuery XSQL USER/Client XML-Wrapper XQuery XScan XPath SQL XSQL http-get XSLT XML-Wrapper First Results & Demos: XMAS language and algebra, VXD evaluation, BBQ UI, [WebDB99] [SSD99] [SIGMOD99] [EDBT00] (w/ Papakonstantinou, Vianu,...)

17 17 Open Issue: Querying XML Streams or: From Pull to Push Given: –stream S of XML events (open, close, data) –XML query Q over S –constraints: 1-pass “on-the-fly” processing, bounded memory Find: –decide whether, and if so how, Q can be evaluated given the constraints Initial Approach: –transducer model XSM (XML Stream Machine) to approximate “streamable” queries –tree transducers, tree-walking automata!? (w/ Papakonstantinou, Mukhopadhyay, Vianu)

18 18 Example: XML Stream Query XML query (r) = for each customer $C, list all orders $O Query-aware DTD design is even more important for stream queries!

19 19 Example: XML Stream Machine (XSM) input/output: stream of XML events memory: finite state control, buffers, transitions: on EVENT do ACTION transducer model
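
To make the one-pass setting concrete, here is a small Python sketch of "on EVENT do ACTION" processing for the customer/order query. It is not the XSM model itself: the event encoding as `(kind, value)` tuples, the element names, and the sample stream are assumptions made for illustration.

```python
def stream_orders(events):
    """Single pass over (kind, value) events.

    State kept: the stack of open elements and the current customer name,
    so memory grows with nesting depth, not with stream length.
    """
    path = []
    customer = None
    for kind, value in events:
        if kind == "open":
            path.append(value)
        elif kind == "close":
            if value == "customer":
                customer = None  # leaving the customer element
            path.pop()
        elif kind == "data":
            if path[-2:] == ["customer", "name"]:
                customer = value
            elif path[-2:] == ["customer", "order"]:
                yield (customer, value)  # emit as soon as the order is seen

# Hypothetical stream: each customer's name precedes its orders.
events = [
    ("open", "customers"),
    ("open", "customer"),
    ("open", "name"), ("data", "Ann"), ("close", "name"),
    ("open", "order"), ("data", "o1"), ("close", "order"),
    ("open", "order"), ("data", "o2"), ("close", "order"),
    ("close", "customer"),
    ("open", "customer"),
    ("open", "name"), ("data", "Bob"), ("close", "name"),
    ("open", "order"), ("data", "o3"), ("close", "order"),
    ("close", "customer"),
    ("close", "customers"),
]

print(list(stream_orders(events)))
```

The sketch only works without buffering because the name arrives before the orders. If the schema placed orders first, every order would have to be held until the name appeared, which is one way to read the remark that query-aware DTD design matters for stream queries.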

20 20 PART II: Model-Based Mediation

21 21 What’s the Problem with pure XML? XML is Syntax –canonical syntax for labeled ordered trees –a metalanguage, but all semantics lies outside of XML DTDs => tag names + element nesting XML Schema => DTDs + some data modeling Need anything else? => write comments!

22 22 Having Schemas & Semantics wasn’t that bad after all... Query: “What’s the average price of books in MyDB?” A Query Plan in PMX-Query: N=count (MyDB//book); S=sum(MyDB//book/price); Avg=S/N. Consider Structural Constraints: –can a book have multiple prices? –the schema will tell you! –quick fix!? N=count (MyDB//book/price)
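
The difference between the two counts can be checked on a tiny instance. The sketch below uses Python's standard `xml.etree.ElementTree` rather than the query notation on the slide, and the document content is invented: one book carries two prices.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance in which the first book has two price elements.
my_db = ET.fromstring("""
<MyDB>
  <book><title>A</title><price>10</price><price>30</price></book>
  <book><title>B</title><price>20</price></book>
</MyDB>
""")

prices = [float(p.text) for p in my_db.findall(".//book/price")]
total = sum(prices)                                     # S = sum(MyDB//book/price)

avg_per_book = total / len(my_db.findall(".//book"))    # N = count(MyDB//book)
avg_per_price = total / len(prices)                     # N = count(MyDB//book/price)

print(avg_per_book)   # 30.0
print(avg_per_price)  # 20.0
```

Which of the two numbers is "the average price of books" depends on what multiple prices mean, and that is exactly the information a schema or its documented semantics would have to supply.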

23 23 Having Schemas & Semantics wasn’t that bad after all... “What’s the average price of books in MyDB?” Consider Structural/Semantic Constraints: –XML Schema schema: –XML query processor has to be aware of the subclass relationship encoded in the XML Schema schema! –if “aw2” has subelement types that are subtypes of the book subelement types, the XML instance may leave no clues what we’re dealing with! Modified Query Plan: N=count (MyDB//(book | aw2)/price); S=sum(MyDB//(book | aw2)/price); Avg=S/N.

24 24 What’s the Problem with XML & Complex Multiple-Worlds? XML is Syntax –DTDs talk about element nesting –XML Schema schemas give you data types –need anything else? => write comments! Domain Semantics is complex: –implicit assumptions, hidden semantics => sources seem unrelated to the non-expert Need Structure and Semantics beyond XML trees! => employ richer OO models => make domain semantics and “glue knowledge” explicit => use ontologies to fix terminology and conceptualization => avoid ambiguities by using formal semantics

25 25 From XML-Based to Model-Based Mediation Data and Knowledge Sharing Potential: Database Mediation + Knowledge Representation ________________________ = Model-Based Mediation Basic Ideas: –turn primary data sources into knowledge sources –employ secondary glue knowledge sources generic: UMLS,... specific: community/laboratory ontologies

26 XML-Based vs. Model-Based Mediation Raw Data IF ... THEN ... Logical Domain Constraints Integrated-CM := CM-QL(Src1-CM,...) (XML) Objects Conceptual Models XML Elements XML Models C2 C3 C1 R Classes, Relations, is-a, has-a,... Glue Maps DMs, PMs Integrated-DTD := XML-QL(Src1-DTD,...) No Domain Constraints A = (B*|C),D B =... Structural Constraints (DTDs), Parent, Child, Sibling,... CM ~ {Descr.Logic, ER, UML, RDF/XML(-Schema), …} CM-QL ~ {F-Logic, DAML+OIL, …}

27 27 Information Integration Landscape Conceptual Distance (“number of hops”) –... speciality... (sub-)discipline... interdisciplinary concepts... –... one (micro) world... multiple worlds... Conceptual Complexity –complexity of interactions between relations, concepts, rules Level of Integration –“Let's put links to all our data on a web page!” –portals to primary (databases) and secondary information sources (literature): NCBI,... –specialized web services: (meta-)BLAST,... –integration services: MIA, Entrez,...

28 28 What’s the Glue? What’s in a Link? Syntactic Joins – (X,Y) := X.SSN = Y.SSN (equality) – (X,Y) := X.UMLS-ID = Y.UID “Speciality” Joins – (X,Y,Score) := BLAST(X,Y,Score) (similarity) Semantic/Rule-Based Joins – (X,Y,C) := X isa C, Y isa C, BLAST(X,Y,S), S>0.8 (homology, lub) – (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based), e.g., X = …-secretase, B = beta amyloid, Y = Alzheimer’s disease YAC (Yet Another Challenge): –compile semantic joins into efficient syntactic ones
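
The rule-based join on this slide can be read as an ordinary join over two binary relations that also records why X and Y are linked. The Python sketch below assumes that reading. The relation contents are illustrative, the function name `rule_based_join` is made up, and the enzyme is written simply as "secretase" because its full name is not legible in the transcript.

```python
# Hypothetical facts exported by two different sources.
produces = {("secretase", "beta amyloid")}
increased_in = {("beta amyloid", "Alzheimer's disease")}

def rule_based_join(produces, increased_in):
    """link(X, Y, [produces, B, increased_in]) := X produces B, B increased_in Y."""
    return [
        (x, y, ["produces", b, "increased_in"])
        for (x, b) in produces
        for (b2, y) in increased_in
        if b == b2  # the shared intermediate B is the join condition
    ]

links = rule_based_join(produces, increased_in)
print(links)
```

Once the rule is written this way, the semantic join is evaluated as an equality join on B, which hints at what compiling semantic joins into syntactic ones could mean in the simplest case.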

29 29 Model-Based Mediation Methodology... Lift Sources to export CMs: CM(S) = OM(S) + KB(S) + CON(S) Object Model OM(S): –complex objects (frames), class hierarchy, OO constraints Knowledge Base KB(S): –explicit representation of (“hidden”) source semantics –logic rules over OM(S) Contextualization CON(S): –situate OM(S) data using “glue maps” (GMs): domain maps DMs (ontology) = terminological knowledge: concepts + roles; process maps PMs = “procedural knowledge”: states + transitions

30 30... Model-Based Mediation Methodology Integrated View Definition (IVD) –declarative (logic) rules with object-oriented features –defined over CM(S), domain maps, process maps –needs “mediation engineers” = domain + KRDB experts Knowledge-Based Querying and Browsing (runtime): –mediator composes the user query Q with the IVD... rewrites (Q o IVD), sends subqueries to sources... post-processes returned results (e.g., situate in context)

31 31 Model-Based Mediator Architecture S1 S2 S3 (XML-Wrapper) CM-Wrapper USER/Client CM (Integrated View) Mediator Engine FL rule proc. LP rule proc. Graph proc. XSB Engine CM(S) = OM(S)+KB(S)+CON(S) GCM CM S1 GCM CM S2 GCM CM S3 CM Queries & Results (exchanged in XML) Domain Maps DMs Process Maps PMs “Glue” Maps GMs semantic context CON(S) Integrated View Definition IVD First results & Demos: KIND prototype, formal DM semantics, PMs [SSDBM00] [VLDB00] [ICDE01] [NIH-HB01] (w/ Gupta, Martone)

32 32 Domain Maps (Ontologies) as Glue Knowledge Sources Domain Map = Ontology –representation of terminological knowledge Use in Model-Based Mediation –(derived) concepts as “drop points”, “anchor points”, “context” for source classes –compile-time use: view definition, subsumption, classification,... –runtime use: querying/deduction, path queries,.... Formalisms: –Semantic nets, Thesauri, Frame-logic, Description logics,...

33 33 Ontologies So what is an Ontology? –definition of things that are relevant to your application –representation of terminological knowledge (“TBox”) –explicit specification of a conceptualization –concept hierarchy (“is-a”) –further semantic relationships between concepts –abstractions of relational schemas, (E)ER, UML classes, XML Schemas Examples: –NCMIR ANATOM –GO (Gene Ontology) –UMLS (Unified Medical Language System) –CYC

34 34 Formalism for Ontologies: Description Logic DL definition of “Happy Father” (Example from Ian Horrocks, U Manchester, UK)

35 35 Description Logics Terminological Knowledge (TBox) –Concept Definition (naming of concepts): –Axiom (constraining of concepts): => a mediator’s “glue knowledge source” Assertional Knowledge (ABox) –the marked neuron in image 27 => the concrete instances/individuals of the concepts/classes that your sources export

36 36 Description Logic Statements as F-logic Rules In F-logic: X : happyFather :-- X : man, (X..child) : blue, (X..child) : green, not ( (X..child) : poorunhappyChild ). C : poorunhappyChild :-- not C : rich, not C : happy. Alternatively: DLs as fragments of First-Order Logic
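
The two rules can be evaluated by hand over a small instance. The Python sketch below assumes one particular reading: each positive `(X..child) : c` atom is satisfied by some child of X, the negated atom requires that no child is a poorunhappyChild, and negation is closed-world. The individuals and their properties are invented.

```python
# Hypothetical instance data.
men = {"tom", "bob"}
children = {"tom": ["amy", "ben"], "bob": ["cat", "dan"]}
blue = {"amy", "cat"}
green = {"ben", "dan"}
rich = {"amy", "cat"}
happy = {"ben"}

def poor_unhappy_child(c):
    # C : poorunhappyChild :- not C : rich, not C : happy.
    return c not in rich and c not in happy

def happy_father(x):
    # X : happyFather :- X : man, some child blue, some child green,
    #                    and no child that is a poorunhappyChild.
    kids = children.get(x, [])
    return (
        x in men
        and any(k in blue for k in kids)
        and any(k in green for k in kids)
        and not any(poor_unhappy_child(k) for k in kids)
    )

print([x for x in sorted(men) if happy_father(x)])
```

Here tom qualifies, while bob does not because dan is neither rich nor happy. This is querying a given instance; deciding whether one concept definition subsumes another for all instances is the separate reasoning task.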

37 37 Querying vs. Reasoning Querying: –given a DB instance I (= logic interpretation), evaluate a query expression (e.g. SQL, FO formula, Prolog program,...) –boolean query: check if I |= φ (i.e., if I is a model of φ) –(ternary) query: { (X, Y, Z) | I |= φ(X,Y,Z) } => check happyFathers in a given database Reasoning: –check if I |= φ implies I |= ψ for all databases I, –i.e., if φ => ψ –undecidable for FO, F-logic, etc. –Description Logics are decidable fragments => concept subsumption, concept hierarchy, classification => semantic tableaux, resolution, specialized algorithms

38 38 What’s in an Answer? (What’s in a Link? revisited) Semantic/Rule-Based Joins – (X,Y,[produces,B,increased_in]) := X produces B, B increased_in Y. (rule-based), e.g., X = …-secretase, B = beta amyloid, Y = Alzheimer’s disease What is the Erdős number of person P? –3 Really? Why? –authority based: said so –faith based: don’t know but believe firmly –query statement Q =... derived it from DB I –query Q =... derived it from DB I and KB T using derivation D => logic-based systems often “come with explanations” (“computations as proofs”)

39 39 Formalizing Glue Knowledge: Domain Map for SYNAPSE and NCMIR Domain Map = labeled graph with concepts ("classes") and roles ("associations"); additional semantics: expressed as logic rules (F-logic) Domain Map (DM) Purkinje cells and Pyramidal cells have dendrites that have higher-order branches that contain spines. Dendritic spines are ion (calcium) regulating components. Spines have ion binding proteins. Neurotransmission involves ionic activity (release). Ion-binding proteins control ion activity (propagation) in a cell. Ion-regulating components of cells affect ionic activity (release). Domain Expert Knowledge DM in Description Logic

40 40 Source Contextualization & DM Refinement In addition to registering (“hanging off”) data relative to existing concepts, a source may also refine the mediator’s domain map... => sources can register new concepts at the mediator...

41 Example: ANATOM Domain Map

42 42 Browsing Registered Data with Domain Maps

43 43 Compilation: Domain Maps => F-Logic Rules Domain Maps ~ Ontologies DMs have a formal semantics via a translation to F-Logic (~ Datalog + OO features) => Declarative + “Executable” Specification query evaluation with deductive rules reasoning over decidable fragments: checking concept subsumption, equivalence

44 44 Frame-Logic Example Schema and Instances

45 45 Schema Level (“Ontology”) Instance Level (DB Instance) F-Logic Queries

46 Query Processing “Demo” Query results in context Contextualization CON(Result) wrt. ANATOM. provided by the domain expert and mediation engineer deductive OO language (here: F-logic)

47 Example: Inside Query Evaluation “How does the parallel fiber output (Yale/SENSELAB) relate to the distribution of Ryanodine Receptors (UCSD/NCMIR)?” (1) push selection @SENSELAB: X1 := select targets of “output from parallel fiber”; (2) determine source context @MEDIATOR: X2 := “find and situate” X1 in ANATOM Domain Map; (3) compute region of interest (here: downward closure) @MEDIATOR: X3 := subregion-closure(X2); (4) push selection @NCMIR: X4 := select PROT-data(X3, Ryanodine Receptors); (5) compute protein distribution @MEDIATOR: X5 := compute aggregate(X4); (6) display in context @MEDIATOR/GUI: display X5 in context (ANATOM)
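
The mediator-side steps of this plan (downward closure, pushed selection, aggregation) can be mimicked in Python. Everything concrete below is an assumption for illustration: the fragment of the region hierarchy, the rows standing in for NCMIR data, the amounts, and the function names. The SENSELAB selection is replaced by a fixed list of already situated regions.

```python
# Hypothetical fragment of a domain map: region -> direct subregions.
subregions = {
    "cerebellum": ["cerebellar cortex"],
    "cerebellar cortex": ["Purkinje cell layer", "molecular layer"],
    "Purkinje cell layer": [],
    "molecular layer": [],
}

def subregion_closure(regions):
    """Downward closure: the given regions plus everything below them."""
    seen, stack = set(), list(regions)
    while stack:
        region = stack.pop()
        if region not in seen:
            seen.add(region)
            stack.extend(subregions.get(region, []))
    return seen

# Stand-ins for the source answers (illustrative values only).
situated_targets = ["cerebellar cortex"]           # X2: X1 situated in the map
protein_rows = [                                   # (region, protein, amount)
    ("Purkinje cell layer", "Ryanodine Receptor", 5),
    ("molecular layer", "Ryanodine Receptor", 2),
    ("hippocampus", "Ryanodine Receptor", 9),
    ("molecular layer", "Calbindin", 4),
]

def evaluate(protein):
    x3 = subregion_closure(situated_targets)       # region of interest
    x4 = [row for row in protein_rows              # selection pushed to the source
          if row[0] in x3 and row[1] == protein]
    x5 = {}                                        # aggregate per region
    for region, _, amount in x4:
        x5[region] = x5.get(region, 0) + amount
    return x5

print(evaluate("Ryanodine Receptor"))
```

The closure is computed at the mediator because only the domain map knows which regions lie below the situated targets; the source then only has to answer a plain selection over region names.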

48 48 Some Open Database & Knowledge Representation Issues Mix of Query Processing and Reasoning –FaCT description logic reasoner for DMs? –or reconciliation of DMs via argumentation-frameworks (“games”) using well-founded and stable models of logic programs [ICDT97,PODS97,TCS00] Modeling “Process Knowledge” => Process Maps –formal semantics? (dynamic/temporal/Kripke models?) –executable semantics? (Statelog?) Graph Queries over DMs and PMs –expressible in F-logic [InfSystems98] –scalability? (UMLS Domain Map has millions of entries)...

49 49 Process Maps with Abstractions and Elaborations: => From Terminological to Procedural Glue nodes ~ states edges ~ processes, transitions blue/red edges: processes in Src1/Src2 general form of edges: how about these?

50 50 Summary: Mediation Scenarios & Techniques
Federated Databases | XML-Based Mediation | Model-Based Mediation
One-World | One-/Multiple-Worlds | Complex Multiple-Worlds
Common Schema | Mediated Schema | Common Glue Maps
SQL, rules | XML query languages | DOOD query languages
Schema Transformations | Syntax-Aware Mappings | Semantics-Aware Mappings
Syntactic Joins | Syntactic Joins | “Semantic” Joins via Glue Maps
DB expert | DB expert | KRDB + domain expert

51 51 Models and Formal Approaches: Relating Theory to the World ©2000 by John F. Sowa, Knowledge Representation: Logical, Philosophical, and Computational Foundations, Brooks/Cole, Pacific Grove, CA. http://www.jfsowa.com/krbook/ All models are wrong, but some are useful!

52 52 Questions? Queries?

53 53 Some References
XML-Based and Model-Based Mediation:
–MBM: Model-Based Mediation with Domain Maps, B. Ludäscher, A. Gupta, M. E. Martone, 17th Intl. Conference on Data Engineering (ICDE), Heidelberg, Germany, IEEE Computer Society, 2001.
–VXD/Lazy Mediators: Navigation-Driven Evaluation of Virtual Mediated Views, B. Ludäscher, Y. Papakonstantinou, P. Velikhov, Intl. Conference on Extending Database Technology (EDBT), Konstanz, Germany, LNCS 1777, Springer, 2000.
–DOOD: Managing Semistructured Data with FLORID: A Deductive Object-Oriented Perspective, B. Ludäscher, R. Himmeröder, G. Lausen, W. May, C. Schlepphorst, Information Systems, 23(8), Special Issue on Semistructured Data, 1998.
STATELOG (Logic Programming with States):
–On Active Deductive Databases: The Statelog Approach, G. Lausen, B. Ludäscher, and W. May. In Transactions and Change in Logic Databases, Hendrik Decker, Burkhard Freitag, Michael Kifer, and Andrei Voronkov, editors. LNCS 1472, Springer, 1998.
Argumentation Frameworks as Games:
–Games and Total DatalogNeg Queries, J. Flum, M. Kubierschky, B. Ludäscher, Theoretical Computer Science, 239(2), pp. 257-276, Elsevier, 2000.
–Referential Actions as Logical Rules, B. Ludäscher, W. May, G. Lausen, Proc. 16th ACM Symposium on Principles of Database Systems (PODS'97), Tucson, Arizona, ACM Press, 1997.

