
iFuice – Information Fusion utilizing Instance Correspondences and Peer Mappings


1 iFuice – Information Fusion utilizing Instance Correspondences and Peer Mappings
Erhard Rahm, Andreas Thor, David Aumueller, Hong-Hai Do, Nick Golovin, Toralf Kirsten
University of Leipzig, Germany
http://dbs.uni-leipzig.de

2 Motivating scenario
Integrating...
- Additional relationships / attributes (Eventseer, Google Scholar)
  - Who published at SIGMOD as a PC member?
  - Who are the candidates for the SIGMOD test of time award?
- Hand-picked private data (local file)
  - Who referenced publications of my favorite authors?
- Sources from different domains (SwissProt, MIM)
  - What information system is used to support biological cancer analysis?
[Figure: data sources Citeseer, ACM, DBLP, Eventseer, Google Scholar, local file, PubMed, SwissProt, MIM]

3 Schema vs. instance based integration
- Data integration using query mediator approach
  - Mediated (global) schema
  - Matching / views between global and local schemas
- Problems
  - Construction/evolution of global schema
  - Sources with no schema or only a semi-structured one
  - Heterogeneous/dirty data, mapping to artificial schema
- Instance correspondences
  - Represent semantic relationships between instances
  - Allow integration of sources without schema
  - Can be inferred from web links

4 iFuice approach
- Information Fusion utilizing Instance Correspondences and Peer Mappings
- Bottom up integration
- High-level operators
  - Generic way to dynamic information fusion
- Mediator
  - Controls mapping / operator execution
  - Utilizes a domain model
- P2P-like infrastructure
  - Correspondences between autonomous data sources
  - Easy link-up of a new source "where it fits best"

5 Agenda
- Motivation & iFuice approach
- Meta data model
- Operators
- iFuice scripts
- Architecture
- Summary & outlook

6 Data sources
- Physical data source (PDS)
  - Web data (DBLP), local data (files), ...
  - Split into logical data sources
- Logical data source (LDS)
  - Refers to one object type
  - Contains object instances
- Object instance
  - Refers to real world entity
  - Set of attributes
  - One attribute is the id
[Figure: PDS DBLP with the LDS Publication, Conference, Author]
Example object instance (Publication, DBLP):
  Name: Generic schema matching with Cupid
  URL: http://vldb.org...
  Conference: VLDB 2001
  Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
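The three levels on this slide (PDS, LDS, object instance) can be sketched in Python as follows. This is a minimal illustration under assumptions, not the iFuice implementation: the class names `ObjectInstance`, `LogicalDataSource` and `PhysicalDataSource`, and the choice of the publication title as id, are made up for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ObjectInstance:
    """One object instance: an id plus further attributes (hypothetical layout)."""
    id: str
    attributes: tuple = ()  # (name, value) pairs; a tuple keeps the instance hashable

    def get(self, name):
        return dict(self.attributes).get(name)


@dataclass
class LogicalDataSource:
    """An LDS holds instances of exactly one object type within one PDS."""
    pds: str
    object_type: str
    instances: dict = field(default_factory=dict)  # id -> ObjectInstance

    def add(self, instance):
        self.instances[instance.id] = instance


@dataclass
class PhysicalDataSource:
    """A PDS (e.g. DBLP) is split into one LDS per object type."""
    name: str
    lds: dict = field(default_factory=dict)  # object type -> LogicalDataSource

    def logical(self, object_type):
        # Create the LDS on first access, then reuse it.
        if object_type not in self.lds:
            self.lds[object_type] = LogicalDataSource(self.name, object_type)
        return self.lds[object_type]


dblp = PhysicalDataSource("DBLP")
dblp.logical("Publication").add(ObjectInstance(
    id="Generic schema matching with Cupid",  # assumption: the title serves as id
    attributes=(
        ("Conference", "VLDB 2001"),
        ("Authors", "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm"),
    ),
))
```

Keeping one LDS per object type mirrors the slide's statement that an LDS "refers to one object type".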

7 Mappings
- Directed relationship between LDS
- Meta data: meaning of the mapping
  - Semantic mapping type
    - e.g., "publications of author"
  - Same mappings vs. association mappings
    - same = "equality" relationship between PDS
    - e.g., DBLP publication (id) -> ACM publication (id)
  - Id mappings vs. query mappings
- Instance data: instance correspondences
  - Materialized: mapping tables
  - On-the-fly: execution result (e.g., from web service)
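The distinction between materialized and on-the-fly instance data can be illustrated with a small Python sketch. The `Mapping` class, its field names and the mapping name `DBLP.PubSameGS` are hypothetical; the lambda merely stands in for a web service call.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Mapping:
    """A directed mapping between two LDS (all names here are hypothetical)."""
    name: str             # e.g. "DBLP.ConfPubs"
    domain_lds: str       # LDS the mapping starts from
    range_lds: str        # LDS the mapping leads to
    semantic_type: str    # e.g. "publications of conference", or "same"
    table: Optional[frozenset] = None  # materialized: (domain id, range id) pairs
    resolver: Optional[Callable[[str], Iterable[str]]] = None  # on-the-fly lookup

    def correspondences(self, ids):
        """Return the instance correspondences (id pairs) for the given domain ids."""
        ids = set(ids)
        if self.table is not None:
            return {(a, b) for (a, b) in self.table if a in ids}
        return {(a, b) for a in ids for b in self.resolver(a)}


# Materialized mapping: correspondences are stored in a mapping table.
materialized = Mapping(
    "DBLP.ConfPubs", "Conference@DBLP", "Publication@DBLP",
    "publications of conference",
    table=frozenset({("VLDB 2001", "Generic schema matching with Cupid")}),
)

# On-the-fly mapping: correspondences are computed per call.
# The lambda is a stand-in; a real resolver would query the remote source.
on_the_fly = Mapping(
    "DBLP.PubSameGS", "Publication@DBLP", "Publication@GoogleScholar",
    "same", resolver=lambda pub_id: [pub_id.lower()],
)
```

Both variants answer the same question (which range ids correspond to these domain ids), so callers do not need to know how a mapping is implemented.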

8 Metadata model
- Source mapping model: used by mediator for mapping/operator execution
- Domain model: indicates available object types and relationships
[Figure: source mapping model with the LDS Publication, Conference, Author (DBLP), Author, Publication (ACM), and Google Scholar; legend: LDS, PDS, mapping, same mapping]
[Figure: domain model extract with object types Author, Publication, Conference and relationships AuthPub, PubAuth, PubConf, ConfPub, CoAuthor]

9 Operators
- Query language capabilities + scripting support
- Set-oriented operators
  - Input: set of object or mapping instances + parameters / query specification
  - Output: set of object / mapping instances
- Can be combined bottom-up within scripts

10 Operators overview
- Object instances (OI)
  - Query -> OI: queryInstances, queryMatch, attrTransf
  - OI -> OI: getInstances, traverse, traverseSame, map
- Aggregated objects (AO)
  - OI -> AO: agg, disagg, fuseAttributes
  - AO -> AO: aggregateSame, aggregateTraverse, aggregateMap
- Generic
  - union, diff, intersect
  - domain, range, compose
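The generic operators can be pictured as ordinary set operations when a mapping result is modeled as a set of (domain id, range id) pairs. The sketch below rests on that assumption; the slides do not define the exact semantics, and the ids `p1` to `p3` are invented.

```python
def domain(mapping):
    """All ids that appear on the left-hand side of a mapping result."""
    return {a for a, _ in mapping}


def range_(mapping):
    """All ids that appear on the right-hand side (trailing underscore avoids the builtin)."""
    return {b for _, b in mapping}


def compose(first, second):
    """Chain two mapping results: (a, c) whenever (a, b) is in first and (b, c) in second."""
    return {(a, c) for a, b in first for b2, c in second if b == b2}


# Toy mapping results (invented ids).
conf_pubs = {("VLDB 2001", "p1"), ("VLDB 2001", "p2")}
pub_auth = {("p1", "Erhard Rahm"), ("p2", "Hong-Hai Do"), ("p3", "Andreas Thor")}

conf_auth = compose(conf_pubs, pub_auth)

# union, diff and intersect map directly onto Python's set operators.
all_pairs = conf_pubs | pub_auth
only_conf = conf_pubs - pub_auth
shared = conf_pubs & pub_auth
```

Here `compose` yields a conference-to-author mapping result from a conference-to-publication one and a publication-to-author one.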

11 Operators for object instances
- queryInstances executes a query on a peer
  - $S := queryInstances (Conf@DBLP, Series="SIGMOD") returns all SIGMOD conferences from DBLP
- map executes a mapping
  - map ($S, DBLP.ConfPubs) returns all tuples (conference, publication)
- traverse returns the range of a mapping
  - $P := traverse ($S, DBLP.ConfPubs) returns all publications
- traverseSame "navigates" to corresponding objects of another physical source
  - traverseSame ($P, GoogleScholar) returns "equal" publications at GoogleScholar
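The four operators on this slide can be mimicked on toy data in Python. This is only a sketch of the described behavior: the data layout (an LDS as a dict, a mapping as a set of id pairs), the function names and all ids are assumptions, and in iFuice the mediator would select the same-mapping for the target source.

```python
# Toy data (invented ids): an LDS is a dict id -> attributes, a mapping a set of id pairs.
DBLP_CONF = {
    "sigmod95": {"Series": "SIGMOD", "Name": "SIGMOD 1995"},
    "vldb01": {"Series": "VLDB", "Name": "VLDB 2001"},
}
CONF_PUBS = {("sigmod95", "p1"), ("sigmod95", "p2"), ("vldb01", "p3")}
SAME_MAPPINGS = {"GoogleScholar": {("p1", "gs-17"), ("p3", "gs-42")}}


def query_instances(lds, **conditions):
    """Return the ids of all instances whose attributes satisfy every condition."""
    return {oid for oid, attrs in lds.items()
            if all(attrs.get(key) == value for key, value in conditions.items())}


def map_(ids, mapping):
    """Execute a mapping: all (domain id, range id) pairs starting at the given ids."""
    return {(a, b) for a, b in mapping if a in ids}


def traverse(ids, mapping):
    """Return only the range of the mapping result."""
    return {b for _, b in map_(ids, mapping)}


def traverse_same(ids, target_pds):
    """Follow the same-mapping to the target source (looked up by name here)."""
    return traverse(ids, SAME_MAPPINGS[target_pds])


S = query_instances(DBLP_CONF, Series="SIGMOD")
P = traverse(S, CONF_PUBS)
equal_pubs = traverse_same(P, "GoogleScholar")
```

Publication `p2` has no counterpart in the toy same-mapping, so it simply drops out of the `traverse_same` result.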

12 Instance fusion
- Object instances referring to the same real world object
  - Aggregated object
- Auxiliary fusion operators
  - agg / disagg, fuseAttributes

Publication (DBLP):
  Name: Generic schema matching with Cupid
  URL: http://vldb.org...
  Conference: VLDB 2001
  Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm

Publication (GS):
  Name: Generic schema matching with Cupid
  URL: http://data.cs.washington.edu...
  NoOfCit: 243
  Authors: J Madhavan, PA Bernstein, E Rahm

agg: combines both instances into one aggregated object, Publication (DBLP, GS)

fuseAttributes: merges the attributes of the aggregated object, Publication (DBLP, GS):
  Name: Generic schema matching with Cupid
  URL: http://vldb.org... (DBLP); http://data.cs.washington.edu... (GS)
  Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm (DBLP); J Madhavan, PA Bernstein, E Rahm (GS)
  Conf.: VLDB 2001
  NoOfCit: 243
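The effect of agg and fuseAttributes on the Cupid example can be sketched in Python. The fusion policy used here (keep a single value when all sources agree or only one source provides the attribute, otherwise keep one value per source) is an assumption chosen to reproduce the fused object shown on the slide; the slides do not spell out the actual rules.

```python
dblp_pub = {
    "Name": "Generic schema matching with Cupid",
    "URL": "http://vldb.org...",
    "Conference": "VLDB 2001",
    "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm",
}
gs_pub = {
    "Name": "Generic schema matching with Cupid",
    "URL": "http://data.cs.washington.edu...",
    "NoOfCit": 243,
    "Authors": "J Madhavan, PA Bernstein, E Rahm",
}


def agg(*instances):
    """Bundle (source, attributes) pairs that describe the same real-world object."""
    return dict(instances)


def fuse_attributes(aggregated):
    """Merge attributes across sources (assumed policy, see the lead-in)."""
    per_attribute = {}
    for source, attrs in aggregated.items():
        for name, value in attrs.items():
            per_attribute.setdefault(name, {})[source] = value
    fused = {}
    for name, by_source in per_attribute.items():
        distinct = set(by_source.values())
        # One distinct value: keep it plain. Conflicting values: keep them per source.
        fused[name] = distinct.pop() if len(distinct) == 1 else by_source
    return fused


aggregated = agg(("DBLP", dblp_pub), ("GS", gs_pub))
fused = fuse_attributes(aggregated)
```

With this policy `Name`, `Conference` and `NoOfCit` become single values, while `URL` and `Authors` keep one value per source, as in the slide's fused object.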

13 Operators for aggregated objects
- aggregateSame
  - Identify corresponding objects in another source (traverseSame)
  - Aggregate resulting objects with input objects (agg)
  - aggregateSame ($P, GoogleScholar) returns AOs of (DBLP + GoogleScholar) publications
[Figure: traverseSame links the DBLP publication "Generic schema matching with Cupid" (VLDB 2001) to its Google Scholar counterpart (NoOfCit: 243); agg then combines both into one aggregated object, Publication (DBLP, GS)]
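The two steps of aggregateSame can be written out as a short Python sketch. The function signatures and ids are hypothetical, and two behaviors are assumptions not stated on the slide: objects without a counterpart are kept as single-source aggregated objects, and each object has at most one counterpart.

```python
# Toy data (invented ids).
DBLP_PUBS = {
    "p1": {"Name": "Generic schema matching with Cupid", "Conference": "VLDB 2001"},
    "p2": {"Name": "A publication without a Google Scholar counterpart"},
}
GS_PUBS = {"gs-1": {"Name": "Generic schema matching with Cupid", "NoOfCit": 243}}
SAME = {("p1", "gs-1")}


def traverse_same(ids, same_mapping):
    """Step 1: find the corresponding objects in the other source."""
    return {(a, b) for a, b in same_mapping if a in ids}


def aggregate_same(ids, source_lds, target_lds, same_mapping,
                   source_name="DBLP", target_name="GS"):
    """Step 2: aggregate every input object with its counterpart, if it has one."""
    aggregated = []
    for oid in sorted(ids):  # sorted only to make the output order deterministic
        ao = {source_name: source_lds[oid]}
        for _, other in sorted(traverse_same({oid}, same_mapping)):
            ao[target_name] = target_lds[other]
        aggregated.append(ao)
    return aggregated


aos = aggregate_same({"p1", "p2"}, DBLP_PUBS, GS_PUBS, SAME)
```

Each element of `aos` is one aggregated object keyed by source, ready for a subsequent attribute fusion step.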

14 iFuice scripts
- Batch execution of operators
- Store (intermediate) results in variables
- Scripts can be interpreted as mappings
  - Other scripts can utilize iFuice "script mappings"
- Example: SIGMOD test of time award
  $SIGMODPubs := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLPConfPubs)
  $CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
  $CleanedPubs := fuseAttributes ($CombinedConfPub)
  $Result := sort ($CleanedPubs, "NoOfCitings")

15 Example: SIGMOD test of time award

16 Mediator architecture
[Figure: mediator architecture; the recoverable labels are grouped below]
- Application: Bio navigator, Personal Information Manager
- iFuice script: script / batch or interactive (step by step)
- Mediator interface: request / response; web service or Java library
- iFuice mediator: fusion control unit, mapping handler, duplicate detection
- Repository: cache (store / load mapping results), meta data model (load)
- Mapping execution service: call mapping / mapping result; wraps different mapping implementations (web service, SQL query, Java class)
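How a mapping execution service might wrap different mapping implementations behind one call can be sketched as follows. This is an assumption-laden illustration in Python, not the iFuice design: the class names, the `call_mapping` method and the table `conf_pubs` are hypothetical, an in-memory SQLite database stands in for a SQL source, and a plain function stands in for a web service or a Java class.

```python
import sqlite3


class SqlMapping:
    """Mapping implemented by a parameterized SQL query (one row per range id)."""

    def __init__(self, connection, query):
        self.connection = connection
        self.query = query

    def execute(self, ids):
        pairs = set()
        for oid in ids:
            for (target,) in self.connection.execute(self.query, (oid,)):
                pairs.add((oid, target))
        return pairs


class CallableMapping:
    """Mapping implemented by any function id -> range ids (e.g. a service client)."""

    def __init__(self, function):
        self.function = function

    def execute(self, ids):
        return {(oid, target) for oid in ids for target in self.function(oid)}


class MappingExecutionService:
    """Registry that hides how each mapping is implemented."""

    def __init__(self):
        self._implementations = {}

    def register(self, name, implementation):
        self._implementations[name] = implementation

    def call_mapping(self, name, ids):
        return self._implementations[name].execute(ids)


# Toy setup with invented ids.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE conf_pubs (conf TEXT, pub TEXT)")
connection.executemany("INSERT INTO conf_pubs VALUES (?, ?)",
                       [("VLDB 2001", "p1"), ("VLDB 2001", "p2")])

service = MappingExecutionService()
service.register("DBLP.ConfPubs",
                 SqlMapping(connection, "SELECT pub FROM conf_pubs WHERE conf = ?"))
service.register("DBLP.PubSameGS", CallableMapping(lambda pub: ["gs-" + pub]))

publications = service.call_mapping("DBLP.ConfPubs", ["VLDB 2001"])
```

The caller only sees mapping names and id pairs, which is what lets a new source be linked in by registering one more implementation.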

17 Summary & outlook
- iFuice: generic way to dynamic information fusion
  - Based on instance correspondences of P2P sources
  - Mediator-controlled data fusion
- Two working modes
  - Script mode: powerful operators for information fusion tasks (with source selection or transparent)
  - Explorative mode: navigation in information space
- Future work
  - Finishing prototype implementation
  - Different domains, e.g., bioinformatics and e-commerce
  - Tool-supported (semi-)automatic integration of local / private data sources

