Published by Audrey Cooper, modified over 8 years ago
1
iFuice – Information Fusion utilizing Instance Correspondences and Peer Mappings
Erhard Rahm, Andreas Thor, David Aumueller, Hong-Hai Do, Nick Golovin, Toralf Kirsten
University of Leipzig, Germany
http://dbs.uni-leipzig.de
2
Motivating scenario
Integrating...
- Bibliographic sources (Citeseer, ACM, DBLP), plus additional relationships / attributes from Eventseer and Google Scholar
  - Who published at SIGMOD as a PC member?
  - Who are the candidates for the SIGMOD test of time award?
- Hand-picked private data (local file)
  - Who referenced publications of my favorite authors?
- PubMed plus sources from different domains (SwissProt, MIM)
  - What information system is used to support biological cancer analysis?
3
Schema vs. instance-based integration
- Data integration using the query mediator approach
  - Mediated (global) schema
  - Matching / views between global and local schemas
  - Problems: construction and evolution of the global schema; sources without a schema or with only a semi-structured one; heterogeneous / dirty data that must be mapped to an artificial schema
- Instance correspondences
  - Represent semantic relationships between instances
  - Allow integration of sources without a schema
  - Can be inferred from weblinks
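The slide notes that instance correspondences can be inferred from weblinks. Below is a minimal sketch of that idea, under stated assumptions. It assumes each source record carries a hypothetical "links" attribute of outgoing URLs, and that the other source can be indexed by the URL of each object page. All names and URLs are illustrative and do not come from iFuice.

```python
# Hypothetical sketch: deriving instance correspondences from weblinks.
# The "links" attribute and the example URLs are illustrative assumptions.

def correspondences_from_links(source_records, target_url_index):
    """Return (source id, target id) pairs for every link that points to a known target object."""
    pairs = set()
    for src_id, record in source_records.items():
        for url in record.get("links", []):
            target_id = target_url_index.get(url)
            if target_id is not None:
                pairs.add((src_id, target_id))
    return pairs


# Toy data: two objects in source A, only one of which links into source B.
source_a = {
    "a1": {"title": "Paper A", "links": ["https://example.org/b/17"]},
    "a2": {"title": "Paper B", "links": ["https://example.org/other"]},
}
# Index of source B: URL of each object page -> object id.
source_b_urls = {"https://example.org/b/17": "b17"}

print(correspondences_from_links(source_a, source_b_urls))  # {('a1', 'b17')}
```

No schema is needed on either side. Only the ids and the links are used, which is the point the slide makes about schema-less sources.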
4
iFuice approach: Information Fusion utilizing Instance Correspondences and Peer Mappings
- Bottom-up integration
  - High-level operators
  - Generic approach to dynamic information fusion
- Mediator
  - Controls mapping / operator execution
  - Utilizes a domain model
- P2P-like infrastructure
  - Correspondences between autonomous data sources
  - Easy link-up of a new source "where it fits best"
5
Agenda
- Motivation & iFuice approach
- Metadata model
- Operators
- iFuice scripts
- Architecture
- Summary & outlook
6
Data sources
- Physical data source (PDS)
  - Web data (DBLP), local data (files), ...
  - Split into logical data sources
- Logical data source (LDS)
  - Refers to one object type
  - Contains object instances
- Object instance
  - Refers to a real-world entity
  - Set of attributes; one attribute is the id
[Figure: PDS DBLP with the LDS Publication, Conference and Author. Example Publication instance from DBLP: Name: Generic schema matching with Cupid; URL: http://vldb.org...; Conference: VLDB 2001; Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm]
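The PDS / LDS / object-instance hierarchy can be written down as a small data structure. The sketch below is an assumption about how one might model it, not iFuice's own classes. The "Type@PDS" string form follows the notation used by queryInstances (Conf@DBLP). The instance id "pub-1" is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class LogicalDataSource:
    """An LDS: one object type within a physical data source (PDS)."""
    pds: str          # physical data source, e.g. "DBLP"
    object_type: str  # e.g. "Publication"

    def __str__(self) -> str:
        # Same "Type@PDS" notation as in queryInstances (Conf@DBLP, ...)
        return f"{self.object_type}@{self.pds}"


@dataclass
class ObjectInstance:
    """An object instance: refers to a real-world entity, has an id plus further attributes."""
    lds: LogicalDataSource
    id: str
    attributes: dict = field(default_factory=dict)


dblp_pubs = LogicalDataSource("DBLP", "Publication")
cupid = ObjectInstance(
    dblp_pubs,
    id="pub-1",  # hypothetical id chosen for illustration
    attributes={
        "Name": "Generic schema matching with Cupid",
        "Conference": "VLDB 2001",
        "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm",
    },
)
print(cupid.lds, cupid.attributes["Conference"])  # Publication@DBLP VLDB 2001
```

Apart from the id, the attribute set is left as a free-form dictionary. This reflects the slide's point that sources may lack a fixed schema.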
7
Mappings
- Directed relationship between LDS
- Metadata: meaning of the mapping
  - Semantic mapping type, e.g., "publications of author"
  - Same mappings vs. association mappings
    - same = "equality" relationship between PDS, e.g., DBLP publication (id) ↔ ACM publication (id)
  - Id mappings vs. query mappings
- Instance data: instance correspondences
  - Materialized: mapping tables
  - On-the-fly: execution result (e.g., from a web service)
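Here is a minimal sketch of a mapping that carries its metadata and either of the two kinds of instance data: a materialized mapping table, or an on-the-fly computation. The class layout, the mapping names and the lambda standing in for a web service are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Set, Tuple

Correspondence = Tuple[str, str]  # (id in domain LDS, id in range LDS)


@dataclass
class Mapping:
    """Directed mapping between two LDS, with metadata and one of two instance-data variants."""
    name: str
    domain_lds: str
    range_lds: str
    semantic_type: str            # meaning of the mapping, e.g. "publications of author"
    is_same: bool = False         # True for "same" (equality) mappings between PDS
    table: Optional[FrozenSet[Correspondence]] = None  # materialized mapping table
    compute: Optional[Callable[[Set[str]], Iterable[Correspondence]]] = None  # on-the-fly

    def execute(self, ids: Iterable[str]) -> Set[Correspondence]:
        ids = set(ids)
        if self.table is not None:
            # Materialized: look up the stored correspondences for the input ids
            return {(a, b) for (a, b) in self.table if a in ids}
        if self.compute is not None:
            # On-the-fly: delegate to an external computation (stand-in for a web service)
            return set(self.compute(ids))
        raise ValueError(f"mapping {self.name} has no instance data")


same_dblp_acm = Mapping(
    "DBLP_ACM.Same", "Publication@DBLP", "Publication@ACM",
    "same publication", is_same=True,
    table=frozenset({("d1", "a7"), ("d2", "a9")}),
)
demo_service = Mapping(
    "Demo.OnTheFly", "Publication@DBLP", "Publication@Demo",
    "same publication", is_same=True,
    compute=lambda ids: [(i, "demo-" + i) for i in ids],
)
print(same_dblp_acm.execute(["d1"]))  # {('d1', 'a7')}
print(demo_service.execute(["d2"]))   # {('d2', 'demo-d2')}
```

Both variants expose the same `execute` call. Callers therefore do not need to know whether correspondences are stored or computed.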
8
Metadata model
- Source mapping model
  - [Figure: PDS DBLP (Publication, Conference, Author), ACM (Author, Publication) and Google Scholar, connected by mappings]
  - Used by the mediator for mapping / operator execution
- Domain model
  - Indicates the available object types and relationships
  - [Figure: domain model extract with the object types Author, Publication, Conference and the relationships AuthPub, PubAuth, PubConf, ConfPub, CoAuthor]
Legend: LDS, PDS, mapping, same mapping
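The slide says the mediator uses the source mapping model for mapping and operator execution, without saying how. One plausible use, assumed here purely for illustration, is to treat the model as a graph of LDS connected by mappings. The mediator could then search it for a chain of mappings between two LDS. The model contents and mapping names below are hypothetical, and breadth-first search is a choice made for this sketch.

```python
from collections import deque

# Hypothetical source mapping model: (mapping name, domain LDS, range LDS).
SOURCE_MAPPING_MODEL = [
    ("DBLP.ConfPubs", "Conference@DBLP", "Publication@DBLP"),
    ("DBLP.PubAuth", "Publication@DBLP", "Author@DBLP"),
    ("DBLP.AuthPub", "Author@DBLP", "Publication@DBLP"),
    ("Same.DBLP_ACM", "Publication@DBLP", "Publication@ACM"),
    ("Same.DBLP_GS", "Publication@DBLP", "Publication@GoogleScholar"),
    ("ACM.PubAuth", "Publication@ACM", "Author@ACM"),
]


def mapping_path(model, start, goal):
    """Breadth-first search for the shortest chain of mappings leading from start to goal."""
    graph = {}
    for name, dom, rng in model:
        graph.setdefault(dom, []).append((name, rng))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        lds, path = queue.popleft()
        if lds == goal:
            return path
        for name, nxt in graph.get(lds, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None  # goal not reachable


print(mapping_path(SOURCE_MAPPING_MODEL, "Conference@DBLP", "Author@ACM"))
# ['DBLP.ConfPubs', 'Same.DBLP_ACM', 'ACM.PubAuth']
```

In this toy model, the only route from DBLP conferences to ACM authors passes through a same mapping. That mirrors how same mappings link otherwise separate physical sources.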
9
Operators
- Query language capabilities + scripting support
- Set-oriented operators
  - Input: set of object or mapping instances + parameters / query specification
  - Output: set of object / mapping instances
  - Can be combined bottom-up within scripts
10
Operators overview
- Object instances (OI)
  - Query → OI: queryInstances, queryMatch, attrTransf
  - OI → OI: getInstances, traverse, traverseSame, map
- Aggregated objects (AO)
  - OI → AO: agg, disagg, fuseAttributes
  - AO → AO: aggregateSame, aggregateTraverse, aggregateMap
- Generic
  - union, diff, intersect
  - domain, range, compose
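The generic operators work on sets, so they can be sketched with plain Python sets of (domain id, range id) pairs. The sketch assumes the usual relational reading: `domain` and `range` project a mapping instance onto its left and right side, and `compose` chains two mappings. The function names and toy data are hypothetical. The trailing underscore in `range_` only avoids shadowing Python's built-in.

```python
# Mapping instances as sets of (domain id, range id) pairs.

def domain(m):
    """Ids that occur on the left side of a mapping instance."""
    return {a for a, _ in m}


def range_(m):
    """Ids that occur on the right side (trailing underscore avoids shadowing Python's range)."""
    return {b for _, b in m}


def compose(m1, m2):
    """Relational composition: (a, c) whenever (a, b) is in m1 and (b, c) is in m2."""
    by_left = {}
    for b, c in m2:
        by_left.setdefault(b, set()).add(c)
    return {(a, c) for a, b in m1 for c in by_left.get(b, ())}


conf_pubs = {("c1", "p1"), ("c1", "p2"), ("c2", "p3")}   # conference -> publication
pub_auths = {("p1", "x"), ("p2", "x"), ("p2", "y")}       # publication -> author
print(sorted(compose(conf_pubs, pub_auths)))  # [('c1', 'x'), ('c1', 'y')]
# union, diff, intersect correspond to Python's |, - and & on these sets
print(sorted(range_(conf_pubs) - domain(pub_auths)))  # ['p3']
```

Every function returns a set, so the results feed directly into the next operator. This is the bottom-up combination the operator slide describes.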
11
Operators for object instances
- queryInstances executes a query on a peer
  $S := queryInstances (Conf@DBLP, Series="SIGMOD")
  returns all SIGMOD conferences from DBLP
- map executes a mapping
  map ($S, DBLP.ConfPubs)
  returns all (conference, publication) tuples
- traverse returns the range of a mapping
  $P := traverse ($S, DBLP.ConfPubs)
  returns all publications of these conferences
- traverseSame "navigates" to the corresponding objects of another physical source
  traverseSame ($P, GoogleScholar)
  returns the "equal" publications at Google Scholar
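The four operators can be mimicked on in-memory sets. The sketch below replays the slide's examples on hypothetical toy data. One simplification is assumed: `traverse_same` receives the same mapping explicitly, whereas the iFuice call names only the target PDS and leaves the choice of mapping to the mediator.

```python
# Hypothetical in-memory stand-ins for DBLP and Google Scholar data.
DBLP_CONFS = {
    "c1": {"Series": "SIGMOD", "Year": 1995},
    "c2": {"Series": "VLDB", "Year": 2001},
}
DBLP_CONF_PUBS = {("c1", "p1"), ("c1", "p2"), ("c2", "p3")}   # DBLP.ConfPubs
SAME_DBLP_GS = {("p1", "gs9"), ("p3", "gs7")}                 # DBLP <-> Google Scholar


def query_instances(instances, **conditions):
    """Ids of all instances whose attributes match every condition (cf. queryInstances)."""
    return {i for i, attrs in instances.items()
            if all(attrs.get(k) == v for k, v in conditions.items())}


def map_(ids, mapping):
    """Correspondences of the mapping whose domain id is in ids (cf. map)."""
    return {(a, b) for a, b in mapping if a in ids}


def traverse(ids, mapping):
    """Range of the mapping restricted to ids (cf. traverse)."""
    return {b for _, b in map_(ids, mapping)}


def traverse_same(ids, same_mapping):
    """Follow a same mapping to the equal objects in another PDS (cf. traverseSame)."""
    return traverse(ids, same_mapping)


S = query_instances(DBLP_CONFS, Series="SIGMOD")
P = traverse(S, DBLP_CONF_PUBS)
print(S, sorted(P), traverse_same(P, SAME_DBLP_GS))  # {'c1'} ['p1', 'p2'] {'gs9'}
```

Publication p2 has no Google Scholar counterpart in the toy data, so `traverse_same` silently drops it. Whether iFuice reports such gaps is not stated on the slide.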
12
Instance fusion
- Object instances referring to the same real-world object
  - Publication (DBLP): Name: Generic schema matching with Cupid; URL: http://vldb.org...; Conference: VLDB 2001; Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm
  - Publication (GS): Name: Generic schema matching with Cupid; URL: http://data.cs.washington.edu...; NoOfCit: 243; Authors: J Madhavan, PA Bernstein, E Rahm
- Aggregated object (agg): groups both instances into one object
- fuseAttributes: one record with values per source
  - Name: Generic schema matching with Cupid
  - URL: http://vldb.org... (DBLP), http://data.cs.washington.edu... (GS)
  - Authors: Jayant Madhavan, Philip A. Bernstein, Erhard Rahm (DBLP); J Madhavan, PA Bernstein, E Rahm (GS)
  - Conf.: VLDB 2001 (DBLP)
  - NoOfCit: 243 (GS)
- Auxiliary fusion operators: agg / disagg, fuseAttributes
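The fusion operators can be sketched as follows. This is a minimal illustration that models an aggregated object as a list of (source, attributes) pairs. The conflict policy in `fuse_attributes` is an assumption: it keeps one value when all sources agree and a list of distinct values otherwise. That policy matches the fused view above, where Name appears once but Authors appears in both spellings. The URL attribute is left out because its values are elided on the slide.

```python
# An aggregated object (AO) is modelled as a list of (source, attributes) pairs.

def agg(pairs, left_src, left, right_src, right):
    """Build one AO per correspondence (left id, right id)."""
    return [[(left_src, left[a]), (right_src, right[b])] for a, b in sorted(pairs)]


def disagg(ao):
    """Split an AO back into its object instances."""
    return [attrs for _src, attrs in ao]


def fuse_attributes(ao):
    """Merge attributes; keep a single value if all sources agree, else a list of distinct values."""
    fused = {}
    for _src, attrs in ao:
        for key, value in attrs.items():
            values = fused.setdefault(key, [])
            if value not in values:
                values.append(value)
    return {k: v[0] if len(v) == 1 else v for k, v in fused.items()}


dblp = {"p1": {"Name": "Generic schema matching with Cupid", "Conference": "VLDB 2001",
               "Authors": "Jayant Madhavan, Philip A. Bernstein, Erhard Rahm"}}
gs = {"gs9": {"Name": "Generic schema matching with Cupid", "NoOfCit": 243,
              "Authors": "J Madhavan, PA Bernstein, E Rahm"}}

ao = agg({("p1", "gs9")}, "DBLP", dblp, "GS", gs)[0]
print(fuse_attributes(ao))
```

Keeping every distinct value loses no information. A real fusion step would probably need duplicate detection to recognise that the two author strings are equivalent.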
13
Operators for aggregated objects
- aggregateSame
  - Identifies corresponding objects in another source (traverseSame)
  - Aggregates the resulting objects with the input objects (agg)
  - aggregateSame ($P, GoogleScholar) returns AOs of (DBLP + Google Scholar) publications
[Figure: traverseSame links the DBLP publication "Generic schema matching with Cupid" to its Google Scholar counterpart (NoOfCit: 243); agg combines both into one aggregated object]
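aggregateSame chains the two steps named above. The sketch builds one AO per input object and extends it with all objects reached through the same mapping. Two choices here are assumptions the slide does not settle. An input object without a counterpart stays as a single-member AO. Several counterparts all join the same AO. The data and the name "Paper B" are hypothetical.

```python
# AO = list of (source, attributes) pairs; the data below is a hypothetical toy example.

def aggregate_same(ids, same_mapping, left_src, left, right_src, right):
    """traverseSame + agg: one AO per input object, extended by all of its same objects."""
    aos = []
    for a in sorted(ids):
        matches = sorted(b for x, b in same_mapping if x == a)                 # traverseSame for a
        ao = [(left_src, left[a])] + [(right_src, right[b]) for b in matches]  # agg
        aos.append(ao)
    return aos


dblp = {"p1": {"Name": "Generic schema matching with Cupid"}, "p2": {"Name": "Paper B"}}
gs = {"gs9": {"Name": "Generic schema matching with Cupid", "NoOfCit": 243}}
aos = aggregate_same({"p1", "p2"}, {("p1", "gs9")}, "DBLP", dblp, "GoogleScholar", gs)
print([len(ao) for ao in aos])  # [2, 1]
```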
14
iFuice scripts
- Batch execution of operators
- Store (intermediate) results in variables
- Scripts can be interpreted as mappings: other scripts can utilize iFuice "script mappings"
- Example: SIGMOD test of time award
  $SIGMODPubs := queryTraverse (LDS=DBLP.Conf, {Name="SIGMOD 1995"}, DBLP.ConfPubs)
  $CombinedConfPub := aggregateSame ($SIGMODPubs, GoogleScholar)
  $CleanedPubs := fuseAttributes ($CombinedConfPub)
  $Result := sort ($CleanedPubs, "NoOfCitings")
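The test-of-time script can be replayed step by step in Python. This is a rough sketch that makes the data flow between the four variables visible. It is not an iFuice interpreter. The toy data, the paper names and the citation counts are hypothetical. Two policies are also assumptions: fuseAttributes lets the first source that provides an attribute win, and sort orders by descending citation count.

```python
# Hypothetical toy data standing in for DBLP and Google Scholar.
DBLP_CONFS = {"c95": {"Name": "SIGMOD 1995"}, "c96": {"Name": "SIGMOD 1996"}}
DBLP_CONF_PUBS = {("c95", "p1"), ("c95", "p2"), ("c96", "p3")}
DBLP_PUBS = {"p1": {"Name": "Paper A"}, "p2": {"Name": "Paper B"}, "p3": {"Name": "Paper C"}}
GS_PUBS = {"g1": {"Name": "Paper A", "NoOfCitings": 12},
           "g2": {"Name": "Paper B", "NoOfCitings": 40}}
SAME_DBLP_GS = {("p1", "g1"), ("p2", "g2")}


def query_traverse(instances, conditions, mapping):
    """Select instances matching all conditions, then traverse the mapping."""
    selected = {i for i, attrs in instances.items()
                if all(attrs.get(k) == v for k, v in conditions.items())}
    return {b for a, b in mapping if a in selected}


def aggregate_same(ids, same_mapping, left, right):
    """One AO (list of attribute dicts) per input id plus its same objects."""
    return [[left[a]] + [right[b] for x, b in sorted(same_mapping) if x == a]
            for a in sorted(ids)]


def fuse_attributes(aos):
    """Flatten each AO into one record; the first source providing an attribute wins."""
    result = []
    for ao in aos:
        fused = {}
        for attrs in ao:
            for key, value in attrs.items():
                fused.setdefault(key, value)
        result.append(fused)
    return result


# The four script steps
sigmod_pubs = query_traverse(DBLP_CONFS, {"Name": "SIGMOD 1995"}, DBLP_CONF_PUBS)
combined = aggregate_same(sigmod_pubs, SAME_DBLP_GS, DBLP_PUBS, GS_PUBS)
cleaned = fuse_attributes(combined)
result = sorted(cleaned, key=lambda p: p.get("NoOfCitings", 0), reverse=True)
print([(p["Name"], p.get("NoOfCitings")) for p in result])
# [('Paper B', 40), ('Paper A', 12)]
```

Each step consumes only the previous step's output. This is why, as the slide puts it, a whole script can itself be treated as a mapping from its input to its result.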
15
Example: SIGMOD test of time award
16
Mediator architecture
- Applications (e.g., Bio navigator, Personal Information Manager) use the iFuice mediator in two ways: as script / batch (iFuice script) or interactively (step by step)
- Mediator interface: request / response, offered as a web service or Java library
- iFuice mediator: fusion control unit, duplicate detection, mapping handler, cache for mapping results, and a metadata model that is stored in and loaded from a repository
- Mapping execution service: wraps different mapping implementations (web service, SQL query, Java class); it receives mapping calls and returns mapping results
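The mapping handler, the cache and the mapping execution service fit together roughly as sketched below. The handler dispatches each mapping call to whatever implementation is registered under the mapping's name and caches the result per (mapping, input ids). The class, its methods, the mapping names and the caching key are all hypothetical. Python callables stand in for the web service, SQL and Java class implementations listed on the slide.

```python
class MappingHandler:
    """Hypothetical mapping handler: dispatches calls to registered mapping
    implementations and caches their results."""

    def __init__(self):
        self._impls = {}      # mapping name -> callable(ids) -> correspondences
        self._cache = {}      # (mapping name, frozenset of ids) -> result
        self.executions = 0   # number of calls that actually reached an implementation

    def register(self, name, impl):
        self._impls[name] = impl

    def execute(self, name, ids):
        key = (name, frozenset(ids))
        if key not in self._cache:
            self.executions += 1
            self._cache[key] = frozenset(self._impls[name](key[1]))
        return self._cache[key]


handler = MappingHandler()
conf_pubs_table = {("c1", "p1"), ("c1", "p2"), ("c2", "p3")}
# Stand-in for a materialized table (e.g. behind an SQL query)
handler.register("DBLP.ConfPubs", lambda ids: {(a, b) for a, b in conf_pubs_table if a in ids})
# Stand-in for an on-the-fly implementation (e.g. a web service call)
handler.register("Demo.Service", lambda ids: {(i, "svc-" + i) for i in ids})

first = handler.execute("DBLP.ConfPubs", ["c1"])
again = handler.execute("DBLP.ConfPubs", ["c1"])   # served from the cache
print(sorted(first), handler.executions)  # [('c1', 'p1'), ('c1', 'p2')] 1
```

Caching by the exact input set is the simplest choice but misses overlapping requests. A finer-grained cache per input id would be a natural refinement.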
17
Summary & outlook
- iFuice: a generic approach to dynamic information fusion
  - Based on instance correspondences between P2P sources
  - Mediator-controlled data fusion
  - Two working modes
    - Script mode: powerful operators for information fusion tasks (with source selection or transparent)
    - Explorative mode: navigation in the information space
- Future work
  - Finishing the prototype implementation
  - Different domains, e.g., bioinformatics and e-commerce
  - Tool-supported (semi-)automatic integration of local / private data sources