Data Exchange with Data-Metadata Translations: MAD Algorithm. Paolo Papotti, Mauricio A. Hernández, Wang-Chiew Tan.


1 Data Exchange with Data-Metadata Translations: MAD Algorithm (Paolo Papotti, Mauricio A. Hernández, Wang-Chiew Tan)

2 Data Exchange
Data may be stored in different databases, each with its own schema. We are interested in representing data that conforms to one schema, the source schema S, in terms of another schema, the target schema T.

3 Data Exchange Problem
The major problem in data exchange is designing a translation algorithm between schemas. The source schema S and the target schema T are given a priori, together with the dependencies Σ specified between the two schemas. From these, the algorithm should generate a good mapping in a low-level language (XQuery, XSLT). A good mapping preserves the constraints, the dependencies, and the data.

4 The Data Exchange Problem
A mapping algorithm translates a high-level mapping (dependencies) between the source schema S and the target schema T into a low-level mapping (queries). The question is how to restructure data from a source schema into a target schema according to a given visual specification.

5 Data Exchange Model
A visual specification is a schema representation of the table contents and their relations: structure, root, and constraints. XML (DTD) and relational schemas are both represented in the Nested Relational (NR) model:
Source: Rcd
  Sales: SetOf Rcd
    country, region, style, shipdate, units, price

Source.Sales
country  region  style  shipdate  units  price
USA      East    Tee    12-07     11     1200
USA      East    Elec.  12-07     12     3600
USA      West    Tee    01-08     10     1600
UK       West    Tee    02-08     12     2000

6 Nested Relational (NR) Model
The NR model includes atomic types α, set types SetOf[α], and record types Rcd[α1, …, αk]. For metadata, it also includes dynamic elements and placeholders. Instances must conform to the schema and its constraints. Example schema:
expenseDB: Rcd
  companies: SetOf Rcd
    company: Rcd
      cid, name, city
  grants: SetOf Rcd
    grant: Rcd
      grantee, pi, amount, proj
  projects: SetOf Rcd
    project: Rcd
      name, year
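The NR type constructors can be sketched as a small conformance checker. This is a minimal illustration, not the authors' formalism. The classes `Atomic`, `SetOf` and `Rcd` and the function `conforms` are hypothetical names. Atomic types are limited to strings and integers, a set is encoded as a Python list, and a record as a dict. Placeholders and dynamic elements are not covered.

```python
from dataclasses import dataclass
from typing import Any, Tuple, Union


@dataclass(frozen=True)
class Atomic:
    name: str  # hypothetical: only "String" and "Int" are supported here


@dataclass(frozen=True)
class SetOf:
    elem: "NRType"


@dataclass(frozen=True)
class Rcd:
    fields: Tuple[Tuple[str, "NRType"], ...]  # ordered (label, type) pairs


NRType = Union[Atomic, SetOf, Rcd]
PY_TYPES = {"String": str, "Int": int}  # assumed mapping of atomic types


def conforms(value: Any, t: NRType) -> bool:
    """Return True if a Python value (str/int, list, dict) conforms to NR type t."""
    if isinstance(t, Atomic):
        return isinstance(value, PY_TYPES[t.name])
    if isinstance(t, SetOf):
        # SetOf is encoded as a list; every element must conform
        return isinstance(value, list) and all(conforms(v, t.elem) for v in value)
    if isinstance(t, Rcd):
        # a record must have exactly the declared labels, each conforming
        labels = {label for label, _ in t.fields}
        return (isinstance(value, dict) and set(value) == labels
                and all(conforms(value[label], ft) for label, ft in t.fields))
    raise TypeError(f"not an NR type: {t!r}")


STR, INT = Atomic("String"), Atomic("Int")
SALE = Rcd((("country", STR), ("region", STR), ("style", STR),
            ("shipdate", STR), ("units", INT), ("price", INT)))
SOURCE_SCHEMA = Rcd((("Sales", SetOf(SALE)),))

SOURCE = {"Sales": [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]}

print(conforms(SOURCE, SOURCE_SCHEMA))  # True
```

Encoding records as dicts keeps the example close to the Rcd/SetOf notation. A real system would also track constraints, which this checker ignores.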

7 Mapping Problem: Example
To perform a translation, we must understand how the two schemas correspond to each other. The simplest form of correspondence is the value correspondence: a pair made of a source element and a target element.
Source: Rcd
  Sales: SetOf Rcd
    country, region, style, shipdate, units, price
Target: Rcd
  CountrySales: SetOf Rcd
    country
    Sales: SetOf Rcd
      style, shipdate, units, id

8 Mapping Example
A mapping generation algorithm takes as input the source and target schemas and the correspondences, and outputs a declarative schema mapping. For example, for the Source and Target schemas of slide 7:
  for $s in Source.Sales
  exists $t in Target.CountrySales, $c in $t.Sales
  where $t.country = $s.country and $c.style = $s.style
    and $c.shipdate = $s.shipdate and $c.units = $s.units
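One way to read this for/exists/where mapping operationally is as a construction plus a check. The sketch below is hypothetical and is not Clio's query generator:
- `exchange` builds one target instance that satisfies the mapping. It assumes CountrySales records are grouped by country and leaves the unmapped `id` as None. Both are assumptions, and Clio's actual grouping and value-invention semantics may differ.
- `satisfies` checks the dependency directly.

```python
def exchange(source):
    """Build a Target instance for the slide-8 mapping (assumed grouping by country)."""
    groups = {}  # country -> nested Sales set; dicts preserve insertion order
    for s in source["Sales"]:
        groups.setdefault(s["country"], []).append(
            {"style": s["style"], "shipdate": s["shipdate"],
             "units": s["units"], "id": None})  # id is not mapped: left unknown
    return {"CountrySales": [{"country": c, "Sales": sales}
                             for c, sales in groups.items()]}


def satisfies(source, target):
    """for $s in Source.Sales exists $t in Target.CountrySales, $c in $t.Sales where ..."""
    return all(
        any(t["country"] == s["country"] and
            any(c["style"] == s["style"] and c["shipdate"] == s["shipdate"]
                and c["units"] == s["units"] for c in t["Sales"])
            for t in target["CountrySales"])
        for s in source["Sales"])


SOURCE = {"Sales": [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]}

TARGET = exchange(SOURCE)
print([(t["country"], len(t["Sales"])) for t in TARGET["CountrySales"]])  # [('USA', 3), ('UK', 1)]
print(satisfies(SOURCE, TARGET))  # True
```

The `for` part of the mapping becomes the outer `all`, and each `exists` becomes an `any`. This is why an empty target fails the check.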

9 Nested Relational (NR) Model: Source and Target Schemas
Source schema:
expenseDB: Rcd
  companies: SetOf Rcd
    company: Rcd
      cid, name, city
  grants: SetOf Rcd
    grant: Rcd
      grantee, pi, amount, proj
  projects: SetOf Rcd
    project: Rcd
      name, year
Target schema:
statDB: SetOf Rcd
  cityStat: Rcd
    orgs: SetOf Rcd
      org: Rcd
        cid, name
        fundings: SetOf Rcd
          funding: Rcd
            pi, aid
    financials: SetOf Rcd
      financial: Rcd
        aid, amount, proj, year

10 Linked Translations
(Figure: the expenseDB and statDB schemas of slide 9.)

11 Semantic Translation Model
Before performing schema mapping, the correspondences should be interpreted semantically within the source and target schemas, using primary paths, constraints, and logical relations. Semantic associations (illustrated on the expenseDB schema) are represented in two ways:
- by the organization of attributes into tables (records);
- by foreign-key dependencies that associate attributes in different tables.

12 Primary Path Translation
(Figure: the expenseDB and statDB schemas of slide 9.)

13 Constraints Translation
The constraints on expenseDB are nested referential integrity constraints (NRIs). Each constraint has the form
  for P1 exists P2 where B
where P1 and P2 are the bodies of primary paths and B is an equality condition relating the two paths.
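An NRI can be checked on an instance by direct transcription of "for P1 exists P2 where B". This is a minimal sketch under assumptions:
- The function `holds_nri` and the sample `EXPENSE_DB` values are invented for illustration.
- For simplicity, P2 does not depend on the variable bound by P1.
- The constraint encoded is r1 (each grant's grantee is a company cid), as stated later on slide 15.

```python
def holds_nri(db, p1, p2, b):
    """Check 'for x in P1 exists y in P2 where B(x, y)' on instance db.

    p1, p2: functions returning the set each variable ranges over.
    b: the equality condition B, as a predicate on (x, y).
    Simplification (assumption): P2 does not depend on the binding of x.
    """
    return all(any(b(x, y) for y in p2(db)) for x in p1(db))


# Hypothetical expenseDB instance (values invented for illustration).
EXPENSE_DB = {
    "companies": [{"company": {"cid": "c1", "name": "Acme", "city": "Austin"}}],
    "grants": [{"grant": {"grantee": "c1", "pi": "Smith", "amount": 100, "proj": "P1"}}],
    "projects": [{"project": {"name": "P1", "year": 2007}}],
}

# r1: for g in expenseDB.grants exists c in expenseDB.companies
#     where c.company.cid = g.grant.grantee
R1 = (lambda db: db["grants"],
      lambda db: db["companies"],
      lambda g, c: c["company"]["cid"] == g["grant"]["grantee"])

print(holds_nri(EXPENSE_DB, *R1))  # True

# Adding a grant whose grantee is not a known company violates r1.
DANGLING = dict(EXPENSE_DB, grants=EXPENSE_DB["grants"] + [
    {"grant": {"grantee": "c9", "pi": "Lee", "amount": 50, "proj": "P1"}}])
print(holds_nri(DANGLING, *R1))  # False
```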

14 Constraints Translation
Target schema constraints translation, on the statDB schema of slide 9.

15 Logical Relation
A logical relation is the result of chasing a primary path of a schema with the schema's NRIs. The chase is a relational method that enumerates the logical joins implied by the dependencies of a schema. Chasing the primary path S2 with the constraint r1:
  S2:  select * from g in expenseDB.grants
  r1:  for g in expenseDB.grants exists c in expenseDB.companies where c.company.cid = g.grant.grantee
yields
  S'2: select * from g in expenseDB.grants, c in expenseDB.companies where c.company.cid = g.grant.grantee

16 Logical Relation
The chase links all attributes that are related through the constraints. Chasing S'2 further with r2:
  r2:   for g in expenseDB.grants exists p in expenseDB.projects where p.project.name = g.grant.proj
  S'2:  select * from g in expenseDB.grants, c in expenseDB.companies where c.company.cid = g.grant.grantee
yields
  S''2: select * from g in expenseDB.grants, c in expenseDB.companies, p in expenseDB.projects
        where c.company.cid = g.grant.grantee and g.grant.proj = p.project.name
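The two chase steps above can be mimicked on a concrete instance by repeatedly joining in the set required by each NRI. This is a simplified sketch under assumptions:
- `chase_step` and the sample data are hypothetical.
- Each NRI is applied once, in the order r1 then r2, with inner-join semantics matching the select queries.
- A full chase repeats until no constraint applies and must guard against non-termination, which this sketch does not attempt.

```python
def chase_step(bindings, var, candidates, cond):
    """Add 'var in candidates' plus its where-condition to a logical relation.

    bindings: list of dicts mapping variable names to records.
    Returns the joined bindings (inner-join semantics, as in the select queries).
    """
    return [dict(b, **{var: y}) for b in bindings for y in candidates if cond(b, y)]


# Hypothetical expenseDB instance (values invented for illustration).
EXPENSE_DB = {
    "companies": [{"company": {"cid": "c1", "name": "Acme", "city": "Austin"}},
                  {"company": {"cid": "c2", "name": "Beta", "city": "Boston"}}],
    "grants": [{"grant": {"grantee": "c1", "pi": "Smith", "amount": 100, "proj": "P1"}},
               {"grant": {"grantee": "c2", "pi": "Lee", "amount": 50, "proj": "P2"}}],
    "projects": [{"project": {"name": "P1", "year": 2007}},
                 {"project": {"name": "P2", "year": 2008}}],
}

# S2: select * from g in expenseDB.grants
S2 = [{"g": g} for g in EXPENSE_DB["grants"]]
# S'2: chase with r1 (c.company.cid = g.grant.grantee)
S2_R1 = chase_step(S2, "c", EXPENSE_DB["companies"],
                   lambda b, c: c["company"]["cid"] == b["g"]["grant"]["grantee"])
# S''2: chase with r2 (p.project.name = g.grant.proj)
S2_R1_R2 = chase_step(S2_R1, "p", EXPENSE_DB["projects"],
                      lambda b, p: p["project"]["name"] == b["g"]["grant"]["proj"])

for row in S2_R1_R2:
    print(row["g"]["grant"]["pi"], row["c"]["company"]["name"], row["p"]["project"]["year"])
```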

17 Logical Relations in our Example
All logical relations for the target and source schemas:
  T1: select * from s in statDB
  T2: select * from s in statDB, o in s.cityStat.orgs
  T3: select * from s in statDB, o in s.cityStat.orgs, f in o.org.fundings, i in s.cityStat.financials
      where i.financial.aid = f.funding.aid
  T4: select * from s in statDB, f in s.cityStat.financials
  S1: select * from c in expenseDB.companies
  S2: select * from g in expenseDB.grants, c in expenseDB.companies, p in expenseDB.projects
      where c.company.cid = g.grant.grantee and p.project.name = g.grant.proj
  S3: select * from p in expenseDB.projects
S2 is obtained by chasing with r1 and r2; T3 is obtained by chasing with r3.

18 Mapping Algorithm
Value correspondences between the source and target schemas (v1, v2, v3, drawn between the expenseDB and statDB schemas of slide 9) can be interpreted as simple referential constraints. For example, v1 uses the primary paths S1 from the source and T2 from the target:
  v1: for c in expenseDB.companies
      exists s in statDB, o in s.cityStat.orgs
      where c.company.name = o.org.name
  v2: for g in expenseDB.grants
      exists s in statDB, o in s.cityStat.orgs, f in o.org.fundings
      where g.grant.pi = f.funding.pi

19 Clio Mapping

20 Basic Data Exchange Mapping

21 Data-Metadata Translations
Data exchange scenarios may involve metadata transformations. Mapping systems support only data-to-data transformations with fixed schemas. Goal: extend mapping systems to support data-metadata translations.

22 Metadata-to-Data
How can we transform the following source data into the corresponding target?
Source: Rcd
  Sales: SetOf Rcd
    month, USA, UK, Italy
Target: Rcd
  Sales: SetOf Rcd
    month, country, units

Source.Sales
month  USA  UK   Italy
Jan    120  223  89
Feb    83   168  56

Target.Sales
month  country  units
Jan    USA      120
Jan    UK       223
Jan    Italy    89
Feb    USA      83
Feb    UK       168
Feb    Italy    56

Schema mapping m1 handles the "USA" column:
  m1: for $s in Source.Sales
      exists $t in Target.Sales
      where $t.month = $s.month and $t.country = "USA" and $t.units = $s.USA

23 Metadata-to-Data (continued)
For the same source and target instances, schema mapping m2 handles the "UK" column:
  m2: for $s in Source.Sales
      exists $t in Target.Sales
      where $t.month = $s.month and $t.country = "UK" and $t.units = $s.UK

24 Metadata-to-Data (continued)
Schema mapping m3 handles the "Italy" column:
  m3: for $s in Source.Sales
      exists $t in Target.Sales
      where $t.month = $s.month and $t.country = "Italy" and $t.units = $s.Italy

25 Metadata-to-Data: Our Solution
In the source schema, the user selects the elements to group (USA, UK, Italy) under a placeholder <<countries>>, which has a label and a value. Copying the elements' labels populates Target.Sales.country. Copying the elements' values populates Target.Sales.units. The result is the same Target.Sales instance as on slide 22. The resulting MetadatA-Data (MAD) mapping is:
  for $s in Source.Sales, $c in {"USA", "UK", "Italy"}
  exists $t in Target.Sales
  where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
Here $c ranges over a set of labels (strings), and $t.country receives a label as a value. The expression $s.($c) dynamically selects the source element whose label is $c.
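The MAD mapping above can be executed as a nested loop in which the placeholder variable iterates over labels rather than values. This is an illustrative sketch, not the system's generated query. The function name `mad_metadata_to_data` is hypothetical, and the label set is hard-coded to the user's selection on this slide.

```python
def mad_metadata_to_data(source, labels=("USA", "UK", "Italy")):
    """for $s in Source.Sales, $c in {"USA", "UK", "Italy"}
       exists $t in Target.Sales
       where $t.month = $s.month and $t.country = $c and $t.units = $s.($c)
    """
    target = []
    for s in source["Sales"]:
        for c in labels:  # $c ranges over labels (metadata), not values
            target.append({"month": s["month"],
                           "country": c,      # a label copied as data
                           "units": s[c]})    # $s.($c): dynamic element selection
    return {"Sales": target}


SOURCE = {"Sales": [{"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
                    {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56}]}

for t in mad_metadata_to_data(SOURCE)["Sales"]:
    print(t["month"], t["country"], t["units"])
```

Calling it with `labels=("USA",)` reproduces mapping m1 alone. This shows how one placeholder-based mapping subsumes the three fixed mappings m1, m2 and m3.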

26 Data-to-Metadata
Now we want to support the opposite operation, where the target schema depends on the source data. We define a target template, a nested dynamic output schema (ndos), that contains a dynamic element:
Source: Rcd
  StockTicker: SetOf Rcd
    time, symbol, price
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    <<symbols>> (label, value)   [dynamic element]
At run time, the dynamic element determines both the target instance and the target schema.

27 Data-to-Metadata: Heterogeneous Records
Consider the StockTicker-to-Stockquotes mapping of slide 26 and this source instance:
  StockTicker(time: 0900, symbol: MSFT, price: 27.20)
  StockTicker(time: 0900, symbol: IBM, price: 120.00)
  StockTicker(time: 0905, symbol: MSFT, price: 27.30)
There are two possible interpretations of the target ndos. First alternative: heterogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time
    symbols: Choice
      MSFT, IBM
Computed target instance:
  Stockquotes(time: 0900, MSFT: 27.20)
  Stockquotes(time: 0900, IBM: 120.00)
  Stockquotes(time: 0905, MSFT: 27.30)
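The heterogeneous interpretation can be sketched as a pivot in which each record's `symbol` value becomes a label of its own target record. The computed schema is the set of labels seen, in first-seen order. This is a hypothetical illustration: `pivot_heterogeneous` is not from the slides, and records are encoded as dicts.

```python
def pivot_heterogeneous(ticker):
    """Data-to-metadata with heterogeneous records: each symbol value becomes a label."""
    # Computed target schema: the alternatives of the Choice, in first-seen order
    schema_labels = list(dict.fromkeys(r["symbol"] for r in ticker))
    # One target record per source record, carrying only its own symbol label
    records = [{"time": r["time"], r["symbol"]: r["price"]} for r in ticker]
    return schema_labels, records


TICKER = [{"time": "0900", "symbol": "MSFT", "price": 27.20},
          {"time": "0900", "symbol": "IBM", "price": 120.00},
          {"time": "0905", "symbol": "MSFT", "price": 27.30}]

LABELS, QUOTES = pivot_heterogeneous(TICKER)
print(LABELS)  # ['MSFT', 'IBM']
for q in QUOTES:
    print(q)
```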

28 Data-to-Metadata: Homogeneous Records
For the same mapping and source instance, the second alternative is homogeneous target records.
Computed target schema:
Target: Rcd
  Stockquotes: SetOf Rcd
    time, MSFT, IBM
Computed target instance:
  Stockquotes(time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes(time: 0900, MSFT: null, IBM: 120.00)
  Stockquotes(time: 0905, MSFT: 27.30, IBM: null)
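The homogeneous interpretation needs the full label set before any record is emitted. It therefore takes two passes: first compute the target schema, then fill every record, using None for null. This is a hypothetical sketch with the same dict encoding as before; `pivot_homogeneous` is an illustrative name.

```python
def pivot_homogeneous(ticker):
    """Data-to-metadata with homogeneous records: every record gets every label."""
    # Pass 1: the computed target schema depends on the source data
    labels = list(dict.fromkeys(r["symbol"] for r in ticker))
    # Pass 2: one record per source tuple, null (None) where the label is absent
    records = []
    for r in ticker:
        rec = {"time": r["time"]}
        for label in labels:
            rec[label] = r["price"] if label == r["symbol"] else None
        records.append(rec)
    return labels, records


TICKER = [{"time": "0900", "symbol": "MSFT", "price": 27.20},
          {"time": "0900", "symbol": "IBM", "price": 120.00},
          {"time": "0905", "symbol": "MSFT", "price": 27.30}]

LABELS, QUOTES = pivot_homogeneous(TICKER)
for q in QUOTES:
    print(q)
```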

29 Data-to-Metadata: Homogeneous Records
Homogeneous records are the natural solution for the relational data model:
  Stockquotes(time: 0900, MSFT: 27.20, IBM: null)
  Stockquotes(time: 0900, MSFT: null, IBM: 120.00)
  Stockquotes(time: 0905, MSFT: 27.30, IBM: null)
They satisfy the homogeneity constraint: "for every pair of tuples t1 and t2, if a is a label in t1, then a is a label in t2".
  for $t1 in Target.Stockquotes, $t2 in Target.Stockquotes, $a in dom($t1)
  exists $a' in dom($t2)
  where $a = $a'
Heterogeneous records are the natural solution for semi-structured data models (XSD, DTD, JSON):
  Stockquotes(time: 0900, MSFT: 27.20)
  Stockquotes(time: 0900, IBM: 120.00)
  Stockquotes(time: 0905, MSFT: 27.30)
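The homogeneity constraint can be checked by transcribing it literally: for every pair of records and every label of the first, the label must appear in the second. This is a minimal sketch in which `is_homogeneous` is a hypothetical name and `dom` is taken to be the set of dict keys.

```python
def is_homogeneous(records):
    """for $t1, $t2 in records, $a in dom($t1) exists $a' in dom($t2) where $a = $a'."""
    # dom($t) is modeled as the keys of the record dict
    return all(a in t2 for t1 in records for t2 in records for a in t1)


RELATIONAL = [{"time": "0900", "MSFT": 27.20, "IBM": None},
              {"time": "0900", "MSFT": None, "IBM": 120.00},
              {"time": "0905", "MSFT": 27.30, "IBM": None}]
SEMI_STRUCTURED = [{"time": "0900", "MSFT": 27.20},
                   {"time": "0900", "IBM": 120.00},
                   {"time": "0905", "MSFT": 27.30}]

print(is_homogeneous(RELATIONAL))       # True
print(is_homogeneous(SEMI_STRUCTURED))  # False
```

The literal transcription is quadratic in the number of records. An equivalent linear check compares each record's key set with that of the first record.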

30 Data-to-Metadata Mapping
The source is the Source.Sales instance of slide 5:
Source: Rcd
  Sales: SetOf Rcd
    country, region, style, shipdate, units, price
The target schema is an ndos with two nested dynamic elements:
Target: Rcd
  ByShipdateCountry: SetOf Choice
    <<dates>>: label1, value1: Rcd
      <<countries>>: label2, value2: SetOf Rcd
        style, units, price
The target instance groups the source records by shipdate (label1) and then by country (label2). Its leaf (style, units, price) records are (Tee, 11, 1200), (Elec., 12, 3600), (Tee, 10, 1600), and (Tee, 12, 2000).
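The nested data-to-metadata target can be sketched as a two-level grouping in which shipdate values become first-level labels and country values become second-level labels. This is a hypothetical illustration under three assumptions:
- there is one Choice element per distinct shipdate;
- each Choice is encoded as a one-entry dict {label: value};
- `region` is dropped because the target has no place for it.

```python
def by_shipdate_country(sales):
    """Group Sales by shipdate (label1), then by country (label2).

    Returns SetOf Choice as a list of one-entry dicts {date: {country: [...]}}.
    """
    grouped = {}
    for s in sales:
        countries = grouped.setdefault(s["shipdate"], {})  # value1: Rcd keyed by country labels
        countries.setdefault(s["country"], []).append(     # value2: SetOf Rcd
            {"style": s["style"], "units": s["units"], "price": s["price"]})
    return [{date: countries} for date, countries in grouped.items()]


SALES = [
    {"country": "USA", "region": "East", "style": "Tee", "shipdate": "12-07", "units": 11, "price": 1200},
    {"country": "USA", "region": "East", "style": "Elec.", "shipdate": "12-07", "units": 12, "price": 3600},
    {"country": "USA", "region": "West", "style": "Tee", "shipdate": "01-08", "units": 10, "price": 1600},
    {"country": "UK", "region": "West", "style": "Tee", "shipdate": "02-08", "units": 12, "price": 2000},
]

for choice in by_shipdate_country(SALES):
    print(choice)
```

Because each date record carries only the countries that occur on that date, this encoding follows the heterogeneous interpretation of slide 27.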

31 MAD Mapping
A MetadatA-Data (MAD) mapping is generated in three steps.
1. Tableaux: the set of logical relations of the source and target schemas, with expressions extended to placeholders and dynamic elements. The tableau for a placeholder such as <<countries>> includes both the placeholder's metadata label and its value.
Example:
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    <<countries>> (label, value): USA, UK, Italy
Target: Rcd
  Sales: SetOf Rcd
    month, country, units
Source tableau:
  { $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
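The source tableau can be read as a generator of variable bindings. $x1 ranges over records, $x2 ranges over the labels grouped by the placeholder, and $x3 is computed from the two. This is an illustrative sketch: `source_tableau` is a hypothetical name, and the label set is assumed to be the user's selection (USA, UK, Italy).

```python
from typing import Dict, Iterator


def source_tableau(source, countries=("USA", "UK", "Italy")) -> Iterator[Dict]:
    """Enumerate { $x1 in Source.SalesByCountries, $x2 in <<countries>>; $x3 := $x1.($x2) }."""
    for x1 in source["SalesByCountries"]:
        for x2 in countries:                          # x2 is bound to a label (metadata)
            yield {"x1": x1, "x2": x2, "x3": x1[x2]}  # x3 is the value under that label


SOURCE = {"SalesByCountries": [{"month": "Jan", "USA": 120, "UK": 223, "Italy": 89},
                               {"month": "Feb", "USA": 83, "UK": 168, "Italy": 56}]}

for b in source_tableau(SOURCE):
    print(b["x1"]["month"], b["x2"], b["x3"])
```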

32 MAD Mapping
2. Skeletons: an n × m matrix of skeletons is constructed from the n source tableaux and the m target tableaux. Each entry (i, j) is a potential mapping.
3. Creating MAD mappings: the value correspondences are matched against the tableaux in order to factor them into the appropriate skeletons. Consider a correspondence such as
  Source.Sales.country → Target.CountrySales.country
Its source path is matched against one or more source tableaux, and its target path against one or more target tableaux.

33 MAD Mapping
A correspondence path p1 matches an absolute path p2 of a tableau if p2 is a prefix of p1. Once a match has been found, the longest possible matching prefix of the correspondence path is replaced with the corresponding tableau variable. Take the schemas of slide 7, the source tableau { $x ∈ Source.Sales }, and the target tableau { $y0 ∈ Target.CountrySales, $y1 ∈ $y0.Sales }. The correspondence
  Source.Sales.style → Target.CountrySales.Sales.style
becomes the condition
  $x.style = $y1.style
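Prefix matching and skeleton construction can be sketched together. Each tableau maps its variables to absolute paths. A correspondence contributes an equality to skeleton (i, j) only if both of its paths match. This is a simplified, hypothetical illustration:
- `rewrite` and `skeletons` are invented names.
- Placeholders are not handled.
- The real algorithm also prunes redundant skeletons, which is omitted here.

```python
def rewrite(path, tableau):
    """Replace the longest tableau-path prefix of `path` with its variable.

    tableau maps each variable to its absolute path; returns None if no
    variable's path is a prefix of `path`.
    """
    best = None
    for var, abs_path in tableau.items():
        if path.startswith(abs_path + "."):  # p2 is a (proper) prefix of p1
            if best is None or len(abs_path) > len(tableau[best]):
                best = var
    return None if best is None else best + path[len(tableau[best]):]


def skeletons(source_tableaux, target_tableaux, correspondences):
    """n x m matrix: entry (i, j) collects the correspondences both tableaux cover."""
    matrix = {}
    for i, st in enumerate(source_tableaux):
        for j, tt in enumerate(target_tableaux):
            eqs = []
            for src, tgt in correspondences:
                lhs, rhs = rewrite(src, st), rewrite(tgt, tt)
                if lhs is not None and rhs is not None:
                    eqs.append(f"{lhs} = {rhs}")
            matrix[(i, j)] = eqs
    return matrix


SOURCE_TABLEAUX = [{"$x": "Source.Sales"}]
TARGET_TABLEAUX = [{"$y0": "Target.CountrySales", "$y1": "Target.CountrySales.Sales"}]
CORRESPONDENCES = [("Source.Sales.country", "Target.CountrySales.country"),
                   ("Source.Sales.style", "Target.CountrySales.Sales.style")]

print(skeletons(SOURCE_TABLEAUX, TARGET_TABLEAUX, CORRESPONDENCES))
# {(0, 0): ['$x.country = $y0.country', '$x.style = $y1.style']}
```

The longest-prefix rule is what sends `Target.CountrySales.Sales.style` to `$y1` rather than `$y0`, matching the slide's example.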

34 MAD Mapping Generation Example
Source: Rcd
  SalesByCountries: SetOf Rcd
    month
    <<countries>> (label, value): USA, UK, Italy
Target: Rcd
  Sales: SetOf Rcd
    month, country, units
Correspondences, where <<countries>> denotes the placeholder's label and &<<countries>> its value:
  Source.SalesByCountries.<<countries>> → Target.Sales.country
  Source.SalesByCountries.&<<countries>> → Target.Sales.units
Source tableau: { $x1 ∈ Source.SalesByCountries, $x2 ∈ <<countries>>; $x3 := $x1.($x2) }
Target tableau: { $y1 ∈ Target.Sales }
Generated MAD mapping:
  for $x1 in Source.SalesByCountries, $x2 in <<countries>>
  exists $y1 in Target.Sales
  where $y1.month = $x1.month and $y1.country = $x2 and $y1.units = $x1.($x2)

35 MAD Mapping Generation
For the data-to-metadata scenario of slide 30, the generated MAD mapping is:
  for $s in Source.Sales
  exists $t in Target.ByShipdateCountry, $y in <<dates>>, $u in case $t of $y,
         $z in <<countries>>, $v in $u.($z)
  where $y = $s.shipdate and $z = $s.country
    and $v.style = $s.style and $v.units = $s.units and $v.price = $s.price
Starting from what we get from Clio [PVMHF 02], the steps are:
1. Modify the schemas with dynamic placeholders.
2. Compile the mappings and match the correspondences.

36 Formal MAD Algorithm

37 Formal MAD Algorithm

38 MAD vs Clio
Pipeline: the source schema S and target schema T, specified through a GUI, are turned into a declarative (internal) representation. Executable code (XSLT, XQuery, Java) is then generated from that representation. Data exchange with data-metadata support adds:
- a new construct, the placeholder, to iterate over elements' labels;
- target schemas that can be incomplete: nested dynamic output schemas (ndos);
- new constructs in the mapping language;
- new mapping and query generation algorithms, including a query that generates the target schema.
Data-to-data exchange is a special case.

39 Some Related Work
There is a lot of related work in the relational setting:
- FIRA/FISQL [Wyss, Robertson 2005] includes an excellent survey.
- SchemaSQL [Lakshmanan, Sadri, Subramanian 1996] and FIRA/FISQL [Wyss, Robertson 2005] extend SQL to handle metadata as data. They support only relational dynamic output schemas. They define a language and its semantics, with NO transformations generated from a GUI.
- Work on checking whether the chase terminates or loops infinitely.

40 Thank you.

