


Muse: A System for Understanding and Designing Mappings
Bogdan Alexe, Laura Chiticariu, Renée J. Miller, Daniel Pepper, Wang-Chiew Tan
UC Santa Cruz, U. of Toronto, UC Santa Cruz

Motivation

Schema mapping = relationship between a source database schema and a target database schema.
Designing a schema mapping is a fundamental problem in information integration.
Specifying a semantically correct schema mapping is usually a complex task.
Automatic tools can suggest potential mappings.
Ensuring mapping correctness still requires intricate manual work.
Few tools are available for helping a designer understand and design alternative mappings.

Muse Overview

Muse is a mapping design wizard that uses data examples to help designers understand, design and refine schema mappings.
In Muse, the designer works with data examples rather than with complex specifications to understand the semantics of a mapping.
Muse uses real data examples whenever possible; otherwise it constructs synthetic examples.
Muse consists of two components: Muse-G (design of desired nesting semantics for mappings) and Muse-D (choosing the desired interpretation of ambiguous mappings).

Choosing Desired Mapping Interpretation with Muse-D

Source schema:
  CompDB: Rcd
    Projects: Set of Project: Rcd
      pid, pname, manager, tech-lead
    Employees: Set of Employee: Rcd
      eid, ename, contact

Target schema:
  OrgDB: Rcd
    Projects: Set of Project: Rcd
      pname, supervisor

Mapping ma:
  for p in CompDB.Projects, e1 in CompDB.Employees, e2 in CompDB.Employees
  satisfy e1.eid=p.manager and e2.eid=p.tech-lead
  exists p1 in OrgDB.Projects
  where p.pname=p1.pname
    and (e1.ename=p1.supervisor or e2.ename=p1.supervisor)
    and (e1.contact=p1. or e2.contact=p1. )

Ambiguous mapping: the mapping above is ambiguous, i.e. it can be interpreted in several ways. For example, the project supervisor can be either the manager or the tech-lead. In total, there are four alternative interpretations.

Key idea of Muse-D: provide an example source instance to illustrate the four interpretations in a compact way.

Example source:
  Projects:   P1  DB  e4  e5
  Employees:  e4  John
              e5  Anna

Target instance:
  Orgs: Projects:  DB  John
                       Anna

Choice values for supervisor and (the designer makes one selection for each attribute).

Designing Nesting Semantics with Muse-G

Source schema:
  CompDB: Rcd
    Companies: Set of Company: Rcd
      cid, cname, location
    Projects: Set of Project: Rcd
      pid, pname, cid, manager
    Employees: Set of Employee: Rcd
      eid, ename, contact

Target schema:
  OrgDB: Rcd
    Orgs: Set of Org: Rcd
      oname
      Projects: Set of Project: Rcd
        pname, manager
    Employees: Set of Employee: Rcd
      eid, ename

Mapping m1:
  for c in CompDB.Companies
  exists o in OrgDB.Orgs
  where c.cname=o.oname and o.Projects = SKProjs(c.cid,c.cname,c.location)

Mapping m2:
  for c in CompDB.Companies, p in CompDB.Projects, e in CompDB.Employees
  satisfy p.cid=c.cid and e.eid=p.manager
  exists o in OrgDB.Orgs, p1 in o.Projects, e1 in OrgDB.Employees
  satisfy p1.manager=e1.eid
  where c.cname=o.oname and e.eid=e1.eid and e.ename=e1.ename
    and p.pname=p1.pname and o.Projects = SKProjs( )

Mapping m3:
  for e in CompDB.Employees
  exists e1 in OrgDB.Employees
  where e.eid=e1.eid and e.ename=e1.ename

Nesting semantics are expressed through grouping functions, which are defined for each nested set in the target schema. A grouping function is a form of Skolem function, with atomic attributes as parameters.

Example grouping function from mapping m2: SKProjs( ). Target Project records are grouped according to the values of all attributes of the Company, Project and Employee source records.

Example: designing the grouping function for the target Projects set.
Suppose the set of possible arguments is S = {cid, cname, location}. Muse-G probes every attribute in S. At each probe, a small, carefully chosen source instance is considered, from which two differentiating target instances are obtained: one includes the probed attribute in the grouping function (Scenario 1), and the other omits it (Scenario 2).

Step 1: Probing on the cid attribute

  Example source:
    Companies:  11  IBM  NY
                12  IBM  NY
    Projects:   P1  DB   11  e4
                P2  Web  12  e5
    Employees:  e4  John  x234
                e5  Anna  x888

  Scenario 1 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(11,y)   DB   e4
             IBM  Projects:SK(12,y)   Web  e5
      Employees:  e4  John
                  e5  Anna

  Scenario 2 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(y)   DB   e4
                                   Web  e5
      Employees:  e4  John
                  e5  Anna

  y subset of {IBM,NY}
  The designer chooses Scenario 2 (excludes cid from the grouping function).

Step 2: Probing on the cname attribute

  Example source:
    Companies:  11  IBM  NY
                14  SBC  NY
    Projects:   P1  DB    11  e4
                P4  WiFi  14  e6
    Employees:  e4  John  x234
                e6  Kat   x331

  Scenario 1 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(IBM,y)   DB    e4
             SBC  Projects:SK(SBC,y)   WiFi  e6
      Employees:  e4  John
                  e6  Kat

  Scenario 2 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(y)   DB    e4
                                   WiFi  e6
             SBC  Projects:SK(y)   DB    e4
                                   WiFi  e6
      Employees:  e4  John
                  e6  Kat

  y subset of {NY}
  The designer chooses Scenario 1 (includes cname in the grouping function).

Step 3: Probing on the location attribute

  Example source:
    Companies:  11  IBM  NY
                13  IBM  SF
    Projects:   P1  DB   11  e4
                P2  Web  13  e5
    Employees:  e4  John  x234
                e5  Anna  x888

  Scenario 1 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(IBM,NY)   DB   e4
             IBM  Projects:SK(IBM,SF)   Web  e5
      Employees:  e4  John
                  e5  Anna

  Scenario 2 (target instance):
    OrgDB
      Orgs:  IBM  Projects:SK(IBM)   DB   e4
                                     Web  e5
      Employees:  e4  John
                  e5  Anna

  The designer chooses Scenario 2 (excludes location from the grouping function).

Conclusion: the desired grouping function for Projects is SK(cname).

Extensions

Muse-G can take advantage of constraints on the source schema (such as keys, and more generally, functional dependencies).
The designer can refine the desired nesting semantics incrementally.
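
The three Muse-G probes on cid, cname and location can be replayed with a short Python sketch. This is not Muse's implementation: the function name orgs_under_grouping, the helper rows, the dictionary encoding of records, and the use of a plain tuple as the Skolem term are all assumptions made for illustration. The sketch evaluates only the join of mapping m2 over the Company attributes in S, and it fixes y to the attributes already accepted in earlier steps (none for Steps 1 and 2, cname for Step 3).

```python
from collections import defaultdict


def rows(fields, *tuples):
    """Hypothetical helper: turn value tuples into records keyed by field name."""
    return [dict(zip(fields.split(), t)) for t in tuples]


def orgs_under_grouping(companies, projects, employees, group_by):
    """Join Companies, Projects and Employees as in m2 and nest the target
    Projects under a set id built from the group_by attributes (stand-in for SK)."""
    org_records = set()              # (oname, set_id); a set removes duplicates
    project_sets = defaultdict(set)  # set_id -> {(pname, manager)}
    for c in companies:
        for p in projects:
            if p["cid"] != c["cid"]:
                continue
            for e in employees:
                if e["eid"] != p["manager"]:
                    continue
                set_id = tuple(c[a] for a in group_by)
                org_records.add((c["cname"], set_id))
                project_sets[set_id].add((p["pname"], e["eid"]))
    return sorted((oname, sorted(project_sets[sid])) for oname, sid in org_records)


# Source instances copied from the three probes in the example.
probe_cid = (
    rows("cid cname location", (11, "IBM", "NY"), (12, "IBM", "NY")),
    rows("pid pname cid manager", ("P1", "DB", 11, "e4"), ("P2", "Web", 12, "e5")),
    rows("eid ename contact", ("e4", "John", "x234"), ("e5", "Anna", "x888")),
)
probe_cname = (
    rows("cid cname location", (11, "IBM", "NY"), (14, "SBC", "NY")),
    rows("pid pname cid manager", ("P1", "DB", 11, "e4"), ("P4", "WiFi", 14, "e6")),
    rows("eid ename contact", ("e4", "John", "x234"), ("e6", "Kat", "x331")),
)
probe_location = (
    rows("cid cname location", (11, "IBM", "NY"), (13, "IBM", "SF")),
    rows("pid pname cid manager", ("P1", "DB", 11, "e4"), ("P2", "Web", 13, "e5")),
    rows("eid ename contact", ("e4", "John", "x234"), ("e5", "Anna", "x888")),
)

# Each entry: (probe data, grouping with the probed attribute, grouping without it).
probes = [
    (probe_cid, ("cid",), ()),
    (probe_cname, ("cname",), ()),
    (probe_location, ("cname", "location"), ("cname",)),
]
for data, with_attr, without_attr in probes:
    print("Scenario 1:", orgs_under_grouping(*data, group_by=with_attr))
    print("Scenario 2:", orgs_under_grouping(*data, group_by=without_attr))
```

Under these assumptions, each pair of printed results differs exactly in how Project records are split across Org records, which is the difference the designer is asked to judge at each probe.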

