Published by Shanna McDaniel. Modified over 10 years ago.
1 Matching and Reuse of XML Schemas
2 Sample XML Schema
3 What is XML schema matching
Matching: identifying the relations among the corresponding elements of two schemas, e.g.
  customer/firstName corresponds to client/name/first
  customer/name corresponds to concatenate(client/name/first, client/name/last)
Calculating the distance between two schemas, e.g. the distance between customer.xsd and client.xsd is 0.67.
4 Why XML Schema matching
From the data integration point of view:
  Purpose: automatically identify corresponding elements between two schemas
  Relevant works:
    Database schema matching/mapping, e.g., A. Doan, et al. Reconciling schemas of disparate data sources: A machine-learning approach. SIGMOD, 2001.
    Generic schema mapping, e.g., J. Madhavan, P. A. Bernstein, E. Rahm. Generic schema matching with Cupid. VLDB, 2001.
    XML Schema matching, e.g., H. Do, E. Rahm. COMA: A system for flexible combination of schema matching approaches. VLDB, 2002.
From the web service composition point of view:
  e.g., matching the output type of one service with the input of another in sequential composition
From the software reuse point of view:
  Purpose: build XML Schema categories and search engines
  Relevant works:
    Software component search: A. Mili, R. Mili, R. T. Mittermeir. A survey of software reuse libraries. Annals of Software Engineering, 1998.
    Agent and service matching: Katia Sycara, Jianguo Lu, Matthias Klusch. Interoperability among Heterogeneous Software Agents on the Internet. Technical Report CMU-RI-TR-98-22, CMU.
5 What are the problems
Modelling
  As graph
  As tree
Matching
  Node similarity: name, type, cardinality
  Structure similarity: tree edit distance
    K. Zhang, D. Shasha. Simple fast algorithms for the editing distance between trees and related problems. SIAM Journal on Computing, 1989.
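The tree edit distance named on this slide can be illustrated with a minimal sketch. It uses the plain recursive definition of the distance between ordered forests with memoization, not the optimized dynamic program of Zhang and Shasha, and it assumes unit costs for insert, delete, and relabel. The `tree` helper and the example labels are illustrative only.

```python
from functools import lru_cache


def tree(label, *children):
    """Build an immutable ordered tree as (label, (child, child, ...))."""
    return (label, tuple(children))


@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Edit distance between two ordered forests (tuples of trees), unit costs."""
    if not f and not g:
        return 0
    if not f:
        # Insert the rightmost root of g; its children take its place.
        _, kids = g[-1]
        return forest_distance(f, g[:-1] + kids) + 1
    if not g:
        # Delete the rightmost root of f; its children take its place.
        _, kids = f[-1]
        return forest_distance(f[:-1] + kids, g) + 1
    (label_f, kids_f), (label_g, kids_g) = f[-1], g[-1]
    delete = forest_distance(f[:-1] + kids_f, g) + 1
    insert = forest_distance(f, g[:-1] + kids_g) + 1
    # Map the two rightmost roots onto each other, relabeling if needed.
    match = (
        forest_distance(kids_f, kids_g)
        + forest_distance(f[:-1], g[:-1])
        + (0 if label_f == label_g else 1)
    )
    return min(delete, insert, match)


def tree_edit_distance(t1, t2):
    return forest_distance((t1,), (t2,))


# Illustrative trees: the same children in a different order.
name_a = tree("name", tree("first"), tree("last"))
name_b = tree("name", tree("last"), tree("first"))
```

Because the trees are ordered, swapping two children is charged as two edits here, which is one reason a schema matcher may prefer a matching that ignores sibling order.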
6 Overview of our system
[Diagram labels: XML Schema (two inputs), Modelling, Name Similarity, Node Similarity, Structural Similarity, Name Relations, Node Relations, Structural Relations, Results retrieval]
7 Three similarities
Name similarity: node name; WordNet, string matching, Hungarian method
Node similarity: built-in data type, user-defined data type, cardinality; compatibility tables
Structural similarity: hierarchical structure; tree matching algorithm
8 Modelling
Model schemas as trees
9 Modelling
Model schemas as trees. Cases illustrated: Reference, Importing and Inclusion, Recursion.
[Figure labels: customerOrder, shipping, billing, date, ship2Add, bill2Add, address, street, province, postcode; paper, author, title, contents, reference, refNo; Address_ca.xsd (street, province, postcode) and Address_us.xsd (street, state, zip)]
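As a rough illustration of modelling a schema as a tree, the sketch below walks an XSD with the standard library and keeps only the nesting of named elements. The sample schema is hypothetical and only loosely follows the customerOrder labels on the slide. The sketch handles inline nested declarations only; it does not resolve `ref` attributes, imports, inclusions, or recursion, which the slide lists as cases to handle.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical schema, loosely based on the customerOrder labels on the slide.
SAMPLE_XSD = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="customerOrder">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="shipping">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="date" type="xs:date"/>
              <xs:element name="ship2Add" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="billing">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="date" type="xs:date"/>
              <xs:element name="bill2Add" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""


def child_elements(node):
    """Yield the nearest xs:element descendants, skipping complexType/sequence wrappers."""
    for child in node:
        if child.tag == XS + "element":
            yield child
        else:
            yield from child_elements(child)


def to_tree(element):
    """Turn one xs:element into (name, [subtrees])."""
    return (element.get("name"), [to_tree(c) for c in child_elements(element)])


def schema_to_tree(xsd_text):
    root = ET.fromstring(xsd_text.strip())
    return ("schema", [to_tree(e) for e in child_elements(root)])


order_tree = schema_to_tree(SAMPLE_XSD)
```

Dropping the `sequence` wrapper is deliberate: the next slide lists sequence, all, and choice among the information excluded from the model.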
10 Information excluded in Modelling
Related to elements or attributes: default value, value range, unique, nullable…
Related to structure: sequence, all, choice
[Figure: name (first, last) and name (last, first)]
11 Computing node similarity
Compute name similarity with the help of:
  WordNet and its API
  String matching
  Hungarian method
Add the similarity of other information:
  Data type
  Minimum cardinality
  Maximum cardinality
Node similarity
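One way to read this slide is as a weighted sum of the name similarity and the similarities of data type and cardinality. The slides do not give the weights, the combination rule, or the contents of the compatibility table, so everything numeric below is an assumption chosen for illustration, and `SchemaNode` is a hypothetical type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaNode:
    name: str
    data_type: str
    min_occurs: int
    max_occurs: float  # float("inf") stands for maxOccurs="unbounded"


# Hypothetical compatibility table for built-in types (values are assumptions).
TYPE_COMPATIBILITY = {
    frozenset({"xs:int", "xs:integer"}): 0.9,
    frozenset({"xs:integer", "xs:decimal"}): 0.7,
    frozenset({"xs:string", "xs:token"}): 0.8,
}


def type_similarity(t1, t2):
    if t1 == t2:
        return 1.0
    return TYPE_COMPATIBILITY.get(frozenset({t1, t2}), 0.0)


def node_similarity(n1, n2, name_sim, weights=(0.7, 0.1, 0.1, 0.1)):
    """Weighted sum of name, type, minOccurs, and maxOccurs similarity.

    name_sim is computed elsewhere (e.g. from token lists); the weights are
    assumed, not taken from the slides.
    """
    w_name, w_type, w_min, w_max = weights
    return (
        w_name * name_sim
        + w_type * type_similarity(n1.data_type, n2.data_type)
        + w_min * (1.0 if n1.min_occurs == n2.min_occurs else 0.0)
        + w_max * (1.0 if n1.max_occurs == n2.max_occurs else 0.0)
    )


quantity = SchemaNode("quantity", "xs:int", 1, 1)
qty = SchemaNode("qty", "xs:integer", 0, 1)
```

With these assumed weights, `node_similarity(quantity, qty, 0.5)` combines a name score of 0.5 with compatible types, different minimum cardinality, and equal maximum cardinality.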
12 Name similarity from token lists
Tokenize names, e.g.
  clientName -> client name
  submittedReports -> submit report
Similarity between two token lists:
  Use the Hungarian method for Weighted Bipartite Graph Matching (WBGM)
[Figure: bipartite graph between the tokens of customerDeliveryAddress (customer, delivery, address) and clientRequiredShippingAddress (client, require, shipping, address), with edge weights sim i,j starting at sim 0,0]
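The token-list comparison can be sketched as follows, with two substitutions that are assumptions rather than the authors' method: token similarity uses `difflib` string matching in place of WordNet, and the optimal assignment is found by trying all permutations, which gives the same optimum as the Hungarian method but is practical only for short token lists. The tokenizer splits camelCase only; it does not stem words, so `submittedReports` yields `submitted reports` rather than `submit report`. The normalization by the longer list is also an assumption.

```python
import re
from difflib import SequenceMatcher
from itertools import permutations


def tokenize(name):
    """Split a camelCase identifier into lowercase tokens."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", name)
    return [p.lower() for p in parts]


def token_similarity(a, b):
    """String-matching stand-in for a WordNet-based word similarity."""
    return SequenceMatcher(None, a, b).ratio()


def name_similarity(name1, name2):
    """Best one-to-one token matching, normalized by the longer token list."""
    t1, t2 = tokenize(name1), tokenize(name2)
    if not t1 or not t2:
        return 0.0
    short, long_ = sorted((t1, t2), key=len)
    # Brute-force maximum-weight bipartite matching (same optimum as Hungarian).
    best = max(
        sum(token_similarity(s, l) for s, l in zip(short, perm))
        for perm in permutations(long_, len(short))
    )
    return best / len(long_)


score = name_similarity("customerDeliveryAddress", "clientRequiredShippingAddress")
```

Dividing by the longer list penalizes unmatched tokens, so the pair on the slide scores below 1 even though `address` matches exactly.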
13 Determine the structural relation
[Figure: Tree 1 and Tree 2]
Structure similarity
14 Common substructure
[Figure: two schema trees over the nodes car, make, model, year, color, driver, license; one tree uses firstName and lastName, the other first and last]
15 Approximate Common Structure
[Figure: two schema trees over the nodes car, make, model, year, color, driver, license; one tree uses firstName and lastName, the other first and last]
16 Mappings in an ACS
[Figure: trees labeled ACS1 and ACS2 over the nodes car, make, model, year, color, driver, first (firstName), last (lastName), license]
m_ACS1 = {(s1.car, s2.car), (s1.make, s2.make), (s1.year, s2.year), (s1.color, s2.color)}
m_ACS2 = {(s1.driver, s2.driver), (s1.first, s2.firstName), (s1.last, s2.lastName), (s1.license, s2.license)}
17 Evaluation Criteria
Matching outcomes:
  Mappings
  Schema similarity
Execution time
Collected four groups of schemas:
  Purchase orders used in COMA (5)
  Large schemas from XML.org (86)
  Schemas on hospitality domain (95)
  Extracted from WSDL (419)
Evaluation
18 Comparison with edit distance algorithm: element mapping on data group 1
[Chart] Method 1: our algorithm; Method 2: edit distance
19 Comparison with edit distance: schema similarity on data groups 3 and 4
[Chart] Method 1: our algorithm; Method 2: edit distance
20 Comparison with edit distance: performance on data group 2
[Chart] Method 1: our algorithm; Method 2: edit distance
21 Comparison with COMA (Mapping)
            COMA – 'All'    COMA – 'All+SchemaM'    Our algorithm
Precision   about 0.95      about 0.93              0.88
Recall      about 0.78      about 0.89              0.87
Overall     0.73            0.82                    0.75
Overall is a measure that combines precision and recall. It reflects the effort needed to remove incorrect mappings and add missing ones.
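The slide does not state the formula for Overall. The sketch below assumes the definition used in the COMA evaluation, Overall = Recall * (2 - 1/Precision), which reproduces the reported figures to within the rounding of the "about" values.

```python
def overall(precision, recall):
    """Overall = Recall * (2 - 1/Precision); negative when precision < 0.5."""
    return recall * (2 - 1 / precision)


# Precision and recall as reported on the slide.
reported = {
    "COMA All": (0.95, 0.78),
    "COMA All+SchemaM": (0.93, 0.89),
    "Our algorithm": (0.88, 0.87),
}

scores = {name: overall(p, r) for name, (p, r) in reported.items()}
```

Under this definition a matcher with precision below 0.5 gets a negative score, meaning that fixing its output would take more effort than matching by hand.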
22 Conclusion
Scalable schema matching
  Wang Lian, David W. Cheung, Nikos Mamoulis, and Siu-Ming Yiu. An Efficient and Scalable Algorithm for Clustering XML Documents by Structure. TKDE, 2005.
Subtyping
Apply to web service matching
23 Web service synthesis
24 Web Service Composition
Composite web service: "service implemented by combining the functionality provided by other web services" (G. Alonso et al.)
Web service composition: the process of developing a composite web service
Approaches to web service composition:
  Conventional programming languages, such as Java, C#
  Web service composition languages, such as BPEL
  Workflow, pi-calculus, Petri net, automata…
  Web service synthesis
composition
25 Web Service Synthesis
BPEL and the like are still programming languages:
  They describe exactly how to compose the web services.
Web service synthesis:
  We describe what the service is, but not how to implement it;
  We don't even know which component services are involved;
  The relevant services are discovered and invoked dynamically;
  The implementation is synthesized automatically from the web service specification.
Program synthesis has a long history.
26 Web Service Synthesis
[Diagram labels: WS, WS1, WS2; Syntactic Specification (WSDL); Semantic Specification (Datalog); Service Specification (WSDL/Datalog); Service Implementation; Service Implementation (BPEL)]
27 Synthesis Example
Chapters service
  Syntactic specification: …
  Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon service
  Service implementation: Java code, database
  Service specification:
    Syntactic specification: WSDL file
    Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService
  Service specification:
    Syntactic: interface definition defined by WSDL
    Semantic: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
  Implementation: ??
28 Generate the abstract implementation by query rewriting
Chapters service
  Syntactic specification: …
  Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon service
  Service implementation: Java code, database
  Service specification:
    Syntactic specification: WSDL file
    Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService
  Service specification:
    Syntactic: interface definition defined by WSDL
    Semantic: Q(ISBN, PRICE, TITLE, RATE) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
  Abstract implementation:
    Q(ISBN, PRICE, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
29 Generate the Concrete Implementation
Chapters service
  Syntactic specification: …
  Semantic specification: chapters(ISBN, PRICE, TITLE, AUTHOR) <- Chapters(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR).
amazon service
  Service implementation: Java code, database
  Service specification:
    Syntactic specification: WSDL file
    Semantic specification: amazon(ISBN, PRICE, RATE, TITLE, AUTHOR) <- Amazon(ISBN, PRICE), Book1(TITLE, ISBN, AUTHOR), Book2(ISBN, COMMENT, RATE).
MetaSearchService
  Service specification:
    Syntactic: interface definition defined by WSDL
    Semantic: Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- …
  Abstract implementation:
    Q(ISBN, PRICE, PRICE0, TITLE, RATE) <- amazon(ISBN, PRICE, RATE, TITLE', AUTHOR'), chapters(ISBN, PRICE0, TITLE, AUTHOR).
  Concrete implementation:
    Invoke amazon;
    Invoke chapters;
    Combine the output;
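The three steps of the concrete implementation (invoke amazon, invoke chapters, combine the output) amount to a join on ISBN, as the abstract implementation's shared ISBN variable indicates. The sketch below assumes each service returns rows as tuples in the column order of its semantic specification. The two `invoke_*` functions are hypothetical stubs with made-up rows standing in for real web service calls.

```python
def invoke_amazon():
    # Hypothetical stub; rows are (ISBN, PRICE, RATE, TITLE, AUTHOR).
    return [
        ("111", 30.0, 4.5, "XML Basics", "Lee"),
        ("222", 45.0, 3.9, "Datalog", "Kim"),
    ]


def invoke_chapters():
    # Hypothetical stub; rows are (ISBN, PRICE0, TITLE, AUTHOR).
    return [
        ("111", 28.0, "XML Basics", "Lee"),
        ("333", 15.0, "Web Services", "Roy"),
    ]


def meta_search(amazon_rows, chapters_rows):
    """Q(ISBN, PRICE, PRICE0, TITLE, RATE): join the two outputs on ISBN."""
    chapters_by_isbn = {}
    for isbn, price0, title, _author in chapters_rows:
        chapters_by_isbn.setdefault(isbn, []).append((price0, title))

    results = []
    for isbn, price, rate, _title, _author in amazon_rows:
        # TITLE is taken from chapters, as in the abstract implementation.
        for price0, title in chapters_by_isbn.get(isbn, []):
            results.append((isbn, price, price0, title, rate))
    return results


rows = meta_search(invoke_amazon(), invoke_chapters())
```

Only books offered by both services appear in the result, which is what the conjunctive rule expresses.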
30 It is a lightweight approach…
Web services are restricted to database queries, or functions that can be described by database queries or Datalog;
The semantic specification is Datalog instead of a more powerful specification mechanism employing ontologies;
Compositions are restricted to data composition instead of full-blown process specification such as BPEL.
All those choices are meant for the construction of a practical web service synthesis system…
31 Mapping between Datalog and Web Services
Database vendors also provide wrappers for web services:
  Behind a web service there is a SQL query that corresponds to the web service;
  SQL defines the semantics of the web service.
Major database vendors support the mapping between SQL and web services;
We experimented with DB2WS.
  Malaika, S. et al. DB2 and Web Services. IBM Systems Journal, 41(4), pp. 666-685, 2002.
32 Generate the Abstract Implementation by Query Rewriting
Definition: Given a query Q and a set of views V, a rewriting of Q using V is a query Q' such that Q = Q' (Q' is equivalent to Q) and Q' refers to one or more views in V.
Query: Q <- T1, T2, T3.
Views: V1 <- T1, T2.
       V2 <- T2, T3.
Rewriting 1: Q <- V1, T3.
Rewriting 2: Q <- V1, V2.
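The example on this slide can be checked with a small sketch that unfolds each view into its definition and compares the resulting set of base tables with the query body. This is a deliberate simplification: atoms are treated as plain names without variables, whereas deciding equivalence of real conjunctive queries requires containment mappings over the variables. The function names are illustrative.

```python
def unfold(body, views):
    """Replace every view in a rule body by its definition, recursively."""
    expanded = set()
    for atom in body:
        if atom in views:
            expanded |= unfold(views[atom], views)
        else:
            expanded.add(atom)
    return expanded


def is_rewriting(query_body, candidate_body, views):
    """True if the candidate uses at least one view and unfolds to the query body."""
    uses_view = any(atom in views for atom in candidate_body)
    return uses_view and unfold(candidate_body, views) == set(query_body)


QUERY = ["T1", "T2", "T3"]
VIEWS = {"V1": ["T1", "T2"], "V2": ["T2", "T3"]}

rewriting_1 = ["V1", "T3"]
rewriting_2 = ["V1", "V2"]
```

Both rewritings from the slide pass this check, while a body such as `["V1"]` does not, because unfolding it never reaches T3.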
33 Our query rewriting system
34 Limitations of our approach
Focus on database web services;
Datalog is not expressive enough:
  Query rewriting in Description Logic, or OWL.
Assumes the existence of global database schemas:
  Service providers need to provide the semantic definition of web services in terms of a global database schema;
  New service specifications are also defined using the common schema;
  Schema matching
35 Other threads
Web service collection and clustering
  From UDDI, crawlers, search engines such as Google
  Master's thesis to be finished this summer
Web service metrics
Schema subtyping
  Based on regular tree grammar
  Master's thesis to be finished this summer
Bottom-up web service composition
Semantic web service
36 Service Oriented Architecture
[Diagram: Discovery agency, Provider, and Requester, connected by publish, find, and interact]
37 Web service discovery
Keyword search
  Based on IR techniques, such as the vector space model
  Fast, but not accurate
Signature matching
  Decides subtype relations between the inputs and outputs of web services
  Used in service composition, to find composable web services
Relaxed matching
  Approximate matching, allowing small deviations in both structure and words/tags
Semantic matching
  Matching functional requirements of web services
  Used in adaptive, autonomous systems
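Keyword search with the vector space model can be sketched as cosine similarity between term-frequency vectors of the query and of each service description. The service names and descriptions below are made up for illustration, and the sketch uses raw term counts without stemming, stop-word removal, or IDF weighting, which a practical search engine would likely add.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector over lowercase whitespace-separated tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, descriptions):
    """Return service names with a non-zero score, best match first."""
    q = tf_vector(query)
    scored = [(cosine(q, tf_vector(text)), name) for name, text in descriptions.items()]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]


# Hypothetical service descriptions.
SERVICES = {
    "BookSearch": "search books by title and author",
    "Weather": "weather forecast by city",
}

hits = search("search author", SERVICES)
```

The example also shows why the slide calls this approach fast but not accurate: the query term `book` would not match `books` here, since matching is purely on surface tokens.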