Web Services and Integration/Mediation Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems March 4, 2008.


1 Web Services and Integration/Mediation Zachary G. Ives University of Pennsylvania CIS 455 / 555 – Internet and Web Systems March 4, 2008

2 Today
• Reminder: HW2 Milestone 2 due Thursday
• Distributed programming, concluded: RPC and Web Services
• Then onward:
  ◦ How we can translate between structured data formats
  ◦ Mediators and information integration

3 Some Common Modes of Building Distributed Applications
Data-intensive:
• XQuery (fetch XML from multiple sites, produce new XML)
  ◦ Turing-complete functional programming language
  ◦ Good for Web Services; not much support for I/O, etc.
• MapReduce (built over DHT or distributed file system)
  ◦ Single filter (map), followed by single aggregation (reduce)
  ◦ Languages over it: Sawzall, Pig Latin, Dryad, …
Message passing / request-response:
• e.g., over a DHT, sockets, or message queue
• Communication via asynchronous messages
• Processing in message handler loop
Function calls:
• Remote procedure call / remote method invocation

4 How RPC Generally Works
• You write an application with a series of functions
• One of these functions, F, will be distributed remotely
• You call a “stub generator”
• A caller stub emulates the function F:
  ◦ Opens a connection to the server
  ◦ Requests F, marshalling all parameters
  ◦ Receives F’s return status and parameters
• A server stub emulates the caller:
  ◦ Receives a request for F with parameters
  ◦ Unmarshals the parameters, invokes F
  ◦ Takes F’s return status (e.g., protection fault) and return value, and marshals them back to the client
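The caller-stub / server-stub round trip can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON stands in for the marshalling format, a direct function call (`transport`) stands in for the network connection, and the function name `get_book_price`, the sample ISBN, and the price table are all invented for the example.

```python
import json


def get_book_price(isbn: str) -> float:
    """The function F that we want to expose remotely (hypothetical example)."""
    prices = {"0131429019": 3.22}  # invented sample data
    return prices[isbn]


# Server side: the functions the server stub is allowed to invoke.
EXPORTED = {"get_book_price": get_book_price}


def server_stub(request_bytes: bytes) -> bytes:
    """Unmarshal the request, invoke F, marshal status and return value."""
    request = json.loads(request_bytes.decode("utf-8"))
    try:
        value = EXPORTED[request["function"]](*request["params"])
        reply = {"status": "ok", "value": value}
    except Exception as exc:  # report any failure as a return status
        reply = {"status": "error", "value": str(exc)}
    return json.dumps(reply).encode("utf-8")


def transport(request_bytes: bytes) -> bytes:
    """Stand-in for the network: hands the bytes straight to the server stub."""
    return server_stub(request_bytes)


def get_book_price_stub(isbn: str) -> float:
    """Caller stub: looks like F locally, but marshals the call and sends it."""
    request = json.dumps({"function": "get_book_price", "params": [isbn]})
    reply = json.loads(transport(request.encode("utf-8")).decode("utf-8"))
    if reply["status"] != "ok":
        raise RuntimeError(reply["value"])
    return reply["value"]
```

The caller only ever sees `get_book_price_stub`, which has the same signature as F; a real stub generator would emit both stubs from the interface definition instead of having them written by hand.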

5 Passing Value Parameters
[Figure 2-8: steps involved in doing remote computation through RPC]

6 RPC Components
• Generally, you need to write:
  ◦ Your function, in a compatible language
  ◦ An interface definition, analogous to a C header file, so other people can program for F without having its source
• Generally, software will take the interface definition and generate the appropriate stubs (in the case of Java, rmic knows enough about Java to work directly from the compiled class, so no separate interface definition file is needed)
• The server stubs will generally run in some type of daemon process on the server
• Each function will need a globally unique name or GUID

7 Parameter Passing Can Be Tricky Because of References
[Figure 2-18: the situation when passing an object by reference or by value]

8 What Are the Hard Problems with RPC? Esp. Inter-Language RPC?
• Resolving different data formats between languages (e.g., Java vs. Fortran arrays)
• Reliability, security
• Finding remote procedures in the first place
• Extensibility/maintainability
• (Some of these might look familiar from when we talked about data exchange!)

9 Web Services
• Goal: provide an infrastructure for connecting components, building applications in a way similar to hyperlinks between data
• It’s another distributed computing platform for the Web
  ◦ Goal: Internet-scale, language-independent, upwards-compatible where possible
• This one is based on many familiar concepts
  ◦ Standard protocols: HTTP
  ◦ Standard marshalling formats: XML-based, XML Schemas
  ◦ All new data formats are XML-based

10 Three Parts to Web Services
1. “Wire” / messaging protocols
  ◦ Data encodings, RPC calls or document passing, etc.
2. Describing what goes on the wire
  ◦ Schemas for the data
3. “Service discovery”
  ◦ Means of finding web services

11 The Protocol Stacks of Web Services
[Figure: enhanced and expanded from a figure in IBM’s “Web Services Insider”, http://www-106.ibm.com/developerworks/webservices/library/ws-ref2/]
• Stacks shown: Wire Format Stack, Description Stack, Discovery Stack
• Layers shown: XML; XML Schema; SOAP, XML-RPC; SOAP Attachments; WS-Security; WS-AtomicTransaction, WS-Coordination; WS-Addressing; Service Description (WSDL); Service Capabilities (WS-Capability); Message Sequencing; Orchestration (WS-BPEL); Inspection; Directory (UDDI); other extensions
• Annotation: high-level state transition + messaging diagrams between modules

12 Messaging Protocol: SOAP
• Simple Object Access Protocol: XML-based format for passing parameters
  ◦ Has a SOAP header and body inside an envelope
  ◦ Has a defined HTTP binding (POST with content-type of application/soap+xml)
  ◦ A companion SOAP Attachments specification encapsulates other (MIME) data
• The header defines information about processing: encoding, signatures, etc.
  ◦ It’s extensible, and there’s a special attribute called mustUnderstand that is attached to elements that must be supported by the callee
• The body defines the actual application-defined data

13 A SOAP Envelope
[Example envelope; the XML tags are missing and only the text content “12” survives]

14 Making a SOAP Call
• To execute a call to service PlaceOrder:
  POST /PlaceOrder HTTP/1.1
  Host: my.server.com
  Content-Type: application/soap+xml; charset="utf-8"
  Content-Length: nnn
  …
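To make the envelope / header / body structure and the HTTP binding concrete, here is a minimal Python sketch that assembles a request for PlaceOrder without sending it. The SOAP 1.2 envelope namespace is the standard one; everything else (the application namespace `http://example.org/orders`, the `transaction` header entry, and the `item` and `quantity` parameters) is hypothetical, since the deck's original envelope did not survive.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
APP_NS = "http://example.org/orders"                 # hypothetical application namespace


def build_place_order(item: str, quantity: int) -> bytes:
    """Build an envelope with one mustUnderstand header entry and a body."""
    ET.register_namespace("env", SOAP_NS)
    ET.register_namespace("m", APP_NS)

    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")

    # Header: processing information; the callee must support this entry.
    header = ET.SubElement(envelope, f"{{{SOAP_NS}}}Header")
    txn = ET.SubElement(header, f"{{{APP_NS}}}transaction")  # hypothetical entry
    txn.set(f"{{{SOAP_NS}}}mustUnderstand", "true")
    txn.text = "5"

    # Body: the application-defined data, here the call and its parameters.
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    order = ET.SubElement(body, f"{{{APP_NS}}}PlaceOrder")
    ET.SubElement(order, f"{{{APP_NS}}}item").text = item
    ET.SubElement(order, f"{{{APP_NS}}}quantity").text = str(quantity)

    return ET.tostring(envelope, encoding="utf-8")


def build_http_request(host: str, path: str, payload: bytes) -> bytes:
    """Wrap the envelope in the HTTP POST shown on the slide."""
    headers = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        'Content-Type: application/soap+xml; charset="utf-8"\r\n'
        f"Content-Length: {len(payload)}\r\n"
        "\r\n"
    )
    return headers.encode("ascii") + payload
```

For example, `build_http_request("my.server.com", "/PlaceOrder", build_place_order("book", 12))` yields the bytes that a client would write to the socket; the `nnn` on the slide corresponds to `len(payload)`.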

15 SOAP Return Values
• If successful, the SOAP response will generally be another SOAP message with the return data values, much like the request
• If failure, the contents of the SOAP envelope will generally be a Fault message, along the lines of:
  SOAP-ENV:Client Could not parse message …
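A client typically checks the body for a Fault before reading return values. The sketch below assumes a SOAP 1.1-style fault, which is what the `SOAP-ENV:Client` code on the slide resembles: an unqualified `faultcode` and `faultstring` inside `SOAP-ENV:Fault`. The sample response text is a reconstruction for illustration, not the deck's original markup, and `SoapFault` and `unwrap_response` are hypothetical names.

```python
import xml.etree.ElementTree as ET

SOAP11_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace

# Illustrative fault response built from the two values visible on the slide.
FAULT_RESPONSE = """\
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <SOAP-ENV:Fault>
      <faultcode>SOAP-ENV:Client</faultcode>
      <faultstring>Could not parse message</faultstring>
    </SOAP-ENV:Fault>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
"""


class SoapFault(Exception):
    """Raised when the response body carries a Fault instead of results."""

    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap_response(xml_text):
    """Return the body's child elements, or raise SoapFault on a Fault."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP11_NS}}}Body")
    if body is None:
        raise ValueError("response has no SOAP Body")
    fault = body.find(f"{{{SOAP11_NS}}}Fault")
    if fault is not None:
        raise SoapFault(fault.findtext("faultcode"), fault.findtext("faultstring"))
    return list(body)
```

Turning the Fault into an exception mirrors the RPC model from earlier in the deck, where the stub hands the remote return status back to the caller.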

16 How Do We Declare Functions?
• WSDL is the interface definition language for web services
  ◦ Defines notions of protocol bindings, ports, and services
  ◦ Generally describes data types using XML Schema
• In CORBA, this was called an IDL
• In Java, the interface uses the same language as the Java code

17 A WSDL Service
[Figure: a Service containing Ports; each Port has a PortType with its Operations; a Binding is also shown]

18 Web Service Terminology
• Service: the entire Web Service
• Port: maps a set of port types to a transport binding (a protocol, frequently SOAP, COM, CORBA, …)
• Port Type: abstract grouping of operations, i.e. a class
• Operation: the type of operation – request/response, one-way
  ◦ Input message and output message; maybe also fault message
• Types: the XML Schema type definitions
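The port type / operation vocabulary is easier to see against a concrete document. The following Python sketch reads a tiny WSDL 1.1 fragment and reports each port type's operations, classifying an operation as request/response when it declares an output message and as one-way otherwise. The fragment itself is invented for illustration (the deck's own WSDL example did not survive); the `BookQuote` and `getBookPrice` names are borrowed from the JAX-RPC slide, and the `tns` namespace is hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical minimal WSDL: one port type with one request/response operation.
WSDL_TEXT = """\
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="http://example.org/bookquote"
             targetNamespace="http://example.org/bookquote">
  <message name="GetBookPriceRequest"/>
  <message name="GetBookPriceResponse"/>
  <portType name="BookQuote">
    <operation name="getBookPrice">
      <input message="tns:GetBookPriceRequest"/>
      <output message="tns:GetBookPriceResponse"/>
    </operation>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Map each port type name to {operation name: operation style}."""
    root = ET.fromstring(wsdl_text)
    result = {}
    for port_type in root.findall("wsdl:portType", WSDL_NS):
        operations = {}
        for op in port_type.findall("wsdl:operation", WSDL_NS):
            # An operation with an output message expects a reply.
            has_output = op.find("wsdl:output", WSDL_NS) is not None
            operations[op.get("name")] = "request/response" if has_output else "one-way"
        result[port_type.get("name")] = operations
    return result
```

A full WSDL file would also carry the `types`, `binding`, and `service` sections from the terminology list; they are left out here to keep the sketch short.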

19 Example WSDL

20 JAX-RPC: Java and Web Services
• To write a JAX-RPC web service “endpoint”, you need two parts:
  ◦ An endpoint interface – this is basically like the IDL statement
  ◦ An implementation class – your actual code

public interface BookQuote extends java.rmi.Remote {
    public float getBookPrice(String isbn) throws java.rmi.RemoteException;
}

public class BookQuote_Impl_1 implements BookQuote {
    public float getBookPrice(String isbn) {
        return 3.22f;  // float literal: a bare 3.22 is a double and does not compile here
    }
}

21 Different Options for Calling
• The conventional approach is to generate a stub, as in the RPC model described earlier
• You can also dynamically generate the call to the remote interface, e.g., by looking up an interesting function to call
• Finally, the “DII” (Dynamic Invocation Interface) method allows you to assemble the SOAP call on your own
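The difference between a generated stub and a dynamically generated call can be sketched in Python. This is a loose analogy rather than the JAX-RPC API: `DynamicProxy` and `fake_endpoint` are hypothetical names, and the endpoint is an in-process function instead of a real SOAP service. The proxy resolves the operation name at call time, so no per-function stub has to be generated in advance.

```python
class DynamicProxy:
    """Turns attribute access into remote-style calls, with no generated stub."""

    def __init__(self, send):
        # send(operation_name, params) performs the actual call.
        self._send = send

    def __getattr__(self, operation):
        # Only reached for names that are not real attributes, i.e. operations.
        def call(*params):
            return self._send(operation, list(params))
        return call


def fake_endpoint(operation, params):
    """Stand-in for the remote service (hypothetical, in-process)."""
    services = {"getBookPrice": lambda isbn: 3.22}
    if operation not in services:
        raise LookupError(f"no such operation: {operation}")
    return services[operation](*params)


proxy = DynamicProxy(fake_endpoint)
```

With this in place, `proxy.getBookPrice("0131429019")` is dispatched by name at run time. Assembling the SOAP message by hand, as in the DII option, would replace `fake_endpoint` with code that builds and posts the envelope itself.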

22 Creating a Java Web Service
• A compiler called wscompile is used to generate your WSDL file and stubs
  ◦ You need to start with a configuration file that says something about the service you’re building and the interfaces that you’re converting into Web Services

23 Example Configuration File

24 Starting a WAR
• The Web Service version of a Java JAR file is a Web Archive, WAR
• There’s a tool called wsdeploy that generates WAR files
• Generally this will automatically be called from a build tool such as Ant
• Finally, you may need to add the WAR file to the appropriate location in Apache Tomcat (or WebSphere, etc.) and enable it
• See http://java.sun.com/developer/technicalArticles/WebServices/WSPack2/jaxrpc.html for a detailed example

25 Finding a Web Service
• UDDI: Universal Description, Discovery, and Integration registry
• Think of it as DNS for web services
  ◦ It’s a replicated database, hosted by IBM, HP, SAP, MS
• UDDI takes SOAP requests to add and query web service interface data

26 What’s in UDDI
White pages:
• Information about business names, contact info, Web site name, etc.
Yellow pages:
• Types of businesses, locations, products
• Includes predefined taxonomies for location, industry, etc.
Green pages – what we probably care the most about:
• How to interact with business services; business process definitions; etc.
• Pointer to WSDL file(s)
• Unique ID for each service

27 Data Types in UDDI
• businessEntity: top-level structure describing info about the business
• businessService: name and description of a service
• bindingTemplate: how to access the service
• tModel (t = type/technical): unique identifier for each service-template specification
• publisherAssertion: describes relationship between businessEntities (e.g., department, division)

28 Relationships between UDDI Structures
[Figure: publisherAssertion n:2 businessEntity; businessEntity 1:n businessService; businessService 1:n bindingTemplate; bindingTemplate m:n tModel]

29 Example UDDI businessEntity
http://uddi.ibm.com/registery/uddiget?businessKey=0123... My Books Technical Book Wholesaler … … <!-- keyedReferences to tModels --> …

30 UDDI in Perspective
• Original idea was that it would just organize itself in a way that people could find anything they wanted
• Today UDDI is basically a very simple catalog of services, which can be queried with standard APIs
  ◦ It’s not clear that it really does what people really want: they want to find services “like Y” or “that do Z”

31 The Problem: With UDDI and Plenty of Other Situations
There’s no universal, unambiguous way of describing “what I mean”
• Relational database idea of “normalization” doesn’t convert concepts into some normal form – it just helps us cluster our concepts in meaningful ways
• “Knowledge representation” tries to encode definitions clearly – but even then, much is up to interpretation
The best we can do: describe how things relate
• pollo = chicken = poulet = 雞 = 鸡 = jī = मुर्गी = murg
• Note that this mapping may be imprecise or situation-specific!
  ◦ Calling someone a chicken, vs. a chicken that’s a bird

32 This Brings Us Back to XQuery, Whose Main Role Is to Relate XML
Suppose we define an XML schema for our target data and our source data
A view is a stored query
• Function from a set of (XML) sources to an XML output
• In fact, in XQuery, a view is actually called a function
Can directly translate between XML schemas or structures
• Describes a relationship between two items
  ◦ Transform 2 into 6 by “add 4” operation
  ◦ Convert from S1 to S2 by applying the query described by view V
Often, we don’t need to transfer all data – instead, we want to use the data at one source to help answer a query over another source…

33 Lazy Evaluation: A Virtual View
[Figure labels: Browser/App, Query Form, Server(s), XQuery, Composed XQuery, Source1.xml, Source2.xml, Virtual XML doc., Query Results, XSLT, HTML]

34 Let’s Look at Some Simple Mappings
• Beginning with examples of using XQuery to convert from one schema to another, e.g., to import data
• First: let’s review what our XQuery mappings need to accomplish…

35 Challenges of Mapping Schemas
In a perfect world, it would be easy to match up items from one schema with another
• Each element would have a simple correspondence to an element in the other schema
• Every value would clearly map to a value in the other schema
Real world: as with human languages, things don’t map clearly!
• Different decompositions into elements
• Different structures
  ◦ Tag name vs. value
• Values may not exactly correspond
• It may be unclear whether a value is the same
It’s a tough job, but often things can be mapped

36 Example Schemas
Bob’s Movie Database: … … … … … * *
Mary’s Art List: … … … … … *
Want to map data from one schema to the other

37 Mapping Bob’s Movies → Mary’s Art
Start with the schema of the output as a template: $i $y $a $s $t
Then figure out where to find the values in the source, and create XPaths

38 The Final Schema Mapping
Mary’s Art ← Bob’s Movies
for $m in doc("movie.xml")//movie,
    $a in $m/director/text(),
    $i in $m/title/text(),
    $t in $m/title/text()
return $i movie $a $t
Note the absence of subject… We had no reasonable source, so we are leaving it out.
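The same mapping can be sketched in Python with ElementTree, which may help readers who have not used XQuery. The source paths (`movie`, `title`, `director`) come from the query above. The output element names (`artlist`, `work`, `id`, `type`, `artist`, `title`) are assumptions, since the tags of Mary's schema did not survive in the transcript, and the sample movies are invented test data.

```python
import xml.etree.ElementTree as ET

# Invented sample input in the shape the XPaths imply.
MOVIES_XML = """\
<movies>
  <movie><title>Vertigo</title><year>1958</year><director>Alfred Hitchcock</director></movie>
  <movie><title>Ran</title><year>1985</year><director>Akira Kurosawa</director></movie>
</movies>
"""


def movies_to_art(movies_xml):
    """Map each source movie to a work in the (assumed) target schema."""
    source = ET.fromstring(movies_xml)
    target = ET.Element("artlist")
    for movie in source.iter("movie"):  # plays the role of //movie
        title = movie.findtext("title")
        director = movie.findtext("director")
        if title is None or director is None:
            continue  # like the 'for' bindings, skip movies missing either value
        work = ET.SubElement(target, "work")
        ET.SubElement(work, "id").text = title         # $i
        ET.SubElement(work, "type").text = "movie"     # constant from the mapping
        ET.SubElement(work, "artist").text = director  # $a
        ET.SubElement(work, "title").text = title      # $t
        # No subject element: the source offers nothing reasonable to put there.
    return target
```

One simplification to note: the XQuery binds every director of a movie in turn, while this sketch takes only the first `director` child.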

39 Mapping Values
• Sometimes two schemas use different representations for the same thing
  ◦ ID ↔ SSN
  ◦ English ↔ Hungarian
• We typically use an intermediate table defining correspondences – a “concordance table”
  ◦ It can be generated automatically, and then corrected by hand (since there will often be exceptions)

40 An Example Value Mapping Problem
Penn student enrollment DB: … 12346 Mary McDonald F03 cse330 12345 Jon Doh
Penn dental plan: 323-468-1212 Dental sealant
Want to output student names + treatments…


44 Translating Values with a Concordance Table
for $p in doc("student.xml")/db/student,
    $pid in $p/pennid/text(),
    $n in $p/name/text(),
    $d in doc("dental.xml")/db/patient,
    $s in $d/ssn/text(),
    $tr in $d/treatment/text(),
    $m in doc("concord.xml")/db/mapping,
    $f in $m/from/text(),
    $t in $m/to/text()
where ____________________
return { { $n } { $tr }
student.xml: 12346 Mary McDonald F03 cse330
dental.xml: 323-468-1212 Dental sealant
concord.xml: 12346 323-468-1212
Variables: $pid: PennID; $n: name; $s: ssn; $tr: treatment; $f: PennID; $t: ssn
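The slide leaves the where clause blank. One plausible reading, given the variable annotations, is that the student's PennID must equal the concordance entry's `from` value and the entry's `to` value must equal the patient's SSN; that reading is an assumption, not something the deck states. The Python sketch below performs that three-way join over documents shaped like the XPaths above. The element names `pennid`, `name`, `from`, `to`, `ssn`, and `treatment` come from the query; the sample documents are rebuilt from the values shown on the slide.

```python
import xml.etree.ElementTree as ET

STUDENT_XML = """\
<db>
  <student><pennid>12346</pennid><name>Mary McDonald</name></student>
  <student><pennid>12345</pennid><name>Jon Doh</name></student>
</db>
"""
DENTAL_XML = """\
<db>
  <patient><ssn>323-468-1212</ssn><treatment>Dental sealant</treatment></patient>
</db>
"""
CONCORD_XML = """\
<db>
  <mapping><from>12346</from><to>323-468-1212</to></mapping>
</db>
"""


def names_and_treatments(student_xml, concord_xml, dental_xml):
    """Join students to dental treatments through the concordance table."""
    students = ET.fromstring(student_xml)
    concord = ET.fromstring(concord_xml)
    dental = ET.fromstring(dental_xml)

    # Concordance table: PennID -> SSN (assumes at most one SSN per PennID).
    pennid_to_ssn = {
        m.findtext("from"): m.findtext("to") for m in concord.findall("mapping")
    }

    # Index treatments by SSN so each student needs only a dictionary lookup.
    treatments_by_ssn = {}
    for patient in dental.findall("patient"):
        treatments_by_ssn.setdefault(patient.findtext("ssn"), []).append(
            patient.findtext("treatment")
        )

    results = []
    for student in students.findall("student"):
        ssn = pennid_to_ssn.get(student.findtext("pennid"))
        for treatment in treatments_by_ssn.get(ssn, []):
            results.append((student.findtext("name"), treatment))
    return results
```

Students with no concordance entry, such as Jon Doh here, simply drop out of the result, which is the behavior of an inner join.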


46 Drawbacks to Point-to-Point Mappings
• They can get data from one source to another, but what if you want to see elements that aren’t shared?
• Painful to create n² mappings…
• Sometimes we don’t actually want to ship the data from one source to another, but to see both
  ◦ We don’t want to put Barnes & Noble’s inventory INTO Amazon’s – but we want to see books from both
• This leads us to a “mediator” approach…

47 Data Integration and Warehousing
• Create a middleware “mediator” or “data integration system” over the sources
  ◦ All sources are mapped to a common “mediated schema”
  ◦ Warehouse approach actually has a central database, and loads data from the sources into it
  ◦ Virtual approach has just a schema – it consults sources to answer each query
  ◦ The mediator accepts queries over the central schema and returns all relevant answers

48 Typical Data Integration Components
[Figure labels: Data Integration System / Mediator, Mediated Schema, Query-based Schema Mappings in Catalog, Source Catalog, Wrapper, Source Data, Query, Results]

49 Mediator / Virtual Integration Systems
• The subject of much research since the 80s and especially 90s
  ◦ Examples: TSIMMIS, Information Manifold, MIX, Garlic, …
  ◦ Original focus on Web
  ◦ Real-world integration companies (IBM, BEA/Oracle, Actuate, …) are focusing on the enterprise – more $$$!
• A common model (exemplified by TSIMMIS, Garlic):
  ◦ Take the source data
  ◦ Define a schema mapping that produces content for the mediated schema, based on the source data
  ◦ The data for the mediated schema is the “union” of all of the mappings

50 Answering Queries in TSIMMIS
Based on view unfolding: composing a query and view
• The query is being posed over the mediated schema
  for $b in document("dblp.xml")/root/book
  where $b/title/text() = "Distributed Systems" and $b/author/text() = "Tanenbaum"
  return $b
• Wrappers are responsible for converting data from the source into a subset of the mediated schema
  for $c in sql("select author,year,title from CISbook")
  return <book>{ $c/* }</book>

51 The Mediated Schema as a Union of Views from Wrappers
• Wrappers have names, some sort of output schema:
  define function GetCISBooks() as book* {
    for $c in sql("select author,year,title from CISbook")
    return <book>{ $c/* }</book>
  }
• This gets “unioned” with output from other results:
  return <root>{ GetCISBooks() } { GetEEBooks() }</root>
[Figure: book element with children author, year, title]
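The "union of views" idea can be sketched in Python by treating each wrapper as a function that yields records in the mediated schema, and the mediated schema as the concatenation of their outputs. The function names echo GetCISBooks and GetEEBooks from the slide; the source layouts, field names such as `writer` and `published`, and all rows are invented for illustration.

```python
from itertools import chain

# Invented source data: two sources with different native layouts.
CIS_ROWS = [
    ("Tanenbaum", 2002, "Distributed Systems"),
    ("Stevens", 1994, "TCP/IP Illustrated"),
]
EE_RECORDS = [
    {"writer": "Oppenheim", "published": 1996, "name": "Signals and Systems"},
]


def get_cis_books():
    """Wrapper for the CIS source: tuples -> mediated 'book' records."""
    for author, year, title in CIS_ROWS:
        yield {"author": author, "year": year, "title": title}


def get_ee_books():
    """Wrapper for the EE source: renames fields into the mediated schema."""
    for record in EE_RECORDS:
        yield {
            "author": record["writer"],
            "year": record["published"],
            "title": record["name"],
        }


def mediated_books():
    """The mediated schema's content: the union of all wrapper outputs."""
    return chain(get_cis_books(), get_ee_books())


def query(title, author):
    """A query posed over the mediated schema, evaluated naively."""
    return [
        book for book in mediated_books()
        if book["title"] == title and book["author"] == author
    ]
```

Evaluating the query this way pulls every record out of every source before filtering, which is the inefficiency that composing the query with the view definitions is meant to remove.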

52 How to Answer the Query
Given our query:
  for $b in document("dblp.xml")/root/book
  where $b/title/text() = "Distributed Systems" and $b/author/text() = "Tanenbaum"
  return $b
We want to find all wrapper definitions that output the right structure to match our query
• Book elements with titles and authors (and any other attributes)

53 Query Composition with Views
• We find all views that define book with author and title, and we compose the query with each of these
• In our example, we find one wrapper definition that matches:
  define function GetCISBooks() as book* {
    for $b in sql("select author,year,title from CISbook")
    return <book>{ $b/* }</book>
  }
  for $b in document("mediated-schema")/root/book
  where $b/title/text() = "Distributed Systems" and $b/author/text() = "Tanenbaum"
  return $b
  return <root>{ GetCISBooks() } … </root>

54 Making It Work
  for $b in doc("…")/root/book
  where $b/title/text() = "Dist. Systems" and $b/author/text() = "Tanenbaum"
  return $b
[Figure labels: root, book, author, year, title (query pattern); author, year, title, $c, $c/author, $c/year, $c/title (view output)]

55 The Final Step: Unfolded View
The query and the view definition are merged (the view is “unfolded”), yielding, e.g.:
  for $b in sql("select author,title,year from CISbook where author='Tanenbaum'")
  where $b/title/text() = "Distributed Systems"
  return $b
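The payoff of unfolding is that part of the query's filter runs at the source. The Python sketch below contrasts the two plans against an in-memory SQLite table standing in for the CISbook source. As in the slide's unfolded query, only the author predicate is pushed into the SQL and the title test stays in the mediator. The table contents are invented, and a bound parameter replaces the inline string literal.

```python
import sqlite3


def make_source():
    """In-memory stand-in for the CISbook source (invented rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table CISbook (author text, year integer, title text)")
    conn.executemany(
        "insert into CISbook values (?, ?, ?)",
        [
            ("Tanenbaum", 2002, "Distributed Systems"),
            ("Tanenbaum", 2001, "Modern Operating Systems"),
            ("Stevens", 1994, "TCP/IP Illustrated"),
        ],
    )
    return conn


def to_books(rows):
    """The view: turn source rows into mediated-schema book records."""
    return [{"author": a, "year": y, "title": t} for a, y, t in rows]


def answer_without_unfolding(conn, title, author):
    """Materialize the whole view, then filter in the mediator."""
    rows = conn.execute("select author, year, title from CISbook").fetchall()
    books = to_books(rows)
    answers = [b for b in books if b["title"] == title and b["author"] == author]
    return answers, len(rows)


def answer_with_unfolding(conn, title, author):
    """Unfolded plan: the author predicate is evaluated by the source."""
    rows = conn.execute(
        "select author, year, title from CISbook where author = ?", (author,)
    ).fetchall()
    answers = [b for b in to_books(rows) if b["title"] == title]
    return answers, len(rows)
```

Both functions return the same answers; the second element of each result is the number of rows shipped from the source, which is smaller for the unfolded plan whenever the pushed-down predicate is selective.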

56 Summary: Mapping, Integrating, and Sharing Data
• Based on XQuery rather than XSLT
  ◦ “Views” (in XQuery, functions) as the bridge between schemas
  ◦ Joins and nesting are important in creating these views
• Can do point-to-point mappings to exchange data
• Very common approach: mediated schema or warehouse
  ◦ Create a central schema – may be virtual
  ◦ Map sources to it
  ◦ Pose queries over this
• UDDI versus this approach?
• What about search and its relationship to integration? In particular, search over Amazon, Google Maps, Google, Yahoo, …

57 Next Time…
• We’ll start looking at information retrieval, which is the basis of Web search

