Architecting Scientific Data Services Using XML, SOAP, & the Grid

1 Architecting Scientific Data Services Using XML, SOAP, & the Grid
Brian Wilson, Tom Yunck, Gerald Manipon, and Dominic Mazzoni Jet Propulsion Laboratory

2 A Conceptual Grid
[Figure: “On-Demand” Virtualization. Labels: Scientist, Portal, Storage, CPU, Data, Algorithms, Networks]

3 Vision
Open Your Databases!
  Support Query on and Retrieval of ALL Metadata
  Could support remote SQL queries on your entire database.
  Distributed/Local Metadata - not centralized as in ECHO
  Support XML formats with schemas & OWL/RDF semantics
  Examples: Amazon & Google SOAP Web services
Open Your Architectures: Modularity, Reuse
  Think in XML!! (design in XML first, not Java objects)
  Expose algorithms & application modules via SOAP
  Massive distributed computing via XML message exchange
  Reuse modules from OpenDAP/DODS and WMS/WCS
Web Services Choreography / Grid Dataflow
  Leverage WS-Coordination, -Context, -Security, etc.
  Leverage WS-Resource Framework (WSRF, Globus v4)
  Semantic Web Reasoning on top of WS substrate
Carbon Cycle

4 Outline
Web Services
  Using XML, SOAP, & WSDL
  Thinking in XML: REST versus SOAP
  Architecting Data Query/Access Services using SOAP
  SOAP as a Universal Interoperability Glue
Classifying Data Services
  Dimensions / Categories
  Interop between SOAP, OpenDAP, & OGC WMS/WCS
  Semantic Web Support
  Data Discovery to Inventory Level -- Enabling SOAP services
WS Standards
  WS-Coordination, -Notification, -Security, -ReliableMessaging
  Convergence of Web & Grid Standards
  WS-Resource Framework (WSRF)
  Globus Toolkit v4

5 Three Generations of the Web
1st: Static HTML pages with Pictures!
  Hyperlinks: click and jump!
  Easy authoring of text with graphics (HTML layout).
  Killer App: Having your own home page is hip.
  Cons: Too static, One-way communication.
2nd: Dynamic HTML with streaming audio & video
  Browser as an all-purpose, ubiquitous user interface.
  Fancy clients using embedded Java applets.
  Killer App: Fill out your time card on-line.
  Cons: Applets clunky, ease of authoring disappears, information is still HTML (semi-structured).
3rd: SOAP-based Web Computing & Semantic Web
  Exchange structured data in XML format (no fragile HTML); semantics (“meaning”) kept with data.
  Programmatic interfaces rather than just GUI for a human.
  Killer Apps: Grid Computing, automated data processing.

6 What is Simple Object Access Protocol (SOAP)?
Distributed Computing by Exchange of XML Messages
  Lightweight, Loosely-Coupled API
  Programming language independent (unlike Java RMI)
  Transport protocol independent
Multiple Transport Protocols Possible
  HTTP or HTTPS (HTTP using SSL encryption)
  POST (SMTP), Instant Messaging (MQSeries, Jabber)
  Store & Forward Reliable Messaging Services (Java JMS)
  Even Peer-to-Peer (P2P) protocols
SOAP Toolkits for all languages
  Apache Axis for Java, modules for python/perl, Visual C# .Net.
Web Services Description Language (WSDL)
  Generate call to a service automatically from WSDL document.
  Data types and formats expressed in XML schema.
  Publish services in UDDI catalogs for automated discovery.

7 Third Generation of the Web
SOAP-based Web Computing & Semantic Web
  Exchange structured data in XML format (not HTML)
  Semantics or “meaning” kept with the data
  Emphasize programmatic interfaces
  Grid Services: WSRF and Globus Toolkit v4
  Leverage many WS-* standards (Coordination, etc.)
Simple Object Access Protocol (SOAP)
  Distributed Computing by Exchange of XML Messages
  Lightweight, Loosely-Coupled API
  Programming language independent
  Multiple Transport Protocols Possible (HTTP, P2P)
Web Services Description Language (WSDL)
  Publish Services in catalogs for automated discovery

8 Why use SOAP to create scientific services?
New paradigm of loosely-coupled distributed computing
  SOAP messaging is exploding in the business world.
  Used in Grid Computing: WSRF and Globus Toolkit v4
  Asynchronous workflow (return results when available)
Leverage XML standards (WSDL, UDDI, WS-*)
  Service API is XML messages, not code.
  Data types and formats expressed in XML schema.
Exchanging large scientific data sets
  Small objects in XML format
  Even medium-size objects (2D grid slices) can be in XML.
  Large objects as references (ftp or http URL) to HDF-EOS or netCDF binary containers (files).
Security & authentication less important in science
  Use HTTPS, Basic (user/password) Authentication, or WS-Security & WS-Authentication
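
The split between small objects sent inline as XML and large objects sent as URL references can be sketched as below. This is only an illustration: the element names (`Variable`, `Data`, `URL`), the `INLINE_LIMIT` threshold, and the example URL are assumptions, not part of the services described on the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical cutoff (number of values) above which data is sent by reference.
INLINE_LIMIT = 10_000


def describe_variable(name, values=None, url=None):
    """Build a <Variable> element: small arrays inline, large ones as a URL reference."""
    var = ET.Element("Variable", {"name": name})
    if values is not None and len(values) <= INLINE_LIMIT:
        # Small object: embed the numbers directly in the XML message.
        ET.SubElement(var, "Data").text = " ".join(str(v) for v in values)
    elif url is not None:
        # Large object: point at the binary container (e.g. a netCDF file).
        ET.SubElement(var, "URL").text = url
    else:
        raise ValueError("large objects need a URL reference")
    return var


small = describe_variable("temperature", values=[210.5, 215.0, 221.3])
large = describe_variable("radiance", url="ftp://example.org/granule.nc")
small_xml = ET.tostring(small, encoding="unicode")
large_xml = ET.tostring(large, encoding="unicode")
```

The receiver can then decide per variable whether to parse the inline values or fetch the referenced file over a separate transport.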

9 SOAP Software Toolkits
Java (free)
  Install Java 1.4, Tomcat 4.1, and Apache Axis 1.0
  Optionally install WSDL4J, UDDI4J, & WSIF from IBM
  Configure Tomcat & Axis: classpaths, etc.
Perl (free)
  Install perl 5.6 & SOAP::Lite module (contains UDDI::Lite)
Python (free)
  Install python 2.2 and SOAP.py
  Optionally install UDDI4py from IBM
DotNet (not free, but cheap)
  Install Visual Studio .Net with C# or VB compilers
  Send a payment to MS
C++ (some free libraries)
  But more work required

10 Toolkits Do the Work
Transparent Serialization
  Java JDBC ResultSet in the server -> XML format -> Java ResultSet Object in the client
  Or ResultSet -> C# DataTable Object in a C# client
  Or ResultSet -> Perl 2D array or hash in a perl client
Custom Serialization Possible
  Write a custom serializer for any object
Automatic Deployment
  Axis: Place Java code in the deployment directory and the service is magically exposed. Can customize deployment using WSDD.
  C# .Net: Simply annotate the function as a [WebMethod]
  Perl & Python: A few lines of code.
Security & Authentication: Use capabilities of Apache

11 Python Client Example Interoperable SOAP Web Services
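    #!/usr/local/bin/python
    # soap_sql.py: read a database name and an SQL query from stdin and
    # send them to the executeQuery SOAP service (Python 2, SOAPpy).
    import sys
    import string
    from SOAPpy import SOAP
    #import SOAP

    # First input line is the database name; the remaining lines are the SQL text.
    db = sys.stdin.readline().strip()
    lines = sys.stdin.readlines()
    sql = string.join(lines, '\n')

    # The service endpoint URL (first argument) is missing from the transcript.
    server = SOAP.SOAPProxy(
        namespace = "urn:Genesis_SQL",
        encoding = None)
    print server.executeQuery(SOAP.stringType(sql), SOAP.stringType(db))

Input:
    jplsacc
    select * from L2data where datasetid == ' _1350sac_g43_1p0' and datatype == 'gpsProfile'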

12 REST versus SOAP
REpresentational State Transfer (REST)
  Roy Fielding’s Ph.D. Thesis
  Everything is an object or document with a URI.
  Problems: Hidden state: cookies, sessions, CGI db, etc.
  What about virtual datasets? Need to generate too many URI’s.
Service Oriented Architecture (SOA)
  Everything is a service.
  What about stateful services? -> WS-Resource
Same Old Tradeoff
  Like object-oriented programming versus functional programming (need both).
  It’s both an object and a service.

13 Amazon E-Commerce Services
Item Search: SOAP Service input
  <SubscriptionId>[Your SOAP key here]</SubscriptionId>
  <Request>
    <Keywords>free lunch hoagie</Keywords>
    <SearchIndex>Books</SearchIndex>
  </Request>
WSDL File
Returns XML Document
  Contains TotalResults count and a list of Items
  Each item contains many attributes (title, authors, ASIN, etc.)

14 Amazon E-Commerce Services
Also support REST paradigm for request
  Service=AWSECommerceService
  &Operation=ItemSearch
  &SubscriptionId=[Your SOAP Key here]
  &Keywords=free%20lunch%20hoagie
  &SearchIndex=Books
  &ResponseGroup=Medium
Response is same XML Document
  With results tailored by ResponseGroup(s)
Why are they opening their database up?
  Allow 3rd parties to build apps on top of db
  Build partnerships
  Promulgate their software architecture/standard
  Goal: Amazon lookup part of every business web app.
Makes sense even for commercial entities
  Certainly best for scientific data archives
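
A REST-style request like the one above is just a URL with an encoded query string. The sketch below assembles one with the Python standard library; `BASE_URL` is a placeholder (the endpoint is not given in the transcript) and the subscription key is a dummy value.

```python
from urllib.parse import quote, urlencode

# Placeholder endpoint: the real service URL is not given in the transcript.
BASE_URL = "https://example.org/onca/xml"


def item_search_url(subscription_id, keywords,
                    search_index="Books", response_group="Medium"):
    """Assemble the one-line REST request from the slide's parameters."""
    params = [
        ("Service", "AWSECommerceService"),
        ("Operation", "ItemSearch"),
        ("SubscriptionId", subscription_id),
        ("Keywords", keywords),
        ("SearchIndex", search_index),
        ("ResponseGroup", response_group),
    ]
    # quote_via=quote encodes spaces as %20, matching the slide.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


url = item_search_url("YOUR-KEY", "free lunch hoagie")
```

The same parameters would travel inside the `<Request>` element of the SOAP form; only the packaging differs.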

15 Example: General SQL Query Service
Using Informix Object-Relational Database
  Contains all Level 2 GPS occultation profile retrievals
  Multiple parameters (temperature, water vapor pressure, refractivity, & pressure) as a function of altitude
  GeoSpatial Datablade provides 4D R-tree index on time, latitude, longitude, & altitude.
Support “nearest” semantics (multiple hits sorted by distance)
  GetNearest service & BatchGetNearest (batch of requests)
  Implemented using Informix ORDBMS & Geodetic DataBlade
  SQL queries using 4D R-tree index on time, lat, lon, height
executeQuery SOAP Service
  Two input arguments
    Database to search
    Text of arbitrary SQL Query
  Single output
    SQL ResultSet in XML format

16 Output of the executeQuery Service
<?xml version="1.0" encoding="utf-8" ?>
<?xml-stylesheet type="text/xsl" href="..." ?>
<ResultSet>
  <Result>
    <datasetid> _1350sac_g43_1p0</datasetid>
    <datatype>gpsProfile</datatype>
    <height> </height>
    <lat> </lat>
    <lon> </lon>
    <refractivity> </refractivity>
    <temperature> </temperature>
    <pressure> </pressure>
    <wvpressure> </wvpressure>
  </Result>
  <Result>
    <height> </height>
    <lat> </lat>
    <lon> </lon>
    <refractivity> </refractivity>
    <temperature> </temperature>
    <pressure> </pressure>
    <wvpressure> </wvpressure>
  </Result>
  . . .
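
On the client side, a ResultSet document of this shape can be turned into ordinary rows with a few lines of code. The sketch below uses Python's standard `xml.etree.ElementTree`; the numeric values and the `example_id` dataset ID in `SAMPLE` are made up for illustration, since the transcript does not preserve the real ones.

```python
import xml.etree.ElementTree as ET

# Illustrative document; the values are invented, only the tag names follow the slide.
SAMPLE = """<ResultSet>
  <Result>
    <datasetid>example_id</datasetid>
    <datatype>gpsProfile</datatype>
    <height>10.0</height>
    <temperature>223.1</temperature>
  </Result>
  <Result>
    <datasetid>example_id</datasetid>
    <datatype>gpsProfile</datatype>
    <height>10.5</height>
    <temperature>220.4</temperature>
  </Result>
</ResultSet>"""

# Fields treated as numbers (an assumption based on the tag names).
NUMERIC = {"height", "lat", "lon", "refractivity",
           "temperature", "pressure", "wvpressure"}


def parse_result_set(xml_text):
    """Convert each <Result> element into a dict keyed by tag name."""
    rows = []
    for result in ET.fromstring(xml_text).findall("Result"):
        row = {}
        for field in result:
            text = (field.text or "").strip()
            row[field.tag] = float(text) if field.tag in NUMERIC and text else text
        rows.append(row)
    return rows


rows = parse_result_set(SAMPLE)
```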

17 executeQuery SOAP Message
Accept: text/xml
Accept: multipart/*
Content-Length: 663
Content-Type: text/xml; charset=utf-8
SOAPAction: "urn:Genesis_SQL#executeQuery"

<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope
    xmlns:SOAP-ENC="..."
    xmlns:SOAP-ENV="..."
    xmlns:xsd="..."
    xmlns:xsi="..."
    SOAP-ENV:encodingStyle="...">
  <SOAP-ENV:Body>
    <namesp1:executeQuery xmlns:namesp1="urn:Genesis_SQL">
      <c-gensym3 xsi:type="xsd:string">select * from L2data where datasetid == ' _1350sac_g43_1p0' and datatype == 'gpsProfile'</c-gensym3>
      <c-gensym5 xsi:type="xsd:string">jplsacc</c-gensym5>
    </namesp1:executeQuery>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
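
A SOAP toolkit normally generates this envelope, but building one by hand shows how little is involved. The sketch below assumes the standard SOAP 1.1 envelope namespace and uses hypothetical parameter names `sql` and `db` (the toolkit-generated message above uses `c-gensym3` and `c-gensym5`); it omits the `xsi:type` annotations and encoding style.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # standard SOAP 1.1 envelope namespace
SERVICE_NS = "urn:Genesis_SQL"                          # service namespace from the slide


def build_execute_query(sql, db):
    """Return a minimal SOAP envelope calling executeQuery(sql, db)."""
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}executeQuery" % SERVICE_NS)
    # Parameter element names are hypothetical, chosen for readability.
    ET.SubElement(call, "sql").text = sql
    ET.SubElement(call, "db").text = db
    return ET.tostring(envelope, encoding="unicode")


message = build_execute_query(
    "select * from L2data where datatype == 'gpsProfile'", "jplsacc")
```

The resulting string would be sent as the body of an HTTP POST with the `SOAPAction` header shown above.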

18 executeQueryResponse SOAP Message
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
    xmlns:soapenv="..."
    xmlns:xsd="..."
    xmlns:xsi="...">
  <soapenv:Body>
    <ns1:executeQueryResponse
        soapenv:encodingStyle="..."
        xmlns:ns1="urn:Genesis_SQL">
      <ns1:executeQueryReturn xsi:type="ns2:string"
          xmlns:xsi="..."
          xmlns:ns2="...">
        [Returned XML doc here, encoded]
      </ns1:executeQueryReturn>
    </ns1:executeQueryResponse>
  </soapenv:Body>
</soapenv:Envelope>

19 History of Distributed Computing / History of Programming Languages
[Two charts, each plotting Productivity / Ease of Use against Time. Distributed computing labels: Specialized, DCE, CORBA, DCOM, Java RMI, XML-RPC, SOAP. Programming language labels: Fortran, C, Turbo Pascal, Object Pascal, C++, Java, Perl, Python.]

20 Sad History of Function Call Interfaces
Fortran: partially typed
  Positional arguments only, by address, partially typed
C/Pascal: fully typed
  Added varargs for unknown number of arguments
  But used pointers to return multiple results
C++
  Added function/operator overloading
Java
  Pointers hidden (yes!), but lost operator overloading
Perl, Python, IDL: func(arg1, arg2, *array, **hash)
  Added implicit array to catch variable number of args
  Added implicit hash to catch named args
XML: input & output are hierarchical XML documents
  Flexible interface since can ignore unneeded tags
  Hierarchy & dynamic typing imply rich interface
  Use XQuery/XPath to extract/update needed tags
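
The last two steps of this history can be shown side by side in Python: a function that catches extra positional and named arguments, and a service handler that reads only the XML tags it needs and ignores the rest. The request tags (`Variable`, `StartTime`, `EndTime`, `Priority`) are invented for the example.

```python
import xml.etree.ElementTree as ET


def call_style(arg1, arg2, *array, **hash_):
    """func(arg1, arg2, *array, **hash): extra args land in a tuple and a dict."""
    return {"positional": (arg1, arg2), "extra": array, "named": hash_}


def subset_request(xml_text):
    """Read only the tags this service needs; unknown tags are simply ignored."""
    root = ET.fromstring(xml_text)
    return {tag: root.findtext(tag) for tag in ("Variable", "StartTime", "EndTime")}


# A newer client adds <Priority>; an older service still accepts the document.
request = (
    "<Request>"
    "<Variable>temperature</Variable>"
    "<StartTime>2003-01-01</StartTime>"
    "<EndTime>2003-01-02</EndTime>"
    "<Priority>low</Priority>"
    "</Request>"
)
parsed = subset_request(request)
```

Because the handler looks tags up by name, the interface can grow without breaking existing callers, which is the flexibility the slide attributes to XML interfaces.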

21 Thinking in XML
Design the XML metadata and data representations first. Add an ontology.
Then create objects in your favorite programming language.
Then create objects in more languages for broader use.
4DRectilinearGrid datatype
  <4DRectilinearGrid>
    <Latitudes> <Series>-90, 90, 2
    <Longitudes> <URL>
    <Altitudes>
    <Times>
    <Variable>
      <Name>
      <Unit>
      <MissingValueMarker>
    </Variable>
    <Grid>
      <URL>
    </Grid>
  </4DRectilinearGrid>
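
Going from the XML design to a language object might look like the sketch below. It makes two assumptions that the slide does not confirm: that `<Series>-90, 90, 2` means start, stop, step, and that the element is renamed `RectilinearGrid4D` here because an XML element name cannot begin with a digit.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Series:
    """Regularly spaced axis, read here as (start, stop, step); an assumed interpretation."""
    start: float
    stop: float
    step: float

    def values(self):
        count = int(round((self.stop - self.start) / self.step)) + 1
        return [self.start + i * self.step for i in range(count)]


def parse_series(text):
    start, stop, step = (float(part) for part in text.split(","))
    return Series(start, stop, step)


# Hypothetical element name: valid XML names cannot start with "4".
doc = ET.fromstring(
    "<RectilinearGrid4D>"
    "<Latitudes><Series>-90, 90, 2</Series></Latitudes>"
    "</RectilinearGrid4D>"
)
lats = parse_series(doc.findtext("Latitudes/Series")).values()
```

The XML stays the authoritative definition; the `Series` class is one of possibly many language bindings derived from it.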

22 New Paradigm for Scientific Computing
Use two or more channels
  XML Messaging on one
  Binary data protocol (like OpenDAP) on the other
Two channels can use different transports
  XML Messaging over HTTP or dark P2P
  OpenDAP over LambdaRail or scalable P2P caching network
XML Messaging Channel for:
  Control, Service Chaining, Workflow, Metadata Exchange
  Configuration of Binary channel
Binary Data Channel(s)
  Out-of-band from XML point of view
  Bulk data transfer, replication, or caching
  Could be multi-protocol (OpenDAP, GridFtp, LambdaGrid)

23 Data Query/Access Services
Service chain goes beyond simple data access
  Time & Geolocation Query -> yielding data inventory
  Query on Quality Flags
  General Metadata Query -> catalog satisfying conditions
  Data Access: by object or granule ID and variable name
  Data Slicing: as in OpenDAP
  Data Subsetting: by lat, lon, alt, & time ranges
  Parameter Subsetting: select only desired physical variables
  Data Reformatting: choose output format as in WMS/WCS
  On-Demand Grid Computations: grid diff in GrADS/DODS
  Variable Bundles: external metadata from scientist
Return Composite Data Objects
  4DRectilinearGrid, 4DCurvilinearGrid (swath)
  XML & Native Binary representations
  User Composites: add custom object to data model
Semantic Support
  Use generic variable names
  Reason about variables to be bundled

24 Classifying Data Services - Dimensions
Services Provided
  Is it discovery, query, access, and/or subsetting?
Degree of Metadata Support
Richness of the Data Model
Request Interface / Protocol
Response Interface / Protocol
Representation of Large Binary Data Objects
Transport Layers (http, GridFtp, p2p, LambdaRail)
Permanent Names & Version Support
Service Chaining
Semantic Web Support
Interoperability with other services
Maturity, Support, Open or Proprietary, Cost

25 Classifying Data Services (I)
Services Provided
  Is it discovery, query, access, and/or subsetting?
  Taxonomy on earlier slide
Degree of Metadata Support
  Time & Geolocation Query?
  General Metadata Query?
  Retrieve selected metadata fields?
  Bind external metadata into the dataset?
Richness of the Data Model
  Supports point, swath, grid, and vector data types?
  Is the data model exposed to the user?
  Interoperability? (i.e., between OpenDAP, OGC, HDF-EOS)
  Composite objects, like 4D geo-referenced grid?
  Add user composites? (as in Java object serialization)
  Compute grid averages or differences on demand? (Ferret)

26 Classifying Data Services (II)
Request Interface / Protocol
  REST (one-line HTTP URL’s) versus SOAP
  OpenDAP & WMS/WCS use REST
  REST tends to be more ad hoc than structured XML/SOAP
Response Interface / Protocol
  OpenDAP uses custom binary wire protocol
  OGC WMS/WCS uses REST
  REST uses HTTP with MIME types & multi-part
  SOAP uses XML documents with possible attachments
Representation for Large Binary Data Objects
  OpenDAP: custom binary wire protocol
  WMS/WCS: MIME multi-part embedding
  SOAP: embedded, attachments, or URL references
    Binary files referenced by ftp, http, or DODS URL’s
    Serialized Binary Objects (as in C++, Java, or C# .Net)

27 Classifying Data Services (III)
Transport Layers (http, GridFtp, p2p, LambdaRail)
  REST uses http for both request & response
  SOAP can use http, smtp, Instant Messaging, Secure Messaging, or p2p
Degree of Modularity & Layering
  Is the software separated into well-defined protocol layers?
  Can functions/modules be reused separately?
  Example: OpenDAP contains a rich data model, data slicing syntax, and a binary wire protocol. Reuse each?
  Clean design or somewhat ad hoc?
Permanent Names & Version Support
  Globally unique ID’s for data objects / granules
  Permanent names for service endpoints
  Versions for data objects & services
  Is it a service or an object? (REST vs. SOAP)
  It’s both. And they both need to be permanent.

28 Classifying Data Services (IV)
Service Chaining
  Can modules be chained together?
  In REST, here one reaches limits of one-line URL’s
  In SOAP, have rich world of WS Choreography or Grid Workflow solutions
  Use semantic web to discover and chain services
Semantic Web Support
  Exposes scientific domain info. for data and service discovery?
  Generic dataset and variable names?
  Are the data categorized in an ontology?
  Is the service categorized in an ontology?
  Can the “meaning” of the service and the meaning of each of its inputs and outputs be inferred from an ontology?
  Is the service interoperable with others through semantic mediation?

29 Time & Geolocation Query Service
Inputs
  StartTime, EndTime, Lat/Lon Box, DatasetName, VariableName, MetadataGroup(s)
Output
  List of items
  Each item has unique ID, start & end time, lat/lon box
  More (optional) metadata groups
Datasets
  Implemented for GENESIS GPS database, and for AIRS, MODIS, & MISR granules in DataPool.
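
The core of such a query is an overlap test in time and in the lat/lon box. The sketch below runs it over a small in-memory inventory; the `Granule` fields, the sample granules, and the function name are hypothetical, and the real service would use a database index rather than a linear scan.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Granule:
    granule_id: str
    start: datetime
    end: datetime
    south: float
    north: float
    west: float
    east: float


def geo_time_query(granules, start, end, south, north, west, east):
    """Return granules whose time range and lat/lon box overlap the request."""
    hits = []
    for g in granules:
        time_ok = g.start <= end and g.end >= start
        lat_ok = g.south <= north and g.north >= south
        lon_ok = g.west <= east and g.east >= west  # ignores date-line wrap-around
        if time_ok and lat_ok and lon_ok:
            hits.append(g)
    return hits


# Made-up inventory for illustration only.
inventory = [
    Granule("g1", datetime(2003, 1, 1, 0, 0), datetime(2003, 1, 1, 0, 6), -70, -60, 10, 30),
    Granule("g2", datetime(2003, 1, 1, 12, 0), datetime(2003, 1, 1, 12, 6), 20, 30, 100, 120),
]
hits = geo_time_query(inventory, datetime(2003, 1, 1), datetime(2003, 1, 2),
                      south=-75, north=-55, west=0, east=20)
```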

30 SciFlo Data Query/Access (SOAP) Services
GeoLocationQuery(startTime, endTime, lat, lon, timeTolerance, distanceTolerance, variable, metadataGroups)
  Returns granule ID’s and geolocation info. for AIRS L2 swaths that intersect lat/lon point or are near enough.
GeoRegionQuery(startTime, endTime, lat/lon region, . . .)
  Returns granule ID’s and geolocation info. for AIRS L2 swaths that intersect the lat/lon region or are near enough.
QueryByMetadata(variable, ListOfConstraints, groups)
  Returns granule ID’s and selected metadata for AIRS granules that satisfy the metadata constraint expression: (min1 <= field1 <= max1 and/or min2 <= field2 <= max2).
GetDataById(IdList, UrlOptions)
  Given list of unique ID’s, returns list of ftp, http, or DODS URL’s pointing to the granules (files). Uses cache and redirection server.
-- Other semantic interfaces possible
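
The constraint expression used by QueryByMetadata (range checks combined with and/or) can be evaluated as sketched below. The metadata field names, granule IDs, and the tuple layout of a constraint are assumptions made for the example, not the service's actual interface.

```python
def satisfies(metadata, constraints, combine="and"):
    """Check (min <= field <= max) for each constraint, joined by 'and' or 'or'.

    constraints is a list of (field, minimum, maximum) tuples.
    """
    checks = [
        field in metadata and low <= metadata[field] <= high
        for field, low, high in constraints
    ]
    return all(checks) if combine == "and" else any(checks)


def query_by_metadata(granules, constraints, combine="and"):
    """Return the IDs of granules whose metadata satisfies the constraints."""
    return [gid for gid, meta in granules.items()
            if satisfies(meta, constraints, combine)]


# Hypothetical metadata fields and values.
granules = {
    "granule_a": {"cloudFraction": 0.1, "solarZenith": 35.0},
    "granule_b": {"cloudFraction": 0.8, "solarZenith": 40.0},
}
clear_sky = query_by_metadata(
    granules, [("cloudFraction", 0.0, 0.2), ("solarZenith", 0.0, 60.0)])
```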

31 SciFlo Data Query/Access (SOAP) Services
GetMetadata(type, groups)
  Returns metadata describing scientific domain, dataset info, list of metadata fields, generic name translation table, location of XML schema documents, location of related ontologies (OWL/RDF), etc.
GetHelp(type, groups)
  A SOAP service that itself returns help documentation describing the available SOAP services.
-- These Services Combined Can Support:
  Data Catalog, General Query, Data Access, Generic Names, Data Discovery by Domain and Dataset Keywords
  Type checking via XML schemas
  Hooks to semantic web

32 Data Discovery to Inventory Level
SOAP Services just described provide:
  Domain description
  Metadata
  Dataset Info. and Keywords
  Generic VariableName Lookup Table
  Query by Metadata or just Time & GeoRegion
Layer more semantics on top of these services
  Dataset Discovery by keyword search
  Bind in additional metadata provided by scientist
  Tie ontologies into keywords and genericVariableNames
  Semantic Inference: AIRS and MODIS both provide atmosphericTemperature and cloudProperties. Hmmm. Compare them.
XML/SOAP provides substrate; possibilities only limited by ontology development.

33 Evolving Grid Computing Standards
[Figure: GT4 diagram, from the Globus Toolkit “Ecosystem” presentation at GGF11 by Lee Liming]

34 Evolving Grid Computing Standards
History of Scientific Computing as a Utility
  The Grid began as an effort to tightly couple multiple super- or cluster computers together (e.g., Globus Toolkit v1 & v2).
  Needed job scheduling, submission, monitoring, steering, etc.
OGSI: Open Grid Services Infrastructure
  Globus v3.2 is an open-source implementation using Java/C.
  A service is Grid-enabled by inheriting from a Java class.
  Standard is complex and growing.
  Challenge: Ease of installation & use.
WS-Resource Framework (WSRF)
  Capabilities treated as storage or computing resources exposed on the web.
  Globus Toolkit v4 coming

35 Evolving Grid Computing Standards
[Figure: GT4 diagram, from the Globus Toolkit “Ecosystem” presentation at GGF11 by Lee Liming]

36 Grid Applications
[Diagram labels: Web Browser, Web Portal, Simulation Tool, Data Viewer Tool, Chat Tool, Registration Service, Data Catalog, Credential Repository, Certificate authority, Compute Server (A, B), Camera, Telepresence Monitor, Database service (C, D, E). Legend text: Application Developer 9 Off the Shelf 13 Globus Toolkit Grid Community]
Users work with client applications
Application services organize VOs & enable access to other services
Collective services aggregate &/or virtualize resources
Resources implement standard access & management interfaces

37 Grid Apps Using Globus Middleware
[Diagram labels: Web Browser, CHEF, CHEF Chat Teamlet, Simulation Tool, Data Viewer Tool, Globus Index Service, Globus MCS/RLS, MyProxy, Certificate Authority, Globus GRAM Compute Server (two), Camera, Telepresence Monitor, Globus DAI Database service (three). Legend text: Application Developer 2 Off the Shelf 9 Globus Toolkit 4 Grid Community]
Users work with client applications
Application services organize VOs & enable access to other services
Collective services aggregate &/or virtualize resources
Resources implement standard access & management interfaces

38 Components in Globus Toolkit 3.2
Security: GSI, WS-Security, CAS (OGSI), SimpleCA
Data Management: RFT (OGSI), RLS, OGSI-DAI, WU GridFTP, XIO
Resource Management: Pre-WS GRAM, WS GRAM (OGSI)
Information Services: MDS2, WS-Index (OGSI)
WS Core: JAVA WS Core (OGSI), OGSI C Bindings, OGSI Python Bindings (contributed), pyGlobus (contributed)

39 Planned Components in GT 4.0
Security: GSI, WS-Security, CAS (WSRF), SimpleCA, Authz Framework
Data Management: RFT (WSRF), RLS, OGSI-DAI, New GridFTP, XIO
Resource Management: Pre-WS GRAM, WS-GRAM (WSRF), CSF (contribution)
Information Services: MDS2, WS-Index (WSRF)
WS Core: JAVA WS Core (WSRF), C WS Core (WSRF), pyGlobus (contributed)

40

41 WS-* Alphabet Soup

42 WS Architecture

43 WS-Resource Framework Model

44 Using a Web Service to Access a WS-Resource

45 Overview of WS-Coordination

46 Evolving Grid Computing Standards
[Figure: GT4 diagram, from the Globus Toolkit “Ecosystem” presentation at GGF11 by Lee Liming]

47 Compare/Contrast Template
Services Provided
Implemented Standards / Protocols
Inputs, Outputs, & Parameters
Metadata
Maturity
  Number of Users, Data Volume
Support
  Level, Type, Availability
Open / Proprietary
  Open? Proprietary? Published? Costs?
Activities
  Interoperability?
Future Plans
  Services to be Implemented

48 Services Provided by WS
Distributed Computing via XML Message Exchange Service Oriented Architecture (SOA) Many transport protocols supported (http, smtp, messaging, p2p) Security, Authentication, Authorization Reliable Messaging Notification Transactions Stateful Resources – creation, lifetime (WSRF) WS Choreography Grid Workflow

49 Implemented Standards / Protocols
XML schema SOAP, WSDL, UDDI WS-Resource Framework WS-ResourceProperties, -ResourceLifetime, -BaseFaults WS-Addressing (endpoints with state pointers) WS-Notification (publish & subscribe) WS-Security, -Policy, -SecurityPolicy WS-Transaction, -ReliableMessaging WS-Coordination, -Context (choreography) BPEL4WS (IBM workflow engine) SciFlo Engine coming

50 Inputs, Outputs, & Parameters
Too numerous to list. See SOAP and WS-* specifications.

51 Metadata See WS-* specifications. ??

52 Maturity
SOAP, WSDL, UDDI are mature but still evolving.
  Latest versions: SOAP v1.2, WSDL v2.0, UDDI v3
Many WS-* specifications are just being finalized.
WSRF and Globus Toolkit v4 coming soon.

53 Support From numerous companies. IBM Websphere, etc. ??

54 Open / Proprietary Both WS-* are standards, with interop. features.
But companies provide added value and push changes in standards. Already WS stacks and development environments for sale.

55 Future Plans Interoperability testing is proceeding.
More WS-* standards and versions will emerge.

56 Services Provided by SciFlo
Data Query / Access (SOAP services)
  GPS Occultation Database (atmospheric retrievals)
  AIRS, MODIS, & MISR L2/L3 granules
Grid Workflow Engine
  Abstract XML dataflow documents translated to concrete flows, with missing steps inserted automatically.
  Parallel dataflow execution engine (scheduler)
  Semantic inference using XML metadata
  Operators can be SOAP services, local scripts, or local native executables.
  Move authenticated operators/executables to the data.
SOAP architecture, but also P2P functionality.
  Every node is both client & server; easy node replication.
  One-click installation onto server or desktop nodes. Support Linux & Windows.
  Initiate grid computations from your desktop.

57 Dataflow / Workflow Engines
Grid: Schedule & submit cluster computing jobs
  Operator tree is a Directed Acyclic Graph (DAG)
  CONDOR, CONDOR-G, DAGMan
  Globus Alliance Standards: GSI, GRAM, MDS, RLS, XIO, etc.
  Chimera -> Pegasus -> DAGMan -> Executing Grid Job
Web: Several web choreography standards
  IBM’s Business Process Execution Language (BPEL4WS)
  Less convergence here than in OGSI/WSRF
  Marketplace winners?
10 workflow groups spoke at Global Grid Forum (GGF) meeting
SciFlo will use some Globus capabilities via python bindings (pyGlobus).
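
Scheduling an operator DAG amounts to a topological sort in which every operator whose inputs are ready can run in parallel. The sketch below uses Python's standard `graphlib` module (Python 3.9+); the operator names are hypothetical, and this is not how DAGMan or SciFlo is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical dataflow: operator -> set of operators it depends on.
flow = {
    "query_airs": set(),
    "query_gps": set(),
    "coregister": {"query_airs", "query_gps"},
    "difference": {"coregister"},
    "plot": {"difference"},
}


def execution_stages(dag):
    """Group operators into stages; operators within a stage could run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)
    return stages


stages = execution_stages(flow)
```

Here the two data queries form the first stage and can be dispatched to different nodes before co-registration starts.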

58 Semantics for Web Services
Semantic Web History
  DAML: DARPA Agent Markup Language
  OWL: Web Ontology Language (from DAML+OIL)
  Numerous inference engines & ontologies being developed
Semantics for Web Services
  OWL-S: OWL-based Web service ontology
  Describe properties & capabilities in computer-interpretable form (beyond WSDL).
SciFlo Semantics & Inference
  Use WSDL+ & OWL-S to describe local operators (executables), remote services, & grid computing jobs.
  Discover & select operators to fill in missing steps in a dataflow.

59 SciFlo Engine
iEarth Vision will be enabled by the open-source SciFlo Engine.
  Automate large-scale, multi-instrument science processing by authoring a dataflow document that specifies a tree of executable operators.
iEarth Visual Authoring Tool
Distributed Dataflow Execution Engine
  Move operators (executables) to the data.
  Built-in reusable operators provided for many tasks such as subsetting, co-registration, regridding, data fusion, etc.
  Custom operators easily plugged in by scientists.
  Leverage convergence of Web Services (SOAP) with Grid Services (Globus v3.2).
Hierarchical namespace of objects, types, & operators.
  sciflo.data.EOS.AIRS.L2.atmosphericParameters
  sciflo.operator.EOS.coregistration.PointToSwath

60 SciFlo Document

61

62 Elaborating Workflow Documents
Abstract (skeleton) Workflow is more easily authored. Trivial data format & unit conversions auto-inserted. As toolbox of known & reliable operators grows, even complex ops like regridding become trivial. Could use other backends if desired (BPEL, DAGMan).
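
Turning an abstract workflow into a concrete one by auto-inserting format conversions could work roughly as follows. The operator table, the format names, and the `hdf_to_netcdf` converter are all invented for the example; SciFlo's actual elaboration rules are not described in the slides.

```python
# Hypothetical operator signatures: name -> (required input format, output format).
OPERATORS = {
    "read_airs": (None, "hdf"),
    "regrid": ("netcdf", "netcdf"),
    "plot": ("netcdf", "png"),
}

# Hypothetical converters: (from format, to format) -> converter operator name.
CONVERTERS = {("hdf", "netcdf"): "hdf_to_netcdf"}


def elaborate(abstract_flow):
    """Insert a converter wherever one step's output does not match the next step's input."""
    concrete = []
    current = None
    for op in abstract_flow:
        needed, produced = OPERATORS[op]
        if needed is not None and current != needed:
            # Raises KeyError if no known converter bridges the two formats.
            concrete.append(CONVERTERS[(current, needed)])
        concrete.append(op)
        current = produced
    return concrete


concrete_flow = elaborate(["read_airs", "regrid", "plot"])
```

The author writes only the three-step skeleton; the conversion step appears in the concrete flow without being requested.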

63 Distributed Computing Using SciFlo
Inject data query or flow execution request into SciFlo network from any node.

64 Permanent Hierarchical Names (“Holy Grail”)
Data Access by Naming
Permanent Hierarchical Names (“Holy Grail”)
  Naming Authority assigned at each namespace level
  Distributed P2P namespace (P2P catalog lookup)
Proper Names
  AIRS Level2 Parameter Retrieval Dataset (granules): sciflo.data.EOS.AIRS.L2.atmosphericParameters (or metadata)
  Generic Point-To-Swath Co-registration Operator: sciflo.operator.EOS.coregistration.PointToSwath
Generic Names
  Atmospheric Temperature Data: sciflo.data.atmosphere.temperature.profile (or .grid)
  Name resolves to list of EOS datasets
  Semantics attached (3DGeoParameterGrid of temperature)
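
Name resolution in such a hierarchical namespace can be sketched with a simple lookup table. The registry below is hypothetical: the AIRS name comes from the slide, while the GPS entry is invented to show a generic name expanding to more than one dataset. A real system would distribute this table across naming authorities.

```python
# Hypothetical registry mapping a generic name to proper dataset names.
GENERIC_NAMES = {
    "sciflo.data.atmosphere.temperature.profile": [
        "sciflo.data.EOS.AIRS.L2.atmosphericParameters",  # name from the slide
        "sciflo.data.EOS.GPS.L2.profiles",                # invented for illustration
    ],
}


def namespace_levels(name):
    """List each namespace level, e.g. where a naming authority could be assigned."""
    parts = name.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def resolve(name):
    """Generic names expand to a list of proper names; proper names resolve to themselves."""
    return GENERIC_NAMES.get(name, [name])


datasets = resolve("sciflo.data.atmosphere.temperature.profile")
levels = namespace_levels("sciflo.data.EOS")
```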

65 Access data objects by naming them! Server architecture
SciFlo’s Strength Lies in Combining Many Elements into a Single Open-Source System
  Abstract XML dataflow documents translated to concrete flows.
  Parallel dataflow execution engine
  Semantic inference using XML metadata
  Move operators to the data.
SOAP architecture, but also P2P functionality.
  Every node is both client & server; easy node replication.
  One-click installation onto server or desktop nodes.
  Initiate grid computations from your desktop.
Access data objects by naming them!
  P2P Distributed Namespace of data sources & operators
Server architecture
  Group of interacting SOAP services (replaceable modules)
  Implementation in XML, python, & C/C++ (not Java)
Strength in Numbers: Let a million nodes bloom!

66 SciFlo Data Access (SOAP) Services
getAirsMetaData(startTime, endTime, lat, lon)
  Returns granule ID’s and geolocation info. for AIRS swaths that intersect lat/lon point.
getAirsUrlsFromGranuleIds(idList or xmlString)
  Redirection server: returns URLs pointing to AIRS granules in local cache or DataPool.
getGpsMetaData(startTime, endTime, NW, SW, SE, NE)
  Returns occultation ID’s and geolocation info. for GPS limb scans that are inside a lat/lon region.
getGpsProfiles(idList or xmlString)
  Returns URL pointing to netCDF file containing GPS profiles for all occultation ID’s in the list.
getGpsProfile(occID)
  Returns a single GPS profile in XML format.

67 AIRS versus GPS Flowchart (part 1)

68 AIRS versus GPS Flowchart (part 2)

69 AIRS & GPS Retrievals Matchup Demo
Interface: HTML web form auto-generated from XML dataflow doc.
Input: User enters start/end time & other co-registration criteria.
Flow Execution: Calls 4 SOAP data query services & total of 15 operators on 4 computers.

70 AIRS & GPS Retrievals Matchup Demo
Results Page: Shows status updates during execution and then final results.
Caching: Reuse intermediate data products or force recompute.
Results: Merged data in netCDF file & plots as Flash movie.

71 AIRS/GPS Matchups

72 AIRS/GPS Temperature & Water Vapor Comparison Plots

73 AIRS/GPS Temperature & Water Vapor Comparison Plots

74 Temperature as a function of pressure from AIRS (blue) and a coincident GPS occultation (red)
General Features:
  High latitude measurements (~73 deg N) separated by ~2 deg in longitude and ~0.5 hours.
  Sharper tropopause measured by GPS.
  AIRS overestimates tropopause temperature by ~5 deg.
  GPS measures a wavy structure in the stratosphere which is absent in AIRS.

75 AIRS/GPS Temperature Difference Statistics
Mean temperature profile for 41 matchups, latitude region -60 to -70 deg., Jan-Apr, 2003

76 Goal: SciFlo nodes inside all Science Data Centers
Summary
SciFlo’s Innovation Lies in Combining Many Elements into a Single Open-Source System
  Abstract XML dataflow documents
  Semantic inference using XML metadata
  Parallel dataflow execution engine
  Move operators to the data.
  Every node is both client & server; easy node replication.
  SOAP architecture, but also P2P functionality.
  Initiate grid computations from your desktop.
Goal: SciFlo nodes inside all Science Data Centers
  Multi-Instrument Earth Science
  Instrument Cross-Comparisons
  Multi-Instrument Science Portals
  Large-scale multivariate statistical studies and verification of weather/climate models.

