NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Particle Physics Data Grid GriPhyN - SDSC Research and Infrastructure.


1 NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE / SAN DIEGO SUPERCOMPUTER CENTER
  Particle Physics Data Grid
  GriPhyN - SDSC Research and Infrastructure
  Reagan Moore, San Diego Supercomputer Center

2 Topics
  Research activities
    - Advanced query interfaces - Amarnath Gupta
    - Knowledge bases - Bertram Ludaescher
  Infrastructure development
    - SRB replication - Michael Wan
    - MCAT information catalog - Arcot Rajasekar
    - Grid Portals - Mary Thomas
    - WSDL web services - Arun Jagatheesan
  Grids

3 LIGO Support Opportunities
  - Pattern recognition in template and chirp-transform data using database technology
  - Derived data product optimization through optimization of input parameters (controlled parameter sweeps)
  - Utilization of SRB/MCAT for storage of virtual data products

4 SDSS Support Opportunities
  Federation of sky survey services
    - Development of a dynamic cross-match service between SDSS and other sky surveys
    - WSDL based web interface for sky survey services
    - UDDI based service directory
  Build topic map providing relationships between “Strasbourg sky survey” attributes
    - Correlate attributes through physical laws as well as derived observations

5 Integration of XSIL and XQuery
  Quilt: an XML query language designed for heterogeneous data sources
    - Authors: Don Chamberlin (IBM), Jonathan Robie (Software AG), and Daniela Florescu (INRIA)
    - Built on previous XML query languages: XPath, XQL, XML-QL, XMAS, Lorel, YATL
    - Basis for the standard query language for XML, called XQuery
  Example: “List the titles of all books published by Addison Wesley after 1991, in alphabetic order.”
    FOR $b IN document("www.bn.com/bib.xml")//book[publisher = "Addison Wesley" AND @year > "1991"]
    RETURN $b/@year, $b/title SORTBY (title)
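For readers unfamiliar with Quilt syntax, the example query on this slide can be approximated in Python. This is only a sketch of the same selection logic, not the Quilt or Kweelt implementation. The sample document `BIB_XML` and the function name `titles_after` are hypothetical, since bib.xml itself is not reproduced in the transcript. The slide compares `@year` as a string; the sketch compares it as an integer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data standing in for bib.xml.
BIB_XML = """
<bib>
  <book year="1994"><title>TCP/IP Illustrated</title><publisher>Addison Wesley</publisher></book>
  <book year="1992"><title>Advanced Programming in the Unix Environment</title><publisher>Addison Wesley</publisher></book>
  <book year="2000"><title>Data on the Web</title><publisher>Morgan Kaufmann</publisher></book>
  <book year="1990"><title>An Older Book</title><publisher>Addison Wesley</publisher></book>
</bib>
"""


def titles_after(xml_text, publisher, year):
    """Return (year, title) pairs for one publisher after `year`, sorted by title."""
    root = ET.fromstring(xml_text)
    hits = [
        (book.get("year"), book.findtext("title"))
        for book in root.iter("book")  # like //book in the query
        if book.findtext("publisher") == publisher and int(book.get("year")) > year
    ]
    # SORTBY (title) in the query corresponds to sorting on the title field.
    return sorted(hits, key=lambda pair: pair[1])


result = titles_after(BIB_XML, "Addison Wesley", 1991)
for book_year, title in result:
    print(book_year, title)
```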

6 Extensible Scientific Interchange Language (XSIL)
  A flexible, XML based, hierarchical, extensible transport language for scientific data objects
  [XSIL example listing; XML markup not preserved. Surviving values: 10 | 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 | Hello Auntie Joan | 96]

7 Quilt Extensions
  Added the concept of data types
    - Float, integer, and boolean versus string
  Added operator overloading
    - “Sum” on type string concatenates
    - “Sum” on type integer adds
  Added array operations
    - Get, set, element summation, array summation, subsequence, concatenate

8 Data Grids Linking Collections
  [Diagram; recoverable labels:]
  Logical collection
    - Elements
    - Attributes
  Export elements & attributes
  Grid container
    - Logical name
    - Container metadata
    - Element attributes
    - (Data model)
    - Elements
  Grid metadata catalog
  Mapping of logical containers to physical files
  Grid replica catalog
  Import into existing or new logical collection
  Logical collection
  Transforms on elements
  Available transforms
  Derived data products
  Derived data metadata
  Derived data process metadata

9 SRB Status
  SRB features
    - Demonstration of the ability to coordinate bulk metadata and bulk data loads
    - Aggregate files into a “container” and simultaneously write metadata into a file for bulk load into the MCAT information repository
    - Achieved file import rate of 250 files/second
  Development in progress
    - Improved error statement management
    - mySRB.html web interface for collection support
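The container idea on this slide can be illustrated with a small sketch: many small files are appended into one blob while one metadata row per file is written to a separate file for a later bulk load. This is not the SRB implementation or its API. The function names and the `OFFSET` field are assumptions made for illustration; `DATA_NAME` and `SIZE` are borrowed from the Replication Attributes slides.

```python
import io
import json


def build_container(files):
    """Append each payload to one in-memory container and record where it landed.

    `files` maps a logical name to its bytes. Returns (container_bytes, rows).
    """
    container = io.BytesIO()
    rows = []
    for name, payload in files.items():
        offset = container.tell()  # position of this member inside the container
        container.write(payload)
        rows.append({"DATA_NAME": name, "OFFSET": offset, "SIZE": len(payload)})
    return container.getvalue(), rows


def metadata_file(rows):
    """Serialize the rows as one JSON object per line, ready for a bulk load."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


def read_member(container_bytes, row):
    """Recover one file from the container using its metadata row."""
    return container_bytes[row["OFFSET"]: row["OFFSET"] + row["SIZE"]]


files = {"a.dat": b"abc", "b.dat": b"hello"}
container, rows = build_container(files)
print(metadata_file(rows))
print(read_member(container, rows[1]))
```

Writing the metadata once per batch, rather than once per file, is the part that makes high import rates plausible: the catalog sees one bulk insert instead of many small transactions.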

10 MCAT Web Interface
  Provide collection management
    - Create a collection
    - Define collection attributes
    - Ingest data / move / replicate
    - Browse
    - Query
    - Annotate
    - Comment
  https://srb.npaci.edu/mySRB.html

11 Grid Portal Development
  - Integrate collection management of derived data products with Grid execution portal
  - Based on GridPort and SRB
  - Funded by GriPhyN, NPACI, NASA IPG

12 GridPort + SRB Architecture
  - With SRB capabilities, file access is direct and uniform
  - Uses same authentication as portal and other Grid services
  - Single SRB account access allows for more flexible data management

13 Other Data Grids
  - NSF - National Virtual Observatory
  - DOE - Particle Physics Data Grid - Babar
  - NSF - United Kingdom data grid
  - NSF - Distributed Terascale Facility

14 Astronomy Sky Survey Data Grid
  [Architecture diagram; recoverable labels, in transcript order:]
  Compute Resources, Catalogs, Data Archives
  Information Discovery, Metadata delivery, Data Discovery, Data Delivery
  Catalog Mediator, Data mediator
  1. Portals and Workbenches
  Bulk Data Analysis, Catalog Analysis, Metadata View, Data View
  4. Grid Security, Caching, Replication, Backup, Scheduling
  2. Knowledge & Resource Management
  Standard Metadata format, Data model, Wire format
  Catalog/Image Specific Access
  Standard APIs and Protocols
  Concept space
  3. 5. 6. 7.
  Derived Collections

15 PPDG - Babar Support
  - Installed SRB at Stanford
  - Added Babar specific metadata attributes to MCAT catalog
  - Developed ability to support “soft links” between collections
      Allows same file to appear in multiple collections
      Release in SRB version 1.1.9
  - UK data grid (SRB / Condor / Globus)
      Rutherford - opportunity for international demonstration of Babar data replication
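The “soft link” behavior described on this slide, where the same file appears in multiple collections, can be sketched as two levels of lookup: collections map logical names to an object identifier, and the identifier maps to one physical path. The `Catalog` class, its method names, and the example path are hypothetical and are not the MCAT schema or the SRB API.

```python
class Catalog:
    """Toy catalog: collections hold names that point at shared object ids."""

    def __init__(self):
        self.objects = {}      # object id -> physical path
        self.collections = {}  # collection name -> {logical name: object id}

    def ingest(self, collection, name, physical_path):
        """Register a new physical file and list it in one collection."""
        object_id = len(self.objects) + 1
        self.objects[object_id] = physical_path
        self.collections.setdefault(collection, {})[name] = object_id
        return object_id

    def soft_link(self, src_collection, name, dst_collection, link_name=None):
        """List an existing object in another collection without copying data."""
        object_id = self.collections[src_collection][name]
        self.collections.setdefault(dst_collection, {})[link_name or name] = object_id

    def resolve(self, collection, name):
        """Follow a logical name to the physical path it refers to."""
        return self.objects[self.collections[collection][name]]


catalog = Catalog()
catalog.ingest("run1", "events.root", "/archive/run1/events.root")  # hypothetical path
catalog.soft_link("run1", "events.root", "selected", link_name="best-events.root")
print(catalog.resolve("selected", "best-events.root"))
```

Because both names resolve to the same object id, the link adds a catalog entry but no second copy of the data.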

16 TeraGrid Wide Area Network
  [Network map; recoverable labels:]
  Sites: NCSA/UIUC, ANL, UIC, Starlight / NW Univ, Ill Inst of Tech, Univ of Chicago, Indianapolis (Abilene NOC), Los Angeles, San Diego, Chicago, Urbana
  Multiple Carrier Hubs; I-WIRE; DTF Backbone; Abilene
  StarLight International Optical Peering Point (see www.startap.net)
  Links: OC-48 (2.5 Gb/s, Abilene); Multiple 10 GbE (Qwest); Multiple 10 GbE (I-WIRE Dark Fiber)
  Legend: solid lines in place and/or available by October 2001; dashed I-WIRE lines planned for summer 2002

17 PACI 13.6 TF Linux TeraGrid

18 Further Information
  http://www.npaci.edu/DICE

19 SDSC Storage Resource Broker & Meta-data Catalog
  [Architecture diagram; recoverable labels:]
  Application: C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Web; Prolog Predicate
  SRB; MCAT; HRM
  Dublin Core; Resource, User Defined; Application Meta-data
  Remote Proxies; DataCutter; Third-party copy
  Archives: HPSS, ADSM, UniTree, DMF
  Databases: DB2, Oracle, Postgres
  File Systems: Unix, NT, Mac OSX

20 Replication Attributes
  DATA_NAME           Global SRB data object name
  DATA_REPL_ENUM      Replica copy number
  SIZE                Size of data in bytes
  DATA_TYP_NAME       Data type (primarily specification of the data format)
  DATA_CLASS_NAME     Logical classification of the data (description of the type)
  DATA_CLASS_TYPE     Classification type
  ACCESS_CONSTRAINT   Access restrictions on data
  DATA_COMMENTS

21 Replication Attributes (2)
  DATA_COMMENTS_TIMESTAMP   Time and date stamp for when comments were made on the data object
  REPL_TIMESTAMP            Time and date stamp when the owner modified the data object
  PATH_NAME                 Physical path name of the data object
  DATA_CREATE_TIMESTAMP     Time and date stamp for when the data was created
  DATA_IS_DELETED           Flag that can be turned on to indicate a data object has been deleted, while retaining the data set on storage
  DATA_OWNER                Data object creator name
  DATA_OWNER_DOMAIN         Domain/group of the data object creator
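As a rough illustration of how the attributes on these two slides fit together, the sketch below models one catalog row per replica and selects the replicas of a data object that are still live. Only a subset of the listed attributes is included. The `ReplicaRecord` class, the `live_replicas` function, and the sample values are hypothetical; the field names mirror the slide's attribute names in lowercase.

```python
from dataclasses import dataclass


@dataclass
class ReplicaRecord:
    data_name: str         # DATA_NAME: global SRB data object name
    data_repl_enum: int    # DATA_REPL_ENUM: replica copy number
    size: int              # SIZE: size of data in bytes
    path_name: str         # PATH_NAME: physical path of this replica
    data_owner: str        # DATA_OWNER: data object creator name
    data_is_deleted: bool = False  # DATA_IS_DELETED: logical delete flag


def live_replicas(records, data_name):
    """Return non-deleted replicas of one data object, ordered by copy number."""
    return sorted(
        (r for r in records if r.data_name == data_name and not r.data_is_deleted),
        key=lambda r: r.data_repl_enum,
    )


# Hypothetical sample rows.
records = [
    ReplicaRecord("survey/tile42", 1, 2048, "/store-b/tile42", "alice"),
    ReplicaRecord("survey/tile42", 0, 2048, "/store-a/tile42", "alice", data_is_deleted=True),
    ReplicaRecord("survey/tile42", 2, 2048, "/store-c/tile42", "alice"),
    ReplicaRecord("survey/tile43", 0, 4096, "/store-a/tile43", "bob"),
]
for replica in live_replicas(records, "survey/tile42"):
    print(replica.data_repl_enum, replica.path_name)
```

The deleted replica stays in the catalog, matching the slide's description of DATA_IS_DELETED as a flag that retains the data set on storage.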

22 Quilt Extension (1) – Data Type
  Original Quilt: no difference between dt1.xml and dt2.xml
  [Two XML listings, dt1.xml and dt2.xml; markup not preserved. Both show the values 21 … … 122 and 123 … … 203 … …]
  After we add data type …

23 Quilt Extension (2) – Operator Overloading
  Query 1: sum of id and sponsor_id (type = string)
    FOR $bill in document("dt1.xml")//bill
    RETURN $bill//id, $bill//sponsor_id, $bill//id/text() + $bill//sponsor_id/text()
  Result:
    21 122 21122
    123 203 123203
    … …

24 Quilt Extension (2) – Operator Overloading
  Query 2: sum of id and sponsor_id (type = integer)
    FOR $bill in document("dt2.xml")//bill
    RETURN $bill//id, $bill//sponsor_id, $bill//id/text() + $bill//sponsor_id/text()
  Result:
    21 122 143.0
    123 203 326.0
    … …
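The difference between the two query results can be reproduced with a short sketch of type-dependent addition. It is not the Kweelt code; the function name `typed_sum` and the way the type is passed in are assumptions. The numeric branch returns a float because the integer results on the slide are printed as 143.0 and 326.0.

```python
def typed_sum(left, right, type_name):
    """Add two text values according to their declared data type."""
    if type_name == "string":
        # On strings, the sum operator concatenates.
        return left + right
    if type_name in ("integer", "float"):
        # On numeric types, the sum operator adds.
        return float(left) + float(right)
    raise ValueError(f"unsupported data type: {type_name}")


print(typed_sum("21", "122", "string"))    # id + sponsor_id as strings
print(typed_sum("21", "122", "integer"))   # id + sponsor_id as integers
```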

25 Quilt Extension (3) – Array Operation
  Classes: Value, ValueArray, ValueIntegerArray, ValueFloatArray, ValueStringArray, ValueBoolArray
  Value: interface for Kweelt base type
  ValueArray: extends Value; implements Compare and array-specific operations
    - Accessor: getter, setter
    - Element summation
    - Array summation
    - Subsequence
    - Zip, Unscroll, concatenation, etc.
  Demo: http://pamina2.sdsc.edu/cgi-bin/kweelt/demo.cgi
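A minimal sketch of the array operations named on this slide follows. The slide does not define the operations, so two readings are assumed here: “element summation” is taken to mean the pairwise sum of two arrays, and “array summation” the sum of all elements of one array. The Python class below only borrows the name `ValueArray`; it is not the Kweelt class, and the method names are hypothetical.

```python
class ValueArray:
    """Toy array value supporting the operations listed on the slide."""

    def __init__(self, items):
        self.items = list(items)

    def get(self, index):
        return self.items[index]

    def set(self, index, value):
        self.items[index] = value

    def element_sum(self, other):
        """Assumed meaning: pairwise sum of two equal-length arrays."""
        if len(self.items) != len(other.items):
            raise ValueError("arrays must have the same length")
        return ValueArray(a + b for a, b in zip(self.items, other.items))

    def array_sum(self):
        """Assumed meaning: combine all elements with + (adds or concatenates)."""
        if not self.items:
            raise ValueError("cannot sum an empty array")
        total = self.items[0]
        for item in self.items[1:]:
            total = total + item
        return total

    def subsequence(self, start, stop):
        return ValueArray(self.items[start:stop])

    def concatenate(self, other):
        return ValueArray(self.items + other.items)


numbers = ValueArray([1, 2, 3])
print(numbers.array_sum())
print(numbers.element_sum(ValueArray([10, 20, 30])).items)
print(ValueArray(["a", "b"]).array_sum())
```

Because `array_sum` relies on `+`, it inherits the type-dependent behavior from the operator overloading slides: numbers add and strings concatenate.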

