
NeSC Data Projects and Initiatives Dr. Dave Berry Research Manager.

Presentation on theme: "NeSC Data Projects and Initiatives Dr. Dave Berry Research Manager."— Presentation transcript:

1 NeSC Data Projects and Initiatives Dr. Dave Berry Research Manager

2 Contents
  The Data Deluge
  Web Services
  The DAI vision
  The OGSA-DAI Project and GGF
  The OGSA-DAI Software
  Edikt
  Other relevant projects in the UK

3 Acknowledgements
  This talk includes material prepared by:
    The OGSA-DAI project
    The e-Diamond project
    The BRIDGES project
    The GGF OGSA Working Group
    and others…

4 The Data Deluge
  [Images: Mont Blanc (4810 m); downtown Geneva]
  Entering an age of data
    CERN: LHC will generate 1 GB/s = 10 PB/y
    VLBA (NRAO) generates 1 GB/s today
    Pixar generates 100 TB per movie
  Data stored in many different ways
    Relational databases
    XML databases
    Flat files
  Need ways to facilitate
    Data discovery
    Data access
    Data integration
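The LHC figure on this slide can be checked with back-of-envelope arithmetic. The sketch below assumes roughly 10^7 seconds of data taking per year and decimal units; neither assumption is stated on the slide. A full calendar year at 1 GB/s would give about three times as much.

```python
# Back-of-envelope check of "1 GB/s = 10 PB/y".
# ASSUMPTION: ~1e7 seconds of data taking per year (not stated on the slide).
RATE_GB_PER_S = 1
LIVE_SECONDS_PER_YEAR = 10_000_000   # assumed live time per year
GB_PER_PB = 1_000_000                # decimal units: 1 PB = 10^6 GB


def yearly_volume_pb(rate_gb_per_s, seconds):
    """Convert a sustained rate in GB/s into a yearly volume in PB."""
    return rate_gb_per_s * seconds / GB_PER_PB


lhc_pb = yearly_volume_pb(RATE_GB_PER_S, LIVE_SECONDS_PER_YEAR)
full_year_pb = yearly_volume_pb(RATE_GB_PER_S, 365 * 24 * 3600)
print(lhc_pb, full_year_pb)
```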

5 Astronomical Databases
  No. and sizes of data sets as of mid-2002, grouped by wavelength
  12 waveband coverage of large areas of the sky
  Total about 200 TB data
  Doubling every 12 months
  Largest catalogues near 1B objects
  Data and images courtesy Alex Szalay, Johns Hopkins

6 Bioinformatics Databases
  [Chart: PDB Content Growth]
  Bibliographic (MedLine, …)
  Amino Acid Seq (SWISS-PROT, …)
  3D Molecular Structure (PDB, …)
  Nucleotide Seq (GenBank, EMBL, …)
  Biochemical Pathways (KEGG, WIT…)
  Molecular Classifications (SCOP, CATH,…)
  Motif Libraries (PROSITE, Blocks, …)

7 Web Services
  Using the protocols and ideas that have made the web a success for humans…
  And applying them to distributed programming
    HTTP
    Single networking port
    Autonomy & Failure handling
    Open standards
  Tools & Platforms
    Apache Axis
    WebSphere, .NET, Oracle Application Server, Sun ONE, …

8 From Browsing to Programming
               Browsing the web        Programming the web
  Readers      People                  Software
  Discovery    Google, Altavista, …    UDDI, …
  Description  N/A                     WSDL
  Operations   Get, post, …            Service-specific
  Protocol     HTTP                    SOAP over HTTP
  Format       HTML, XHTML             XML + Schema
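To make the "Programming the web" column concrete, here is a minimal sketch of a service-specific operation carried as XML in a SOAP envelope over an HTTP POST. The envelope namespace is the standard SOAP 1.1 one. The service namespace, the operation name `getSequence`, its `accession` parameter and the endpoint URL are all hypothetical, and the request is only constructed, never sent.

```python
import urllib.request
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "urn:example:sequence-service"                # hypothetical service namespace


def build_envelope(operation, params):
    """Wrap a service-specific operation and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


# Hypothetical operation and parameter, for illustration only.
payload = build_envelope("getSequence", {"accession": "P12345"})

# The request object is built but not sent; the endpoint is a placeholder.
request = urllib.request.Request(
    "http://localhost:8080/services/SequenceService",
    data=payload,
    headers={"Content-Type": "text/xml; charset=utf-8", "SOAPAction": '""'},
    method="POST",
)
```

In practice a client would normally generate this plumbing from the service's WSDL description rather than write it by hand.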

9 A Perspective on WS Specifications

10 Open Grid Services Architecture
  Web Services
    Business integration
    Secure and universal access
    Applications on demand
  Grid Protocols
    Vast resource scalability
    Global Accessibility
    Resources on demand
  Continuous Availability
  Access resource, Manage resource, Share resource
  The architecture of the Global Grid Forum

11 [Diagram labels: OGSA services] Context Services, Information Services, Infrastructure Services, Security Services, Resource Mgmt Services, Execution Mgmt Services, Data Services, Policy Mgmt, VO Mgmt, Access, Integration, Provisioning, Cataloging, Boundary Traversal, Integrity, Authorization, Authentication, WSRF, WSN, WSDM, Event Mgmt, Troubleshooting, Discovery, Job Mgmt, Logging, Execution Planning, Workflow Mgmt, Workload Mgmt, Provisioning, Application Mgmt, Deployment, Configuration, Reservation, Naming, Self Mgmt Services, Heterogeneity Mgmt, Service Level Attainment, QoS Mgmt, Optimization
  GGF11: OGSA specification informational document

12 Data Access and Integration
  Web Services for querying and integrating structured data resources
  The foundation framework for:
    Building tailored DAI applications
    Higher-level services:
      Replication: Data located in multiple locations
      Federation: Composition of multiple sources
      Provenance: How was data generated?

13 The OGSA-DAI Project
  [Logos: Powered by …]
  Funded by the Grid Core Programme
  OGSA-DAI
    £3 million, 18 months, from Feb 2002
    Three major releases, three interim releases
  DAIT (DAI-Two)
    Keep the OGSA-DAI brand name
    £1.5 million, 24 months, from Oct 2003
    Four major releases

14 DAI in GGF and OGSA
  Data Access and Integration Services WG
    Strong involvement from OGSA-DAI members
    Standardise the interfaces – WS-DAI
    OGSA-DAI a reference implementation
    Experience informing specification work
  OGSA WG Data Design Team
    Designing the data-oriented aspects of OGSA
    Created after GGF10 (March 2004)
    Led by NeSC

15 OGSA Design Teams
  [Diagram: OGSA services, same labels as on slide 11]
  OGSA-WG design teams:
    Information Service design team
    Data Service design team
    EMS design team
    Resource Mgmt design team
    Security Service design team
    Self Mgmt design team
    Core (roadmap) design team
    Naming design team

16 Data Services design team
  Informal domain expert groups within OGSA
  May include co-chairs of other WG/RGs
  Output is included in OGSA specification
  [Diagram labels: OGSA-WG; OGSA Data Service Design team; DAIS-WG, GSM-WG, GFS-WG, Info-D WG, ADF, OREP, …; telecons, F2F meetings]

17 OGSA v2 Document Deliverables
  Root Documents
    Use case doc
    Architecture v2
    Glossary
  Design team Documents
    Service descriptions
    Scenarios
  Working Group Specifications
    GGF Recommendation documents

18 How OGSA-DAI works
  1a. Request to Registry for sources of data about x
  1b. Registry responds with Factory handle
  2a. Request to Factory for access to database
  2b. Factory creates GridDataService to manage access
  2c. Factory returns handle of GDS to client
  3a. Client queries GDS with XPath, SQL, etc
  3b. GDS interacts with database
  3c. Results of query returned to client as XML
  [Diagram labels: Client; Registry; Factory; Grid Data Service; XML / Relational database; legend: SOAP/HTTP, service creation, API interactions]
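The registry, factory and Grid Data Service steps on this slide can be sketched as a small in-process simulation. The class and method names below are hypothetical and are not the OGSA-DAI API. An in-memory SQLite database with made-up rows stands in for the relational resource, and plain method calls stand in for SOAP/HTTP messages.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Step 3: manages access to one database and answers queries with XML."""

    def __init__(self, connection):
        self._conn = connection

    def perform(self, sql):
        cursor = self._conn.execute(sql)                  # 3b. GDS interacts with database
        columns = [d[0] for d in cursor.description]
        results = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(results, "row")
            for col, value in zip(columns, row):
                ET.SubElement(row_el, col).text = str(value)
        return ET.tostring(results, encoding="unicode")   # 3c. results returned as XML


class Factory:
    """Step 2: creates a GridDataService for the database it fronts."""

    def __init__(self, connection):
        self._conn = connection

    def create_service(self):
        return GridDataService(self._conn)                # 2b, 2c


class Registry:
    """Step 1: maps a topic to the factory that serves data about it."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find(self, topic):
        return self._factories[topic]                     # 1b


# Stand-in relational database with made-up content.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE genes (name TEXT, chromosome INTEGER)")
db.executemany("INSERT INTO genes VALUES (?, ?)", [("abc1", 4), ("xyz2", 7)])

registry = Registry()
registry.register("genes", Factory(db))

factory = registry.find("genes")          # 1a, 1b
gds = factory.create_service()            # 2a to 2c
xml_result = gds.perform("SELECT name, chromosome FROM genes ORDER BY name")  # 3a to 3c
print(xml_result)
```

The client only ever holds handles: it never sees a driver or connection string, which is the indirection the slide's three-step flow is describing.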

19 OGSA-DAI compared to JDBC
  Language independence at the client end
    Platform independence
    Do not have to worry about connection technology, drivers, etc
  Can handle XML resources
  Can embed additional functionality at the service end
    Transformations
    Third party delivery
    Avoiding unnecessary data movement
  Provision of Metadata is powerful
  Usefulness of the Registry for service discovery
    Dynamic service binding process

20 Future DAI Services
  1a. Request to Registry for sources of data about x & y
  1b. Registry responds with Factory handle
  2a. Request to Factory for access and integration from resources Sx and Sy
  2b. Factory creates GridDataServices network
  2c. Factory returns handle of GDS to client
  3a. Client submits a sequence of scripts, each with a set of queries to GDS with XPath, SQL, etc
  3b. Client tells analyst
  3c. Sequences of result sets returned to analyst as formatted binary described in a standard XML notation
  [Diagram labels: Client; Analyst; Application Code; Data Registry; Data Access & Integration master; GDS, GDS 1, GDS 2, GDS 3; GDTS, GDTS 1, GDTS 2; Sx, Sy; XML database; Relational database; legend: SOAP/HTTP, service creation, API interactions]

21 Activities are the drivers
  Express a task to be performed by a GDS
  Three broad classes of activities:
    Statement
    Transformations
    Delivery
  Extensible:
    Easy to add new functionality
    Does not require modification to the service interface
    Extensions operate within the OGSA-DAI framework
  Functionality:
    Implemented at the service
    Work where the data is (no need to move data back)
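The extensibility point on this slide, that new activities can be added without changing the service interface, can be illustrated with a small registry pattern. This is a sketch under stated assumptions rather than OGSA-DAI's actual framework: the `ACTIVITIES` registry, the `activity` decorator and the `perform` function are hypothetical names, and the "statement" activity just emits supplied rows instead of querying a database.

```python
import gzip

ACTIVITIES = {}   # hypothetical registry: activity name -> implementation


def activity(name):
    """Register a function as a named activity."""
    def decorator(func):
        ACTIVITIES[name] = func
        return func
    return decorator


@activity("statement")
def statement(_data, rows):
    # Stand-in for a query: emits the supplied rows as CSV text.
    return "\n".join(",".join(map(str, r)) for r in rows)


@activity("transform")
def transform(data):
    # Transformation class: compress the text at the service end.
    return gzip.compress(data.encode("utf-8"))


@activity("delivery")
def delivery(data, sink):
    # Delivery class: hand the data to a destination (here, a list).
    sink.append(data)
    return data


def perform(requests):
    """The single, fixed service operation: run named activities in order,
    feeding each one the output of the previous one."""
    data = None
    for name, kwargs in requests:
        data = ACTIVITIES[name](data, **kwargs)
    return data


# Extension: a new activity plugs in without any change to perform().
@activity("upper")
def upper(data):
    return data.upper()


outbox = []
perform([
    ("statement", {"rows": [("abc1", 4), ("xyz2", 7)]}),
    ("upper", {}),
    ("transform", {}),
    ("delivery", {"sink": outbox}),
])
```

Because every step runs inside `perform`, only the final compressed result leaves the service, which mirrors the slide's "work where the data is" point.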

22 OGSA-DAI Deck

23 Building Applications
  Activities are grouped together
    Perform document
  Data can flow between activities
    Optimisation
    Avoids multiple message exchanges
  Can deliver to other GDSs
    Prerequisite for data integration
  Base middleware for projects requiring data access
    Some capability for data integration
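A perform document groups several activities into one message, with named streams carrying data from one activity to the next. The sketch below shows the idea with an invented XML layout: the element and attribute names are illustrative and do not follow the real OGSA-DAI perform document schema, and the "query" activity simply uses its element text as a stand-in result.

```python
import xml.etree.ElementTree as ET

# Hypothetical perform-document layout; names are illustrative only.
PERFORM_DOC = """
<perform>
  <query name="q" output="rows">abc1,xyz2,def3</query>
  <sort name="s" input="rows" output="sorted"/>
  <deliver name="d" input="sorted" to="client"/>
</perform>
"""


def run(document):
    """Execute every activity in one pass; named streams link the activities."""
    streams = {}      # stream name -> data flowing between activities
    delivered = {}    # destination -> data handed over by a deliver activity
    for step in ET.fromstring(document):
        if step.tag == "query":
            # Stand-in for a database query: the element text is the result.
            streams[step.get("output")] = step.text.split(",")
        elif step.tag == "sort":
            streams[step.get("output")] = sorted(streams[step.get("input")])
        elif step.tag == "deliver":
            delivered[step.get("to")] = streams[step.get("input")]
        else:
            raise ValueError(f"unknown activity: {step.tag}")
    return delivered


result = run(PERFORM_DOC)
print(result)
```

The client sends one document and receives one response, instead of one round trip per activity, which is the optimisation the slide refers to.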

24 Release 4, April 2004
  Provides Data Access components, an extensible framework for building applications and some integration components
  Built on top of Globus Toolkit 3.2
  Supports relational, XML and some files
    MySQL, Oracle, DB2, SQL Server, Postgres, Xindice, CSV
  Supports various delivery options
    SOAP, FTP, GridFTP, HTTP, files, email, inter-service
  Supports various transforms
    XSLT, ZIP, GZip
  Supports message level security using X.509 certificates
  Client Toolkit library for application developers
  GUI data browser (contributed by FirstDIG project)
  Separate Distributed Query Processing components
  Comprehensive documentation and tutorials in XHTML format

25 Downloads by Release 2746 downloads (~4.7 downloads a day)

26 Downloads by country 792 registered users @ 23/8/04

27 Release 5, October 2004
  Re-engineered interface-independent core OGSA-DAI functionality.
  Improved dependability and security integration.
  New file data resources representing flat files queried using full text searches (e.g. EMBL format).
  Installation and Configuration Wizard, including all-in-one installer.
  Improved Data Browser which allows XPath querying.
  Set of standard benchmarks.
  JSP Quick View interface.
  Support for other databases (e.g. Access, eXist, HSQL).

28 Release 6, April 2006
  Data Integration applications supporting identified scenarios
  OGSA-DQP as an integrated part of release
  Fully compliant JDBC Driver for OGSA-DAI
  Support for WS-Security implementations
  Support for stored procedures on all supported databases
  Improved support for different database specific SQL types
  SQL translation between vendor dialects for subset of queries
  Support for XQuery data resources
  We expect to comply with a version of the emerging DAIS specification at this release.

29 Who is Using OGSA-DAI?
  OGSA-DAI (http://www.ogsadai.org.uk)
  AstroGrid (http://www.astrogrid.org/)
  BioSimGrid (http://www.biosimgrid.org/)
  BioGrid (http://www.biogrid.jp/)
  Bridges (http://www.brc.dcs.gla.ac.uk/projects/bridges/)
  eDiaMoND (http://www.ediamond.ox.ac.uk/)
  FirstDig (http://www.epcc.ed.ac.uk/~firstdig/)
  GeneGrid (http://www.qub.ac.uk/escience/projects.php#genegrid)
  GEON (http://www.geongrid.org/)
  IU RGRBench (http://www.cs.indiana.edu/~plale/projects/RGR/OGSA-DAI.html)
  myGrid (http://www.mygrid.org.uk/)
  N2Grid (http://www.cs.univie.ac.at/institute/index.html?project-80=80)
  ODD-Genes (http://www.epcc.ed.ac.uk/oddgenes/)
  OGSA-WebDB (http://www.gtrc.aist.go.jp/dbgrid/)
  MCS (http://www.isi.edu/~deelman/MCS/)
  INWA (http://www.epcc.ed.ac.uk/projects/inwa/)
  GridMiner (http://www.gridminer.org/)

30 Project classification

31 Edikt
  The team: 8 professional software engineers, support staff, project manager, commercialisation manager, architect, and SAB
  SHEFC funded research and development grant
    3 years funding: May 2002 – 2005
    +3 years funding upon successful project and review
  [Diagram labels: Standards; Edikt project; Requirements analysis; Technology matchmaking; Gap filling; Rigorous engineering; CS Research; Grid Services for e-Science Data Management; Commercial SW components and skills; E-Science Apps]

32 ELDAS – Data Access Service
  Java Framework
  Implemented using Enterprise Java Beans
  Data Access Components interface to distinct DBMSs
  Accessible as a grid data service or a web data service
  Another (partial) implementation of the GGF WS-DAI specifications
  [Diagram labels: ELDAS; EJB - DAS; DAC; DB2 DB; MySQL DB; Xindice DB; Oracle 9i DB; Web Servlet; Grid Proxy; Web User1; Grid User1; Grid User2]

33 BinX – accessing legacy binary data
  The Problem:
    Many binary data files
    Applications must know the data format
    Binary data formats are machine-specific
  The Solution:
    Write a stand-aside format description in XML
    Provide a library to
      Interpret the description
      Provide file access across different machines
    Build higher-level services
  [Diagram labels: e-Science Application; BinX Library; Binary Data File; BinX file describes binary file structure; simulations]
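The "stand-aside format description" idea can be sketched with a few lines of Python: an XML document names the byte order and the field types, and a small reader interprets it to decode a binary record on any machine. The XML vocabulary here is invented for the sketch and is not the actual BinX schema; only the standard-library `struct` and `xml.etree.ElementTree` modules are used.

```python
import struct
import xml.etree.ElementTree as ET

# Hypothetical description format, loosely in the spirit of BinX; the element
# and attribute names are invented for this sketch.
DESCRIPTION = """
<binary byteOrder="bigEndian">
  <field name="count" type="int32"/>
  <field name="temperature" type="float64"/>
  <field name="flag" type="uint8"/>
</binary>
"""

TYPE_CODES = {"int32": "i", "float64": "d", "uint8": "B"}   # struct format characters
ORDER_CODES = {"bigEndian": ">", "littleEndian": "<"}       # explicit byte order, no padding


def read_record(description, raw):
    """Decode one record from raw bytes according to the XML description."""
    root = ET.fromstring(description)
    fields = root.findall("field")
    fmt = ORDER_CODES[root.get("byteOrder")] + "".join(
        TYPE_CODES[f.get("type")] for f in fields
    )
    values = struct.unpack(fmt, raw[:struct.calcsize(fmt)])
    return {f.get("name"): v for f, v in zip(fields, values)}


# Made-up sample record written in big-endian order.
raw = struct.pack(">idB", 3, 21.5, 1)
record = read_record(DESCRIPTION, raw)
print(record)
```

Because the byte order lives in the description rather than in the application, the same reader works unchanged on little-endian and big-endian hosts.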

34 Mammography
  Mammograms have different appearances, depending on image settings and acquisition systems
  [Diagram labels: Standard Mammo Format; Temporal mammography; Computer Aided Detection; 3D View]
  A prototype of a national database of mammographic images in support of the UK breast screening programme

35 [Diagram labels: four sites (UCL, KCL, UED, CHU), with DB2 Content Manager, Core Services, Core & Training API, Data Load and Training App repeated four times; DB2 Federation; OGSA-DAI; Database; Files; Training Services; Core API; Training API; Training Application]

36 The BRIDGES Project
  Biomedical Research Informatics Delivered by Grid Enabled Services
  NeSC (Edinburgh and Glasgow) and IBM
  www.brc.dcs.gla.ac.uk/projects/bridges
  Supporting project for CFG project
    Generating data on hypertension
    Rat, Mouse, Human genome databases
  Variety of tools used
    BLAST, BLAT, Gene Prediction, visualisation, …
  Variety of data sources and formats
    Microarray data, genome DBs, project partner research data, medical records, …
  Aim is integrated infrastructure supporting
    Data federation
    Security

37 BRIDGES
  [Diagram labels: Synteny Grid Service; blast; VO Authorisation; Information Integrator; OGSA-DAI]

38 INWA Project
  Innovation Node Western Australia
  Informing Business & Regional Policy: Grid-enabled fusion of global data and local knowledge
  Involved 10 partners (6 UK + 4 Australia)
  Aim
    Data mine commercially sensitive data
    Security an absolute MUST
  Employ Grid technologies
    Need access to data and computational resources
    OGSA-DAI
      Access data resources
    SunDCG's TOG (Transfer-queue Over Globus)
      Handle job submission to analyse micro array data

39 INWA
  [Diagram labels: user@australia; user@edinburgh; Curtin, Australia; EPCC, UK; Grid Engine; Bank; Telco; OGSA-DAI; TOG; Data Browser; Telco data; Bank data; Australian property; UK Property]

40 Further Information on OGSA-DAI
  The OGSA-DAI Project Site: http://www.ogsadai.org.uk
  The DAIS-WG site: http://cs.man.ac.uk/grid-db
  OGSA-DAI Users Mailing list: users@ogsadai.org.uk
    General discussion on grid DAI matters
  Formal support for OGSA-DAI releases
    http://www.ogsadai.org.uk/support
    support@ogsadai.org.uk
  OGSA-DAI training courses

