Neil Chue Hong Project Manager, EPCC +44 131 650 5957 OGSA-DAI Status and Benchmarks All Hands Meeting 2005 Nottingham, 22 September.

1 Neil Chue Hong Project Manager, EPCC N.ChueHong@epcc.ed.ac.uk +44 131 650 5957 OGSA-DAI Status and Benchmarks All Hands Meeting 2005 Nottingham, 22 September 2005

2 AHM20052 Overview The all new OGSA-DAI overview Benchmarking and profiling work Project collaboration Future plans

3 AHM20053 OGSA-DAI team IBM Development Team, Hursley NEReSC, Newcastle NeSC, Edinburgh EPCC Team, Edinburgh ESNW, Manchester IBM Dissemination Team

4 AHM20054 OGSA-DAI In One Slide An extensible framework for data access and integration. Expose heterogeneous data resources to a grid through web services. Interact with data resources: – Queries and updates. – Data transformation / compression – Data delivery. Customise for your project using – Additional Activities – Client Toolkit APIs – Data Resource handlers A base for higher-level services – federation, mining, visualisation,…

5 AHM20055 The OGSA-DAI Framework [Diagram labels: Application; Client Toolkit; OGSA-DAI service; Engine; Activities: SQLQuery, XPath, readFile, XSLT, GZip, GridFTP; Data Resources: JDBC, XMLDB, File; Databases: MySQL, DB2, SQL Server, Xindice, SWISS-PROT]

6 AHM20056 Extensibility Example [Diagram labels: OGSA-DAI service; Engine; SQLQuery; JDBC; SQL; MySQL; Multiple SQL GDS]

7 AHM20057 Timeline [Diagram labels: 2003, 2004, 2005; Release 1 interim; Release 2; Release 2 interim; Release 3; Release 3.1; Release 4; Release 5; OGSI Release 6 → Release 1; OGSA-DAI WSRF 1.0; OGSA-DAI WS-I 1.0 / OGSA-DAI WS-I 1.1 (OMII)]

8 AHM20058 Release downloads Data up to 28/07/05

9 AHM20059 Geographical download profiles (data up to 29/07/05)
OGSI: China (28%), UK (20%), US (12%), Unknown (10%)
WSRF: China (32%), UK (19%), Germany (8%), US (7%)
WS-I: UK (30%), China (28%), US (8%), Japan (7%)
4556330120

10 AHM200510 Our stakeholders
OMII
– Current version of OGSA-DAI WS-I 1.0 distribution runs on OMII
– Release 1.1 due out soon
– Issues when security is introduced
Globus
– WSRF 0.9.6 distribution bundled with GT4.0
– WSRF 1.0 distribution bundled with GT4.0.1
Projects
– A number of projects have used, use or will use OGSA-DAI: AstroGrid, Biogrid, BioSimGrid, Bridges, caGrid, DataMiningGrid, eDiamond, FirstDig, GEDDM, GeneGrid, GEON, GridMiner, INWA, IU RGRBench, LEAD, MCS, myGrid, N2Grid, ODD-Genes, OGSA-WebDB, SIMDAT, GOLD

11 AHM200511 Out with the old… Client Client Toolkit API Relational XML Files Client Server Data SOAP DAISGR GDS GDSF

12 AHM200512 … in with the new! Client Generic Client Toolkit API WS-I WSRF DAI Core DSR Data Service WSRF WS-I DSR RelationalXML Files Client Server Data SOAP

13 AHM200513 Changes in moving to WSRF/WS-I
Registry component (DAISGR) no longer supported
– Hope to leverage third-party registration services
– GRIMOIRES (http://www.omii.ac.uk/mp/mp_grimoires.htm)
– Others …
GDS/GDSF roles combined
– Use data services
– Currently static services, but reconfigurable
Improvements to the GDS
– Data resource abstraction decoupled from the service
– Renaming (consistent naming across platform versions)
– Ability to enforce control flow constraints (ordering activities)
– Refactored exception framework
Temporary set-backs (we promise we’ll fix them)
– No security model
– No concurrency: GDSs were previously used for concurrency; support is now moving to the engine

14 AHM200514 The Client Toolkit (CTk)
Provides programmatic abstraction for perform documents
– Do not have to write XML explicitly
Abstraction over WS-I and WSRF services at client side
– Don’t need to know what type of service is at the other end (almost)
– Security model is the remaining issue
Currently only a Java version of the CTk
– Stabilising API
– Publish an API document
– Allow third parties to develop CTk for other programming languages
[Diagram labels: Client; Generic Client Toolkit API; WS-I; WSRF]

15 AHM200515 The Server Side
Presentation layer
– Deal with messaging differences
– Get one version per distribution
Core/Business Logic
– Common to all distributions
– Data Service Resource (DSR)
Data Layer
– Relational databases
– XML document repositories
– File based repositories
New architecture being rolled out
– see Malcolm’s talk in next session
– concurrency, sessions and transactions
[Diagram labels: Data Service (WSRF, WS-I); DAI Core; DSR; Relational; XML; Files]

16 AHM200516 Benchmarking/Profiling Establish benchmark suite to: –Measure performance gains/losses between releases –Reveal implementation issues –Allows focused improvements –Establish best practice –Summer intern (Heather Kelly) produced results Profiling allows us to identify particular areas which are causing poor performance in the benchmarks –Summer intern (Radoslaw Ostrowski) extended Netlogger and did some profiling Most of the results are for OGSA-DAI R6 –one slide showing what is happening in R7

17 AHM200517 Configuration
Measure the time to:
– Send SQL query to server
– Return nRows
– Sum the values in one of the columns
Do this 30 times
– Calculate mean and standard deviation
Repeat the process having increased nRows by stepsize
Try various different databases
Notes:
– Time to establish connection in JDBC runs not included
– JDBC does not return results in WebRowSet format
– Server is already running
Data source: little blackbook
– Test database included in distributions
[Diagram labels: Windows XP Pro SP2, Intel PIII 863MHz, 512Mb RAM; SunOS 5.9, UltraSPARC-IIe 502 MHz, 128Mb RAM; Tomcat 4.1.29, GT 3.2.1, OGSA-DAI OGSI R6.0, j2sdk 1.4.2_01; 10MBit network]
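The measurement loop on this slide can be sketched as follows. This is a minimal illustration only, not the project's benchmark suite: it assumes an in-memory SQLite table in place of the databases actually tested, and the names `make_database`, `timed_query` and `benchmark` are hypothetical. It also assumes nRows starts at one stepsize, which the slide does not state. As in the slide's notes, the connection is opened outside the timed region.

```python
import sqlite3
import statistics
import time


def make_database(total_rows):
    """Create an in-memory table standing in for the test database (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, value INTEGER)")
    conn.executemany(
        "INSERT INTO entries (id, value) VALUES (?, ?)",
        ((i, i) for i in range(1, total_rows + 1)),
    )
    conn.commit()
    return conn


def timed_query(conn, n_rows):
    """Time one run: send the query, fetch n_rows rows, sum one column."""
    start = time.perf_counter()
    cursor = conn.execute(
        "SELECT id, value FROM entries ORDER BY id LIMIT ?", (n_rows,)
    )
    total = sum(row[1] for row in cursor)
    elapsed = time.perf_counter() - start
    return elapsed, total


def benchmark(conn, max_rows, step_size, runs):
    """Return a list of (n_rows, mean_seconds, stdev_seconds, column_sum)."""
    results = []
    for n_rows in range(step_size, max_rows + 1, step_size):
        timings = []
        total = 0
        for _ in range(runs):
            elapsed, total = timed_query(conn, n_rows)
            timings.append(elapsed)
        results.append(
            (n_rows, statistics.mean(timings), statistics.stdev(timings), total)
        )
    return results


# Parameters taken from the benchmark slides: nRows = 10000, 30 runs, stepsize = 500.
connection = make_database(10000)
report = benchmark(connection, max_rows=10000, step_size=500, runs=30)
for n_rows, mean_s, stdev_s, _ in report:
    print(f"{n_rows:6d} rows: mean {mean_s * 1e3:.3f} ms, stdev {stdev_s * 1e3:.3f} ms")
```

Summing a column inside the timed region forces the client to iterate over every returned row, so the measurement covers result delivery and not just query submission.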

18 AHM200518 Some benchmarks Relational query – StreamServlet requires two communications – could improve this – FTP not iterating over result set – JDBC scales much better than SOAP ResultSet implementations – Forwards-backwards implementation builds DOM tree; larger memory footprint

19 AHM200519 MySQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

20 AHM200520 DB2 (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

21 AHM200521 PostgreSQL (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

22 AHM200522 SQL Server (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

23 AHM200523 Oracle (nRows = 10000, number of runs = 30, stepsize = 500, blockSize = 200)

24 AHM200524 OGSA-DAI WS-I (nRows = 10000, number of runs = 30, stepsize = 500)

25 AHM200525 Database comparison (OGSA-DAI WSRF 1.0, nRows = 10000, number of runs = 30, stepsize = 500)

26 AHM200526 Platform comparison (MySQL database, nRows = 10000, number of runs = 30, stepsize = 500)

27 AHM200527 Profiling: better RowSet conversion ResultSet to RowSet conversion

28 AHM200528 R6->R7: removal of RowSet

29 AHM200529 Challenges Intermediate representation –between multiple models (relational, XML,…) –XML WebRowSet is flexible (c.f. GridMiner) but expansive –DFDL and GridFTP/parallel HTTP? Query definition –translation of queries Data transport and workflow –workflow is typically compute driven Move computation to data –mobile code activities? –data services hosted on DBMS?

30 AHM200530 caBIG “Object-Oriented” view of data –Data types are well-defined and registered in a repository –Standardized metadata facilitates discovery –custom query language implemented as an activity

31 AHM200531 LEAD IU NCSA Illinois UA Huntsville Millersville UCAR Unidata Okla Univ Master catalog Each satellite replicates its contents to the master catalog

32 AHM200532 Users Group and DIALOGUE Workshops
3rd Users Group meeting
– June 1st
– http://www.ogsadai.org.uk/docs/UG3/
DIALOGUE Workshops
– Data Integration Applications: Linking Organisations to Gain Understanding and Experience
– Columbus, Edinburgh, Vienna, Indiana
– Bringing together Data Integration middleware and application providers with users
– http://www.datagrids.org

33 AHM200533 Future plans A new version of the OGSA-DAI Engine –should look mostly the same externally –better support for concurrency, sessions and monitoring –see Architecture paper/talk presented on Monday Implementing new versions of specifications –DAIS Specifications Key things that we will be addressing after Release 7: –Performance –A Security Model which can be applied across platforms –Full Transactions provision, including implementation of compensatory activities, distributed transactions –More data integration facilities –Better abstraction over DBMS variation

34 AHM200534 Conclusions OGSA-DAI has had to undergo significant refactoring to keep stakeholders happy Refactoring has allowed us to create an extensible framework which can be used for many data related tasks We need to identify the components and improvements which will be useful to users There is obviously room for improvement on performance, and we are working on it

35 AHM200535 Further information The OGSA-DAI Project Site: –http://www.ogsadai.org.uk The DAIS-WG site: –http://forge.gridforum.org/projects/dais-wg/ OGSA-DAI Users Mailing list –users@ogsadai.org.uk –General discussion on grid DAI matters Formal support for OGSA-DAI releases –http://www.ogsadai.org.uk/support –support@ogsadai.org.uk OGSA-DAI training courses

36 AHM200536 Core features of OGSA-DAI – I A framework for building applications –Supports data access, insert and update –Relational: MySQL, Oracle, DB2, SQL Server, Postgres –XML: Xindice, eXist –Files – CSV, BinX, EMBL, OMIM, SWISSPROT,… –Supports data delivery –SOAP over HTTP –FTP; GridFTP –E-mail –Inter-service –Supports data transformation –XSLT –ZIP; GZIP –Supports security –X.509 certificate based security

37 AHM200537 Core features of OGSA-DAI – II A framework for building data clients –Client toolkit library for application developers A framework for developing functionality –Extend existing activities, or implement your own –Mix and match activities to provide functionality you need Highly-extensible –Customise our out-of-the-box product –Provide your own services, client-side support and data-related functionality Comprehensive documentation and tutorials Latest release supports GT3.2 (to be deprecated), GT4.0, and Axis 1.2 / OMII_2 using Java 1.4

38 AHM200538 OGSA-DAI Design Principles – I Efficient client-server communication –Minimise where possible –One request specifies multiple operations No unnecessary data movement –Move computation to the data –Utilise third-party delivery –Apply transforms (e.g., compression) Build on existing standards –Fill-in gaps where necessary
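The first two principles can be illustrated with a small sketch. It assumes a request is simply an ordered list of operations that run on the server, each feeding the next, so that only the final (compressed) result travels to the client. The names `run_request`, `query_rows`, `serialise` and `compress` are hypothetical and are not OGSA-DAI API; the query returns canned rows rather than contacting a DBMS.

```python
import gzip
import json


def run_request(activities, initial=None):
    """Run every operation of one request in order, piping output to input."""
    data = initial
    for activity in activities:
        data = activity(data)
    return data


def query_rows(_):
    """Hypothetical query step: returns canned rows instead of using a DBMS."""
    return [{"id": i, "name": f"entry-{i}"} for i in range(1000)]


def serialise(rows):
    """Transformation step: turn the rows into bytes ready for delivery."""
    return json.dumps(rows).encode("utf-8")


def compress(payload):
    """Transformation step: compress next to the data, before delivery."""
    return gzip.compress(payload)


# One request carries all three operations; the client makes a single call.
uncompressed = run_request([query_rows, serialise])
delivered = run_request([query_rows, serialise, compress])
print(len(uncompressed), "bytes before compression,", len(delivered), "after")
```

Bundling the steps into one request avoids a round trip per operation, and compressing before delivery reduces the volume of data that has to move.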

39 AHM200539 OGSA-DAI Design Principles – II Do not hide underlying data model –Users must know where to target queries –Data virtualisation is hard Extensible architecture –Modular and customisable –e.g., to accommodate stronger security Extensible activity framework –Cannot anticipate all desired functionality –Activity = unit of functionality –Allow users to plug-in their own
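The idea of an activity as a pluggable unit of functionality might look like the sketch below. It is an assumption-laden illustration: `ActivityRegistry`, `register` and `perform` are hypothetical names, not the OGSA-DAI activity interfaces, and the activity names are made up for the example.

```python
class ActivityRegistry:
    """Maps activity names to implementations so new ones can be plugged in."""

    def __init__(self):
        self._activities = {}

    def register(self, name, func):
        """Add an activity; refuse to silently replace an existing one."""
        if name in self._activities:
            raise ValueError(f"activity already registered: {name}")
        self._activities[name] = func

    def perform(self, steps, data=None):
        """Run the named activities in order, piping each output to the next."""
        for name in steps:
            if name not in self._activities:
                raise KeyError(f"unknown activity: {name}")
            data = self._activities[name](data)
        return data


registry = ActivityRegistry()
# Stand-in for an activity shipped with the framework.
registry.register("readLines", lambda _: ["b,2", "a,1"])
# Stand-ins for activities contributed by a user.
registry.register("sortLines", sorted)
registry.register("joinLines", "\n".join)

result = registry.perform(["readLines", "sortLines", "joinLines"])
print(result)
```

Because the engine only looks activities up by name, functionality the framework authors did not anticipate can be added without changing the engine itself.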

40 AHM200540 Data Integration challenges Metadata extraction –define a common model for e.g. database schema? Intermediate representation –between multiple models (relational, XML,…) –XML WebRowSet is flexible (c.f. GridMiner) but expansive –DFDL and GridFTP/parallel HTTP? Query definition –translation of queries Data transport and workflow –workflow is typically compute driven Move computation to data –mobile code activities? –data services hosted on DBMS?

41 AHM200541 Contributing to OGSA-DAI Additional functionality: –Provide activities which implement specific functionality –Provide extra client functionality –Provide different security mechanisms –Provide higher level components and applications Different levels of contributions –Based on OGSA-DAI? –Works with OGSA-DAI? –Part of OGSA-DAI?

42 AHM200542 Distributed Query Processing
Queries mapped to algebraic expressions for evaluation
Parallelism represented by partitioning queries
– Use exchange operators
Prototype available from:
– http://www.ogsadai.org.uk
Being integrated into OGSA-DAI
[Diagram labels: table_scan (protein); table_scan termID=S92 (proteinTerm); reduce; hash_join (proteinId); op_call (Blast); reduce; exchange; 3,4; 1; 2]
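How an exchange operator lets a join run in partitions can be sketched as below. This is a simplified, single-process illustration under stated assumptions, not the DQP prototype: the operator names follow the labels in the slide's plan, but their signatures, the sample data and the `partitioned_join` helper are hypothetical, and the `reduce` and `op_call (Blast)` steps are left out.

```python
from collections import defaultdict


def table_scan(table, predicate=lambda row: True):
    """Return the rows of a table that satisfy the predicate."""
    return [row for row in table if predicate(row)]


def exchange(rows, key, n_partitions):
    """Route each row to a partition chosen by hashing its join key."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[hash(row[key]) % n_partitions].append(row)
    return partitions


def hash_join(left, right, key):
    """Build a hash index on the left input, then probe it with the right."""
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    return [{**l_row, **r_row} for r_row in right for l_row in index[r_row[key]]]


def partitioned_join(left, right, key, n_partitions):
    """Join matching partitions independently; each could run on its own node."""
    left_parts = exchange(left, key, n_partitions)
    right_parts = exchange(right, key, n_partitions)
    joined = []
    for l_part, r_part in zip(left_parts, right_parts):
        joined.extend(hash_join(l_part, r_part, key))
    return joined


# Made-up sample data for illustration.
proteins = [{"proteinId": i, "sequence": f"SEQ{i}"} for i in range(1, 7)]
protein_terms = [
    {"proteinId": 1, "termID": "S92"},
    {"proteinId": 2, "termID": "T10"},
    {"proteinId": 4, "termID": "S92"},
    {"proteinId": 6, "termID": "S92"},
]

terms = table_scan(protein_terms, lambda row: row["termID"] == "S92")
result = partitioned_join(table_scan(proteins), terms, "proteinId", 2)
print(sorted(row["proteinId"] for row in result))
```

Rows with the same join key always land in the same partition, so joining partition by partition yields the same rows as a single join over the whole input.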

43 AHM200543 caBIG “Object-Oriented” view of data –Data types are well-defined and registered in a repository –Standardized metadata facilitates discovery –custom query language implemented as an activity

44 AHM200544 LEAD IU NCSA Illinois UA Huntsville Millersville UCAR Unidata Okla Univ Master catalog Each satellite replicates its contents to the master catalog

45 AHM200545 FirstDIG Data mining with the First Transport Group, UK –Example: “When buses are more than 10 minutes late there is an 82% chance that revenue drops by at least 10%” –http://www.epcc.ed.ac.uk/firstdig OGSA-DAI OGSA-DAI Client Application Data Mining Application

46 AHM200546 GridMiner Test application area: medical –traumatic brain injury treatment –Predicting the outcome of seriously ill patients –analytical part focuses on data mining and On-Line Analytical Processing (OLAP) Target: –provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources –building on and extending OGSA-DAI http://www.gridminer.org/

47 AHM200547 GridMiner Scenario Heterogeneities: –Name in A is „First Last“ (as the target format) –Name in C has to be combined Distribution: –3 data sources

48 AHM200548 Software Process Testing Reqs. Prototype Prioritisation Fix Bugs Use Cases Requests Design Implement QA Release Support Test Cases Programme Board Technical Review Board Technical Reviewer DEVELOPERS USERS REVIEW Contribs Ingest Dissem. Training Nightly unit + system tests Additional test cases System tests based on reqs Continual process → Deep track features Users’ Group Peer Review and Inspection

49 AHM200549 user@australia Curtin, Australia EPCC, UK INWA Grid Engine Bank Telco Grid Engine Bank Telco OGSA-DAI TOG Data Browser user@edinburgh Telco data Bank data Australian property UK Property

