EGEE is a project funded by the European Union under contract IST-2003-508833 International Summer School on Grid Computing Vico Equense, 16 th July 2005.

Slides:



Advertisements
Similar presentations
© 2007 Open Grid Forum Data Management Challenge - The View from OGF OGF22 – February 28, 2008 Cambridge, MA, USA Erwin Laure David E. Martin Data Area.
Advertisements

Research Councils ICT Conference Welcome Malcolm Atkinson Director 17 th May 2004.
Neil Chue Hong Project Manager, EPCC Data Services What, Why, How e-Research Meeting NeSC, 2 nd.
A centre of expertise in data curation and preservation DCC/NeSC eScience Workshop, June 2008 Working in partnership with the eScience community This work.
Open Grid Service Architecture - Data Access & Integration (OGSA-DAI) Dr Martin Westhead Principal Consultant, EPCC Telephone: Fax:+44.
SWITCH Visit to NeSC Malcolm Atkinson Director 5 th October 2004.
OMII-UK Steven Newhouse, Director. © 2 OMII-UK aims to provide software and support to enable a sustained future for the UK e-Science community and its.
DELOS Highlights COSTANTINO THANOS ITALIAN NATIONAL RESEARCH COUNCIL.
Joint Information Systems Committee Digital Library Services BL/JISC Workshop Rachel Bruce JISC Programme Director The Digital Library and its Services,
CVRG Presenter Disclosure Information Tahsin Kurc, PhD Center for Comprehensive Informatics Emory University CardioVascular Research Grid Core Infrastructure.
An Overview of OGSA-DAI Kostas Tourlas
ASCR Data Science Centers Infrastructure Demonstration S. Canon, N. Desai, M. Ernst, K. Kleese-Van Dam, G. Shipman, B. Tierney.
1 Cyberinfrastructure Framework for 21st Century Science & Engineering (CF21) IRNC Kick-Off Workshop July 13,
Connect. Communicate. Collaborate Click to edit Master title style MODULE 1: perfSONAR TECHNICAL OVERVIEW.
Distributed Heterogeneous Data Warehouse For Grid Analysis
EGEE is a project funded by the European Union under contract IST Grid Summer School Vico Equense, 27 July An Introduction.
NextGRID & OGSA Data Architectures: Example Scenarios Stephen Davey, NeSC, UK ISSGC06 Summer School, Ischia, Italy 12 th July 2006.
Mike Smorul Saurabh Channan Digital Preservation and Archiving at the Institute for Advanced Computer Studies University of Maryland, College Park.
SpaceGRID and EGSO Satu Keski-Jaskari Maria Vappula Parallal Computing – Seminar
1 BrainWave Biosolutions Limited Accelerating Life Science Research through Technology.
17 July 2006ISSGC06, Ischia, Italy1 Agenda Session 26 – 14:30-16:00 An Overview of OGSA-DAI OGSA-DAI today – and future features How to extend OGSA-DAI.
1 Digital Libraries and Evidence in the Developing World Context Dr. Jon Ferguson Senior Health Database Scientist IMMPACT Project University of Aberdeen.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Globus 4 Guy Warner NeSC Training.
OGSA-DAI: Future Work and Wrap-up The OGSA-DAI Team
1 Dr. Markus Hillenbrand, ICSY Lab, University of Kaiserslautern, Germany A Generic Database Web Service for the Venice Service Grid Michael Koch, Markus.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
The Queen’s University of Belfast The Queen’s University of Belfast GeneGrid : Using OgsaDai in Bioinformatics Noel Kelly Belfast.
DISTRIBUTED COMPUTING
Extensible Framework for Data Access & Integration Malcolm Atkinson Director 10 th November 2004.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
material assembled from the web pages at
NSLA Members ACT Library and Information Service National Library of Australia National Library of New Zealand Northern Territory Library State Library.
Linked-data and the Internet of Things Payam Barnaghi Centre for Communication Systems Research University of Surrey March 2012.
DAIT (DAI Two) NeSC Review 18 March Description and Aims Grid is about resource sharing Data forms an important part of that vision Data on Grids:
OGSA-DAI in OMII-Europe Neil Chue Hong EPCC, University of Edinburgh.
1 1 EPCC 2 Curtin Business School & Edinburgh University Management School Michael J. Jackson 1 Ashley D. Lloyd 2 Terence M. Sloan 1 Enabling Access to.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
Data and storage services on the NGS Mike Mineter Training Outreach and Education
SEEK Welcome Malcolm Atkinson Director 12 th May 2004.
OGSA-DAI.
INFSO-RI Enabling Grids for E-sciencE OGSA DAI Data Access and Integration Marek Ciglan Institute of Informatics, Slovac Academy.
State Key Laboratory of Resources and Environmental Information System China Integration of Grid Service and Web Processing Service Gao Ang State Key Laboratory.
Mike Jackson EPCC OGSA-DAI Architecture + Extensibility OGSA-DAI Tutorial GGF17, Tokyo.
GRID Overview Internet2 Member Meeting Spring 2003 Sandra Redman Information Technology and Systems Center and Information Technology Research Center National.
OGSA-DAI Neil Chue Hong 29 th January 2007 OGF19, Chapel Hill.
Amy Krause EPCC OGSA-DAI An Overview OGSA-DAI Technology Update GGF17, Tokyo (Japan)
Scalable Hybrid Keyword Search on Distributed Database Jungkee Kim Florida State University Community Grids Laboratory, Indiana University Workshop on.
1 OGSA-DAI Status Report Neil P Chue Hong 20 th May 2005.
David Smiley SOA Technology Evangelist Software AG Lead, follow or get out of the way Here Comes SOA.
Utility Computing: Security & Trust Issues Dr Steven Newhouse Technical Director London e-Science Centre Department of Computing, Imperial College London.
26/05/2005 Research Infrastructures - 'eInfrastructure: Grid initiatives‘ FP INFRASTRUCTURES-71 DIMMI Project a DI gital M ulti M edia I nfrastructure.
A centre of expertise in digital information management Shaping the e-future? Grids, Web Services and Digital Libraries Professor Tony.
Adrian Jackson, Stephen Booth EPCC Resource Usage Monitoring and Accounting.
Providing web services to mobile users: The architecture design of an m-service portal Minder Chen - Dongsong Zhang - Lina Zhou Presented by: Juan M. Cubillos.
GRID ANATOMY Advanced Computing Concepts – Dr. Emmanuel Pilli.
Toward a common data and command representation for quantum chemistry Malcolm Atkinson Director 5 th April 2004.
Data and storage services on the NGS.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
NERC e-Science Meeting Malcolm Atkinson Director & e-Science Envoy UK National e-Science Centre & e-Science Institute 26 th April 2006.
OGSA-DAI 简介及其它在 China-VO DAS 系统中的应用 杨阳 中国虚拟天文台研发团队 Chinese Virtual Observatory.
ETICS An Environment for Distributed Software Development in Aerospace Applications SpaceTransfer09 Hannover Messe, April 2009.
OGSA-DAI.
Amy Krause EPCC OGSA-DAI An Overview OGSA-DAI on OMII 2.0 OMII The Open Middleware Infrastructure Institute NeSC,
A Semi-Automated Digital Preservation System based on Semantic Web Services Jane Hunter Sharmin Choudhury DSTC PTY LTD, Brisbane, Australia Slides by Ananta.
Grid Services for Digital Archive Tao-Sheng Chen Academia Sinica Computing Centre
Joseph JaJa, Mike Smorul, and Sangchul Song
UK e-Science OGSA-DAI November 2002 Malcolm Atkinson
Brian Matthews STFC EOSCpilot Brian Matthews STFC
Presentation transcript:

EGEE is a project funded by the European Union under contract IST International Summer School on Grid Computing Vico Equense, 16 th July Today’s Wealth of Data: Are we ready for its challenges? Malcolm Atkinson Director National e-Science Centre

3 rd International Summer School on Grid Computing, Vico Equense, 16 July What is e-Science? Goal: to enable better research Method: Invention and exploitation of advanced computational methods  to generate, curate and analyse research data From experiments, observations and simulations Quality management, preservation and reliable evidence  to develop and explore models and simulations Computation and data at extreme scales Trustworthy, economic, timely and relevant results  to enable dynamic distributed virtual organisations Facilitating collaboration with information and resource sharing Security, reliability, accountability, manageability and agility Multiple, independently managed sources of data – each with own time-varying structure Creative researchers discover new knowledge by combining data from multiple sources

3 rd International Summer School on Grid Computing, Vico Equense, 16 July

3 rd International Summer School on Grid Computing, Vico Equense, 16 July Data Access and Integration: motives Key to Integration of Scientific Methods  Publication and sharing of results Primary data from observation, simulation & experiment Encourages novel uses Allows validation of methods and derivatives Enables discovery by combining data independently collected Key to Large-scale Collaboration  Economies: data production, publication & management Sharing cost of storage, management and curation Many researchers contributing increments of data Pooling annotation  rapid incremental publication And criticism  Accommodates global distribution Data & code travel faster and more cheaply  Accommodates temporal distribution Researchers assemble data Later (other) researchers access data and Decisions! Responsibility Ownership Credit Citation ?

3 rd International Summer School on Grid Computing, Vico Equense, 16 July Data Access and Integration: challenges Scale  Many sites, large collections, many uses Longevity  Research requirements outlive technical decisions Diversity  No “one size fits all” solutions will work  Primary Data, Data Products, Meta Data, Administrative data, … Many Data Resources  Independently owned & managed No common goals No common design Work hard for agreements on foundation types and ontologies Autonomous decisions change data, structure, policy, …  Geographically distributed Petabyte of Digital Data / Hospital / Year

3 rd International Summer School on Grid Computing, Vico Equense, 16 July Data Access and Integration: Scientific discovery Choosing data sources  How do you find them?  How do they describe and advertise them?  Is the equivalent of Google possible? Obtaining access to that data  Overcoming administrative barriers  Overcoming technical barriers Understanding that data  The parts you care about for your research Extracting nuggets from multiple sources  Pieces of your jigsaw puzzle Combing them using sophisticated models  The picture of reality in your head Analysis on scales required by statistics  Coupling data access with computation Repeated Processes  Examining variations, covering a set of candidates  Monitoring the emerging details  Coupling with scientific workflows You’re an innovator Your model  their model  Negotiation & patience needed from both sides

3 rd International Summer School on Grid Computing, Vico Equense, 16 July Mohammed & Mountains Petabytes of Data cannot be moved  It stays where it is produced or curated Hospitals, observatories, European Bioinformatics Institute, …  A few caches and a small proportion cached Distributed collaborating communities  Expertise in curation, simulation & analysis Distributed & diverse data collections  Discovery depends on insights  Unpredictable sophisticated application code  Tested by combining data from many sources  Using novel sophisticated models & algorithms What can you do?

3 rd International Summer School on Grid Computing, Vico Equense, 16 July Scientific Data: Opportunities and Challenges Opportunities  Global Production of Published Data  Volume Diversity  Combination  Analysis  Discovery Challenges  Data Huggers  Meagre metadata  Ease of Use  Optimised integration  Dependability Opportunities Specialised Indexing New Data Organisation New Algorithms Varied Replication Shared Annotation Intensive Data & Computation Challenges Fundamental Principles Approximate Matching Multi-scale optimisation Autonomous Change Legacy structures Scale and Longevity Privacy and Mobility Sustained Support / Funding

3 rd International Summer School on Grid Computing, Vico Equense, 16 July The Story so Far Technology enables Grids, More Data & … Distributed systems for sharing information  Essential, ubiquitous & challenging  Therefore share methods and technology as much as possible Collaboration is essential  Combining approaches  Combining skills  Sharing resources (Structured) Data is the language of Collaboration  Data Access & Integration a Ubiquitous Requirement  Primary data, metadata, administrative & system data Many hard technical challenges  Scale, heterogeneity, distribution, dynamic variation Intimate combinations of data and computation  With unpredictable (autonomous) development of both Structure enables understanding, operations, management and interpretation

OGSA-DAI Downloads R5 790 downloads since Dec 04 -Actual user downloads not search engine crawlers -Does not include downloads as part of GT3.2 and GT4 releases Total of 1212 registered users R1.0 (Jan 03)109 R1.5 (Feb 03)110 R2.0 (Apr 03)255 R2.5 (Jun 03)294 R3.0 (Jul 03)792 R3.1 (Feb 04)686 R4.0 (May 04)1083 Total4119 at 17/5/2005 Significant interest from China - two members of staff starting May 22nd

Goals for OGSA-DAI Aim to deliver application mechanisms that: Aim to deliver application mechanisms that: Meet the data requirements of Grid applications Meet the data requirements of Grid applications Functionality, performance and reliability Functionality, performance and reliability Reduce development cost of data-centric Grid applications Reduce development cost of data-centric Grid applications Provide consistent interfaces to data resources Provide consistent interfaces to data resources Acceptable and supportable by database providers Acceptable and supportable by database providers A base for developing higher-level services A base for developing higher-level services Data federation Data federation Distributed query processing Distributed query processing Data mining Data mining Data visualisation Data visualisation

Core features of OGSA-DAI A framework for building applications A framework for building applications Supports relational, xml and some files Supports relational, xml and some files MySQL, Oracle, DB2, SQL Server, Postgres, XIndice, CSV, EMBL MySQL, Oracle, DB2, SQL Server, Postgres, XIndice, CSV, EMBL Supports various delivery options Supports various delivery options SOAP, FTP, GridFTP, HTTP, files, , inter-service SOAP, FTP, GridFTP, HTTP, files, , inter-service Supports various transforms Supports various transforms XSLT, ZIP, GZip XSLT, ZIP, GZip Supports message level security using X509 Supports message level security using X509 Client Toolkit library for application developers Client Toolkit library for application developers Comprehensive documentation and tutorials Comprehensive documentation and tutorials Highly extensible Highly extensible Strength is in customising out-of-box features Strength is in customising out-of-box features

OGSA-DAI Design Principles - I Efficient client-server communication Efficient client-server communication Minimise number of messages exchanged Minimise number of messages exchanged One request abstracts multiple interactions One request abstracts multiple interactions No unnecessary data movement No unnecessary data movement Move computation to the data Move computation to the data Utilise third-party delivery Utilise third-party delivery Apply transforms (e.g., compression) Apply transforms (e.g., compression) Build on existing standards Build on existing standards Filling-in gaps where necessary Filling-in gaps where necessary

OGSA-DAI Design Principles -II Do not virtualise underlying data model Do not virtualise underlying data model Users must know where to target queries Users must know where to target queries Extensible architecture Extensible architecture Modular and customisable Modular and customisable E.g., to accommodate stronger security E.g., to accommodate stronger security Extensible activity framework Extensible activity framework Cannot anticipate all desired functionality Cannot anticipate all desired functionality Activity = unit of functionality Activity = unit of functionality Allow users to add their own Allow users to add their own

Why Use OGSA-DAI Provides common access view Provides common access view Regardless of underlying infrastructure Regardless of underlying infrastructure “Everything looks like a database” metaphor “Everything looks like a database” metaphor Access mechanism common to all clients Access mechanism common to all clients Integrates well with other Grid software Integrates well with other Grid software OGSA, WSRF and OMII compliant OGSA, WSRF and OMII compliant Flexibility Flexibility Extensible activity framework Extensible activity framework Won’t tie you to storage infrastructure Won’t tie you to storage infrastructure

Why You Might Not Want To Use OGSA-DAI OGSA-DAI slower than direct connection methods OGSA-DAI slower than direct connection methods E.g., compared to JDBC E.g., compared to JDBC This should improve with time This should improve with time Scalability issues Scalability issues Mostly but not completely known Mostly but not completely known Depend on type of use (e.g. delivery mechanism) Depend on type of use (e.g. delivery mechanism) Only planning to use one type of data resource Only planning to use one type of data resource and don’t care about interoperability with other Grid software and don’t care about interoperability with other Grid software OGSA-DAI an overkill in that case OGSA-DAI an overkill in that case

DS (Mobius) DS (DAIS) DS (OGSA-DAI) DRAM initiateDataService( ) Registry DSDL Request TADD DR Initiates/ Manages 0 n Id – UUID performRequest() Single Service Session Id - UUID DID Type Format Txn Response TADD Compute & storage resources Registry2 Local Store DRs Logging Service

18 Good Bye Thank you for coming to ISSGC’05 Tell your friends to come next year