The Challenge of Data Integration. Data + Grid = Discovery?
Prof. Malcolm Atkinson, Director, www.nesc.ac.uk
22nd January 2003

Presentation transcript:


2 Overview
- Essentials of e-Science Collaboration
  - Resource Sharing
  - Data Sharing
  - Mutual Dependence
- Essentials of the Grid
  - Distributed Virtual Machine?
- Essentials of Data Sharing
  - Database Research did it?
- New Challenges
  - Data Access & Integration
  - Building Bricks
  - Bandwagon v Research Opportunity
- Thresholds, Visions and Questions

3

4 UK e-Science Programme (1)
Organisation chart:
- DG Research Councils; E-Science Steering Committee; Grid TAG
- Director: Director’s Management Role; Director’s Awareness and Co-ordination Role
- Generic Challenges: EPSRC (£15m), DTI (£15m)
- Industrial Collaboration (£40m)
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  - PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
- £80m Collaborative projects
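
The Research Council figures on this chart can be cross-checked in a few lines of Python. The seven council allocations add up to the £74m shown for the Academic Application Support Programme. The dictionary simply restates the slide's figures and adds no new data.

```python
# Research Council allocations as listed on the slide, in £ million.
research_council_allocations = {
    "PPARC": 26, "BBSRC": 8, "MRC": 8, "NERC": 7,
    "ESRC": 3, "EPSRC": 17, "CLRC": 5,
}

# The parts should add up to the "Research Councils (£74m)" headline.
total = sum(research_council_allocations.values())
print(f"Research Councils total: £{total}m")  # prints: Research Councils total: £74m
```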

5 UK e-Science
From a presentation by Tony Hey

6 UK e-Science Investment
Map: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Hinxton
National e-Science Centre; HPC(x)
Projects: > 60 started, > 30 proposed, + EU Projects

7 UK e-Science Programme (2)
Organisation chart:
- DG Research Councils; E-Science Steering Committee; Grid TAG
- Director: Director’s Management Role; Director’s Awareness and Co-ordination Role
- Generic Challenges: EPSRC (£15m), DTI (£15m)
- Industrial Collaboration (£40m)
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
  - PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
- £80m Collaborative projects

8

9 Collaboration Growing
Hard Problems, Multi-disciplinary, Expense
Sharing
- Ideas
- Thought processes and Stimuli
- Effort
- Resources
Requires
- Communication
- Common understanding & Framework
- Mechanisms for sharing fairly
- Organisation and Infrastructure
Scientists have done this for centuries

10 Collaboration Growing
Data, Policy & Digital Infrastructure Key
Sharing
- Ideas
- Thought processes and Stimuli
- Effort
- Resources
Requires
- Communication
- Common understanding & Framework
- Mechanisms for sharing fairly
- Organisation and Infrastructure
Text, digital media, structured, organised & curated data, annotation, computable models, visualisation, shared instruments, shared systems, shared administration, …
Nationally & Internationally Distributed, …
Routine, Daily, Automated, …
That Requires very Significant Investment in Digital Systems and their Support

11 Collaboration Growing
Digital Communication, Metadata, …
Sharing
- Ideas
- Thought processes and Stimuli
- Effort
- Resources
Requires
- Communication
- Common understanding & Framework
- Mechanisms for sharing fairly
- Organisation and Infrastructure
Digital networks, digital workplaces, digital instruments, …
Metadata, ontologies, standards, shared curated data, shared codes, …
Common platforms, shared software, shared training, …
The Grid SHOULD make this much easier by providing a common, supported high level of Software and Organisational infrastructure
Authentication, Authorisation, Accounting, Provenance, Policies, …
Shared Provision of Platform,

12 Interdependence
Science has relied on experiment and theory
Simulation, Data Mining, Analysis
- Theory: Greece, 400 BC
- Experiment: Italy, 1500 AD
- Simulation: Europe, 1980 AD
For problems which are:
- too large/small
- too fast/slow
- too complex
- too expensive, unethical, ...
- Testing Understanding

13 Interdependence
Diagram: Theory, Experiment, Computing, Models, Data

14 Database Growth
Chart: PDB protein structures

15

16 Globus Toolkit® History
Timeline (chart of Globus Toolkit downloads):
- DARPA, NSF, and DOE begin funding Grid work
- NASA begins funding Grid work, DOE adds support
- The Grid: Blueprint for a New Computing Infrastructure published
- GT Released
- Early Application Successes Reported
- NSF & European Commission Initiate Many New Grid Projects
- Anatomy of the Grid Paper Released
- Significant Commercial Interest in Grids
- Physiology of the Grid Paper Released
- GT 2.0 Released
Does not include downloads from: NMI, UK eScience, EU DataGrid, IBM, Platform, etc.

17 Encompassing Vision
Diagram: data archives, sensor nets, computers, software, colleagues, instruments

18 People & Industry
Global Grid Forum (attendance, date):
- GGF2: 260, Jul 01
- GGF3: 220, Oct 01
- GGF4: 400, Feb 02
- GGF5: 900, Jul 02
- GGF6: 450, Oct 02
- GGF7: >1000, Mar 03
UK All Hands:
- AHM’02: 350, Sep 02
GlobusWorld 1: 450, Jan 03
IBM this week: “IBM DRIVES GRID COMPUTING FOR COMMERCIAL BUSINESS WITH TEN NEW GRID OFFERINGS”
- Targets: Financial, Life Sciences; Automotive & Aerospace; Governments
- Partners: Platform, DataSynapse; Avaki, Entropia; United Devices
IBM over the last 20 months: leaders of OGSI development teams
Chart: Grid Jamboree, GGF1 to GGF5

19

20 High-Altitude Views
- A Rallying Cry
  - Meeting a Hard Challenge requires Many Minds
  - Operating & Maintaining Infrastructure requires Many Hands & Many Companies
- Another Stab at Distributed Computing
  - Hard Challenge: Intellectually and Practically Important
  - Dependable Ubiquity over Heterogeneity & Fallibility
- An Ambitious Virtual Machine
  - Consistent large-scale computational environments
- A Global Operating System
  - Collective Resources, Common Management

21 An Architectural View
Diagram components:
- Application Users
- Application
- Application & Platform Developers
- Common Application Platform for Group of Applications
- Grid Plumbing & Security Infrastructure: Scheduling, Accounting, Authorisation, Monitoring, Diagnosis, Logging
- Data & Compute Resources
- Operations Teams; Distributed Providers

22 Open Grid Services Infrastructure
- Confluence of Web Services & Grid
- Consistent Interface Description
  - Based on WSDL 1.2 proposal
  - Extend Properties
  - Separate Binding from Interface
  - Function Composition & Inheritance
- Exploit WS* Investment
- Grid Features
  - Security
  - Lifetime Management
  - Service (state) Information via Data Elements
  - Discovery
  - Grouping
  - Notification
- OGSI Version 1 Proposal at GGF7 (March 03)
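
Three of the OGSI grid features above can be illustrated with a toy, in-process Python model: service state exposed as named data elements, lifetime management and notification. This is a sketch under stated assumptions. The class and method names are hypothetical, there is no WSDL, SOAP or security, and an injectable clock stands in for real time. It is not the OGSI or Globus Toolkit 3 API.

```python
import time
from typing import Any, Callable, Dict, List


class GridServiceInstance:
    """Hypothetical toy model of OGSI-style service data, lifetime and notification."""

    def __init__(self, lifetime_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self.termination_time = clock() + lifetime_seconds
        self.service_data: Dict[str, Any] = {}
        self._subscribers: Dict[str, List[Callable[[str, Any], None]]] = {}

    def set_service_data(self, name: str, value: Any) -> None:
        # Update a named data element, then notify every subscriber to that name.
        self.service_data[name] = value
        for callback in self._subscribers.get(name, []):
            callback(name, value)

    def find_service_data(self, name: str) -> Any:
        # Discovery: clients query the instance's state by element name.
        return self.service_data.get(name)

    def subscribe(self, name: str, callback: Callable[[str, Any], None]) -> None:
        self._subscribers.setdefault(name, []).append(callback)

    def extend_lifetime(self, seconds_from_now: float) -> None:
        # Soft-state lifetime: clients must keep pushing the termination time
        # forward, otherwise the instance expires and can be reclaimed.
        if self.expired():
            raise RuntimeError("service instance has already expired")
        self.termination_time = self._clock() + seconds_from_now

    def expired(self) -> bool:
        return self._clock() >= self.termination_time


# Usage with a fake clock so the example is deterministic.
clock_value = [0.0]
svc = GridServiceInstance(lifetime_seconds=60, clock=lambda: clock_value[0])
events = []
svc.subscribe("status", lambda name, value: events.append((name, value)))
svc.set_service_data("status", "ready")  # the subscriber is notified

clock_value[0] = 30.0
svc.extend_lifetime(60)                  # now terminates at t = 90
clock_value[0] = 61.0
print(svc.expired())                     # False
clock_value[0] = 91.0
print(svc.expired())                     # True
```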

23 Open Grid Services Architecture
- Ubiquitous Building Blocks
  - Using OGSI Platform
  - Open & Extensible
  - Encourage Refactoring Experiments
- Initially the Globus 2 model
  - Except State Information now distributed
- Example New Features
  - Global Name Mapping Service
  - Replication and Caching Service
  - Data Access & Integration
  - Metering, Logging, Authorisation, Charging, …

24 Grid Challenge
- Balancing “Direct” Access to the “Platforms” with Abstraction & Virtualisation
  - Developers often have exploitable application knowledge
- Automation necessary & helpful
  - Interface matching, operation validation, …
  - Optimisation at many scales
- There isn’t enough effort to develop Languages & Abstractions
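
The automation this slide asks for, interface matching and operation validation, can be sketched by checking a call against a machine-readable interface description before it is dispatched. In this minimal Python sketch a plain dictionary of operation names and parameter types stands in for a WSDL description. `QUERY_INTERFACE`, `validate_call` and `interface_matches` are hypothetical names.

```python
from typing import Any, Dict, Tuple

# Hypothetical interface description: operation name -> parameter types.
Interface = Dict[str, Tuple[type, ...]]

QUERY_INTERFACE: Interface = {
    "perform_query": (str, str),  # (query language, query text)
    "get_status": (),
}


def validate_call(interface: Interface, operation: str, *args: Any) -> None:
    """Raise before dispatch if a call does not fit the interface."""
    if operation not in interface:
        raise ValueError(f"unknown operation: {operation}")
    expected = interface[operation]
    if len(args) != len(expected):
        raise ValueError(f"{operation} expects {len(expected)} arguments, got {len(args)}")
    for position, (arg, expected_type) in enumerate(zip(args, expected)):
        if not isinstance(arg, expected_type):
            raise TypeError(f"{operation} argument {position} should be {expected_type.__name__}")


def interface_matches(provided: Interface, required: Interface) -> bool:
    """True if every required operation is offered with the same signature."""
    return all(provided.get(operation) == signature for operation, signature in required.items())


validate_call(QUERY_INTERFACE, "perform_query", "SQL", "SELECT 1")  # passes silently
print(interface_matches(QUERY_INTERFACE, {"get_status": ()}))       # True
print(interface_matches(QUERY_INTERFACE, {"delete_all": ()}))       # False
```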

25

26 Data Integration
Diagram: Data Resource 1, Data Resource 2, Scientist with Idea
1) Find Data
2) Extract Data
3) Transform Data
4) Combine Data
5) Interpret Data
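
The five steps on this slide can be sketched as small Python functions over two in-memory data resources. The resources, field names and registry dictionary are hypothetical; real resources would be remote databases. The transform step here only normalises the case of a shared key so that the two sources can be joined.

```python
from typing import Dict, List

# Hypothetical resources: resource 1 keys records by "gene" in upper case,
# resource 2 keys them by "symbol" in lower case.
DATA_RESOURCE_1 = [
    {"gene": "GENE_A", "expression": 2.5},
    {"gene": "GENE_B", "expression": 0.4},
]
DATA_RESOURCE_2 = [{"symbol": "gene_a", "pathway": "pathway_1"}]
REGISTRY = {"expression": DATA_RESOURCE_1, "annotation": DATA_RESOURCE_2}


def find_data(topic: str) -> List[Dict]:
    # 1) Find data: look the topic up in a toy registry of resources.
    return REGISTRY[topic]


def extract_data(rows: List[Dict], *fields: str) -> List[Dict]:
    # 2) Extract data: keep only the fields the scientist needs.
    return [{field: row[field] for field in fields} for row in rows]


def transform_data(rows: List[Dict], key_field: str) -> Dict[str, Dict]:
    # 3) Transform data: index each row by an upper-cased key so the sources agree.
    return {row[key_field].upper(): row for row in rows}


def combine_data(left: Dict[str, Dict], right: Dict[str, Dict]) -> Dict[str, Dict]:
    # 4) Combine data: inner join on the shared, normalised key.
    return {key: {**left[key], **right[key]} for key in sorted(left.keys() & right.keys())}


expression = transform_data(extract_data(find_data("expression"), "gene", "expression"), "gene")
annotation = transform_data(extract_data(find_data("annotation"), "symbol", "pathway"), "symbol")
combined = combine_data(expression, annotation)

# 5) Interpret data: that step stays with the scientist; here we only print.
for key, record in combined.items():
    print(key, record["expression"], record["pathway"])  # GENE_A 2.5 pathway_1
```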

27 Wellcome Trust: Cardiovascular Functional Genomics
Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
Shared data; public curated data

28 OGSA-DAI Partners
EPCC & NeSC, IBM UK (Hursley), IBM USA, Manchester e-SC, Newcastle e-SC, Oracle
£3 million, 18 months, started February 2002
Map: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Cambridge, Hinxton, with partner sites marked

29 OGSA-DAI: Data Access and Integration for the New Grid
Uniform Service Interfaces for Accessing Multiple Data Sources within the Open Grid Services Architecture.
UK e-Science Contribution to GT3

30 DAI Key Services
- GridDataService (GDS): Access to data & DB operations
- GridDataServiceFactory (GDSF): Makes GDS & GDSF
- GridDataServiceRegistry (GDSR): Discovery of GDS(F) & Data
- GridDataTranslationService (GDTS): Translates or Transforms Data
- GridDataTransportDepot (GDTD): Data transport with persistence
Integrated Structured Data Transport
- Relational & XML models supported
- Role-based Authorisation
- Binary structured files (later)
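
One of the key services, translating data between models, can be illustrated by turning a relational result set into XML with Python's standard `sqlite3` and `xml.etree.ElementTree` modules. This is a sketch and does not reproduce the GridDataTranslationService interface. `rows_to_xml` and the `protein` table are hypothetical, and the sketch assumes that column names are valid XML element names.

```python
import sqlite3
import xml.etree.ElementTree as ET


def rows_to_xml(cursor: sqlite3.Cursor, root_tag: str = "results") -> str:
    """Translate a relational result set into a simple XML document."""
    # cursor.description holds one 7-tuple per column; the first item is its name.
    columns = [description[0] for description in cursor.description]
    root = ET.Element(root_tag)
    for row in cursor:
        row_element = ET.SubElement(root, "row")
        for column, value in zip(columns, row):
            ET.SubElement(row_element, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


# Hypothetical relational source held in memory.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE protein (id TEXT, residues INTEGER)")
connection.executemany("INSERT INTO protein VALUES (?, ?)", [("P1", 120), ("P2", 87)])

xml_text = rows_to_xml(connection.execute("SELECT id, residues FROM protein ORDER BY id"))
print(xml_text)
```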

31 DAI Architecture
Data Integration Architecture (layered diagram, from users down to resources):
- Data Intensive X Scientists
- Data Intensive Applications for Science X
- Simulation, Analysis & Integration Technology for Science X
- Generic Virtual Data Access and Integration Technology: Data Access Services, Data Integration Services, Structured Data, GridFTP, Naming, Caching
- Grid Infrastructure: Scheduling, Accounting, Monitoring, Diagnosis, Distributed Authorisation
- Compute, Data & Storage Resources

32
1a. Request to Registry for sources of data about “x”
1b. Registry responds with Factory handle
2a. Request to Factory for access to database
2b. Factory creates GridDataService to manage access
2c. Factory returns handle of GDS to client
3a. Client queries GDS with XPath, SQL, etc.
3b. GDS interacts with database
3c. Results of query returned to client as XML
Diagram: Registry, Factory, Grid Data Service, Client, XML / Relational database; arrows show SOAP/HTTP, service creation and API interactions
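
The interaction above can be traced in-process with a minimal Python sketch. Plain objects stand in for the registry, factory and Grid Data Service, and an in-memory SQLite database stands in for the relational database. The class names mirror the slide, but the methods (`register`, `lookup`, `create_service`, `perform`) are hypothetical. Results also come back as Python tuples rather than as XML over SOAP/HTTP.

```python
import sqlite3
from typing import Dict, List


class GridDataService:
    """Manages one client's access to one database (step 2b)."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def perform(self, sql: str) -> List[tuple]:
        # Steps 3a-3c: run the client's query against the database and return the rows.
        return self._connection.execute(sql).fetchall()


class GridDataServiceFactory:
    """Creates a GridDataService on request (steps 2a-2c)."""

    def __init__(self, connection: sqlite3.Connection):
        self._connection = connection

    def create_service(self) -> GridDataService:
        return GridDataService(self._connection)


class Registry:
    """Maps topics to the factories that can serve data about them (steps 1a-1b)."""

    def __init__(self) -> None:
        self._entries: Dict[str, GridDataServiceFactory] = {}

    def register(self, topic: str, factory: GridDataServiceFactory) -> None:
        self._entries[topic] = factory

    def lookup(self, topic: str) -> GridDataServiceFactory:
        return self._entries[topic]


database = sqlite3.connect(":memory:")
database.execute("CREATE TABLE observation (x TEXT, value REAL)")
database.execute("INSERT INTO observation VALUES ('x', 1.5)")

registry = Registry()
registry.register("x", GridDataServiceFactory(database))

factory = registry.lookup("x")   # 1a/1b: find a factory for data about "x"
gds = factory.create_service()   # 2a-2c: obtain a handle to a new GDS
print(gds.perform("SELECT value FROM observation WHERE x = 'x'"))  # [(1.5,)]
```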

33
1a. Request to Registry for sources of data about “x” & “y”
1b. Registry responds with Factory handle
2a. Request to Factory for access and integration to databases
2b. Factory creates GridDataServices network
2c. Factory returns handle of GDS to client
3a. Client submits set of queries to GDS with XPath, SQL, etc.
3b. Tell consumer
3c. Results of queries returned to consumer as XML or binary
Diagram: Registry, Factory, Client, Consumer, GDS, two XML / Relational databases; arrows show SOAP/HTTP, service creation and API interactions
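
The integration variant differs from slide 32 in two ways: several sources are queried, and the results go to a separate consumer rather than back to the client. The sketch below assumes two in-memory SQLite databases. It runs the queries one after another where the slide shows a network of GDSs, merges the results as a simple sorted union, and models "tell consumer" as a callback. `make_database` and `integrate` are hypothetical names.

```python
import sqlite3
from typing import Callable, List, Sequence, Tuple


def make_database(rows: Sequence[Tuple[str, float]]) -> sqlite3.Connection:
    # Build a small hypothetical source holding (item, value) rows.
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE measurement (item TEXT, value REAL)")
    connection.executemany("INSERT INTO measurement VALUES (?, ?)", rows)
    return connection


def integrate(
    sources: List[Tuple[sqlite3.Connection, str]],
    consumer: Callable[[List[tuple]], None],
) -> None:
    """Run one query per source and deliver the merged rows to the consumer."""
    merged: List[tuple] = []
    for connection, sql in sources:
        merged.extend(connection.execute(sql).fetchall())
    consumer(sorted(merged))  # results go to the consumer, not back to the client


db_x = make_database([("x", 1.0), ("x", 3.0)])
db_y = make_database([("y", 2.0)])
received: List[tuple] = []
integrate(
    [
        (db_x, "SELECT item, value FROM measurement WHERE item = 'x'"),
        (db_y, "SELECT item, value FROM measurement WHERE item = 'y'"),
    ],
    received.extend,
)
print(received)  # [('x', 1.0), ('x', 3.0), ('y', 2.0)]
```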

34 Biomedical (or ANY) Data
Opportunities
- Global Production of Published Data
- Volume, Diversity, Combination, Analysis, Discovery
Challenges
- Data Huggers
- Meagre metadata
- Ease of Use
- Automated, optimised integration
- Traceability, Dependability
Opportunities
- Specialised Indexing
- Structurally varied replication
- Consistent Structured Universe of Discourse
- Data & Computation Integration
Challenges
- Approximate Matching
- Multi-scale optimisation
- Bad habits / industrial structures
- Safety and Multi-scale optimisation
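
Approximate Matching, listed among the challenges, arises when two sources name the same entity differently. Below is a minimal Python sketch using the standard library's `difflib.get_close_matches`. The protein names are hypothetical examples, and the 0.8 similarity cutoff is an arbitrary assumption that would need tuning against real data.

```python
import difflib
from typing import Dict, List, Optional

# Hypothetical names for overlapping entities, spelt differently in two sources.
SOURCE_A_NAMES = ["Haemoglobin alpha", "Myoglobin", "Cytochrome c"]
SOURCE_B_NAMES = ["hemoglobin alpha", "myoglobin", "cytochrome C oxidase"]


def approximate_match(name: str, candidates: List[str], cutoff: float = 0.8) -> Optional[str]:
    """Return the closest candidate (case-insensitive) if it is similar enough."""
    lowered = {candidate.lower(): candidate for candidate in candidates}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None


links: Dict[str, Optional[str]] = {
    name: approximate_match(name, SOURCE_B_NAMES) for name in SOURCE_A_NAMES
}
for name, match in links.items():
    print(f"{name!r} -> {match!r}")
```

With these inputs the British and American spellings of haemoglobin are linked. "Cytochrome c" stays unmatched because "cytochrome C oxidase" falls below the cutoff, which is the right outcome, since the two names denote different proteins. Where the threshold sits determines the balance between missed links and false ones.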

35 Data Integration Challenges
- High-Level Languages
  - Describing the Data Extraction Recipes
  - Describing the Sources & Components
    - Metadata that drives automation & validation
- Mobility: Code & Data
- Integrating Existing DB technology
  - Moving the DBMS to the Grid context
- New Optimisation Challenges
  - Data & Computation & Storage & Movement
- Shared Distributed Annotation Systems
  - How to Reference
  - Provenance & Acknowledgement

36

37 Challenges
- A Programming & Development Model
- Dependability at this Scale
- Foundations for Trust
- Raising the Level of Automation
- Supporting New Forms of Collaboration
- Data

38