1 The Challenge of Data Integration: Data + Grid = Discovery?
Prof. Malcolm Atkinson, Director, www.nesc.ac.uk
22nd January 2003


2 Overview
- Essentials of e-Science: Collaboration; Resource Sharing; Data Sharing; Mutual Dependence
- Essentials of the Grid: a Distributed Virtual Machine?
- Essentials of Data Sharing: did database research already do it?
- New Challenges: Data Access & Integration
- Building Bricks
- Bandwagon vs. Research Opportunity
- Thresholds, Visions and Questions

4 UK e-Science Programme (1), 2001 - 2003 [organisation chart]
- DG Research Councils; E-Science Steering Committee; Grid TAG
- Director: Director's Management Role; Director's Awareness and Co-ordination Role
- Generic Challenges: EPSRC (£15m), DTI (£15m)
- Industrial Collaboration (£40m)
- £80m Collaborative projects
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m); PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)

5 UK e-Science (from a presentation by Tony Hey)

6 UK e-Science Investment [map]
- Sites: Cambridge, Newcastle, Edinburgh, Oxford, Glasgow, Manchester, Cardiff, Southampton, London, Belfast; Daresbury Lab, RAL, Hinxton
- National e-Science Centre; HPC(x)
- Projects: > 60 started, > 30 proposed, + EU Projects

7 UK e-Science Programme (2), 2003 - 2005 [organisation chart]
- DG Research Councils; E-Science Steering Committee; Grid TAG
- Director: Director's Management Role; Director's Awareness and Co-ordination Role
- Generic Challenges: EPSRC (£15m), DTI (£15m)
- Industrial Collaboration (£40m)
- £80m Collaborative projects
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m); PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)

9 Collaboration
Growing: Hard Problems, Multi-disciplinary, Expense
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
Scientists have done this for centuries.

10 Collaboration
Growing: Data, Policy & Digital Infrastructure are Key
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
- Text, digital media, structured, organised & curated data, annotation, computable models, visualisation, shared instruments, shared systems, shared administration, …
- Nationally & internationally distributed, …
- Routine, daily, automated, …
That requires very significant investment in digital systems and their support.

11 Collaboration
Growing: Digital Communication, Metadata, …
Sharing: Ideas; Thought processes and Stimuli; Effort; Resources
Requires: Communication; Common understanding & Framework; Mechanisms for sharing fairly; Organisation and Infrastructure
- Digital networks, digital workplaces, digital instruments, …
- Metadata, ontologies, standards, shared curated data, shared codes, …
- Common platforms, shared software, shared training, …
The Grid SHOULD make this much easier by providing a common, supported, high level of software and organisational infrastructure:
- Authentication, Authorisation, Accounting, Provenance, Policies, …
- Shared Provision of Platform, …

12 Interdependence
Science has relied on experiment and theory.
- Theory: Greece, 400 BC
- Experiment: Italy, 1500 AD
- Simulation: Europe, 1980 AD
Simulation, data mining, analysis: for problems which are
- too large/small
- too fast/slow
- too complex
- too expensive, unethical, …
- and for testing understanding

13 Interdependence [diagram]: Theory, Experiment, Computing, Models, Data

14 Database Growth [chart]: PDB protein structures

16 Globus Toolkit® History [timeline]
- DARPA, NSF, and DOE begin funding Grid work
- NASA begins funding Grid work, DOE adds support
- The Grid: Blueprint for a New Computing Infrastructure published
- GT 1.0.0 released
- Early application successes reported
- NSF & European Commission initiate many new Grid projects
- Anatomy of the Grid paper released
- Significant commercial interest in Grids
- Physiology of the Grid paper released
- GT 2.0 released
Download figures do not include downloads from: NMI, UK eScience, EU Datagrid, IBM, Platform, etc.

17 Encompassing Vision [diagram]: data archives, sensor nets, computers, software, colleagues, instruments

18 People & Industry
People at Grid events:
- GGF2, Jul 01: 260
- GGF3, Oct 01: 220
- GGF4, Feb 02: 400
- GGF5, Jul 02: 900
- GGF6, Oct 02: 450
- GGF7, Mar 03: > 1000
- UK All Hands (AHM'02), Sep 02: 350
- GlobusWorld 1, Jan 03: 450
IBM this week: "IBM DRIVES GRID COMPUTING FOR COMMERCIAL BUSINESS WITH TEN NEW GRID OFFERINGS"
- Targets: Financial, Life Sciences; Automotive & Aerospace; Governments
- Partners: Platform, DataSynapse; Avaki, Entropia; United Devices
IBM in the last 20 months: leaders of OGSI development teams
[Chart: Grid Jamboree and GGF1 to GGF5, vertical scale 0 to 900]

20 High-Altitude Views
- A Rallying Cry: meeting a hard challenge requires many minds; operating & maintaining infrastructure requires many hands & many companies
- Another Stab at Distributed Computing: a hard challenge, intellectually and practically important; dependable ubiquity over heterogeneity & fallibility
- An Ambitious Virtual Machine: consistent large-scale computational environments
- A Global Operating System: collective resources, common management

21 An Architectural View [layered diagram]
Layers: Application; Common Application Platform for a Group of Applications; Grid Plumbing & Security Infrastructure (Scheduling, Accounting, Authorisation, Monitoring, Diagnosis, Logging); Data & Compute Resources
Roles: Application Users; Application & Platform Developers; Operations Teams; Distributed Providers

22 Open Grid Services Infrastructure
- Confluence of Web Services & Grid
- Consistent interface description based on the WSDL 1.2 proposal: extends properties; separates binding from interface; function composition & inheritance
- Exploits WS* investment
- Grid features: Security; Lifetime Management; Service (state) Information via Data Elements; Discovery; Grouping; Notification
- OGSI Version 1 proposal at GGF7 (March 03)
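As a rough illustration of three of the Grid features listed for OGSI (lifetime management, service state exposed as data elements, and notification), the Python sketch below models a single service instance in-process. It is not the OGSI interface: `SketchGridService` and its methods are hypothetical names, there is no WSDL, SOAP or security involved, and time is passed in as plain numbers so the soft-state behaviour is easy to follow.

```python
class SketchGridService:
    """Hypothetical service instance illustrating three OGSI ideas:
    service (state) data elements, lifetime management and notification.
    Not the OGSI interface; all names are invented for illustration."""

    def __init__(self, now, lifetime):
        # Soft-state lifetime: the instance expires unless a client extends it.
        self.termination_time = now + lifetime
        self.service_data = {}      # named service data elements
        self._subscribers = {}      # element name -> list of callbacks

    def is_alive(self, now):
        return now < self.termination_time

    def extend_lifetime(self, now, extra):
        # A client that still needs the instance pushes termination further out;
        # an expired instance cannot be revived.
        if self.is_alive(now):
            self.termination_time = max(self.termination_time, now + extra)

    def find_service_data(self, name):
        # State is queried by element name rather than by internal field.
        return self.service_data.get(name)

    def subscribe(self, name, callback):
        self._subscribers.setdefault(name, []).append(callback)

    def set_service_data(self, name, value):
        self.service_data[name] = value
        # Notification: push the change to every subscriber of this element.
        for callback in self._subscribers.get(name, []):
            callback(name, value)


# Usage: times are plain numbers (e.g. seconds) supplied by the caller.
events = []
svc = SketchGridService(now=0, lifetime=60)
svc.subscribe("status", lambda name, value: events.append((name, value)))
svc.set_service_data("status", "busy")
svc.extend_lifetime(now=50, extra=60)
print(svc.termination_time, svc.is_alive(now=100), events)
# 110 True [('status', 'busy')]
```

With soft-state lifetime, a forgotten instance in this sketch simply expires instead of needing an explicit clean-up call, and the subscriber list lets clients react to state changes without polling.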

23 Open Grid Services Architecture
- Ubiquitous Building Blocks using the OGSI Platform
- Open & Extensible: encourages refactoring experiments
- Initially the Globus 2 model, except that state information is now distributed
- Example new features: Global Name Mapping Service; Replication and Caching Service; Data Access & Integration; Metering, Logging, Authorisation, Charging, …

24 Grid Challenge
Balancing "direct" access to the "platforms" with abstraction & virtualisation:
- Developers often have exploitable application knowledge
- Automation is necessary & helpful: interface matching, operation validation, …; optimisation at many scales
- There isn't enough effort to develop languages & abstractions

26 Data Integration [diagram: a scientist with an idea, Data Resource 1, Data Resource 2]
1) Find Data
2) Extract Data
3) Transform Data
4) Combine Data
5) Interpret Data
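The five steps can be made concrete with a toy Python pipeline. Everything in it is an assumption for illustration: the two in-memory resources and their field names and values are invented, `find_data` matches on a hypothetical topic tag, and the "interpretation" is a placeholder threshold rule rather than any real analysis.

```python
# Hypothetical stand-ins for "Data Resource 1" and "Data Resource 2": the
# schemas, topics and values are invented purely to illustrate the steps.
RESOURCE_1 = {
    "topics": {"expression"},
    "records": [{"gene": "ACE", "expression": 2.5}, {"gene": "AGT", "expression": 1.2}],
}
RESOURCE_2 = {
    "topics": {"phenotype"},
    "records": [{"GENE_ID": "ace", "bp_mmHg": 140}, {"GENE_ID": "nos3", "bp_mmHg": 120}],
}


def find_data(resources, topic):
    """1) Find: pick the resources that advertise the topic."""
    return [r for r in resources if topic in r["topics"]]


def extract_data(resource):
    """2) Extract: copy the raw records out of a resource."""
    return [dict(record) for record in resource["records"]]


def transform_data(rows):
    """3) Transform: map resource 2's schema onto the shared key convention."""
    return [{"gene": row["GENE_ID"].upper(), "bp": row["bp_mmHg"]} for row in rows]


def combine_data(left, right):
    """4) Combine: join the two record sets on the shared 'gene' key."""
    by_gene = {row["gene"]: row for row in right}
    return [{**row, **by_gene[row["gene"]]} for row in left if row["gene"] in by_gene]


def interpret_data(rows, min_expression, min_bp):
    """5) Interpret: a placeholder rule standing in for the scientist's analysis."""
    return [row["gene"] for row in rows
            if row["expression"] > min_expression and row["bp"] > min_bp]


resources = [RESOURCE_1, RESOURCE_2]
expression = extract_data(find_data(resources, "expression")[0])
phenotype = transform_data(extract_data(find_data(resources, "phenotype")[0]))
combined = combine_data(expression, phenotype)
print(combined)                            # [{'gene': 'ACE', 'expression': 2.5, 'bp': 140}]
print(interpret_data(combined, 2.0, 130))  # ['ACE']
```

In this toy example step 3 carries the integration work: the two resources disagree on key names and letter case, so nothing can be combined until one is mapped onto the other's conventions.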

27 Wellcome Trust: Cardiovascular Functional Genomics [diagram]
Sites: Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
Shared data; public curated data

28 OGSA-DAI Partners [map]
Sites shown: Oxford, Glasgow, Cardiff, Southampton, London, Belfast, Daresbury Lab, RAL, Newcastle, Manchester, Cambridge, Hinxton
Partners: EPCC & NeSC; IBM UK (Hursley); IBM USA; Manchester e-SC; Newcastle e-SC; Oracle
£3 million, 18 months, started February 2002

29 OGSA-DAI: Data Access and Integration for the New Grid
Uniform service interfaces for accessing multiple data sources within the Open Grid Services Architecture.
UK e-Science contribution to GT3.

30 DAI Key Services
- GridDataService (GDS): access to data & DB operations
- GridDataServiceFactory (GDSF): makes GDS & GDSF
- GridDataServiceRegistry (GDSR): discovery of GDS(F) & data
- GridDataTranslationService (GDTS): translates or transforms data
- GridDataTransportDepot (GDTD): data transport with persistence
Also: integrated structured data transport; relational & XML models supported; role-based authorisation; binary structured files (later)
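The two less familiar roles in the list, translation (GDTS) and transport with persistence (GDTD), are sketched below in Python. This is not the OGSA-DAI API: the class names, the row-to-XML layout and the JSON-file store are all assumptions chosen to show what each role does, with the depot keeping a delivery until a recipient collects it.

```python
import json
import os
import tempfile
import xml.etree.ElementTree as ET


class SketchTranslationService:
    """Hypothetical GDTS-like role: turn a relational result into XML.
    Column names are used as element tags, so they must be valid XML names."""

    def rows_to_xml(self, columns, rows, root_tag="rowset"):
        root = ET.Element(root_tag)
        for row in rows:
            row_el = ET.SubElement(root, "row")
            for column, value in zip(columns, row):
                ET.SubElement(row_el, column).text = str(value)
        return ET.tostring(root, encoding="unicode")


class SketchTransportDepot:
    """Hypothetical GDTD-like role: hold deliveries until the recipient collects
    them. Persistence here is a JSON file, so a delivery outlives the depot
    object that accepted it."""

    def __init__(self, path):
        self.path = path

    def _load(self):
        try:
            with open(self.path, encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _save(self, store):
        with open(self.path, "w", encoding="utf-8") as f:
            json.dump(store, f)

    def deposit(self, recipient, payload):
        store = self._load()
        store.setdefault(recipient, []).append(payload)
        self._save(store)

    def collect(self, recipient):
        # Hand over everything waiting for this recipient, then forget it.
        store = self._load()
        items = store.pop(recipient, [])
        self._save(store)
        return items


depot_path = os.path.join(tempfile.mkdtemp(), "depot.json")
xml_doc = SketchTranslationService().rows_to_xml(["id", "name"], [(1, "alpha")])
SketchTransportDepot(depot_path).deposit("consumer-A", xml_doc)
collected = SketchTransportDepot(depot_path).collect("consumer-A")  # a fresh depot object
print(collected)  # ['<rowset><row><id>1</id><name>alpha</name></row></rowset>']
```

Collecting through a new depot object shows the point of persistence in this sketch: producer and consumer do not need to be active at the same time.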

31 DAI Architecture: Data Integration Architecture [layered diagram]
- Data Intensive X Scientists
- Data Intensive Applications for Science X
- Simulation, Analysis & Integration Technology for Science X
- Generic Virtual Data Access and Integration Technology: Data Access Services; Data Integration Services; Structured Data
- Grid Infrastructure: Scheduling, Accounting, Monitoring, Diagnosis, Distributed Authorisation; GridFTP, Naming, Caching
- Compute, Data & Storage Resources

32 [Interaction diagram: Client, Registry, Factory, Grid Data Service, XML / relational database; legend: SOAP/HTTP, service creation, API interactions]
1a. Client asks the Registry for sources of data about "x".
1b. Registry responds with a Factory handle.
2a. Client asks the Factory for access to the database.
2b. Factory creates a GridDataService (GDS) to manage access.
2c. Factory returns the handle of the GDS to the client.
3a. Client queries the GDS with XPath, SQL, etc.
3b. GDS interacts with the database.
3c. Results of the query are returned to the client as XML.

33 [Interaction diagram: Client, Registry, Factory, network of GDSs over two XML / relational databases, Consumer; legend: SOAP/HTTP, service creation, API interactions]
1a. Client asks the Registry for sources of data about "x" and "y".
1b. Registry responds with a Factory handle.
2a. Client asks the Factory for access to, and integration of, the databases.
2b. Factory creates a network of GridDataServices.
2c. Factory returns the handle of the GDS to the client.
3a. Client submits a set of queries to the GDS, with XPath, SQL, etc.
3b. Tell the consumer.
3c. Results of the queries are returned to the consumer as XML or binary.
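The Python below is a rough stand-in for the factory-built network of GDSs on this slide: it runs one SQL query per source, joins the results on a shared key and hands them to a consumer rather than to the client. The names, the inner-join rule and the use of in-memory SQLite databases are assumptions for illustration; XPath, binary results and the separate notification message (step 3b) are not modelled.

```python
import sqlite3


def make_source(table, columns, rows):
    """Build an in-memory SQLite source. Table and column names are trusted
    constants here; never interpolate untrusted input into SQL like this."""
    conn = sqlite3.connect(":memory:")
    conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
    marks = ", ".join("?" for _ in columns)
    conn.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)
    return conn


class SketchIntegrationNetwork:
    """Hypothetical stand-in for the 'network of GridDataServices'."""

    def __init__(self, sources):
        self.sources = sources  # source name -> sqlite3 connection

    def submit(self, queries, join_key, consumer):
        # 3a: one query per source (SQL only in this sketch).
        results = {}
        for name, sql in queries.items():
            cursor = self.sources[name].execute(sql)
            columns = [d[0] for d in cursor.description]
            results[name] = [dict(zip(columns, row)) for row in cursor.fetchall()]
        # Integrate: inner-join every result set on the shared key.
        names = list(queries)
        merged = results[names[0]]
        for other in names[1:]:
            index = {row[join_key]: row for row in results[other]}
            merged = [{**row, **index[row[join_key]]}
                      for row in merged if row[join_key] in index]
        # 3c: results go to the consumer, not back to the client.
        consumer(merged)


source_x = make_source("x", ["id", "x_value"], [(1, "a"), (2, "b")])
source_y = make_source("y", ["id", "y_value"], [(1, 10), (3, 30)])
received = []  # this consumer just records what it is sent
network = SketchIntegrationNetwork({"x": source_x, "y": source_y})
network.submit({"x": "SELECT id, x_value FROM x", "y": "SELECT id, y_value FROM y"},
               "id", received.append)
print(received)  # [[{'id': 1, 'x_value': 'a', 'y_value': 10}]]
```

Passing the consumer in as a callable keeps the client free of the result traffic, which is the point of steps 3b and 3c on the slide.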

34 Biomedical (or ANY) Data
Opportunities: Global Production of Published Data; Volume, Diversity, Combination, Analysis, Discovery
Challenges: Data Huggers; Meagre metadata; Ease of Use; Automated, optimised integration; Traceability, Dependability
Opportunities: Specialised Indexing; Structurally varied replication; Consistent Structured Universe of Discourse; Data & Computation Integration
Challenges: Approximate Matching; Multi-scale optimisation; Bad habits / industrial structures; Safety and Multi-scale optimisation

35 Data Integration Challenges
- High-Level Languages: describing the data extraction recipes; describing the sources & components (metadata that drives automation & validation)
- Mobility: code & data
- Integrating existing DB technology: moving the DBMS to the Grid context
- New optimisation challenges: data & computation & storage & movement
- Shared distributed annotation systems
- How to reference
- Provenance & acknowledgement
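Two of the challenges above, describing data extraction recipes and using source metadata to drive validation, pair naturally with recording provenance. The Python sketch below shows one possible shape: a hypothetical `Recipe` is checked against a source's declared fields before any records are read, and each run appends a provenance entry saying where the results came from and how. The record layout is an assumption for illustration, not a proposed standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SourceDescription:
    """Metadata for a source: its name, the fields it declares, its records."""
    name: str
    fields: frozenset
    records: list


@dataclass
class Recipe:
    """A data extraction recipe: source, fields to keep, and a described filter."""
    source: str
    wanted: tuple
    filter_text: str              # human-readable form, kept for provenance
    keep: Callable[[dict], bool]


def run_recipe(recipe, catalogue, provenance):
    source = catalogue[recipe.source]
    # Metadata drives validation: reject the recipe before touching any records.
    missing = set(recipe.wanted) - source.fields
    if missing:
        raise ValueError(f"{source.name} does not declare {sorted(missing)}")
    rows = [{f: r[f] for f in recipe.wanted} for r in source.records if recipe.keep(r)]
    # Provenance: enough to say where each result came from and how it was made.
    provenance.append({
        "source": source.name,
        "fields": list(recipe.wanted),
        "filter": recipe.filter_text,
        "records_read": len(source.records),
        "records_returned": len(rows),
    })
    return rows


catalogue = {
    "assays": SourceDescription(
        "assays",
        frozenset({"sample", "value", "lab"}),
        [{"sample": "s1", "value": 0.9, "lab": "L1"},
         {"sample": "s2", "value": 0.2, "lab": "L2"}],
    )
}
provenance = []
recipe = Recipe("assays", ("sample", "value"), "value > 0.5", lambda r: r["value"] > 0.5)
rows = run_recipe(recipe, catalogue, provenance)
print(rows)  # [{'sample': 's1', 'value': 0.9}]
print(provenance[0]["records_read"], provenance[0]["records_returned"])  # 2 1
```

A recipe asking for a field the source does not declare (say `"date"`) fails with `ValueError` before any data moves, which is the kind of early, metadata-driven check the slide asks for.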

37 Challenges
- A Programming & Development Model
- Dependability at this Scale
- Foundations for Trust
- Raising the Level of Automation
- Supporting New Forms of Collaboration
- Data
