Extensible Framework for Data Access & Integration. Malcolm Atkinson, Director, www.nesc.ac.uk, 10th November 2004.


1 Extensible Framework for Data Access & Integration
  Malcolm Atkinson, Director
  www.nesc.ac.uk
  10th November 2004

2 Database Growth
  PDB Content Growth

3 Wellcome Trust: Cardiovascular Functional Genomics
  Glasgow, Edinburgh, Leicester, Oxford, London, Netherlands
  Shared data
  Public curated data
  BRIDGES
  IBM

4 Biochemical Pathway Simulator
  (Computing Science, Bioinformatics, Beatson Cancer Research Labs)
  DTI Bioscience Beacon Project, Harnessing Genomics Programme
  Slide from Muffy Calder, Glasgow
  Now largest EU project in the Life Sciences – see http://www.cancerresearchuk.org/news/pressreleases/scottishscientists_22july04
  Walter Kolch

5 eDiaMoND – Compute
  Mammograms have different appearances, depending on image settings and acquisition systems
  Standard Mammo Format
  Temporal mammography
  Computer Aided Detection
  3D View
  Provided by eDiaMoND project: Prof. Sir Mike Brady et al.

6 Automatic registration technology
  Rigid registration of MR and CT images of the head
  Inter-subject image warping
  Provided by IXI project: Prof. Derek Hill et al.

7 Move Computation to Data
  Code scale
    - Depends on wet-ware
        - No noticeable rate of improvement
  Data scale
    - Grows Moore’s Law or Moore’s Law 2
  Analysis of data
    - Extracts & derivatives used
        - Often smaller – more value for current investigation
  Implies move code to data
    - SQL, XQuery, Java code, …
    - Extensibility mechanisms used by OGSA-DAIers
    - Java mobility (e.g. DataCutter), database procedures, …
  Increasingly necessary
    - Application control or higher-level service decisions
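
The "move code to data" point on slide 7 can be illustrated with a minimal sketch. It is not taken from the talk: it uses Python's standard sqlite3 module as a stand-in data resource, and the table name `readings` and all function names are hypothetical. One function pulls every row to the client before filtering; the other ships a small SQL expression to the data and receives only the derived value.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database standing in for a large remote resource."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (sample_id INTEGER, value REAL)")
    # 1000 rows spread over 10 samples (hypothetical demo data).
    rows = [(i % 10, float(i)) for i in range(1000)]
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    conn.commit()
    return conn


def mean_by_moving_data(conn, sample_id):
    """Move the data to the code: fetch everything, then filter locally."""
    rows = conn.execute("SELECT sample_id, value FROM readings").fetchall()
    values = [value for sid, value in rows if sid == sample_id]
    return len(rows), sum(values) / len(values)


def mean_by_moving_code(conn, sample_id):
    """Move the code to the data: the database computes the derivative."""
    rows = conn.execute(
        "SELECT AVG(value) FROM readings WHERE sample_id = ?", (sample_id,)
    ).fetchall()
    return len(rows), rows[0][0]


if __name__ == "__main__":
    conn = build_demo_db()
    print(mean_by_moving_data(conn, 3))  # (rows transferred, mean)
    print(mean_by_moving_code(conn, 3))  # (rows transferred, mean)
```

Both functions return the number of rows transferred alongside the mean, so the difference in data movement is visible even in this toy setting.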

8 Integration is Everything
  Motivation
    - No business or research team is satisfied with one data resource
  Data Curation Expertise
  Human Centred Integration
    - Human centred
    - Domain-specialist driven
    - Dynamic specification of combination function
    - Iterative processes
        - Revised request minutes later
        - Revised request after months of thought
  Sources inevitably heterogeneous
    - Time-varying content, structure & policies
  Robust, stable steerable integration services
    - Higher-level services over multiple resources
    - Fundamental requirements for (re)negotiation
  Federation or Virtualisation preceding integration, or kit of integration tools to be interwoven with an application?

9 OGSA Infrastructure Architecture
  (architecture diagram; labels as extracted)
    - Grid or Web Service Infrastructure
    - Data Intensive Applications for Science X
    - Compute, Data & Storage Resources
    - Distributed
    - Simulation, Analysis & Integration Technology for Science X
    - Data Intensive X Scientists
    - Virtual Integration Architecture
    - Generic Virtual Data Access and Integration Layer
    - Structured Data Integration
    - Structured Data Access
    - Structured Data: Relational, XML, Semi-structured
    - Transformation
    - Registry
    - Job Submission
    - Data Transport
    - Resource Usage
    - Banking
    - Brokering
    - Workflow
    - Authorisation
    - OGSA-DAI

10 OGSA-DAI (interaction diagram)
  Components: Analyst, Registry (GDSR), Factory (GDSF), Grid Data Service (GDS), Consumer, Database (Xindice, MySQL, Oracle, DB2)
  Legend: SOAP/HTTP, service creation, API interactions
  Sequence:
    1. Request to Registry for sources of data about “x”
    2. Registry responds with Factory handle
    3. Request to Factory for access to database
    4. Factory creates GridDataService
    5. Factory returns handle of GDS to client
    6. Client queries GDS with SQL, XPath, XQuery etc.
    7. GDS interacts with database
    8. Query results returned as XML, OR delivered to consumer as XML
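
The registry, factory and service sequence on slide 10 can be mimicked in a few lines. The sketch below is an in-process illustration only and is not the OGSA-DAI API: the classes `Registry`, `Factory` and `GridDataService`, their methods, and the `proteins` table are all hypothetical names chosen for the example, sqlite3 stands in for the database, and plain object references stand in for service handles exchanged over SOAP/HTTP.

```python
import sqlite3
import xml.etree.ElementTree as ET


class GridDataService:
    """Hypothetical stand-in for a GDS: runs a query, returns results as XML."""

    def __init__(self, conn):
        self._conn = conn

    def perform(self, sql):
        cursor = self._conn.execute(sql)
        columns = [col[0] for col in cursor.description]
        root = ET.Element("results")
        for row in cursor:
            row_el = ET.SubElement(root, "row")
            for name, value in zip(columns, row):
                ET.SubElement(row_el, name).text = str(value)
        return ET.tostring(root, encoding="unicode")


class Factory:
    """Hypothetical stand-in for a GDSF: creates a service per request."""

    def __init__(self, conn):
        self._conn = conn

    def create_service(self):
        return GridDataService(self._conn)


class Registry:
    """Hypothetical stand-in for a GDSR: maps a topic to a factory."""

    def __init__(self):
        self._factories = {}

    def register(self, topic, factory):
        self._factories[topic] = factory

    def find_factory(self, topic):
        return self._factories[topic]


def build_demo_registry():
    """Set up one demo database and publish its factory under a topic."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE proteins (name TEXT, length INTEGER)")
    conn.executemany(
        "INSERT INTO proteins VALUES (?, ?)", [("P1", 120), ("P2", 300)]
    )
    registry = Registry()
    registry.register("proteins", Factory(conn))
    return registry


if __name__ == "__main__":
    registry = build_demo_registry()
    factory = registry.find_factory("proteins")   # steps 1-2
    gds = factory.create_service()                # steps 3-5
    print(gds.perform("SELECT name, length FROM proteins ORDER BY name"))  # steps 6-8
```

The sketch covers only the "results returned as XML" branch; delivery to a separate consumer is left out.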

11 OGSA-DAI Downloads
  R4: 690 downloads since May 04
    - Actual user downloads, not search engine crawlers
    - Does not include downloads as part of GT3.2 releases
  Total of 838 registered users

  Release   Date     Downloads
  R1.0      Jan 03     104
  R1.5      Feb 03     108
  R2.0      Apr 03     250
  R2.5      Jun 03     291
  R3.0      Jul 03     792
  R3.1      Feb 04     630
  Total               2865

  Downloads by Country – OGSA-DAI R4.0
  Country          Share
  United Kingdom   21%
  China            26%
  United States    13%
  Japan             5%
  Unknown           7%
  Germany           5%
  Italy             5%
  Austria           2%
  Australia         2%
  France            3%
  Taiwan            2%

12 Multiple tasks / request
  (diagram: repeated records, each with fields Ident, Type, Value)

13 Be Direct
  Double Handling costs too much
    - Memory cycles, bus capacity, cache disruption, …
    - Double Handling via discs pathologically bad
  Data translation expensive
    - Avoid
        - Deliver as stored, …
    - Compose
  Stream
    - Main memory is not big enough
    - Stream or use Disk
  Couple generator & consumer directly
    - Stream from RAM to RAM
    - Requires coupled computation execution
  Breaks down boundaries and merges data, execution & transport requirements
    - Demands smart workflow enactment service & foundation services
    - Models for process transformation and optimisation
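
The contrast on slide 13 between double handling and coupling generator and consumer directly can be sketched with Python generators. This is an illustration under assumptions, not code from the talk: the record layout and all function names are hypothetical, and a single process stands in for coupled computations on separate machines.

```python
def generate_records(n):
    """Generator stage: produces records one at a time, never all at once."""
    for i in range(n):
        yield {"id": i, "value": i * i}


def select_even(records):
    """Intermediate stage: passes on records whose value is even."""
    for record in records:
        if record["value"] % 2 == 0:
            yield record


def total_value(records):
    """Consumer stage: reduces the stream to a single number."""
    total = 0
    for record in records:
        total += record["value"]
    return total


def run_streamed(n):
    """Stages coupled directly: each record flows through without staging."""
    return total_value(select_even(generate_records(n)))


def run_double_handled(n):
    """Each stage materialises its full output before the next one starts."""
    staged = list(generate_records(n))      # first full copy in memory
    filtered = list(select_even(staged))    # second full copy in memory
    return total_value(filtered)


if __name__ == "__main__":
    print(run_streamed(10), run_double_handled(10))
```

Both variants compute the same result. The streamed one holds only one record at a time, so it can begin producing output from an input far larger than main memory, whereas the double-handled one must hold every intermediate list.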

14 Take Home Message
  Data Access & Integration
    - Two Models
        - kit of parts
        - Virtualisation
  Ubiquitous Needs
    - Pervasive and growing number and diversity of data collections
    - Opportunity and power to integrate and mine
  OGSA-DAI
    - Pioneering
    - Talk by Amrey Krause - 5:15 Today
    - Growing Community
        - Implementation
        - Standards
        - Users
  Join the party of users, contributors & researchers

