A Metadata Binding Store for Distributed Scientific Data Yin Chen, Malcolm Atkinson, Stuart Aitken Dec. 2009 UK e-Science All Hands Meeting 2009, Oxford,


1 A Metadata Binding Store for Distributed Scientific Data Yin Chen, Malcolm Atkinson, Stuart Aitken Dec. 2009 UK e-Science All Hands Meeting 2009, Oxford, 08 Dec. 2009

2 MOTIVATION Scientific data/metadata are generated at great speed and in high volume. Data and metadata are often created independently. Hypothesis: a binding service is useful for serving distributed scientific data at various scales. Metadata are the key to data access, discovery, preservation, provenance, and interpretation. We view the relationship between data and metadata as a binding.

3 IS BINDING A PROBLEM? EurExpress Project, EU funded under FP6, 2005-2009. Aims to capture >20,000 genes via RNA in situ hybridization (ISH) and to generate a digital ‘transcriptome atlas’. Nov. 2009: 19,411 assays, 15,715 annotations, ~5TB data. [Diagram labels: 14.5 days mouse embryo; Section slides; Genepaint Robotics; Automatic ISHs (8 EU Bio labs); High resolution gene expression images; ISH management (LIME system); Annotation (FIATAS) Alicante; Template Images; Metadata; Gene Expression Data Repository]

4 REAL WORLD OBSERVATIONS Information inconsistency: 1) the number of probe genes mismatched with the template design; 2) the number of gene expression images without metadata. Significant human operating errors. Consistency checking became more difficult as data increased. The bindings have to be efficiently managed!
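
The second inconsistency above (images without metadata) can be illustrated with a minimal sketch. It assumes bindings are available as (image URI, metadata URI) pairs; the function name and the example.org URIs are hypothetical and do not come from the EurExpress system.

```python
def find_unbound(image_uris, bindings):
    """Return the image URIs that appear in no binding, sorted.

    `bindings` is an iterable of (image_uri, metadata_uri) pairs.
    This layout is an assumption made for illustration only.
    """
    bound = {image_uri for image_uri, _metadata_uri in bindings}
    return sorted(set(image_uris) - bound)


# Hypothetical example data.
images = [
    "http://example.org/img/001",
    "http://example.org/img/002",
    "http://example.org/img/003",
]
pairs = [
    ("http://example.org/img/001", "http://example.org/meta/a"),
    ("http://example.org/img/003", "http://example.org/meta/b"),
]
print(find_unbound(images, pairs))
```

A check like this is a set difference, so its cost grows with the number of images and bindings rather than with the size of the image data itself.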

5 DESIGN PRINCIPLES A binding system manages bindings. Generic approach, independent from data resources. Federate references of data and metadata. Data warehousing approach is no longer feasible: data become too large, too dynamic, too unwieldy to copy; no permission to copy; freshness. Allow binding sharing among user communities, scalable. Can be combined with other services. Design principle: Simple. Minimize internal complexity: no conflict. Maximize external integrity: less overlap.

6 A SIMPLE BINDING STORE Binding Data Model: Binding ID – UUID, needs no central registration authority, unlimited; Binding subject/object – URIs, used by most web accessible data resources; Binding description – Tags, efficient, flexible. Binding APIs: Manipulation operations; Discovery operations; Delivery operations.
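
The data model on this slide can be sketched in a few lines. This is an in-memory illustration only, not the OGSA-DAI based implementation described in the talk; the class, method, and field names are hypothetical, and the mapping of methods to the three API groups is an assumption.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Binding:
    subject_uri: str                  # URI of one bound item (e.g. data)
    object_uri: str                   # URI of the other item (e.g. metadata)
    tags: frozenset = frozenset()     # free-form binding description
    # Random UUIDs can be generated locally, with no central registry.
    binding_id: str = field(default_factory=lambda: str(uuid.uuid4()))


class BindingStore:
    """Hypothetical in-memory store keyed by binding ID."""

    def __init__(self):
        self._bindings = {}

    # Manipulation operations (assumed grouping)
    def create(self, subject_uri, object_uri, tags=()):
        binding = Binding(subject_uri, object_uri, frozenset(tags))
        self._bindings[binding.binding_id] = binding
        return binding.binding_id

    def delete(self, binding_id):
        return self._bindings.pop(binding_id, None) is not None

    # Discovery operations (assumed grouping)
    def find_by_tag(self, tag):
        return [b for b in self._bindings.values() if tag in b.tags]

    def find_by_subject(self, subject_uri):
        return [b for b in self._bindings.values()
                if b.subject_uri == subject_uri]

    # Delivery operation (assumed grouping)
    def get(self, binding_id):
        return self._bindings.get(binding_id)
```

The store holds only references (URIs) and tags, which matches the design principle of federating references rather than copying data.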

7 IMPLEMENTATION Grid tech. OGSA-DAI OGSA-DAI server activities OGSA-DAI client activities OGSA-DAI client toolkits Service Proxy APIs, programmable interface for users Command-line UI Not included in current work

8 EVALUATION Use workload modelling and simulation methods. No binding data available. Observations from wwPDB, BADC, EurExpress, NanoCMOS, Flickr. Creation patterns, access patterns, and content patterns are observed. Simulation of the real-world observations.

9 WORKLOAD MODELLING Creation Workloads: number of annotations per day; new PDB structures per month; number of data files per day. Access Workloads: number of accesses per day. Tag Behaviours.

10 WORKLOAD SIMULATION [Figure labels: Zipf’s Dist. α=0.9; Zipf’s Dist. α=0.4; Hidden Markov Model; Two Poisson Processes, Two Uniform Dist.; Uniform Dist.; Trend: Zipf’s Dist. α=0.2; Weibull Dist.; Poisson Process: probability of the intervals' occurrence]
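
Three of the distributions named on this slide can be sampled with the Python standard library, as sketched below. This is an illustration only: the talk lists SSJ and Colt as the libraries actually used, the function names are hypothetical, and all parameter values other than α=0.9 are arbitrary assumptions.

```python
import random


def zipf_weights(n, alpha):
    """Unnormalised Zipf weights 1/rank**alpha for ranks 1..n."""
    return [1.0 / (rank ** alpha) for rank in range(1, n + 1)]


def sample_zipf_ranks(n, alpha, size, rng):
    """Draw `size` ranks from a Zipf distribution truncated to 1..n."""
    ranks = list(range(1, n + 1))
    return rng.choices(ranks, weights=zipf_weights(n, alpha), k=size)


def poisson_arrival_times(rate, horizon, rng):
    """Event times of a Poisson process on (0, horizon].

    Inter-arrival gaps of a Poisson process are exponentially
    distributed with mean 1/rate.
    """
    times = []
    t = 0.0
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)


def sample_weibull(scale, shape, size, rng):
    """Draw `size` Weibull variates (random.weibullvariate(scale, shape))."""
    return [rng.weibullvariate(scale, shape) for _ in range(size)]


rng = random.Random(2009)  # fixed seed so a run can be repeated
popular_items = sample_zipf_ranks(n=10, alpha=0.9, size=5000, rng=rng)
arrivals = poisson_arrival_times(rate=3.0, horizon=24.0, rng=rng)
gaps = sample_weibull(scale=1.0, shape=1.5, size=100, rng=rng)
```

Sampling a truncated Zipf distribution by explicit weights works for any α > 0, including the values below 1 shown on the slide.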

11 EXPERIMENT SETUP Intel(R) Core2 2.66GHz, RAM 7GB, 144GB HD, 100Mbps network connection, Red Hat 4.1, Tomcat 5.5, OD 3.1, MySQL 6.0, R 2.9, SSJ, Colt, benchmark script. 10 runs per configuration; collected means, SEs, 95% CIs.
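
The summary statistics for 10 runs per configuration can be computed as in the sketch below. It assumes the 95% confidence interval is based on Student's t distribution with 9 degrees of freedom, which the slide does not state; the function name is hypothetical and the sample timings are made up for illustration.

```python
import math
import statistics

# Two-sided 95% critical value of Student's t with 9 degrees of freedom.
# Using a t-based interval is an assumption; the slide only says "95% CIs".
T_CRIT_95_DF9 = 2.262


def summarize(runs):
    """Return (mean, standard error, (ci_low, ci_high)) for exactly 10 runs."""
    if len(runs) != 10:
        raise ValueError("this sketch hard-codes the t value for 10 runs")
    mean = statistics.mean(runs)
    se = statistics.stdev(runs) / math.sqrt(len(runs))  # sample SD / sqrt(n)
    half_width = T_CRIT_95_DF9 * se
    return mean, se, (mean - half_width, mean + half_width)


# Made-up response times in seconds, for illustration only.
example_runs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
mean, se, ci = summarize(example_runs)
print(mean, se, ci)
```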

12 EXPERIMENT RESULTS Robust to different types of workloads. Robust to small- to large-scale workloads. Robust to both independent and combined workloads. Stressed by the ultra-scale workloads.

13 FUTURE WORK A scalable binding store. Cloud computing promises to be scalable. Our evaluation of Hadoop.

14 BINDING APPLICATIONS Web moving to Web 3.0. Binding index. Combine with metadata management tools. Mashup applications.

15 ACKNOWLEDGEMENT National e-Science Centre: research group, support team, middleware team. MRC HGU Biomedical Statistical Analysis Section: Prof Richard Baldock, Dr Duncan Davidson. Newcastle HDBR: Prof Susan Lindsay, Steven N. Lisgo. EDINA Geo Research & Data Library: Chris Higgins, Dr David Medyckyj-Scott. Data resources: DGEMap, EurExpress: Prof Richard Baldock, Lalit Kumar; NanoCMOS: Dr Clive Davenhall, Prof Richard Sinnott. Technical support: OGSA-DAI team. Research materials: COBrA-CT, OntoGrid: Prof Carole Goble, Dr Oscar Corcho; myGrid: Dr Phillip Lord.

