ODD-Genes: Accelerating data-driven scientific discovery NeSC Review 2003 NeSC 2003-09-30.

Presentation transcript:

ODD-Genes: Accelerating data-driven scientific discovery NeSC Review 2003 NeSC

Introduction
- ODD-Genes Background
- Science enabled by ODD-Genes
  - Automating routine statistical conditioning of highly variable microarray results
  - Discovering related data sources
  - Querying discovered data sources for relevant data
  - Identifying significant targets for focussed investigation
- Caveats & further work

ODD-Genes Background
- ODD-Genes is a demonstrator
  - Demonstrates how Grid technologies enable e-Science, accelerating scientific discovery
  - SunDCG’s TOG software allows for job submission on remote compute resources
  - OGSA-DAI provides access, control and discovery of data resources
- ODD-Genes used to investigate Wilms Tumour
  - Routine statistical conditioning of microarray results
  - Data-driven discovery of novel targets for investigation and potential therapy
- Collaborative project
  - NeSC/EPCC, Edinburgh, UK
  - Scottish Centre for Genomic Technology and Informatics, Edinburgh, UK (GTI)
  - Human Genetics Unit at MRC, Western General Hospital, Edinburgh, UK (HGU)

SunDCG - Enabling Routine Statistical Conditioning
- Choose analysis to perform
- Automates analysis process
- Provides predetermined workflow
- Can run more than one analysis at a time
- Multiple reproducible avenues for investigation
- Reduces cost (human, machine), increases availability
- TOG enables this by allowing access to HPC resources
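The idea of a predetermined workflow that runs several analyses at once can be sketched in a few lines. This is only an illustrative sketch, not the ODD-Genes or TOG implementation: the slides do not say which statistical steps are used, so the `log2_transform` and `median_centre` steps, the `ANALYSES` table and the local thread pool (standing in for remote HPC resources) are all assumptions.

```python
import math
import statistics
from concurrent.futures import ThreadPoolExecutor


def log2_transform(values):
    """Log-transform raw intensities (assumed conditioning step)."""
    return [math.log2(v) for v in values]


def median_centre(values):
    """Subtract the median so values are centred on zero (assumed step)."""
    m = statistics.median(values)
    return [v - m for v in values]


# Hypothetical predetermined workflows: each analysis is a fixed list of steps.
ANALYSES = {
    "log2": [log2_transform],
    "log2_median_centred": [log2_transform, median_centre],
}


def run_analysis(name, intensities):
    """Apply every step of one named workflow, in order."""
    data = list(intensities)
    for step in ANALYSES[name]:
        data = step(data)
    return name, data


def run_all(intensities):
    """Run all analyses concurrently; a local pool stands in for HPC nodes."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(run_analysis, name, intensities) for name in ANALYSES]
        return dict(f.result() for f in futures)


results = run_all([1.0, 2.0, 4.0, 8.0, 16.0])
```

Because each workflow is a fixed list of steps, rerunning `run_all` on the same input reproduces the same conditioned views, which is the property the slide emphasises.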

SunDCG - Conditioning Results
- Results of conditioning can be analysed and investigated
- Researcher has potentially several views of the data to explore, all presented in parallel (cf. the traditional serialised, manual process)
- Researcher can reproduce this initial conditioning for repeated analyses
- Researcher need not perform each step manually and serially, or ask a dedicated statistician to do so

OGSA-DAI - Results Investigation
- Multiple views of data
  - Raw
  - Heat Map
  - Cluster Map
- Wilms Tumour study takes a new direction: two genes appear significant in early development
- Researchers would like more information on these genes…

OGSA-DAI - Data Resource Discovery
- OGSA-DAI uses keywords to locate relevant data resources
- May return data resources previously unknown to the researcher
- Researcher selects the most interesting data resource to query for information about the gene
- Researcher selects the Mouse Atlas: a narrow, deep database of spatial gene expression in mouse embryonic development
- Contrast with the GTI database: broad, shallow, genome-wide gene expression across multiple organisms, stages & conditions
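Keyword-based discovery over registered data resources can be illustrated with a small in-memory registry. This sketch does not use or imitate the OGSA-DAI API: the `DataResource` class, the `discover` function and the keyword sets are hypothetical, and only the two resource descriptions come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataResource:
    name: str
    description: str
    keywords: frozenset  # keywords are invented for illustration


# Hypothetical registry; data hosts would have to register their resources.
REGISTRY = [
    DataResource(
        "Mouse Atlas",
        "narrow, deep: spatial gene expression in mouse embryonic development",
        frozenset({"gene expression", "mouse", "embryo", "spatial"}),
    ),
    DataResource(
        "GTI database",
        "broad, shallow: genome-wide gene expression across organisms",
        frozenset({"gene expression", "microarray", "genome-wide"}),
    ),
]


def discover(registry, query_keywords):
    """Return resources sharing at least one keyword, best match first."""
    wanted = {k.lower() for k in query_keywords}
    hits = [(len(wanted & r.keywords), r) for r in registry]
    hits = [(score, r) for score, r in hits if score > 0]
    # Sort by descending overlap, then by name for a stable order.
    hits.sort(key=lambda pair: (-pair[0], pair[1].name))
    return [r for _, r in hits]


matches = discover(REGISTRY, ["gene expression", "embryo"])
names = [r.name for r in matches]
```

A resource that was never registered cannot be returned, however relevant it is, which is why the approach depends on a critical mass of registered data sources.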

OGSA-DAI - Data Resource Query
- OGSA-DAI returns data from query
- Data and annotation displayed
- Data contains references to related images
- Researcher rapidly moves from numeric and textual description to spatial representation of relevant gene expression
- These show that the genes are stem cell markers
- Targets for focussed investigation, potential therapy

ODD-Genes Caveats & Further Work
- ODD-Genes is a demonstrator
  - Need to develop production applications for both routine statistical processing and data resource discovery and query
  - Need to parameterise routine conditioning appropriately to complete automation
- ODD-Genes requires Grid infrastructure
  - Participating researchers need to partner with centres that host application front-ends (or host the infrastructure themselves)
  - However, alternatives are often proprietary, expensive and less flexible
- ODD-Genes requires registration by data hosts
  - A critical mass of registered data sources is needed