FuGE: A framework for developing standards for functional genomics. Angel Pizarro, University of Pennsylvania; Andrew Jones, University of Manchester.


Overview
Challenge of building data standards
Introduction to FuGE
Current status
Formats developed using FuGE

Data Standards for HT Genomics
Major challenges in developing standards:
  – Technology still evolving
  – Heterogeneous data formats (and data types) from software and instruments
  – “Important” info about the starting sample is almost unlimited
  – Large quantities of metadata needed to validate results
BUT: most of these problems are shared by microarrays, proteomics, metabolomics, etc.

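Experiment Workflow
[Diagram: Material → Treatment → Material → Treatment → Material → Treatment → Material → Data Acquisition → Data → Data Transformation → Data]
Legend: Material and Data = inputs and outputs of Protocols; Treatment, Data Acquisition and Data Transformation = instance of some Protocol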
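The workflow on this slide can be read as a chain in which each step applies a protocol to its inputs and yields outputs. The Python sketch below is only an illustration of that reading; the names `Item`, `ProtocolApplication` and `run_chain` are hypothetical and are not taken from the FuGE model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Item:
    """A Material or a Data object: an input or output of a protocol."""
    kind: str  # "Material" or "Data"
    name: str


@dataclass
class ProtocolApplication:
    """One instance of some protocol, linking inputs to outputs."""
    protocol: str
    inputs: List[Item]
    outputs: List[Item] = field(default_factory=list)


def run_chain(start: Item, steps: List[Tuple[str, str]]) -> List[ProtocolApplication]:
    """Apply each (protocol, output_kind) step to the previous step's output."""
    applications = []
    current = start
    for i, (protocol, output_kind) in enumerate(steps, start=1):
        produced = Item(output_kind, f"{output_kind.lower()}_{i}")
        applications.append(ProtocolApplication(protocol, [current], [produced]))
        current = produced
    return applications


# The sequence of steps shown on the slide.
steps = [
    ("Treatment", "Material"),
    ("Treatment", "Material"),
    ("Treatment", "Material"),
    ("Data Acquisition", "Data"),
    ("Data Transformation", "Data"),
]
chain = run_chain(Item("Material", "material_0"), steps)
for app in chain:
    print(app.inputs[0].name, "->", app.protocol, "->", app.outputs[0].name)
```

Because every output is reused as the next input, the full provenance of the final data set can be recovered by walking the list backwards.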

Functional Genomics Experiment (FuGE) Object Model
Merges of the MAGE and PEDRo models were attempted
  – The result was an even more complex model that still left other FG technologies untouched
  – Main motivation was to reuse the MAGE sample prep and ontology components
FuGE was created as a project independent of MGED and PSI
Model of common components across FG to enable synergy between standards
  – Sample description, protocols, investigation structure

Architecture Details
FuGE is mainly represented as a UML model
  – UML 1.4 using MagicDraw 9.5
Uses AndroMDA to produce platform-specific models
  – XML Schema
  – Language bindings and APIs: Java, Perl, C, etc.
  – Database schema

FuGE Structure
[Diagram: FuGE is split into two packages. Common: Description, Audit, Ontology, Protocol, Reference. Bio: Investigation, Data, Material, Conceptual Molecule]
Common: general data format management
  – Auditing
  – Referencing external resources
  – Protocols
Bio:
  – Investigation structure
  – Data
  – Materials (organisms, solutions, compounds)
  – Theoretical molecules, e.g. sequences

FuGE Workflow

FuGE is an Enabler
Serves as a basis for developing new formats
  – PSI-GPS and MGED are using FuGE for developing their new data formats
Existing formats can be tied together using FuGE
  – mzData does not describe the biosource separation procedure (gels, LC, etc.)
  – CPAS from FHCRC does this

Use 1: Extending FuGE
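One way to picture extending a shared model is subclassing: a domain format inherits the generic description and adds only its own fields. The Python sketch below assumes that reading. All class and field names (`Material`, `Protocol`, `Gel`, `ElectrophoresisProtocol`, `percent_acrylamide`, `voltage`) are hypothetical illustrations, not classes from FuGE or GelML.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Material:
    """Hypothetical stand-in for a generic, shared material description."""
    identifier: str
    name: str


@dataclass
class Protocol:
    """Hypothetical stand-in for a generic, shared protocol description."""
    identifier: str
    name: str
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class Gel(Material):
    """Illustrative domain-specific material: adds only gel-specific fields."""
    percent_acrylamide: float = 0.0


@dataclass
class ElectrophoresisProtocol(Protocol):
    """Illustrative domain-specific protocol built on the generic one."""
    voltage: float = 0.0


def describe(material: Material) -> str:
    """Code written against the generic type also accepts extensions."""
    return f"{material.identifier}: {material.name}"


gel = Gel("gel:1", "2D gel", percent_acrylamide=12.5)
print(describe(gel))
```

The point of the sketch is that tools written against the shared base types keep working when a new domain format adds its own subclasses.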

Use 2: Tie Together External Formats
[Diagram: a ProtocolApplication of a Protocol takes a Material (inputMaterial) and produces ExternalData (outputData), which points to an mzData file and its file format definition]
The Protocol definition says “See ExternalData file for parameters” (rather than storing params in the Protocol)
A parser will exist to extract data / parameters from the mzData file
Material can be used to describe the sample; this connects the MS data with a separation workflow
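The linkage described on this slide could be sketched as follows. This is a minimal Python illustration, not the FuGE API: the classes only mirror the names on the slide, the parser is a placeholder that does not read any file, and the file name `run01.mzData.xml` and the helpers `PARSERS` and `load_parameters` are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Material:
    """Describes the sample that went into the protocol."""
    name: str


@dataclass
class ExternalData:
    """Points at a file kept in its own format, outside the model."""
    location: str     # path or URI of the external file
    file_format: str  # name of the file format definition


@dataclass
class Protocol:
    name: str
    # Parameters are not stored here; the protocol defers to the file.
    parameter_source: str = "See ExternalData file for parameters"


@dataclass
class ProtocolApplication:
    protocol: Protocol
    input_materials: List[Material]
    output_data: List[ExternalData]


def parse_mzdata_stub(location: str) -> Dict[str, str]:
    """Placeholder only: a real parser would read the mzData file."""
    return {"source": location}


# Hypothetical registry mapping a format name to its parser.
PARSERS: Dict[str, Callable[[str], Dict[str, str]]] = {"mzData": parse_mzdata_stub}


def load_parameters(application: ProtocolApplication) -> List[Dict[str, str]]:
    """Look up a parser for each external file and collect what it returns."""
    results = []
    for data in application.output_data:
        parser = PARSERS.get(data.file_format)
        if parser is None:
            raise ValueError(f"no parser registered for {data.file_format}")
        results.append(parser(data.location))
    return results


sample = Material("fraction from LC separation")
ms_run = ProtocolApplication(
    Protocol("mass spectrometry"),
    input_materials=[sample],
    output_data=[ExternalData("run01.mzData.xml", "mzData")],
)
print(load_parameters(ms_run))
```

Keeping the parameters in the external file avoids duplicating them in the protocol record, at the cost of needing a parser for each supported format.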

Status of FuGE
Milestone 1 release: Sep 2005
Milestone 2 release: Dec 2005
  – Acceptance by PSI and MGED at this time
Milestone 3: Spring 2006
  – Milestone 2 of GelML and spML
Version 1.0: Fall 2006

FuGE Extensions
MAGE V2
  – Format for microarray data and annotations
GelML
  – Format for methods + results of 2D gels
  – Milestone 1 Dec 2005
  – Release scheduled for Spring/Summer 2006
spML
  – Sample processing: liquid chromatography, capillary electrophoresis, centrifugation
  – Milestone 1 Dec 2005
CPAS uses a FuGE-inspired manifest for experiments
Metabolomics community considering
PRIDE contemplating FuGE for data format
Flow Cytometry community interested
MIACA?

Summary
FuGE should help convergence of omics data formats:
  – Single description of the sample for all types of experiment
  – Shared representation of protocols
  – Investigation and workflow structure for integrating different omics projects
  – Good starting point, proven development methodology

Acknowledgements
Other FuGE developers
  – Andrew Jones (Manchester)
  – Michael Miller (Rosetta), Paul Spellman (Lawrence Berkeley)
  – MGED, PSI, Fred Hutch CRC, Genologics, and various
Contact:

While I have your attention…
Space cost
  – Ultra expensive: ~$19/GB ($380 for 20 GB)
  – Cheap (TeraStation NAS): ~$0.80/GB ($16)
  – Ultra cheap ($500 PC): ~$0.50/GB ($10)
MIAPE confounding factors
  – Will never have a complete list
  – We are implicitly telling investigators that they don’t know how to do good science (a Bad Thing)
  – Instead, require quality assessment statistics on the data (variance, reproducibility, etc.)
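The space cost figures on this slide follow from multiplying each per-GB rate by a 20 GB data set. A short Python check, using only the rates quoted on the slide (the tier labels and variable names are just for this example):

```python
SIZE_GB = 20  # data set size used on the slide

# Approximate cost per GB for each storage tier, as quoted on the slide.
cost_per_gb = {
    "Ultra expensive": 19.00,
    "Cheap (NAS)": 0.80,
    "Ultra cheap ($500 PC)": 0.50,
}

totals = {tier: rate * SIZE_GB for tier, rate in cost_per_gb.items()}
for tier, total in totals.items():
    print(f"{tier}: ${total:.2f} for {SIZE_GB} GB")
```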