The Functional Genomics Experiment Model (FuGE)
Andy Jones
School of Computer Science and Faculty of Life Sciences, University of Manchester
History
Data sharing for 'omics data has been tackled by various groups:
– MAGE format for microarrays (MGED, 2002)
– PEDRo for proteomics (University of Manchester, 2003)
Problems for functional genomics:
– Common parts are modelled differently
– Labs performing both techniques must build two complex applications to describe similar concepts
– Data are difficult to integrate
Two efforts to merge MAGE and PEDRo (2004):
– The merged models were even more complex
– They did not cover other techniques, e.g. metabolomics
– But there are significant advantages if upstream details need to be described only once!
Introduction to FuGE
The Functional Genomics Experiment model (FuGE) models the components common across functional genomics experiments:
– Sample description, experimental variables, protocols, multidimensional data
Three uses of FuGE:
1. A data format for representing laboratory workflows
2. A supplement to existing data formats, adding metadata that describes their context within a workflow
3. A framework for building new data formats
FuGE structure
[Figure: FuGE package diagram. Common: Audit, Description, Measurement, Ontology, Protocol, Reference. Bio: ConceptualMolecule, Data, Investigation, Material.]
Common:
– General data format management
– Auditing
– Referencing external resources
– Protocols
Bio:
– Investigation structure
– Data
– Materials (organisms, solutions, compounds)
– Theoretical molecules, e.g. sequences or metabolites stored in a database
FuGE exists as:
1. An object model (UML)
2. An XML Schema
...and as a Java software toolkit (STK), a Hibernate relational database binding, etc.
Use 1: Experiment workflow
[Figure: Material → Treatment → Material → Treatment → Material → Treatment → Material → Data Acquisition → Data → Data Transformation → Data. Legend: Material and Data are the inputs and outputs; each step is a ProtocolApplication.]
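As a rough illustration of Use 1, the workflow in the figure can be modelled as a chain of protocol applications, each consuming an output of the step before it. This is a minimal Python sketch under stated assumptions: `Material`, `Data` and `ProtocolApplication` here are simplified, hypothetical stand-ins named after the FuGE concepts (not the FuGE object model or the Java STK), and `build_workflow` and `is_connected` are illustrative helpers.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical, simplified stand-ins for the FuGE Material, Data and
# ProtocolApplication concepts; NOT the real FuGE classes or attributes.

@dataclass
class Material:
    name: str

@dataclass
class Data:
    name: str

Item = Union[Material, Data]

@dataclass
class ProtocolApplication:
    protocol: str          # name of the protocol being applied
    inputs: List[Item]
    outputs: List[Item]

def build_workflow() -> List[ProtocolApplication]:
    """Three treatments, then data acquisition, then one data transformation."""
    steps: List[ProtocolApplication] = []
    current: Item = Material("sample")
    for i in range(1, 4):
        treated = Material(f"material_{i}")
        steps.append(ProtocolApplication("Treatment", [current], [treated]))
        current = treated
    raw = Data("raw_data")
    steps.append(ProtocolApplication("Data Acquisition", [current], [raw]))
    processed = Data("processed_data")
    steps.append(ProtocolApplication("Data Transformation", [raw], [processed]))
    return steps

def is_connected(steps: List[ProtocolApplication]) -> bool:
    """True if every step consumes the very object produced by the previous step."""
    return all(
        any(inp is out for inp in nxt.inputs for out in prev.outputs)
        for prev, nxt in zip(steps, steps[1:])
    )

if __name__ == "__main__":
    for step in build_workflow():
        ins = ", ".join(x.name for x in step.inputs)
        outs = ", ".join(x.name for x in step.outputs)
        print(f"{step.protocol}: {ins} -> {outs}")
```

Linking steps through shared input and output objects, rather than a separate list of edges, keeps the provenance of each Material or Data item recoverable from the steps alone.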
Use 2: Tie together external formats
[Figure: a ProtocolApplication takes a Material as inputMaterial and produces ExternalData as outputData; the ExternalData points to an mzData file and its file format definition.]
– A parser will exist to extract data and parameters from the mzData file
– The Material can be used to describe the sample, connecting the MS data with a separation workflow
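The pattern in Use 2 can be sketched as follows: FuGE records the context (input sample, protocol) and merely points at the external mzData file, leaving its contents to a format-specific parser. This is a hypothetical Python illustration; the class and field names, the `local_path` helper, the file URI and the `EXAMPLE:0000001` accession are all invented for the example and are not FuGE's actual API or a real ontology ID.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse

# Hypothetical, simplified stand-ins for FuGE concepts; the real FuGE
# classes and attribute names may differ.

@dataclass
class OntologyTerm:
    term: str
    accession: str  # placeholder accession, not a real ontology ID

@dataclass
class Material:
    name: str
    description: str

@dataclass
class ExternalData:
    location: str              # URI of the external file, e.g. an mzData file
    file_format: OntologyTerm  # what kind of file the location points to

@dataclass
class ProtocolApplication:
    protocol: str
    input_materials: List[Material] = field(default_factory=list)
    output_data: List[ExternalData] = field(default_factory=list)

def local_path(data: ExternalData) -> str:
    """Return the filesystem path of a file:// URI, to hand to a format-specific parser."""
    parsed = urlparse(data.location)
    if parsed.scheme != "file":
        raise ValueError(f"unsupported URI scheme: {parsed.scheme!r}")
    return parsed.path

# Usage: an MS acquisition whose input is a separated sample fraction and
# whose output is an external mzData file (all values are made up).
sample = Material("fraction_12", "fraction from a hypothetical LC separation")
ms_run = ExternalData("file:///data/run42.mzData",
                      OntologyTerm("mzData", "EXAMPLE:0000001"))
acquisition = ProtocolApplication("MS acquisition", [sample], [ms_run])
```

Keeping only a location and a format term in the wrapper means the mzData file itself stays unchanged, while the sample description links it back to the upstream separation workflow.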
FuGE status
– Milestone 1 (Sept 2005), Milestone 2 (Dec 2005), Milestone 3 (May 2006)
– Beta Java software toolkit: M2 (March 2006); M3 (Sept 2006)
– FuGE v1 (candidate):
  – Currently in the PSI standards process
  – Expected to emerge stable from the process by March/April 2007
Formats extending from FuGE
– MAGE version 2 (MGED)
– GelML and GelInfoML (PSI)
– analysisXML (PSI)
– spML (PSI / MSI)
– NMR (FuGE being evaluated by MSI)
– Planned migration for mzData and other PSI formats
Upstream workflow description for all groups:
– Investigation structure and variables, sample description, etc.
– Allows studies that cross technology boundaries to be assembled in one data format
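One way to picture a format "extending from FuGE" (Use 3) is that a technology-specific format reuses FuGE's generic concepts and adds its own specialisations, so generic tools still understand the shared parts. The sketch below illustrates that idea with Python subclassing under stated assumptions: the base classes are simplified stand-ins, and `Gel`, `ElectrophoresisProtocol` and `describe_materials` are invented names in the spirit of a gel format, not the actual GelML or FuGE classes.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical base classes standing in for generic FuGE concepts.
@dataclass
class Material:
    identifier: str
    name: str

@dataclass
class Protocol:
    identifier: str
    name: str
    parameters: dict = field(default_factory=dict)

# Hypothetical technology-specific extensions in the spirit of a gel
# electrophoresis format; these are NOT the real GelML classes.
@dataclass
class Gel(Material):
    percent_acrylamide: float = 12.0

@dataclass
class ElectrophoresisProtocol(Protocol):
    voltage_volts: float = 0.0

def describe_materials(materials: List[Material]) -> List[str]:
    """Generic code written against the base class also handles the extensions."""
    return [f"{m.identifier}: {m.name} ({type(m).__name__})" for m in materials]

if __name__ == "__main__":
    items = [Material("M1", "cell lysate"), Gel("G1", "2D gel", percent_acrylamide=10.0)]
    for line in describe_materials(items):
        print(line)
```

The design point is that code written only against the shared base (here `describe_materials`) needs no changes when a new technology adds its own subclasses.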
Conclusions
FuGE has been accepted by MGED, PSI and MSI:
– for developing future data formats
– for describing the parts of experiments common across technologies
– Moving toward convergence of data formats
– Simplifies the process of developing new data standards
– Will facilitate data integration and the submission of data to public repositories
– Will improve the uniformity of data sets in public repositories, thus facilitating querying
Web: http://fuge.sourceforge.net/
Acknowledgements
FuGE development:
– Angel Pizarro (UPenn), Michael Miller (Rosetta), Paul Spellman (Lawrence Berkeley)
– MGED, PSI, Fred Hutchinson CRC, Genologics
PSI:
– Chris Taylor, Henning Hermjakob, Randy Julian
MSI:
– Nigel Hardy and Helen Jenkins (Aber)
Work on FuGE in Manchester is funded by the BBSRC.
Email: email@example.com@cs.man.ac.uk
Web: http://fuge.sourceforge.net/