XML Standards for Proteomics Data Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt Department of Computing Science and the Institute of Biomedical and.

Presentation transcript:

XML Standards for Proteomics Data Andrew Jones, Dr Jonathan Wastling and Dr Ela Hunt Department of Computing Science and the Institute of Biomedical and Life Sciences, University of Glasgow

Proteomics
1. 2D-PAGE to separate proteins
2. Image analysis to determine the volume of protein spots
3. Mass spectrometry (MS) to characterise protein spots
4. Database searches to identify proteins
[Figure: workflow panels labelled 2D-PAGE, Image Analysis, Mass Spectrometry, Database Search]

Proteomics Data Issues
- Many different instruments for data collection
- Great variety of software used for analysis
- Access to external databases
  - For protein identification
  - Protein characterisation after ID
- High-throughput techniques generate very large data sets
[Figure: Instruments (scanner, MS); Software (image analysis, MS viewer); Databases (genome, microarray, publications, more...)]

A Standard Model for Proteomics
- Improve management of laboratory workflows
- Data integration: link local data to external data sources
- Development of public databases, enabling:
  - Queries over protocols, raw data and analysis
  - Experiments to be reproduced or re-analysed by other research groups
  - Co-analysis of proteome data with genome, transcriptome and other resources

Biological Collaborators
- Parasitology research group
  - Investigating host-parasite response with Toxoplasma gondii
- Ras/Raf pathway research at the Beatson Institute
- Functional Genomics Facility at the IBLS
Functional Genomics Facility -

MAGE model for Proteomics
- The MAGE model has been developed to store microarray protocols, data and analysis
- A similar model will facilitate integration between microarray and proteome data
- Aspects of the model require few modifications to be applicable to proteomics
- We are developing a new representation of 2D gel analysis and MS data

Experimental Protocols in MAGE
- MAGE model is extensible
- Protocol is generated as an ordered list: events, materials and hardware
- Few changes required to focus on protein extraction rather than mRNA production
[Diagram packages: Array, Protocol, BioAssay, BioEvent, BioMaterial, ArrayDesign]

Experimental Protocols for 2D gels
- MAGE model is extensible
- Protocol is generated as an ordered list: events, materials and hardware
- Few changes required to focus on protein extraction rather than mRNA production
[Diagram packages: 2D_PAGE, Protocol, BioAssay, BioEvent, BioMaterial, 2D_PAGE_Setup]
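
The idea of a protocol as an ordered list of events, each with its materials and hardware, can be illustrated with a small Python sketch. This is not the MAGE schema: the class fields, the `example_2d_page_protocol` helper and the step names are all hypothetical and chosen only to show the shape of the data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BioEvent:
    """One protocol step (hypothetical fields, not the MAGE definition)."""
    name: str
    materials: List[str] = field(default_factory=list)
    hardware: List[str] = field(default_factory=list)


@dataclass
class Protocol:
    """A protocol as an ordered list of events; list order is step order."""
    title: str
    events: List[BioEvent] = field(default_factory=list)

    def add(self, event: BioEvent) -> "Protocol":
        self.events.append(event)
        return self

    def steps(self) -> List[str]:
        # Number the steps from 1 in the order they were added.
        return [f"{i}. {e.name}" for i, e in enumerate(self.events, start=1)]


def example_2d_page_protocol() -> Protocol:
    # Illustrative steps only; a real laboratory protocol would differ.
    protocol = Protocol("2D-PAGE of a protein extract")
    protocol.add(BioEvent("Protein extraction", ["cell lysate"], ["centrifuge"]))
    protocol.add(BioEvent("Isoelectric focusing", ["IPG strip"], ["IEF unit"]))
    protocol.add(BioEvent("SDS-PAGE", ["polyacrylamide gel"], ["electrophoresis tank"]))
    return protocol


if __name__ == "__main__":
    for line in example_2d_page_protocol().steps():
        print(line)
```

Swapping the first event for an mRNA preparation step would leave the rest of the structure unchanged, which is the sense in which few changes are needed to move between microarray and 2D gel protocols.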

Proteomics Data Model
- Image analysis identifies spots observable on the gel
- Important to store raw data and analysis from MS
- Separate package for cross gel analysis e.g. time series
[Diagram packages: 2D_PAGE, Protein_Spots, MS_Setup, MS_Data, Multiple_Analysis, Data_Analysis, BioSequence; annotation: Link From Protocol]
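
To make the chain from gel to spot to MS data to sequence more concrete, here is a minimal sketch that builds one such record as XML using Python's standard library. The element and attribute names (`msData`, `bioSequenceRef`, `volume`, `rawFile`, the accession value) are assumptions for illustration and are not taken from the authors' model.

```python
import xml.etree.ElementTree as ET


def build_gel_record() -> ET.Element:
    """Build one hypothetical experiment record: gel -> spots -> MS data."""
    experiment = ET.Element("Experiment", id="exp1")
    gel = ET.SubElement(experiment, "gelImage", id="gel1")
    spots = ET.SubElement(gel, "spots")

    # A spot found by image analysis, with its measured volume.
    spot = ET.SubElement(spots, "spot", id="s1", volume="0.42")

    # Raw MS data is stored alongside the identification it led to.
    ms = ET.SubElement(spot, "msData", rawFile="s1.raw")
    ET.SubElement(ms, "bioSequenceRef", accession="HYPOTHETICAL_0001")

    # A second spot that has not been analysed by MS yet.
    ET.SubElement(spots, "spot", id="s2", volume="0.10")
    return experiment


def spots_with_ms(root: ET.Element) -> list:
    """Return the ids of spots that carry MS data."""
    return [s.get("id") for s in root.iter("spot") if s.find("msData") is not None]


if __name__ == "__main__":
    record = build_gel_record()
    print(ET.tostring(record, encoding="unicode"))
    print(spots_with_ms(record))
```

Keeping the raw file reference next to the identification is one way to let another group re-analyse the same spot later.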

Proteomics Model
- Experimental protocol packages require few changes from MAGE
- New data model includes MS data and statistical analysis between gels
- Model incorporates storage of external database searches
[Diagram packages: 2D_PAGE, Protocol, BioAssay, BioEvent, BioMaterial, 2D_PAGE_Setup, Protein_Spots, MS_Setup, MS_Data, Multiple_Analysis, Data_Analysis, BioSequence, Experiment, Audit & Security, Description, Measurement, Common, BQS; further labels: Annotation, Data, Protocol]

Proteomics Database and Indexing Technology
- A prototype database for proteomics has been developed
- We have developed a specialised index structure for XML, in order to improve query performance
- The performance of the index has so far been tested with 800 MB of protein data
[Diagram: Data Stores; XML Index; XML Dictionary (1 Experiment, 2 gelImage, 3 spots, 4 spot, …); Data Path Tree]
1. Protein Information Resource -
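
The slide does not describe how the index works internally, so the following is only a rough sketch of the general idea suggested by the diagram labels: a dictionary that maps element names to small integers, and a path tree that maps each root-to-node path of those integers to the matching elements. The functions `build_index` and `lookup` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

Path = Tuple[int, ...]

SAMPLE = (
    "<Experiment><gelImage><spots>"
    "<spot id='s1'/><spot id='s2'/>"
    "</spots></gelImage></Experiment>"
)


def build_index(root: ET.Element):
    """Return (dictionary, paths) for the tree rooted at `root`."""
    dictionary: Dict[str, int] = {}
    paths: Dict[Path, List[ET.Element]] = defaultdict(list)

    def visit(node: ET.Element, prefix: Path) -> None:
        # Assign the next free integer to a tag the first time it is seen.
        tag_id = dictionary.setdefault(node.tag, len(dictionary) + 1)
        path = prefix + (tag_id,)
        paths[path].append(node)
        for child in node:
            visit(child, path)

    visit(root, ())
    return dictionary, dict(paths)


def lookup(dictionary, paths, tag_path: List[str]) -> List[ET.Element]:
    """Find all elements at an exact path without walking the document."""
    try:
        key = tuple(dictionary[tag] for tag in tag_path)
    except KeyError:
        return []  # a tag that never occurs cannot match anything
    return paths.get(key, [])


if __name__ == "__main__":
    d, p = build_index(ET.fromstring(SAMPLE))
    print(d)
    print([e.get("id") for e in lookup(d, p, ["Experiment", "gelImage", "spots", "spot"])])
```

With an index of this kind, a path query becomes a single dictionary lookup instead of a traversal, which is one plausible reason such a structure would help query performance on large XML data sets.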

Related Research
- Databases: SWISS-2DPAGE, LIMS systems
- Standards: Proteomics Standards Initiative (PSI)
  - Standards for protein-protein interactions and mass spectrometry
- PEDRo system with PEML: Proteomics Experiment Markup Language
PSI:

Work In Progress
- Work towards an XML standard for proteomics
- Create standards for capturing statistical processing of large data sets
- Developing XML indexing technology to improve data integration and query power
- Developing a proteome database utilising XML indexing and a standard model

Contact
Bioinformatics Research Centre -
The Functional Genomics Facility is supported by a Wellcome Trust grant for £2.4M. My research is supported by an MRC Bioinformatics PhD studentship; Ela Hunt is supported by an MRC Fellowship.

Acknowledgements
- Researchers in Jonathan Wastling's lab for input into the model.
- Dr Ashwin Kotiwaliwale at the Beatson for the collaboration on the prototype database.