BRC 2011 Session #4 – “Omics” Data

Session #4 - Outline
- Challenges and Opportunities
  - pathogen datasets; host datasets; integrating pathogen-host datasets
  - BRC approach to managing “omics” data
  - mRNAs, ncRNAs, RNAi, proteomics, metabolomics
  - systems-level analysis
- Francis Ouellette – “Interesting Gene List” visualization and analysis & training approaches
- Ideas from Systems Biology and DBP interactions
- Talking Points
- Open discussion

Session #4 – Opportunities
Andrew R. Joyce & Bernhard Ø. Palsson, Nature Reviews Molecular Cell Biology 7, (March 2006)

Session #4 – Challenges
- Approach to “omics” data is somewhat pathogen specific
  - Host “omics” data is relevant for bacteria, viruses and parasites; less so for vectors
  - Pathogen “omics” data is relevant for bacteria, parasites and vectors; less so for viruses
- What kind of “omics” data should be supported by BRCs?
  - Pathogen vs host
  - mRNA, ncRNA, RNAi, proteomics, metabolomics, lipidomics, others
  - Raw, minimally processed or highly interpreted (status of NCBI SRA)
  - Results data and metadata
- What should we do with the data?
  - Make available for download
  - Make available for browsing
  - Make available for visualization
  - Make available for analysis
- Current infrastructure is focused largely on genomics
  - Genome sequence and gene/protein annotations about the pathogens; no infrastructure for host genes (some progress on web services)
  - Analysis and visualization tools are focused on comparative genomics; few tools for “omics” data analysis and visualization
- Standard nomenclature for naming our data sets so that they can be more easily identified and exchanged
- How to acquire data sets of sufficient quality and quantity
  - Reliable sourcing of data, and acquisition from diverse off-site providers in real time
  - Availability of data and metadata in public resources: lack of standards; difficult to access
- Data quality, reliability, and reproducibility
  - Technology/platform bias and lab-to-lab variations
  - Noise in data and false positives
  - Metadata-driven analysis requires manual curation efforts to separate signal from noise
- Projection of omics data and its interpretation to closely related organisms
- Use of omics data to improve annotations
- Moving from data integration to knowledge integration

Session #4 – Opportunities
- Currently no organized resources for viral pathogen host response/host factor data; this would be very useful for the virology community
- Many BRC groups have extensive experience with microarray data and network analysis that could be leveraged
- Host data is becoming increasingly relevant for novel drug discovery
- Using networks to relate different kinds of data
- Ask system-level biological questions that cannot be answered by any single “omics” data type alone
- Visualization of multiple layers of information simultaneously. How many tracks can one realistically add before a new approach is needed?
- Use omics data to identify/validate/correct gene models and gene functions, regulatory elements, metabolic and signaling pathways, and phenotypes
- Development of simple tools and pipelines to enable high-throughput (HT) processing of omics data besides sequencing and transcriptomics

Talking points
- Approach to “omics” data management
  - Raw vs minimally processed vs interpreted results
  - Facilitating relevant data capture from targeted projects
  - Capturing other high-value related data
  - Adoption and use of data standards, especially for metadata
- Utility of visualization and analysis of “interesting gene lists” (IGLs)
- Support for re-analysis of primary “omics” data
- What to do with non-gene/protein-centric “omics” data

“INTERESTING GENE LIST” VISUALIZATION AND ANALYSIS & TRAINING APPROACHES Francis Ouellette

Overview of Systems Biology & DBP Projects
- Four systems biology groups funded by NIAID, including:
  - Systems Virology (Michael Katze group, Univ. Washington)
    - Influenza H1N1 and H5N1, and SARS coronavirus
    - Statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
  - Systems Influenza (Alan Aderem group, Institute for Systems Biology)
    - Various influenza viruses
    - Microarray, mass spectrometry, and lipidomics data
- ViPR Driving Biological Projects (DBPs)
  - Abraham Brass, Mass. General Hospital
    - Dengue virus host factor database from RNAi screen
  - Lynn Enquist / Moriah Szpara, Princeton University
    - Deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus

Proposal for “Omics” Data
1. “Omics” data management (host)
   a) Project metadata
   b) Assay/experiment metadata
   c) Data analysis metadata
   d) Primary results
   e) Derived results (e.g. “interesting gene lists” (IGLs))
2. Add additional related datasets
3. Visualize IGLs in context of biological pathways and networks
4. Statistical analysis of pathway sub-network overrepresentation
5. Re-analysis of primary data using assembled pipeline tools

What level of data should be stored and made accessible?
- Primary results data
  - Need to define what is considered “primary” data for each platform
  - Microarray example: raw image files (.tiff) vs probe intensity values (.cel)
  - Opportunity for re-processing leading to re-interpretation
- Derived/processed results
  - “Interesting gene lists” from microarray, RNAi, proteomics, and other experimental platforms
  - “Interesting metabolite lists”
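
The step from primary to derived results can be illustrated with a minimal Python sketch. It assumes already-normalized intensity values for two conditions and a simple log2 fold-change cutoff; the function name, gene names, values, and cutoff are all hypothetical and are not taken from any BRC pipeline.

```python
import math


def interesting_gene_list(control, infected, min_abs_log2_fc=1.0):
    """Return genes whose |log2(infected / control)| meets the cutoff, sorted by name."""
    hits = []
    for gene, ctrl_value in control.items():
        if gene not in infected:
            continue  # skip genes not measured in both conditions
        log2_fc = math.log2(infected[gene] / ctrl_value)
        if abs(log2_fc) >= min_abs_log2_fc:
            hits.append(gene)
    return sorted(hits)


# Hypothetical, already-normalized intensities (illustrative values only).
control = {"GENE_A": 100.0, "GENE_B": 200.0, "GENE_C": 50.0, "GENE_D": 80.0}
infected = {"GENE_A": 400.0, "GENE_B": 210.0, "GENE_C": 10.0, "GENE_D": 80.0}

igl = interesting_gene_list(control, infected)
print(igl)
```

Storing the primary values alongside the derived list is what allows the cutoff to be changed later and the list recomputed, which is the re-interpretation opportunity noted above.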

Metadata (MIBBI-compliant)
- Project Level Metadata
  - Hypothesis, rationale, study design, etc.
  - Publications and links pertaining to the project
  - Data providers - PI, other key personnel, affiliations, contact information
- Assay Level Metadata
  - Sample source and characteristics of source
  - Sample type
  - Source/sample treatment information
  - Assay details
- Data Processing/Analysis Level Metadata
  - Algorithm(s) used for transforming primary to derived data
  - Configuration parameters
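
One possible way to represent these three metadata levels is sketched below in Python. The class and field names are assumptions chosen to mirror the bullet points; they are not a MIBBI schema or an existing BRC data model, and the completeness check is only an illustration of how missing metadata might be flagged at submission time.

```python
from dataclasses import dataclass, field, asdict
from typing import Dict, List


@dataclass
class ProjectMetadata:
    hypothesis: str
    study_design: str
    publications: List[str] = field(default_factory=list)
    data_providers: List[str] = field(default_factory=list)


@dataclass
class AssayMetadata:
    sample_source: str
    sample_type: str
    treatment: str
    assay_details: str


@dataclass
class AnalysisMetadata:
    algorithms: List[str]
    parameters: Dict[str, str] = field(default_factory=dict)


@dataclass
class OmicsDataset:
    project: ProjectMetadata
    assay: AssayMetadata
    analysis: AnalysisMetadata


def missing_fields(dataset):
    """List dotted names of text fields that were left empty."""
    missing = []
    for level, values in asdict(dataset).items():
        for name, value in values.items():
            if isinstance(value, str) and not value.strip():
                missing.append(f"{level}.{name}")
    return missing


# Hypothetical record; every value here is a placeholder.
dataset = OmicsDataset(
    project=ProjectMetadata(hypothesis="placeholder hypothesis", study_design="time course"),
    assay=AssayMetadata(sample_source="cell line", sample_type="total RNA",
                        treatment="", assay_details="microarray"),
    analysis=AnalysisMetadata(algorithms=["normalization"], parameters={"cutoff": "1.0"}),
)
print(missing_fields(dataset))
```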

Interpretation of “Interesting Gene Lists”
- Visualizing interesting gene list overrepresentation in protein-protein networks and/or biological pathways
- Statistical assessment of enrichment

Visualizing Hits from Interesting Gene Lists
- Select Dataset(s) of interest
- Choose all (or subset) of genes on list
  - Intersect/Subtract between studies
- Visualize selected genes as a biological network
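
The intersect/subtract step and the selection of a network to display can be sketched with plain Python sets. The helper names, gene names, and interaction pairs below are hypothetical; a real implementation would read IGLs and interactions from the resource's own data stores.

```python
def intersect(*gene_lists):
    """Genes present in every supplied list."""
    sets = [set(genes) for genes in gene_lists]
    return sorted(set.intersection(*sets))


def subtract(base, *others):
    """Genes in `base` that appear in none of the other lists."""
    return sorted(set(base).difference(*others))


def induced_edges(genes, interactions):
    """Keep only interactions whose two partners are both selected."""
    selected = set(genes)
    return [(a, b) for a, b in interactions if a in selected and b in selected]


# Hypothetical IGLs from two studies and a toy interaction set.
study_a = ["GENE_A", "GENE_B", "GENE_C"]
study_b = ["GENE_B", "GENE_C", "GENE_D"]
interactions = [("GENE_A", "GENE_B"), ("GENE_B", "GENE_C"), ("GENE_C", "GENE_D")]

shared = intersect(study_a, study_b)
only_a = subtract(study_a, study_b)
network = induced_edges(shared, interactions)
print(shared, only_a, network)
```

The resulting edge list is the kind of input a network viewer would need; exporting it to a specific viewer format is left out of this sketch.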

“Quick & Dirty” Overrepresentation Visualization
- Reactome SkyPainter
  - Limited to reactions and interactions found in the Reactome database
  - Visualizes the “big picture” using pathway representations
- Constructed using gene list from HCV study
  - HCV host factors residing in the nucleus
  - Ribonucleoprotein complex, transcription factors, kinases, protein metabolism/modification, nucleic acid binding/metabolism

Visualizing Hits from Gene Lists (Cytoscape)

Statistical Enrichment Analysis
- Gene Ontology biological process overrepresentation
  - CLASSIFI
- Protein interaction network module enrichment (PINME) analysis
  - Obtain all known human protein-protein interactions from BioGRID
  - Determine module (sub-network) structures (e.g. using dMoNet)
  - Identify function of modules (e.g. using CLASSIFI)
  - Determine overrepresentation statistics for IGLs
  - Visualize results
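
The module overrepresentation step can be illustrated with a small Python sketch under stated assumptions. Modules are taken to be connected components of a toy interaction network, which is a simple stand-in and not the dMoNet algorithm; overrepresentation is scored with a one-sided hypergeometric test and a Bonferroni correction, which is one common choice and not necessarily the statistic used by CLASSIFI or PINME. All protein names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(universe_size, module_size, list_size, overlap):
    """P(X >= overlap) when list_size genes are drawn without replacement."""
    total = comb(universe_size, list_size)
    upper = min(module_size, list_size)
    tail = sum(
        comb(module_size, k) * comb(universe_size - module_size, list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def connected_components(edges):
    """Stand-in module finder: connected components (NOT dMoNet)."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, modules = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, module = [start], set()
        while stack:
            node = stack.pop()
            if node in module:
                continue
            module.add(node)
            stack.extend(adjacency[node] - module)
        seen |= module
        modules.append(module)
    return modules


def module_enrichment(edges, igl):
    """Return (module members, overlap, Bonferroni-adjusted p), most significant first."""
    modules = connected_components(edges)
    universe = set().union(*modules)
    hits = set(igl) & universe  # IGL genes absent from the network are ignored
    results = []
    for module in modules:
        overlap = len(module & hits)
        p = hypergeom_pvalue(len(universe), len(module), len(hits), overlap)
        results.append((sorted(module), overlap, min(1.0, p * len(modules))))
    return sorted(results, key=lambda r: r[2])


# Toy network: one 3-protein module and one 6-protein chain (hypothetical names).
edges = [("P1", "P2"), ("P2", "P3"), ("P3", "P1"),
         ("P4", "P5"), ("P5", "P6"), ("P6", "P7"), ("P7", "P8"), ("P8", "P9")]
results = module_enrichment(edges, ["P1", "P2", "P3", "PX"])
for members, overlap, adjusted_p in results:
    print(members, overlap, round(adjusted_p, 4))
```

The background here is restricted to proteins present in the network; choosing a different background (for example, all genes on the assay platform) would change the p-values and is a design decision the sketch does not settle.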

Modules in Networks
