BRC 2011 Session #4 – “Omics” Data. Session #4 - Outline: Challenges and Opportunities (pathogen datasets; host datasets; integrating pathogen-host datasets).

1 BRC 2011 Session #4 – “Omics” Data

2 Session #4 - Outline
- Challenges and Opportunities
  - pathogen datasets; host datasets; integrating pathogen-host datasets
- BRC approach to managing “omics” data
  - mRNAs, ncRNAs, RNAi, proteomics, metabolomics
  - systems-level analysis
- Francis Ouellette – “Interesting Gene List” visualization and analysis & training approaches
- Ideas from Systems Biology and DBP interactions
- Talking Points
- Open discussion

3 Session #4 – Opportunities
Andrew R. Joyce & Bernhard Ø. Palsson, Nature Reviews Molecular Cell Biology 7, 198-210 (March 2006)

4 Session #4 – Challenges
- Approach to “omics” data is somewhat pathogen-specific
  - Host “omics” data is relevant for bacteria, viruses and parasites; less so for vectors
  - Pathogen “omics” data is relevant for bacteria, parasites and vectors; less so for viruses
- What kind of “omics” data should be supported by BRCs?
  - Pathogen vs. host
  - mRNA, ncRNA, RNAi, proteomics, metabolomics, lipidomics, others
  - Raw, minimally processed or highly interpreted (status of NCBI SRA)
  - Results data and metadata
- What should we do with the data?
  - Make it available for download
  - Make it available for browsing
  - Make it available for visualization
  - Make it available for analysis
- Current infrastructure is focused largely on genomics
  - Genome sequence and gene/protein annotations for the pathogens; no infrastructure for host genes (some progress on web services)
  - Analysis and visualization tools are focused on comparative genomics; few tools for “omics” data analysis and visualization
- Standard nomenclature for naming our data sets so that they can be more easily identified and exchanged
- How to acquire data sets of sufficient quality and quantity
  - Reliable sourcing of data, and acquisition from diverse off-site providers in real time
  - Availability of data and metadata in public resources – lack of standards; difficult to access
- Data quality, reliability, and reproducibility
  - Technology/platform bias and lab-to-lab variation
  - Noise in data and false positives
  - Metadata-driven analysis requires manual curation effort to separate signal from noise
- Projection of omics data and its interpretation to closely related organisms
- Use of omics data to improve annotations
- Moving from data integration to knowledge integration

5

6 Session #4 – Opportunities
- There are currently no organized resources for viral pathogen host response/host factor data; such a resource would be very useful for the virology community
- Many BRC groups have extensive experience with microarray data and network analysis that could be leveraged
- Host data is becoming increasingly relevant for novel drug discovery
- Using networks to relate different kinds of data
- Ask system-level biological questions that cannot be answered by any one “omics” data type alone
- Visualization of multiple layers of information simultaneously: how many tracks can one realistically add before a new approach is needed?
- Use omics data to identify/validate/correct gene models and gene functions, regulatory elements, metabolic and signaling pathways, and phenotypes
- Development of simple tools and pipelines to enable high-throughput (HT) processing of omics data beyond sequencing and transcriptomics

7 Talking points
- Approach to “omics” data management
  - Raw vs. minimally processed vs. interpreted results
  - Facilitating relevant data capture from targeted projects
  - Capturing other high-value related data
  - Adoption and use of data standards, especially for metadata
- Utility of visualization and analysis of IGLs
- Support for re-analysis of primary “omics” data
- What to do with non-gene/protein-centric “omics” data

8 “INTERESTING GENE LIST” VISUALIZATION AND ANALYSIS & TRAINING APPROACHES
Francis Ouellette

9 Overview of Systems Biology & DBP Projects
- Four systems biology groups funded by NIAID, including:
  - Systems Virology (Michael Katze group, Univ. Washington)
    - Influenza H1N1 and H5N1 and SARS coronavirus
    - statistical models, algorithms and software, raw and processed gene expression data, and proteomics data
  - Systems Influenza (Alan Aderem group, Institute for Systems Biology)
    - various influenza viruses
    - microarray, mass spectrometry, and lipidomics data
- ViPR Driving Biological Projects
  - Abraham Brass, Mass. General Hospital
    - Dengue virus host factor database from RNAi screen
  - Lynn Enquist / Moriah Szpara, Princeton University
    - Deep sequencing and neuronal microarrays for functional genomic analysis of Herpes Simplex Virus

10 Proposal for “Omics” Data
1. “Omics” data management (host)
   a) Project metadata
   b) Assay/experiment metadata
   c) Data analysis metadata
   d) Primary results
   e) Derived results (e.g. “interesting gene lists” (IGLs))
2. Add additional related datasets
3. Visualize IGLs in the context of biological pathways and networks
4. Statistical analysis of pathway/sub-network overrepresentation
5. Re-analysis of primary data using assembled pipeline tools

11 What level of data should be stored and made accessible?
- Primary results data
  - Need to define what is considered “primary” data for each platform
  - Microarray example: raw image files (.tiff) vs. probe intensity values (.cel)
  - Opportunity for re-processing, leading to re-interpretation
- Derived/processed results
  - “Interesting gene lists” from microarray, RNAi, proteomics, and other experimental platforms
  - “Interesting metabolite lists”
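The primary/derived distinction matters because a derived list depends on the cutoffs chosen when it was made. A minimal sketch, assuming processed per-gene log2 fold changes and p-values as input (the `GeneMeasurement` schema, the `interesting_gene_list` name, the default thresholds and the toy values are all hypothetical), shows how the same processed values yield different IGLs under different thresholds:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneMeasurement:
    """One processed value per gene (hypothetical schema)."""
    gene: str
    log2_fold_change: float
    p_value: float


def interesting_gene_list(measurements, min_abs_log2fc=1.0, max_p=0.05):
    """Return sorted gene names passing both cutoffs.

    The defaults are illustrative, not values used by any BRC pipeline.
    """
    hits = {
        m.gene
        for m in measurements
        if abs(m.log2_fold_change) >= min_abs_log2fc and m.p_value <= max_p
    }
    return sorted(hits)


# Toy values, not real measurements.
example = [
    GeneMeasurement("GENE_A", 3.2, 0.001),
    GeneMeasurement("GENE_B", 0.1, 0.80),
    GeneMeasurement("GENE_C", -1.5, 0.01),
    GeneMeasurement("GENE_D", 2.0, 0.20),
]
print(interesting_gene_list(example))             # ['GENE_A', 'GENE_C']
print(interesting_gene_list(example, max_p=0.5))  # ['GENE_A', 'GENE_C', 'GENE_D']
```

Recording `min_abs_log2fc` and `max_p` alongside the list, as data analysis metadata, is what would let someone re-derive or revise the IGL later.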

12 Metadata (MIBBI-compliant)
- Project Level Metadata
  - Hypothesis, rationale, study design, etc.
  - Publications and links pertaining to the project
  - Data providers: PI, other key personnel, affiliations, contact information
- Assay Level Metadata
  - Sample source and characteristics of source
  - Sample type
  - Source/sample treatment information
  - Assay details
- Data Processing/Analysis Level Metadata
  - Algorithm(s) used for transforming primary to derived data
  - Configuration parameters
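One way to keep the three metadata levels together is a nested record. The sketch below is hypothetical: the field names paraphrase the slide and are not taken from any MIBBI checklist (each checklist, e.g. MIAME for microarrays, defines its own required elements), and `missing_fields` is only a crude completeness check that flags empty values.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ProjectMetadata:  # hypothetical field names
    hypothesis: str = ""
    study_design: str = ""
    publications: list[str] = field(default_factory=list)
    principal_investigator: str = ""
    contact_email: str = ""


@dataclass
class AssayMetadata:
    sample_source: str = ""
    sample_type: str = ""
    treatment: str = ""
    assay_details: str = ""


@dataclass
class AnalysisMetadata:
    algorithms: list[str] = field(default_factory=list)
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class OmicsDatasetRecord:
    project: ProjectMetadata
    assay: AssayMetadata
    analysis: AnalysisMetadata


def missing_fields(record: OmicsDatasetRecord) -> list[str]:
    """Return 'level.field' names whose values are still empty."""
    missing = []
    for level_name in ("project", "assay", "analysis"):
        level = getattr(record, level_name)
        for f in fields(level):
            if not getattr(level, f.name):  # "", [] and {} count as missing
                missing.append(f"{level_name}.{f.name}")
    return missing


record = OmicsDatasetRecord(
    project=ProjectMetadata(hypothesis="example hypothesis"),
    assay=AssayMetadata(sample_type="RNAi screen"),
    analysis=AnalysisMetadata(parameters={"p_value_cutoff": "0.05"}),
)
print(missing_fields(record))
```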

13 Interpretation of “Interesting Gene Lists”
- Visualizing overrepresentation of interesting gene lists in protein-protein networks and/or biological pathways
- Statistical assessment of enrichment
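The slides do not name the enrichment statistic. A common choice, assumed here, is the one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). With N genes in the background, K of them in a given pathway or module, and an interesting gene list of n background genes of which k fall in that pathway or module:

```latex
P(X \ge k) \;=\; \sum_{i=k}^{\min(n,K)} \frac{\binom{K}{i}\binom{N-K}{n-i}}{\binom{N}{n}},
\qquad
\text{fold enrichment} \;=\; \frac{k}{nK/N}
```

A small p-value means the list hits the pathway more often than the nK/N genes expected by chance; when many pathways or modules are tested, the p-values still need a multiple-testing correction.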

14 Visualizing Hits from Interesting Gene Lists
- Select dataset(s) of interest
- Choose all (or a subset) of the genes on the list
  - Intersect/subtract between studies
- Visualize selected genes as a biological network
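The intersect/subtract step amounts to set operations on gene identifiers. A minimal sketch, assuming each study's list holds gene symbols that can be normalized by upper-casing (true for human HGNC symbols, not in general); `combine_gene_lists` and the toy gene names are hypothetical:

```python
def combine_gene_lists(lists, mode="intersect"):
    """Combine IGLs from several studies.

    mode="intersect": genes present in every list.
    mode="subtract":  genes in the first list but in none of the others.
    """
    sets = [{g.upper() for g in genes} for genes in lists]  # normalize symbols
    if not sets:
        return []
    first, rest = sets[0], sets[1:]
    if mode == "intersect":
        result = first.intersection(*rest)
    elif mode == "subtract":
        result = first.difference(*rest)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(result)


study_a = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
study_b = ["gene_b", "GENE_D", "GENE_E"]
print(combine_gene_lists([study_a, study_b]))                   # ['GENE_B', 'GENE_D']
print(combine_gene_lists([study_a, study_b], mode="subtract"))  # ['GENE_A', 'GENE_C']
```

Real cross-study comparisons usually need identifier mapping (e.g. Entrez Gene IDs) before this step, since symbols and aliases differ between platforms.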

15 “Quick & Dirty” Overrepresentation Visualization
- Reactome SkyPainter
  - Limited to reactions and interactions found in the Reactome database
  - Visualizes the “big picture” using pathway representations
- Constructed using a gene list from an HCV study
  - HCV host factors residing in the nucleus
  - Ribonucleoprotein complex, transcription factors, kinases, protein metabolism/modification, nucleic acid binding/metabolism

16 Visualizing Hits from Gene Lists (Cytoscape)
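Cytoscape can import networks in its Simple Interaction Format (SIF), one `source<TAB>type<TAB>target` line per edge, with a bare node name on its own line for an unconnected node. A sketch, assuming protein-protein interactions are available as gene-symbol pairs (the helper name, the `pp` edge type and the toy pairs are illustrative), that produces the sub-network induced by an IGL:

```python
def igl_subnetwork_sif(interactions, igl, include_neighbors=False):
    """Return SIF lines ("A<TAB>pp<TAB>B") for the sub-network around an IGL.

    interactions: iterable of (gene_a, gene_b) pairs, treated as undirected.
    include_neighbors=False keeps edges with both ends on the list;
    True also keeps edges with one end on the list (first neighbors).
    """
    hits = set(igl)
    seen = set()
    lines = []
    for a, b in interactions:
        if a == b:
            continue  # self-interactions add clutter to the picture
        on_list = (a in hits) + (b in hits)
        if on_list == 2 or (include_neighbors and on_list == 1):
            edge = tuple(sorted((a, b)))
            if edge not in seen:  # keep one copy of each undirected edge
                seen.add(edge)
                lines.append(f"{edge[0]}\tpp\t{edge[1]}")
    connected = {gene for edge in seen for gene in edge}
    # Bare node lines keep list members without edges visible in Cytoscape.
    lines.extend(sorted(hits - connected))
    return lines


ppi = [("P1", "P2"), ("P2", "P3"), ("P2", "P1"), ("P4", "P5")]  # toy pairs
print("\n".join(igl_subnetwork_sif(ppi, ["P1", "P2", "P6"])))
```

Saved to a `.sif` file, the output can be loaded into Cytoscape as a network. Setting `include_neighbors=True` pulls in first neighbors, which can reveal connecting proteins that are not on the list, at the cost of much larger networks.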

17 Statistical Enrichment Analysis
- Gene Ontology biological process overrepresentation
  - CLASSIFI
- Protein interaction network module enrichment (PINME) analysis
  - Obtain all known human protein-protein interactions from BioGRID
  - Determine module (sub-network) structures (e.g. using dMoNet)
  - Identify function of modules (e.g. using CLASSIFI)
  - Determine overrepresentation statistics for IGLs
  - Visualize results
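The statistics step of PINME can be sketched once modules are in hand. This does not reimplement dMoNet or CLASSIFI: module membership is assumed to be given as a dict of gene sets, and each module is scored with a one-sided hypergeometric test followed by Benjamini-Hochberg correction across modules. That is a common but assumed choice, since the slides do not state which statistic PINME uses; all function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def module_enrichment(modules: dict[str, set[str]], igl, background):
    """Score each module for overrepresentation of IGL genes.

    Returns (module, overlap, p_value, q_value) tuples sorted by p-value.
    """
    bg = set(background)
    hits = set(igl) & bg  # genes outside the network cannot be tested
    names, overlaps, pvals = [], [], []
    for name, members in modules.items():
        members = set(members) & bg
        k = len(members & hits)
        names.append(name)
        overlaps.append(k)
        pvals.append(hypergeom_sf(k, len(bg), len(members), len(hits)))
    qvals = benjamini_hochberg(pvals)
    return sorted(zip(names, overlaps, pvals, qvals), key=lambda row: row[2])


# Toy example: a 20-gene background with two 5-gene modules.
background = {f"G{i}" for i in range(20)}
modules = {"A": {f"G{i}" for i in range(5)}, "B": {f"G{i}" for i in range(10, 15)}}
for row in module_enrichment(modules, ["G0", "G1", "G2", "G3"], background):
    print(row)
```

Restricting both the IGL and the modules to the network background matters: genes with no known interactions cannot land in any module, and counting them would bias the test.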

18 Modules in Networks


