MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.


1 MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group http://www.genetik.uni-koeln.de/groups/Hofmann Week 5: Proteomics 3

2 Pathway-centric analysis: map genes to pathway components, consider one single pathway at a time, and visualize the experimental data in the pathway diagram. Gene set centric analysis: consider a group of genes with an interesting experimental finding, find all of their pathway associations, and apply a statistical test for pathways that are over-represented in the group.

3 Classical network/pathway representation: implies an upstream/downstream ordering. [Figure: Fas apoptosis pathway with Fas-L, Fas, FADD, Casp8, APAF1, Casp9, cIAP, Casp3, Diablo, FLIP] Advantages: rich information; familiar to biologists; easy to interpret. Disadvantages: not always known; difficult in a multi-experiment context; statistical evaluation is problematic; pathways are often not regulated as a whole. Mainly used for pathway-centric analysis.
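The upstream/downstream ordering implied by a pathway diagram can be made explicit by treating the pathway as a directed graph and sorting it topologically. The sketch below uses Kahn's algorithm on a hypothetical, simplified edge list built from the node names in the figure; the edges are an assumption for illustration, and the inhibitory members (FLIP, cIAP, Diablo) are left out because a plain ordering cannot express inhibition.

```python
from collections import deque

# Hypothetical, simplified activation edges (upstream -> downstream), for illustration only.
# Inhibitory members of the pathway (FLIP, cIAP, Diablo) are deliberately omitted.
EDGES = [
    ("Fas-L", "Fas"),
    ("Fas", "FADD"),
    ("FADD", "Casp8"),
    ("Casp8", "Casp3"),
    ("APAF1", "Casp9"),
    ("Casp9", "Casp3"),
]


def topological_order(edges):
    """Return the nodes so that each one appears after all of its upstream nodes (Kahn's algorithm)."""
    downstream = {}
    indegree = {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        downstream.setdefault(dst, [])
        indegree.setdefault(src, 0)
        indegree[dst] = indegree.get(dst, 0) + 1
    # Start from the most upstream nodes (nothing points at them)
    queue = deque(sorted(node for node, deg in indegree.items() if deg == 0))
    order = []
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(indegree):
        raise ValueError("pathway contains a cycle; no upstream/downstream order exists")
    return order


print(topological_order(EDGES))
```

The cycle check matters in practice: real signalling networks contain feedback loops, and in that case no strict upstream/downstream order exists, which is one reason such diagrams are hard to evaluate statistically.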

4 [Figure: pathway diagram overlaid with experimental data; red/green colors indicate up-/down-regulation]
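Coloring a pathway diagram by experimental data amounts to mapping each measured value onto a node color. A minimal sketch, assuming log2 fold changes as input and a hypothetical cutoff of 1.0 (a two-fold change); the function name, cutoff and measurement values are illustrative assumptions, not taken from the slide.

```python
def regulation_color(log2_fold_change: float, threshold: float = 1.0) -> str:
    """Return a node color: red for up-regulation, green for down-regulation, grey otherwise."""
    # threshold = 1.0 corresponds to a two-fold change; the cutoff is an illustrative choice
    if log2_fold_change >= threshold:
        return "red"
    if log2_fold_change <= -threshold:
        return "green"
    return "grey"


# Hypothetical measurements (log2 fold changes), not data from the slide
measurements = {"Fas": 2.3, "Casp8": -1.4, "FADD": 0.2}
node_colors = {gene: regulation_color(value) for gene, value in measurements.items()}
print(node_colors)
```

Tools such as Cytoscape typically use a continuous color gradient instead of fixed bins; the binned version just keeps the idea easy to read.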

5 If statistics is more important than graphics: use 'categorical' data. Examples: Fas pathway, apoptosis inducers, SNARE complex, p53 target, chromosome 12q13.1, plasma membrane protein, NK-cell marker. [Figure: the Fas pathway members Fas-L, Fas, FADD, Casp8, APAF1, Casp9, cIAP, Casp3, Diablo, FLIP] Advantages: suitable for non-network data; more amenable to statistics; many data sources available. Disadvantages: less information; less intuitive; more tedious interpretation. Mainly used for gene set centric analysis.
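With categorical data, each category is simply a set of genes, and "find all pathway associations" for a group of interesting genes becomes a set intersection per category. The sketch below uses hypothetical category memberships (not real annotation data) and a hypothetical helper name; note that one gene can belong to several categories, which a single network diagram does not capture easily.

```python
# Hypothetical category memberships for illustration; these are not real annotation data.
CATEGORIES = {
    "Fas pathway": {"Fas-L", "Fas", "FADD", "Casp8", "Casp3"},
    "Apoptosis inducers": {"Fas-L", "Casp8", "Casp9", "Diablo"},
    "p53 target": {"Fas", "APAF1"},
    "SNARE complex": {"STX1A", "SNAP25", "VAMP2"},
}


def category_hits(gene_group, categories):
    """Map each category to the sorted members of gene_group it contains; categories without hits are dropped."""
    group = set(gene_group)
    hits = {name: sorted(group & members) for name, members in categories.items()}
    return {name: genes for name, genes in hits.items() if genes}


print(category_hits({"Fas", "Casp8", "APAF1"}, CATEGORIES))
```

The hit counts per category are exactly the numbers that feed into an over-representation test.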

6 The group of 100 top-regulated proteins contains 20 cMyc targets. There are 25 000 proteins in total, among them 200 cMyc targets. Is this significant?
                   regulated   non-regulated    total
Targets of cMyc           20             180      200
non-Targets               80          24 720   24 800
total                    100          24 900   25 000
Fisher's exact test (equivalent to the hypergeometric test; approximated by the χ² test), e.g. via http://www.langsrud.com/fisher.htm: p-value = 1.34E-22
Enrichment (odds ratio) = (20*24720)/(80*180) = 34.3-fold
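The p-value above is the probability of drawing 20 or more cMyc targets when 100 proteins are picked at random from 25 000, of which 200 are targets, i.e. the upper tail of a hypergeometric distribution. A minimal sketch using only Python's standard library (`math.comb`) on the table's numbers; the function name is an assumption. It should agree with the slide's p-value of 1.34E-22, and it also shows that the slide's "enrichment" formula is the odds ratio, which is not the same as the observed/expected ratio (20 observed vs 0.8 expected, i.e. 25-fold).

```python
from math import comb


def hypergeom_upper_tail(k: int, n_drawn: int, n_success: int, n_total: int) -> float:
    """Exact P(X >= k) for X ~ Hypergeometric(n_total, n_success, n_drawn).

    This one-sided tail is the over-representation p-value of Fisher's exact test.
    """
    upper = min(n_drawn, n_success)
    tail = sum(
        comb(n_success, i) * comb(n_total - n_success, n_drawn - i)
        for i in range(k, upper + 1)
    )
    # Python integers are exact, so the only rounding happens in this final division
    return tail / comb(n_total, n_drawn)


# 2x2 table from the slide: a = regulated targets, b = non-regulated targets,
# c = regulated non-targets, d = non-regulated non-targets
a, b, c, d = 20, 180, 80, 24720
n_total = a + b + c + d          # 25 000 proteins
n_targets = a + b                # 200 cMyc targets
n_regulated = a + c              # 100 top-regulated proteins

p_value = hypergeom_upper_tail(a, n_regulated, n_targets, n_total)
odds_ratio = (a * d) / (b * c)                                  # the slide's "enrichment"
fold_over_expected = (a / n_regulated) / (n_targets / n_total)  # observed / expected targets

print(f"p = {p_value:.3g}, odds ratio = {odds_ratio:.1f}, fold over expected = {fold_over_expected:.1f}")
```

Computing the tail exactly with integers avoids the underflow and cancellation problems that naive floating-point factorials would hit at these sizes.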

7 Frequently used sources for pathway annotation:
Gene Ontology (GO): comprehensive; ontologies defined by a consortium, gene assignments by EBI. Three different ontologies: "biological process", "molecular function", "cellular component".
Sequence motifs: functional domains and other conserved sequence regions (PROSITE, Pfam, etc.).
UniProt keywords: keywords plus words from the publication titles, the protein name and the description.
Chromosomal localization: derived from EnsEMBL; useful for tumor analysis, etc.
Cell markers: collected from the literature and multiple published expression projects.
KEGG ("Kyoto Encyclopedia of Genes and Genomes"): mainly metabolic pathways.
Complex membership: from publications (largely high-throughput experiments).
TF targets: collected from various databases including MSigDB.
Curated pathways: collected from various databases including NetPath, PathWiki, Reactome.

8 GO is the most widely used resource. "The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner." Ontologies are defined by the consortium (covering all of biology in all organisms); gene assignments are made by 'genome authorities' (human: EBI, mouse: MGD). Three ontologies, with example terms: "biological process" (apoptosis, cell cycle, response to pathogen), "molecular function" (protein kinase, receptor, transcription factor), "cellular component" (nucleus, inner mitochondrial membrane, ribosome). The terms are organized as a 'directed acyclic graph' (DAG). [Figure: DAG excerpt with the terms Cell, Organelle, Membrane, Mitochondrium, Mitochondrial Membrane, Outer Membrane, Inner Membrane, Intermembrane space]
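Because GO is a DAG, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, and a term may have more than one parent (unlike in a tree). A minimal sketch of this propagation, assuming a hypothetical, simplified child-to-parents map built from the figure's terms; real GO terms carry IDs and typed relations (is_a, part_of) that are ignored here, and the gene name is a placeholder.

```python
# Simplified, hypothetical excerpt of the cellular-component DAG (child -> parents).
# Real GO uses term IDs and typed relations (is_a, part_of); both are omitted here.
PARENTS = {
    "Inner Membrane": ["Mitochondrial Membrane"],
    "Outer Membrane": ["Mitochondrial Membrane"],
    "Intermembrane space": ["Mitochondrium"],
    "Mitochondrial Membrane": ["Mitochondrium", "Membrane"],  # two parents: a DAG, not a tree
    "Mitochondrium": ["Organelle"],
    "Organelle": ["Cell"],
    "Membrane": ["Cell"],
    "Cell": [],
}


def ancestors(term, parents):
    """Return all terms reachable upward from `term`, excluding the term itself."""
    seen = set()
    stack = list(parents.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    return {
        gene: set(terms).union(*(ancestors(t, parents) for t in terms))
        for gene, terms in annotations.items()
    }


annotations = {"GENE_A": {"Inner Membrane"}}  # hypothetical gene and annotation
print(propagate(annotations, PARENTS))
```

For enrichment tests this propagation is why broad terms near the root collect many genes, while specific terms near the leaves stay small.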

9 GO is braindead at multiple levels. II. Automatic mass-annotations. Example 1: all keratins (type I, type II, cytokeratins, hair keratins, follicular keratins) have the same set of annotations: 'epidermis development', 'intermediate filament', 'keratin filament', 'structural constituent of epidermis', 'structural molecule'. Annotators often fall for misleading names: the KCTD family is wrongly classified as 'potassium transporters' (with a whole group of associated annotations, e.g. 'plasma membrane associated') just because its members contain a domain called 'potassium channel tetramerization domain'. There are lots of similar examples. Overall: good coverage in broad, 'boring' categories (properties that can be gleaned from protein classes, properties associated with sequence domains/motifs, properties that can be guessed from the protein name), but poor coverage in more specific categories.

10 GO is getting better. [Figure: GO terms Cytokine Activity, Cytokine Receptor Binding, Interleukin-10 Receptor Binding and Prolactin Receptor Binding, with the genes IL-10, Prolactin, GH and SOCS2] This problem from two years ago has disappeared. The number of false negatives is greatly reduced, and the number of inconsistencies between human and mouse annotations is greatly reduced.

11 Useful outside resources for PA:
GSEA (http://www.broadinstitute.org/gsea/index.jsp): gene set enrichment analysis. Similar concept to TreeRanker.
DAVID (http://david.abcc.ncifcrf.gov/): several services, including annotation enrichment.
Cytoscape (http://www.cytoscape.org/): network designer/editor, extensible through modules. Useful for protein interaction networks, coloring pathways by expression, etc.
Genemania (http://genemania.org/): useful for finding connections within gene sets. Also available as a Cytoscape module.

