MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics.

Presentation transcript:

MN-B-C 2 Analysis of High Dimensional (-omics) Data Kay Hofmann – Protein Evolution Group Week 5: Proteomics 3

Pathway-centric analysis
- Map genes to pathway components
- Consider one single pathway at a time
- Visualize experimental data in pathway diagram

Gene set centric analysis
- Consider a group of genes with an interesting experimental finding
- Find all pathway associations
- Statistical test for pathways that are over-represented in the group

Classical network/pathway representation
- Implies upstream/downstream ordering

[Figure: pathway diagram with the nodes Fas-L, Fas, FADD, Casp8, APAF1, Casp9, cIAP, Casp3, Diablo, FLIP]

Advantages:
- Rich information
- Familiar to biologists
- Easy to interpret

Disadvantages:
- Not always known
- Difficult in multi-experiment context
- Statistical evaluation problematic
- Often not regulated as a whole

Mainly used for pathway-centric analysis

Red/green color indicates up-/down-regulation

If statistics is more important than graphics:
- Use of 'categorical' data

Examples of categories:
- Fas pathway
- Apoptosis inducers
- SNARE complex
- p53 target
- Chromosome 12q13.1
- Plasma membrane protein
- NK-cell marker

[Figure: the pathway members Fas-L, Fas, FADD, Casp8, APAF1, Casp9, cIAP, Casp3, Diablo, FLIP shown as an unordered group]

Advantages:
- Suitable for non-network data
- More amenable to statistics
- Many data sources available

Disadvantages:
- Less information
- Less intuitive
- More tedious interpretation

Mainly used for gene set centric analysis
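A minimal sketch of what 'categorical' data can look like in practice: each gene simply carries a set of category labels, with no ordering between genes. The assignments below are illustrative assumptions based on the category names on the slide, not an excerpt from any annotation database, and the helper names `invert` and `count_hits` are hypothetical.

```python
from collections import defaultdict

# Illustrative gene -> categories assignments (assumed for this sketch).
GENE_CATEGORIES = {
    "Fas": {"Fas pathway", "Plasma membrane protein"},
    "FADD": {"Fas pathway"},
    "Casp8": {"Fas pathway"},
    "Casp3": {"Fas pathway"},
}

def invert(gene_categories):
    """Turn gene -> categories into category -> set of member genes."""
    members = defaultdict(set)
    for gene, categories in gene_categories.items():
        for category in categories:
            members[category].add(gene)
    return dict(members)

def count_hits(gene_list, category_members):
    """Count how many genes of gene_list fall into each category."""
    genes = set(gene_list)
    counts = {}
    for category, members in category_members.items():
        hits = len(genes & members)
        if hits:
            counts[category] = hits
    return counts

CATEGORY_MEMBERS = invert(GENE_CATEGORIES)
print(count_hits(["Fas", "Casp3", "UnknownGene"], CATEGORY_MEMBERS))
```

Counts of this kind are the input for a statistical test of over-representation; no pathway topology is needed.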

The group of 100 top-regulated proteins contains 20 cMyc targets. Is this significant?
There are 25,000 proteins in total, among them 200 cMyc targets.

                   regulated   non-regulated    total
Targets of cMyc           20             180      200
Non-targets               80          24,720   24,800
Total                    100          24,900   25,000

Fisher's exact test ≈ χ² test = hypergeometric test
p-value = 1.34E-22
Enrichment = (20*24720)/(80*180) = 34.3-fold
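The calculation can be sketched with the Python standard library. This assumes 25,000 proteins in total, the value implied by the four cells used in the enrichment formula (20 + 180 + 80 + 24720), and it uses a one-sided hypergeometric tail. The slide does not say which test variant produced the quoted p-value, so the number printed here need not match it exactly. The function names are hypothetical.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k): probability of at least k annotated proteins when
    n proteins are drawn from N, of which K carry the annotation."""
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1))
    return tail / total

def enrichment(k, n, K, N):
    """Odds ratio of the 2x2 table, as in the formula on the slide."""
    a = k                # regulated targets
    b = K - k            # non-regulated targets
    c = n - k            # regulated non-targets
    d = N - K - n + k    # non-regulated non-targets
    return (a * d) / (b * c)

# 20 of the 100 top-regulated proteins are among the 200 cMyc targets.
p_value = hypergeom_tail(k=20, n=100, K=200, N=25000)
fold = enrichment(k=20, n=100, K=200, N=25000)
print(f"p = {p_value:.3g}, enrichment = {fold:.1f}-fold")
```

Exact integer arithmetic via `math.comb` avoids the rounding problems that factorial-based floating-point formulas run into at this table size.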

Frequently used sources for pathway annotation
- Gene Ontology (GO): Comprehensive; ontologies defined by consortium, gene assignments by EBI. Three different ontologies: "biological process", "molecular function", "cellular component".
- Sequence motifs: Functional domains and other conserved sequence regions. PROSITE, Pfam, etc.
- UniProt keywords: Keywords plus words from the publication titles, from the protein name and description.
- Chromosomal localization: Derived from Ensembl; useful for tumor analysis, etc.
- Cell markers: Collected from the literature and multiple published expression projects.
- KEGG: "Kyoto Encyclopedia of Genes and Genomes", mainly metabolic pathways.
- Complex membership: From publications (largely high-throughput experiments).
- TF targets: Collected from various databases including MSigDB.
- Curated pathways: Collected from various databases including NetPath, PathWiki, Reactome.

GO is the most widely used resource

"The Gene Ontology (GO) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The GO collaborators are developing three structured, controlled vocabularies (ontologies) that describe gene products in terms of their associated biological processes, cellular components and molecular functions in a species-independent manner"

- Ontologies defined by consortium (covering all of biology in all organisms)
- Gene assignments by 'genome authorities' (human: EBI, mouse: MGD)
- Three ontologies, with example terms:
  "biological process": Apoptosis, Cell cycle, Response to pathogen
  "molecular function": Protein kinase, Receptor, Transcription factor
  "cellular component": Nucleus, Inner mito. membrane, Ribosome
- Organized as 'directed acyclic graph' (DAG)

[Figure: example DAG with the terms Cell, Organelle, Membrane, Mitochondrion, Mitochondrial membrane, Outer membrane, Inner membrane, Intermembrane space]
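The practical consequence of the DAG organization is that a term may have more than one parent, so a protein annotated with a specific term implicitly belongs to every ancestor term as well. The sketch below illustrates this with the cellular component terms named on the slide. The edges are assumptions made for this example (the drawing itself is not part of the transcript), and `PARENTS` and `ancestors` are hypothetical names.

```python
# Assumed child -> parents edges; illustrative only, not taken from GO itself.
PARENTS = {
    "cell": [],
    "organelle": ["cell"],
    "membrane": ["cell"],
    "mitochondrion": ["organelle"],
    "mitochondrial membrane": ["mitochondrion", "membrane"],  # two parents
    "outer membrane": ["mitochondrial membrane"],
    "inner membrane": ["mitochondrial membrane"],
    "intermembrane space": ["mitochondrion"],
}

def ancestors(term, parents=PARENTS):
    """Collect every term reachable by following parent edges upwards."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

# A protein annotated as 'inner membrane' also counts for all of these terms.
print(sorted(ancestors("inner membrane")))
```

Enrichment tools typically propagate annotations upwards in this way before counting, which is one reason why broad terms collect many more genes than specific ones.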

GO is braindead at multiple levels
II. Automatic mass-annotations

Example 1: All keratins (type I, II, cytokeratins, hair keratins, follicular keratins) have the same set of annotations: 'epidermis development', 'intermediate filament', 'keratin filament', 'structural constituent of epidermis', 'structural molecule'.

Annotators often fall for misleading names: the KCTD family is wrongly classified as 'potassium transporters' (with a whole group of associated annotations, e.g. 'plasma membrane associated') just because its members contain a domain called 'potassium channel tetramerization domain'. There are lots of similar examples.

Good coverage in broad 'boring' categories:
- properties that can be gleaned from protein classes
- properties that are associated with sequence domains/motifs
- properties that can be guessed from the protein name

Poor coverage in more specific categories

GO is getting better

[Figure: GO terms Cytokine Receptor Binding, Interleukin-10 Receptor Binding, Prolactin Receptor Binding and Cytokine Activity, with the proteins IL-10, Prolactin, GH and SOCS2]

This problem from two years ago has disappeared.
- Number of false negatives greatly reduced
- Number of inconsistencies between human and mouse greatly reduced

Useful outside resources for PA
- GSEA: Gene set enrichment analysis. Similar concept to TreeRanker.
- DAVID: Several services, including annotation enrichment.
- Cytoscape: Network designer/editor, extensible through modules. Useful for protein interaction networks, coloring pathways by expression, etc.
- GeneMANIA: Useful for finding connections within gene sets. Also available as a Cytoscape module.