Gene expression analysis

Presentation transcript:

Tutorial 7: Gene expression analysis

Gene expression analysis
- Expression data: GEO, UCSC, ArrayExpress
- General clustering methods (unsupervised clustering): hierarchical clustering, K-means clustering
- Tools for clustering: EPCLUST, MeV
- Functional analysis: GO annotation

Gene expression data sources: microarrays and RNA-seq experiments

Expression Data Matrix
Gene 1: -1.2 -2.1 -3 -1.5 1.8 2.9
Gene 2: 2.7 0.2 -1.1 1.6 -2.2 -1.7
Gene 3: -2.5 1.5 -0.1 -1 0.1
Gene 4: 2.6 2.5 -2.3
Gene 5: 2.2
Gene 6: -2.9 -1.9 -2.4
Each column represents all the gene expression levels from a single experiment. Each row represents the expression of a gene across all experiments.

Expression Data Matrix: each element is a log ratio, log2(T/R).
- T: the gene expression level in the test sample
- R: the gene expression level in the reference sample
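As a minimal sketch of the log-ratio definition above, the snippet below computes log2(T/R) for a few intensity pairs. The function name `log_ratio` and the intensity values are illustrative assumptions, not taken from the tutorial.

```python
import math

def log_ratio(test, reference):
    """Return log2(T/R) for one gene in one experiment."""
    if test <= 0 or reference <= 0:
        raise ValueError("expression levels must be positive")
    return math.log2(test / reference)

# Hypothetical (T, R) intensities, for illustration only.
print(log_ratio(400.0, 100.0))  # T is four times R  -> 2.0
print(log_ratio(100.0, 100.0))  # T equals R         -> 0.0
print(log_ratio(50.0, 200.0))   # T is a quarter of R -> -2.0
```

Working on the log scale makes up- and down-regulation symmetric: a fourfold increase and a fourfold decrease are +2 and -2 respectively.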

Expression Data Matrix: color coding
- Red indicates a positive log ratio, i.e. T>R
- Black indicates a log ratio of zero, i.e. T≈R
- Green indicates a negative log ratio, i.e. T<R
- Grey indicates missing data
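The color convention can be written as a small lookup. This is only a sketch: real heat maps use a continuous gradient, and the `tolerance` used here to decide when a ratio counts as "zero" is an assumption made for the example.

```python
def heatmap_color(value, tolerance=0.1):
    """Map a log ratio to the color convention described above.

    `tolerance` (an assumed cut-off) decides when T is "about equal" to R.
    """
    if value is None:
        return "grey"    # missing data
    if abs(value) <= tolerance:
        return "black"   # T is approximately equal to R
    return "red" if value > 0 else "green"

row = [-1.2, 0.05, None, 2.9]
print([heatmap_color(v) for v in row])  # ['green', 'black', 'grey', 'red']
```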

Microarray data: different representations
[Figure: the same expression data shown as log ratios across experiments (Exp), with the T>R and T<R regions marked]

How to search for expression profiles
- GEO (Gene Expression Omnibus): http://www.ncbi.nlm.nih.gov/geo/
- Human genome browser: http://genome.ucsc.edu/
- ArrayExpress: http://www.ebi.ac.uk/arrayexpress/

Searching for expression profiles in GEO
- Datasets: further curated, statistically comparable datasets, suitable for analysis with GEO tools
- Expression profiles by gene
- Probe sets
- Microarray experiments
- Groups of related microarray experiments

Clustering, download dataset, statistical analysis

Clustering analysis

Clustering, download dataset, statistical analysis

The expression distribution for different lines in the cluster

Searching for expression profiles in the Human Genome browser.

Keratin 10 is highly expressed in skin

ArrayExpress http://www.ebi.ac.uk/arrayexpress/

How to analyze gene expression data

Unsupervised Clustering - Hierarchical Clustering

Hierarchical clustering: genes with similar expression patterns are grouped together and connected by a series of branches (a dendrogram). Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.

How do we determine the similarity between two genes (for clustering)?
See: Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23, 1499-1501 (2005), http://www.nature.com/nbt/journal/v23/n12/full/nbt1205-1499.html
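Two measures that are commonly used for this purpose are Euclidean distance and a correlation-based distance. The sketch below applies both to the Gene 1 and Gene 2 rows of the expression matrix shown earlier; the function names are illustrative, and the slide itself does not say which measure the tutorial uses.

```python
import math

def euclidean_distance(x, y):
    """Straight-line distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_distance(x, y):
    """1 - Pearson correlation: 0 for identical shapes, 2 for opposite shapes.

    Assumes neither profile is constant (otherwise the variance is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

gene1 = [-1.2, -2.1, -3.0, -1.5, 1.8, 2.9]
gene2 = [2.7, 0.2, -1.1, 1.6, -2.2, -1.7]
print(euclidean_distance(gene1, gene2))
print(pearson_distance(gene1, gene2))
```

Euclidean distance is sensitive to the magnitude of expression changes, while the correlation-based distance compares only the shape of the two profiles.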

Hierarchical clustering finds an entire hierarchy of clusters. If we want a certain number of clusters, we need to cut the tree at the level that yields that number (in this case, four).
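A minimal sketch of this idea, assuming Euclidean distance and average linkage (neither is specified on the slide): clusters are merged bottom-up, and stopping when four clusters remain gives the same groups as cutting the finished tree at the four-branch level. The profiles are made-up values, and this simple implementation is only practical for small inputs.

```python
import math

def average_linkage(c1, c2, data):
    """Mean pairwise Euclidean distance between members of two clusters."""
    total = sum(math.dist(data[i], data[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def hierarchical_clusters(data, n_clusters):
    """Agglomerative clustering, stopped when n_clusters groups remain."""
    clusters = [[i] for i in range(len(data))]  # start with one gene per cluster
    while len(clusters) > n_clusters:
        best = None
        # Find the closest pair of clusters.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], data)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]
    return clusters

# Hypothetical expression profiles (two experiments per gene).
profiles = [
    [0.0, 0.1], [0.1, 0.0],
    [5.0, 5.1], [5.1, 5.0],
    [-4.0, 4.0], [-4.1, 4.1],
    [9.0, -9.0], [9.1, -9.1],
]
print(hierarchical_clusters(profiles, 4))
```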

Hierarchical clustering result Five clusters

Unsupervised Clustering - K-means clustering: an algorithm that partitions the data into K groups (here K=4).

How does it work?
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping into K clusters that minimizes the distances to the cluster centers; because it depends on the initial means, it is a local optimum rather than a guaranteed global one.
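The four steps can be sketched in plain Python as follows. This is an illustrative implementation under simple assumptions (Euclidean distance, a fixed random seed, made-up data points), not the code used by any particular clustering tool.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Return (assignment, means) after running the four steps above."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # step 1: pick k initial means from the data
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, means[c])) for p in points
        ]
        if new_assignment == assignment:  # step 4: stop when nothing changes
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, means

# Hypothetical two-experiment profiles.
points = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [-4.0, 4.0], [-4.2, 4.1]]
labels, centers = kmeans(points, k=3)
print(labels)
print(centers)
```

Because the initial means are random, different seeds can give different groupings; running the algorithm several times and comparing the results is a common precaution.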

How should we determine K?
- Trial and error
- Take K as the square root of the number of genes
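The square-root rule is easy to compute. The helper name `suggest_k` is hypothetical, and the value it returns is only a starting point for trial and error.

```python
import math

def suggest_k(n_genes):
    """Rule-of-thumb starting value: K = square root of the number of genes."""
    return max(1, round(math.sqrt(n_genes)))

print(suggest_k(100))  # 10
```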

Tools for clustering - EPclust http://www.bioinf.ebc.ee/EP/EP/EPCLUST/

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual). Available options: hierarchical clustering; edit the input matrix (transpose, normalize, randomize); K-means clustering.
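The expression matrix shown earlier has genes as rows, which is the opposite of the orientation stated here, so a transpose may be needed before upload. EPCLUST offers this as a menu option; the sketch below only illustrates the operation, using the first three values of Gene 1 and Gene 2.

```python
def transpose(matrix):
    """Swap rows and columns of a rectangular matrix."""
    return [list(col) for col in zip(*matrix)]

genes_by_experiments = [
    [-1.2, -2.1, -3.0],  # gene 1
    [2.7, 0.2, -1.1],    # gene 2
]
experiments_by_genes = transpose(genes_by_experiments)
print(experiments_by_genes)  # [[-1.2, 2.7], [-2.1, 0.2], [-3.0, -1.1]]
```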

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual). Hierarchical clustering

Data Clusters

In the input matrix, each column should represent a gene and each row should represent an experiment (or individual). K-means clustering

Samples found in cluster; graphical representation of the cluster

10 clusters, as requested

Tools for clustering - MeV (Multi experiment viewer) http://www.tm4.org/mev/

Gene expression function analysis: what can we learn from clusters?
Example cluster (probe set IDs): 1007_s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at

Gene Ontology (GO) http://www.geneontology.org/ The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

Gene Ontology (GO)
- Cellular Component (CC): the parts of a cell or its extracellular environment.
- Molecular Function (MF): the elemental activities of a gene product at the molecular level, such as binding or catalysis.
- Biological Process (BP): operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

The GO tree

GO sources (evidence codes)
- ISS: Inferred from Sequence/Structural Similarity
- IDA: Inferred from Direct Assay
- IPI: Inferred from Physical Interaction
- TAS: Traceable Author Statement
- NAS: Non-traceable Author Statement
- IMP: Inferred from Mutant Phenotype
- IGI: Inferred from Genetic Interaction
- IEP: Inferred from Expression Pattern
- IC: Inferred by Curator
- ND: No Data available
- IEA: Inferred from Electronic Annotation
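When working with annotation files it is often useful to filter by evidence code, for example to separate curator-reviewed annotations from purely electronic ones (IEA). The sketch below shows one way to do this; the gene and term names are hypothetical placeholders, not real annotations.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

def curated_only(annotations):
    """Keep annotations whose evidence code is known and is not IEA."""
    return [a for a in annotations if a[2] in EVIDENCE_CODES and a[2] != "IEA"]

# Hypothetical (gene, GO term, evidence code) records.
annotations = [
    ("geneA", "term_1", "IDA"),
    ("geneA", "term_2", "IEA"),
    ("geneB", "term_1", "TAS"),
]
print(curated_only(annotations))
```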

Search by AmiGO

Results for alpha-synuclein

DAVID http://david.abcc.ncifcrf.gov/
Functional Annotation Bioinformatics Microarray Analysis
- Identify enriched biological themes, particularly GO terms
- Discover enriched functional-related gene/protein groups
- Cluster redundant annotation terms
- Explore gene names in batch
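To illustrate what "enriched" means, the sketch below computes a plain hypergeometric p-value: the probability of seeing at least k genes with a given GO term in a cluster of n genes, when K of the N background genes carry that term. This is a generic textbook test offered as an assumption-laden illustration; it is not DAVID's own scoring method, and all counts are made up.

```python
from math import comb

def enrichment_p_value(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 10 of 50 cluster genes carry the term,
# versus 200 of 20000 genes in the background.
print(enrichment_p_value(k=10, n=50, K=200, N=20000))
```

A small p-value suggests the term appears in the cluster more often than expected by chance; when many terms are tested, a multiple-testing correction is normally applied as well.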

Annotation, classification, ID conversion

Functional annotation Upload Annotation options

Gene expression analysis
- Expression data: GEO, UCSC, ArrayExpress
- General clustering methods (unsupervised clustering): hierarchical clustering, K-means clustering
- Tools for clustering: EPCLUST, MeV
- Functional analysis: GO annotation