Tutorial 7: Gene expression analysis. Expression data (GEO, UCSC, ArrayExpress). General clustering methods: unsupervised clustering, hierarchical clustering.

Presentation transcript:

Tutorial 7: Gene expression analysis

Expression data
–GEO
–UCSC
–ArrayExpress
General clustering methods
–Unsupervised clustering
  Hierarchical clustering
  K-means clustering
Tools for clustering
–EPCLUST
–MeV
Functional analysis
–GO annotation

Gene expression data sources: microarrays and RNA-seq experiments

Expression Data Matrix. Each column represents all the gene expression levels from a single experiment. Each row represents the expression of a gene across all experiments. [Table: rows are genes, columns are experiments Exp1 to Exp6]

Expression Data Matrix. Each element is a log ratio: log2(T/R), where T is the gene expression level in the testing sample and R is the gene expression level in the reference sample.
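The log ratio defined above can be sketched in a few lines of Python. The intensities below are made-up illustration values, and treating non-positive intensities as missing data is an assumption, not something the slides state.

```python
import math

def log_ratio(test_level, reference_level):
    """Return log2(T/R) for one gene in one experiment."""
    if test_level <= 0 or reference_level <= 0:
        return None  # assumption: non-positive intensities count as missing data
    return math.log2(test_level / reference_level)

# Hypothetical (T, R) intensity pairs for one gene across three experiments.
measurements = [(800.0, 200.0), (150.0, 150.0), (50.0, 400.0)]
ratios = [log_ratio(t, r) for t, r in measurements]
print(ratios)  # [2.0, 0.0, -3.0]
```

A log ratio is symmetric around zero: a four-fold increase gives +2 and a four-fold decrease gives -2, which is why it is preferred over the raw ratio T/R.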

Expression Data Matrix (color coding)
–Black indicates a log ratio of zero, i.e. T ≈ R
–Green indicates a negative log ratio, i.e. T<R
–Red indicates a positive log ratio, i.e. T>R
–Grey indicates missing data
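As a rough sketch, the color coding can be written as a small function. The `tolerance` around zero is a hypothetical parameter introduced here for illustration; real heat maps use a continuous color gradient rather than four fixed colors.

```python
def heatmap_color(log_ratio, tolerance=0.1):
    """Map a log ratio to the color category used in the expression matrix."""
    if log_ratio is None:
        return "grey"   # missing data
    if abs(log_ratio) <= tolerance:
        return "black"  # T is approximately equal to R
    return "red" if log_ratio > 0 else "green"

row = [1.8, -0.05, None, -2.3]
print([heatmap_color(v) for v in row])  # ['red', 'black', 'grey', 'green']
```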

Microarray Data: Different representations. [Figure: log ratio plotted per experiment, with the T<R and T>R regions marked]

How to search for expression profiles
–GEO (Gene Expression Omnibus)
–Human genome browser
–ArrayExpress


Searching for expression profiles in GEO
–DataSets (GDS): suitable for analysis with GEO tools
–Profiles: expression profiles by gene
–Samples (GSM): microarray experiments
–Platforms (GPL): probe sets
–Series (GSE): groups of related microarray experiments

GEO dataset page options: download dataset, clustering, statistical analysis

Clustering analysis


The expression distribution for different lines in the cluster

Searching for expression profiles in the Human Genome browser.

Keratin 10 is highly expressed in skin

ArrayExpress


How to analyze gene expression data

Unsupervised Clustering - Hierarchical Clustering

Hierarchical Clustering. Genes with similar expression patterns are grouped together and are connected by a series of branches (a dendrogram). Leaves (shapes in our case) represent genes, and the length of the path between two leaves represents the distance between those genes.
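A minimal sketch of how such a dendrogram can be built bottom-up. The slides do not say which distance or linkage rule is used, so Euclidean distance and average linkage are assumptions here, and the gene names and profiles are invented for illustration.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(cluster_a, cluster_b, profiles):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def build_dendrogram(profiles):
    """Return the merges as (members_a, members_b, height), closest pair first."""
    clusters = [(name,) for name in profiles]  # every gene starts as its own leaf
    merges = []
    while len(clusters) > 1:
        # Join the two clusters that are currently closest to each other.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_linkage(pair[0], pair[1], profiles))
        merges.append((a, b, average_linkage(a, b, profiles)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical log-ratio profiles (three experiments per gene).
profiles = {
    "geneA": [2.0, 1.9, -1.0],
    "geneB": [2.1, 2.0, -1.1],
    "geneC": [-1.5, -1.4, 0.9],
    "geneD": [-1.6, -1.5, 1.0],
}
merges = build_dendrogram(profiles)
for a, b, height in merges:
    print(a, "+", b, "joined at height", round(height, 3))
```

Each merge corresponds to one branch point of the dendrogram, and its height is the distance at which the two groups are joined.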

How to determine the similarity between two genes? (for clustering) Source: Patrik D'haeseleer, "How does gene expression clustering work?", Nature Biotechnology 23 (2005).
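Two measures commonly used for this are Euclidean distance and Pearson correlation; the slide itself does not name a measure, so the choice below is an assumption. The sketch shows how the two can disagree: profiles with the same shape but different amplitude are far apart in Euclidean terms yet perfectly correlated.

```python
import math

def euclidean_distance(x, y):
    """Sensitive to absolute expression levels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Sensitive to the shape of the profile; undefined for constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical expression profiles over four experiments.
gene1 = [1.0, 2.0, 3.0, 4.0]
gene2 = [2.0, 4.0, 6.0, 8.0]  # same shape as gene1, twice the amplitude
gene3 = [4.0, 3.0, 2.0, 1.0]  # opposite trend

print(euclidean_distance(gene1, gene2))    # large, although the shapes match
print(correlation_distance(gene1, gene2))  # close to 0
print(correlation_distance(gene1, gene3))  # close to 2
```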

If we want a certain number of clusters, we need to cut the tree at the level that yields that number of branches (in this case, four). Hierarchical clustering finds an entire hierarchy of clusters.
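One way to sketch the cut: merge bottom-up and stop as soon as k clusters remain. With average linkage (an assumption here, as is Euclidean distance) this gives the same partition as cutting the finished tree where it has k branches. The profiles are invented for illustration.

```python
import math
from itertools import combinations

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_distance(cluster_a, cluster_b, profiles):
    """Average linkage: mean pairwise distance between two clusters."""
    total = sum(euclidean(profiles[i], profiles[j])
                for i in cluster_a for j in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

def cut_into_k(profiles, k):
    """Merge the closest clusters until only k remain."""
    clusters = [(name,) for name in profiles]
    while len(clusters) > k:
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: cluster_distance(pair[0], pair[1], profiles))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return sorted(sorted(c) for c in clusters)

# Hypothetical log-ratio profiles; geneE is an outlier.
profiles = {
    "geneA": [2.0, 1.9, -1.0],
    "geneB": [2.1, 2.0, -1.1],
    "geneC": [-1.5, -1.4, 0.9],
    "geneD": [-1.6, -1.5, 1.0],
    "geneE": [5.0, -5.0, 5.0],
}
three = cut_into_k(profiles, 3)
two = cut_into_k(profiles, 2)
print(three)  # [['geneA', 'geneB'], ['geneC', 'geneD'], ['geneE']]
print(two)    # [['geneA', 'geneB', 'geneC', 'geneD'], ['geneE']]
```

The same tree therefore answers the question for every k, which is what "an entire hierarchy of clusters" means in practice.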

Hierarchical clustering result: five clusters

Unsupervised Clustering - K-means clustering. An algorithm to classify the data into K groups (in this example, K=4).

How does it work? The algorithm iteratively divides the genes into K groups and calculates the center of each group. The result is a grouping into K clusters that minimizes the distances to the cluster centers; it is a local optimum that depends on the initial means.
1. k initial "means" (in this case k=3) are randomly selected from the data set (shown in color).
2. k clusters are created by associating every observation with the nearest mean.
3. The centroid of each of the k clusters becomes the new mean.
4. Steps 2 and 3 are repeated until convergence has been reached.
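The steps of the algorithm can be sketched in plain Python as follows. This is an illustrative implementation, not the code behind any of the tools in this tutorial; the `initial_means`, `seed` and `max_iter` parameters are additions made here so that the example is reproducible, and the data points are invented.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    return [sum(column) / len(points) for column in zip(*points)]

def k_means(points, k, initial_means=None, seed=0, max_iter=100):
    rng = random.Random(seed)
    # Step 1: select k initial means from the data set (randomly unless given).
    means = list(initial_means) if initial_means else rng.sample(points, k)
    assignment = None
    for _ in range(max_iter):
        # Step 2: associate every observation with the nearest mean.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, means[c]))
            for p in points
        ]
        # Step 4: convergence is reached when no observation changes cluster.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: the centroid of each cluster becomes the new mean.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous mean
                means[c] = centroid(members)
    return assignment, means

# Hypothetical two-experiment profiles forming three well-separated groups.
points = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8], [9.0, 1.0], [9.2, 0.8]]
assignment, means = k_means(points, 3,
                            initial_means=[points[0], points[2], points[4]])
print(assignment)  # [0, 0, 1, 1, 2, 2]
```

Because the outcome depends on the starting means, it is common to run the algorithm several times with different random starts and keep the tightest result.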

How should we determine K?
–Trial and error
–Take K as the square root of the number of genes
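The square-root rule is a one-line calculation. The function name below is hypothetical, and the result is only a starting point for the trial-and-error approach, not a guarantee of a good K.

```python
import math

def rule_of_thumb_k(n_genes):
    """Starting guess for K: the square root of the number of genes, at least 1."""
    return max(1, round(math.sqrt(n_genes)))

print(rule_of_thumb_k(400))    # 20
print(rule_of_thumb_k(10000))  # 100
```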

Tools for clustering - EPCLUST


Edit the input matrix: Transpose, Normalize, Randomize. Then choose Hierarchical clustering or K-means clustering. In the input matrix each column should represent a gene and each row should represent an experiment (or individual).
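To make the three matrix edits concrete, here is a sketch of what such operations might do. It does not reproduce EPCLUST's own implementation: scaling each row to mean 0 and standard deviation 1, and shuffling values within each row, are assumptions about what "Normalize" and "Randomize" mean.

```python
import random
import statistics

def transpose(matrix):
    """Swap rows and columns, e.g. genes-by-experiments to experiments-by-genes."""
    return [list(column) for column in zip(*matrix)]

def normalize_rows(matrix):
    """Assumed normalization: scale each row to mean 0 and standard deviation 1."""
    result = []
    for row in matrix:
        mean = statistics.mean(row)
        sd = statistics.pstdev(row)
        result.append([(v - mean) / sd if sd else 0.0 for v in row])
    return result

def randomize_rows(matrix, seed=0):
    """Assumed randomization: shuffle the values within each row."""
    rng = random.Random(seed)
    return [rng.sample(row, len(row)) for row in matrix]

matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]
print(transpose(matrix))  # [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]
normalized = normalize_rows(matrix)
shuffled = randomize_rows(matrix)
```

Clustering a randomized copy of the matrix is a simple sanity check: clusters that look just as convincing on shuffled data are unlikely to be meaningful.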

Hierarchical clustering. In the input matrix each column should represent a gene and each row should represent an experiment (or individual).

[Figure labels: Clusters, Data]

K-means clustering. In the input matrix each column should represent a gene and each row should represent an experiment (or individual).

Graphical representation of the cluster; samples found in cluster

10 clusters, as requested

Tools for clustering - MeV

What can we learn from clusters? Gene expression function analysis. Probe IDs: _s_at, 1053_at, 117_at, 121_at, 1255_g_at, 1294_at, 1316_at, 1320_at, 1405_i_at, 1431_at, 1438_at, 1487_at, 1494_f_at, 1598_g_at

Gene Ontology (GO). The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains:

–Cellular Component (CC): the parts of a cell or its extracellular environment.
–Molecular Function (MF): the elemental activities of a gene product at the molecular level, such as binding or catalysis.
–Biological Process (BP): operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

The GO tree

GO sources (evidence codes)
ISS: Inferred from Sequence/Structural Similarity
IDA: Inferred from Direct Assay
IPI: Inferred from Physical Interaction
TAS: Traceable Author Statement
NAS: Non-traceable Author Statement
IMP: Inferred from Mutant Phenotype
IGI: Inferred from Genetic Interaction
IEP: Inferred from Expression Pattern
IC: Inferred by Curator
ND: No Data available
IEA: Inferred from Electronic Annotation
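Evidence codes are often used to filter annotations, for example to keep only those backed by an experiment. The sketch below stores the codes listed above in a dictionary; the annotation tuples use hypothetical gene names and are not real annotation records, and the helper function is invented for illustration.

```python
EVIDENCE_CODES = {
    "ISS": "Inferred from Sequence/Structural Similarity",
    "IDA": "Inferred from Direct Assay",
    "IPI": "Inferred from Physical Interaction",
    "TAS": "Traceable Author Statement",
    "NAS": "Non-traceable Author Statement",
    "IMP": "Inferred from Mutant Phenotype",
    "IGI": "Inferred from Genetic Interaction",
    "IEP": "Inferred from Expression Pattern",
    "IC": "Inferred by Curator",
    "ND": "No Data available",
    "IEA": "Inferred from Electronic Annotation",
}

# Codes from the list above that rest on experimental evidence.
EXPERIMENTAL_CODES = {"IDA", "IPI", "IMP", "IGI", "IEP"}

def filter_by_evidence(annotations, allowed_codes):
    """Keep (gene, go_term, code) tuples whose code is in allowed_codes."""
    unknown = {code for _, _, code in annotations} - set(EVIDENCE_CODES)
    if unknown:
        raise ValueError(f"unknown evidence codes: {sorted(unknown)}")
    return [a for a in annotations if a[2] in allowed_codes]

# Hypothetical annotations for illustration only.
annotations = [
    ("geneX", "GO:0005634", "IDA"),
    ("geneX", "GO:0005515", "IEA"),
    ("geneY", "GO:0006915", "IMP"),
]
experimental = filter_by_evidence(annotations, EXPERIMENTAL_CODES)
print(experimental)
```

IEA annotations are assigned computationally without curator review, so analyses that need high-confidence annotations often exclude them.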

Search by AmiGO

Results for alpha-synuclein

DAVID: Functional Annotation Bioinformatics Microarray Analysis
–Identify enriched biological themes, particularly GO terms
–Discover enriched functional-related gene/protein groups
–Cluster redundant annotation terms
–Explore gene names in batch
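"Enriched" means that a GO term appears in a gene list more often than expected by chance. A minimal way to quantify this is a one-sided hypergeometric test, sketched below. This is a plain textbook test and not DAVID's own scoring (DAVID reports a modified Fisher exact p-value, the EASE score); the function name and the example numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(population, annotated, sample, hits):
    """P(at least `hits` annotated genes) when `sample` genes are drawn at random
    from `population` genes, of which `annotated` carry the GO term."""
    upper = min(annotated, sample)
    favorable = sum(
        comb(annotated, x) * comb(population - annotated, sample - x)
        for x in range(hits, upper + 1)
    )
    return favorable / comb(population, sample)

# Hypothetical numbers: 20000 genes in the genome, 200 carry the term,
# and a cluster of 100 genes contains 10 of them.
p = enrichment_p_value(population=20000, annotated=200, sample=100, hits=10)
print(p)
```

By chance only about one annotated gene would be expected in such a cluster (100 × 200 / 20000 = 1), so observing ten gives a very small p-value. When many GO terms are tested at once, the p-values also need a multiple-testing correction.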

DAVID tools: ID conversion, annotation, classification

Functional annotation: upload, annotation options


Gene expression analysis
Expression data
–GEO
–UCSC
–ArrayExpress
General clustering methods
–Unsupervised clustering
  Hierarchical clustering
  K-means clustering
Tools for clustering
–EPCLUST
–MeV
Functional analysis
–GO annotation