2 What is Gene Expression?
The process of transcribing and translating a gene to yield a protein product.
Why are we interested in gene expression?
  It tells us which genes are involved in which functions.
  Although all the cells in the human body have the same genome, only a fraction of the genes are expressed for any given function.
3 Gene Expression
Cells are different because of differential gene expression (proteome).
About 40% of human genes are expressed at one time.
A gene is expressed by transcribing DNA into single-stranded mRNA (transcriptome).
The mRNA is later translated into a protein.
5 Gene Expression
Genes control cell behavior by controlling which proteins are made by a cell.
Housekeeping genes vs. cell/tissue-specific genes
Regulation:
  Transcriptional (promoters and enhancers)
  Post-transcriptional (RNA splicing, stability, localization; small non-coding RNAs)
8 Traditional Methods
Northern Blotting
  Single RNA isolated
  Probed with labeled cDNA
Western Blotting
  Multiple proteins
  Probed with antibodies to a specific protein
RT-PCR
  Primers amplify specific cDNA transcripts
10 How do Microarrays work?
New technology (first paper: 1995)
Allows study of thousands of genes at the same time
Glass slide of DNA molecules
Molecule: a string of bases (25 bp – 500 bp) that uniquely identifies the gene or unit to be studied
11 Gene Expression Microarrays
The main types of gene expression microarrays:
  Short oligonucleotide arrays (Affymetrix)
  cDNA or spotted arrays (Brown/Botstein)
  Long oligonucleotide arrays (Agilent Inkjet)
  Fiber-optic arrays
  ...
12 Fabrication of Microarrays
Size of a microscope slide
13 Differing Conditions
Ultimate goal:
  Understand the expression level of genes under different conditions
Helps to:
  Determine genes involved in a disease
  Determine pathways to a disease
  Serve as a screening tool
14 Gene Conditions
Cell types (brain vs. liver)
Developmental (fetal vs. adult)
Response to stimulus
Gene activity (wild type vs. mutant)
Disease states (healthy vs. diseased)
15 Expressed Genes
Genes expressed under a given condition
mRNA is extracted from cells
mRNA is labeled
Labeled mRNA is the mRNA present in a given condition
Labeled mRNA will hybridize (base pair) with the corresponding sequence on the slide
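The base-pairing step can be pictured with a small sketch. It assumes a deliberately simplified model in which a labeled RNA "hybridizes" only if it contains a stretch perfectly complementary to the DNA probe; real hybridization tolerates some mismatches and depends on temperature and salt conditions. The sequences and function names are hypothetical.

```python
# Watson-Crick pairing from a DNA probe base to the RNA base it binds.
PAIR = {"A": "U", "C": "G", "G": "C", "T": "A"}

def perfect_target(probe_dna: str) -> str:
    """RNA sequence (5'->3') that pairs perfectly with a DNA probe (5'->3')."""
    # Strands pair antiparallel, so read the probe backwards while complementing.
    return "".join(PAIR[base] for base in reversed(probe_dna.upper()))

def hybridizes(probe_dna: str, target_rna: str) -> bool:
    """True if the RNA contains a stretch perfectly complementary to the probe."""
    return perfect_target(probe_dna) in target_rna.upper()

probe = "ATGCCGTT"            # hypothetical probe fixed on the slide
labeled_rna = "GGAACGGCAUCC"  # hypothetical labeled transcript
unrelated_rna = "GGGGGGGGGGGG"

print(hybridizes(probe, labeled_rna))    # the transcript binds this spot
print(hybridizes(probe, unrelated_rna))  # no complementary stretch
```

A spot on the slide lights up only where labeled material finds its complementary probe, which is why the signal at a spot reports on one specific gene.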
16 Two Different Types of Microarrays
Custom spotted arrays (up to 20,000 sequences)
  cDNA
  Oligonucleotide
High-density (up to 100,000 sequences) synthetic oligonucleotide arrays
  Affymetrix (25 bases)
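22 Gene Expression Data
Gene expression data on p genes for n samples: a matrix with genes as rows and mRNA samples (sample1, sample2, sample3, sample4, sample5, …) as columns.
Entry (i, j) is the gene expression level of gene i in mRNA sample j:
  Log(Red intensity / Green intensity) for two-color arrays, or
  Log(Avg. PM - Avg. MM) for oligonucleotide arrays with perfect-match (PM) and mismatch (MM) probes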
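As a minimal sketch of how such a matrix could be filled in, the Python below computes both expression measures. The slide does not state the base of the logarithm; base 2 is assumed here, and all intensities and function names are made up for illustration.

```python
import math

def log_ratio_matrix(red, green):
    """Two-color arrays: log2(Red / Green) for each gene (row) and sample (column)."""
    return [[math.log2(r / g) for r, g in zip(red_row, green_row)]
            for red_row, green_row in zip(red, green)]

def log_pm_minus_mm(pm, mm):
    """Oligonucleotide arrays: log2(Avg. PM - Avg. MM) for one probe set."""
    diff = sum(pm) / len(pm) - sum(mm) / len(mm)
    if diff <= 0:
        # The logarithm is undefined when mismatch signal exceeds perfect match.
        raise ValueError("Avg. PM must exceed Avg. MM")
    return math.log2(diff)

# Hypothetical intensities: p = 2 genes (rows), n = 3 samples (columns).
red = [[800.0, 200.0, 400.0],
       [100.0, 100.0, 1600.0]]
green = [[200.0, 200.0, 100.0],
         [400.0, 100.0, 400.0]]

x = log_ratio_matrix(red, green)
print(x)  # [[2.0, 0.0, 2.0], [-2.0, 0.0, 2.0]]

# Hypothetical probe set with three PM/MM probe pairs.
print(log_pm_minus_mm([500.0, 524.0, 512.0], [250.0, 262.0, 256.0]))  # 8.0
```

With a log ratio, equal up- and down-regulation give values of equal size and opposite sign (here 2.0 and -2.0 for a fourfold change), which is one common reason for working on the log scale.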
23 Some possible applications?
Sample from a specific organ to show which genes are expressed
Compare samples from healthy and sick hosts to find gene-disease connections
Probes representing sets of human pathogens, for disease detection
24 Huge amount of data from single microarray
With just two colors, the amount of data on an array with N probes is 2N
Cannot analyze pixel by pixel
Analyze by pattern: cluster analysis
25 Major Data Mining Techniques
Link Analysis
Associations Discovery
Sequential Pattern Discovery
Similar Time Series Discovery
Predictive Modeling
Classification
Clustering
26 Some clustering methods and software
Partitioning: K-Means, K-Medoids, PAM, CLARA, …
Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK
Density-based: CAST, DBSCAN, OPTICS, CLIQUE, …
Grid-based: STING, CLIQUE, WaveCluster, …
Model-based: SOM (self-organizing map), COBWEB, CLASSIT, AutoClass, …
Two-way clustering
Block clustering
A number of clustering methods have been proposed. I’ll go through some representative types in the following slides.
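To make the partitioning family concrete, here is a minimal K-Means sketch in plain Python. It is an illustration, not the implementation behind any of the packages listed: the starting centroids are passed in explicitly to keep the run deterministic, and the expression profiles are invented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_profile(profiles):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]

def nearest(profile, centroids):
    """Index of the centroid closest to the profile."""
    return min(range(len(centroids)),
               key=lambda k: squared_distance(profile, centroids[k]))

def k_means(profiles, initial_centroids, max_iter=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centroids) for p in profiles]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(profiles, labels) if lab == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = mean_profile(members)
    return labels, centroids

# Hypothetical expression profiles: 5 genes measured in 3 samples.
genes = [[0.1, 0.9, 2.0],
         [0.0, 1.1, 2.1],
         [2.0, 1.0, 0.1],
         [2.2, 0.9, 0.0],
         [0.2, 1.0, 1.9]]

labels, centroids = k_means(genes, initial_centroids=[genes[0], genes[2]])
print(labels)  # [0, 0, 1, 1, 0]
```

Genes 1, 2 and 5 rise across the samples and land in one cluster; genes 3 and 4 fall and land in the other. In practice the result depends on the starting centroids and on the chosen number of clusters, so several starts are usually compared.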
28 A dendrogram (tree) for clustered genes
Let p = number of genes.
1. Calculate within-class correlation.
2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes.
3. Average within clusters of genes.
4. Perform testing on averages of clusters of genes as if they were single genes.
E.g. p=5 (genes 1 to 5 count as clusters 1 to 5):
  Cluster 6=(1,2)
  Cluster 7=(1,2,3)
  Cluster 8=(4,5)
  Cluster 9=(1,2,3,4,5)
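Steps 2 and 3 can be sketched in plain Python as follows. The slide does not say which dissimilarity or linkage is used; this sketch assumes 1 minus the Pearson correlation with average linkage, and the five expression profiles are invented. Step 4 (testing on the cluster averages) is not shown.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def hierarchical_clusters(profiles):
    """Agglomerative clustering; returns {cluster id: gene numbers}, 2p-1 entries."""
    p = len(profiles)
    clusters = {i + 1: (i + 1,) for i in range(p)}  # genes 1..p are clusters 1..p
    active = set(clusters)
    next_id = p + 1

    def linkage(c1, c2):
        # Average of (1 - correlation) over all gene pairs between two clusters.
        pairs = [(g, h) for g in clusters[c1] for h in clusters[c2]]
        return sum(1 - pearson(profiles[g - 1], profiles[h - 1])
                   for g, h in pairs) / len(pairs)

    while len(active) > 1:
        a, b = min(combinations(sorted(active), 2), key=lambda ab: linkage(*ab))
        clusters[next_id] = tuple(sorted(clusters[a] + clusters[b]))
        active -= {a, b}
        active.add(next_id)
        next_id += 1
    return clusters

def cluster_average(profiles, members):
    """Step 3: average the profiles of the genes in one cluster."""
    rows = [profiles[g - 1] for g in members]
    return [sum(column) / len(rows) for column in zip(*rows)]

# Hypothetical profiles for p = 5 genes over 4 samples.
profiles = [[1, 2, 3, 4],
            [1, 2, 3, 5],
            [2, 2, 3, 4],
            [4, 3, 2, 1],
            [5, 3, 2, 1]]

clusters = hierarchical_clusters(profiles)
print(len(clusters))                        # 2p - 1 = 9
print(clusters[9])                          # the root holds all five genes
print(cluster_average(profiles, (4, 5)))    # [4.5, 3.0, 2.0, 1.0]
```

Each merge adds one cluster to the p starting singletons, and p - 1 merges are needed to reach the root, which gives the 2p - 1 total. The numbering of the intermediate clusters depends on merge order, so it need not match the slide's example.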
29 A real case
Nature, Feb 2000. Paper by Alizadeh, A. et al.
Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling
31 Gene Expression is Time-Dependent
Time Course Data
Basically, this mining task can be achieved by using clustering techniques. As shown in this sample clustering result, each curve represents the expression of a gene over the conducted experiments, and all genes are classified into six clusters. As you can see, the genes in the same group have very similar patterns, while the groups differ considerably from one another.
32 Sample of time course of clustered genes
[Figure: expression curves of the clustered genes; x-axis: time]
33 Limitations
Cluster analyses:
  Usually outside the normal framework of statistical inference
  Less appropriate when only a few genes are likely to change
  Need lots of experiments
Single gene tests:
  May be too noisy in general to show much
  May not reveal coordinated effects of positively correlated genes
  Hard to relate to pathways
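For reference, a single gene test can be as simple as a two-sample statistic computed one gene at a time. The slide does not name a particular test; Welch's t statistic is used below purely as an assumed example, with invented log-expression values and gene names.

```python
import math
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t statistic for one gene measured in two groups of samples."""
    standard_error = math.sqrt(variance(group_a) / len(group_a)
                               + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical log-expression values: gene -> (healthy samples, diseased samples).
data = {
    "gene_x": ([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]),
    "gene_y": ([2.0, 2.5, 3.0], [2.1, 2.4, 3.1]),
}

# Each gene is tested on its own, ignoring every other gene.
t_values = {gene: welch_t(healthy, diseased)
            for gene, (healthy, diseased) in data.items()}

# Rank genes by the size of the statistic, largest first.
ranked = sorted(t_values, key=lambda gene: abs(t_values[gene]), reverse=True)
print(ranked)
```

Because every gene is handled independently, a group of genes that each shift slightly in the same direction can go unnoticed, which is the coordinated-effects limitation listed above.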
34 But a few Links
Affymetrix: www.affymetrix.com
Stanford MicroArray Database
Yale Microarray Database
NCBI Gene Expression Omnibus
University of North Carolina Database