Gene Expression Chapter 9
What is Gene Expression?
The process of transcribing and translating a gene to yield a protein product.
Why are we interested in gene expression?
It tells us which genes are involved in which functions. Although all the cells in the human body have the same genome, only a fraction of the genes are expressed for any given function.
Gene Expression
Cells differ because of differential gene expression, which gives each cell type its own set of proteins (the proteome).
About 40% of human genes are expressed at one time.
A gene is expressed by transcribing DNA into single-stranded mRNA; the set of mRNAs in a cell is its transcriptome.
The mRNA is later translated into a protein.
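The two steps named on this slide, transcription and translation, can be illustrated with a minimal sketch. The toy gene and the truncated codon table below are assumptions made for illustration (a real codon table has 64 entries); the sketch also reads the coding strand directly, which skips the template-strand detail.

```python
# Partial codon table: only the codons used in this toy example.
# Assumption: a full table has 64 entries; it is truncated here for brevity.
CODON_TABLE = {
    "AUG": "M",  # methionine (start codon)
    "GCU": "A",  # alanine
    "UGG": "W",  # tryptophan
    "AAA": "K",  # lysine
    "UAA": "*",  # stop codon
}


def transcribe(coding_strand: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate mRNA codon by codon until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


gene = "ATGGCTTGGAAATAA"  # hypothetical toy gene, not a real sequence
mrna = transcribe(gene)
protein = translate(mrna)
print(mrna)     # AUGGCUUGGAAAUAA
print(protein)  # MAWK
```

A microarray measures the intermediate product of this pipeline (the mRNA), not the protein.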
Molecular Biology Overview
[Figure: cell, nucleus, chromosome, gene (DNA), gene (mRNA, single strand), cDNA, protein.]
Gene Expression
Genes control cell behavior by controlling which proteins are made by a cell.
Housekeeping genes vs. cell/tissue-specific genes.
Regulation:
  Transcriptional (promoters and enhancers)
  Post-transcriptional (RNA splicing, stability, localization; small non-coding RNAs)
Gene Expression
Regulation (continued):
  Translational (3' UTR repressors, poly(A) tail)
  Post-translational (protein modification: carbohydrates, lipids, phosphorylation, hydroxylation, methylation, precursor protein)
How do you measure Gene Expression?
Traditional Methods
Northern blotting: a single RNA is isolated and probed with labeled cDNA.
Western blotting: multiple proteins are probed with antibodies to a specific protein.
RT-PCR: primers amplify specific cDNA transcripts.
How do Microarrays work?
New technology (first paper: 1995).
Allows the study of thousands of genes at the same time.
A glass slide of DNA molecules.
Each molecule is a string of bases (25 bp – 500 bp) that uniquely identifies the gene or unit to be studied.
Gene Expression Microarrays
The main types of gene expression microarrays:
  Short oligonucleotide arrays (Affymetrix)
  cDNA or spotted arrays (Brown/Botstein)
  Long oligonucleotide arrays (Agilent Inkjet)
  Fiber-optic arrays
  ...
Fabrications of Microarrays Size of a microscope slide Images: http://www.affymetrix.com/
Differing Conditions
Ultimate goal: understand the expression level of genes under different conditions.
Helps to:
  Determine genes involved in a disease
  Determine pathways to a disease
  Serve as a screening tool
Gene Conditions Cell types (brain vs. liver) Developmental (fetal vs. adult) Response to stimulus Gene activity (wild vs. mutant) Disease states (healthy vs. diseased)
Expressed Genes
Finding the genes expressed under a given condition:
  mRNA is extracted from cells.
  The mRNA is labeled.
  The labeled mRNA represents the mRNA present in the given condition.
  Labeled mRNA will hybridize (base pair) with the corresponding sequence on the slide.
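The hybridization step can be sketched as a base-pairing check: a labeled molecule binds a probe when it contains the probe's reverse complement. This is a simplified illustration that assumes perfect matches and a DNA alphabet on both sides; the probe names and sequences are hypothetical.

```python
# Translation table for Watson-Crick complements in the DNA alphabet.
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the strand that base pairs with seq, read 5' to 3'."""
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def hybridizes(probe: str, target: str) -> bool:
    """True if target contains a stretch perfectly complementary to probe.

    Simplifying assumption: real hybridization tolerates some mismatches
    and depends on temperature and salt; none of that is modeled here.
    """
    return reverse_complement(probe) in target.upper()


# Hypothetical probes spotted on the slide.
probes = {"gene_A": "ATGGCTTGG", "gene_B": "TTTCCCGGG"}

# Hypothetical labeled molecule, complementary to a region containing gene_A.
labeled_target = reverse_complement("CCATGGCTTGGAA")

bound = {name: hybridizes(seq, labeled_target) for name, seq in probes.items()}
print(bound)  # {'gene_A': True, 'gene_B': False}
```

Only the spot for gene_A would light up, which is how the slide reports that gene_A is expressed in this condition.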
Two Different Types of Microarrays
Custom spotted arrays (up to 20,000 sequences): cDNA or oligonucleotide.
High-density (up to 100,000 sequences) synthetic oligonucleotide arrays: Affymetrix (25 bases).
Microarray Technology
Microarray Image Analysis
Microarrays detect gene interactions, shown as 4 colors:
  Green: high in control
  Red: high in sample
  Yellow: equal
  Black: none
The problem is to quantify the image signals.
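One way to make the four-color reading concrete is to classify each spot from its red (sample) and green (control) intensities. The background cutoff and the two-fold threshold below are illustrative assumptions, not values from the slides.

```python
def spot_color(red: float, green: float,
               background: float = 100.0, fold: float = 2.0) -> str:
    """Classify one spot from its red (sample) and green (control) intensity.

    background and fold are hypothetical thresholds chosen for illustration.
    """
    # Both channels near background: nothing hybridized to this spot.
    if red < background and green < background:
        return "black"
    # Add 1 to each channel so a zero reading cannot cause division by zero.
    ratio = (red + 1.0) / (green + 1.0)
    if ratio >= fold:
        return "red"      # higher in sample
    if ratio <= 1.0 / fold:
        return "green"    # higher in control
    return "yellow"       # roughly equal in both


for red, green in [(5000, 400), (300, 4000), (2000, 2100), (20, 30)]:
    print(red, green, spot_color(red, green))
```

In practice the continuous ratio carries more information than the four labels, which is why the analysis works with quantified signals.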
Microarray Animations Davidson University: http://www.bio.davidson.edu/courses/genomics/chip/chip.html Imagecyte: http://www.imagecyte.com/array2.html
Microarray analysis
Operating principle: samples are tagged with fluorescent material to show the pattern of sample-probe interaction (hybridization).
A microarray may have 60K probes.
Microarray Processing sequence
Gene Expression Data
Gene expression data on p genes for n mRNA samples:

Gene   sample1  sample2  sample3  sample4  sample5  …
1         0.46     0.30     0.80     1.51     0.90  ...
2        -0.10     0.49     0.24     0.06     0.46  ...
3         0.15     0.74     0.04     0.10     0.20  ...
4        -0.45    -1.03    -0.79    -0.56    -0.32  ...
5        -0.06     1.06     1.35     1.09    -1.09  ...

Each entry is the gene expression level of gene i in mRNA sample j:
  Log(Red intensity / Green intensity) for two-color arrays, or
  Log(Avg. PM - Avg. MM) for Affymetrix arrays (PM = perfect match probes, MM = mismatch probes).
Some possible applications? Sample from specific organ to show which genes are expressed Compare samples from healthy and sick host to find gene-disease connection Probes are sets of human pathogens for disease detection
Huge amount of data from single microarray If just two color, then amount of data on array with N probes is 2N Cannot analyze pixel by pixel Analyze by pattern – cluster analysis
Major Data Mining Techniques Link Analysis Associations Discovery Sequential Pattern Discovery Similar Time Series Discovery Predictive Modeling Classification Clustering
Some clustering methods and software
Partitioning: K-Means, K-Medoids, PAM, CLARA …
Hierarchical: Cluster, HAC, BIRCH, CURE, ROCK
Density-based: CAST, DBSCAN, OPTICS, CLIQUE …
Grid-based: STING, CLIQUE, WaveCluster …
Model-based: SOM (self-organizing map), COBWEB, CLASSIT, AutoClass …
Two-way clustering
Block clustering
A number of clustering methods have been proposed; some representative types are covered in the following slides.
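As an illustration of the partitioning family, here is a minimal K-Means sketch in plain Python. It is not the implementation of any package listed above; the expression profiles, the fixed iteration count, and the random seed are assumptions for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(profiles, k, iterations=20, seed=0):
    """Assign each profile to one of k clusters; returns a list of labels."""
    rng = random.Random(seed)
    # Start from k profiles picked at random as the initial centroids.
    centroids = rng.sample(profiles, k)
    labels = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: each gene joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in profiles
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical log-ratio profiles: two induced genes, two repressed genes.
profiles = [
    [0.9, 1.1, 1.0],
    [1.0, 1.2, 0.9],
    [-1.0, -0.9, -1.1],
    [-1.1, -1.0, -0.8],
]
print(k_means(profiles, k=2))
```

The label numbers themselves are arbitrary; what matters is that the first two genes share one label and the last two share the other.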
[Figure: clustered data compared with the same data randomized by row, by column, and by both; horizontal axis: time. Eisen et al., Proc. Natl. Acad. Sci. USA 95 (1998).]
A dendrogram (tree) for clustered genes
Let p = number of genes.
1. Calculate within-class correlation.
2. Perform hierarchical clustering, which will produce (2p-1) clusters of genes (the p single genes plus p-1 merged clusters).
3. Average within clusters of genes.
4. Perform testing on averages of clusters of genes as if they were single genes.
E.g. p=5, with leaves 1 2 3 4 5: Cluster 6=(1,2), Cluster 7=(1,2,3), Cluster 8=(4,5), Cluster 9=(1,2,3,4,5).
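Steps 2 and 3 can be sketched in plain Python. The slide does not specify the linkage rule, so average linkage with distance 1 - Pearson correlation is an assumption here, and the five profiles are made-up values chosen so that the merges reproduce the p=5 example.

```python
import math


def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(profiles):
    """Agglomerative clustering; returns {cluster id: tuple of gene numbers}.

    Genes 1..p are the leaves; merged clusters are numbered p+1..2p-1.
    Assumption: average linkage on the distance 1 - correlation.
    """
    p = len(profiles)
    clusters = {i + 1: (i + 1,) for i in range(p)}
    active = set(clusters)
    next_id = p + 1

    def dist(a, b):
        pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
        total = sum(1 - pearson(profiles[i - 1], profiles[j - 1]) for i, j in pairs)
        return total / len(pairs)

    while len(active) > 1:
        # Merge the closest pair of currently active clusters.
        a, b = min(((a, b) for a in active for b in active if a < b),
                   key=lambda ab: dist(*ab))
        clusters[next_id] = tuple(sorted(clusters[a] + clusters[b]))
        active -= {a, b}
        active.add(next_id)
        next_id += 1
    return clusters


def cluster_average(profiles, members):
    """Step 3: average the profiles of the genes in one cluster."""
    rows = [profiles[g - 1] for g in members]
    return [sum(col) / len(rows) for col in zip(*rows)]


# Hypothetical profiles for genes 1..5 over four samples.
profiles = [[1, 2, 3, 4], [1, 2, 3, 4.5], [1, 3, 2, 4], [4, 3, 2, 1], [3, 4, 1, 2]]
clusters = hierarchical_clusters(profiles)
for cid in sorted(clusters):
    print(cid, clusters[cid])
print(cluster_average(profiles, clusters[8]))
```

The averaged profile of each cluster can then be tested like a single gene, as in step 4.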
A real case
Nature, Feb 2000. Paper by Alizadeh, A. et al.: Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling.
Discovering sub-groups
Gene Expression is Time-Dependent
Time Course Data
This mining task can be achieved with clustering techniques. In the sample clustering results, each curve represents the expression of a gene over the conducted experiments, and all genes are classified into six clusters. Genes in the same group have very similar patterns, while the groups differ considerably from one another.
Sample of time course of clustered genes
[Figure: expression curves of clustered genes; horizontal axes: time.]
Limitations
Cluster analyses:
  Usually outside the normal framework of statistical inference
  Less appropriate when only a few genes are likely to change
  Need lots of experiments
Single gene tests:
  May be too noisy in general to show much
  May not reveal coordinated effects of positively correlated genes
  Hard to relate to pathways
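To make "single gene tests" concrete, the sketch below computes a two-sample t statistic for one gene at a time. The slides do not name a specific test, so Welch's t statistic is an assumption, and the expression values are made up.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for one gene measured in two groups of samples."""
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / se


# Hypothetical log ratios: gene -> (healthy samples, diseased samples).
genes = {
    "gene_1": ([0.1, -0.1, 0.0, 0.2], [1.1, 0.9, 1.0, 1.2]),   # clear shift
    "gene_2": ([0.3, -0.2, 0.1, 0.0], [0.1, 0.2, -0.1, 0.0]),  # no shift
}
for name, (healthy, diseased) in genes.items():
    print(name, round(welch_t(healthy, diseased), 2))
```

Each gene is tested in isolation here, which is exactly why such tests can miss coordinated changes across correlated genes, and with thousands of genes a multiple-testing correction would be needed in any real analysis.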
A Few Links
Affymetrix: www.affymetrix.com
Stanford MicroArray Database: http://smd.stanford.edu/resources/restech.shtml
Yale Microarray Database: http://www.med.yale.edu/microarray/
NCBI Gene Expression Omnibus: http://www.ncbi.nlm.nih.gov/geo/
University of North Carolina Database: https://genome.unc.edu/