Gene Expression And Regulation Bioinformatics January 11, 2006 D. A. McClellan

1 Gene Expression And Regulation Bioinformatics January 11, 2006 D. A. McClellan

2 Gene Expression: expressed in the transcriptome. Every eukaryotic genome contains between 5,000 and 60,000 protein-coding genes. Only a small subset of those genes is transcribed.

3 Gene expression is regulated in several basic ways: by region (e.g. brain versus kidney); in development (e.g. fetal versus adult tissue); in dynamic response to environmental signals (e.g. immediate-early response genes); in disease states; by gene activity. Page 157

4 Central Dogma of Molecular Biology [Figure: DNA, RNA (with cDNA derived from RNA), protein, phenotype] Page 159

5 [Figure: DNA, RNA, cDNA, and protein shown for two samples, with UniGene, SAGE, and microarray as the approaches for comparing expression] Fig. 6.2 Page 159

6 Expression Databases & Analyses. UniGene: for the comparison of cDNA libraries. Goals: (1) create one unique entry for each gene; (2) collect all the ESTs associated with each gene. SAGE: Serial Analysis of Gene Expression library. DNA microarrays.

7 [Figure: a gene with exon 1, exon 2, and exon 3 separated by introns; steps shown: transcription, RNA splicing (remove introns), polyadenylation (AAAAA tail at the 3’ end), export to cytoplasm] Fig. 6.3 Page 161
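
The processing steps named on this slide (transcription, splicing, polyadenylation) can be illustrated with a toy sequence. This is only a sketch: the gene, the exon coordinates, and the function name `mature_mrna` are made up for illustration, and real splicing depends on splice-site recognition that is not modeled here.

```python
def mature_mrna(gene: str, exons: list[tuple[int, int]], poly_a_length: int = 5) -> str:
    """Return a toy mature mRNA from a coding-strand DNA sequence (5'->3').

    exons holds half-open (start, end) intervals on the gene, in order.
    """
    # Transcription: the RNA matches the coding strand, with U in place of T.
    pre_mrna = gene.upper().replace("T", "U")
    # Splicing: keep the exon intervals only, which discards the introns.
    spliced = "".join(pre_mrna[start:end] for start, end in exons)
    # Polyadenylation: append a short poly(A) tail at the 3' end.
    return spliced + "A" * poly_a_length


# Hypothetical gene: three 6-base exons separated by two 6-base introns.
TOY_GENE = "ATGGCC" + "GTAAGT" + "TTTAAA" + "GTCCAG" + "GGCTGA"
TOY_EXONS = [(0, 6), (12, 18), (24, 30)]

if __name__ == "__main__":
    print(mature_mrna(TOY_GENE, TOY_EXONS))
```

The exon list uses half-open intervals so that adjacent coordinates can be read off directly from Python slicing.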

8 Relationship of mRNA to genomic DNA for RBP4 Fig. 6.4 Page 162

9 Analysis of gene expression in cDNA libraries. A fundamental approach to studying gene expression is through cDNA libraries. Steps: isolate RNA (always from a specific organism, region, and time point); convert RNA to complementary DNA; subclone into a vector; sequence the cDNA inserts. These sequences are Expressed Sequence Tags (ESTs). [Figure: vector with insert] Pages 162-163
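
The "convert RNA to complementary DNA" and "sequence the cDNA inserts" steps can be sketched on a toy sequence. The function names and the read length are assumptions made for illustration; an actual EST is a single-pass read and would also carry sequencing errors, which this sketch ignores.

```python
RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna: str) -> str:
    """Reverse-transcribe mRNA into first-strand cDNA, written 5'->3'."""
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


def second_strand_cdna(first_strand: str) -> str:
    """Return the complementary second strand, written 5'->3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(first_strand.upper()))


def est_read(insert: str, read_length: int) -> str:
    """Model an EST as the first read_length bases of a cloned insert."""
    return insert[:read_length]


if __name__ == "__main__":
    mrna = "AUGGCCUUUAAA"  # hypothetical transcript
    first = first_strand_cdna(mrna)
    second = second_strand_cdna(first)
    print(first, second, est_read(second, 6))
```

The second strand has the same sequence as the mRNA with T in place of U, which is why ESTs can be aligned directly against genomic DNA.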

10 UniGene: unique genes via ESTs. Find UniGene at NCBI. UniGene clusters contain many ESTs. UniGene data come from many cDNA libraries. Thus, when you look up a gene in UniGene you get information on its abundance and its regional distribution. Page 164

11 Cluster sizes in UniGene. This is a gene with 1 associated EST; the cluster size is 1. Page 164 & Fig. 2.3, Page 23

12 Cluster sizes in UniGene. This is a gene with 10 associated ESTs; the cluster size is 10. Page 164

13 Cluster sizes in UniGene (human). UniGene build 186, 9/05. Page 164
Cluster size      Number of clusters
1                 10,400
2                 7,100
3-4               6,800
5-8               5,300
9-16              3,800
17-32             3,100
500-1000          1,500
2000-4000         130
8000-16,000       12
16,000-30,000     3
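
A cluster size is simply the number of ESTs assigned to a cluster, and a table like this one counts how many clusters fall into each size range. The sketch below assumes a plain dictionary from EST identifier to cluster identifier (both made up here) and bins sizes by powers of two, which matches the small ranges on the slide (1, 2, 3-4, 5-8, 9-16, 17-32) but not the rounded large ranges.

```python
from collections import Counter


def cluster_sizes(est_to_cluster: dict[str, str]) -> Counter:
    """Count the ESTs assigned to each cluster."""
    return Counter(est_to_cluster.values())


def size_bin(size: int) -> str:
    """Label the power-of-two range containing size: 1, 2, 3-4, 5-8, ..."""
    if size < 1:
        raise ValueError("cluster size must be at least 1")
    upper = 1 << (size - 1).bit_length()  # smallest power of two >= size
    lower = upper // 2 + 1 if upper > 2 else upper
    return str(upper) if lower == upper else f"{lower}-{upper}"


def size_distribution(est_to_cluster: dict[str, str]) -> Counter:
    """Count how many clusters fall into each size range."""
    return Counter(size_bin(n) for n in cluster_sizes(est_to_cluster).values())


# Hypothetical assignments: five clusters holding 1, 1, 2, 4, and 10 ESTs.
EXAMPLE = {
    f"{cluster}_est{i}": cluster
    for cluster, n in [("cluster_a", 1), ("cluster_b", 1), ("cluster_c", 2),
                       ("cluster_d", 4), ("cluster_e", 10)]
    for i in range(n)
}

if __name__ == "__main__":
    print(size_distribution(EXAMPLE))
```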

14 Ten largest human UniGene clusters. UniGene build 186, 9/05. Table 6.2 Page 165
Cluster size   Gene                      UniGene ID
22,925         eukary. translation EF    Hs.522463
22,320         eukary. translation EF    Hs.4395522
16,562         actin, gamma 1            Hs.514581
16,309         GAPDH                     Hs.169476
16,231         actin, beta               Hs.520640
11,076         ribosomal prot. L3        Hs.119598
10,517         dehydrin                  Hs.524390
10,087         enolase 1 (alpha)         Hs.517145
9,973          ferritin                  Hs.433670
8,966          metastasis associated     Hs.187199

15 UniGene brain libraries

16 UniGene lung libraries

17 Fig. 6.7 Page 167

18 Fig. 6.7 Page 167 [Panels: Brain, Lung]

19 n-sec1: up-regulated in brain. CamKII: up-regulated in brain. Surfactant: up-regulated in lung. Page 167

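20 Fisher’s exact test provides a p value. Digital differential display (DDD) results in UniGene are assessed for significance using Fisher’s exact test to generate a p value:

$$p = \frac{N_A!\; N_B!\; c!\; C!}{(N_A + N_B)!\; g1_A!\; g1_B!\; (N_A - g1_A)!\; (N_B - g1_B)!}$$

Here $N_A$ and $N_B$ are the total numbers of sequences in libraries A and B, $g1_A$ and $g1_B$ are the numbers of those sequences assigned to gene 1, $c = g1_A + g1_B$, and $C = (N_A - g1_A) + (N_B - g1_B)$. The null hypothesis (that gene 1 is not differentially regulated in a comparison of two libraries) is rejected when p < 0.05/G (where G = the number of UniGene clusters analyzed). Page 165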
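
The factorial expression on this slide can be evaluated numerically as follows. This is a sketch under stated assumptions: c is taken to be g1_A + g1_B and C to be (N_A - g1_A) + (N_B - g1_B), which is what the standard Fisher formula requires; all function names are made up for illustration. The expression as written gives the probability of the single observed table, whereas a full two-sided Fisher's exact test sums the probabilities of all tables with the same margins that are no more likely than the observed one, so both are shown. Log-factorials are used because library sizes in the thousands overflow ordinary factorials.

```python
from math import exp, lgamma


def log_factorial(n: int) -> float:
    """Return ln(n!) using the log-gamma function."""
    return lgamma(n + 1)


def table_probability(g1_a: int, n_a: int, g1_b: int, n_b: int) -> float:
    """Probability of one 2x2 table, following the slide's factorial formula."""
    c = g1_a + g1_b                        # gene 1 sequences in both libraries
    big_c = (n_a - g1_a) + (n_b - g1_b)    # all other sequences
    log_p = (log_factorial(n_a) + log_factorial(n_b)
             + log_factorial(c) + log_factorial(big_c)
             - log_factorial(n_a + n_b)
             - log_factorial(g1_a) - log_factorial(g1_b)
             - log_factorial(n_a - g1_a) - log_factorial(n_b - g1_b))
    return exp(log_p)


def fisher_two_sided(g1_a: int, n_a: int, g1_b: int, n_b: int) -> float:
    """Sum the probabilities of all tables no more likely than the observed one."""
    c = g1_a + g1_b
    observed = table_probability(g1_a, n_a, g1_b, n_b)
    total = 0.0
    for k in range(max(0, c - n_b), min(c, n_a) + 1):
        p = table_probability(k, n_a, c - k, n_b)
        if p <= observed * (1 + 1e-7):     # tolerance for floating-point ties
            total += p
    return min(total, 1.0)


def is_significant(p: float, clusters_tested: int, alpha: float = 0.05) -> bool:
    """Apply the slide's corrected threshold: reject when p < alpha / G."""
    return p < alpha / clusters_tested


if __name__ == "__main__":
    # Hypothetical counts: gene 1 seen 30 times in 10,000 ESTs vs 5 in 12,000.
    p = fisher_two_sided(30, 10_000, 5, 12_000)
    print(p, is_significant(p, clusters_tested=1_000))
```

Dividing 0.05 by G is a Bonferroni-style correction: with thousands of clusters tested, an uncorrected 0.05 cutoff would flag many genes by chance alone.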

21 Pitfalls in interpreting cDNA library data: bias in library construction; variable depth of sequencing; library normalization; error rate in sequencing; contamination (chimeric sequences). Pages 166-168

22 Fig. 6.8 p. 168-169

23 Serial analysis of gene expression (SAGE): 9 to 11 base “tags” correspond to genes; a measure of gene expression in different biological samples; SAGE tags can be compared electronically. Page 169

24 SAGE tags are mapped to UniGene clusters. [Figure: Tag 1, Tag 2, ..., Tag n, each mapped to a cluster (Cluster 1, Cluster 2, Cluster 3); more than one tag can map to Cluster 1] Page 169
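
Mapping SAGE tags to clusters amounts to a lookup followed by a count per cluster. The sketch below assumes a ready-made tag-to-cluster dictionary; the tags, cluster names, and the function name `count_tags_per_cluster` are all hypothetical. Tags with no entry are tallied separately instead of being dropped silently.

```python
from collections import Counter


def count_tags_per_cluster(observed_tags: list[str],
                           tag_to_cluster: dict[str, str]) -> tuple[Counter, int]:
    """Return (tag counts per cluster, number of tags with no cluster)."""
    counts: Counter = Counter()
    unmapped = 0
    for tag in observed_tags:
        cluster = tag_to_cluster.get(tag)
        if cluster is None:
            unmapped += 1
        else:
            counts[cluster] += 1
    return counts, unmapped


# Hypothetical 10-base tags; two different tags map to the same cluster.
TAG_TO_CLUSTER = {
    "AAAAAAAAAA": "cluster_1",
    "CCCCCCCCCC": "cluster_2",
    "GGGGGGGGGG": "cluster_1",
}
OBSERVED = ["AAAAAAAAAA", "CCCCCCCCCC", "AAAAAAAAAA", "GGGGGGGGGG", "TTTTTTTTTT"]

if __name__ == "__main__":
    print(count_tags_per_cluster(OBSERVED, TAG_TO_CLUSTER))
```

Counts per cluster from two samples can then be compared in the same way as EST counts from two cDNA libraries.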

