
1 Functional Genomics Introduction. Julie A. Dickerson, Electrical and Computer Engineering, Iowa State University

2 Module Structure: Day 1. Introduction to Functional Genomics; Transcriptomics; Analysis and Experiment Design for Microarray Data (Dr. Peng Liu); RNA-Seq Data (Mr. Kun Liang); Lab: Using R for normalizing and processing microarray data and for clustering analysis of 'omics data (John Van Hemert)

3 Module Structure: Day 2 (June 15, 2010, BBSI 2010). Metabolomics (Dr. Ann Perera); Proteomics (Dr. Young-Jin Lee); Pathways and data integration methods (Dr. Julie Dickerson and Erin Boggess); Lab: Analyzing integrated sets of microarray, proteomics, and metabolomics data (Erin Boggess)

4 F1: Outline. Module Structure; What is Functional Genomics?; Data Types Available; Transcriptomics: basic biology behind microarrays, what you can learn from microarrays, types of arrays, limitations of microarrays

5 Functional Genomics Definition: Functional genomics is a field of molecular biology that attempts to make use of the data produced by genomic projects to describe gene (and protein) functions and interactions. Functional genomics focuses on the dynamic aspects such as gene transcription, translation, and protein-protein interactions, as opposed to the static aspects such as DNA sequence or structures. (From Wikipedia, the free encyclopedia)

6 Genome-Wide View of Metabolism: Streptococcus pneumoniae. Explore the capabilities of the global network. How do we go from a pretty picture to a model we can manipulate?

7 Metabolic Pathways. Metabolites (e.g., glucose); Enzymes (e.g., phosphofructokinase); Reactions and stoichiometry (e.g., 1 F6P => 1 FBP); Kinetics; Regulation (gene regulation, metabolite regulation). Enzymes listed in the figure: hexokinase, phosphoglucoisomerase, phosphofructokinase, aldolase, triosephosphate isomerase, G3P dehydrogenase, phosphoglycerate kinase, phosphoglycerate mutase, enolase, pyruvate kinase.
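One way to turn the reactions and stoichiometry on this slide into something that can be manipulated is to store each reaction as a set of coefficients. The sketch below is only an illustration: the three reactions are the first enzymes listed on the slide, cofactors such as ATP are left out to match the simplified "1 F6P => 1 FBP" form, and the function name `net_change` and the flux values are assumptions, not part of the original material.

```python
# Each reaction maps metabolite -> stoichiometric coefficient
# (negative = consumed, positive = produced). Cofactors are omitted,
# matching the simplified "1 F6P => 1 FBP" notation on the slide.
REACTIONS = {
    "hexokinase": {"glucose": -1, "G6P": 1},
    "phosphoglucoisomerase": {"G6P": -1, "F6P": 1},
    "phosphofructokinase": {"F6P": -1, "FBP": 1},
}


def net_change(fluxes):
    """Return the net production rate of each metabolite for the given reaction fluxes."""
    change = {}
    for reaction, flux in fluxes.items():
        for metabolite, coefficient in REACTIONS[reaction].items():
            change[metabolite] = change.get(metabolite, 0) + coefficient * flux
    return change


# Hypothetical fluxes: equal flow through all three steps, so the
# intermediates G6P and F6P neither accumulate nor deplete.
balanced = net_change({
    "hexokinase": 1.0,
    "phosphoglucoisomerase": 1.0,
    "phosphofructokinase": 1.0,
})
print(balanced)
```

With equal fluxes the intermediates balance, glucose is consumed, and FBP is produced; changing one flux shows which metabolite would accumulate.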

8 Metabolic Modeling: The Dream

9 Data Types Available for Determining Function (June 11, 2009, BBSI 2009). Genomes: sequence; Genes: microarrays, next-generation sequencing; Proteins: proteomics; Metabolites: metabolomics; Phenotypes: phenomics

10 A VERY Simplified Eukaryotic Cell. Figure labels: nucleus, chromosome, DNA strands, cytoplasm. DNA contains thousands of genes. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

11 Posttranscriptional Modifications to Primary Transcript. Primary transcript: 5' UTR, coding portions of the RNA sequence corresponding to exons, intervening sequences corresponding to introns that are removed through splicing, 3' UTR. Primary transcript after modification, messenger RNA (mRNA): 5' cap (G), 5' UTR, 3' UTR, poly-A tail (AAAAAA...AAAA). Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

12 Transcription takes place inside the nucleus. Translation takes place outside the nucleus. Figure labels: nucleus, chromosome, DNA strands, cytoplasm. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

13 Translation. Figure labels: mRNA, ribosome, amino acid sequence. The amino acid sequence folds to become a protein. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

14 During translation, transfer RNA (tRNA) translates the genetic code... Figure labels: mRNA fragment AACGUGU with a codon marked; tRNA with anticodon AAU carrying leu; tRNA with anticodon UGC carrying thr; amino acids. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

15 The Genetic Code (each entry: mRNA codon, amino acid; rows are grouped by first base U, C, A, G and columns by second base U, C, A, G)
UUU phe    UCU ser    UAU tyr    UGU cys
UUC phe    UCC ser    UAC tyr    UGC cys
UUA leu    UCA ser    UAA STOP   UGA STOP
UUG leu    UCG ser    UAG STOP   UGG trp
CUU leu    CCU pro    CAU his    CGU arg
CUC leu    CCC pro    CAC his    CGC arg
CUA leu    CCA pro    CAA gln    CGA arg
CUG leu    CCG pro    CAG gln    CGG arg
AUU ile    ACU thr    AAU asn    AGU ser
AUC ile    ACC thr    AAC asn    AGC ser
AUA ile    ACA thr    AAA lys    AGA arg
AUG met    ACG thr    AAG lys    AGG arg
GUU val    GCU ala    GAU asp    GGU gly
GUC val    GCC ala    GAC asp    GGC gly
GUA val    GCA ala    GAA glu    GGA gly
GUG val    GCG ala    GAG glu    GGG gly
Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton
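The codon table above can be written as a lookup and used to translate an mRNA string. This is a minimal sketch, assuming the reading frame starts at the first base and that translation stops at the first STOP codon; the function name `translate` and the example sequence are illustrative choices, not from the slides.

```python
BASES = "UCAG"

# Amino acids in table order: first base, then second base, then third
# base, each cycling through U, C, A, G (three-letter abbreviations as
# used on the slide).
AMINO_ACIDS = (
    "phe phe leu leu ser ser ser ser tyr tyr STOP STOP cys cys STOP trp "
    "leu leu leu leu pro pro pro pro his his gln gln arg arg arg arg "
    "ile ile ile met thr thr thr thr asn asn lys lys ser ser arg arg "
    "val val val val ala ala ala ala asp asp glu glu gly gly gly gly"
).split()

CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at the first STOP codon."""
    protein = []
    # Step through complete codons only; trailing bases are ignored.
    for i in range(0, len(mrna) - 2, 3):
        residue = GENETIC_CODE[mrna[i:i + 3]]
        if residue == "STOP":
            break
        protein.append(residue)
    return protein


# Hypothetical example sequence.
print(translate("AUGUUAACGUGUUAA"))
```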

16 Miscellaneous Comments. The biology is more complicated than I described. Humans have somewhere around 30,000 genes. (The exact number is a subject for debate.) Regulation of these genes seems to be more important than their number! Much of the variation is created by differences in how cells use the genes they have. Microarrays are a tool that can help us understand how cells of various types use their genes in response to varying conditions.

17 Microarrays (9/20/2015, BCB570 Gene Expression Data Analysis). With only a few exceptions, every cell of the body contains a full set of chromosomes and identical genes. Only a fraction of these genes are turned on, however, and it is the subset that is "expressed" that confers unique properties to each cell type. "Gene expression" is the term used to describe the transcription of the information contained within the DNA, the repository of genetic information, into messenger RNA (mRNA) molecules that are then translated into the proteins that perform most of the critical functions of cells.

18 Microarrays. Microarrays work by exploiting the ability of a given mRNA molecule (target) to bind specifically to, or hybridize to, the DNA template (probe) from which it originated. Regulation of gene expression acts as both an "on/off" switch to control which genes are expressed in a cell and a "volume control" that increases or decreases the level of expression of particular genes as necessary. Source: The Genetic Science Learning Center, University of Utah

19 DNA Microarrays. Small, solid supports onto which the sequences from thousands of different genes are immobilized, or attached, at fixed locations. The DNA is printed, spotted, or actually synthesized directly onto the support. The spots themselves can be DNA, complementary DNA (cDNA, DNA synthesized from an mRNA template), or oligonucleotides (oligos: short fragments of single-stranded DNA, typically 5 to 50 nucleotides long).

20 Why do microarray experiments? Compare two conditions to find differentially expressed genes (control/treatment, disease/normal). Compare more than two conditions, some of which may interact (different treatments, different strains). Exploratory analysis (what genes are expressed under drought stress?).

21 Why use microarrays (cont.)? What happens over time (developmental stages)? Predicting certain conditions (cancer vs. normal). Patterns of gene expression that characterize a patient's or organism's response.

22 Differentially Expressed Genes. Find genes that show a large difference in expression between groups and are similar within a group. Statistical tests assess whether the groups have different means (t-test) or different variances (chi-squared, F-statistics). Adapted from "Practical Microarray Analysis", presentation by Benedikt Brors, German Cancer Research Center
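For a single gene, the comparison of group means can be sketched as a two-sample t statistic. The version below uses Welch's form, which does not assume equal variances; that choice, the function name `welch_t`, and the expression values are assumptions made for illustration. It returns only the statistic and approximate degrees of freedom, not a p-value, and it assumes each group has at least two replicates with nonzero variance.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    # Squared standard error contributed by each group.
    se2_a = variance(group_a) / len(group_a)
    se2_b = variance(group_b) / len(group_b)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df


# Hypothetical log expression values for one gene.
control = [7.1, 7.3, 6.9, 7.0]
treatment = [8.4, 8.1, 8.6, 8.3]
t_stat, dof = welch_t(treatment, control)
print(t_stat, dof)
```

A large absolute t value corresponds to the situation the slide describes: a large difference between groups relative to the spread within each group.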

23 Multiple Conditions. Are there differences in expression level between the k conditions? Analysis of Variance (ANOVA). Example design: Mutant 1 (inoculated, control) and Mutant 2 (inoculated, control).
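The question of whether expression differs between k conditions can be sketched with a one-way ANOVA F statistic for a single gene. This simplified example treats the four cells of the design as four unrelated groups; a full analysis of a mutant-by-treatment design would normally use a two-way model, which is not shown here. The function name and the data values are hypothetical.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of measurements."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Hypothetical log expression values for one gene in each condition.
conditions = {
    "mutant1_inoculated": [8.2, 8.5, 8.1],
    "mutant1_control": [7.0, 7.2, 6.9],
    "mutant2_inoculated": [7.6, 7.4, 7.7],
    "mutant2_control": [7.1, 6.8, 7.0],
}
print(one_way_anova_f(list(conditions.values())))
```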

24 Some Example Microarray Experiments from Iowa State University. Jim Reecy from Animal Science: muscle undergoing hypertrophy vs. normal muscle; David Putthoff, Steve Rodermel, Thomas Baum from Plant Pathology: roots infected with soybean cyst nematodes vs. uninfected roots; Anne Bronikowski in Genetics: wheel-running mice vs. non-runners; Roger Wise, Rico Caldo in Plant Pathology: interaction between multiple isolates of powdery mildew and multiple genotypes of barley. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

25 Wild-type vs. Myostatin Knockout Mice. Belgian Blue cattle have a mutation in the myostatin gene.

26 Identifying Genes Involved in Pathways That Distinguish Compatible from Incompatible Interactions. Figure labels: barley genotypes Mla6, Mla13, Mla1; Bgh isolates 5874, K1; interactions marked incompatible or compatible. Caldo, Nettleton, Wise (2004). The Plant Cell. 16, 2514-2528.

27 An Example Gene of Interest. Plot: log expression vs. hours after inoculation, for incompatible and compatible interactions. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

28 Exploratory Analysis. Find patterns in data to see what genes are expressed under different conditions. Analysis includes clustering methods. Used when little or no prior knowledge exists about the problem.

29 Perou, Charles M. et al. (1999) Proc. Natl. Acad. Sci. USA 96, 9212-9217, Fig. 5 (see Supplemental data at http://www.pnas.org for the full cluster diagram with all gene names). Copyright © 1999 by the National Academy of Sciences

30 Time Series. Goal: find patterns of co-expressed genes over time or partial time. Typical length is 3-10 time points. Cluster to find similar patterns (k-means, self-organizing maps). Use correlations to find genes that behave like a given gene of interest. Example time points: 0, 4, 12, and 24 hours.
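Finding genes that behave like a given gene of interest can be sketched with Pearson correlation across the time points. The gene names, the expression values at 0, 4, 12, and 24 hours, and the function names are all hypothetical; the sketch assumes every profile varies over time (a flat profile would give a zero denominator).

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def most_similar(profiles, gene_of_interest, top=2):
    """Rank the other genes by correlation with the gene of interest."""
    target = profiles[gene_of_interest]
    scores = {
        gene: pearson(target, profile)
        for gene, profile in profiles.items()
        if gene != gene_of_interest
    }
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical log expression at 0, 4, 12, and 24 hours.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 4.0, 6.2, 8.1],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [2.0, 2.5, 1.5, 2.2],
}
print(most_similar(profiles, "geneA"))
```

Correlation compares the shape of the profiles rather than their absolute levels, so a gene with the same trend at twice the magnitude still scores close to 1.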

31 Classification. Learn characteristic patterns from a training set and evaluate with a test set. Classify tumor types based on expression patterns. Predict disease susceptibility, stages, etc.
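The train-then-evaluate idea can be sketched with a nearest-centroid classifier: average the training profiles of each class, then assign each test sample to the closest class average. The slides do not name a particular classifier, so this method, the function names, and the three-gene expression vectors are assumptions chosen to keep the example small.

```python
import math


def centroid(vectors):
    """Average expression profile of a list of samples."""
    return [sum(column) / len(column) for column in zip(*vectors)]


def train(training_set):
    """training_set: list of (expression_vector, label) pairs."""
    by_label = {}
    for vector, label in training_set:
        by_label.setdefault(label, []).append(vector)
    return {label: centroid(vectors) for label, vectors in by_label.items()}


def predict(centroids, vector):
    """Assign the label whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def accuracy(centroids, test_set):
    """Fraction of held-out samples that are classified correctly."""
    correct = sum(predict(centroids, v) == label for v, label in test_set)
    return correct / len(test_set)


# Hypothetical expression vectors (three genes per sample).
training_set = [
    ([8.0, 2.0, 5.0], "tumor"),
    ([7.5, 2.5, 5.2], "tumor"),
    ([3.0, 6.0, 5.1], "normal"),
    ([3.5, 6.5, 4.9], "normal"),
]
test_set = [([7.8, 2.2, 5.0], "tumor"), ([3.2, 6.1, 5.0], "normal")]

model = train(training_set)
print(accuracy(model, test_set))
```

Keeping the test samples out of `train` is the point of the exercise: accuracy measured on the training samples themselves would overstate how well the patterns generalize.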

32 Source: "Practical Microarray Analysis", presentation by Benedikt Brors, German Cancer Research Center

33 Some Commonly Used Tools for Microarray Analysis. Oligonucleotide arrays: Affymetrix GeneChips, Nimblegen, Agilent.

34 Oligonucleotides. An oligonucleotide ("oligo" for short) is a short sequence of nucleotides. An oligonucleotide microarray is a microarray whose probes consist of synthetically created DNA oligonucleotides. Probe sequences are chosen to have good and relatively uniform hybridization characteristics. A probe is chosen to match a portion of its target mRNA transcript that is unique to that sequence. Oligo probes can distinguish among multiple mRNA transcripts with similar sequences. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

35 Simplified Example. Gene 1 and gene 2: shared green regions indicate a high degree of sequence similarity throughout much of the transcript. Oligo probe for gene 1: ATTACTAAGCATAGATTGCCGTATA. Oligo probe for gene 2: GCGTATGGCATGCCCGGTAAACTGG. Source: Dan Nettleton Course Notes Statistics 416/516X

36 Oligo Microarray Fabrication. Oligos can be synthesized and stored in solution. Oligo sequences can be synthesized on a slide or chip using various commercial technologies. The company Affymetrix uses a photolithographic approach, which we will describe briefly. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

37 Affymetrix GeneChips. Affymetrix (www.affymetrix.com) manufactures GeneChips. GeneChips are oligonucleotide arrays. Each gene (more accurately, sequence of interest or feature) is represented by multiple short (25-nucleotide) oligo probes. Some GeneChips include probes for around 120,000 genes and gene variants. mRNA that has been extracted from a biological sample can be labeled (dyed) and hybridized to a GeneChip. Only one sample is hybridized to each GeneChip. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

38 Different Probe Pairs Represent Different Parts of the Same Gene. Figure label: gene sequence. Probes are selected to be specific to the target gene and have good hybridization characteristics. Source: Dan Nettleton Course Notes Statistics 416/516X

39 Affymetrix Probe Sets. A probe set is used to measure mRNA levels of a single gene. Each probe set consists of multiple probe cells. Each probe cell contains millions of copies of one oligo. Each oligo is intended to be 25 nucleotides in length. Probe cells in a probe set are arranged in probe pairs. Each probe pair contains a perfect match (PM) probe cell and a mismatch (MM) probe cell. A PM oligo perfectly matches part of a gene sequence. An MM oligo is identical to a PM oligo except that the middle nucleotide (13th of 25) is replaced by its complementary nucleotide. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

40 A Probe Set for Measuring Expression Level of a Particular Gene. Gene sequence: ...TGCAATGGGTCAGAAGGACTCCTATGTGCCT... Perfect match sequence: AATGGGTCAGAAGGACTCCTATGTG. Mismatch sequence: AATGGGTCAGAACGACTCCTATGTG. Figure labels: probe cell, probe pair, probe set. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton
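The mismatch rule (the middle nucleotide, 13th of 25, replaced by its complement) can be checked against the two sequences on this slide. The function name `mismatch_probe` is an illustrative choice; the sketch assumes an odd-length probe written with the DNA letters A, C, G, T.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(perfect_match):
    """Return the MM oligo: the PM oligo with its middle base complemented."""
    middle = len(perfect_match) // 2  # index 12, i.e. the 13th base of a 25-mer
    return (
        perfect_match[:middle]
        + COMPLEMENT[perfect_match[middle]]
        + perfect_match[middle + 1:]
    )


# Perfect match sequence from the slide.
pm = "AATGGGTCAGAAGGACTCCTATGTG"
print(mismatch_probe(pm))
```

Applied to the perfect match sequence, the function reproduces the slide's mismatch sequence: the G at position 13 becomes a C.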

41 Different Probe Pairs Represent Different Parts of the Same Gene. Figure label: gene sequence. Probes are selected to be specific to the target gene and have good hybridization characteristics. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

42 Affymetrix's Photolithographic Approach. Figure labels: GeneChip, mask, nucleotides (A, C, G, T). Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

43 Source: www.affymetrix.com

44 Source: www.affymetrix.com

45 Source: www.affymetrix.com

46 Image from Hybridized GeneChip. Source: www.affymetrix.com

47 Image Processing for Affymetrix GeneChips. Image processing for Affymetrix GeneChips is typically done using proprietary Affymetrix software. The entire surface of a GeneChip is covered with square-shaped cells containing probes. Probes are synthesized on the chip in precise locations. Thus, spot finding and image segmentation are not major issues. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton

48 Probe Cell: 8 x 8 = 64 pixels. Border pixels are excluded, and the 75th percentile of the intensities of the center 36 pixels is used to quantify fluorescence intensity for each probe cell. These values are called PM values for perfect-match probe cells and MM values for mismatch probe cells. The PM and MM values are used to compute expression measures for each probe set. Dan Nettleton, Department of Statistics, IOWA STATE UNIVERSITY, Copyright © 2008 Dan Nettleton
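The probe cell summary can be sketched as follows: drop the one-pixel border of the 8 x 8 cell and take the 75th percentile of the remaining 6 x 6 = 36 intensities. The slides do not say which percentile interpolation the Affymetrix software uses, so the "inclusive" method of Python's `statistics.quantiles` is an assumption, as are the function name and the pixel values.

```python
from statistics import quantiles


def probe_cell_intensity(cell):
    """cell: 8 x 8 list of pixel intensities; summarize the center 6 x 6 pixels."""
    # Drop the first and last row and column (the border pixels).
    center = [value for row in cell[1:-1] for value in row[1:-1]]
    # quantiles(..., n=4) returns the three quartile cut points; the last
    # one is the 75th percentile. The interpolation method is an assumption.
    return quantiles(center, n=4, method="inclusive")[2]


# Hypothetical cell: bright border artifacts around a uniform center.
cell = [[900] * 8] + [[900] + [120] * 6 + [900] for _ in range(6)] + [[900] * 8]
print(probe_cell_intensity(cell))
```

Because the border is excluded, the bright edge pixels in this example have no effect on the reported intensity.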

49 Normalization. Outputs from each individual probe pair are statistically combined to give an expression level for the gene represented by the probe set. Normalization accounts for background noise on the chip, levels of control probes, etc. Key methods are MAS5.0, RMA, and GCRMA.
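As one concrete illustration of normalization across chips, the sketch below implements quantile normalization, the between-array step used within RMA. It is not a full implementation of MAS5.0, RMA, or GCRMA: background correction and probe-set summarization are omitted, ties are not given special handling, and the function name and intensity values are hypothetical.

```python
def quantile_normalize(arrays):
    """arrays: list of equal-length intensity lists, one per chip.

    Returns new lists in which every chip shares the same distribution of
    values: the k-th smallest value on each chip is replaced by the mean
    of the k-th smallest values across all chips.
    """
    sorted_arrays = [sorted(a) for a in arrays]
    rank_means = [sum(values) / len(values) for values in zip(*sorted_arrays)]
    normalized = []
    for a in arrays:
        # Indices of this chip's probes ordered from lowest to highest intensity.
        order = sorted(range(len(a)), key=lambda i: a[i])
        result = [0.0] * len(a)
        for rank, index in enumerate(order):
            result[index] = rank_means[rank]
        normalized.append(result)
    return normalized


# Hypothetical intensities for three probes on two chips.
chips = [[5, 2, 3], [4, 1, 6]]
print(quantile_normalize(chips))
```

After normalization each chip contains the same set of values, so differences in overall brightness between chips no longer masquerade as differences in expression.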

50 Summary of Microarrays (June 11, 2007, BBSI 2007). Positives: commercial chips are accurate and repeatable in experienced hands, and the statistics and modeling have been well explored. Negatives: cost; arrays can only see what is on the chip and are difficult to update to new knowledge.

51 Short Read Sequencing. Sequencing technology has evolved in the last 15 years. The eventual goal is to be able to sequence a genome for $1000 (NIH). Why not just sequence the transcriptome directly and see what is there?

52 Sequencing by synthesis (454). Takes a single strand of DNA and synthesizes its complementary strand enzymatically one base at a time, detecting which base was actually added at each step. Pyrosequencing detects the activity of DNA polymerase with a chemiluminescent enzyme. Reads are about 400-500 bp.

53 Other Technologies. Illumina Solexa: 40-100 bp, tag DNA or RNA at both ends. ABI SOLiD: around 50 bp.

54 Digital Gene Expression. Sequence census methods for functional genomics, Barbara Wold & Richard M. Myers.
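The "census" idea behind digital gene expression can be sketched as counting how many sequenced reads fall on each gene and scaling by the total. The sketch assumes the reads have already been assigned to genes by some alignment step that is not shown; the gene names, the function name, and the scaling to counts per million are illustrative choices rather than a method given in the slides.

```python
from collections import Counter


def counts_per_million(read_assignments):
    """read_assignments: iterable of gene names, one entry per mapped read."""
    counts = Counter(read_assignments)
    total = sum(counts.values())
    # Scale so that samples sequenced to different depths are comparable.
    return {gene: count * 1_000_000 / total for gene, count in counts.items()}


# Hypothetical read-to-gene assignments.
reads = ["geneA", "geneA", "geneB", "geneA"]
print(counts_per_million(reads))
```

Unlike a microarray, nothing here depends on a fixed set of probes: any transcript that produces reads is counted.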

