1 Importance of –omics and Systems Biology Wen-Hsiung Li ( 李文雄 ) Ecology and Evolution University of Chicago Biodiversity and Genomics Research Center Academia Sinica, Taiwan
2 What is -omics? It is the suffix of Genomics, Proteomics, Transcriptomics, etc. We shall start with the introduction of genome, proteome, transcriptome, etc.
3 Transcription Translation Gene (DNA) mRNA polypeptide or protein Central dogma: information flows from DNA through RNA to protein
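The flow from DNA through RNA to protein can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is taken to be the coding strand, translation starts at the first base, and the standard genetic code is used. The function names `transcribe` and `translate` are chosen here for illustration and do not come from the slides.

```python
# Standard genetic code, laid out with bases ordered T, C, A, G
# at the first, second, and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translate an mRNA from its first base until a stop codon or the end."""
    mrna = mrna.upper()
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)
```

For example, `translate(transcribe("ATGGCCTAA"))` reads the codons AUG, GCC, and UAA and stops at the last one.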
4 What is a genome? In all bacteria and eukaryotes, the genetic (hereditary) material is DNA. A gene is a DNA sequence that serves one or more functions. Genes are arranged on chromosomes. A chromosome may contain not only genes but also regulatory elements and non-coding DNA. The genome of an organism is a complete set of the genetic material in the organism. That is, the genome should contain all the genetic information of the organism.
5 The human genome: The first 22 chromosomes and the X & the Y. For example: Humans have 22 pairs of autosomes and two sex chromosomes X and Y. So the human genome consists of the 22 autosomes, and one X and one Y chromosome.
6 What is Genomics? (1) Genomics is the study of genomes. The first step in studying a genome is to determine its entire DNA sequence. One major purpose is to identify all the genes in the genome. Another purpose is to identify all the regulatory elements in the genome.
7 What is Genomics? (2) It also aims to understand the structure of the genome such as how genes and regulatory elements are arranged in the genome and which parts of the genome are functional and which parts are non-functional.
8 A segment of the E. coli genome Modified from Lodish et al. 1999
10 Genome sizes (figure labels: human, E. coli). Modified from Gregory 2005
11 Rough estimates of gene numbers
Genome                Gene number
Human                 22,000
Mouse                 24,000
Chicken               16,700
Pufferfish            21,800
Ciona intestinalis    14,000
Fruit fly             14,000
Worm                  20,000
Budding yeast         6,000
E. coli               4,200
12 What is a transcriptome? The transcriptome of an organism refers to the total set of RNA transcripts that the organism can produce. For a single-celled organism such as a bacterium, it is all the RNA transcripts that the cell is capable of producing. In a multicellular organism, the transcriptome includes all the RNA transcripts that all the cells in the organism can produce.
13 What is a transcriptome? (2) In a complex organism such as a human, we may also talk about the transcriptome of an organ or a tissue such as the liver. Even in a unicellular organism such as E. coli, the RNA transcripts can vary drastically with the external environment. Thus, the transcriptome of a cell under a given condition reflects the genes that are active under that condition.
14 Microarrays DNA microarrays are a powerful tool for obtaining large amounts of gene expression data. When many genes of a genome are determined, we can use the gene sequences to design DNA hybridization probes and spot them on a glass (chip). The chip can then be used to study the expression profile of many genes. The profile includes the timing of on-and-off and the peak of the expression.
16 Types of microarrays
1. cDNA arrays: Spot the cDNA sequences of the genes you want to study on the glass. Each spot is a specific cDNA and is very tiny.
2. Oligo arrays: Instead of the entire cDNA, select only a specific segment of the gene sequence as the probe and produce the DNA by chemical synthesis or by PCR. Spot each probe on the glass.
cDNA probe: longer, hence stronger hybridization.
Oligo probe: shorter and more specific, but it cannot be too short: 40 nucleotides or longer.
17 Types of microarrays
3. Affymetrix arrays: Instead of spotting, each probe is chemically synthesized directly on the glass. Each probe is usually 25 nucleotides long. Many probes are usually selected for a gene. Also, for comparison, each probe is paired with a second probe that has a mismatch in the middle, synthesized on another spot. This practice, however, may be discontinued.
21 Gene Expression Studies
Pattern of genes expressed in a cell is usually characteristic of its current state
Virtually all differences in cell state or type are correlated with changes in mRNA levels of many genes
Understanding the function of uncharacterized genes by comparison of expression patterns
Combine with metabolic schemas to understand how pathways are changed under varying conditions
22 Competitive Hybridization
Cancer cell mRNA → *cDNA
Normal cell mRNA → *cDNA
Hybridization → Scan red → Scan green → Compute differential expression
25 [Figure: microarray scan, channels labeled R and G] http://sequence.aecom.yu.edu/bioinf/microarray/reader.html
26 Differential Expression (Lashkari et al. 1997)
Overexpression in the untreated sample
Overexpression in the treated sample
High and equal expression between untreated and treated samples
Low and/or equal expression between untreated and treated samples
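A common way to "compute differential expression" from a two-color scan is the log2 ratio of the two channel intensities for each spot. The sketch below assumes background-corrected, positive intensities; the 2-fold cutoff and the function names are illustrative assumptions, and which sample carries which dye depends on the experiment.

```python
import math


def log2_ratio(red: float, green: float) -> float:
    """log2 of red over green intensity for one spot (both must be > 0)."""
    return math.log2(red / green)


def classify(red: float, green: float, fold: float = 2.0) -> str:
    """Label a spot by comparing its log2 ratio with an assumed fold cutoff."""
    ratio = log2_ratio(red, green)
    threshold = math.log2(fold)
    if ratio >= threshold:
        return "higher in red-labeled sample"
    if ratio <= -threshold:
        return "higher in green-labeled sample"
    return "similar in both samples"
```

A spot with red = 800 and green = 200 has a log2 ratio of 2, that is, a 4-fold difference. Spots with similar intensities in both channels fall into the last category whether their expression is high or low.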
29 Proteome (1) The proteome of an organism is the set of proteins produced by it during its life. It may also refer to the expressed proteins at a given time point under a defined condition, or to the proteins expressed in a cell, tissue, or organ.
30 Proteome (2) The proteome is larger than the genome, especially in eukaryotes, in the sense that there are more proteins than genes, owing to alternative splicing of genes and post-translational modifications such as glycosylation or phosphorylation.
31 What is Proteomics? The large-scale study of proteins, particularly their structures and functions. Much more complicated than genomics: An organism's genome is constant, but a proteome varies from cell to cell & changes through its biochemical interactions with the genome and the environment. One organism has radically different protein expression in different parts of its body, different stages of its life cycle and different environmental conditions.
32 Technologies for proteomics (1)
2-D gel electrophoresis
–Separates proteins in a mixture on the basis of their molecular weight and charge
Mass spectrometry
–Reveals identity of proteins
Protein chips
–A wide variety of identification methods
33 Technologies for proteomics (2)
Yeast two-hybrid method
–Determines how proteins interact with each other
Biochemical genomics
–Screens gene products for biochemical activity
34 2-D gel electrophoresis
Polyacrylamide gel
Voltage across both axes
–pH gradient along first axis neutralizes charged proteins at different places
–pH constant on a second axis where proteins are separated by weight
x–y position of proteins on stained gel uniquely identifies the proteins
[Figure axes: basic to acidic; high MW to low MW]
35 Differential in-gel electrophoresis
Label protein samples from control and experimental tissues
–Fluorescent dye #1 for control
–Fluorescent dye #2 for experimental sample
Mix protein samples together
Identify identical proteins from different samples by dye color
[Figure labels: with benzoic acid, Cy3; without benzoic acid, Cy5]
36 Caveats associated with 2-D gels
2-D gels perform poorly for the following kinds of proteins:
–Very large proteins
–Very small proteins
–Less abundant proteins
–Membrane-bound proteins (presumably the most promising drug targets)
37 Mass spectrometry
Measures mass-to-charge ratio
Components of mass spectrometer
–Ion source
–Mass analyzer
–Ion detector
–Data acquisition unit
[Figure: a mass spectrometer]
38 Identifying proteins with mass spectrometry
Preparation of protein sample
–Extraction from a gel
–Digestion by proteases (e.g., trypsin)
Mass spectrometer measures mass-to-charge ratio of peptide fragments
Measured peptide masses are compared with database
–Software used to generate theoretical peptide mass fingerprint (PMF) for all proteins in database
–Match of experimental readout to database PMF allows researchers to identify the protein
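The peptide mass fingerprint idea can be sketched as follows. This is a toy illustration, not the software used in practice: it assumes the usual trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages or modifications, and it compares neutral monoisotopic masses rather than the m/z values of charged ions. The function names, the 0.5 Da tolerance, and any database contents passed in are hypothetical.

```python
# Monoisotopic residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056


def trypsin_digest(protein: str) -> list:
    """Cleave after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        next_residue = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def theoretical_pmf(protein: str) -> list:
    """Masses of all tryptic peptides of a protein."""
    return [peptide_mass(p) for p in trypsin_digest(protein)]


def count_matches(observed, protein: str, tolerance: float = 0.5) -> int:
    """Number of observed masses within `tolerance` Da of a theoretical mass."""
    pmf = theoretical_pmf(protein)
    return sum(any(abs(m - t) <= tolerance for t in pmf) for m in observed)


def identify(observed, database: dict, tolerance: float = 0.5) -> str:
    """Name of the database protein whose PMF matches the most observed masses."""
    return max(database,
               key=lambda name: count_matches(observed, database[name], tolerance))
```

Counting matches is the simplest possible score; it is used here only to show how an experimental readout is compared against every theoretical fingerprint in a database.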
39 Limitations of mass spectrometry
Not very good at identifying minute quantities of protein
Trouble dealing with phosphorylated proteins
Doesn’t provide concentrations of proteins
Improved software eliminating human analysis is necessary for high-throughput projects
41 Results from a yeast two-hybrid experiment
Goal: To characterize protein–protein interactions among 6,144 yeast ORFs
–5,345 were successfully cloned into yeast as both bait and prey
–Identity of ORFs determined by DNA sequencing in hybrid yeast
–692 protein–protein interaction pairs
–Interactions involved 817 ORFs
42 Caveats associated with the yeast two-hybrid method
There is evidence that other methods may be more sensitive
Some inaccuracy reported when compared against known protein–protein interactions
–False positives
–False negatives
43 Protein-protein interactions Most proteins function in collaboration with other proteins, and one goal of proteomics is to identify which proteins interact. This often gives important clues about the functions of newly discovered proteins. Methods: The traditional method is yeast two-hybrid analysis. New methods include protein microarrays, immunoaffinity chromatography followed by mass spectrometry, and combinations of experimental methods such as phage display and computational methods.
44 Other -omes Large-scale high-throughput technologies have led to other -omes. For example: Metabolome refers to the complete set of metabolites of an organism or a cell.
45 What is Systems Biology? (1) Systems biology is the study of biological systems. It includes the study of (1) what the components or parts of the system are, (2) how the interactions between components of a system can give rise to the function and behavior of the system, (3) the dynamics and stability of the system, and (4) how the failure of one component may affect the function of other parts or the system.
46 What is Systems Biology? (2) Approaches: (1) Reductionist approach: Look at components individually but do not try to integrate observations from different parts. (2) Systems approach: Observe, through quantitative measures, multiple components simultaneously and rigorously integrate data from different components with mathematical models.
47 Examples of Biological Systems Biological networks: (1) Regulatory networks (2) Protein-protein interaction networks (3) Others such as genetic networks
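A protein-protein interaction network can be represented as an undirected graph built from a list of interaction pairs, such as the pairs reported by a two-hybrid screen. The sketch below is only an illustration: the function names are chosen here, and any protein identifiers passed in are hypothetical.

```python
from collections import defaultdict


def build_network(pairs):
    """Build an undirected network: each protein maps to its set of partners."""
    network = defaultdict(set)
    for a, b in pairs:
        network[a].add(b)
        network[b].add(a)
    return network


def degrees(network):
    """Number of distinct interaction partners for each protein."""
    return {protein: len(partners) for protein, partners in network.items()}


def hubs(network, min_partners=2):
    """Proteins with at least `min_partners` partners, sorted by name."""
    return sorted(p for p, d in degrees(network).items() if d >= min_partners)
```

With the pairs `[("A", "B"), ("B", "C"), ("D", "E")]`, the network has five proteins and `B` is the only one with two partners. A regulatory network differs in that its edges are directed, from a transcription factor to a target gene.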
48 Models of regulation (Lee et al. 2002). Blue circles are TFs; red squares are target genes.
57 Next generation: Renewable Energy Biomass Program The vast bulk of plant material is cell wall, which consists of cellulose (40-50%), hemicellulose (20-30%), and lignin (20-30%), depending on plant species. The race now is to develop technology to use cellulose and hemicellulose for ethanol production.
58 Rice Straw as a Source of Biofuels (http://www.jsxnw.gov.cn)
59 Napiergrass 狼尾草 as a Biofuel Crop Advantages: fast growth, disease resistance, adaptability, minimal management, easy to propagate
Napiergrass and Rice: Genetics and Genomics
Breeding of Napiergrass for high productivity and high cellulose content to reduce cost
Establishment of Napiergrass tissue culture and transformation system for future improvement
Expression of endoglucanase and other lignocellulolytic enzymes in Napiergrass and rice as a bioreactor or for autohydrolysis
63 A combination of 3 enzymes is required to degrade cellulose:
Cellobiohydrolases (exoglucanases, exo-β-1,4-glucanases)
Endoglucanases (endo-β-1,4-glucanases, EG)
β-Glucosidases (BGLU)
64 Endo-cellulase (Endoglucanase) Endo-cellulase breaks internal bonds to disrupt the crystalline structure of cellulose, exposing individual cellulose polysaccharide chains
65 Exo-cellulase (Exoglucanase) It cleaves 2-4 units from the ends of the chains produced by endo-cellulase, resulting in tetrasaccharides or disaccharides such as cellobiose. Two main types of exo-cellulases (cellobiohydrolases, CBH): one type works processively from the reducing end, and the other works processively from the non-reducing end of cellulose.
69 The key step is to break down cellulose into glucose and hemicellulose into xylose. Two main obstacles in cellulose breakdown: Lignin blocks enzyme access to cellulose. Cellulose in crystalline form cannot be degraded efficiently by cellulases.
70 So, there is a great need to find powerful cellulases to break down cellulose into glucose. How to look for good cellulases? Mainly in microbes from decaying composts of rice straw, sugarcane bagasse, etc., or from the guts (stomachs) of termites, grasshoppers, cattle, etc.
71 Now suppose you have found a microbe that seems to possess excellent cellulases. What can you do? Of course, you want to identify the genes that code for the cellulases But how do you do that?
72 Identification of cellulase genes in an organism (1) Traditional approach: Try to isolate the enzyme. Sequence a small segment of the enzyme and use the amino acid sequence to design PCR primers to amplify the gene from genomic DNA. Or use the amino acid sequence to design hybridization probes and hybridize them to a cDNA library to isolate the cDNA for the gene.
73 Identification of cellulase genes in an organism (2) Genomic approach: Sequence the genome and annotate the genes, using genes identified in other genomes, especially from related genomes. From the annotated genes try to see if there are genes annotated as cellulase genes. If yes, try to select candidate genes to test for good cellulase activities.
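Once a genome is annotated, the first pass of the genomic approach amounts to searching the annotation text for cellulase-related terms. The following sketch assumes the annotations are available as a mapping from gene identifier to free-text description; the keyword list, the function name, and any gene identifiers are hypothetical, and a keyword search is only a starting point for choosing candidates to test.

```python
# Assumed keywords; a real search would be tuned to the annotation vocabulary.
CELLULASE_TERMS = (
    "cellulase",
    "endoglucanase",
    "exoglucanase",
    "cellobiohydrolase",
    "beta-glucosidase",
)


def find_cellulase_candidates(annotations):
    """Return sorted gene IDs whose description mentions a cellulase-related term."""
    return sorted(
        gene_id
        for gene_id, description in annotations.items()
        if any(term in description.lower() for term in CELLULASE_TERMS)
    )
```

Genes found this way are only candidates; as the slide says, they still need to be tested for good cellulase activity.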
74 Phanerochaete chrysosporium, a white rot fungus
It can efficiently degrade lignin and gain access to cellulose and hemicellulose of plant cell walls.
The genomic sequence is completed.
Its genome contains genes for cellulase, xylanase, and lignin degrading enzymes, which can be explored for biomass conversion and industrial usage.
75 How can transcriptomic study help identify cellulase genes in a microbe? From selected candidate genes for cellulases, one can design oligonucleotide sequences as probes and spot them on chips. One can then feed the organism with cellulose or rice straw powder and obtain the RNAs to produce cDNAs. Hybridize the cDNAs to the chip to see which cellulase genes have high expressions.
76 An example from Clostridium thermocellum to design an oligo probe Go to http://genome.jgi-psf.org (Joint Genome Institute, DOE) to download the genomic sequence of Clostridium thermocellum. Retrieve sequences of putative EXG, EG, BGLU, and carbohydrate metabolic genes.
77 A total of 180 genes were selected: Cellulosome and GHs Carbohydrate metabolism Regulatory proteins
78 Use the Picky algorithm (download from http://www.complex.iastate.edu/download/Picky/) to design a 55-60mer oligonucleotide corresponding to a unique sequence in each gene. Print slides.
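The requirement that each probe correspond to a unique sequence in its gene can be illustrated with a much simpler stand-in than Picky. The sketch below is not the Picky algorithm: it only looks for the first window of a given length that does not occur exactly in any other gene, and it ignores near matches, reverse complements, and melting temperature. The function name and any gene names passed in are hypothetical.

```python
def unique_probe(gene_id, genes, length):
    """Return the first `length`-mer of genes[gene_id] absent from every other gene.

    `genes` maps gene names to DNA sequences. Returns None if no exact-match
    unique window of that length exists.
    """
    target = genes[gene_id]
    others = [seq for name, seq in genes.items() if name != gene_id]
    for start in range(len(target) - length + 1):
        candidate = target[start:start + length]
        if all(candidate not in other for other in others):
            return candidate
    return None
```

For real probes the window length would be in the 55-60 nucleotide range given on the slide; the same function works for any length.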
79 Grow C. thermocellum with the following carbohydrate sources:
Cellobiose
Avicel (pure cellulose)
Microcrystalline cellulose (cellulose in crystalline form)
Sugarcane bagasse (甘蔗渣)
Grass (狼尾草)
Rice straw (稻草稈)
80
1. Isolate RNAs from different cultures.
2. Perform microarray experiments using RNA from the cellobiose culture as the control.
3. Hybridize the cDNAs to the chip to see whether different cellulase genes are affected by different carbohydrate sources.
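The comparison across carbohydrate sources can be sketched as a log2 fold change of each gene relative to the cellobiose control, followed by a list of the genes induced on each substrate. The data layout (condition, then gene, then intensity), the 2-fold cutoff, and the function names are assumptions made for illustration; real data would also need normalization and replicates.

```python
import math


def log2_fold_changes(expression, control="cellobiose"):
    """Per-condition log2(intensity / control intensity) for every gene."""
    base = expression[control]
    return {
        condition: {gene: math.log2(values[gene] / base[gene]) for gene in base}
        for condition, values in expression.items()
        if condition != control
    }


def induced_genes(expression, control="cellobiose", min_log2=1.0):
    """Genes whose log2 fold change over the control is at least `min_log2`."""
    changes = log2_fold_changes(expression, control)
    return {
        condition: sorted(g for g, value in per_gene.items() if value >= min_log2)
        for condition, per_gene in changes.items()
    }
```

A gene that appears in the list for rice straw but not for Avicel would be one whose expression responds to that particular carbohydrate source.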
81 The bottom line: Transcriptomic study can help us make intelligent choices in selecting candidate cellulase genes for further studies.
82 Two approaches for bioethanol production:
Direct cellulase treatment
–Inefficient and expensive
Consolidated bioprocessing (CBP)
–Engineering of a microbe or a group of compatible microbes that can carry out cellulase production, hydrolysis, and fermentation in a single process (systems biology approach).
83 A major obstacle in CBP Such microbes are currently unavailable or inefficient. The key requirement for CBP is a microbe or a group of compatible microbes that can carry out cellulase production, hydrolysis, and fermentation in a single process.
84 Yeast as a starting point?
Genetics and genomics are well understood.
Genetic and metabolic engineering are not too difficult.
It is highly efficient in converting sugars to ethanol.
It has relatively high ethanol tolerance.
It grows fast on sugars.
But: It has no cellulase activity.
85 Our Aims
Aim 1. Transcriptome and regulatory network studies of Clostridium thermocellum, Phanerochaete chrysosporium and interesting Taiwan microbial isolates during growth on different biomass feedstocks.
86 Clostridium thermocellum as a starting point
A rare organism that is both cellulolytic and ethanologenic.
It produces a cellulase system, the cellulosome, that is highly active on crystalline cellulose.
A promising organism for consolidated bioprocessing (CBP), the industrial process in which fermentative microorganisms directly convert cellulosic materials to ethanol.
87 A schematic diagram of the C. thermocellum cellulosome Demain, Newcomb and Wu (2005) Microbiol. Mol. Biol. Rev. 69: 124-154.
88 Phanerochaete chrysosporium, a white rot fungus
It can efficiently degrade lignin and gain access to cellulose and hemicellulose of plant cell walls.
The genomic sequence is completed.
Its genome contains genes for cellulase, xylanase, and lignin degrading enzymes, which can be explored for biomass conversion and industrial usage.
89 We found that both Clostridium thermocellum and Phanerochaete chrysosporium can grow on powders of rice straw, sugarcane bagasse, or P. alopecuroides ( 狼尾草 ), even without any pretreatment. Microarray analyses indicate that glycosyl hydrolase genes and genes encoding cellulosome enzymes were differentially regulated in C. thermocellum grown on different substrates.
90 A serious disadvantage of Clostridium thermocellum and its relatives is that their genetics is poorly understood and there is no transformation tool.
91 Aim 2. Metabolic and genetic engineering of fungal and bacterial strains for efficient conversion of cellulose and hemicellulose to ethanol. Identify and test combinations of EXG and EG and BGLU from different microbial species that can efficiently degrade cell wall of different feedstocks.
92 Steps in metabolic engineering of S. cerevisiae for efficient conversion of cellulose to ethanol. Step 1: Select cellulase genes from microbial species.
93 Steps in metabolic engineering of S. cerevisiae for efficient conversion of cellulose to ethanol. Step 1: Select cellulase genes from microbial species. Step 2: Cloning of EXG, EG, and BGLU genes into appropriate yeast expression vector.
94 From Yan-Ping Shih et al. Protein Sci 2002; 11: 1714-1719 Employing “sticky-end PCR cloning” for high-throughput cloning of cellulase genes Clone into an EcoRI & XhoI double-digested yeast expression vector
95 Steps in metabolic engineering of S. cerevisiae for efficient conversion of cellulose to ethanol. Step 1: Select cellulase genes from the white rot fungus Phanerochaete chrysosporium. Step 2: Cloning of EXG, EG, and BGLU genes into appropriate yeast expression vector. Step 3: Transform resulting constructs into a host yeast strain with desired properties, including vigorous fermentation capability and high ethanol tolerance.
96 Steps in metabolic engineering of S. cerevisiae for efficient conversion of cellulose to ethanol. Step 4. Select transformants expressing proper enzyme activity. Step 5. Test combinations of yeast strains expressing EG, EXG, and BGLU that can efficiently utilize different sources of cellulose for ethanol production.
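Step 5 is a combinatorial test: every candidate EXG is paired with every candidate EG and every candidate BGLU. A minimal sketch of how the combinations to test could be enumerated is shown below; the function name is illustrative, and any gene labels passed in are hypothetical placeholders rather than genes named in the slides.

```python
from itertools import product


def enzyme_combinations(exg_genes, eg_genes, bglu_genes):
    """All (EXG, EG, BGLU) triples, one gene drawn from each enzyme class."""
    return list(product(exg_genes, eg_genes, bglu_genes))
```

The number of combinations is the product of the three list lengths, so two EXG, two EG, and one BGLU candidate give four strain combinations to test on each cellulose source.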
97 Aim 3: To develop genetic engineering techniques for microorganisms The ability to express foreign genes in cellulolytic or thermophilic microorganisms is needed. We will develop transformation techniques and forward and reverse genetic approaches for organisms with promising properties. Such techniques will be used in engineering microbes expressing various lignocellulolytic enzymes or utilizing multiple sugars for fermentation.
98 To do list:
Determine cellulose, hemicellulose, and lignin degradation capability.
Optimize growth conditions.
Physiological studies and metabolic profiling.
Gene expression profiling.
Genomic sequencing of desirable microbes.
To identify novel cellulases or other enzymes for improving efficiency of biomass conversion.