The evolution of expression patterns in the Arabidopsis genome
Todd Vision
Department of Biology, University of North Carolina at Chapel Hill

Driving forces in genome evolution
Proximate vs. ultimate explanations
Deleterious mutations are frequent, and selection cannot act effectively on all of them:
–Substitutions
–Insertions and deletions
–Duplications
–Transpositions
Cellular processes will be affected by this rain of mutations.
At the molecular level, we must entertain ultimate explanations that do not invoke adaptation.

An example: codon bias
Genes differ in how often they use the preferred codon for a given amino acid, which affects:
–Translational efficiency
–Translational accuracy
The strongest codon bias is typically seen in short, highly expressed genes under strong purifying selection.
Realized codon bias is a balance between selection for preferred codons and a continual rain of mutations toward unpreferred codons.
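A minimal sketch of how codon bias can be quantified, here as the frequency of optimal codons (Fop): the share of codons, among amino acids with a defined preferred codon, that use that preferred codon. The preferred-codon set and the four amino acids covered are hypothetical placeholders, not Arabidopsis data; a real analysis would derive preferred codons from highly expressed genes.

```python
# Hypothetical preferred codons, one per amino acid, for illustration only;
# they are NOT the real preferred codons of Arabidopsis.
PREFERRED = {"CTG", "GAA", "AAG", "GCC"}

# Synonymous codon families for the amino acids covered by this toy table.
FAMILIES = {
    "Leu": {"TTA", "TTG", "CTT", "CTC", "CTA", "CTG"},
    "Glu": {"GAA", "GAG"},
    "Lys": {"AAA", "AAG"},
    "Ala": {"GCT", "GCC", "GCA", "GCG"},
}
COVERED = set().union(*FAMILIES.values())


def fop(cds: str) -> float:
    """Frequency of optimal codons: preferred codons / codons of covered amino acids."""
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]  # drop a trailing partial codon
    counted = [c for c in codons if c in COVERED]  # ignore amino acids not in the table
    if not counted:
        return 0.0
    return sum(c in PREFERRED for c in counted) / len(counted)


print(fop("ATGCTGGAAAAGGCTTTA"))  # ATG is ignored; 3 of 5 counted codons are preferred
# 0.6
```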

What are the consequences of mutational rain on the regulatory networks that modulate gene expression?

Outline
Arabidopsis gene expression (MPSS)
Two evolutionary issues in the evolution of expression profiles:
–Physical clustering of co-expressed genes
–Divergence of duplicated genes

Digital expression profiling
“Bar-code” counting raises fewer concerns about cross-hybridization, probe selection, background hybridization, etc.
Serial Analysis of Gene Expression (SAGE)
–Counts occurrences of short mRNA signatures (tags)
–Long SAGE: longer signatures
–Uses conventional sequencing technology
Massively Parallel Signature Sequencing (MPSS)
–Counts occurrences of 17–20 bp mRNA signatures
–Cloning and sequencing are done on microbeads
–Commercialized by Lynx Therapeutics
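A minimal sketch of the "bar-code" counting idea: a gene's expression is simply the number of times each exact signature sequence is observed, so no hybridization model is involved. The 17 bp signature length is taken from the MPSS examples in this talk; the function name is illustrative.

```python
from collections import Counter


def expression_profile(tags, signature_length=17):
    """Count occurrences of each signature among sequenced tags.

    Tags shorter than `signature_length` are skipped; longer ones are trimmed,
    so every count refers to one exact sequence of the same length.
    """
    return Counter(t[:signature_length] for t in tags if len(t) >= signature_length)


reads = ["GATCAATCGGACTTGTC", "GATCAATCGGACTTGTC", "GATCGTGCATCAGCAGT", "GATCAA"]
profile = expression_profile(reads)
print(profile.most_common())
# [('GATCAATCGGACTTGTC', 2), ('GATCGTGCATCAGCAGT', 1)]
```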

MPSS library construction
–Extract mRNA (poly(A), AAAAAAA) from tissue
–Convert to cDNA
–Cut with Sau3A (site GATC)
–Add linker
–Clone: a standard primer is added at the 5' end, and a unique 32 bp tag plus a standard primer at the 3' end
–PCR
–Remove the 3' primer and expose the single-stranded unique tag (digest with a 3'→5' exonuclease)
–Anneal to beads coated with unique anti-tags (32 bp, complementary to the tag)
Brenner et al., PNAS 97:
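A toy model of the tag/anti-tag sorting step, assuming a tag anneals only to the exact antiparallel complement (reverse complement) of itself and that all PCR copies of one cloned molecule carry the same tag, which is why a bead ends up holding copies of a single transcript. Real tags are 32 bp; the 4 bp tags in the example are only for readability, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Antiparallel complementary strand, written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def assign_to_beads(molecule_tags, bead_anti_tags):
    """Toy model: each molecule anneals to the bead whose anti-tag exactly pairs with its tag.

    molecule_tags: {molecule_id: tag}; bead_anti_tags: {bead_id: anti_tag}.
    Returns {bead_id: [molecule_id, ...]}; molecules with no matching bead are dropped.
    """
    bead_for = {anti: bead for bead, anti in bead_anti_tags.items()}
    loaded = {bead: [] for bead in bead_anti_tags}
    for molecule, tag in molecule_tags.items():
        bead = bead_for.get(reverse_complement(tag))
        if bead is not None:
            loaded[bead].append(molecule)
    return loaded


# Two PCR copies of one cDNA share tag AACG, so both end up on bead b1.
print(assign_to_beads({"copy1": "AACG", "copy2": "AACG", "other": "TTTT"},
                      {"b1": "CGTT", "b2": "GGGG"}))
# {'b1': ['copy1', 'copy2'], 'b2': []}
```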

MPSS library construction
The result of library construction is a set of microbeads. Each bead carries many DNA molecules, all derived from the 3' end of a single transcript.
–Sort by FACS to remove ‘empty’ beads
–Load beads in a monolayer on a microscope slide for sequencing of 17–20 bp from the 5' end
Brenner et al., PNAS 97:
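A simplified in-silico view of what ends up on a bead, assuming the signature is the 17 bp (GATC + 13 bp) that starts at the Sau3A site closest to the poly(A) tail. It ignores incomplete digestion, and it returns nothing for transcripts whose 3'-most site lacks 13 downstream bases; the helper is hypothetical.

```python
def mpss_signature(transcript: str, length: int = 17):
    """Return the signature read from the 3'-most GATC site, or None.

    Simplifying assumptions: the transcript is given 5'->3' as DNA letters, the
    signature starts at the GATC closest to the poly(A) end, and `length` bases
    must be available from that site onward.
    """
    seq = transcript.upper()
    start = seq.rfind("GATC")
    if start == -1 or start + length > len(seq):
        return None
    return seq[start:start + length]


transcript = "ACGATCTTTTTT" + "GATCGTGCATCAGCAGT" + "AAAAAAA"
print(mpss_signature(transcript))  # the upstream GATC is ignored
# GATCGTGCATCAGCAGT
```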

MPSS sequencing
Repeat cycle in steps of four bases; the overhang is shifted by four bases in each round:
–Add encoded adaptors to the exposed 4-base overhang (NNNN)
–Sequence by hybridization: 16 decoder cycles read the 4 bp
–Digest with a Type IIS enzyme (cutting 9 bp and 13 bp away) to uncover the next 4 bases
Brenner et al., Nat. Biotech. 18:630-4.
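A simplified simulation of the stepwise read-out, assuming one decoder per (overhang position, base), so that 4 × 4 = 16 hybridizations identify the four exposed bases, and that each Type IIS digestion shifts the window by exactly four bases. Adaptor ligation and fluorescence detection are abstracted into a direct base comparison; the names are hypothetical.

```python
from itertools import product

BASES = "ACGT"
# One decoder per (overhang position, base): 4 positions x 4 bases = 16 hybridizations.
DECODERS = list(product(range(4), BASES))


def read_overhang(overhang: str) -> str:
    """Call the 4 exposed bases from 16 decoder hybridizations (simplified)."""
    if len(overhang) != 4 or set(overhang) - set(BASES):
        raise ValueError("expected a 4-base A/C/G/T overhang")
    called = [""] * 4
    for position, base in DECODERS:
        if overhang[position] == base:  # this decoder "lights up" the bead
            called[position] = base
    return "".join(called)


def stepwise_read(insert: str, rounds: int) -> str:
    """Read `rounds` x 4 bases; each Type IIS digestion shifts the overhang by four bases."""
    calls = []
    for r in range(rounds):
        calls.append(read_overhang(insert[4 * r:4 * r + 4]))
    return "".join(calls)


print(len(DECODERS), stepwise_read("GATCAATCGGACTTGTC", rounds=4))
# 16 GATCAATCGGACTTGT
```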

MPSS sequencing
Each bead provides a 17 bp signature (GATC + 13 bp). For each tag, the output records its signature sequence and the number of beads carrying it (its frequency). Example signatures:
GATCAATCGGACTTGTC
GATCGTGCATCAGCAGT
GATCCGATACAGCTTTG
GATCTATGGGTATAGTC
GATCCATCGTTTGGTGC
GATCCCAGCAAGATAAC
GATCCTCCGTCTTCACA
GATCACTTCTCTCATTA
GATCTACCAGAACTCGG
…
GATCGGACCGATCGACT
Two sets of signatures are generated from each sample, in reading frames staggered by two bases.
Total number of tags: >1,000,000

A catalog of signatures in the Arabidopsis genome
All potential signatures (GATC + 13 bp) are identified on both strands of the genomic sequence.
There is one potential signature approximately every 293 bp on each strand of the genome.
Each signature is classified according to its position relative to the 29,084 genes and pseudogenes in the TIGR annotation.
Signatures may not be unique, so the number of ‘hits’ in the genome is recorded.
[Table: number of genomic hits per signature, as % of total, for the Arabidopsis genome and for random sequence; total 851,212 signatures in each]
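A sketch of the catalog scan, together with a back-of-envelope check on the 293 bp spacing. The scan assumes an uppercase A/C/G/T sequence. The spacing estimate assumes bases occur independently with GC fraction g, so a GATC site has probability (g/2)²((1−g)/2)² at each position; using g ≈ 0.36 for Arabidopsis is an approximate assumption, not a figure from this talk.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def potential_signatures(genome: str, flank: int = 13):
    """Return (strand, start, signature) for every GATC + `flank` bp on both strands.

    `start` is a 0-based offset on the strand being scanned (for "-", on the
    reverse complement). Sites too close to the end for a full signature are skipped.
    """
    hits = []
    strands = (("+", genome), ("-", genome.translate(COMPLEMENT)[::-1]))
    for strand, seq in strands:
        for match in re.finditer("GATC", seq):  # GATC cannot overlap itself
            start = match.start()
            if start + 4 + flank <= len(seq):
                hits.append((strand, start, seq[start:start + 4 + flank]))
    return hits


def expected_spacing(gc: float) -> float:
    """Mean bp between GATC sites on one strand of random sequence with GC fraction `gc`."""
    p_g = p_c = gc / 2
    p_a = p_t = (1 - gc) / 2
    return 1 / (p_g * p_a * p_t * p_c)


print(potential_signatures("AAGATCAATCGGACTTGTCAA"))
# [('+', 2, 'GATCAATCGGACTTGTC')]
print(expected_spacing(0.5), round(expected_spacing(0.36)))
# 256.0 301
```

Under these assumptions an AT-rich genome is expected to carry a GATC site roughly every 300 bp per strand, in line with the observed spacing of about 293 bp.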

Classifying signatures
Class 1: in an exon, same strand as the ORF
Class 2: within 500 bp after the stop codon, same strand as the ORF
Class 3: anti-sense of the ORF (like Class 1, but on the opposite strand)
Class 4: in the genome but NOT Class 1, 2, 3, 5 or 6
Class 5: entirely within an intron, same strand
Class 6: entirely within an intron, anti-sense
Class 0: found in the expression libraries but not in the genome
Grey: potential signature NOT expressed
Interpretations shown in the figure: typical signatures; potential alternative splicing or nested gene; potential alternative termination; potential un-annotated ORF; potential anti-sense transcript; anti-sense transcript or nested gene?; duplicated (expression may be from another site in the genome).
Triangles in the figure refer to colors used on our web page.
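A minimal sketch of the positional rules against a single gene model. The `GeneModel` fields, the 1-based inclusive coordinates, reading "in an exon" as "entirely within an exon", and the order in which the rules are checked are all assumptions; the real catalog compares each signature against all 29,084 TIGR gene models. Class 0 is not positional and is omitted.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical, simplified gene model (1-based, inclusive genomic coordinates)."""
    strand: str        # "+" or "-"
    exons: list        # [(start, end), ...]
    stop_3prime: int   # genomic coordinate of the stop codon's 3'-most base


def _within(s, e, interval):
    return interval[0] <= s and e <= interval[1]


def classify(sig_start, sig_end, sig_strand, gene, window=500):
    """Assign signature Class 1-6 relative to one gene model."""
    exons = sorted(gene.exons)
    introns = [(a[1] + 1, b[0] - 1) for a, b in zip(exons, exons[1:])]
    if gene.strand == "+":
        after_stop = (gene.stop_3prime + 1, gene.stop_3prime + window)
    else:
        after_stop = (gene.stop_3prime - window, gene.stop_3prime - 1)
    sense = sig_strand == gene.strand
    if any(_within(sig_start, sig_end, x) for x in exons):
        return 1 if sense else 3
    if sense and _within(sig_start, sig_end, after_stop):
        return 2
    if any(_within(sig_start, sig_end, x) for x in introns):
        return 5 if sense else 6
    return 4


gene = GeneModel("+", [(100, 200), (301, 400)], stop_3prime=400)
print([classify(110, 126, "+", gene), classify(110, 126, "-", gene),
       classify(210, 226, "+", gene), classify(210, 226, "-", gene),
       classify(450, 466, "+", gene), classify(1000, 1016, "+", gene)])
# [1, 3, 5, 6, 2, 4]
```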

Arabidopsis signatures
Class  Position               Number in genome
1      Sense exonic           203,…
2      3' UTR, <500 bp        44,…
3      Anti-sense exonic      197,…
4      Intergenic             288,…
5      Intronic               60,…
6      Anti-sense intronic    57,…
TOTAL                         851,212
Based on TIGR annotation (release 3.0, July 2002).
355 genes lack potential Class 1 or 2 signatures (undetectable).
On average, there are 8.5 Class 1 and 2 signatures per gene.
8,422 genomic signatures have secondary classes due to overlap or near-overlap of two genes in the TIGR annotation.

Core Arabidopsis MPSS libraries
Sequenced by Lynx for Blake Meyers, U. of Delaware
Library    Signatures sequenced    Distinct signatures
Root       3,645,414               48,102
Shoot      2,885,229               53,396
Flower     1,791,460               37,754
Callus     1,963,474               40,903
Silique    2,018,785               38,503
TOTAL      12,304,362              133,377

Catalog of expressed signatures
Counting only signatures with abundance ≥ 4 PPM (parts per million) in at least one library. Totals are for 7 libraries (the core libraries plus one new root and one new flower library).
Class        Position               Count
1 or 2       Exon or 3' UTR         25,568
3 through 6  Elsewhere in genome    14,424
0            No match in genome!    10,871
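A sketch of the abundance filter, assuming PPM means a signature's count divided by the library's total number of signatures sequenced, times one million. The function names and toy counts are hypothetical.

```python
def to_ppm(counts):
    """Scale raw signature counts to parts per million of the library total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: 1e6 * n / total for sig, n in counts.items()}


def expressed_signatures(libraries, min_ppm=4.0):
    """Signatures whose abundance reaches `min_ppm` in at least one library."""
    keep = set()
    for counts in libraries.values():
        keep.update(sig for sig, ppm in to_ppm(counts).items() if ppm >= min_ppm)
    return keep


libraries = {
    "root": {"sigA": 999_990, "sigB": 4, "sigC": 6},  # 1,000,000 signatures in total
    "flower": {"sigA": 500_000, "sigD": 500_000},
}
print(sorted(expressed_signatures(libraries)))
# ['sigA', 'sigB', 'sigC', 'sigD']
print(sorted(expressed_signatures(libraries, min_ppm=5)))
# ['sigA', 'sigC', 'sigD']
```

Because the threshold is relative, the raw count it requires depends on library size: with the core library sizes above, 4 PPM is about 14.6 signatures in the root library (4 × 3,645,414 / 10⁶) but only about 7.2 in the flower library (4 × 1,791,460 / 10⁶).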

Genome-wide expression profiling
Of the 29,084 Arabidopsis gene models, 14,674 match unique, expressed signatures.
[Figure: expression along chromosomes I, II, III, IV and V]

Query by:
–Sequence
–Arabidopsis gene identifier
–Chromosomal position
–BAC clone ID
–MPSS signature
Library comparison
Site includes:
–Library and tissue information
–FAQs and help pages

Outline
Arabidopsis gene expression (MPSS)
Two evolutionary issues in the evolution of expression profiles:
–Physical clustering of co-expressed genes
–Divergence of duplicated genes

Physical clustering of co-expression
Caenorhabditis elegans: Roy et al. (2002) Nature 418, 975; Lercher et al. (2003) Genome Research 13, 238
Drosophila melanogaster: Boutanaev et al. (2002) Nature 420, 666; Spellman and Rubin (2002) J Biology 1, 5
Homo sapiens: Caron et al. (2001) Science 291, 1289; Lercher et al. (2002) Nature Genetics 31, 180
Saccharomyces cerevisiae: Cohen et al. (2000) Nature Genetics 26, 183; Hurst et al. (2002) Trends in Genetics 18, 604; Mannila et al. (2002) Bioinformatics 18, 482
What are the proximate explanations?
–Shared cis-regulatory elements
–Chromatin packaging, etc.
What are the ultimate explanations?
–Adaptive: greater transcriptional efficiency/accuracy?
–Maladaptive: mutational rain chipping away at insulators and other mechanisms that override regional controllers of gene expression?

Measuring expression distance
[Figure: gene expression profiles compared across library 1, library 2 and library 3]
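The slide does not spell out the distance metric, so the following is a hypothetical choice for illustration: each gene's profile is converted to log2 expression relative to its own mean across libraries, and the distance is the mean absolute difference between two such profiles. Under this choice, a distance of 3 corresponds to an average 8-fold difference in relative expression per library.

```python
import math


def relative_profile(ppm, pseudocount=1.0):
    """log2 expression in each library relative to the gene's own mean across libraries."""
    values = [v + pseudocount for v in ppm]
    mean = sum(values) / len(values)
    return [math.log2(v / mean) for v in values]


def expression_distance(ppm_a, ppm_b, pseudocount=1.0):
    """Mean absolute log2 difference between two genes' relative profiles (hypothetical metric)."""
    ra = relative_profile(ppm_a, pseudocount)
    rb = relative_profile(ppm_b, pseudocount)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / len(ra)


# Same shape at different overall levels: distance ~0 (pseudocount disabled here).
print(round(expression_distance([10, 20, 40], [20, 40, 80], pseudocount=0), 6))
# 0.0
# Opposite tissue preference, 8-fold each way: distance 3.
print(round(expression_distance([1, 8], [8, 1], pseudocount=0), 6))
# 3.0
```

The pseudocount keeps libraries with zero counts from producing log2(0).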

Clustering of tissue-specific expression
[Figure: tissue-specific expression along chromosome 1; flower (red), silique (violet), leaf (green), root (blue), callus (white)]

Statistical tests of co-expression clustering
Measured the median pairwise expression distance (MPED) in non-overlapping windows of 20 genes:
–Summed unique Class 1 and 2 signatures for each gene
–Counted only one gene within each tandemly arrayed family
Out of 100 shuffles of gene order:
–Zero shuffles had as many windows with small MPED (less than 1.5) as the unshuffled data
–Zero shuffles had as large a variance in MPED among windows as the unshuffled data
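A minimal sketch of this shuffle test, assuming a user-supplied pairwise distance function and a gene list already reduced to one gene per tandem family. When none of 100 shuffles is as extreme as the observed order, the empirical p-value is roughly 0.01 (for example, (0 + 1)/(100 + 1) ≈ 0.0099). Function names and toy data are hypothetical.

```python
import random
import statistics
from itertools import combinations


def window_mpeds(genes, distance, size=20):
    """Median pairwise expression distance (MPED) in each non-overlapping window of `size` genes."""
    return [
        statistics.median(distance(a, b) for a, b in combinations(genes[i:i + size], 2))
        for i in range(0, len(genes) - size + 1, size)
    ]


def shuffle_test(genes, distance, size=20, small=1.5, shuffles=100, seed=0):
    """Count shuffles of gene order at least as extreme as the observed order.

    Returns (shuffles with >= as many small-MPED windows, shuffles with >= MPED variance).
    """
    observed = window_mpeds(genes, distance, size)
    obs_small = sum(m < small for m in observed)
    obs_var = statistics.pvariance(observed)
    rng = random.Random(seed)
    n_small = n_var = 0
    for _ in range(shuffles):
        order = list(genes)
        rng.shuffle(order)
        perm = window_mpeds(order, distance, size)
        n_small += sum(m < small for m in perm) >= obs_small
        n_var += statistics.pvariance(perm) >= obs_var
    return n_small, n_var


# Toy data: 3 runs of 20 neighboring genes, each run sharing one expression value.
# Every observed window has MPED 0, so the variance criterion is uninformative here.
genes = [0.0] * 20 + [10.0] * 20 + [20.0] * 20
print(shuffle_test(genes, distance=lambda a, b: abs(a - b)))
```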

Coexpression in Arabidopsis

Selection and recombination
In regions of low recombination:
–deleterious mutations can hitch-hike to high frequency along with favorable ones
–favorable mutations are kept at low frequency by linkage to deleterious ones
Therefore, the effectiveness of natural selection depends on the local recombination rate.
Are clusters more concentrated in regions of
–high recombination (i.e. are they adaptive?), or
–low recombination (i.e. are they maladaptive?)

Measuring recombination rate
[Figure: recombination rate along chromosome 1]
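The slide does not state how recombination rate was estimated. One common approach, assumed here, is to compare the genetic (cM) and physical (bp) positions of mapped markers and take the slope, in cM/Mb, between adjacent markers. The marker data in the example are made up.

```python
def interval_rates(markers):
    """Recombination rate (cM/Mb) between adjacent mapped markers.

    markers: [(physical_position_bp, genetic_position_cM), ...] along one chromosome.
    Returns [(interval_midpoint_bp, cM_per_Mb), ...].
    """
    markers = sorted(markers)
    rates = []
    for (bp1, cm1), (bp2, cm2) in zip(markers, markers[1:]):
        if bp2 == bp1:
            continue  # co-located markers give no physical distance
        rates.append(((bp1 + bp2) / 2, (cm2 - cm1) / ((bp2 - bp1) / 1e6)))
    return rates


# Made-up markers: 10 cM over the first 2 Mb, then only 2 cM over the next 4 Mb.
print(interval_rates([(0, 0.0), (2_000_000, 10.0), (6_000_000, 12.0)]))
# [(1000000.0, 5.0), (4000000.0, 0.5)]
```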

Co-expression is greater in low recombination regions
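A minimal sketch of the comparison behind this result, assuming each window already has a recombination rate and an MPED. Windows are split at the median rate; a lower mean MPED in the low-recombination half means neighboring genes are more similar in expression there. The median split is an illustrative choice, not necessarily the test used in the talk, and the window values are made up.

```python
import statistics


def compare_by_recombination(windows):
    """Mean MPED in low- vs. high-recombination windows, split at the median rate.

    windows: [(recombination_rate_cM_per_Mb, mped), ...], one entry per window.
    Lower MPED means neighboring genes have more similar expression profiles.
    """
    cutoff = statistics.median(rate for rate, _ in windows)
    low = [m for rate, m in windows if rate <= cutoff]
    high = [m for rate, m in windows if rate > cutoff]
    if not high:
        raise ValueError("all windows share the same recombination rate")
    return statistics.mean(low), statistics.mean(high)


# Made-up windows in which co-expression (low MPED) tracks low recombination.
low_mped, high_mped = compare_by_recombination([(1.0, 1.2), (2.0, 1.4), (6.0, 2.0), (8.0, 2.4)])
print(low_mped < high_mped)
# True
```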

Co-expression clusters
–MPSS data provide evidence for clusters of co-expression among unrelated genes in Arabidopsis
–Co-expression is greater in regions of low recombination
–Thus, co-expression clusters may be maladaptive, at least on average

Outline
Arabidopsis gene expression (MPSS)
Two evolutionary issues in the evolution of expression profiles:
–Physical clustering of co-expressed genes
–Divergence of duplicated genes

Divergence of duplicated genes
[Figure: expression distance vs. age of duplication]

Duplicated genes in Arabidopsis

Modes of gene duplication
–Tandem (unequal crossing-over)
–Dispersed (transposition)
–Segmental (polyploidy)

Divergence of duplicated genes
–All two-member gene families in Arabidopsis were classified as ‘dispersed’, ‘segmental’ or ‘tandem’
–Expression distance was calculated for each pair
–The number of silent (i.e. synonymous) substitutions per site was calculated for each pair, as a proxy for age since duplication
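The talk does not say how silent substitutions per site were estimated. Assuming the synonymous differences and synonymous sites have already been tallied (for example by a Nei–Gojobori-style count, not shown), the Jukes–Cantor correction K = −(3/4) ln(1 − 4p/3) turns the observed proportion p into an estimate of substitutions per site (Ks) that accounts for multiple hits. The correction is undefined as p approaches 0.75, which is why very old duplicates are hard to date.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits (Jukes-Cantor)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ks(synonymous_differences: float, synonymous_sites: float) -> float:
    """Synonymous substitutions per synonymous site (Ks), from pre-counted differences and sites."""
    return jukes_cantor(synonymous_differences / synonymous_sites)


print(round(ks(30, 100), 4))  # observed p = 0.30 becomes Ks of about 0.38
# 0.3831
```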

Divergence and mode of duplication

Divergence of duplicated genes
–Almost all expression divergence occurs during (or immediately following) duplication
–Initial expression divergence is more extreme for tandem than for dispersed duplicates
–Tandem and dispersed duplicates with the most divergent expression profiles are quickly lost
–Segmental duplicates plateau at a lower level of expression divergence than dispersed duplicates
–The average divergence in relative expression level in each tissue is about 8-fold

Lessons learned
–Clusters of co-expression in Arabidopsis may be largely the result of a rain of weakly deleterious mutations that homogenize the expression profiles of neighboring genes
–Divergence in expression profile between duplicated genes depends on the kind of mutation that gave rise to the duplication

Thanks!
UNC Chapel Hill: Jianhua Hu
University of Delaware: Blake Meyers
NSF Plant Genome Research Program: DBI (TJV), DBI (BCM)