The gene family play and the chromosomal theater Todd Vision Department of Biology University of North Carolina at Chapel Hill.


1 The gene family play and the chromosomal theater Todd Vision Department of Biology University of North Carolina at Chapel Hill

2 Outline
- Large-scale duplication and loss of genes in the angiosperms
- Looking into the future of plant phylogenomics
- A case study in gene family demography
- Duplication and functional divergence

3 Paul Franz, University of Amsterdam

4 Arabidopsis as a hub for plant comparative maps. Data from Arumuganathan & Earle (1991) Plant Mol Biol Rep 9:208-218

5 Tomato-Arabidopsis synteny Bancroft (2001) TIG 17, 89 after Ku et al (2000) PNAS 97, 9121

6 Duplicated genes in Arabidopsis

7 Modes of gene duplication
- Tandem (T): unequal crossing-over; mostly young
- Dispersed (D): transposition; all ages
- Segmental (S): polyploidy; all old

8 Paleotetraploidy? The Arabidopsis Genome Initiative. 2000. Nature 408:796

9 Vision et al. (2000) Science 290:2114-7.

10 Microsynteny within blocks

11 Distribution of d_A
- Problems
  - Proteins diverge at different rates
  - High d_A is difficult to estimate
- Solution
  - Average d_A within blocks
[Figure legend: in blocks; not in blocks]
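The block-averaging step named on this slide can be sketched in Python. This is only an illustration, not the authors' code: the function name `block_mean_distance`, the input layout (a list of `(block_id, d_A)` tuples), and all numbers are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def block_mean_distance(pairs):
    """Average the per-pair protein distance d_A within each duplicated block.

    `pairs` is an iterable of (block_id, d_a) tuples (hypothetical layout).
    """
    by_block = defaultdict(list)
    for block_id, d_a in pairs:
        by_block[block_id].append(d_a)
    return {block: mean(values) for block, values in by_block.items()}

# Made-up distances for two blocks
example = [
    ("block1", 0.40), ("block1", 0.60),
    ("block2", 0.90), ("block2", 1.10), ("block2", 1.00),
]
block_means = block_mean_distance(example)
```

Averaging within a block pools pairs that presumably arose in the same duplication event, so fast- and slow-evolving proteins partly cancel out.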

12 Discrete duplication events [Figure: duplication events A, B, D, C, E, F on a time axis of 0-200 Mya; lineages labeled monocots (rice), Asterids (tomato), Rosids (Arabidopsis); intervals labeled 110-160 Mya and 160-240 Mya]

13 the 2-4 complex (one ancestral segment broken up by 4 large inversions)

14 coefficient of variation = 0.67 coefficient of variation = 0.53

15 Mayer et al. (2001) Genome Res. 11, 1167 Rice-Arabidopsis microsynteny

16 Blanc, Hokamp, Wolfe (2003) Genome Res. 13, 137-144.

17 [Figure labels: Arabidopsis; Rice; Arabidopsis duplication]

18 Block 37: after Asterid-Rosid split. Block 57: before monocot-dicot divergence. Raes, Vandepoele, Saeys, Simillion, Van de Peer (2003) J. Struct. Func. Genomics 3, 117-129

19 Divergence among duplicated genes in rice Goff et al. (2002) Science 296: 92

20 Hidden syntenies Simillion, Vandepoele, Van Montagu, Zabeau, Van de Peer (2002) PNAS 99, 13627

21 Interspecies comparison can reveal hidden syntenies Vandepoele, Simillion, Van de Peer (2002) TIG 18, 606-608

22 Comparative mapping in a phylogenetic context

23 Major plant genome datasets
Family / Genus, with an X for each dataset type available (genome, EST, map)
Aizoaceae
  Mesembryanthemum crystallinum  X
Brassicaceae
  Arabidopsis thaliana  X X X
  Brassica spp.  X
Fabaceae
  Glycine max  X X
  Medicago truncatula  X X
  Phaseolus spp.  X
Malvaceae
  Gossypium arboreum  X X
Solanaceae
  Capsicum annuum  X
  Lycopersicon esculentum  X X
  Solanum tuberosum  X X
Poaceae
  Hordeum vulgare  X X
  Oryza sativa  X X X
  Sorghum bicolor/propinquum  X X
  Triticum aestivum  X X
  Zea mays  X X
Other
  Beta vulgaris  X
  Chlamydomonas reinhardtii  X X
  Pinus taeda  X X
  Populus spp.  X
  Prunus spp.  X

24 Plant unigene datasets
species         TIGR     PlantGDB
barley          49885    74621
beet            na       13565
chlamydomonas   30296    na
citrus          na       4266
coffee          na       392
cotton          24350    27854
grape           49885    74621
iceplant        8455     8945
lettuce         21960    na
lotus           11025    na
maize           55063    71655
marchantia      na       1059
medicago        36976    43384
oat             na       361
onion           11726    na
pine            26882    24668
poplar          na       20935
potato          24275    24839
rice            60778    52156
rye             5199     5384
sorghum         33273    34363
soybean         67826    73946
sunflower       20520    na
tomato          31012    35725
wheat           109509   95949
+ Arabidopsis   27170

25 Wikström et al (2001) Proc R Soc Lond B 268, 2211

26 Plant phylogenomics: Phytome
- The goal is to integrate
  - Organismal phylogeny
  - Gene family sequence, alignment, phylogeny
  - Genetic and physical maps

27 Some uses for Phytome
- Starting with a chromosome segment
  - Identify homologous segments
  - Predict unobserved gene content (candidate QTL)
- Starting with a gene family
  - Resolve orthology/paralogy relationships
  - Identify coevolving families
- Starting with a species
  - Explore lineage-specific diversification
  - Guide comparative mapping wet-work

28 Current pipeline [Flowchart labels: Homolog identification; Multiple sequence alignment; Protein sequence prediction; Protein family clustering; Phylogenetic inference; Unigene collections; Annotations; Phytome]

29

30 Lineage-specific diversification [Figure labels: Arabidopsis, Cotton, Medicago, Tomato, Rice; values: 1033, 436173, 334, 696, 836, 715, 919] 152 genes are “single copy” in all four species

31 A tale of two sisters: the ARF and the Aux/IAA gene families
- Modulate whole-plant response to auxin
- Interact via dimerization
  - ARFs are transcription factors
  - Aux/IAAs bind and repress ARFs in the absence of auxin

32 The chromosomal context

33 Diversification of ARFs

34 Diversification of the Aux/IAAs

35

36 Why the different patterns of diversification?
- 12% (ARF) vs 40% (Aux/IAA) segmental duplications
- Presumably reflects differential retention
- Possible explanations
  - Dosage requirements
  - Coevolution with other interacting genes
  - Regional transcriptional regulation

37 Divergence of duplicated genes [Figure axes: age of duplication; divergence in expression profile]

38 Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)
- Approx. 50% of pairs diverge very rapidly
- Proportion of divergent pairs increases with Ks and Ka
  - Plateaus at Ka ~0.3 in human
- In humans
  - Immune response genes are over-represented among young, divergent pairs
  - Distantly related pairs with conserved expression tend to be either ubiquitous or very tissue specific

39 Retention of duplicated genes
- Nonfunctionalization, or loss of one copy
  - The fate of most pairs
- Neofunctionalization (NF)
  - Positive selection on a new mutation can maintain the pair
- Subfunctionalization (SF)
  - Mutations that increase the specificity of duplicates can fix due to drift, provided that, combined, the two copies provide the functionality of the ancestral gene.
  - Once SF happens, both copies are indispensable and are retained.
  - One prediction of the model is that SF is more likely for tandem than for dispersed pairs (due to linkage)

40 Digital expression profiling
- Massively Parallel Signature Sequencing (MPSS)
  - Count occurrence of 17-20 bp mRNA signatures
  - Cloning and sequencing is done on microbeads
  - Similar to Serial Analysis of Gene Expression (SAGE)
- “Bar-code” counting reduces concerns of
  - cross-hybridization
  - probe affinity
  - background hybridization
- Advantages
  - Accurate counts of low expression genes
  - Can distinguish expression profiles of duplicate genes

41 MPSS library construction (Brenner et al., PNAS 97:1665-70)
- Extract mRNA from tissue
- Convert to cDNA
- Cut w/ Sau3A (GATC)
- Add linker
- PCR
  - 5’: add standard primer (added by cloning)
  - 3’: add unique 32 bp tag and standard primer
- Remove 3’ primer and expose single-stranded unique tag (digest, 3'→5' exonuclease)
- Anneal to beads coated with unique anti-tag (32 bp, complementary to tag on mRNA)

42 MPSS library construction (Brenner et al., PNAS 97:1665-70)
- The result of the library construction is a set of microbeads.
- Each bead contains many DNA molecules, all derived from the 3’ end of a single transcript.
- Beads are loaded in a monolayer on a microscope slide for the sequencing of 17-20 bp from the 5’ end.
- Sort by FACS to remove ‘empty’ beads

43 MPSS sequencing (Brenner et al., Nat. Biotech. 18:630-4)
- Steps of four bases; overhang is shifted by four bases in each round
- Add adaptors
- Sequence by hybridization: 16 cycles for 4 bp
- Digest with Type IIS enzyme to uncover next 4 bases
- Repeat cycle
[Figure labels: encoded adaptors XNNN (code X4), NXNN (code X3), NNXN (code X2), NNNX (code X1); decoder; RS; 9 bp; 13 bp]

44 MPSS sequencing
Each bead provides a signature of 17-20 bp.
Tag #    Signature sequence    # of beads (frequency)
1        GATCAATCGGACTTGTC     2
2        GATCGTGCATCAGCAGT     53
3        GATCCGATACAGCTTTG     212
4        GATCTATGGGTATAGTC     349
5        GATCCATCGTTTGGTGC     417
6        GATCCCAGCAAGATAAC     561
7        GATCCTCCGTCTTCACA     672
8        GATCACTTCTCTCATTA     702
9        GATCTACCAGAACTCGG     814
...
30,285   GATCGGACCGATCGACT     2,935
Two sets of signatures are generated from each sample in different reading frames staggered by two bases.
Total # of tags: >1,000,000
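The counting step behind this table can be sketched in Python: tally how many beads carry each signature, then scale to parts per million. The function name `signature_frequencies` and the toy bead list are assumptions for illustration. Normalizing by the library total is an assumption too, chosen because later slides report expression in ppm.

```python
from collections import Counter

def signature_frequencies(bead_signatures):
    """Return {signature: (bead_count, parts_per_million)} for one library.

    Assumes ppm is the bead count divided by the library total, times 1e6.
    """
    counts = Counter(bead_signatures)
    total = sum(counts.values())
    return {sig: (n, n * 1_000_000 / total) for sig, n in counts.items()}

# Toy library: three beads carrying two distinct signatures from the slide
beads = ["GATCAATCGGACTTGTC", "GATCGTGCATCAGCAGT", "GATCGTGCATCAGCAGT"]
freqs = signature_frequencies(beads)
```

Scaling to ppm makes libraries of different sequencing depth comparable.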

45 Classifying signatures
Triangles refer to colors used on our web page:
- Class 1 - in an exon, same strand as ORF.
- Class 2 - within 500 bp after stop codon, same strand as ORF.
- Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
- Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
- Class 5 - entirely within intron, same strand.
- Class 6 - entirely within intron, anti-sense.
- Class 0 - signatures found in the expression libraries but not the genome.
- Grey = potential signature NOT expressed
[Figure annotations: Typical signatures; Potential alternative splicing or nested gene; Potential alternative termination; Potential un-annotated ORF; Potential anti-sense transcript; Anti-sense transcript or nested gene?; Duplicated: expression may be from other site in genome]
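The class definitions on this slide can be written as a small decision function. This is a sketch under stated assumptions, not the code behind the MPSS web site: the `Hit` record, its field names, and the region labels are hypothetical, and "within 500 bp" is read as at most 500 bp.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Hit:
    """Genomic match of a signature relative to an annotated ORF (hypothetical record)."""
    region: str             # "exon", "intron", "downstream" or "other"
    same_strand: bool       # True if the signature lies on the ORF's strand
    bp_after_stop: int = 0  # only meaningful when region == "downstream"

def signature_class(hit: Optional[Hit]) -> int:
    """Map a genomic match to signature class 0-6 as defined on the slide."""
    if hit is None:
        return 0  # seen in the expression libraries but absent from the genome
    if hit.region == "exon":
        return 1 if hit.same_strand else 3
    if hit.region == "downstream" and hit.same_strand and hit.bp_after_stop <= 500:
        return 2
    if hit.region == "intron":
        return 5 if hit.same_strand else 6
    return 4  # in the genome, but none of classes 1, 2, 3, 5 or 6

example_class = signature_class(Hit(region="downstream", same_strand=True, bp_after_stop=120))
```

Class 4 is deliberately the fall-through case, mirroring its definition as everything in the genome that is not in another class.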

46 Core Arabidopsis MPSS libraries, sequenced by Lynx for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377

47 http://www.dbi.udel.edu/mpss
- Query by
  - Sequence
  - Arabidopsis gene identifier
  - Chromosomal position
  - BAC clone ID
  - MPSS signature
  - Library comparison
- Site includes
  - Library and tissue information
  - FAQs and help pages

48 Genome-wide MPSS profile in Arabidopsis. Of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures. [Figure panels: Chr. I, Chr. II, Chr. III, Chr. IV, Chr. V]

49 Dataset of duplicate pairs
- Gene families of size two in Arabidopsis, classified as
  - Dispersed (280)
  - Segmental (149)
  - Tandem (63)
- For each pair
  - Measure similarity/distance in expression profile
  - Estimate Ks and Ka

50 Expression distance [Figure axes: library 1, library 2, library 3]
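The slide treats each gene's expression as a point in library space and measures the distance between the two copies. The transcript does not say which distance was used, so the sketch below assumes one possibility: Euclidean distance between profiles rescaled to sum to 1. The function name `expression_distance` and the example counts are hypothetical.

```python
import math

def expression_distance(profile_a, profile_b):
    """Euclidean distance between two expression profiles scaled to unit total.

    The choice of metric and scaling is an assumption; the slide does not specify it.
    """
    def relative(profile):
        total = sum(profile)
        # An unexpressed gene stays at the origin rather than dividing by zero
        return [x / total for x in profile] if total else [0.0] * len(profile)

    return math.dist(relative(profile_a), relative(profile_b))

# Same shape at different overall levels versus expression in different libraries
same_shape = expression_distance([10, 0, 0], [20, 0, 0])
different = expression_distance([10, 0, 0], [0, 10, 0])
```

Rescaling first means two copies with the same tissue pattern but different overall levels count as similar. Whether that is desirable depends on the question being asked.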

51 The number of genes with >5 ppm expression in a given number of libraries, among the 984 genes in pairs analyzed and among all Arabidopsis genes with MPSS profiles.
Libraries   Genes in pairs   All genes
0           153 (15.5%)      4160 (23.3%)
1           124 (12.6%)      2643 (14.8%)
2           73 (7.4%)        1727 (9.6%)
3           93 (9.5%)        1777 (10.0%)
4           109 (11.1%)      1930 (10.8%)
5           432 (43.9%)      5612 (31.4%)
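A tabulation like the one on this slide can be produced by counting, for each gene, the libraries in which its expression exceeds 5 ppm. The sketch below is illustrative only: the function names, the gene labels, and the ppm values are made up, and only the 5 ppm threshold comes from the slide.

```python
from collections import Counter

def libraries_expressed(profile_ppm, threshold=5):
    """Number of libraries in which expression exceeds the threshold (in ppm)."""
    return sum(1 for ppm in profile_ppm if ppm > threshold)

def breadth_table(profiles, threshold=5):
    """Count genes by the number of libraries in which they are expressed."""
    return Counter(libraries_expressed(p, threshold) for p in profiles.values())

# Hypothetical profiles: ppm in callus, flower, leaf, root, silique
profiles = {
    "geneA": [0, 0, 0, 0, 0],
    "geneB": [12, 3, 40, 6, 0],
    "geneC": [100, 80, 9, 30, 60],
}
table = breadth_table(profiles)
```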

52 Asymmetry in levels of expression among libraries within pairs
Symmetry of divergence
Type of pair               A            B             C            D
Young
  Dispersed (Ks ≤ 0.5)     14 (15.7%)   61 (68.5%)    8 (9.0%)     6 (6.7%)
  Tandem (Ks ≤ 0.5)        8 (14.3%)    29 (51.8%)    10 (17.9%)   9 (16.1%)
Old
  Dispersed (Ks > 0.5)     35 (18.3%)   111 (58.1%)   24 (12.6%)   21 (11.0%)
  Segmental (All)          31 (20.8%)   104 (69.8%)   7 (4.7%)     7 (4.7%)
A: Each copy has higher expression in at least one library
B: One copy has higher expression in all libraries that differ and at least two libraries differ
C: Copies differ in expression in only one library
D: Copies do not differ in expression in any libraries
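The four categories A to D can be expressed as a short classification function. The slide does not say how two expression values were judged to "differ", so the sketch takes that test as a parameter. The default (any inequality) is a placeholder assumption, and the function name and example profiles are hypothetical.

```python
def asymmetry_class(copy1, copy2, differ=lambda x, y: x != y):
    """Classify a duplicate pair as A, B, C or D from per-library expression.

    `differ` decides whether two values count as different; the real criterion
    is not given on the slide, so the default is only a stand-in.
    """
    higher1 = sum(1 for x, y in zip(copy1, copy2) if differ(x, y) and x > y)
    higher2 = sum(1 for x, y in zip(copy1, copy2) if differ(x, y) and y > x)
    n_diff = higher1 + higher2
    if n_diff == 0:
        return "D"  # copies do not differ in any library
    if higher1 > 0 and higher2 > 0:
        return "A"  # each copy is higher in at least one library
    if n_diff == 1:
        return "C"  # copies differ in only one library
    return "B"      # one copy is higher in all (two or more) differing libraries

classes = [
    asymmetry_class([10, 0, 5], [0, 10, 5]),
    asymmetry_class([10, 10, 5], [0, 0, 5]),
    asymmetry_class([10, 5, 5], [0, 5, 5]),
    asymmetry_class([5, 5, 5], [5, 5, 5]),
]
```

Passing a stricter `differ`, such as a fold-change test, changes which libraries count as divergent without changing the category logic.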

53 $d_N = 0.48 + 0.37\,K_A$, $p < 0.0001$

54

55 Pairs with small Ks but dissimilar expression profiles.
Ks     Ka      dup   gene pair    callus flower leaf root silique
0.03   <0.01   D     AT1G80700    71591114094
                     AT1G80980    001817
0.17   0.05    T     AT2G46280    24621016030880
                     AT2G46290    282912916
0.20   0.06    T     AT2G15400    4145534
                     AT2G15430    421281413618
0.22   0.05    D     AT1G36280    1391310
                     AT4G18440    4087696951
0.26   0.05    T     AT1G71270    88564452107
                     AT1G71300    00001
0.27   0.07    T     AT3G13290    2022116
                     AT3G13300    2462457219277
0.27   0.10    T     AT1G29390    18238898165
                     AT1G29395    0635036
0.27   0.06    T     AT3G26070    161693460524
                     AT3G26080    34913414135
0.28   0.13    D     AT3G56190    21611514423956
                     AT3G56450    150641

56 Pairs with large Ks but similar expression profiles.
Ks     Ka     dup   gene pair    callus flower leaf root silique
0.87   0.28   T     AT3G16220    161057319
                    AT3G16230    2112351313
0.89   0.13   D     AT3G03660    140000
                    AT5G17810    710000
0.95   0.29   D     AT2G41180    571478429
                    AT3G56710    751539314
0.97   0.28   D     AT1G31814    239430
                    AT5G16320    05510198
0.98   0.23   D     AT5G07230    0344000
                    AT5G62080    0288000
0.99   0.26   D     AT3G22160    8661044
                    AT4G15120    342000

57 A closing thought
- 1965: The Ecological Theater and the Evolutionary Play, G. E. Hutchinson
- 2004: The Chromosomal Theater and the Gene Family Play
- Phylogenetics has a great deal to contribute to understanding the evolutionary interplay of genome structure and function

58 Dan Brown Brandon Gaut Steven Tanksley Liqing Zhang Jason Phillips Dihui Lu David Remington Jason Reed Tom Guilfoyle Blake Meyers NSF

