Putting gene family evolution in its chromosomal context Todd Vision Department of Biology University of North Carolina at Chapel Hill.


1 Putting gene family evolution in its chromosomal context Todd Vision Department of Biology University of North Carolina at Chapel Hill

2 Abstract: In complex genomes, the continual duplication, functional divergence, and loss of genes over time result in gene content divergence among related lineages. In addition to changes in content, the order of genes within the genome can be disturbed by a host of different rearrangement events. Changes in gene content and order are of interest for a number of reasons. Such mutations, particularly those that affect gene content, may, as a class, have dramatic phenotypic consequences; thus, they merit study from a functional perspective. In order to predict the location of genes in non-model organisms using comparative mapping, molecular breeders will need better models for how gene content and order evolve. And from an evolutionary perspective, it is of interest to understand how closely gene content and gene order are directly governed by selective forces, and what other forces are at work. Here, I describe what we currently know about the evolution of gene content and order among the flowering plants. This clade contains all of the world's major food crops, and is thus the focus of a great deal of comparative mapping effort. I will offer my thoughts on what computational biology has to contribute to this emerging area of inquiry.

3 Outline
• Gene order rearrangement in plants
  - Chromosomal perspective
  - Gene family perspective
• Gene duplication and functional divergence
  - Segmental duplications as a tool

4

5 Chromosomal perspective
• Biological importance
  - Clustering of gene function
  - Clustering of transcriptional activity
• Applied importance
  - Conservation of gene order (synteny)

6 Devos and Gale 2000 Plant Cell 12, 637

7 Arabidopsis as a hub for plant comparative maps Arumuganathan and Earle 1991 Plant Mol Biol Rep 9, 208.

8 Arabidopsis paleopolyploidy The Arabidopsis Genome Initiative 2000 Nature 408, 796

9 Non-overlapping syntenies

10 Blanc et al. 2003 Genome Res. 13, 137.

11 Blanc and Wolfe 2004 Plant Cell 16, 1667.

12 Tomato-Arabidopsis synteny Bancroft 2001 TIG 17, 89 after Ku et al. 2000 PNAS 97, 9121.

13 Mayer et al. 2001 Genome Res. 11, 1167. Rice-Arabidopsis microsynteny

14

15 Hidden syntenies Simillion et al. 2002 PNAS 99, 13627.

16 Interspecies comparison can reveal hidden syntenies Vandepoele et al. 2002 TIG 18, 606.

17 Simillion et al. 2004 Genome Res. 14, 1095

18 From descriptive to predictive
• Can we predict the gene content of homologous segments when markers are sparse?
• Utility for QTL mapping
  - Prioritize candidate genes in a QTL region from a non-sequenced genome
  - Provide markers for fine-mapping

19 Hidden Markov Models (HMM)
Hidden states: 1, 2, end
Emission probabilities: p_1(a), p_1(b), p_2(a), p_2(b)
Transition probabilities: t_{1,1}, t_{1,2}, t_{2,2}, t_{2,end}
Observed states: a -> b -> a
Hidden states: 1 -> 1 -> 2 -> end
Probability: p_1(a) t_{1,1} p_1(b) t_{1,2} p_2(a) t_{2,end}
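The path probability on this slide can be checked with a few lines of Python. This is only a minimal sketch: the numeric emission and transition values are hypothetical, chosen for illustration, and the path is assumed to start in state 1 with probability 1, since the slide's product has no initial-state term.

```python
# Hypothetical parameter values; the slide gives only the symbols.
emission = {1: {"a": 0.7, "b": 0.3}, 2: {"a": 0.4, "b": 0.6}}
transition = {1: {1: 0.6, 2: 0.4}, 2: {2: 0.8, "end": 0.2}}


def path_probability(observed, hidden):
    """Joint probability of emitting `observed` along the `hidden` path.

    `hidden` has one more entry than `observed`: the state visited after
    the last emission (here the absorbing "end" state).
    """
    if len(hidden) != len(observed) + 1:
        raise ValueError("hidden path must be one state longer than observed")
    p = 1.0
    for i, symbol in enumerate(observed):
        state, next_state = hidden[i], hidden[i + 1]
        # Emit the symbol from the current state, then move to the next state.
        p *= emission[state][symbol] * transition[state][next_state]
    return p


# Observed a -> b -> a, hidden 1 -> 1 -> 2 -> end, as on the slide:
# p_1(a) t_{1,1} p_1(b) t_{1,2} p_2(a) t_{2,end}
p = path_probability(["a", "b", "a"], [1, 1, 2, "end"])
```

With these illustrative values the product is 0.7 × 0.6 × 0.3 × 0.4 × 0.4 × 0.2.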

20 A gene content HMM
• Observed states: a homologous gene is either observed or not
• Hidden states: presence or absence of gene within a segment
• Emission probabilities
  - A gene will be unobserved if it is not present
  - A gene may be unobserved even if it is present
  - Dependent on the density of the gene map
• Transition probabilities reflect conservation of gene content along the branches of a phylogeny
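The emission rules above imply that an unobserved gene is still informative. A minimal sketch of that reasoning follows, assuming a single hypothetical parameter `detect` (the chance that a present gene appears on the map, standing in for map density) and a hypothetical prior probability of presence. Neither name comes from the talk.

```python
def emission_prob(observed: bool, present: bool, detect: float) -> float:
    """P(observation | hidden state) under the slide's two emission rules."""
    if not present:
        # An absent gene is never observed.
        return 0.0 if observed else 1.0
    # A present gene is observed with probability `detect` (map density).
    return detect if observed else 1.0 - detect


def posterior_present(observed: bool, prior_present: float, detect: float) -> float:
    """Bayes' rule: P(gene present | observation) for one gene in one segment."""
    num = prior_present * emission_prob(observed, True, detect)
    den = num + (1.0 - prior_present) * emission_prob(observed, False, detect)
    return num / den


# With a sparse map (detect = 0.2), failing to observe a gene lowers the
# probability of presence only modestly from the 0.5 prior.
sparse = posterior_present(False, prior_present=0.5, detect=0.2)
dense = posterior_present(False, prior_present=0.5, detect=0.9)
```

In the full model the prior would come from the transition probabilities along the segment phylogeny rather than from a fixed number.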

21 Transition probabilities and the segment phylogeny

22 Transition models between presence (P) and absence (A, or A1 and A2) states: Loss (L), Loss-Gain (LG), Multiple Loss-Gain (MLG) [state-transition diagrams]

23 Estimating model parameters
• Segment phylogeny
  - Each set of homologous genes is missing from some segments
  - Estimate an “averaged” distance matrix
  - Build tree with neighbor-joining and midpoint rooting
• HMM parameter estimation
  - Loss rate(s)
  - Gain rate
  - Number of genes present at the root

24 Do parameter estimates converge?
LG model, n=100 genes, no missing data, parameter 1 = 0.1, parameter 2 = 0.3, 1000 replicates
Initial value   Estimate 1   SE      Estimate 2   SE
0.05            0.106        0.006   0.294        0.018
0.3             0.106        0.006   0.294        0.018

25 Accuracy of hidden state assignments 5 segment phylogeny,  =  1 =0.1,  2 =0.3,  =0.1, 24% gain

26 A large multiplicon: 12 segments from rice and Arabidopsis, 56 sets of homologous genes. Vandepoele et al. 2003 Plant Cell 15, 2192.

27 Self-validation test

28 Probability of gene presence (8 longest segments)
Branch lengths scaled so that longest branch is 1.0
Estimate of  = 0.7
Segment   True    Estimate   Diff
1         0.251   0.173      +0.078
2         0.225   0.166      +0.059
3         0.262   0.171      +0.091
4         0.149   0.175      -0.026
5         0.268   0.171      +0.097
6         0.233   0.167      +0.066
7         0.226   0.170      +0.056
8         0.148   0.168      -0.020

29 Summary: gene content HMM
• Multispecies comparative maps
  - Becoming more common
  - Most species only partially characterized
  - Usefulness also compromised by sparse synteny
• Probabilistic models will allow us to move from simple descriptions of the extent of synteny to predictive tools that can guide further experiments

30 Gene family perspective
• Modes of duplication
  - Tandem (T)
  - Dispersed (D)
  - Segmental (S)

31 A tale of two sisters: the ARF and the Aux/IAA gene families
• Modulate whole plant response to auxin
• Interact via dimerization
  - ARFs are transcription factors
  - Aux/IAAs bind and repress ARFs in the absence of auxin

32 Diversification of ARFs Remington et al. 2004 Plant Physiol. 135, 1738

33 The chromosomal context Remington et al. 2004 Plant Physiol. 135, 1738

34 Diversification of the Aux/IAAs Remington et al. 2004 Plant Physiol. 135, 1738

35

36 Why the different patterns of diversification?
• 12% (ARF) vs 40% (Aux/IAA) segmental duplications
• Presumably reflects differential retention
• Possible explanations
  - Dosage requirements
  - Coevolution with other interacting genes
  - Regional transcriptional regulation

37 How typical is the Aux/IAA family? Cannon et al. 2004 BMC Plant Biology 4, 10.
Gene family                           Genes   S events
Proteasome alpha & beta subunits      23      9
Ser/Thr phosphatase                   26      10
Ras related GTP-binding               72      19
Auxin-independent growth promoter     33      8
Major intrinsic protein               38      10
Calmodulin                            79      20
Phosphatidylcholine transferase       30      8
Cation/hydrogen exchanger             28      8

38 Blanc and Wolfe 2004 Plant Cell 16, 1679. Segmental duplication of pathways?

39 Summary: gene family perspective
• Chromosomal context can matter
• Gene families differ in their patterns of duplicate gene proliferation
  - Presumably due to differential retention
• Polyploidy
  - Qualitatively differs from other gene duplication modes
  - Divergence of whole pathways possible

40 Functional divergence and chromosomal context: do patterns of divergence (i.e. spatiotemporal expression) differ among T, D, and S duplicates?

41 Retention of duplicated genes
• Neofunctionalization (NF)
  - Mutations lead to new divergent functions that are positively selected
• Subfunctionalization (SF)
  - Mutations knock out ancestral functions and make both copies indispensable
  - New divergent functions evolve secondarily
  - SF more likely for tandem than dispersed pairs (due to linkage)
• There are other possibilities
  - Duplicates retained when higher expression is favored

42 Divergence of duplicated genes (plot axes: age of duplication; divergence in expression profile)

43 Duplicate pairs in yeast and human (Gu et al. 2002, Makova and Li 2003)
• Approx. 50% of pairs diverge very rapidly
• Proportion of divergent pairs increases with synonymous substitutions (Ks)
• Less so with replacement changes (Ka)
  - Plateaus at Ka ~0.3 in human
• In humans, distantly related pairs with conserved expression tend to be either ubiquitous or very tissue specific

44 Digital expression profiling
• Massively Parallel Signature Sequencing (MPSS)
  - Count occurrence of 17-20 bp mRNA signatures
  - Cloning and sequencing is done on microbeads
  - Similar to Serial Analysis of Gene Expression (SAGE)
• “Bar-code” counting reduces concerns of
  - cross-hybridization
  - probe affinity
  - background hybridization
• Which enables
  - Accurate counts of low expression genes
  - Distinguishing expression profiles of duplicate genes

45 MPSS technology Brenner et al. 2000 PNAS 97:1665. Sort by FACS and deposit in channeled monolayer Clone 3’ ends of transcripts to microbeads Sequence 17-20 bp from 5’ end by hybridization

46 MPSS Data
Signature            Frequency
GATCAATCGGACTTGTC    2
GATCGTGCATCAGCAGT    53
GATCCGATACAGCTTTG    212
GATCTATGGGTATAGTC    349
GATCCATCGTTTGGTGC    417
GATCCCAGCAAGATAAC    561
GATCCTCCGTCTTCACA    672
GATCACTTCTCTCATTA    702
GATCTACCAGAACTCGG    814
...                  ...
GATCGGACCGATCGACT    2,935
Total # of tags: >1,000,000
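A signature/frequency table like this one is a tally of identical tags, and frequencies become comparable across libraries once scaled to parts per million (ppm). The sketch below shows that bookkeeping under stated assumptions: the function name and the toy tag list are hypothetical, and the real MPSS pipeline may filter or normalize differently.

```python
from collections import Counter


def tags_per_million(signatures):
    """Count identical signatures and scale each count to parts per million."""
    counts = Counter(signatures)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {sig: 1_000_000 * n / total for sig, n in counts.items()}


# Toy library of four sequenced tags (real libraries hold over a million).
toy_library = ["GATCAATCGGACTTGTC"] * 3 + ["GATCGTGCATCAGCAGT"]
ppm = tags_per_million(toy_library)
```

Scaling to ppm lets libraries of different sequencing depth share a single expression threshold.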

47 Classifying signatures
Figure annotations: Typical signatures; Potential alternative splicing or nested gene; Potential alternative termination; Potential un-annotated ORF; Potential anti-sense transcript; Anti-sense transcript or nested gene?; Duplicated: expression may be from other site in genome
Triangles refer to colors used on our web page:
Class 1 - in an exon, same strand as ORF.
Class 2 - within 500 bp after stop codon, same strand as ORF.
Class 3 - anti-sense of ORF (like Class 1, but on opposite strand).
Class 4 - in genome but NOT class 1, 2, 3, 5 or 6.
Class 5 - entirely within intron, same strand.
Class 6 - entirely within intron, anti-sense.
Class 0 - signatures found in the expression libraries but not the genome.
Grey = potential signature NOT expressed
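The class definitions above translate fairly directly into a decision rule. The following is a rough sketch only, not the classifier used for the MPSS web site: the `Gene` record, its coordinate conventions (1-based, inclusive, with `end` taken as the last base of the stop codon on the forward strand), and the handling of a single gene at a time are all simplifying assumptions.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    """Hypothetical gene model; coordinates are 1-based and inclusive."""
    start: int    # leftmost base of the ORF on the chromosome
    end: int      # rightmost base of the ORF on the chromosome
    strand: str   # "+" or "-"
    exons: list   # list of (start, end) tuples


def classify(sig_start, sig_end, sig_strand, in_genome, gene):
    """Return the signature class (0-6) relative to one gene model."""
    if not in_genome:
        return 0  # seen in expression libraries but absent from the genome
    same = sig_strand == gene.strand
    if any(sig_start <= e and s <= sig_end for s, e in gene.exons):
        return 1 if same else 3  # overlaps an exon: sense or anti-sense
    if gene.start <= sig_start and sig_end <= gene.end:
        return 5 if same else 6  # inside the gene but in no exon: intronic
    if same:
        # Within 500 bp downstream of the stop codon, on the ORF's strand.
        if gene.strand == "+" and gene.end < sig_start <= gene.end + 500:
            return 2
        if gene.strand == "-" and gene.start - 500 <= sig_end < gene.start:
            return 2
    return 4  # in the genome but none of the above


example_gene = Gene(1000, 2000, "+", [(1000, 1300), (1600, 2000)])
```

A genome-wide version would also need to flag signatures that match more than one site, which the slide marks as duplicated.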

48 Core Arabidopsis MPSS libraries sequenced by Lynx for Blake Meyers, U. of Delaware
Library   Signatures sequenced   Distinct signatures
Root      3,645,414              48,102
Shoot     2,885,229              53,396
Flower    1,791,460              37,754
Callus    1,963,474              40,903
Silique   2,018,785              38,503
TOTAL     12,304,362             133,377

49 http://www.dbi.udel.edu/mpss Query by Sequence Arabidopsis gene identifier chromosomal position BAC clone ID MPSS signature Library comparison Site includes Library and tissue information FAQs and help pages

50 Genome-wide MPSS profile in Arabidopsis: of the 29,084 gene models, 17,849 match unambiguous, expressed class 1 and/or 2 signatures (panels: Chr. I to Chr. V)

51 Dataset of duplicate pairs
• Arabidopsis gene families of size 2 classified as
  - Dispersed (280)
  - Segmental (149)
  - Tandem (63)
• For each pair
  - Measured similarity/distance in expression profile
  - Estimated silent (Ks) and replacement (Ka) changes

52 Expression distance (plot axes: library 1, library 2, library 3)

53 Major findings
• Many pairs are divergent in sequence but not expression and vice versa
• Pairs have atypically high expression
  - Especially slowly evolving pairs
• Divergence increases with Ka
  - Particularly among S duplicates!
  - Divergence tends to be highly asymmetric

54 Expression level >5 ppm in x libraries
Libraries (x)   Genes in pairs   All genes
0               153 (15.5%)      4160 (23.3%)
1               124 (12.6%)      2643 (14.8%)
2               73 (7.4%)        1727 (9.6%)
3               93 (9.5%)        1777 (10.0%)
4               109 (11.1%)      1930 (10.8%)
5               432 (43.9%)      5612 (31.4%)

55

56 d N = 0.48 + 0.37 × Ka, p < 0.0001

57 Asymmetric divergence
Type of pair                A            B             C            D
Young Dispersed (Ks ≤ 0.5)  14 (15.7%)   61 (68.5%)    8 (9.0%)     6 (6.7%)
Young Tandem (Ks ≤ 0.5)     8 (14.3%)    29 (51.8%)    10 (17.9%)   9 (16.1%)
Old Dispersed (Ks > 0.5)    35 (18.3%)   111 (58.1%)   24 (12.6%)   21 (11.0%)
Segmental (All)             31 (20.8%)   104 (69.8%)   7 (4.7%)     7 (4.7%)
A: Each copy has higher expression in at least one library
B: One copy has higher expression in all libraries that differ and at least two libraries differ
C: Copies differ in expression in only one library
D: Copies do not differ in expression in any libraries
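The four categories can be written as a short decision rule over per-library expression values. This is a sketch under assumptions: the talk does not say how two copies are judged to "differ" in a library, so the `two_fold` test below (at least a two-fold ratio and more than 5 ppm apart) is a hypothetical stand-in, as are the function names.

```python
def two_fold(x, y):
    """Hypothetical difference test: >= 2-fold ratio and > 5 ppm apart."""
    return max(x, y) >= 2 * min(x, y) and abs(x - y) > 5


def classify_pair(expr1, expr2, differs=two_fold):
    """Assign a duplicate pair to category A, B, C or D.

    `expr1` and `expr2` hold the expression (e.g. ppm) of each copy in the
    same ordered set of libraries.
    """
    higher1 = higher2 = 0
    for x, y in zip(expr1, expr2):
        if differs(x, y):
            if x > y:
                higher1 += 1
            else:
                higher2 += 1
    n_differ = higher1 + higher2
    if n_differ == 0:
        return "D"  # no library differs
    if n_differ == 1:
        return "C"  # exactly one library differs
    if higher1 and higher2:
        return "A"  # each copy is higher somewhere
    return "B"      # one copy is higher in every differing library


# Five libraries, as in the core Arabidopsis MPSS set.
category = classify_pair([100, 10, 50, 50, 50], [10, 100, 50, 50, 50])
```

Category B corresponds to the asymmetric pattern that dominates every row of the table.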

58 Why put gene family evolution into a chromosomal context?
• We can begin to understand and utilize patterns of evolution in gene order
• We can gain insight into the function and evolution of gene families that are not apparent from beanbag genomics

59 Thanks to: Zongli Xu David Remington Jason Reed Tom Guilfoyle Blake Meyers NSF

