Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1


1 Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1
M. Madan Babu
MRC Laboratory of Molecular Biology, Cambridge

2 Overview of research Evolution of biological systems Evolution of transcriptional networks Evolution of networks within and across genomes Nature Genetics (2004) J Mol Biol (2006a) Evolution of transcription factors Nuc. Acids. Res (2003) Structure and dynamics of transcriptional networks Structure and function of biological systems Uncovering a distributed architecture in networks Methods to study network dynamics J Mol Biol (2006b) J Mol Biol (2006c) Discovery of novel DNA binding proteins Data integration, function prediction and classification Nature (2004) Nuc. Acids. Res (2005) Cell Cycle (2006) Discovery of transcription factors in Plasmodium Evolution of global regulatory hubs

3 Rcs1 – regulator of cell size 1
Micrographs and data from SCMD: S. cerevisiae wild type and S. cerevisiae Rcs1 mutant
The following parameters used to define cell size for the Rcs1 mutant were at least 2 standard deviations (2σ) from the mean values of the wild type:
-Roundness of mother cell: 1.29, 1.20
-Mother cell size: 874, 760
-Contour length of mother cell: 108, 100
-Long axis length of mother cell: 36, 33
-Short axis length of mother cell: 30, 27
The size of mutant cells is twice that of the parental strain
The critical size for budding in the mutant is similarly increased
Rcs1 binds specific DNA sequences
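The 2 standard deviation criterion on this slide can be illustrated with a small sketch. The slide does not give the wild-type standard deviations, so the `Parameter` class, the `deviates` function and every number in the usage note are hypothetical and only show the arithmetic, not the SCMD analysis itself.

```python
from dataclasses import dataclass


@dataclass
class Parameter:
    """One morphological parameter (hypothetical container, not an SCMD type)."""
    name: str
    wild_type_mean: float
    wild_type_sd: float
    mutant_value: float


def deviates(p: Parameter, threshold: float = 2.0) -> bool:
    """Return True if the mutant value lies at least `threshold`
    wild-type standard deviations away from the wild-type mean."""
    if p.wild_type_sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (p.mutant_value - p.wild_type_mean) / p.wild_type_sd
    return abs(z) >= threshold
```

For example, with made-up values, `deviates(Parameter("example", 100.0, 5.0, 111.0))` is true because the mutant value sits 2.2 standard deviations above the mean, while a mutant value of 105.0 (1 standard deviation) would not be flagged.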

4 Rcs1 is a global regulatory hub – Network analysis I
Transcriptional regulatory network in yeast
Distribution of DNA binding domains in yeast transcription factors (no. of members): C6-Fungal, C2H2-Zn, bZip, Homeo, Gata, bHLH, Fkh, Hsf, Apses, Myb, Mads, HMG1, LisH, Gcr1, Rcs1, Ace1, AT-Hook, Tig, Abf1, Tea, Ime1, Dal82, Tigger, P53
Sub-network of Rcs1 and Aft2, number of target genes regulated (Aft2p, Rcs1p): 123, 41, 314
Rcs1p and Aft2p are global regulatory hubs with an as yet uncharacterized DNA binding domain
How did the paralogous hubs that regulate distinct sets of genes evolve?

5 Relationship to WRKY DNA binding domain – Sequence analysis I
Non-redundant database + ....
Lineage-specific expansion in several fungi; also seen in lower eukaryotes:
-Candida albicans (ascomycete)
-Yarrowia lipolytica (ascomycete)
-Ustilago maydis (basidiomycete)
-Cryptococcus sp (basidiomycetes)
-E. cuniculi (microsporidia)
-Giardia lamblia (diplomonad)
-Dictyostelium discoideum
-Entamoeba histolytica
Non-redundant database + profiles and HMM of this region:
-WRKY domain (Arabidopsis)
-FAR-1 type transposase (Medicago truncatula)
Globular region maps to the WRKY DNA-binding domain

6 Confirmation of relationship to WRKY DBD – Sequence analysis II
Non-redundant database + WRKY DNA-binding domain from Arabidopsis WRKY4: Rcs1 (S. cerevisiae), Gcm1 (Drosophila)
WRKY DNA-binding domain maps to the same globular region
Multiple sequence alignment of all globular domains, with secondary structure prediction by JPRED/PHD (strands S1, S2, S3, S4)
The sequence of secondary structure elements is similar to that of the WRKY DNA-binding domain and of the GCM1 protein seen in mouse
Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain

7 Characterization of the globular domain – structural analysis I
Template structures:
-A. thaliana transcription factor (WRKY4: 1wj2, NMR structure)
-Mus musculus Glial Cell Missing-1 (GCM-1: 1odh, X-ray structure)
Predicted secondary structure (SS) of the Rcs1 DBD (strands S1 to S4) aligned with the SS of WRKY4 and with the SS of GCM1
Both WRKY and GCM1 have a similar network of stabilizing interactions

8 Characterization of the globular domain – structural analysis II
Four residues involved in metal co-ordination and 10 residues involved in key stabilizing hydrophobic interactions, which determine the path of the backbone in the four strands (S1 to S4) of the GCM1-WRKY domain, show a strong pattern of conservation.
The core fold of the Rcs1 DBD will be similar to the WRKY-GCM1 domain and may bind DNA in a similar way
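One simple way to look for the kind of conservation pattern described on this slide is to score each column of a multiple sequence alignment by how often its most common residue occurs. The sketch below does that; the function names and the toy alignment are hypothetical illustrations, not the Rcs1, WRKY or GCM1 sequences and not necessarily the measure used in the talk.

```python
from collections import Counter


def column_conservation(alignment):
    """Per column, the fraction of sequences that carry the most common
    residue. Gap characters ('-') never count as a residue."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """0-based indices of columns whose score reaches the threshold."""
    scores = column_conservation(alignment)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical toy alignment (made up for illustration only).
toy_alignment = [
    "CAKCLHTH",
    "CGRCIHSH",
    "CSKCVHNH",
]
```

In the toy alignment only the C and H columns are identical in all three sequences, so `conserved_positions(toy_alignment)` picks out exactly those four columns, loosely mimicking a C, C, H, H metal-coordinating pattern.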

9 Classification of WRKY-GCM1 superfamily – Cladistic analysis I
Template structure: strands S1, S2, S3, S4 with Zn2+ co-ordinated by C, C, H, H
Sequence features of the families:
-Classical WRKY (C), e.g. WRKY4: WRKY motif in S1; short loop between S2 & S3
-Insert containing version (I), e.g. Rcs1: N-terminal helix; conserved W in S4; large insert between S2 & S3
-HxC containing version (HxC), e.g. Far1: HxC instead of HxH; N-terminal helix; short insert between S2 & S3
-FLYWCH domain (F), e.g. Mdg: conserved W in S2
-GCM domain (G), e.g. Gcm1: insertion of Zn ribbon between S2 and S3
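The family definitions on this slide can be written down as a small lookup table. The sketch below is only an illustration: the assignment of features to families is read off the flattened slide text and may not match the original figure exactly, and `FAMILY_FEATURES` and `matching_families` are hypothetical names. It is not the cladistic analysis itself, which works from shared and derived features across an alignment.

```python
# Feature labels are plain strings transcribed from the slide; the
# feature-to-family assignment is an assumption based on the slide text.
FAMILY_FEATURES = {
    "Classical WRKY (C)": {
        "WRKY motif in S1",
        "short loop between S2 and S3",
    },
    "Insert containing version (I)": {
        "N-terminal helix",
        "conserved W in S4",
        "large insert between S2 and S3",
    },
    "HxC containing version (HxC)": {
        "HxC instead of HxH",
        "N-terminal helix",
        "short insert between S2 and S3",
    },
    "FLYWCH domain (F)": {
        "conserved W in S2",
    },
    "GCM domain (G)": {
        "Zn ribbon inserted between S2 and S3",
    },
}


def matching_families(observed_features):
    """Return, sorted by name, the families whose listed features are
    all present among the observed features."""
    observed = set(observed_features)
    return sorted(
        name
        for name, features in FAMILY_FEATURES.items()
        if features <= observed
    )
```

A domain annotated with both "WRKY motif in S1" and "short loop between S2 and S3" would match only the classical family, whereas "N-terminal helix" alone matches nothing because it is shared by two families and is not sufficient for either.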

10 Domain context for the different families – network analysis I
Families: Classical WRKY (C), e.g. WRKY4; Insert containing version (I), e.g. Rcs1; HxC containing version (HxC), e.g. Far1; FLYWCH domain (F), e.g. Mod (mdg); GCM domain (G), e.g. Gcm1
Further examples shown: 101.t00020, At2g23500
Domain contexts shown in the figure: tandem, stand alone, Zn cluster, MULE Tpase, OTU protease, mobile element, BED finger, POZ, SMBD, Zn knuckle

11 Phyletic distribution – Comparative genome analysis I
Lineages: Human, Fly, Worm, Fungi, Plants, Entamoeba, Slime mould
Groups: Plants, Lower eukaryotes, Fungi, Higher eukaryotes
Families: G, C, HxC, I, F
Legend: TF only, TF + TP; transcription factor, transposase
-GCM1 and FLYWCH versions evolved from an insert containing version that is a transposase
-The classical version of the WRKY evolved from an insert containing version that is a transposase
-HxC and insert containing versions are seen both as transcription factors and as transposases

12
-explain that there have been multiple transitions from transposase to TFs in the fungal genomes
-explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products
-explain that the transposase can regulate its own gene expression

13 Outline of the presentation
Rcs1 and Aft2 have a distinct version of the WRKY type DNA binding domain
Sensitive sequence search reveals that
-Oryza sativa (monocot)
-Arabidopsis thaliana (dicot)
-Medicago truncatula (dicot)
-Nicotiana tabacum (dicot)

14 Structural equivalences of WRKY-GCM1 domain proteins with Bed and Zn finger
Panels, each marking the secondary structure elements and the C and H residues that co-ordinate Zn2+:
-WRKY (1wj2)
-GCM-type WRKY (1odh)
-Bed-finger (2ct5)
-Classical Zn-finger (1m36)

15 Why Rcs1?
While systematically analyzing the genes that give rise to abnormal cell size, we and others noted that mutants of Rcs1 give abnormal cell shape. It was known to be an important transcription factor involved in cell size regulation.
-explain showing graphs and images
Independently, during the analysis of the TNET in yeast, we looked at the hubs and the DNA binding domains that were present in them. Interestingly, there were two hubs that did not have any known DNA binding domain identified in them, but the region that mediates DNA binding was known.
-explain showing the family relationship of the hubs
-only two members, and both are hubs
-how and when did they evolve?
Standard search procedures using Pfam and other databases did not provide any clue about the domain. So we set out to characterize the DNA binding region from Rcs1p and its paralog Aft2p using sensitive sequence search and other computational methods.
-show output from Pfam hits

16 WRKY DNA binding domain – Structure analysis I
Structural aspects of the DNA binding domain
-Explain the residues involved in metal chelation
-DNA contacting surface
-Inserts in the loops
-Stabilizing contacts involved

17 WRKY DNA binding domain – Structure analysis II
Structure comparisons identify several other known transcription factors, including the GCM protein in eukaryotes
-Explain the insert of a zinc ribbon in the loop
In fact, sequence comparison without the insert can pick up these WRKY proteins

18 Classification of WRKY domains – Cladistic analysis I
Multiple starting points identified all homologs in the different species
This allowed us to classify the sequences into different families, each with a specific feature suggesting a common evolutionary relationship, based on shared and derived features of the domains
-List the 5 families and point to the features involved using a structure template

19 Phylogenetic distribution and domain architecture for the different families - I
Phyletic profiles of the different domains point to the possibility that these transcription factors could have evolved from transposases, with at least two distinct recruitments into transcription factors.
-In plants in one case
-At the base of the fungal genomes in the other case

20 Phylogenetic distribution and domain architecture for the different families - II

21 Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs
-explain that there have been multiple transitions from transposase to TFs in the fungal genomes
-explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products
-explain that the transposase can regulate its own gene expression

22 Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs
-extensive recruitment of the transposase in the different fungal lineages
-multiple jumps within the fungal lineage
-a very recent duplication event in the order Saccharomycetales suggests that hubs could evolve rapidly
-Candida rbf1 and other TFs independently duplicated and evolved as global regulators

23 Analysis of the gene expression data in plants
Since this happened in fungal genomes, we ask how these proteins behave in plants.
-show the gene expression patterns for the different subfamilies
We see two trends: one where divergence has occurred primarily in expression rather than in the protein sequence, and the other in which proteins with the same expression pattern have different binding site residues.
-spatio-temporal changes in gene expression
-It is experimentally well known that the FLYWCH and the GCM proteins are developmentally important regulatory proteins. So in three lineages the transposase has been recruited to become a developmentally important global regulator.

24 There are interesting gene expression patterns across the different WRKY-containing proteins. TPases are expressed in the root and in the pollen, raising the possibility that they expand rapidly during evolution.

25 Acknowledgements
S Balaji, Lakshminarayan Iyer, L Aravind (Aravind group)

26 [Figure: WRKY-GCM1 domain proteins, labelled by lineage, family and accompanying domains]
Lineages: Encephalitozoon cuniculi, Dictyostelium discoideum, Giardia lamblia, Entamoeba histolytica, Ciliates, Apicomplexa, Plants, Fungi, Animals (Caenorhabditis elegans, Homo sapiens, Drosophila melanogaster)
Families: Classical WRKY (C), HxC-type WRKY (HxC), GCM-type WRKY (G), FLYWCH-type WRKY (F), Insert-containing WRKY (I)
Accompanying domains: MULE transposase, Plant specific Zn-cluster, SWIM domain, POZ, Zinc knuckle, BED finger, Plant specific N-all-beta, TIR domain, LRR, STAND ATPase, Isochorismatase, AT-hook, OTU, PHD finger, C2H2 finger, Plant-specific mobile domain
Sequences shown: GLP_79_64671_67418_Glam_71077115, GLP_9_36401_35940_Glam_71071693, 101.t00020_Ehis_67474280, dd_03024_Ddis_28829829, ECU05_0180_Ecun_19173554, mutA_Ylip_49523824, TTR1_Atha_30694675, WRKY41_Osat_46394336, WRKY58_Atha_22330782, At2g34830_Atha_27754312, NtEIG-D48_Ntab_10798760, FAR1_Atha_18414374, AT4g19990_Atha_7268794, LOC_Os11g31760_Osat_77551147, At2g23500_Atha_3242713, C26E6.2_Cele_32565510, T24C4.2_Cele_17555262, C20orf164_Hsap_13929452, KIAA1552_Hsap_10047169, hGCMa_Hsap_1769820, mod(mdg4)_Dmel_24648712, LOC411361_Amel_66547010, CG13845_Dmel_24649011, gcm_Dmel_17137116, CHGG_08318_CGLO_88179597, AN6124.2_ANID_67539908, MtrDRAFT_AC146590g49v2_Mtru_92891293, AFT2_Scer_6325054, Afu2g08220_Afum_71000950, YALI0C00781g_Ylip_50547661, CHGG_00311_Cglo_88184608, YALI0A02266g_Ylip_50543034, MtrDRAFT_AC126008g21v1_Mtru_92876827, UM03656.1_Umay_71019145, Ci-ZF-1_Cint_93003122, F54C4.3_Cele_3790719, T24C4.7_Cele_17555272

27 Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis
-WRKY proteins show tissue specific expression
-WRKY proteins show light specific expression

28 Transcriptional network involving Aft2p and Rcs1p
Number of target genes regulated (Aft2p, Rcs1p): 123, 41, 314
Relationship between Rcs1p and Aft2p homologs:
-UM03656.1 Umay 71019145
-CAGL0H03487G CGLA 49526254
-CAGL0G09042G CGLA 49526062
-CaO19.2272 Calb 68482460
-DEHA0F25124g Dhan 50425555
-KLLA0D03256g Klac 50306475
-AFL087C AGOS 44984319
-ORFP Sklu Contig1830.2 kluyveri
-Kwal 24045 waltii
-ORFP Skud Contig2057.12 kudriavzeii
-ORFP Scas Contig720.21 castelli
-RCS1 SCER 51830313
-ORFP 7853 mikatae
-ORFP 8601 paradoxus
-ORFP 21513 mikatae
-ORFP Scas Contig690.14 castelli
-ORFP 22109 paradoxus
-AFT2 SCER 6325054
-ORFP Skud Contig1659.3 kudriavzeii

29 Conclusion
Sequence, Structure, Expression, Interaction
Integration of different types of experimental data allowed us to identify the DNA binding domain in Rcs1

