
1 Integration of data to uncover evolutionary trends and infer protein function: The tale of Rcs1. M. Madan Babu, MRC Laboratory of Molecular Biology, Cambridge

2 Overview of research. Evolution of biological systems. Evolution of transcriptional networks: evolution of networks within and across genomes, Nature Genetics (2004), J Mol Biol (2006a); evolution of transcription factors, Nuc. Acids. Res (2003). Structure and function of biological systems. Structure and dynamics of transcriptional networks: uncovering a distributed architecture in networks; methods to study network dynamics, J Mol Biol (2006b), J Mol Biol (2006c). Data integration, function prediction and classification: discovery of novel DNA binding proteins, Nature (2004), Nuc. Acids. Res (2005), Cell Cycle (2006); discovery of transcription factors in Plasmodium; evolution of global regulatory hubs.

3 Rcs1 – regulator of cell size 1. [Figure: micrographs of S. cerevisiae wild type and S. cerevisiae Rcs1 mutant; micrographs and data from SCMD.] The following parameters used to define cell size for the Rcs1 mutant were at least 2 standard deviations (2σ) from the mean values of the wild type: roundness of mother cell, mother cell size, contour length of mother cell, long axis length of mother cell, short axis length of mother cell. Mutant cells are twice the size of the parental strain. The critical size for budding in the mutant is similarly increased. Rcs1 binds specific DNA sequences.
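The 2σ criterion on this slide can be sketched as a small check. This is only an illustration: the function name, the use of the sample standard deviation, and the numbers are assumptions, not SCMD data or the SCMD procedure.

```python
from statistics import mean, stdev

def deviates_from_wild_type(wild_type_values, mutant_value, n_sd=2.0):
    """Return True if mutant_value lies at least n_sd sample standard
    deviations away from the mean of the wild-type measurements."""
    mu = mean(wild_type_values)
    sd = stdev(wild_type_values)  # sample standard deviation (assumption)
    return abs(mutant_value - mu) >= n_sd * sd

# Hypothetical long-axis lengths for wild-type mother cells (made-up values).
wild_type_long_axis = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0]

print(deviates_from_wild_type(wild_type_long_axis, 7.4))  # far from the mean
print(deviates_from_wild_type(wild_type_long_axis, 5.1))  # close to the mean
```

The same check would be repeated for each morphological parameter listed on the slide.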

4 Rcs1 is a global regulatory hub – Network analysis I. [Figure: transcriptional regulatory network in yeast, with the sub-network of Rcs1 and Aft2 and the number of target genes regulated by Aft2p and Rcs1p.] [Figure: distribution of DNA binding domains in yeast transcription factors (no. of members per domain): C6-Fungal, C2H2-Zn, bZip, Homeo, Gata, bHLH, Fkh, Hsf, Apses, Myb, Mads, HMG1, LisH, Gcr1, Rcs1, Ace1, AT-Hook, Tig, Abf1, Tea, Ime1, Dal82, Tigger, P53.] Rcs1p and Aft2p are global regulatory hubs with an as yet uncharacterized DNA binding domain. How did the paralogous hubs that regulate distinct sets of genes evolve?
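One way to make "hub" and "distinct sets of genes" concrete is to count targets per transcription factor and measure target overlap between two factors. The sketch below is an assumption-laden illustration: the slide gives no hub threshold, so `min_targets` is a free parameter, and the edge list uses placeholder names rather than real yeast data.

```python
from collections import defaultdict

def targets_per_tf(edges):
    """Map each transcription factor to the set of genes it regulates."""
    targets = defaultdict(set)
    for tf, gene in edges:
        targets[tf].add(gene)
    return dict(targets)

def find_hubs(edges, min_targets):
    """Return factors regulating at least min_targets genes (threshold is an assumption)."""
    return sorted(tf for tf, genes in targets_per_tf(edges).items()
                  if len(genes) >= min_targets)

def shared_fraction(edges, tf_a, tf_b):
    """Jaccard overlap of the target sets of two factors (0.0 means fully distinct)."""
    targets = targets_per_tf(edges)
    a, b = targets.get(tf_a, set()), targets.get(tf_b, set())
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Placeholder regulatory edges (factor, target gene); not real data.
edges = [("TF_A", "g1"), ("TF_A", "g2"), ("TF_A", "g3"),
         ("TF_B", "g4"), ("TF_B", "g5"), ("TF_B", "g6"),
         ("TF_C", "g1")]

print(find_hubs(edges, min_targets=3))
print(shared_fraction(edges, "TF_A", "TF_B"))
```

Two paralogous hubs with distinct target sets, as described for Rcs1p and Aft2p, would show a high target count each and a low overlap.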

5 Relationship to WRKY DNA binding domain – Sequence analysis I. A search of the non-redundant database shows lineage-specific expansion in several fungi, and the region is also seen in lower eukaryotes: Candida albicans (ascomycete), Yarrowia lipolytica (ascomycete), Ustilago maydis (basidiomycete), Cryptococcus sp. (basidiomycete), E. cuniculi (microsporidia), Giardia lamblia (diplomonad), Dictyostelium discoideum, Entamoeba histolytica. Profiles + HMM of this region, searched against the non-redundant database, recover the WRKY domain (Arabidopsis) and a FAR-1 type transposase (Medicago truncatula). The globular region maps to the WRKY DNA-binding domain.
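The idea behind a profile search can be sketched with a much simpler stand-in: a position-specific scoring matrix built from a gap-free seed alignment and slid along a candidate sequence. This is not the profile/HMM software used for the talk; the uniform background, the pseudocount, and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_profile(aligned_seqs, pseudocount=1.0):
    """Per-column log-odds scores from gap-free, equal-length aligned
    sequences, relative to a uniform background (a simplifying assumption)."""
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_seqs) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned_seqs[0])):
        counts = Counter(seq[col] for seq in aligned_seqs)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile

def best_window(profile, sequence):
    """Slide the profile along the sequence; return (best score, start index)."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        # Unknown residue letters contribute a neutral score of 0.0.
        score = sum(profile[i].get(sequence[start + i], 0.0) for i in range(width))
        best = max(best, (score, start))
    return best

# Toy seed alignment and candidate sequence (illustrative only).
seed = ["WRKYG", "WRKYG", "WKKYG"]
profile = build_profile(seed)
score, start = best_window(profile, "AAAWRKYGAAA")
print(start, round(score, 2))
```

In an iterative search, sequences scoring above a chosen cutoff would be added to the seed alignment and the profile rebuilt, which is how divergent homologs become detectable.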

6 Confirmation of relationship to WRKY DBD – Sequence analysis II. Searching the non-redundant database with the WRKY DNA-binding domain from Arabidopsis WRKY4 recovers Rcs1 (S. cerevisiae) and Gcm1 (Drosophila); the WRKY DNA-binding domain maps to the same globular region. Multiple sequence alignment of all globular domains, followed by JPRED/PHD secondary structure prediction (strands S1, S2, S3, S4): the order of secondary structure elements is similar to that of the WRKY DNA-binding domain and the GCM1 protein seen in mouse. Homologs of the conserved globular domain constitute a novel family of the WRKY DNA-binding domain.

7 Characterization of the globular domain – structural analysis I. Template structures: A. thaliana transcription factor (WRKY4: 1wj2, NMR structure) and Mus musculus Glial Cell Missing-1 (GCM-1: 1odh, X-ray structure). [Figure: predicted secondary structure (SS) of the Rcs1 DBD compared with the SS of WRKY4 and with the SS of GCM1, strands S1 to S4.] Both WRKY and GCM1 have a similar network of stabilizing interactions.
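Comparing a predicted secondary structure with a known one comes down to comparing the order of elements rather than exact positions. A minimal sketch, assuming per-residue strings in which E marks strand, H marks helix and - marks coil; the strings below are invented and are not the actual predictions for Rcs1, WRKY4 or GCM1.

```python
from itertools import groupby

def element_order(ss_string, min_length=2):
    """Collapse a per-residue secondary structure string into the ordered
    list of its elements, ignoring coil and runs shorter than min_length."""
    return [state for state, run in groupby(ss_string)
            if state in "EH" and len(list(run)) >= min_length]

def same_topology(ss_a, ss_b):
    """True if both strings contain the same elements in the same order."""
    return element_order(ss_a) == element_order(ss_b)

# Invented example strings: four strands each, with different loop lengths.
predicted = "--EEEE---EEE--EEEE-EEE--"
template = "-EEEEE-EEEE------EEE--EEEE-"

print(element_order(predicted))
print(same_topology(predicted, template))
```

Loop length differences, such as an insert between two strands, do not change the result, which is why the comparison still works for domains carrying inserts.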

8 Characterization of the globular domain – structural analysis II. The 4 residues involved in metal co-ordination and the 10 residues involved in key stabilizing hydrophobic interactions, which determine the path of the backbone in the four strands (S1 to S4) of the GCM1-WRKY domain, show a strong pattern of conservation. The core fold of the Rcs1 DBD will be similar to the WRKY-GCM1 domain and may bind DNA in a similar way.
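A "pattern of conservation" at given alignment columns can be quantified as the fraction of sequences sharing the most common residue, optionally after grouping residues into classes such as "hydrophobic". The sketch below is illustrative: the scoring rule, the hydrophobic grouping and the toy alignment are assumptions, not the analysis behind the slide.

```python
from collections import Counter

# Coarse residue grouping used only for this example (an assumption).
HYDROPHOBIC_CLASS = {aa: "h" for aa in "AVLIMFWY"}

def column_conservation(alignment, column, classes=None):
    """Fraction of non-gap sequences sharing the most common residue
    (or residue class, if a mapping is given) in one alignment column."""
    residues = [seq[column] for seq in alignment if seq[column] != "-"]
    if not residues:
        return 0.0
    if classes:
        residues = [classes.get(r, r) for r in residues]
    return Counter(residues).most_common(1)[0][1] / len(residues)

# Toy alignment: column 0 mimics a metal-binding Cys, column 2 a hydrophobic core position.
alignment = ["CAVH", "CGLH", "C-IH", "CSFY"]

print(column_conservation(alignment, 0))
print(column_conservation(alignment, 2))
print(column_conservation(alignment, 2, classes=HYDROPHOBIC_CLASS))
```

Scoring by class rather than by identity reflects that a core position can vary between hydrophobic residues and still be conserved in the structural sense.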

9 Classification of WRKY-GCM1 superfamily – Cladistic analysis I. Template structure: strands S1 to S4 with a Zn²⁺ ion co-ordinated by C, C, H, H. Sequence features of the five families. Classical WRKY (C): WRKY motif in S1; short loop between S2 & S3. Insert containing version (I): N-terminal helix; conserved W in S4; large insert between S2 & S3. HxC containing version (HxC): HxC instead of HxH; N-terminal helix; short insert between S2 & S3. FLYWCH domain (F): conserved W in S2. GCM domain (G): insertion of Zn ribbon between S2 and S3. [Figure: tree relating the families G, C, HxC, I and F, with representatives WRKY4, Rcs1, Far1, Mdg and Gcm1.]

10 Domain context for the different families – network analysis I. Families: Classical WRKY (C), e.g. WRKY4; Insert containing version (I), e.g. Rcs1; HxC containing version (HxC), e.g. Far1; FLYWCH domain (F), e.g. Mod (mdg); GCM domain (G), e.g. Gcm1. [Figure: domain architectures for each family. Labels shown: tandem and stand-alone arrangements; Zn cluster; MULE Tpase; OTU protease; mobile element; BED finger; POZ; SMBD; Zn knuckle. Further examples: 101.t00020, At2g23500.]

11 Phyletic distribution – Comparative genome analysis I. [Figure: presence of the G, C, HxC, I and F families in human, fly, worm, fungi, plants, Entamoeba and slime mould, grouped as plants, lower eukaryotes, fungi and higher eukaryotes; each occurrence is marked as transcription factor only (TF only) or as transcription factor and transposase (TF + TP).] GCM1 and FLYWCH versions evolved from an insert containing version that is a transposase. The classical version of the WRKY evolved from an insert containing version that is a transposase. HxC and insert containing versions are seen both as transcription factors and as transposases.
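A phyletic distribution of this kind can be tabulated from annotated occurrences. The sketch below is a hypothetical illustration: the record layout, the family and lineage names, and the TF/TP labels are placeholders, not the data behind the figure.

```python
from collections import defaultdict

def roles_by_family(records):
    """Summarize, per family, whether members act as transcription factors
    (TF), transposases (TP), or both ("TF + TP")."""
    roles = defaultdict(set)
    for family, _lineage, role in records:
        roles[family].add(role)
    return {family: " + ".join(sorted(found)) for family, found in roles.items()}

def lineages_by_family(records):
    """List, per family, the lineages in which it occurs (its phyletic profile)."""
    lineages = defaultdict(set)
    for family, lineage, _role in records:
        lineages[family].add(lineage)
    return {family: sorted(found) for family, found in lineages.items()}

# Placeholder records: (family, lineage, role). Not real annotations.
records = [("famX", "lineage1", "TF"),
           ("famX", "lineage2", "TP"),
           ("famY", "lineage1", "TF")]

print(roles_by_family(records))
print(lineages_by_family(records))
```

A family that appears as "TF + TP" across lineages is the kind of pattern the slide uses to argue for recruitment of transposases into transcription factors.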

12 -explain that there have been multiple transitions from transposase to TFs in the fungal genomes -explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products -explain that the transposase can regulate its own expression

13 Outline of the presentation. Rcs1 and Aft2 have a distinct version of the WRKY-type DNA binding domain. Sensitive sequence search reveals that [...]. Plant species listed: Oryza sativa (monocot), Arabidopsis thaliana (dicot), Medicago truncatula (dicot), Nicotiana tabacum (dicot).

14 Structural equivalences of WRKY-GCM1 domain proteins with BED and Zn finger. [Figure: topology diagrams showing the strands (S1 to S4), the helix H1 where present, and the Zn²⁺-co-ordinating C and H residues for WRKY (1wj2), GCM-type WRKY (1odh), BED finger (2ct5) and classical Zn finger (1m36).]

15 Why Rcs1? While systematically analyzing the genes that gave rise to abnormal cell size, we and others noted that mutants of Rcs1 give abnormal cell shape. It was known to be an important transcription factor involved in cell size regulation. -explain showing graphs and images. Independently, during the analysis of the TNET (transcriptional network) in yeast, we looked at the hubs and the DNA binding domains that were present in them. Interestingly, there were two hubs that did not have any known DNA binding domain identified in them, but the region which mediates DNA binding was known. -explain showing the family relationship of the hubs -only two members, and both are hubs -how and when did they evolve? Standard search procedures using Pfam and other databases did not provide any clue about the domain. So we set out to characterize the DNA binding region from Rcs1p and its paralog Aft2p using sensitive sequence search and other computational methods. -show output from Pfam hits

16 WRKY DNA binding domain – Structure analysis I. Structural aspects of the DNA binding domain. Explain: -the residues involved in metal chelation -DNA contacting surface -inserts in the loops -stabilizing contacts involved

17 WRKY DNA binding domain – Structure analysis II. Structure comparisons identify several other known transcription factors, including the GCM protein in eukaryotes. -Explain the insert of a zinc ribbon in the loop. In fact, sequence comparison without the insert can pick up these WRKY proteins.

18 Classification of WRKY domains – Cladistic analysis I. Multiple starting points identified all homologs in the different species. This allowed us to classify the sequences into different families, each with a specific feature suggesting a common evolutionary relationship, based on shared and derived features of the domains. -List the 5 families and point to features involved using a structure template

19 Phylogenetic distribution and domain architecture for the different families - I. Phyletic profiles of the different domains point to the possibility that these transcription factors could have evolved from transposases, with at least two distinct recruitments into transcription factors: -in plants in one case -at the base of the fungal lineage in the other case

20 Phylogenetic distribution and domain architecture for the different families - II

21 Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs -explain that there have been multiple transitions from transposase to TFs in the fungal genomes -explain how this could have happened by showing the snapshot of the breakup of selfish elements into two distinct products -explain that the transposase can regulate its own expression

22 Comparative genomics using the fungal genomes provides the clue for the evolution of these TFs -extensive recruitment of the transposase in the different fungal lineages -multiple jumps within the fungal lineage -a very recent duplication event in the order Saccharomycetales suggests hubs could evolve rapidly -Candida rbf1 and other TFs independently duplicated and evolved as global regulators

23 Analysis of the gene expression data in plants. Since this happened in fungal genomes, we ask how these proteins behave in plants. -show the gene expression patterns for the different subfamilies. We see two trends: one where divergence has primarily occurred through changes in expression rather than in the protein sequence, and another in which proteins with the same expression pattern have different binding site residues. -spatio-temporal changes in gene expression -It is experimentally well known that the FLYWCH and the GCM proteins are developmentally important regulatory proteins. So in three lineages the transposase has been recruited to become a developmentally important global regulator.
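The two trends described here contrast similarity of expression with similarity of sequence for a pair of proteins. A minimal sketch of the two measurements, assuming expression is given as equal-length numeric profiles across conditions and binding-site residues as equal-length aligned strings; the values are invented and no thresholds are implied by the talk.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def fraction_identical(seq_a, seq_b):
    """Fraction of aligned positions carrying the same residue."""
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)

# Invented example: two paralogs with identical binding-site residues
# but opposite expression across three tissues.
expr_a, expr_b = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
site_a, site_b = "WRKYGQK", "WRKYGQK"

print(round(pearson(expr_a, expr_b), 2))
print(fraction_identical(site_a, site_b))
```

A pair with high sequence identity and low expression correlation would fall under the first trend; a pair with high expression correlation and low binding-site identity under the second.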

24 There are interesting trends in the gene expression patterns of the different WRKY-containing proteins. TPases are expressed in the root and in the pollen, which enhances their potential to expand rapidly during evolution.

25 Acknowledgements: S Balaji, Lakshminarayan Iyer, L Aravind (Aravind group)

26 [Figure: domain architectures of WRKY-GCM1 domain proteins across lineages. Lineages shown: Encephalitozoon cuniculi, Dictyostelium discoideum, Giardia lamblia, Entamoeba histolytica, ciliates, Apicomplexa, fungi, plants, animals (Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens). Families: Classical WRKY (C), HxC-type WRKY (HxC), GCM-type WRKY (G), FLYWCH-type WRKY (F), Insert-containing WRKY (I). Partner domains: MULE transposase, plant specific Zn-cluster, SWIM domain, POZ, zinc knuckle, BED finger, plant specific N-all-beta, TIR domain, LRR, STAND ATPase, isochorismatase, AT-hook, OTU, PHD finger, C2H2 finger, plant-specific mobile domain. Proteins shown: GLP_79_64671_67418_Glam_, GLP_9_36401_35940_Glam_, 101.t00020_Ehis_, dd_03024_Ddis_, ECU05_0180_Ecun_, mutA_Ylip_, TTR1_Atha_, WRKY41_Osat_, WRKY58_Atha_, At2g34830_Atha_, NtEIG-D48_Ntab_, FAR1_Atha_, AT4g19990_Atha_, LOC_Os11g31760_Osat_, At2g23500_Atha_, C26E6.2_Cele_, T24C4.2_Cele_, C20orf164_Hsap_, KIAA1552_Hsap_, hGCMa_Hsap_, mod(mdg4)_Dmel_, LOC411361_Amel_, CG13845_Dmel_, gcm_Dmel_, CHGG_08318_CGLO_, AN6124.2_ANID_, MtrDRAFT_AC146590g49v2_Mtru_, AFT2_Scer_, Afu2g08220_Afum_, YALI0C00781g_Ylip_, CHGG_00311_Cglo_, YALI0A02266g_Ylip_, MtrDRAFT_AC126008g21v1_Mtru_, UM _Umay_, Ci-ZF-1_Cint_, F54C4.3_Cele_, T24C4.7_Cele_.]

27 Expression profiles of WRKY-GCM1 domain proteins in Arabidopsis. WRKY proteins show tissue specific expression. WRKY proteins show light specific expression.

28 Transcriptional network involving Aft2p and Rcs1p. [Figure: number of target genes regulated by Aft2p and Rcs1p.] Relationship between Rcs1p and Aft2p homologs. [Figure: tree of homologs. Labels include UM Umay, CAGL0H03487G (CGLA), CAGL0G09042G (CGLA), CaO (Calb), DEHA0F25124g (Dhan), KLLA0D03256g (Klac), AFL087C (AGOS), RCS1 (SCER), AFT2 (SCER), and ORFs or contigs from kluyveri (Sklu), waltii (Kwal), kudriavzeii (Skud), castelli (Scas), mikatae and paradoxus.]

29 Conclusion. Sequence, structure, expression, interaction: integration of different types of experimental data allowed us to identify the DNA binding domain in Rcs1.

