Computational analysis of membrane proteins implicated in metal transport in Arabidopsis thaliana Stefanie Hartmann Max Planck Institute for Molecular.


1 Computational analysis of membrane proteins implicated in metal transport in Arabidopsis thaliana
   Stefanie Hartmann, Max Planck Institute for Molecular Plant Physiology
   Supervisors: Joachim Selbig, Ute Krämer
   CIAVVLCLVFMSVEVVGGIKANSLAILTDAAHLLSDVAAFAISLFSLWAAGWEATPRQTYGFFRIEILGALVSIQLIWLLT
   ALFLLINTAYMVVEFVAGFMSNSLGLISDACHMLFDCAALAIGLYASYISRLPANHQYNYGRGRFEVLSGYVNAVFLVLVG
   CFVVVLCLLFMSIEVVCGIKANSLAILADAAHLLTDVGAFAISMLSLWASSWEANPRQSYGFFRIEILGTLVSIQLIWLLT
   LIAVLLCAIFIVVEVVGGIKANSLAILTDAAHLLSDVAAFAISLFSLWASGWKANPQQSYGFFRIEILGALVSIQMIWLLA
   ---IFLYLIVMSVQIVGGFKANSLAVMTDAAHLLSDVAGLCVSLLAIKVSSWEANPRNSFGFKRLEVLAAFLSVQLIWLVS

3 12 membrane proteins involved in metal transport in Arabidopsis

4 Metal transporters are of great importance because…
   …they provide an adequate supply of essential trace metals
   …they prevent an excess of these potentially toxic ions
   In silico analyses may help design further experiments on:
   - basic research on metal homeostasis
   - development of new ways of phytoremediation

5 Cation Diffusion Facilitator (CDF) proteins
   - also referred to as cation efflux (CE) proteins
   - occur in archaea, bacteria, eukaryotes
   - are involved in transporting heavy metals (Co2+, Cd2+, Zn2+, Ni2+)
   - the CDF family of proteins had 13 members in 1997
   - the CE Pfam family has 348 members (July 2003) and 426 members (Jan 2004)
   CDF signature sequence: S X (ASG) (LIVMT)2 (SAT) (DA) (SGAL) (LIVFYA) (HDN) X3 D X2 (AS)

6 The Arabidopsis thaliana CDF protein family
   Protein  Locus      Signature region        Match to CDF signature
   CDF1     At2g46800  S LAILTDAAHLLS D VAA    exact match
   CDF2     At3g61940  S LAILADAAHLLT D VGA    exact match
   CDF3     At3g58810  S LAILTDAAHLLS D VAA    exact match
   CDF4     At2g29410  S LAVMTDAAHLLS D VAG    1 mismatch
   CDF5     At2g04620  S LGLISDACHMLF D CAA    1 mismatch
   CDF6     At2g47830  S TAIIADAAHSVS D VVL    1 mismatch
   CDF7     At2g39450  S LAIIASTLDSLL D LLS    2 mismatches
   CDF8     At1g16310  S MAVIASTLDSLL D LLS    2 mismatches
   CDF9     At1g79520  S MAVIASTLDSLL D LLS    2 mismatches
   CDF10    At3g58060  S IAIAASTLDSLL D LMA    3 mismatches
   CDF11    At3g12100  R VGLVSDAFHLTF G CGL    3 mismatches
   CDF12    At1g51610  S HVIMAEVVHSVA D FAN    4 mismatches
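The mismatch counts on this slide can be reproduced with a short script. The sketch below is not the author's code; it assumes the signature from slide 5 is read as one residue set per position, with X meaning any residue, and the names `SIGNATURE` and `count_mismatches` are made up for illustration.

```python
# CDF signature from slide 5, one entry per position:
# a set of allowed residues, or None for X (any residue).
SIGNATURE = (
    [set("S"), None, set("ASG"), set("LIVMT"), set("LIVMT"),
     set("SAT"), set("DA"), set("SGAL"), set("LIVFYA"), set("HDN")]
    + [None] * 3 + [set("D")] + [None] * 2 + [set("AS")]
)

# Signature regions as printed on slide 6 (spaces are only visual separators).
REGIONS = {
    "CDF1": "S LAILTDAAHLLS D VAA", "CDF2": "S LAILADAAHLLT D VGA",
    "CDF3": "S LAILTDAAHLLS D VAA", "CDF4": "S LAVMTDAAHLLS D VAG",
    "CDF5": "S LGLISDACHMLF D CAA", "CDF6": "S TAIIADAAHSVS D VVL",
    "CDF7": "S LAIIASTLDSLL D LLS", "CDF8": "S MAVIASTLDSLL D LLS",
    "CDF9": "S MAVIASTLDSLL D LLS", "CDF10": "S IAIAASTLDSLL D LMA",
    "CDF11": "R VGLVSDAFHLTF G CGL", "CDF12": "S HVIMAEVVHSVA D FAN",
}


def count_mismatches(region):
    """Count positions where the region violates the CDF signature."""
    residues = region.replace(" ", "")
    if len(residues) != len(SIGNATURE):
        raise ValueError("region must span all 17 signature positions")
    return sum(
        1
        for allowed, residue in zip(SIGNATURE, residues)
        if allowed is not None and residue not in allowed
    )


if __name__ == "__main__":
    for name, region in REGIONS.items():
        print(name, count_mismatches(region))
```

Run as a script, it prints one count per protein, which can be compared with the last column of the table.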

7-9 Research questions
   1. Can all 12 proteins be classified as CDF proteins? That is, are there predicted structural and functional similarities among these 12 Arabidopsis proteins?
      Methods: secondary structure prediction, inclusion in membrane and transporter databases, evaluation of common motifs, etc.
   2. What are the relationships of the 12 Arabidopsis proteins among each other and to other published sequences?
      Methods: intron/exon structure, phylogenetic reconstructions
   3. Is it possible to predict the 3D structure of these proteins?
      Methods: fold recognition by threading

10 Sequence retrieval: four ambiguous sequences
   - TIGR Arabidopsis thaliana database
   - TAIR: The Arabidopsis Information Resource
   - MIPS Arabidopsis thaliana genome database
   Differences: different assignment of introns, use of alternative start codons
   Sequence analysis: three additional ambiguous sequences
   - SWALL
   - Pfam vs. TIGR/TAIR/MIPS
   Differences: insertions and deletions, different amino acid sequence
   Cloning and RT-PCR revealed correct sequences for six of the seven ambiguous CDFs

11-13 Inclusion in membrane and transport databases
   Databases checked for CDF1 to CDF12:
   - cation efflux, Pfam entry PF01545
   - Arabidopsis Membrane Protein Library (AMPL)
   - ARAMEMNON
   - Transport Protein Database
   - PlantsT
   [Table of inclusion marks per protein and database; the marks are not preserved in this transcript]

14 Hidden Markov models used for secondary structure prediction
   - states (loops, transmembrane domains, etc.) are defined
   - states are connected in a biologically reasonable way (transitions)
   - each state has a specific probability distribution over the 20 amino acids
   - each transition has a specific transition probability
   - amino acid probabilities and transition probabilities are learned
   - models are first taught using a training set; the trained model is then used for the prediction
   [Figure labels: membrane, cytoplasmic side, non-cytoplasmic side]
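To make the prediction step concrete, here is a toy sketch of Viterbi decoding, the standard way to find the most probable state path in a hidden Markov model. It is not the model used by TMHMM, HMMTOP or Memsat2. It assumes only two states, and instead of a distribution over all 20 amino acids it uses a single hand-picked probability of emitting a hydrophobic residue; every number and name in it is an illustrative assumption, not a trained value.

```python
import math

# Assumption: residues treated as hydrophobic in this toy model.
HYDROPHOBIC = set("AILMFVWC")
STATES = ("loop", "membrane")

# Hand-picked (not learned) probabilities, for illustration only.
START = {"loop": 0.5, "membrane": 0.5}
TRANSITION = {
    "loop": {"loop": 0.9, "membrane": 0.1},
    "membrane": {"loop": 0.1, "membrane": 0.9},
}
P_HYDROPHOBIC = {"loop": 0.2, "membrane": 0.8}


def emission(state, residue):
    """Probability that `state` emits a residue of this hydrophobicity class."""
    p = P_HYDROPHOBIC[state]
    return p if residue in HYDROPHOBIC else 1.0 - p


def viterbi(sequence):
    """Return the most probable state path ('M' membrane, '-' loop).

    Expects a non-empty sequence of one-letter amino acid codes.
    """
    # score[s]: best log-probability of any path that ends in state s.
    score = {
        s: math.log(START[s]) + math.log(emission(s, sequence[0]))
        for s in STATES
    }
    backpointers = []
    for residue in sequence[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(
                STATES, key=lambda p: score[p] + math.log(TRANSITION[p][s])
            )
            new_score[s] = (
                score[best_prev]
                + math.log(TRANSITION[best_prev][s])
                + math.log(emission(s, residue))
            )
            pointers[s] = best_prev
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return "".join("M" if s == "membrane" else "-" for s in path)


if __name__ == "__main__":
    print(viterbi("KKKKLLLLLLLLKKKK"))
```

In a real predictor the emission and transition probabilities come from the training set, and the state graph distinguishes cytoplasmic from non-cytoplasmic loops so that the orientation of the N-terminus can be read off the path.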

15 Results of secondary structure predictions
   Programs: TMHMM v2 (Sonnhammer et al. 1998), HMMTOP v2 (Tusnady and Simon, 1998, 2001), Memsat2 (Jones et al. 1994, McGuffin et al. 2000) (14)
   Protein  Number of TMD  N-terminus within cytoplasm
   CDF1     6              2 / 3
   CDF2     6              3 / 3
   CDF3     6              2 / 3
   CDF4     5-6            2 / 3
   CDF5     6              3 / 3
   CDF6     0-6            1 / 3
   CDF7     4-6            2 / 3
   CDF8     5-6            3 / 3
   CDF9     5-6            3 / 3
   CDF10    4-6            2 / 3
   CDF11    6              3 / 3
   CDF12    4-6            3 / 3

17 CDF signature CE signature

18 Prediction of subcellular localization
   mTP: mitochondrial targeting peptide
   cTP: chloroplast transit peptide
   SP: signal peptide (ER/secretory pathway)

19 Prediction of subcellular localization - methods
   N-terminal sorting signals display characteristic amino acid compositions
   sequence-based methods predicting N-terminal sorting signals are based on this observation
   mTP: mitochondrial targeting peptide; cTP: chloroplast transit peptide; SP: signal peptide (ER/secretory pathway)
   Tool          Predicts        Method
   TargetP       mTP, cTP, SP    neural network-based
   iPSORT        mTP, cTP, SP    decision list
   Predotar      mTP, cTP        neural network-based
   SignalP NN    SP              neural network-based
   SignalP HMM   SP              based on hidden Markov models

20 Prediction of subcellular localization - results
   mTP: mitochondrial targeting peptide; cTP: chloroplast transit peptide; SP: signal peptide (ER/secretory pathway)
   Tools: TargetP, iPSORT, Predotar, SignalP NN, SignalP HMM
   Entries per protein, in the order they appear on the slide:
   CDF2: 3/4
   CDF5: cTP, 1/4
   CDF6: mTP, cTP, mTP
   CDF8: cTP*, mTP*, 2/4*, Y*
   CDF12: mTP
   CDF1, CDF3, CDF4, CDF7, CDF9, CDF10, CDF11: no entries

21 Exon structure of the CDF proteins # of exons 1 9 12 13 6 7 5

22 Gene organization of the CDF proteins CDF1 CDF2 CDF3 CDF4 CDF5 CDF11 CDF6 CDF12 CDF7 CDF8 CDF9 CDF10

23 Phylogenetic Relationships within Cation Transporter Families of Arabidopsis Plant Physiology 2001; 126 (4): 1646–1667 CDF4 CDF3 CDF2 CDF1 CDF12 CDF10 CDF11 CDF6 omitted:CDFs 5, 7, 8, 9

24 Phylogenetic analysis of the Arabidopsis CDF proteins

25 Phylogenetic analysis of sequences containing the CE signature Arabidopsis group I sequences, monocot and dicot sequences, mammalian metal transporters Arabidopsis group II sequences, monocot and dicot sequences, prokaryotic and eukaryotic seqs several two-domain proteins outgroup

26 N C working model: topology of Arabidopsis CDF proteins CDF signature sequence cytoplasm cell exterior/organelle

27 Information derived from the 3D structure of a protein
   - assignment of function
   - guide for mutagenesis experiments
   - ligand and functional sites
   - evolutionary relationships
   - residue solvent exposure
   - putative interaction sites

28 Structure determination
   1. Classical approaches
      - X-ray crystallography
      - NMR spectroscopy
   2. Computational approaches
      - comparative (“homology”) modeling
      - fold recognition (“threading”)
      - ab initio methods

29 The basis of fold recognition (“threading”)
   The number of folds occurring in nature is limited: there are many sequences with no significant sequence identity but with the same or similar folds
   …HEAIDHKPKLTGMKTGRVVSSMKSNFFADLP…
   …HDGRSSMTRFSRYFRKTGRVSEYYKKQERLLE…
   PDB statistics: http://www.rcsb.org/pdb/holdings.html

30 Fold recognition methods aim: to find an optimal sequence-structure alignment 1.“threading” of an unknown target sequence into the backbone structure of template proteins of known structure ………CLVFMSVEVVGGIKANSLAILTD………

31 Fold recognition methods
   2. Evaluation of the compatibility between target sequence and proposed 3D structure
      - using environment-based mean force potentials, or
      - using knowledge-based mean force potentials
      [Figure label: 4.99 Å]
   3. Output: a list of folds (sorted or unsorted), their “compatibility score”, sometimes other information such as SCOP descriptors, alignment, rudimentary 3D model of the query protein, raw scores, solvation energy for the model, links

32 No new insights regarding the structure of CDF proteins
   Membrane proteins are significantly under-represented in structural databases, and therefore also in fold libraries
   If there is no fold similar to the native fold of the target protein, this approach cannot succeed.
   Threading methods cannot be used for modeling of transmembrane proteins

33 Will the 3D structure of CDFs be available soon?
   For fold recognition methods to be used successfully:
   - significantly more 3D structures of membrane proteins are needed
   - fold recognition methods specifically for integral membrane proteins may eventually be developed
   Crystallization of bacterial homologs and subsequent extrapolation of structural features as an alternative?
   Approach for globular proteins: predicting a protein’s solubility and propensity to crystallize, based on results from high-throughput structure determination

34-35 Can threading results be used as an independent way to verify group assignment?
   Were some structural hits specific for any of the CDF groups?
   1. Which hits were common to which of the CDF sequences?
   2. “Phylothreading”

36 Which hits were common to which of the CDF sequences?
   [Figure: structural hits 1 to 170 plotted against CDF sequences 1 to 12]
   Structural hits predicted:
   - for most CDF sequences
   - for group I sequences
   - for group II sequences
   - for CDF5 and CDF11
   - for CDF6 and CDF12
   Results were unable to provide evidence to verify group assignments based on other methods
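The comparison on this slide amounts to set operations on the hit lists. The sketch below shows one way to phrase it. It assumes a hit counts as specific to a group when every member of the group returns it and no sequence outside the group does; the slides do not state the exact criterion. The sequence names and fold labels are made-up placeholders, not results from the presentation.

```python
def hits_common_to_all(hits_by_sequence):
    """Folds returned for every sequence."""
    return set.intersection(*hits_by_sequence.values())


def hits_specific_to(group, hits_by_sequence):
    """Folds returned for every sequence in `group` and for none outside it."""
    inside = [hits_by_sequence[name] for name in group]
    outside = [
        hits for name, hits in hits_by_sequence.items() if name not in group
    ]
    shared = set.intersection(*inside)
    return shared - set().union(*outside)


if __name__ == "__main__":
    # Hypothetical hit lists; real input would be the folds kept per sequence.
    example_hits = {
        "seq1": {"fold_a", "fold_b"},
        "seq2": {"fold_a", "fold_b", "fold_c"},
        "seq3": {"fold_a", "fold_d"},
        "seq4": {"fold_a", "fold_d"},
    }
    print(hits_common_to_all(example_hits))
    print(hits_specific_to({"seq1", "seq2"}, example_hits))
    print(hits_specific_to({"seq3", "seq4"}, example_hits))
```

A looser criterion, for example hits returned for most members of a group, would only change the `shared` line.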

37 “Phylothreading” Phylothreading results can neither verify nor refute group assignments based on other methods

38 N C cytoplasm cell exterior/organelle Threading: non-transmembrane CDF fragments N-terminus histidine-rich loop between TMD 4 and 5 C-terminus

39 “Phylothreading”: CDF C-terminal fragments “phylothreading” results confirm the assignment of CDF sequences to groups that were based on independent methods

40 Conclusions
   - The 12 Arabidopsis protein sequences reveal structural and therefore probably functional conservation
   - My results support the classification of these proteins as CDF metal transporters
   - I propose that the CDF protein family of A. thaliana contains two groups, each containing at least four proteins that are structurally and functionally closely related
   - Threading methods cannot be used for transmembrane proteins or for their non-transmembrane domains (yet)
   - Threading results for multiple sequences may be used to confirm (or find?) relationships among these sequences (“phylothreading”)
   - I was able to evaluate and compare a number of online tools that are available for the analysis of sequence data

41-42 Conclusions
   1. Sequence retrieval revealed conflicting information for 7 of the 12 proteins
   2. The 12 Arabidopsis protein sequences reveal striking structural and therefore probably functional conservation
   3. My results support the classification of these proteins as CDF metal transporters
   4. I propose that the CDF protein family of A. thaliana contains two groups, each containing four proteins that are structurally and functionally closely related
   5. I was able to evaluate and compare a variety of online tools available for the analysis of sequence data
   6. Threading methods cannot be used for transmembrane proteins or for their non-transmembrane domains (yet)
   7. Threading results for multiple sequences can be used to confirm (or find?) relationships among these sequences (“phylothreading”)

45 METHODS

46 Phylogenetic analysis: tree-building methods
   - distance-based methods: the overall distances between all pairs of sequences are calculated and then used to calculate a tree (Neighbor Joining)
   - character-based methods: the individual substitutions among the sequences are used to determine the most likely ancestral relationships (Maximum Parsimony, Maximum Likelihood)
   - Bayesian inference of phylogenies
   ...CLVFMSVEVVGGIKANSLAILTD...
   ...NTAYMVVEFVAGFMSNSLGLISD...
   ...CLLFMSIEVVCGIKANSLAILAD...
   ...CAIFIVVEVVGGIKANSLAILTD...
   ...YLIVMSVQIVGGFKANSLAVMTD...
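As an illustration of the first step of a distance-based method, the sketch below computes pairwise distances for the five aligned fragments shown on this slide. It assumes the simplest possible measure, the p-distance (the fraction of aligned positions that differ); the slides do not say which distance was used, and the tree-building step itself (Neighbor Joining) is not implemented here. The labels `frag1` to `frag5` are placeholders.

```python
from itertools import combinations

# The five aligned fragments from the slide; labels are placeholders.
FRAGMENTS = {
    "frag1": "CLVFMSVEVVGGIKANSLAILTD",
    "frag2": "NTAYMVVEFVAGFMSNSLGLISD",
    "frag3": "CLLFMSIEVVCGIKANSLAILAD",
    "frag4": "CAIFIVVEVVGGIKANSLAILTD",
    "frag5": "YLIVMSVQIVGGFKANSLAVMTD",
}


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(sequences):
    """Symmetric pairwise distances, keyed by (name_a, name_b)."""
    matrix = {(name, name): 0.0 for name in sequences}
    for a, b in combinations(sequences, 2):
        d = p_distance(sequences[a], sequences[b])
        matrix[(a, b)] = d
        matrix[(b, a)] = d
    return matrix


if __name__ == "__main__":
    matrix = distance_matrix(FRAGMENTS)
    for (a, b), d in sorted(matrix.items()):
        if a < b:
            print(f"{a} {b} {d:.3f}")
```

A matrix like this is the input that a distance-based tree-building method works from; phylogenetic software usually applies a substitution model to correct the raw proportions first.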

47 Phylogenetic analysis: statistical evaluation of trees
   Bootstrap analysis: how much support exists for particular branches in a phylogeny?
   1. tree construction, determination of the “best” tree
   2. bootstrap datasets (pseudosamples) are created from the original dataset by random sampling with replacement
   3. tree construction using the bootstrap datasets
   4. comparison of the bootstrap tree with the inferred tree
   5. this is repeated several hundred times
   6. bootstrap value: percentage of times an interior branch in the bootstrap tree was the same as the one in the inferred tree
   ...CLVFMSVEVVGGIKANSLAILTD...
   ...NTAYMVVEFVAGFMSNSLGLISD...
   ...CLLFMSIEVVCGIKANSLAILAD...
   ...CAIFIVVEVVGGIKANSLAILTD...
   ...YLIVMSVQIVGGFKANSLAVMTD...
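Steps 2 and 6 of the procedure above can be sketched in a few lines. The code assumes that pseudosamples are drawn by resampling alignment columns with replacement, which is the usual form of the phylogenetic bootstrap, and that each replicate tree has already been reduced to a set of clades. Tree construction (steps 1 and 3) is left out, and all function names are hypothetical.

```python
import random

# The five aligned fragments shown on the slide.
ALIGNMENT = [
    "CLVFMSVEVVGGIKANSLAILTD",
    "NTAYMVVEFVAGFMSNSLGLISD",
    "CLLFMSIEVVCGIKANSLAILAD",
    "CAIFIVVEVVGGIKANSLAILTD",
    "YLIVMSVQIVGGFKANSLAVMTD",
]


def bootstrap_alignment(alignment, rng):
    """Step 2: draw as many columns as the alignment has, with replacement."""
    n_columns = len(alignment[0])
    picks = [rng.randrange(n_columns) for _ in range(n_columns)]
    return ["".join(seq[i] for i in picks) for seq in alignment]


def bootstrap_support(clade, replicate_trees):
    """Step 6: percentage of replicate trees that contain `clade`.

    Each tree is represented as a set of clades, each clade a frozenset
    of taxon names. Building those trees is outside this sketch.
    """
    found = sum(1 for clades in replicate_trees if clade in clades)
    return 100.0 * found / len(replicate_trees)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for row in bootstrap_alignment(ALIGNMENT, rng):
        print(row)
```

Resampling whole columns keeps each residue paired with the residues aligned to it in the other sequences, so every pseudosample is still a valid alignment of the same five sequences.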

48 Fold recognition methods
   2. Evaluation of the compatibility between target sequence and proposed 3D structure
   using environment-based mean force potentials (Bowie, Fischer, Eisenberg: 1991-1996)
   - residue positions are categorized into environment classes
   - the 3D protein structure is converted into a 1D sequence
   - generate alignment of this 1D string to target sequence
   using knowledge-based mean force potentials (Sippl: 1990-1995)
   - information is automatically learned from databases of protein structures
   - pairwise interactions between structurally adjacent residues are calculated
   - transformation of mean force potentials as a function of distance

50 Fold recognition methods aim: to find an optimal sequence-structure alignment 1.“threading” of an unknown target sequence into the backbone structure of template proteins of known structure ………CLVFMSVEVVGGIKANSLAILTD……… query sequence fold library

52 Fold recognition methods
   2. Evaluation of the compatibility between target sequence and proposed 3D structure
      - using environment-based mean force potentials*, or
      - using knowledge-based mean force potentials*
      [Figure label: 4.99 Å]
   * distance-dependent forces that act between atoms/residues (electrostatic and van der Waals interactions, influences of the surrounding medium on these interactions, contacts between two or three amino acids, angles between residue pairs, …)

54 Threading methods used
   - UCLA-DOE Fold Server: P. Mallick et al., 2002 (BLAST, PSI-BLAST, SDP, DASEY)
   - Threader: D.T. Jones et al., 1992
   - mGenThreader: L.J. McGuffin & D.T. Jones, 2003
   - 3D-PSSM: L.A. Kelley et al., 2000
   - Arby: I. Sommer et al., unpublished (PSI-BLAST, 123D, Jprop)

55 Selection of structural hits for further analysis
   Tools: UCLA-DOE, Threader, mGenThreader, 3D-PSSM, Arby
   Selection rules, in the order they appear on the slide:
   - top 10 structural hits are returned, all were kept
   - compatibility of target sequence and all 2000 available templates is evaluated; lists were sorted by Z-value, approximately 10-20 best hits were kept
   - top 20 structural hits are returned, all were kept
   - a list of the 10-20 best scores is returned; the corresponding hits were extracted from a large table

56 Evaluation of the top score for each CDF sequence
   UCLA: scale 0 to 4 (very poor score, poor score, borderline significant, significant, very significant)
   Threader scores: no guidelines
   mGenThreader, 3D-PSSM: [score plots; scale labels: guess, low confidence, medium confidence, high confidence, certain; highly confident, worthy of attention]

57 There is no consensus of top fold predicted by different methods
   Example: top two structural hits for CDF1
   Method         Hit 1                                  Hit 2
   Threader       1ONE phosphopyruvate hydrolase         1C3Q thiazole kinase
   mGenThreader   1L8M his-rich protein (model)          1QGR importin beta
   UCLA-DOE       1B8F histidine ammonia-lyase           1HFA clathrin assembly protein
   3D-PSSM        1PW4 glycerol-3-phosphate transporter  1KPW green cone pigment
   Arby           1HZX bovine rhodopsin                  1EZV yeast cytochrome bc1

58 No new insights regarding the structure of CDF proteins
   Membrane proteins are significantly under-represented in structural databases, and therefore also in fold libraries
   If there is no fold similar to the native fold of the target protein, this approach cannot succeed.
   Threading methods cannot be used for modeling approaches

60 Threading results: C-termini
   1. Structural information
      No information on domains for metal transport is available. BUT: several of the returned hits are proteins in which bound metals have structural or catalytic roles
   2. Verification of group assignment
      i. Hits predicted for more than one C-terminus: 48 folds
         - specific for group I: 3
         - specific for group II: 2
         - specific for CDF5 and CDF11: 2
      ii. “Phylothreading”

61 Positions of conserved domains and signature sequences
   [Figure: TMD I to VI with the positions of the Pfam CE signature, the CDF signature, and BLOCKS (eMOTIF) blocks 1 to 12]

62 Arabidopsis CDF proteins
   group I:
   - contain his-rich region between TMD 4 and 5
   - one member is confirmed to transport Zn ions
   - genome structure conserved (no introns)
   group II:
   - lack the his-rich region between TMD 4 and 5
   - proteins may transport Mn ions
   - C-terminal regions differ from group I sequences
   no group assignment:
   - CDF6, CDF12: possibly distant common ancestry and mitochondrial localization
   - CDF5, CDF11: close relationship also in PFAM tree

65 Gene organization of the CDF proteins

66 Phylogenetic analysis of sequences containing the CE signature

67 Phylogenetic analysis: tree-building methods
   - maximum parsimony methods: the best tree topology minimizes the total amount of evolutionary change that has occurred
   - distance methods: the best tree topology minimizes the total distance among taxa
   - maximum likelihood methods: given a particular substitution model and a particular tree, how likely are the observed data?
   ...CLVFMSVEVVGGIKANSLAILTD...
   ...NTAYMVVEFVAGFMSNSLGLISD...
   ...CLLFMSIEVVCGIKANSLAILAD...
   ...CAIFIVVEVVGGIKANSLAILTD...
   ...YLIVMSVQIVGGFKANSLAVMTD...

68 Inclusion in membrane and transport databases
   Databases: cation efflux, Pfam entry PF01545; Arabidopsis Membrane Protein Library (AMPL); ARAMEMNON; Transport Protein Database; PlantsT
   Entries per protein, in the order they appear on the slide:
   CDF1: CDF; zinc transporter; CDF
   CDF2: CDF; putative MTP; CDF
   CDF3: CDF; putative MTP; CDF
   CDF4: CDF; putative MTP; CDF
   CDF5: singleton (CDF related); putative cation transporter; CDF; -
   CDF6: singleton; unknown protein; CDF
   CDF7: family; unknown protein; CDF; -
   CDF8: family; hypothetical protein; -; -
   CDF9: family; unknown protein; -; -
   CDF10: family; putative MTP; -; CDF
   CDF11: singleton; putative MTP; CDF
   CDF12: singleton; putative MTP; -; CDF


