Computational Analysis of Protein-DNA Interactions
Changhui (Charles) Yan, Department of Computer Science, Utah State University


Slide 2: Problem I
Identifying amino acid residues involved in protein-DNA interactions from sequence

Slide 3: Materials and Methods
56 double-stranded DNA-binding proteins, previously used in the study of Jones et al. (2003)
Encoding
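The slides do not spell out the encoding. The following is a minimal sketch that assumes each residue is represented by the identities of the residues in a sliding window centered on it. The half-width of 5 (an 11-residue window), the `-` padding symbol, and the function name `encode_windows` are illustrative assumptions, not details taken from the study.

```python
from typing import List

PAD = "-"  # assumed symbol for window positions that fall outside the chain


def encode_windows(sequence: str, half_width: int = 5) -> List[str]:
    """Return one window of length 2*half_width + 1 centered on each residue.

    Positions beyond either end of the sequence are filled with PAD, so every
    window has the same length and each position can serve as one categorical
    feature (the amino acid identity at that offset).
    """
    padded = PAD * half_width + sequence + PAD * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(sequence))]


if __name__ == "__main__":
    for window in encode_windows("MKRIEA", half_width=2):
        print(window)
```

Padding keeps the first and last residues of a chain in the data set instead of discarding them, at the cost of one extra symbol in the alphabet.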

Slide 4: Materials and Methods

Slide 5: Leave-one-out cross-validation
Naïve Bayes classifier

Slide 6: Naïve Bayes classifier
Leave-one-out cross-validation
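Here is a sketch of the two components named on these slides, under stated assumptions. The first is a Naïve Bayes classifier over the residue identity at each window position, with Laplace smoothing (an assumption). The second is leave-one-out cross-validation in which the held-out unit is a whole protein (also an assumption). The class name `WindowNaiveBayes`, the function `leave_one_protein_out`, and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Example = Tuple[str, int]  # (residue window, 1 = DNA-binding, 0 = non-binding)


class WindowNaiveBayes:
    """Naive Bayes over the residue identity at each window position (hypothetical)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing pseudo-count (an assumption)

    def fit(self, examples: List[Example]) -> "WindowNaiveBayes":
        self.class_counts = Counter(label for _, label in examples)
        # counts[label][position][residue] = training windows with that residue there
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.alphabet = set()
        for window, label in examples:
            for pos, residue in enumerate(window):
                self.counts[label][pos][residue] += 1
                self.alphabet.add(residue)
        return self

    def log_posterior(self, window: str, label: int) -> float:
        """Unnormalized log P(label | window): log prior plus per-position log likelihoods."""
        n_label = self.class_counts[label]
        score = math.log(n_label / sum(self.class_counts.values()))
        k = len(self.alphabet) + 1  # one extra slot for residues unseen in training
        for pos, residue in enumerate(window):
            count = self.counts[label][pos][residue]
            score += math.log((count + self.alpha) / (n_label + self.alpha * k))
        return score

    def predict(self, window: str) -> int:
        return max(self.class_counts, key=lambda label: self.log_posterior(window, label))


def leave_one_protein_out(data: Dict[str, List[Example]]) -> Counter:
    """Hold out each protein in turn, train on the others, and tally TP/FP/TN/FN."""
    tally = Counter()
    for held_out in data:
        train = [ex for pid, exs in data.items() if pid != held_out for ex in exs]
        model = WindowNaiveBayes().fit(train)
        for window, label in data[held_out]:
            pred = model.predict(window)
            outcome = ("T" if pred == label else "F") + ("P" if pred == 1 else "N")
            tally[outcome] += 1
    return tally


if __name__ == "__main__":
    toy = {  # hypothetical 3-residue windows from three proteins
        "p1": [("KRK", 1), ("AGA", 0), ("KRR", 1), ("GAG", 0)],
        "p2": [("RKR", 1), ("AAG", 0)],
        "p3": [("KKR", 1), ("GGA", 0)],
    }
    print(leave_one_protein_out(toy))
```

Naïve Bayes treats the residue at each window position as conditionally independent of the others given the class. This assumption reduces training to counting, which keeps each of the many leave-one-out rounds cheap.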

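Slide 7: Leave-One-Out Cross-Validation
Feature sets: identities (ID) and ID + entropy are sequence-based; ID + rASA and ID + rASA + entropy are sequence/structure-based (rASA: relative accessible surface area).
Measure | ID | ID + entropy | ID + rASA | ID + rASA + entropy
Correlation coefficient | 0.25 | 0.29 | 0.28 | 0.30
Accuracy (%) | 77 | 75 | 76 | 77
Specificity+ (%) | | 37 | 36 | 39
Sensitivity+ (%) | 43 | 53 | 51 | 52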
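The slide does not say how the entropy feature is computed. One common choice, assumed here, is the Shannon entropy of the residues observed at a position in a multiple alignment of homologous sequences. The natural logarithm, the gap handling, and the name `column_entropy` are assumptions.

```python
import math
from collections import Counter
from typing import Iterable


def column_entropy(column: Iterable[str], gap: str = "-") -> float:
    """Shannon entropy (natural log) of the residues in one alignment column.

    Gaps are skipped; 0 means the position is fully conserved.
    """
    residues = [aa for aa in column if aa != gap]
    if not residues:
        return 0.0
    n = len(residues)
    h = -sum((c / n) * math.log(c / n) for c in Counter(residues).values())
    return max(0.0, h)  # max() turns the -0.0 of a conserved column into 0.0


if __name__ == "__main__":
    print(column_entropy("KKKK"))  # fully conserved column
    print(column_entropy("KRKR"))  # two residues, equally frequent
```

A fully conserved column scores 0, and a column split evenly between two residues scores ln 2. Low entropy therefore marks conserved positions, which are often functionally important.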

Slide 8: Predictions in the Context of 3-D Structures
Pit-1 (PDB 1au7): TP 30, FP 16, TN 86, FN 14; CC 0.51 (2nd); accuracy 79%
Panels: Predicted, Actual
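The slides report a correlation coefficient, accuracy, specificity+, and sensitivity+ without defining them. This minimal sketch assumes the definitions that are usual in this line of work: specificity+ = TP/(TP + FP), sensitivity+ = TP/(TP + FN), and a correlation coefficient equal to the Matthews correlation coefficient. The function name `evaluate` is hypothetical.

```python
import math
from typing import Dict


def evaluate(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Per-residue performance measures from a confusion matrix."""
    total = tp + fp + tn + fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / total,
        "specificity+": tp / (tp + fp) if tp + fp else 0.0,  # precision on the binding class
        "sensitivity+": tp / (tp + fn) if tp + fn else 0.0,  # recall on the binding class
        "cc": (tp * tn - fp * fn) / denom if denom else 0.0,  # Matthews correlation
    }


if __name__ == "__main__":
    print(evaluate(tp=30, fp=16, tn=86, fn=14))  # Pit-1 counts from slide 8
```

With the Pit-1 counts above, `evaluate(30, 16, 86, 14)` gives an accuracy of about 0.79, in line with the 79% on the slide.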

Slide 9: λ-Cro (PDB 6cro): TP 10, FP 5, TN 34, FN 10; CC 0.37 (19th); accuracy 73%
Panels: Predicted, Actual

Slide 10: Predictions Compared With PROSITE Motifs
Predicted binding sites substantially overlap with 34 of the 37 "DNA-binding" PROSITE motifs.
In 52 of the 56 proteins, the predictor identifies at least 20% of the DNA-binding residues.
28 of the 56 proteins contain no PROSITE motifs annotated as "DNA-binding".
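The slide does not state how "substantial overlap" was measured. The following sketch works under stated assumptions. It translates a PROSITE pattern into a Python regular expression, collects the residue positions the pattern covers, and reports the fraction of those positions predicted as binding. Only the common pattern elements are handled (`x`, `x(n)`, `x(n,m)`, `[...]`, `{...}`, `<`, `>`). The function names and the overlap measure are hypothetical.

```python
import re
from typing import Set


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern such as 'C-x(2,4)-C-[LIVM]' into a regex."""
    elements = pattern.rstrip(".").split("-")
    prefix = "^" if elements[0].startswith("<") else ""  # '<' anchors at the N-terminus
    suffix = "$" if elements[-1].endswith(">") else ""   # '>' anchors at the C-terminus
    parts = []
    for element in elements:
        element = element.strip("<>")
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if high is not None:
            core += "{" + low + "," + high + "}"
        elif low is not None:
            core += "{" + low + "}"
        parts.append(core)
    return prefix + "".join(parts) + suffix


def motif_positions(sequence: str, pattern: str) -> Set[int]:
    """0-based positions covered by any match; the lookahead also finds overlapping matches."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    covered: Set[int] = set()
    for m in regex.finditer(sequence):
        covered.update(range(m.start(1), m.end(1)))
    return covered


def overlap_fraction(predicted: Set[int], motif: Set[int]) -> float:
    """Fraction of motif positions that the classifier predicted as binding."""
    return len(predicted & motif) / len(motif) if motif else 0.0


if __name__ == "__main__":
    motif = motif_positions("AACAACLAA", "C-x(2,4)-C-[LIVM]")
    print(sorted(motif), overlap_fraction({2, 3, 9}, motif))
```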

Slide 11: Comparison With Previous Study
Measure | Naïve Bayes classifier | Ahmad and Sarai method*
Correlation coefficient | 0.26 | 0.23
Accuracy (%) | 80 | 66
Specificity+ (%) | 29 | 21
Sensitivity+ (%) | 48 | 68
* Ahmad, S. and Sarai, A. (2005) PSSM-based prediction of DNA binding sites in proteins. BMC Bioinformatics, 6, 33.

Slide 12: Summary
A simple sequence-based Naïve Bayes classifier predicts interface residues in DNA-binding proteins with 75% accuracy, 37% specificity+, 53% sensitivity+, and a correlation coefficient of 0.29.
Predicted binding sites correctly indicate the locations of actual binding sites and substantially overlap with known PROSITE motifs.

Slide 13: Problem II
Identification of helix-turn-helix (HTH) DNA-binding motifs

Slide 14: HTH Motifs
Sequences sharing low similarity can fold into similar HTH structures.
Identifying HTH motifs from sequence alone is therefore extremely challenging.

Slide 15: Trick 1
Include more information: amino acid sequence and secondary structure

Slide 16: Hidden Markov Model (HMM)
LQQITHIANQL-GLE----KDVVRVWF

Slide 17: Hidden Markov Model (HMM_AA_SS)
LQQITHIANQL-GLE----KDVVRVWF
HHHEEHEEEHMHE----HHEEMMEH
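One way to read HMM_AA_SS is that every state emits a pair made of an amino acid and a secondary-structure symbol. The following sketch scores an aligned pair of strings with the standard forward algorithm under that reading. It assumes the two emissions are conditionally independent given the state. It also uses a generic fully connected HMM instead of a profile HMM with match, insert, and delete states. All parameter values and the secondary-structure symbols are hypothetical, and this is not the model used in the study.

```python
import math
from typing import Dict, List, Sequence

Table = Dict[str, Dict[str, float]]


def safe_log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def logsumexp(values: Sequence[float]) -> float:
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_log_likelihood(aa_seq: str, ss_seq: str, states: List[str],
                           start: Dict[str, float], trans: Table,
                           emit_aa: Table, emit_ss: Table) -> float:
    """log P(aa_seq, ss_seq) for an HMM whose states emit (amino acid, SS) pairs."""
    if len(aa_seq) != len(ss_seq) or not aa_seq:
        raise ValueError("need two non-empty strings of equal length")

    def log_emit(state: str, t: int) -> float:
        # Pair emission modeled as P(aa | state) * P(ss | state).
        return (safe_log(emit_aa[state].get(aa_seq[t], 0.0))
                + safe_log(emit_ss[state].get(ss_seq[t], 0.0)))

    alpha = {s: safe_log(start[s]) + log_emit(s, 0) for s in states}
    for t in range(1, len(aa_seq)):
        alpha = {
            s: log_emit(s, t) + logsumexp([alpha[r] + safe_log(trans[r][s]) for r in states])
            for s in states
        }
    return logsumexp(list(alpha.values()))


if __name__ == "__main__":
    states = ["helix", "other"]
    start = {"helix": 0.5, "other": 0.5}
    trans = {"helix": {"helix": 0.9, "other": 0.1},
             "other": {"helix": 0.1, "other": 0.9}}
    emit_aa = {"helix": {"L": 0.4, "Q": 0.3, "G": 0.3},
               "other": {"L": 0.2, "Q": 0.2, "G": 0.6}}
    emit_ss = {"helix": {"H": 0.9, "C": 0.1},
               "other": {"H": 0.2, "C": 0.8}}
    print(forward_log_likelihood("LQG", "HHC", states, start, trans, emit_aa, emit_ss))
```

Working in log space keeps long sequences from underflowing. Adding the secondary-structure track simply multiplies in a second emission term at every position.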

Slide 18: Trick 2
There are similarities among the 20 naturally occurring amino acids, so use reduced alphabets

Slide 19: Reduced Alphabets
Schemes for reducing the amino acid alphabet based on the BLOSUM50 matrix of Henikoff and Henikoff (1992), derived by grouping and averaging the similarity matrix elements as described in the text (Murphy et al. 2000).
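Here is a minimal sketch of how a reduced alphabet re-encodes a sequence before the HMM is built. The ten groups are assumed to be the 10-letter grouping of Murphy et al. (2000) and should be checked against that paper before use. Representing each group by its first letter and the names `build_reduction` and `reduce_sequence` are illustrative choices.

```python
from typing import Dict, List

# Assumed 10-letter grouping from Murphy et al. (2000); verify against the paper.
MURPHY_10_GROUPS: List[str] = ["LVIM", "C", "A", "G", "ST", "P", "FYW", "EDNQ", "KR", "H"]


def build_reduction(groups: List[str]) -> Dict[str, str]:
    """Map each amino acid to its group's representative (the group's first letter)."""
    return {aa: group[0] for group in groups for aa in group}


MURPHY_10 = build_reduction(MURPHY_10_GROUPS)


def reduce_sequence(sequence: str, mapping: Dict[str, str], gap: str = "-") -> str:
    """Rewrite a sequence in the reduced alphabet, leaving alignment gaps unchanged."""
    return "".join(aa if aa == gap else mapping[aa] for aa in sequence)


if __name__ == "__main__":
    print(reduce_sequence("LQQITHIANQL-GLE----KDVVRVWF", MURPHY_10))
```

Applied to the example segment from the HMM slides, this returns `LEELSHLAEEL-GLE----KELLKLFF`. Q, N, and D collapse into the E group, and I and V collapse into the L group, so sequences that differ only by such substitutions look identical to the model.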

Slide 20: Cross-Family Evaluations
Model (alphabet [3]) | True positives [1] | False positives [2]
HMM_AA | 3 | 0
HMM_AA_SS (20 letters) | 227 | 0
HMM_AA_SS (Murphy_15) | 474 | 0
HMM_AA_SS (Murphy_10) | 470 | 3
HMM_AA_SS (Murphy_8) | 431 | 5
[1] True positive: HTH motifs that are correctly identified as such.
[2] False positive: non-HTH motifs that are identified as HTH motifs.
[3] The alphabet used to encode amino acid sequences.

Slide 21: Questions

Slide 22: Within-Family Three-Fold Cross-Validation
Family (number of HTH motifs in the family) | HMM_AA | HMM_AA_SS (Murphy_15)
PF00126 (1635) | 1594 | 1622
PF00165 (90) | 63 | 80
PF00196 (30) | 26 | 30
PF04545 (164) | 137 | 164
PF01022 (42) | 39
PF00046 (189) | 176 | 188
PF03965 (48) | 48
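In within-family three-fold cross-validation, each family's HTH motifs are split into three parts. The model is built from two parts and tested on the third, and this is repeated so that every motif is tested once. Below is a minimal sketch of the splitting step, assuming a random shuffle with a fixed seed. The function name and seed are hypothetical, and the slides do not say how the folds were formed.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_splits(items: Sequence[T], k: int = 3, seed: int = 0) -> List[Tuple[List[T], List[T]]]:
    """Shuffle once, deal items round-robin into k folds, return (train, test) pairs."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [
        ([x for j, fold in enumerate(folds) if j != i for x in fold], folds[i])
        for i in range(k)
    ]


if __name__ == "__main__":
    motifs = [f"motif_{n}" for n in range(30)]  # e.g. a family with 30 HTH motifs
    for train, test in k_fold_splits(motifs):
        print(len(train), len(test))  # build the HMM from train, score test
```

For a family such as PF00196 with 30 motifs, each test fold holds 10 motifs. The per-family counts in the table are then the motifs recognized across the three test folds.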

Slide 23: Comparison of HMM_AA_SS With FFAS03 in Cross-Family Evaluations
Total HTH motifs | Recognized by both FFAS03 and HMM_AA_SS | Recognized by FFAS03 only | Recognized by HMM_AA_SS only
5631352471
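The four columns of this comparison follow from set operations on the motifs that each method recognizes. Below is a minimal sketch with hypothetical motif identifiers. The function name `compare_recognition` is also hypothetical.

```python
from typing import Dict, Set


def compare_recognition(all_motifs: Set[str], ffas03: Set[str],
                        hmm_aa_ss: Set[str]) -> Dict[str, int]:
    """Count motifs recognized by both methods, by exactly one, or by neither."""
    ffas03 = ffas03 & all_motifs
    hmm_aa_ss = hmm_aa_ss & all_motifs
    return {
        "total": len(all_motifs),
        "both": len(ffas03 & hmm_aa_ss),
        "ffas03_only": len(ffas03 - hmm_aa_ss),
        "hmm_aa_ss_only": len(hmm_aa_ss - ffas03),
        "neither": len(all_motifs - ffas03 - hmm_aa_ss),
    }


if __name__ == "__main__":
    motifs = {f"m{i}" for i in range(1, 7)}  # hypothetical motif IDs
    print(compare_recognition(motifs, {"m1", "m2", "m3"}, {"m2", "m3", "m4", "m5"}))
```

The four disjoint counts (both, FFAS03 only, HMM_AA_SS only, neither) always add up to the total, which gives a quick consistency check on such a table.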

Slide 24: Putative HTH Motifs in Ureaplasma parvum
Protein, location, and annotation from UniProt:
sp|Q9PQE5|SCPB_UREPA, 176-214: Participates to chromosomal partition during cell division
sp|Q9PQV6|RPOB_UREPA, 540-587: DNA-directed RNA polymerase
sp|Q9PR27|SYY_UREPA, 340-380: Tyrosyl-tRNA synthetase
sp|Q9PQC2|SYA_UREPA, 217-265: Alanyl-tRNA synthetase
sp|Q9PQ74|DPO3A_UREPA, 365-400: DNA polymerase III subunit alpha
sp|Q9PQX7|Y166_UREPA, 507-553: Hypothetical protein

