Developed Struct-SVM classifier that takes into account domain knowledge to improve identification of protein-RNA interface residues. Results show that...


Predicting Protein-RNA Binding Sites Using Structural Information

Cornelia Caragea, Michael Terribilini, Jivko Sinapov, Jae-Hyung Lee, Fadi Towfic, Drena Dobbs and Vasant Honavar

Artificial Intelligence Research Laboratory
Bioinformatics and Computational Biology Program
Computational Intelligence, Learning, and Discovery Program
Department of Computer Science

Introduction

RNA molecules play diverse functional and structural roles in cells:
- messengers for transferring genetic information from DNA to proteins
- primary genetic material in many viruses
- enzymes important for protein synthesis and RNA processing
- essential and ubiquitous regulators of gene expression in living organisms

These functions depend on interactions between RNA molecules and specific proteins in cells.

Protein-RNA interface residue identification

Sequence: x_i = (x_{i,1}, ..., x_{i,j-k}, ..., x_{i,j}, ..., x_{i,j+k}, ..., x_{i,m})
Label:    y_i = (y_{i,1}, ..., y_{i,j-k}, ..., y_{i,j}, ..., y_{i,j+k}, ..., y_{i,m}), with y_i in {0,1}*

Example (1T0K_B):
  SINQKLALVIKSGK YTLGYKSTVKSLRQ GKSKLIIIAANTPV LRKSELEYYAMLSK TKVYYFQGGNNELG TAVGKLFRVGVVSI LEAGDSDILTTLA
  Sequence window: ANTPVLRKS
  Label window:    001100100

Dataset

- RNA-Protein Interface dataset, RB181: consists of RNA-binding protein sequences extracted from structures of known RNA-protein complexes solved by X-ray crystallography in the Protein Data Bank

Feature Extraction

[Figure: feature extraction pipeline. Stages shown: Seq2SeqWins, SeqWins2TargetAA, SeqWins2ZeroOne, SeqWins2Blast, SeqWins2SS, SS2ZeroOne, TargetAA2Struct, Struct2Blast, SeqWins2CXValue, SeqWins2Roughness]

Windowise (Seq2SeqWins): each residue is replaced by a window of 2k+1 residues centred on it:
  x'_{i,j-1} = (x_{i,j-1-k}, ..., x_{i,j-1}, ..., x_{i,j-1+k})
  x'_{i,j}   = (x_{i,j-k}, ..., x_{i,j}, ..., x_{i,j+k})
  x'_{i,j+1} = (x_{i,j+1-k}, ..., x_{i,j+1}, ..., x_{i,j+1+k})

Target residue (SeqWins2TargetAA): each window is reduced to its central residue:
  x'_{i,j-1} = (x_{i,j-1})
  x'_{i,j}   = (x_{i,j})
  x'_{i,j+1} = (x_{i,j+1})

Struct-SVM Classifier

A machine learning classifier that incorporates domain knowledge (that is, the structure of the protein) to improve classification.

[Figure: Struct-SVM flowchart. The training data are split into a collection of surface windows and a collection of non-surface windows. The learning system L produces the resulting classifier h. For each test window x_{test,j}: if x_{test,j} is a surface residue, the final prediction is h(x_{test,j}) = y; otherwise it is h(x_{test,j}) = -1.]

Results

Table 1. Accuracy, Correlation Coefficient and Area Under the ROC Curve for SVM and Struct-SVM

Performance measure        SVM     Struct-SVM
Accuracy                   0.68    0.74
Correlation Coefficient    0.25    0.30
Area Under ROC Curve       0.73    0.76

Fig. 1. Receiver Operating Characteristic (ROC) curves for SVM and Struct-SVM classifiers on the protein-RNA dataset

Conclusions

- Developed Struct-SVM classifier that takes into account domain knowledge to improve identification of protein-RNA interface residues
- Results show that the ROC curve of Struct-SVM dominates the ROC curve of the Support Vector Machine (SVM) classifier

Acknowledgements: This work is supported in part by a grant from the National Institutes of Health (GM 066387) to Vasant Honavar and Drena Dobbs.
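The windowing step (Seq2SeqWins) and the surface check in the Struct-SVM flowchart can be illustrated with a short Python sketch. This is not the authors' implementation. The padding symbol, the function names `windowise` and `struct_predict`, the stand-in classifier `toy_h`, and the surface flags are all assumptions made for illustration; the poster does not say how sequence ends are handled or how surface residues are determined. The fragment is taken from the 1T0K_B sequence as printed on the poster.

```python
from typing import Callable, List, Sequence

PAD = "-"  # assumed padding symbol; the poster does not say how sequence ends are handled


def windowise(seq: str, k: int) -> List[str]:
    """Return one window of length 2k+1 per residue, centred on that residue."""
    padded = PAD * k + seq + PAD * k
    return [padded[j:j + 2 * k + 1] for j in range(len(seq))]


def struct_predict(windows: Sequence[str],
                   is_surface: Sequence[bool],
                   h: Callable[[str], int]) -> List[int]:
    """Surface-gated rule: non-surface residues get -1, surface residues get h(window)."""
    return [h(w) if s else -1 for w, s in zip(windows, is_surface)]


def toy_h(window: str) -> int:
    """Hypothetical stand-in for the trained classifier (not the poster's SVM)."""
    centre = window[len(window) // 2]
    return 1 if centre in "RK" else -1


fragment = "AANTPVLRKSE"           # part of the 1T0K_B sequence shown on the poster
windows = windowise(fragment, k=4)
surface = [True] * len(fragment)   # made-up surface flags for illustration
surface[7] = False                 # pretend the R at index 7 is buried
predictions = struct_predict(windows, surface, toy_h)
print(windows[5])                  # window centred on V
print(predictions)
```

With k = 4 the window centred on V is the nine-residue window ANTPVLRKS from the poster's example. The residue flagged as non-surface is assigned -1 regardless of what the classifier would have returned, which mirrors the "no" branch of the flowchart.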

