Frontiers in the Convergence of Bioscience and Information Technologies 2007 Seyed Koosha Golmohammadi, Lukasz Kurgan, Brendan Crowley, and Marek Reformat.


1 Frontiers in the Convergence of Bioscience and Information Technologies 2007
Seyed Koosha Golmohammadi, Lukasz Kurgan, Brendan Crowley, and Marek Reformat
University of Alberta, Department of Electrical and Computer Engineering
Classification of Cell Membrane Proteins
This presentation and other related information are available at http://www.ualberta.ca/~golmoham/

2 Problem definition
Knowledge of cell membrane protein type is important
– Critical for determining their function
– Determining the type of a protein using traditional experimental methods is costly and time-consuming
Large and widening gap between known proteins (over 3.3 million) and annotated proteins
Automated and accurate methods of classifying uncharacterized proteins are highly desirable

3 Cell Membrane Proteins

4 Methodology

5 Datasets and test procedures
Two datasets were used to design and test our system. These standard benchmark datasets allow for a fair comparison with other methods
– 2059 proteins were used to design the prediction system
  Chou, Prediction of protein cellular attributes using pseudo amino acid composition, Proteins (2001) 43:246-55
– 2625 proteins were used for an independent test
  Chou and Elrod, Prediction of membrane protein types and subcellular locations, Proteins (1999) 34:137-53
Three test methods were used to evaluate the performance of the proposed prediction system
– in-sample resubstitution (self-consistency) on the design dataset
– out-of-sample jackknife (leave-one-out) on the design dataset
– out-of-sample test on the independent dataset
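The three test procedures differ only in which proteins the classifier sees during training and which it is scored on. The sketch below illustrates that difference on a toy dataset. It is not the authors' system: the 1-nearest-neighbor classifier, the two-dimensional feature vectors, the labels "A" and "B", and all function names are assumptions chosen for illustration.

```python
from math import dist

def predict_1nn(train, query):
    """Return the label of the training point closest to `query`.
    A plain 1-nearest-neighbor rule, used here only as a stand-in classifier."""
    _, label = min(train, key=lambda item: dist(item[0], query))
    return label

def accuracy(train, test):
    """Fraction of `test` samples whose label is predicted correctly."""
    hits = sum(predict_1nn(train, x) == y for x, y in test)
    return hits / len(test)

def resubstitution(design):
    """In-sample test: train and score on the same design set."""
    return accuracy(design, design)

def jackknife(design):
    """Leave-one-out test: each sample is predicted from all the others."""
    hits = 0
    for i, (x, y) in enumerate(design):
        rest = design[:i] + design[i + 1:]
        hits += predict_1nn(rest, x) == y
    return hits / len(design)

def independent(design, held_out):
    """Out-of-sample test: train on the design set, score on unseen data."""
    return accuracy(design, held_out)

# Hypothetical toy data: (feature vector, class label)
design = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
          ((1.0, 1.0), "B"), ((0.9, 1.1), "B"), ((0.5, 0.4), "B")]
held_out = [((0.05, 0.1), "A"), ((0.95, 0.9), "B")]

print(resubstitution(design), jackknife(design), independent(design, held_out))
```

With a nearest-neighbor rule every design sample is its own nearest neighbor, so resubstitution is trivially perfect, which is why the out-of-sample tests are the more informative ones.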

6 Feature-based sequence representation

7 Applying different classifiers to the feature-based representation of proteins
– Decision Tree with Naive Bayes at the leaves
– K*-nearest neighbor
– Support Vector Machine with polynomial kernel
– K-nearest neighbor
– Neural Network with back propagation training
9 classifiers with the highest total accuracy

8 Our method's results at a glance
Test method        Self-consistency  Jackknife  Independent
Accuracy [%]
  Overall          99.9              86.9       97.1
  Type I           100               83.5       96.4
  Type II          100               52.6       80.6
  Multipass        100               95.8       99.0
  Lipid            100               45.1       78.6
  GPI              99.1              61.5       96.5
Specificity [%]
  Type I           100               94.7       99.2
  Type II          100               98.3       99.8
  Multipass        99.9              83.4       93.9
  Lipid            100               99.9       99.9
  GPI              100               98.7       99.8
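The slide does not define its per-type measures. A common reading, assumed here, is that per-type accuracy is the fraction of proteins of that type predicted correctly (sensitivity), and specificity is the fraction of proteins of other types that are not assigned to that type. A minimal sketch under that assumption, with hypothetical labels and a hypothetical function name:

```python
def per_class_rates(true_labels, predicted_labels):
    """Per-class accuracy (sensitivity) and specificity, one-vs-rest.
    The definitions are an assumption; the slides do not spell them out."""
    rates = {}
    for cls in sorted(set(true_labels)):
        tp = fn = fp = tn = 0
        for t, p in zip(true_labels, predicted_labels):
            if t == cls and p == cls:
                tp += 1          # correctly assigned to this class
            elif t == cls:
                fn += 1          # member of this class, assigned elsewhere
            elif p == cls:
                fp += 1          # other class, wrongly assigned here
            else:
                tn += 1          # other class, correctly kept out
        rates[cls] = {
            "accuracy": tp / (tp + fn),
            # specificity is undefined when every sample belongs to `cls`
            "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        }
    return rates

# Hypothetical toy predictions, not data from the study
true = ["Type I", "Type I", "Type II", "Multipass", "Multipass", "Multipass"]
pred = ["Type I", "Multipass", "Type II", "Multipass", "Multipass", "Type I"]
rates = per_class_rates(true, pred)
for cls, r in rates.items():
    print(cls, round(100 * r["accuracy"], 1), round(100 * r["specificity"], 1))
```

A class with few members can show low accuracy while its specificity stays high, because specificity is dominated by the many proteins of other types.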

9 Our method outperforms existing methods
Classifier                        Reference           Self-consistency  Jackknife  Independent
K*                                This paper          99.9              86.9       97.7
Ensemble of NNs                   Shen and Chou 2007  not available     85.8       96.8
Fuzzy KNN                         Shen and Chou 2006  not available     85.6       95.7
Stacking                          Wang et al. 2006    98.7              85.4       94.3
OET-KNN                           Shen et al. 2006    99.5              84.7       94.2
Weighted SVM                      Wang et al. 2004    99.9              82.4       90.3
SLLE                              Wang et al. 2005    not available     82.3       95.7
Augmented covariant discriminant  Chou 2001           90.9              80.9       87.5
SVM                               Cai et al. 2004     not available     80.4       85.4

10 Conclusions
The proposed method outperforms existing methods
– higher accuracy in both jackknife and independent dataset tests
The improved prediction quality of our method is a result of applying a comprehensive feature-based sequence representation
– existing methods use either composition or pseudo amino acid composition for protein representation
– in contrast, our method uses seven feature sets for the same task
– there might be other features that were not tested in this study and could further improve the prediction accuracy
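For reference, the plain composition representation that the conclusions contrast against can be sketched as a 20-element vector of residue frequencies. This shows only that baseline, not the seven feature sets used in this work, which the transcript does not describe. The function name and the toy sequence are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def amino_acid_composition(sequence):
    """Return a 20-element list with the frequency of each standard residue."""
    sequence = sequence.upper()
    counts = [sequence.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)  # non-standard symbols (e.g. X) are ignored
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]

example = "MKKLLA"  # hypothetical toy sequence, not from the datasets
vector = amino_acid_composition(example)
print(dict(zip(AMINO_ACIDS, vector)))
```

Composition discards residue order entirely, which is one reason richer feature sets can carry more information about membrane protein type.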

