High Throughput Computing and Protein Structure Stephen E. Hamby.


Overview Introduction To Protein Structure, Dihedral Angles, Previous Work, Support Vector Regression, Optimisation, Prediction, Results, Conclusions

Introduction To Protein Structure Molecules with massive biological importance. Structure determination gives insight into function, dynamics and potential drug targets. Experimental structure determination is expensive, slow and difficult.

Introduction To Protein Structure Primary Structure: order of amino acids. Secondary Structure: building blocks. Tertiary Structure: complete 3D structure.

Introduction To Protein Structure Secondary Structure Types: α-helix, β-sheet, random coil.

Dihedral Angles
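As a reference point for what is being predicted, a dihedral angle can be computed from the coordinates of four consecutive atoms. For the protein backbone, φ is defined by the atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). The sketch below is an illustrative, standard-library implementation; the function names are hypothetical and are not taken from the presentation.

```python
import math


def _sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points p0-p1-p2-p3.

    Uses the atan2 form: with bond vectors b1, b2, b3,
    angle = atan2(|b2| * b1.(b2 x b3), (b1 x b2).(b2 x b3)).
    The result is 0 for a cis arrangement and +/-180 for trans.
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

Called with the coordinates of C(i-1), N, Cα and C of a residue, this would give φ; with N, Cα, C and N(i+1) it would give ψ.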

Finding the secondary structure of a protein is a step towards finding its complete structure. Predicting dihedral angles can help us to get the secondary structure. How can we predict dihedral angles?

Previous work Destruct: Multiple neural networks. Iterative method. Predicts secondary structure and dihedral angles.

Previous work Real Spine: Twin neural networks give a consensus prediction. Predicts dihedral angles from various amino acid properties, amino acid composition and predicted structure.

Support Vector Regression Kernel machine learning maps the data into a higher-dimensional feature space so that a linear relationship can be found.

Support Vector Regression Attempts to fit a linear function to the data in a high-dimensional feature space. Accurate, but slow, needs optimisation, and is a black box.
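For readers who want the formulation behind this slide, the standard ε-insensitive support vector regression problem can be written as below. The notation (w, b, φ, C, ε, slack variables ξ) is the usual textbook one and is an assumption here; the presentation itself does not state the objective.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^{*}} \quad & \tfrac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right) \\
\text{subject to} \quad & y_i - w \cdot \phi(x_i) - b \le \varepsilon + \xi_i, \\
& w \cdot \phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
& \xi_i,\ \xi_i^{*} \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

In the dual form the fitted function depends on the data only through a kernel, $f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^{*})\, k(x_i, x) + b$ with $k(x, x') = \phi(x) \cdot \phi(x')$, which is why the regression is linear in the feature space but can be non-linear in the original inputs.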

Support Vector Regression Kernel Choice We tested the various kernels available through the PyML package. These are the linear, polynomial, and Gaussian kernels. We tested them using the CASP4 dataset. The Gaussian kernel produced the best results.
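The three kernel families named on this slide can be sketched in a few lines of plain Python. This is not PyML code; the function names and the default values for `degree`, `coef0` and `gamma` are illustrative assumptions.

```python
import math


def linear_kernel(x, y):
    """k(x, y) = x . y"""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """k(x, y) = (x . y + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree


def gaussian_kernel(x, y, gamma=0.5):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

The Gaussian kernel equals 1 for identical inputs and decays towards 0 as the inputs move apart, with `gamma` controlling how quickly; this width is one of the parameters that typically has to be tuned.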

Optimisation Three interdependent parameters. Grid-based optimisation on the CASP4 dataset. Around hour jobs. Run in blocks of 10 on Jupiter. Accuracy assessed using the Pearson correlation coefficient.
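A grid-based optimisation scored by the Pearson correlation coefficient might look roughly like the sketch below. The slide does not name the three parameters; `C`, `gamma` and `epsilon` in the usage note are assumed, as the usual choices for a Gaussian-kernel SVR. The `score` callable is a hypothetical placeholder for "train with these parameters and return the Pearson correlation between predicted and observed angles".

```python
import itertools
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def grid_search(score, grid):
    """Evaluate score(**params) at every grid point and return the best.

    grid maps each parameter name to the list of values to try.
    Returns (best_params, best_score).
    """
    names = list(grid)
    best_params, best_score = None, -math.inf
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score
```

A call such as `grid_search(score, {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1], "epsilon": [0.1, 0.5]})` evaluates 18 combinations. Each grid point is independent of the others, so in a high throughput setting every combination can be submitted as a separate job rather than looped over serially.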

Prediction Support vector machine using a Gaussian kernel and optimal parameters. Training on the CB513 dataset. Tested by 10-fold cross validation. CASP4 used as a test set.
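The 10-fold cross validation mentioned here can be illustrated with a small index splitter. This is a generic sketch, not the procedure used in the presentation; the function name and the `seed` parameter are hypothetical. For protein data the items being split would presumably be whole proteins rather than individual residues, which is an assumption.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k folds.

    Indices are shuffled once, then dealt into k folds of near-equal size.
    Every sample appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

In each round the model is trained on nine folds and evaluated on the held-out fold, and the ten scores are then averaged.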

Results Destruct, Real Spine, SVM Prediction: Pearson Correlation Coefficient. CASP4 test set gives a Pearson correlation coefficient of 0.56. Results measured by cross validation.

Results Using secondary structure predictions made by cascade correlation neural networks: dihedrals assisted by predicted structure, Pearson correlation coefficient. Subsequent iterations should lead to better predictions of both structure and dihedral angles.

What Next? Using further iterations to improve accuracy. Current method is a black box. Can we use a program like Trepan to get some definite rules about secondary structure?

Conclusions Dihedral angles define protein secondary structure. Using Support Vector Machines it is possible to predict dihedral angles. We (hopefully!) can use predicted dihedral angles to improve the accuracy of secondary structure prediction.

Acknowledgements Jonathan Hirst, Hirst group members, BBSRC, The University of Nottingham