Protein Prediction with Neural Networks! Chris Alvino CS152 Fall ’06 Prof. Keller.



Introduction
- Proteins, made from amino acids
- Polar forces interact for craaazzzy combinatoric explosion!
- Just how crazzzzyyy?

Real Crazy
- Using crude workload estimates for a petaflop/second capacity machine leads to an estimate of THREE YEARS to simulate 100 MICROSECONDS of protein folding.
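To put the estimate above in numbers, here is a rough back-of-the-envelope sketch. It uses only the figures from the slide (three years, 100 microseconds, one petaflop per second); the year length and variable names are assumptions for illustration.

```python
# Back-of-the-envelope check of the slide's estimate.
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length, about 3.156e7 s
PETAFLOP_PER_SECOND = 1e15              # floating-point operations per second

wall_clock_seconds = 3 * SECONDS_PER_YEAR   # three years of machine time
simulated_seconds = 100e-6                  # 100 microseconds of folding

# Total work the machine performs over three years.
total_operations = wall_clock_seconds * PETAFLOP_PER_SECOND

# How many times slower than real time the simulation runs.
slowdown = wall_clock_seconds / simulated_seconds

print(f"total operations: {total_operations:.2e}")
print(f"slowdown vs. real time: {slowdown:.2e}")
```

Under these assumptions the simulation needs on the order of 10^23 operations and runs roughly 10^12 times slower than the folding it models.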

Why Neural Nets?
- Not so crazy
- Relatively accurate results
  - 70-80% accurate
- Patterns learned can lead to useful biological data
- Used to quickly check existing databases

Early Methods: Black Box Approach
- Protein Folding Analysis by an Artificial Neural Network Approach
- Authors: R. Sacile and C. Ruggiero
- Published 1993

Early Methods: Black Box Approach
- Standard Back Prop Algorithm

Early Methods: Black Box Approach
- 3 Layers
  - Input = Window size = 13 amino acids
  - Hidden Layer = 20 neurons
  - Output Layer: 3 possible (alpha, beta, coil)
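The three-layer layout above can be sketched as a forward pass. This is a minimal illustration, not the authors' implementation: the slide gives only the window size (13), the hidden layer size (20), and the three output classes. The one-hot input encoding, the padding symbol, the sigmoid activation, and the random untrained weights are all assumptions.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues
WINDOW = 13                             # from the slide
HIDDEN = 20                             # from the slide
CLASSES = ("alpha", "beta", "coil")     # from the slide

# Assumption: one-hot encoding with an extra "-" padding symbol for window
# positions that fall off either end of the sequence.
SYMBOLS = AMINO_ACIDS + "-"
INPUTS = WINDOW * len(SYMBOLS)


def encode_window(sequence, center):
    """One-hot encode the 13 residues centered on position `center`."""
    half = WINDOW // 2
    vector = []
    for pos in range(center - half, center + half + 1):
        symbol = sequence[pos] if 0 <= pos < len(sequence) else "-"
        one_hot = [0.0] * len(SYMBOLS)
        one_hot[SYMBOLS.index(symbol)] = 1.0
        vector.extend(one_hot)
    return vector


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Fully connected layer with sigmoid activation (an assumption)."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Untrained, randomly initialized weights: predictions are meaningless
# until the network is trained (e.g. with backpropagation).
rng = random.Random(0)
w_hidden, b_hidden = random_matrix(HIDDEN, INPUTS, rng), [0.0] * HIDDEN
w_out, b_out = random_matrix(len(CLASSES), HIDDEN, rng), [0.0] * len(CLASSES)


def predict(sequence, center):
    """Return the class with the largest output activation."""
    hidden = layer(encode_window(sequence, center), w_hidden, b_hidden)
    output = layer(hidden, w_out, b_out)
    return CLASSES[output.index(max(output))]


sequence = "MKTAYIAKQRQISFVKSHFSRQ"   # made-up sequence for illustration
predictions = [predict(sequence, i) for i in range(len(sequence))]
print(predictions)
```

The window slides along the sequence one residue at a time, so the network emits one alpha/beta/coil label per residue, each based on the six neighbors on either side.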

Early Methods: Black Box Approach
- 7 training sets
  - Each consists of around 1500 residues (amino acids)
- Training took 3-4 hours

Results

Artificial Neural Networks and Hidden Markov Models for Predicting the Protein Structures: The Secondary Structure Prediction in Caspases Thimmappa S. Anekonda (2002)

Current State of the Art
- Neural Networks and Hidden Markov Models

Hidden Markov what? Hidden Markov models (HMMs), originally developed for other applications such as speech recognition, are generative, probabilistic models of sequential information. An observed sequence is modeled as being the stochastic result of an underlying unobserved random walk through the hidden states of the model. The parameters of an HMM are the transition probabilities between the hidden states and the symbol emission probabilities from each hidden state.

State transitions in a hidden Markov model (example)
- x: hidden states
- y: observable outputs
- a: transition probabilities
- b: output probabilities
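To make the roles of the transition probabilities a and output probabilities b concrete, here is a minimal sketch of the forward algorithm, which computes the probability of an observed sequence by summing over all hidden-state paths. The two-state model, its state names, the "H"/"P" (hydrophobic/polar) output alphabet, and every probability value are made up for illustration and do not come from the slides.

```python
# Hypothetical toy model: two hidden states x, two observable outputs y.
states = ("helix", "coil")

# Assumed initial state distribution.
start = {"helix": 0.5, "coil": 0.5}

# a[i][j]: probability of moving from hidden state i to hidden state j.
a = {
    "helix": {"helix": 0.9, "coil": 0.1},
    "coil": {"helix": 0.2, "coil": 0.8},
}

# b[i][y]: probability that hidden state i emits observable output y.
b = {
    "helix": {"H": 0.7, "P": 0.3},
    "coil": {"H": 0.4, "P": 0.6},
}


def forward(observations):
    """Probability of a non-empty observed sequence under the model."""
    # alpha[x]: probability of the outputs seen so far, ending in state x.
    alpha = {x: start[x] * b[x][observations[0]] for x in states}
    for y in observations[1:]:
        alpha = {
            x: b[x][y] * sum(alpha[prev] * a[prev][x] for prev in states)
            for x in states
        }
    return sum(alpha.values())


likelihood = forward("HHPH")
print(f"P(HHPH) = {likelihood:.4f}")
```

Because the hidden path is never observed, the algorithm carries one running probability per hidden state instead of enumerating every path, which keeps the cost linear in the sequence length.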

Caspases, the friendly Ghost Caspases are a family of intracellular cysteine endopeptidases. They play a key role in inflammation and mammalian apoptosis or programmed cell death.

Clash of the Titans
- PHDSec
  - Utilizes evolutionary information
- PSIPRED
  - Uses iterated PSI-BLAST profiles as input instead of multiple sequence alignments like PHDSec
- SAM-T02
  - Uses ANN and HMM
- PROF King
  - Uses seven GOR-based predictions and ANN