Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara.

Automated Alphabet Reduction Method with Evolutionary Algorithms for Protein Structure Prediction Jaume Bacardit, Michael Stout, Jonathan D. Hirst, Kumara Sastry, Xavier Llorà and Natalio Krasnogor University of Nottingham and University of Illinois at Urbana-Champaign

What is a protein?

Protein Structure Prediction (PSP)
• The goal is to predict the (complex) 3D structure (and some sub-features) of a protein from its amino acid sequence (a 1D object)
[Figure labels: Primary Sequence; 3D Structure]

Alphabet reduction process and validation
[Diagram labels: Domain (CN, RSA, …); Dataset (Card=20); ECGA; Mutual Information; Size = N (<20); Dataset (Card=N (<20)); BioHEL; Ensemble of rule sets; Test set; Accuracy]
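The diagram labels suggest that ECGA searches over groupings of the 20 amino acid letters with mutual information as the objective. A minimal sketch of such an objective follows. It assumes a per-residue encoding (the actual method may score windows of residues), and the names `mutual_information`, `grouping_score` and the `grouping` dictionary are hypothetical. The ECGA search itself is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equally long label sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def grouping_score(residues, labels, grouping):
    """Score a candidate reduced alphabet (assumed objective, not the authors' code).

    `grouping` maps each amino acid letter to a group id. A higher score means
    the reduced letters retain more information about the class labels.
    """
    reduced = [grouping[r] for r in residues]
    return mutual_information(reduced, labels)
```

For example, `grouping_score("AAKK", [0, 0, 1, 1], {"A": 0, "K": 1})` keeps the two letters apart and retains all the label information, while mapping both letters to the same group retains none.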

This entry is human-competitive because:
• G: The result solves a problem of indisputable difficulty in its field (Difficult)
• D: The result is publishable in its own right as a new scientific result, independent of the fact that the result was mechanically created (Publishable)
• E: The result is equal to or better than the most recent human-created solution to a long-standing problem for which there has been a succession of increasingly better human-created solutions (≥Human)
• B: The result is equal to or better than a result that was accepted as a new scientific result at the time when it was published in a peer-reviewed scientific journal (Innovative)

G: Difficulty
• PSP is, after many decades of research, still one of the main unsolved problems in science
• In the 2006 CASP experiment, one of the best methods used more than 3 CPU years to predict a single protein
• An amino acid sequence is a string drawn from a 20-letter alphabet
• Some AAs are similar and could be grouped, reducing the dimensionality of the domain
• We can find a new alphabet with much lower cardinality than the AA representation without losing critical information in the process
• We can tailor alphabet reduction automatically to a variety of PSP-related domains
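To give a sense of the dimensionality argument, here is a minimal sketch that counts the distinct residue windows a learner could encounter. It assumes a window of nine residues (positions −4 to +4, as in the rule examples later in this deck) and ignores the end-of-chain symbol X. The function name `window_space` is hypothetical.

```python
def window_space(alphabet_size: int, window_len: int = 9) -> int:
    """Number of distinct windows of `window_len` letters over the alphabet."""
    return alphabet_size ** window_len


full = window_space(20)     # 20-letter amino acid alphabet
reduced = window_space(5)   # a 5-letter reduced alphabet
print(full, reduced, full // reduced)
```

Under these assumptions, moving from 20 letters to 5 shrinks the space of possible windows by a factor of 4^9.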

Why is this entry human-competitive? (D: Publishable, E: ≥Human)
• The initial version of our alphabet reduction process has been accepted at GECCO 2007, in the biological applications track
• One of the most famous alphabet reductions is the HP model, which reduces the AA types to only two: Hydrophobic and Polar (e.g. [Broome & Hecht, 2000])
• Other experts use a broader set of physico-chemical properties to propose reduced alphabets (examples in later slides)
• We have improved upon both of the above

B: Innovative
• Comparison of our results against other reduced alphabets from the literature and human-designed ones, applied to two PSP-related datasets: Coordination Number (CN) and Solvent Accessibility (SA)
• Our method produces the best reduced alphabets

Alphabet     Letters  CN acc.   SA acc.   Diff.     Ref.
AA           20       …±…       …±…
Our method   5        73.3±…    …±0.4     0.7/0.4   This work
Alphabets from the literature:
WW5          6        73.1±…    …±0.4     0.9/1.1   [Wang & Wang, 99]
SR5          6        73.1±…    …±0.4     0.9/1.1   [Solis & Rackovsky, 00]
MU4          5        72.6±…    …±0.4     1.4/1.3   [Murphy et al., 00]
MM5          6        73.1±…    …±0.3     0.9/1.4   [Melo & Marti-Renom, 06]
Expert designed alphabets:
HD1          7        72.9±…    …±0.4     1.1/1.4   This work
HD2          9        73.0±…    …±0.4     1.0/1.4
HD…          …        …±…       …±0.4     0.8/0.8

Why is this entry better than the other entries?
• PSP is a very difficult and very relevant domain
• It has been named a Grand Challenge by the US government [1]
• The benefits of having better protein structure models are countless: genetic therapy, synthesis of drugs for incurable diseases, improved crops, environmental remediation
• Our contribution is a small but concrete step towards achieving this goal

[1] Committee on Physical, Mathematical, and Engineering Sciences, Federal Coordinating Council for Science, Engineering, and Technology. Grand Challenges 1993: High Performance Computing and Communications, 1992.

Better than other entries: New understanding of the folding process
• Simpler rules obtained by BioHEL
• AA alphabet: If AA−4  {F, G, I, L, V, X, Y}, AA−3  {F, G, Q, W}, AA−2  {C, N, P}, AA−1  {A, I, Q, V, Y}, AA  {K}, AA1  {F, I, L, M, N, P, T, V}, AA2  {N, P, Q, S}, AA3  {C, I, L, R, W}, AA4  {A, C, I, L, R, S} then AA is exposed
• Reduced alphabet: If AA−4  {1, 3}, AA−3  {1, 3}, AA  {3}, AA1  {1, 3}, AA2  {1}, AA3  {0} then AA is exposed
• 0 = ACFHILMVWY, 1 = DEKNPQRST (EK for AA), 3 = X
• Unexpected explanations: alphabet reduction clustered AA types that experts did not expect. Analyzing the data verified that the groups were sound
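As an illustration of how a reduced alphabet such as the one listed on this slide can be applied to a sequence, here is a minimal sketch. The group memberships are copied from the slide. The names `GROUPS`, `LETTER_TO_GROUP` and `reduce_sequence` are hypothetical, the slide's note that the central position uses "EK" is ignored, and any letter the slide does not assign to a group (for example G) is emitted as a placeholder rather than guessed.

```python
# Groups as listed on the slide: 0 = ACFHILMVWY, 1 = DEKNPQRST, 3 = X.
GROUPS = {"0": "ACFHILMVWY", "1": "DEKNPQRST", "3": "X"}

# Invert the groups into a letter -> reduced-symbol lookup table.
LETTER_TO_GROUP = {aa: g for g, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str, unknown: str = "?") -> str:
    """Rewrite a sequence in the reduced alphabet.

    Letters without a group on the slide map to `unknown` (an assumption of
    this sketch, not part of the original method).
    """
    return "".join(LETTER_TO_GROUP.get(aa, unknown) for aa in seq)


print(reduce_sequence("ACDEX"))
```

A learner such as BioHEL would then see the shorter symbol set instead of the 20 amino acid letters, which is what allows the more compact rules shown above.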

Better than other entries: run-time reduction & conclusions
• Alphabet reduction is also beneficial in the short term
• We have extrapolated the reduced alphabet to Position-Specific Scoring Matrices (PSSM)
• PSSM is the state-of-the-art representation for PSP, with orders of magnitude more information than the AA alphabet
• The learning time of BioHEL using PSSM has been reduced from 2 weeks to 3 days with only a 0.5% accuracy drop
• We consider our entry the best because it successfully addresses, in many ways, a very relevant, important, high-profile and timely problem