Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches. By Jayakumar Rudhrasenan (S3047315). Primary Supervisor: Prof. Heiko Schroder.

By Jayakumar Rudhrasenan. Primary Supervisor: Prof. Heiko Schroder. Secondary Supervisor: Dr. Margaret Hamilton.

Introduction: Bioinformatics
What is bioinformatics? Bioinformatics is the science of developing computer databases and algorithms to facilitate biological research, especially in genomics. Genomics is the study of genes and their functions.

Background: Protein Structure
How can we find the structure of a protein? Experimentally, through X-ray crystallography or NMR spectroscopy.
[Figure: an amino acid sequence (a k r n d c a r a) shown alongside the protein structure, with the phi and psi angles labelled.]

Where does computer science come into it?
Limitations of traditional lab work:
- Expensive: determining a structure with these methods is costly.
- Time consuming: determining the structure of a single protein takes 6 to 12 months, because
  - some proteins don't crystallise,
  - some don't give good diffraction patterns, and
  - proteins are fragile and difficult to handle.

Methods Available
There are many ways of tackling this problem. They fall broadly into two groups:
- ab initio prediction
- homology modelling

What is homology modelling?
Homology modelling works on the principle that, although each protein adopts a unique structure, only about 2,000 common folds have been identified so far across the various superfamilies. If two protein sequences are aligned and their percentage identity is above the 'twilight zone' of about 20%, we can conclude that the sequences are homologous, that is, that they share a common ancestry. Below this zone, it is not possible to say whether identical amino acid residues are in fact evolutionarily linked or have arisen by chance.
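The twilight-zone rule of thumb can be made concrete with a small sketch that computes the percentage identity of two sequences that have already been aligned. It assumes "-" marks a gap and that identity is measured over the full alignment length; other conventions exist and give different numbers. The function names and example sequences are hypothetical; only the 20% threshold comes from the slide.

```python
# Percentage identity of two already-aligned sequences ("-" marks a gap).
# Assumption: identity is measured over the full alignment length; other
# conventions (e.g. dividing by the shorter sequence) give different numbers.
TWILIGHT_ZONE = 20.0  # percent, the threshold quoted on the slide


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # Count columns where both sequences carry the same residue (gaps never count).
    identical = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    return 100.0 * identical / len(aligned_a)


def above_twilight_zone(aligned_a: str, aligned_b: str,
                        threshold: float = TWILIGHT_ZONE) -> bool:
    """True if identity exceeds the threshold, i.e. homology is suggested."""
    return percent_identity(aligned_a, aligned_b) > threshold


print(percent_identity("arnd-c", "arqdkc"))     # 4 of 6 columns identical
print(above_twilight_zone("arnd-c", "arqdkc"))  # True
```

An identity above the threshold only suggests homology; the slide's point is that below it, shared residues are no longer good evidence either way.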

What is protein structure prediction?
In its most general form, it is the prediction of the relative position of each amino acid in a protein's structure, using knowledge of the structures of other, already known proteins.

Why predict protein structure?
- The sequence-structure gap: far more protein sequences are known than structures.
- Structural knowledge brings understanding of function and mechanism of action.
- It can help in the prediction of function.
- Predicted structures can be used in structure-based drug design.
- It can help us understand the effects of mutations on structure or function.
- It is a very interesting scientific problem, still unsolved in its most general form after more than 20 years of effort.

Protein Structure Prediction Algorithm
A window slides along the protein sequence whose structure is unknown (the figure shows the fragment nfsbcar.....), and each sub-sequence inside the window is searched for in the protein database (figure: arndcqeghilkmnfssd, eghilnfsearlkspqga, nhe). The figure uses a window size of 3; the algorithm can also be implemented with window sizes of 5, 7 or 9. With a window size of 9 we look for almost perfect matches, since we won't get perfect matches with the database we have.
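The windowing step can be sketched as follows. This is a minimal illustration, not the author's code: `windows` and `exact_window_hits` are hypothetical names, and the query and database strings are simply the fragments visible in the figure, treated as one query and three database entries.

```python
def windows(sequence: str, size: int) -> list[tuple[int, str]]:
    """Every (start, sub-sequence) pair obtained by sliding a window of `size` residues."""
    if size <= 0:
        raise ValueError("window size must be positive")
    return [(i, sequence[i:i + size]) for i in range(len(sequence) - size + 1)]


def exact_window_hits(query: str, database: list[str],
                      size: int) -> dict[int, list[tuple[int, int]]]:
    """Map each query window start to the (database entry, start) pairs where it occurs exactly."""
    hits: dict[int, list[tuple[int, int]]] = {}
    for qpos, word in windows(query, size):
        for seq_id, seq in enumerate(database):
            for dpos, dword in windows(seq, size):
                if dword == word:
                    hits.setdefault(qpos, []).append((seq_id, dpos))
    return hits


# Fragments taken from the slide's figure (illustrative only).
query = "nfsbcar"
database = ["arndcqeghilkmnfssd", "eghilnfsearlkspqga", "nhe"]
print(exact_window_hits(query, database, 3))  # {0: [(0, 13), (1, 5)]}
```

Only the window "nfs" has an exact hit here. Any exact hit of length 9 also contains exact hits of every shorter length, so longer windows can only make exact hits rarer, which is why the slide switches to almost perfect matches at window size 9.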

Algorithm (continued)
[Figure panels: Number of Occurrences; Phi graph; Psi graph.]
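The figure panels suggest that the database matches are summarised as a number of occurrences plus phi and psi graphs. The sketch below shows one way such a summary could be built, under assumptions the slides do not state: that every database residue has known (phi, psi) angles in degrees, and that the angles of the central residue of each matched window are the ones collected. The names `centre_angles` and `angle_histogram`, the 30-degree bins and the angle values are all hypothetical.

```python
from collections import Counter


# Assumed input: angles[seq_id][i] is the known (phi, psi) of residue i of
# database entry seq_id; hits are (seq_id, start) pairs of matched windows.
def centre_angles(hits, angles, size):
    """(phi, psi) of the central residue of every matched database window."""
    centre = size // 2
    return [angles[seq_id][start + centre] for seq_id, start in hits]


def angle_histogram(values, bin_width=30):
    """Count angles per bin over [-180, 180); keys are the bins' lower edges."""
    if 360 % bin_width:
        raise ValueError("bin_width must divide 360")
    n_bins = 360 // bin_width
    return Counter(-180 + (int((v + 180) // bin_width) % n_bins) * bin_width
                   for v in values)


# Made-up angles for two 18-residue database entries (illustration only).
angles = [[(-60.0, -45.0)] * 18, [(-120.0, 130.0)] * 18]
matched = centre_angles([(0, 13), (1, 5)], angles, 3)
phis = [phi for phi, _ in matched]
psis = [psi for _, psi in matched]
print(len(matched))                 # 2 occurrences
print(dict(angle_histogram(phis)))  # {-60: 1, -120: 1}
```

The modulo in `angle_histogram` wraps an angle of exactly 180 degrees into the first bin, since -180 and 180 describe the same dihedral angle.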

Limitations of this algorithm
It is time consuming. Predicting the structure of one protein takes about 2 hours of PC time, so predicting the structures of 20,000 proteins takes 2 × 20,000 = 40,000 hours of PC time.
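To put that figure in perspective, the same arithmetic expressed in days and years, assuming a single PC running continuously:

```latex
2\ \text{h/protein} \times 20\,000\ \text{proteins} = 40\,000\ \text{h}
  = \frac{40\,000}{24}\ \text{days} \approx 1\,667\ \text{days} \approx 4.6\ \text{years}
```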

Why does it take so long?
Each sub-sequence of the unknown protein is compared with every sub-sequence of the proteins in the database. With a window size of 9, the database contains around 2 million sub-sequences, so each sub-sequence of the unknown protein requires 2 million comparisons. Here, an “unknown protein” means a protein whose sequence is known but whose structure is not.
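A minimal sketch of this brute-force search, assuming that an "almost perfect" match means at most `max_mismatches` differing residues (1 by default; the slides do not give the threshold). The function names are hypothetical, and the toy data reuse the fragments from the algorithm figure.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sub-sequences differ."""
    if len(a) != len(b):
        raise ValueError("sub-sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def brute_force_search(query, database, size, max_mismatches=1):
    """Compare every query window with every database window.

    Returns (matches, comparisons): matches are (query start, database entry,
    database start) triples within `max_mismatches`; comparisons counts the work.
    """
    db_windows = [(seq_id, i, seq[i:i + size])
                  for seq_id, seq in enumerate(database)
                  for i in range(len(seq) - size + 1)]
    matches, comparisons = [], 0
    for qpos in range(len(query) - size + 1):
        word = query[qpos:qpos + size]
        for seq_id, dpos, dword in db_windows:
            comparisons += 1
            if hamming(word, dword) <= max_mismatches:
                matches.append((qpos, seq_id, dpos))
    return matches, comparisons


query = "nfsbcar"
database = ["arndcqeghilkmnfssd", "eghilnfsearlkspqga", "nhe"]
matches, comparisons = brute_force_search(query, database, 3)
print(comparisons)  # 5 query windows x 33 database windows = 165
```

The inner loop runs once per database window for every query window. With the slide's figure of around 2 million database sub-sequences, that is 2 million comparisons per query window, which is where the time goes.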

Fast Search Protein Structure Prediction Algorithm for Almost Perfect Matches
Arrange the sub-sequences so that each one is at a Hamming distance of one from the next.
What is Hamming distance? It is the number of bits at which two binary vectors disagree, and it is used as a measure of dissimilarity. (The slide's example shows two binary numbers that differ by one bit.) Here, a Hamming distance of one means that each sub-sequence differs from the next one by just one amino acid.
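The definition can be checked with a small sketch. `hamming` counts differing positions and works the same way for bit strings and amino acid strings; the hypothetical helper `is_hamming_one_chain` verifies that an ordering of sub-sequences has the arrangement described above. The example strings are made up for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def is_hamming_one_chain(subsequences: list[str]) -> bool:
    """True if every sub-sequence differs from the next one in exactly one position."""
    return all(hamming(x, y) == 1 for x, y in zip(subsequences, subsequences[1:]))


print(hamming("1011", "1001"))                             # 1 (binary vectors)
print(is_hamming_one_chain(["nfs", "nfa", "efa", "eaa"]))  # True
print(is_hamming_one_chain(["nfs", "car"]))                # False
```

The check only verifies a given ordering. Building such an ordering from an arbitrary set of sub-sequences is a separate problem, and for some sets (for example {"aaa", "ccc"}) no such ordering exists.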

Continued
Maintain a table that stores the hop index for a mismatch, that is, the row number to jump to. For example:
[Table columns: Row number; Sub Sequence; Jump to row number.]