Some principles and examples related to evaluation of sequence similarities with help of length equivalent measures (ELEMS) Jaroslav Kubrycht and Karel.

Presentation transcript:

Some principles and examples related to evaluation of sequence similarities with help of length equivalent measures (ELEMS) Jaroslav Kubrycht and Karel Sigler Prague, 30 November, 2006

Examples and kinds of column identities derived by ELEMS

LIATR, ISARV, LWIRCC, LWSITV, ISAIRC, LSATR, LIWIC, LISRC, IWATV, LWSICR

Minimum aa numbers limiting ELEMS(RDA)-derived levels: CCBE aa, high occurrence aa, template motif aa, questionable aa. Cysteine exhibits the same numbers for both template motif and questionable aa (see our pdf file).

Examples of amino acid similarities and their contradictory dissimilarities in sequence block columns

Questionable amino acids A and V, which are convertible via a single triplet mutation, form cooperating pairs when present in the same column and achieve a mixed high occurrence level. On the other hand, template amino acids A and G occurring in the same column without a mutation relationship form contradictory pairs, which in fact diminish the overall extent of aa similarities in their block.

[Figure: example columns containing A, V and G]

Length equivalents as products of probabilistic compression

The probability of the amino acids present in the left column can be represented by a complete column similarity of non-integer height, i.e. by the vertical length equivalent of the column (LE_A).

Column: A A A A A A A A

In the given case, ELEMS(RDA) determines a high occurrence level of aa similarity, which LE_A =

In addition to LE_A, we also define the mean compressed height of whole sequence blocks, i.e. LE_TM. Both height-related (vertical) length equivalents are restricted by the same number limits in ELEMS, which distinguish different kinds of similarities.

restricted aa | description of lower limit | interval
questionable (gray zone) | random aa/chain |
template motif | fuzzy-related point among random aa/chain and double sequence similarity |
cohesive | three compressed aa/chains represent minimum sticking stage of cohesion | 3.0-(SL+2)
CBCE | for details see our pdf file | > SL+2

A similar compression principle is also used to process a gapped sequence block. We thus obtain a compressed block whose columns contain only identical/similar aa and which exhibits a non-integer height given by LE_TM. However, the first floor of the given oblong block belongs to a random chain (in light orange) of the template motif. Only the upper area determines the HLE value. This means that: HLE = (LE_TM - 1) x n.
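The relation HLE = (LE_TM - 1) x n can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define n, so it is taken here to be the number of columns in the compressed block, and the function name `hle` and the check that LE_TM is at least 1 (the height of the random-chain floor) are hypothetical choices made for this sketch.

```python
def hle(le_tm: float, n: int) -> float:
    """Return HLE = (LE_TM - 1) * n.

    le_tm: mean compressed height of the sequence block (LE_TM).
    n:     assumed here to be the number of columns in the compressed block.
    """
    # Assumption: the first floor (one random chain) is always present,
    # so a compressed height below 1 is treated as invalid input.
    if le_tm < 1.0:
        raise ValueError("LE_TM is expected to be at least 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    # Subtract the random-chain floor, then spread the rest over n columns.
    return (le_tm - 1.0) * n


# Illustrative numbers only (not taken from the presentation).
example_hle = hle(2.5, 4)
```

With these illustrative inputs, a block of compressed height 2.5 over 4 columns leaves an upper area of height 1.5, which is the only part that contributes to HLE.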

Mild modification in case of double sequence similarity

Double sequence similarity uses only a single value of LE_A (LE_A = 2), which follows from the presence of only two chains in the corresponding sequence block. Since this similarity has no alternative chain, the corresponding alignment loses column similarities more frequently than multiple sequence alignments do. This, together with LE_A values being higher than necessary, led us to avoid restricting the mean length equivalent (LE_TM) in double sequence similarity while still keeping the HLE evaluation. Despite this, some agreement between BLAST and ELEMS is demonstrated in WP3.2.2.

Alternatively, we can represent HLE as a single chain of non-integer length HLE. This raises the question of the minimal length of a chain whose mean aa probability (or score) is identical to that of the template motif related to HLE. The corresponding minimum value of non-integer length (SL, i.e. specific limit) can be determined using several statistical procedures. A chain of sufficient length is one for which HLE > SL.

RBS as unifying value

The ratio of HLE to SL is independent of any probability differences. Moreover, this ratio provides a simple and illustrative insight into the difference from the minimum significant value. Consequently, we suppose that such a value may represent an interesting density-related parameter, which may complement the bit score evaluation. The given ratio was named relative block similarity (RBS). RBS is thus determined by the formula: RBS = HLE/SL
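As a minimal sketch, the ratio of HLE to SL and the sufficiency condition HLE > SL can be expressed as follows. The function names `rbs` and `exceeds_specific_limit` are hypothetical, the input values are illustrative only, and SL is assumed to have been determined beforehand by one of the statistical procedures mentioned in the presentation.

```python
def rbs(hle: float, sl: float) -> float:
    """Relative block similarity: the ratio of HLE to the specific limit SL."""
    # Assumption: SL is a positive length, so a non-positive value is rejected.
    if sl <= 0:
        raise ValueError("SL must be positive")
    return hle / sl


def exceeds_specific_limit(hle: float, sl: float) -> bool:
    """True when HLE > SL, which is equivalent to RBS > 1."""
    return rbs(hle, sl) > 1.0


# Illustrative numbers only (not taken from the presentation).
example_rbs = rbs(20.0, 8.0)
```

Read this way, RBS states how many times the block exceeds the minimum significant length, so values above 1 correspond to HLE > SL.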

Thank you for visiting our web page. If you have any questions, you are invited to contact us.