Sources Page & Holmes Vladimir Likic presentation: 20show.pdf

Presentation transcript:

Sources: Page & Holmes; Vladimir Likic presentation: 20show.pdf; Wikipedia; Lecture at :

Homoplasy – structural or DNA resemblance due to parallelism or convergent evolution rather than to common ancestry

Which are homoplasious?

Problem: which base positions share common descent?
agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg
tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag
tacattgctagctaaatcgatcatgatcgatgattcaggcgatgtcatga
gatcatgatcgatgattcaggcgatgtcatgactgatcagggatgatgat
Alignment – residue-to-residue correspondence between two or more sequences such that the order of residues in each sequence is preserved.
Indels make alignment trickier:
agtggtcttgctacattgctagctaaatcgatcatgatcgatgattcagg
tagctaaatcgatcatgatcgatgattcaggcgatgtcatgactgatcag
tacattgctagctaaa----tcatgatcgatgattcaggcgatgtcatga
gatcatgatcgatgattcaggcgat------actgatcagggatgatgat

Assembly (from Ensembl) – When the genome of a species is to be sequenced, the chromosomes from many cells are broken at random positions into small fragments, which are sequenced and reassembled into long sequences (contigs). Contigs may be assembled into longer sequences called scaffolds and sometimes, if the depth of sequencing is high enough, there may be enough information to assemble most of the scaffolds into chromosomes. The resulting collection of sequences after assembly is called a genome assembly.
Alignment problems (examples)
1) different sequences of the same allele from the same locus within the same individual
2) sequences of different alleles from the same locus within the same individual
3) same locus from different individuals

Alignment Methods
Dot plot – qualitative
Sequence alignment – quantitative; constructing the best alignment using a scoring scheme
Types of Alignment
Global – best alignment over the entire length
Local – best alignment in a small region; used when comparing sequences of different lengths
Multiple – beyond pairwise
Example: cagcacttggattctgg & cagcgtgg
Local
cagca-cttggattctgg
---cagcgtgg
Global (best depending on gap penalties)
cagcacttggattctgg
cagc----g-t----gg

Gaps
A gap is a residue-to-nothing match.
Gaps can be inserted in either sequence.
Gaps are not part of the DNA sequence, only a construct for alignment.
A gap-to-gap match is meaningless and not allowed.

Dot plots – heuristic; make matrix, place dots; find diagonals
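The three steps on this slide (make matrix, place dots, find diagonals) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `dot_plot`, `render`, and `longest_diagonal` are made up here, and a dot is placed for any exact single-residue match, with no window or threshold filtering.

```python
def dot_plot(seq_a, seq_b):
    """Make the matrix: cell [i][j] is True when seq_a[i] equals seq_b[j]."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(matrix):
    """Place the dots: '*' for a match, '.' otherwise, one text row per residue of seq_a."""
    return "\n".join("".join("*" if cell else "." for cell in row) for row in matrix)


def longest_diagonal(matrix):
    """Find diagonals: length of the longest unbroken top-left to bottom-right run of dots."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    # run[i + 1][j + 1] holds the length of the diagonal run ending at cell (i, j).
    run = [[0] * (cols + 1) for _ in range(rows + 1)]
    best = 0
    for i in range(rows):
        for j in range(cols):
            if matrix[i][j]:
                run[i + 1][j + 1] = run[i][j] + 1
                best = max(best, run[i + 1][j + 1])
    return best


if __name__ == "__main__":
    plot = dot_plot("cagcacttggattctgg", "cagcgtgg")
    print(render(plot))
    print("longest diagonal:", longest_diagonal(plot))
```

Long diagonals correspond to stretches where the two sequences agree without gaps; a shift from one diagonal to a neighbouring one suggests an indel.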

Alignment with scoring schemes
Use a score to select the best possible alignment under a given scoring scheme.
Scoring scheme: a set of rules that assigns a score to a particular alignment between two sequences.
The goal is to maximize the score.
The score is the sum of residue substitution scores and gap penalties.

Scoring example: +1 for a match, -1 for a mismatch, no gap penalty
atggcgt
atg-agt    score = 4
atggcgt
a-tgagt    score = 2
Substitution matrix:
     c   t   a   g
c    1  -1  -1  -1
t   -1   1  -1  -1
a   -1  -1   1  -1
g   -1  -1  -1   1
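The two scores on this slide can be checked by summing over the aligned columns. The sketch below assumes the slide's scheme (+1 match, -1 mismatch, no gap penalty); the function name `score_alignment` and its parameters are illustrative, not taken from the presentation.

```python
def score_alignment(row_a, row_b, match=1, mismatch=-1, gap=0):
    """Sum per-column scores for two already-aligned rows that use '-' for gaps."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            # A gap-to-gap column is meaningless and not allowed.
            raise ValueError("gap aligned to gap")
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


if __name__ == "__main__":
    print(score_alignment("atggcgt", "atg-agt"))  # 5 matches, 1 mismatch, 1 gap
    print(score_alignment("atggcgt", "a-tgagt"))  # 4 matches, 2 mismatches, 1 gap
```

With a nonzero `gap` argument the same function shows how the ranking of the two alignments depends on the scoring scheme.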

Substitution matrix (rows and columns: c, t, a, g)
What if we want to penalize transitions less than transversions?
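One way to answer the question is to give transitions (purine to purine, a/g, or pyrimidine to pyrimidine, c/t) a milder penalty than transversions. The numbers below (+1, -1, -2) are hypothetical values chosen for illustration, not the ones from the slide, and the helper names are made up.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}


def substitution_score(x, y, match=1, transition=-1, transversion=-2):
    """Score one aligned base pair; the default values are illustrative only."""
    x, y = x.lower(), y.lower()
    if x not in PURINES | PYRIMIDINES or y not in PURINES | PYRIMIDINES:
        raise ValueError("expected one of a, c, g, t")
    if x == y:
        return match
    # A transition stays within one chemical class; a transversion crosses classes.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return transition if same_class else transversion


def build_matrix(alphabet="ctag"):
    """Return the full substitution matrix as a dict keyed by (row, column)."""
    return {(x, y): substitution_score(x, y) for x in alphabet for y in alphabet}


if __name__ == "__main__":
    alphabet = "ctag"
    matrix = build_matrix(alphabet)
    print("    " + "".join(f"{y:>4}" for y in alphabet))
    for x in alphabet:
        print(f"{x:>4}" + "".join(f"{matrix[(x, y)]:>4}" for y in alphabet))
```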

Protein substitution matrices
More complex than DNA scoring matrices: proteins are composed of twenty amino acids, and the physical-chemical properties of individual amino acids vary considerably.
They can be based on any property of amino acids: size, polarity, charge, hydrophobicity.
Evolutionary substitution matrices are derived empirically by assessing the frequencies of changes at particular levels of divergence.

Evolutionary substitution matrices
PAM ("point accepted mutation") family: PAM250, PAM120, etc.
BLOSUM ("Blocks substitution matrix") family: BLOSUM62, BLOSUM50, etc.
The BLOSUM matrices were developed more recently and are generally considered better.

BLOSUM62
BLOSUM80 is used for less divergent sequences.
BLOSUM45 is used for more divergent sequences.
Etc.

Gaps
Because gaps often result in radical protein changes (frame shifts, premature stop codons), the penalty for a gap is usually several times greater than the penalty for a mutation.
Once a gap has been opened, extending it by another residue may be less expensive than opening a completely new gap. In other words, gap opening penalties and gap extension penalties are often defined separately.

Affine gap penalty function W(i)
$W_i = g + h \cdot i$ for $i \geq 1$, where $i$ is the gap length
g: gap opening penalty
h: gap extension penalty
The ratio between g and h determines the relative weight of opening versus extension:
– Small g, large h: gap length more important
– Large g, small h: gap length less important

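$W_i = g + h \cdot i$, with g = -3 and h = -1
Substitution matrix (rows and columns: c, t, a, g)
One gap of length 7:
ATGTAGTGTATAGTACATGCA
ATGTAG-------TACATGCA
Gap penalty: $-3 - 1(7) = -10$
Three gaps with total length 7:
ATGTAGTGTATAGTACATGCA
ATGTA--G--TA---CATGCA
Gap penalty: $-3(3) - 1(7) = -16$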
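The gap penalties for the two alignments can be reproduced by finding each run of gap characters and applying W(i) = g + h*i to it. The sketch below uses the slide's g = -3 and h = -1; the first alignment is assumed to contain a single gap of length 7 (written with '-' here), and the function names are illustrative.

```python
import re


def affine_gap_penalty(length, g=-3, h=-1):
    """W(i) = g + h * i for a single gap of length i >= 1."""
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return g + h * length


def total_gap_penalty(aligned_row, g=-3, h=-1):
    """Sum W(i) over every maximal run of '-' in one aligned row."""
    runs = re.findall(r"-+", aligned_row)
    return sum(affine_gap_penalty(len(run), g, h) for run in runs)


if __name__ == "__main__":
    one_gap = "ATGTAG-------TACATGCA"     # one gap of length 7
    three_gaps = "ATGTA--G--TA---CATGCA"  # gaps of length 2, 2 and 3
    print(total_gap_penalty(one_gap))     # -3 - 1*7
    print(total_gap_penalty(three_gaps))  # 3*(-3) - 1*7
```

Both rows contain seven gap positions, so the extension term is the same; the difference comes entirely from paying the opening penalty once versus three times.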

How do we find the best alignment?
Brute-force approach: generate the list of all possible alignments between two sequences, score them, and select the alignment with the best score.
The number of possible global alignments between two sequences of length N is
$\binom{2N}{N} = \frac{(2N)!}{(N!)^2} \approx \frac{2^{2N}}{\sqrt{\pi N}}$
For two sequences of 250 residues this is ~$10^{149}$.
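The size of the brute-force search space can be checked numerically. The sketch below assumes the count is the central binomial coefficient C(2N, N), which reproduces the ~10^149 figure quoted for N = 250; the function names are illustrative.

```python
import math


def alignment_count(n):
    """Exact value of C(2n, n), assumed here as the number of global alignments."""
    return math.comb(2 * n, n)


def log10_estimate(n):
    """Base-10 logarithm of the Stirling approximation 2^(2n) / sqrt(pi * n)."""
    return 2 * n * math.log10(2) - 0.5 * math.log10(math.pi * n)


if __name__ == "__main__":
    n = 250
    exact = alignment_count(n)
    print("decimal digits in exact count:", len(str(exact)))
    print("log10 of Stirling estimate:", round(log10_estimate(n), 2))
```

Working with the logarithm avoids floating-point overflow for longer sequences, where the count itself is far too large to enumerate.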

Needleman-Wunsch (global) and Smith-Waterman (local) are both algorithms that find the best alignment by breaking the problem down into subproblems using dynamic programming.
However, the result is only the best with respect to the chosen scoring matrix and the gap opening and extension penalties.
These methods are computationally expensive: time and memory grow with the product of the two sequence lengths.
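A minimal sketch of the dynamic-programming idea for global alignment follows. It is a simplified Needleman-Wunsch with a linear gap penalty (the same cost for every gap position) rather than the affine penalty discussed earlier, and the default scores (+1, -1, -1) are assumptions made for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment with a linear gap penalty; returns (score, row_a, row_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # Each cell depends only on three already-solved subproblems.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,             # a[i-1] against a gap
                score[i][j - 1] + gap,             # b[j-1] against a gap
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    i, j = len(a), len(b)
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(row_a)), "".join(reversed(row_b))


if __name__ == "__main__":
    best, top, bottom = needleman_wunsch("atggcgt", "atgagt")
    print(best)
    print(top)
    print(bottom)
```

Filling the table takes on the order of len(a) * len(b) steps, which is why these methods are exact but expensive for database-scale searches. Smith-Waterman differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell.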

BLAST – Basic Local Alignment Search Tool
- Tries to find the highest-scoring ungapped local alignment between a query and a database
- Uses a word length (w) and scans the database for words that score at least a threshold (T) when aligned with words in the query
- Each word hit is then extended in both directions until the score falls a set amount below the best score reached so far
- Many types of blast can be found at
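The seed-and-extend idea can be sketched as follows. This is a heavily simplified illustration and not how the real BLAST programs are implemented: seeds here are exact word matches only (no threshold T or neighbourhood words), scoring is +1/-1, and the names `find_seeds`, `extend_seed`, and the `x_drop` parameter are assumptions made for this sketch.

```python
def find_seeds(query, subject, w=4):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = {}
    for i in range(len(query) - w + 1):
        index.setdefault(query[i:i + w], []).append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query, subject, i, j, w=4, match=1, mismatch=-1, x_drop=2):
    """Ungapped extension of a seed in both directions.

    Stops in each direction once the running score is more than x_drop below
    the best score seen. Returns (score, query_start, subject_start, length).
    """
    def walk(qi, sj, step):
        score = best = 0
        length = best_len = 0
        while 0 <= qi < len(query) and 0 <= sj < len(subject):
            score += match if query[qi] == subject[sj] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break
            qi += step
            sj += step
        return best, best_len

    right_score, right_len = walk(i + w, j + w, 1)
    left_score, left_len = walk(i - 1, j - 1, -1)
    total = w * match + left_score + right_score
    return total, i - left_len, j - left_len, left_len + w + right_len


if __name__ == "__main__":
    query, subject = "ttcagcatt", "ggcagcagg"
    for qi, sj in find_seeds(query, subject):
        score, q_start, s_start, length = extend_seed(query, subject, qi, sj)
        print(score, query[q_start:q_start + length], subject[s_start:s_start + length])
```

Indexing the query words once and scanning the subject with dictionary lookups is what makes this approach much faster than full dynamic programming, at the cost of possibly missing alignments that contain no exact word match.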