Pairwise Sequence Alignment


WHAT?
Given any two sequences (DNA or protein):
Seq 1: CATATTGCAGTGGTCCCGCGTCAGGCT
Seq 2: TAAATTGCGTGGTCGCACTGCACGCT
To what extent are they similar?
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

WHY?

Discover function
Study evolution
Find crucial features within a sequence
Identify cause of diseases

Discover function
Sequences that are similar probably have the same function

Study evolution
If two sequences from different organisms are similar, they may have a common ancestor

Find crucial features
Conservation of IGFALS (insulin-like growth factor binding protein, acid-labile subunit) between human and mouse.
Regions that are strongly conserved between different sequences can indicate functional importance.
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT

Identify cause of disease
Comparing sequences between individuals can detect changes that are related to diseases

Sickle Cell Anemia
Caused by a single substitution of an A with a T, which changes one amino acid in hemoglobin from glutamic acid to valine
Image source: http://www.cc.nih.gov/ccc/ccnews/nov99/

Healthy Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH

Diseased Individual >gi|28302128|ref|NM_000518.4| Homo sapiens hemoglobin, beta (HBB), mRNA ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGA GGTGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC >gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens] MVHLTPVEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN ALAHKYH
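
The comparison between the two individuals can be sketched in a few lines of Python. This is only an illustration: the helper name `differences` is hypothetical, and the two strings are the first 18 residues of the beta globin protein records shown above, so no alignment step is needed because both excerpts have the same length.

```python
def differences(ref: str, alt: str):
    """Return (1-based position, ref residue, alt residue) for every mismatch."""
    if len(ref) != len(alt):
        raise ValueError("sequences must have the same length")
    return [
        (pos, r, a)
        for pos, (r, a) in enumerate(zip(ref, alt), start=1)
        if r != a
    ]


# First 18 residues of the healthy and diseased beta globin records above.
healthy = "MVHLTPEEKSAVTALWGK"
diseased = "MVHLTPVEKSAVTALWGK"

for pos, r, a in differences(healthy, diseased):
    print(f"position {pos}: {r} -> {a}")  # E is glutamic acid, V is valine
```

Positions here count the initial methionine as residue 1.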

How do sequences change?

Sequence Modifications
Three types of changes:
Substitution (point mutation): TCAGT → TCCGT
Insertion: TCAGT → TCGAGT
Deletion: TCAGT → TCGT
Insertions and deletions together are called indels (replication slippage)

In order to align two sequences we need a quantitative model to evaluate the similarity between them. How do we quantify sequence similarity?

Substitutions Only (not including indels)
Sequences compared base-by-base
Count the number of matches and mismatches
For example: matches score +2, mismatches score -1
TTCGTCGTAGTCGGCTCGACCTG
GTACGTCTAGCGAGCGTGATCCT
9 matches: +18
14 mismatches: -14
Total score: +4
A weak match
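
The base-by-base count can be written as a short Python sketch. The function name `ungapped_score` and its parameter names are illustrative choices, not part of the slides; the scores (+2 for a match, -1 for a mismatch) are the ones from the example.

```python
def ungapped_score(a: str, b: str, match: int = 2, mismatch: int = -1):
    """Compare two equal-length sequences position by position.

    Returns (number of matches, number of mismatches, total score).
    """
    if len(a) != len(b):
        raise ValueError("ungapped comparison needs equal-length sequences")
    matches = sum(x == y for x, y in zip(a, b))
    mismatches = len(a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


print(ungapped_score("TTCGTCGTAGTCGGCTCGACCTG", "GTACGTCTAGCGAGCGTGATCCT"))
```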

Including Indels
Create an ‘alignment’
Count matches within the alignment
Indels are scored as mismatches (-1)
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
17 matches: +34
2 mismatches: -2
8 indels: -8
Total score: +24
A strong match
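
Scoring an alignment that already contains gaps works column by column. The sketch below assumes the two rows use "-" as the gap symbol and have equal length; the name `score_alignment` and the `indel` parameter are hypothetical, with the indel score set to -1 as in the example.

```python
def score_alignment(row_a: str, row_b: str,
                    match: int = 2, mismatch: int = -1, indel: int = -1):
    """Score a given gapped alignment column by column.

    Returns (matches, mismatches, indels, total score).
    """
    if len(row_a) != len(row_b):
        raise ValueError("both alignment rows must have the same length")
    matches = mismatches = indels = 0
    for x, y in zip(row_a, row_b):
        if x == "-" and y == "-":
            raise ValueError("a column cannot hold two gaps")
        if x == "-" or y == "-":
            indels += 1          # gap in one of the two sequences
        elif x == y:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + indels * indel
    return matches, mismatches, indels, total


print(score_alignment("TT-CGTCGTAGTCG-GC-TCGACC-TG",
                      "GTACGTC-TAG-CGAGCGT-GATCCT-"))
```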

Choosing an Alignment
Many different alignments are possible
Should consider all possible alignments
Take the best score found
There may be more than one best alignment
TT-CGTCGTAGTCG-GC-TCGACC-TG
GTACGTC-TAG-CGAGCGT-GATCCT-
Score: +24
-TTCGT-CGTAGTC-GGCTCG-ACCTG
GTAC-GTCTA-GCGAGCGT-GATCC-T
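
To get a feel for how many alignments "all possible" means, the sketch below counts them under one common convention (an assumption, not something stated in the slides): every column is either a pair of letters, a gap in the first sequence, or a gap in the second, and alignments that differ only in the order of adjacent gaps are counted separately. Under that convention the count follows a simple three-way recurrence; `count_alignments` is a hypothetical helper name.

```python
def count_alignments(m: int, n: int) -> int:
    """Count gapped alignments of sequences with lengths m and n.

    counts[i][j] = alignments of the first i letters of one sequence
    with the first j letters of the other. The last column is either
    a letter pair, a gap in the second row, or a gap in the first row.
    """
    counts = [[1] * (n + 1) for _ in range(m + 1)]  # only gaps: one way
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            counts[i][j] = (counts[i - 1][j - 1]   # letter pair
                            + counts[i - 1][j]     # gap in second row
                            + counts[i][j - 1])    # gap in first row
    return counts[m][n]


for length in (2, 5, 10, 23):
    print(length, count_alignments(length, length))
```

The count grows very quickly with sequence length, which is why listing every alignment and scoring each one is not practical for real sequences.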

Why is it hard?
Alignment requires an algorithm that performs a number of comparisons roughly proportional to the square of the average sequence length n, that is, n^2.

Dynamic Programming
A method for reducing a complex problem to a set of identical sub-problems
The best solution to one sub-problem is independent of the best solution to the other sub-problem

What does it mean?
If a path from X→Z passes through Y, the best path from X→Y is independent of the best path from Y→Z

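Sequence Global Alignment
Needleman-Wunsch
Sequences: A = ACGCTG, B = CATGT
[Figure: empty scoring table with A = ACGCTG across the top (columns 1-6) and B = CATGT down the left (rows 1-5); the label Z appears in the last row.]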

Example
Sequences: A = ACGCTG, B = CATGT
Match: +2, Other: -1
Score of best alignment between AC and CATG: -1
Score of best alignment between ACG and CATG: 2
Score of best alignment between AC and CATGT: -2
Calculate the score between ACG and CATGT: ?

Example
Three ways to extend an alignment into the next cell:
Align the next letter in both sequences
Insertion in the first sequence (del)
Insertion in the second sequence

Example
Sequences: A = ACGCTG, B = CATGT
-1 from before plus -1 for mismatch of G against T → -2
2 from before plus -1 for mismatch of - against T → 1
-2 from before plus -1 for mismatch of G against - → -3
Cell gets the highest score of -2, 1, -3 → 1
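
The single-cell update can be sketched as a small function. The name `cell_score` and its arguments are illustrative: `diag`, `up` and `left` are the three neighbouring scores already in the table (with A across the top and B down the left), and the scores are Match +2, Other -1 as in the example.

```python
def cell_score(diag: int, up: int, left: int,
               a_char: str, b_char: str,
               match: int = 2, other: int = -1) -> int:
    """Best score for one table cell from its three neighbours."""
    from_diag = diag + (match if a_char == b_char else other)  # pair a_char with b_char
    from_up = up + other      # b_char paired with a gap in A
    from_left = left + other  # a_char paired with a gap in B
    return max(from_diag, from_up, from_left)


# Cell for ACG vs CATGT: neighbours are
#   diag = AC  vs CATG  (-1)
#   up   = ACG vs CATG  ( 2)
#   left = AC  vs CATGT (-2)
print(cell_score(diag=-1, up=2, left=-2, a_char="G", b_char="T"))
```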

Filling in the scoring table (A = ACGCTG across the top, columns 0-6; B = CATGT down the left, rows 0-5)
The top-left cell (column 0, row 0) starts at 0.
One step to the right aligns A against a gap and scores -1:
A
-
The first row aligns ACGCTG against gaps only, giving 0 -1 -2 -3 -4 -5 -6:
ACGCTG
------
The first column aligns CATGT against gaps only:
-----
CATGT
The remaining cells are filled one at a time. Alignments behind the first few cells:
A
C

AC
-C

ACG
-C-

ACGC      ACGC
---C      -C--

ACG
-CA
Tracing back from the bottom-right cell gives three alignments with the best score:
ACGCTG-
-C-ATGT

ACGCTG-
-CA-TGT

-ACGCTG
CATG-T-

Needleman-Wunsch Alignment Summary
Global alignment between sequences: compare one entire sequence against another
Create a scoring table with sequence A across the top and B down the left
The cell at column i and row j contains the score of the best alignment between the first i elements of A and the first j elements of B
The global alignment score is in the bottom-right cell
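
The summary can be turned into a compact Python sketch. It is a minimal illustration under the example's assumptions (Match +2, everything else -1, including gaps); the function name `needleman_wunsch` is a hypothetical choice, and the traceback returns only one of the possibly several best alignments.

```python
def needleman_wunsch(a: str, b: str, match: int = 2, other: int = -1):
    """Global alignment; table[j][i] scores a[:i] against b[:j]."""
    cols, rows = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, cols + 1):          # first row: a against gaps
        table[0][i] = i * other
    for j in range(1, rows + 1):          # first column: b against gaps
        table[j][0] = j * other

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else other

    for j in range(1, rows + 1):
        for i in range(1, cols + 1):
            table[j][i] = max(table[j - 1][i - 1] + pair(i, j),
                              table[j - 1][i] + other,
                              table[j][i - 1] + other)

    # Traceback: start in the bottom-right cell and work backwards.
    i, j, top, bottom = cols, rows, [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and table[j][i] == table[j - 1][i - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and table[j][i] == table[j][i - 1] + other:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return table, "".join(reversed(top)), "".join(reversed(bottom))


table, top, bottom = needleman_wunsch("ACGCTG", "CATGT")
print("score:", table[-1][-1])
print(top)
print(bottom)
```

Checking the diagonal move first during traceback is an arbitrary tie-breaking choice; a different order can return a different alignment with the same score.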

Global vs. Local alignment
Sequences: DorothyHodgkin and DorothyCrowfootHodgkin
Global alignment:
DOROTHY--------HODGKIN
DOROTHYCROWFOOTHODGKIN
Local alignment:
DOROTHY HODGKIN

Local Alignment
Smith-Waterman
Best score for aligning part of the sequences
Often beats the global alignment score
Global Alignment
ATTGCAGTG-TCGAGCGTCAGGCT
ATTGCGTCGATCGCAC-GCACGCT
Local Alignment
CATATTGCAGTGGTCCCGCGTCAGGCT
TAAATTGCGT-GGTCGCACTGCACGCT
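
A minimal Smith-Waterman sketch differs from the global version in two ways: no cell score may drop below 0, and the traceback starts at the highest-scoring cell and stops when it reaches a 0. The scores (Match +2, everything else -1) are carried over from the global example as an assumption, and `smith_waterman` is a hypothetical function name.

```python
def smith_waterman(a: str, b: str, match: int = 2, other: int = -1):
    """Local alignment; returns (best score, aligned part of a, aligned part of b)."""
    cols, rows = len(a), len(b)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    best, best_pos = 0, (0, 0)

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else other

    for j in range(1, rows + 1):
        for i in range(1, cols + 1):
            table[j][i] = max(0,                              # start afresh
                              table[j - 1][i - 1] + pair(i, j),
                              table[j - 1][i] + other,
                              table[j][i - 1] + other)
            if table[j][i] > best:
                best, best_pos = table[j][i], (j, i)

    # Traceback from the best cell until a cell with score 0 is reached.
    j, i = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and table[j][i] > 0:
        if table[j][i] == table[j - 1][i - 1] + pair(i, j):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif table[j][i] == table[j][i - 1] + other:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


# Toy input: only the shared ACGT core should be reported.
print(smith_waterman("XXACGTYY", "ZZACGTWW"))
```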

Global vs. Local alignment
Alignment of two genomic sequences
>Human DNA
CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
>Mouse DNA
CATGCGTCTGACgctttttgctagcgatatcggactATCGATATA

Global vs. Local alignment
Alignment of two genomic sequences
Global Alignment
Human:CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
Mouse:CATGCGTCTGACgct---ttttgctagcgatatcggactATCGAT-ATA
      ****** *****         *     ***      *  ****** ***
Local Alignment
Human:CATGCGACTGAC
Mouse:CATGCGTCTGAC
Human:ATCGATCATA
Mouse:ATCGAT-ATA

Global vs. Local alignment
Alignment of genomic DNA and mRNA
>Human DNA
CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
>Human mRNA
CATGCGACTGACATCGATCATA

Global vs. Local alignment
Alignment of genomic DNA and mRNA
Global Alignment
DNA: CATGCGACTGACcgacgtcgatcgatacgactagctagcATCGATCATA
mRNA:CATGCGACTGAC---------------------------ATCGATCATA
     ************                           **********
Local Alignment
DNA: CATGCGACTGAC
mRNA:CATGCGACTGAC
DNA: ATCGATCATA
mRNA:ATCGATCATA