INTRODUCTION TO BIOINFORMATICS


David H. Ardell, Asst. Prof., Linnaeus Centre for Bioinformatics, Biomedikum Centrum, Uppsala Universitet

Lecture Outline: Intro. to alignments, theory and practice

Part I: Theory
- Definitions and kinds of alignments: evolutionary, structural and functional
- Scoring matrices and gap penalties
- Intro. to dynamic programming (DP)
- DP for global pairwise alignment (Needleman-Wunsch) and local pairwise alignment (Smith-Waterman)
- Heuristics for sequence-database alignment (BLAST) and for multiple alignment (progressive alignment, Clustal)
- Sequence profiles
- HMMs

Part II: Practice
- Common mistakes, common tasks
- Software and formats
- Optimizing alignments
- Applications of profiles: sequence logos, PSI-BLAST
- Applications of HMMs: classifying with Pfam

Problems:
- Aligning the homologs they found with PSI-BLAST
- Optimizing an alignment (by hand, with multiclustal)
- Codon alignments
- Editing alignments
- POA? Pfam/HMMer? Infernal/Rfam? Weblogo

Common mistakes/assumptions:
- Forcing methionines to line up
- Forcing intron/exon boundaries to line up

We can’t tell insertions from deletions if we don’t know the ancestor.

[Figure: an ancestral sequence, GCCACTTTCGCGATCA, evolves down a tree by substitutions, a deletion in one lineage and an insertion in another, giving descendants such as GCCTTCGCGATCG and GGCAGTTTCGCGATGGTT. Aligned without knowledge of the ancestor, the length differences show up only as "indels":]

GCC---TTCGCGAT--CG
| |    |||||||
GGCAGTCTCGCGATGGTT

An alignment is a hypothesis of commonality among amino acids in different proteins.

An Evolutionary Alignment is a hypothesis about the common ancestry of specific amino acid residues in a set of sequences. Residues lined up in a column are meant to be homologous. Also called a “sequence alignment.”

A Structural Alignment is a hypothesis about the common structure or fold of specific amino acid residues. Residues lined up in a column have analogous structure.

A Functional Alignment is a hypothesis about the common function of specific amino acid residues in a set of sequences. Residues lined up in a column have analogous function.

Structural Alignment: protein structures superimposed by distance minimization establish a structural alignment.

Two examples of functional alignments: translation start-sites and codon alignments:
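A codon alignment can be sketched by aligning the protein translations first and then threading each coding sequence back onto its aligned protein: one codon per residue and "---" per gap, so that codons for aligned residues share columns. The Python below is a minimal sketch under stated assumptions: the helper name `codon_align` and the example sequences are hypothetical, the stop codon is assumed to be trimmed from the coding sequence, and the translation itself is not checked.

```python
def codon_align(aligned_protein: str, cds: str) -> str:
    """Thread a coding DNA sequence onto its aligned protein (hypothetical helper).

    Each residue is replaced by its codon and each '-' by '---'.
    Assumes the stop codon has been trimmed from cds; the translation is not checked.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for c in aligned_protein if c != "-")
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths do not match")
    codon_iter = iter(codons)
    return "".join("---" if c == "-" else next(codon_iter) for c in aligned_protein)


# Hypothetical protein alignment (M-KV over MRKV) and matching coding sequences.
print(codon_align("M-KV", "ATGAAAGTT"))     # ATG---AAAGTT
print(codon_align("MRKV", "ATGCGTAAAGTT"))  # ATGCGTAAAGTT
```

Each protein column becomes three nucleotide columns, so the two printed rows line up codon by codon.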

Another example of a functional alignment: intron-exon boundaries

Evolutionary alignment algorithms weigh substitutions against indels, trying to maximize a score. Matches and mismatches are scored with amino acid score matrices like the ones we learned about yesterday. Indels are scored with so-called gap penalties. For pairwise sequence alignments, efficient algorithms based on dynamic programming are guaranteed to give optimal answers, weighing match scores against gap penalties, in reasonable time. For multiple alignments and for database searching, the algorithms that guarantee optimal answers are too slow, so heuristics (“tricks”) are used that are not guaranteed to be optimal.
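To make the trade-off concrete, here is a minimal Python sketch of how a finished pairwise alignment is scored under a linear gap penalty. As an assumption for illustration, a simple DNA scheme (match +1, mismatch –1, d = 2 per gap position) stands in for an amino acid score matrix; the function name is hypothetical.

```python
def score_alignment(top: str, bottom: str, match: int = 1, mismatch: int = -1, d: int = 2) -> int:
    """Score a pairwise alignment: each aligned pair gets match/mismatch,
    each gap position costs d (linear gap penalty). Scheme is illustrative only."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total -= d  # indel: penalty grows linearly with gap length
        else:
            total += match if a == b else mismatch
    return total


# The indel example from the insertion/deletion slide.
print(score_alignment("GCC---TTCGCGAT--CG", "GGCAGTCTCGCGATGGTT"))  # -5
```

That alignment has 9 matches, 4 mismatches and 5 gap positions, so its score is 9 – 4 – 10 = –5; an alignment algorithm searches for the arrangement of gaps that makes this total as large as possible.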

Dynamic Programming. To demonstrate the two main dynamic programming algorithms, we will align the two sequences PAWHEAE and HEAGAWGHEE. Dynamic programming is recursive: to align two sequences, you break the problem into smaller parts (alignments of shorter prefixes), align the parts, and build the full solution from them. For these examples we will use linear gap penalties, where the penalty of an indel is proportional to its length (a gap of length g costs g × d). This is the simplest assumption.

Score matrix for the example: BLOSUM50 (Durbin et al. 1998)

A match score table indexed by the two sequences (Durbin et al. 1998)

Dynamic Programming: Needleman-Wunsch Optimal Global Pairwise Alignment (gap penalty d = 8, i.e. –8 per gap position)

Fill in a matrix F with one row per residue of A = PAWHEAE and one column per residue of B = HEAGAWGHEE; F(i,j) is the best score for aligning the first i residues of A with the first j residues of B. Start from F(0,0) = 0. The first row holds the cumulative gap penalties –8, –16, –24, –32, –40, –48, –56, –64, –72, –80, and the first column likewise runs –8, –16, …, –56. Every other cell takes the best of its three neighbors:

$$F(i,j) = \max\begin{cases} F(i-1,j-1) + s(A_i,B_j) & \text{align } A_i \text{ with } B_j \\ F(i-1,j) - d & \text{gap in } B \\ F(i,j-1) - d & \text{gap in } A \end{cases}$$

Worked cells: P against H gives F(1,1) = max(0 + s(P,H), –8 – 8, –8 – 8) = max(–2, –16, –16) = –2. For P against E, coming from the left costs –2 – 8 = –10, but the diagonal gives –8 + s(P,E) = –8 – 1 = –9, so F(1,2) = –9. In the next row, A against E takes the diagonal: –2 + s(A,E) = –2 – 1 = –3.

The bottom-right cell of the filled table holds the optimal global score; tracing back the moves that produced each cell recovers the alignment itself.
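The fill-and-traceback procedure can be sketched in Python. The score table is an assumption: it copies only the BLOSUM50 entries needed for these two sequences (the values Durbin et al. 1998 tabulate for this example), where a real program would load the full matrix. Ties are broken by preferring the diagonal, then the up move, then the left move; other tie-breaking rules can return a different alignment with the same score.

```python
# Subset of BLOSUM50 for the residues in PAWHEAE and HEAGAWGHEE only (assumption:
# enough for this example, not a full substitution matrix).
BLOSUM50_SUBSET = {
    ("A", "A"): 5, ("E", "E"): 6, ("G", "G"): 8,
    ("H", "H"): 10, ("P", "P"): 10, ("W", "W"): 15,
    ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2, ("A", "P"): -1, ("A", "W"): -3,
    ("E", "G"): -3, ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3,
    ("G", "H"): -2, ("G", "P"): -2, ("G", "W"): -3,
    ("H", "P"): -2, ("H", "W"): -3,
    ("P", "W"): -4,
}


def blosum50(a: str, b: str) -> int:
    """Symmetric lookup into the score table."""
    return BLOSUM50_SUBSET[(a, b)] if (a, b) in BLOSUM50_SUBSET else BLOSUM50_SUBSET[(b, a)]


def needleman_wunsch(x: str, y: str, d: int = 8):
    """Global alignment of x (rows) and y (columns) with linear gap penalty d."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], move[i][0] = -i * d, "up"    # x residues against leading gaps
    for j in range(1, m + 1):
        F[0][j], move[0][j] = -j * d, "left"  # y residues against leading gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # max() keeps the first of equal candidates: diagonal, then up, then left.
            F[i][j], move[i][j] = max(
                (F[i - 1][j - 1] + blosum50(x[i - 1], y[j - 1]), "diag"),
                (F[i - 1][j] - d, "up"),
                (F[i][j - 1] - d, "left"),
                key=lambda option: option[0],
            )
    # Trace back from the bottom-right cell to rebuild one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = needleman_wunsch("PAWHEAE", "HEAGAWGHEE")
print(score)   # 1
print(bottom)  # HEAGAWGHE-E
print(top)     # --P-AW-HEAE
```

With this tie-breaking the sketch returns a global score of 1 and the alignment HEAGAWGHE-E over --P-AW-HEAE.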

Needleman-Wunsch is for aligning entire sequences (globally)

Smith-Waterman is a variant that gives you the highest-scoring local alignment (subsegment)

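Smith-Waterman uses exactly the same principle, except that the minimum score in any cell is zero: 0 is added as a fourth option in the max, so a local alignment can start anywhere. The best local alignment is traced back from the highest-scoring cell in the matrix until a cell with score 0 is reached.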
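A minimal Python sketch of the local version on the same two sequences, assuming the same linear gap penalty d = 8 and the same partial BLOSUM50 table (again copied only for the residues in this example; a real program would load the full matrix). The only changes from the global recurrence are the zero floor, the boundary row and column of zeros, and a traceback that starts at the best cell and stops at a zero.

```python
# Subset of BLOSUM50 for the residues in PAWHEAE and HEAGAWGHEE only (assumption).
BLOSUM50_SUBSET = {
    ("A", "A"): 5, ("E", "E"): 6, ("G", "G"): 8,
    ("H", "H"): 10, ("P", "P"): 10, ("W", "W"): 15,
    ("A", "E"): -1, ("A", "G"): 0, ("A", "H"): -2, ("A", "P"): -1, ("A", "W"): -3,
    ("E", "G"): -3, ("E", "H"): 0, ("E", "P"): -1, ("E", "W"): -3,
    ("G", "H"): -2, ("G", "P"): -2, ("G", "W"): -3,
    ("H", "P"): -2, ("H", "W"): -3,
    ("P", "W"): -4,
}


def blosum50(a: str, b: str) -> int:
    """Symmetric lookup into the score table."""
    return BLOSUM50_SUBSET[(a, b)] if (a, b) in BLOSUM50_SUBSET else BLOSUM50_SUBSET[(b, a)]


def smith_waterman(x: str, y: str, d: int = 8):
    """Best local alignment of x (rows) and y (columns) with linear gap penalty d."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]       # first row and column stay 0
    move = [[None] * (m + 1) for _ in range(n + 1)]  # None marks "alignment starts here"
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j], move[i][j] = max(
                (0, None),  # the zero floor: start a fresh local alignment
                (F[i - 1][j - 1] + blosum50(x[i - 1], y[j - 1]), "diag"),
                (F[i - 1][j] - d, "up"),
                (F[i][j - 1] - d, "left"),
                key=lambda option: option[0],
            )
            if F[i][j] > best:
                best, best_i, best_j = F[i][j], i, j
    # Trace back from the highest-scoring cell until a zero cell is reached.
    top, bottom = [], []
    i, j = best_i, best_j
    while move[i][j] is not None:
        if move[i][j] == "diag":
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("PAWHEAE", "HEAGAWGHEE")
print(score)   # 28
print(bottom)  # AWGHE
print(top)     # AW-HE
```

The best local alignment is AWGHE over AW-HE with score 28 (5 + 15 – 8 + 10 + 6); the poorly matching flanks that the global alignment had to include are simply left out.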

DNA Local Alignment Example (match = 1, gap = –3, mismatch = –5)

DNA Local Alignment Example (is wrong!) (match = 1, gap = –3, mismatch = –5)

Querying GenBank is like doing a local alignment (with repeats) of your query against one very long sequence, the whole database. Done with dynamic programming, this would be way too slow… Why?

BLAST and FASTA: Widely used heuristic (not guaranteed optimal) Database Query Algorithms
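The core idea these heuristics share is seed-and-extend: instead of filling a full DP matrix, look only where query and database share a short exact word, then extend those hits. The Python below is a deliberately simplified sketch, not BLAST or FASTA themselves: it uses exact length-k word hits, ungapped extension with a simple match/mismatch score, and an X-drop rule that stops extending once the running score falls `xdrop` below the best seen. Real BLAST also uses scored neighborhood words, gapped extension and statistical significance (E-values). All function names, sequences and parameter values are hypothetical.

```python
from collections import defaultdict


def word_index(seq, k):
    """Map every length-k word of seq to the positions where it occurs."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_hit(query, subject, qi, si, k, match=1, mismatch=-1, xdrop=3):
    """Ungapped X-drop extension of an exact word hit at query[qi], subject[si].

    Returns (score, query_start, subject_start, length) of the best segment pair.
    """
    best = score = k * match
    right = step = 0
    while qi + k + step < len(query) and si + k + step < len(subject):
        score += match if query[qi + k + step] == subject[si + k + step] else mismatch
        step += 1
        if score > best:
            best, right = score, step
        elif best - score >= xdrop:
            break
    score, left, step = best, 0, 0  # extend left from the best right-hand end
    while qi - step - 1 >= 0 and si - step - 1 >= 0:
        step += 1
        score += match if query[qi - step] == subject[si - step] else mismatch
        if score > best:
            best, left = score, step
        elif best - score >= xdrop:
            break
    return best, qi - left, si - left, k + left + right


def seed_and_extend(query, subject, k=4):
    """Scan subject for words of the query, extend each hit, keep the best."""
    index = word_index(query, k)
    best = None
    for sj in range(len(subject) - k + 1):
        for qi in index.get(subject[sj:sj + k], []):
            hsp = extend_hit(query, subject, qi, sj, k)
            if best is None or hsp[0] > best[0]:
                best = hsp
    return best


print(seed_and_extend("ACGTACGT", "TTACGTACGTTT"))  # (8, 0, 2, 8)
```

Only positions that share a word with the query are ever examined, which is what makes this kind of search fast, and also why it can miss similarities that contain no exact word match.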

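Multiple alignment is also too expensive to do with dynamic programming: the DP matrix has one dimension per sequence, so its size grows as the product of all the sequence lengths.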

So we rely on progressive multiple alignment methods (e.g. CLUSTAL), which are also not guaranteed to be optimal.

Q: Getting back to structural or functional alignments, what can you do with them? A: You can make consensus sequences… [Figure: a consensus sequence, A T C G]
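A consensus sequence takes the most common residue in each column. A minimal Python sketch, assuming gaps are written "-" and ignored when counting; the alignment is hypothetical, and ties go to the residue that appears first in the column.

```python
from collections import Counter


def consensus(alignment, gap="-"):
    """Most common non-gap residue in each column ('-' if a column is all gaps)."""
    letters = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r != gap)
        letters.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(letters)


# Hypothetical four-sequence alignment.
print(consensus(["ATCG", "ATCC", "AGCG", "TTCG"]))  # ATCG
```

Note what is lost: the consensus ATCG records nothing about the T in the first column or the G in the second.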

But why throw out all the minority states? Better than a consensus sequence is a “profile.”

Keep all the information in a “profile.” Example: sequence logos are like consensus sequences but show more of the profile.
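A profile keeps the frequency of every residue in every column, and a sequence logo draws each column's letters with heights equal to their frequency times the column's information content (log2 of the alphabet size minus the Shannon entropy of the column). The sketch below assumes a DNA alphabet, ignores gaps, and omits any small-sample correction; function names and data are hypothetical.

```python
import math
from collections import Counter


def profile(alignment, alphabet="ACGT"):
    """Per-column residue frequencies; gaps and symbols outside the alphabet are ignored."""
    columns = []
    for column in zip(*alignment):
        counts = Counter(r for r in column if r in alphabet)
        total = sum(counts.values())
        columns.append({a: counts[a] / total if total else 0.0 for a in alphabet})
    return columns


def information_content(freqs):
    """Bits in one column: log2(alphabet size) minus the column's Shannon entropy."""
    if sum(freqs.values()) == 0:
        return 0.0  # an all-gap column carries no information
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy


def logo_heights(freqs):
    """Letter heights for one logo column: frequency times information content."""
    bits = information_content(freqs)
    return {a: p * bits for a, p in freqs.items()}


prof = profile(["ATCG", "ATCC", "AGCG", "TTCG"])  # hypothetical alignment
for position, freqs in enumerate(prof, start=1):
    print(position, round(information_content(freqs), 3), freqs)
```

For this alignment the fully conserved third column carries the maximum of 2 bits, drawn entirely as C, while each of the other columns (three of one base, one of another) carries about 1.19 bits, shared between two letters.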

Sequence logos

Profiles applied in BLAST: PSI-BLAST. For more sensitive searching for distant protein homologs, NCBI provides PSI-BLAST. BLAST matches are aggregated into an alignment and then into a profile. The profile, instead of a single sequence, is then searched against the database. New matches are added to the profile, and the process repeats until no new matches are found.
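Searching with a profile can be illustrated with a position-specific scoring matrix (PSSM): log-odds scores built from column frequencies, then slid along a sequence. This Python sketch is not PSI-BLAST's actual procedure, which has its own weighting and pseudocount scheme and runs full BLAST searches; the add-one pseudocounts, the uniform background, the DNA alphabet, the data and the function names are all assumptions for illustration.

```python
import math


def build_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Log-odds scores (bits) per column, with pseudocounts and a uniform background."""
    background = 1.0 / len(alphabet)
    pssm = []
    for column in zip(*alignment):
        residues = [r for r in column if r in alphabet]
        total = len(residues) + pseudocount * len(alphabet)
        pssm.append({
            a: math.log2(((residues.count(a) + pseudocount) / total) / background)
            for a in alphabet
        })
    return pssm


def best_window(pssm, sequence):
    """Slide the PSSM along sequence (assumed to use the same alphabet);
    return (start, score) of the best-scoring window."""
    width = len(pssm)
    best_start, best_score = -1, float("-inf")
    for start in range(len(sequence) - width + 1):
        score = sum(pssm[k][sequence[start + k]] for k in range(width))
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


pssm = build_pssm(["ACGT", "ACGT", "ACGA"])  # hypothetical aligned matches
start, score = best_window(pssm, "TTTACGTTT")
print(start, round(score, 2))
```

The best window starts at position 3 (ACGT). The last column scores both T and A positively because both were seen there, which is exactly the minority-state information a single query sequence or a consensus would lose.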

Profiles applied in Clustal. You don’t need to realign everything when you want to add sequences to an existing alignment! Run Clustal in “profile mode”: give it your alignment and your unaligned sequences separately, and clustalw will add the sequences to the alignment. The progressive algorithm in Clustal is itself based on profile-sequence alignment.
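Adding a sequence to an existing alignment can be sketched as a Needleman-Wunsch alignment between the sequence and the alignment's columns, where a residue scores the average of its match/mismatch scores against the residues already in a column. This is a minimal Python sketch, not ClustalW's implementation: the scoring scheme (match +1, mismatch –1, gap 2), the gap handling and the function names are assumptions.

```python
def column_score(column, residue, match=1, mismatch=-1):
    """Average match/mismatch score of residue against the non-gap residues in a column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    return sum(match if r == residue else mismatch for r in residues) / len(residues)


def add_to_alignment(alignment, seq, gap=2):
    """Align seq to an existing alignment (used as a profile); return the enlarged alignment."""
    columns = list(zip(*alignment))
    n, m = len(columns), len(seq)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0], move[i][0] = -i * gap, "up"
    for j in range(1, m + 1):
        F[0][j], move[0][j] = -j * gap, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j], move[i][j] = max(
                (F[i - 1][j - 1] + column_score(columns[i - 1], seq[j - 1]), "diag"),
                (F[i - 1][j] - gap, "up"),    # keep the column, gap in the new sequence
                (F[i][j - 1] - gap, "left"),  # new all-gap column for the old rows
                key=lambda option: option[0],
            )
    old_rows = [[] for _ in alignment]
    new_row = []
    i, j = n, m
    while i > 0 or j > 0:
        if move[i][j] == "diag":
            column, residue = columns[i - 1], seq[j - 1]
            i, j = i - 1, j - 1
        elif move[i][j] == "up":
            column, residue = columns[i - 1], "-"
            i -= 1
        else:
            column, residue = ("-",) * len(alignment), seq[j - 1]
            j -= 1
        for row, r in zip(old_rows, column):
            row.append(r)
        new_row.append(residue)
    return ["".join(reversed(row)) for row in old_rows + [new_row]]


print(add_to_alignment(["ACGT", "AC-T"], "ACT"))  # ['ACGT', 'AC-T', 'AC-T']
```

The existing rows are never realigned against each other; only the new sequence is fitted to the columns, and new all-gap columns are added to the old rows only when the new sequence needs them.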