INTRODUCTION TO BIOINFORMATICS


1 INTRODUCTION TO BIOINFORMATICS
David H. Ardell, Asst. Prof. Linnaeus Centre for Bioinformatics Biomedikum Centrum Uppsala Universitet

2 Lecture Outline: Intro. to alignments, theory and practice
Part I: Theory
  Definitions and kinds of alignments: evolutionary, structural, and functional
  Scoring matrices and gap penalties
  Intro. to dynamic programming (DP)
  DP for global pairwise alignment (Needleman-Wunsch) and local pairwise alignment (Smith-Waterman)
  Heuristics for sequence-database alignment (BLAST) and for multiple alignment (progressive alignment, Clustal)
  Sequence profiles
  HMMs
Part II: Practice
  Common mistakes, common tasks
  Software and formats
  Optimizing alignments
  Applications of profiles: sequence logos, PSI-BLAST
  Applications of HMMs: classifying with Pfam
Problems:
  Aligning the homologs they found with PSI-BLAST
  Optimizing an alignment (by hand, with multiclustal)
  Codon alignments
  Editing alignments
  POA? Pfam/HMMer? Infernal/Rfam?
  Weblogo
Common mistakes/assumptions:
  Forcing Methionines to line up
  Forcing intron/exon boundaries to line up

3 We can’t tell insertions from deletions if we don’t know the ancestor
[Figure: sequences descending from a common ancestor accumulate substitutions, a deletion, and an insertion. When only the descendants are compared, a gap could reflect either event, so gaps are called indels.]
GCC---TTCGCGAT--CG
GGCAGTCTCGCGATGGTT

4 An alignment is a hypothesis of commonality among amino acids in different proteins
An Evolutionary Alignment is a hypothesis about common ancestry of specific amino acid residues in a set of sequences. Residues lined up in a column are meant to be homologous. Also called a “sequence alignment.”
A Structural Alignment is a hypothesis about common structure or fold of specific amino acid residues. Residues lined up in a column have analogous structure.
A Functional Alignment is a hypothesis about common function of specific amino acid residues in a set of sequences. Residues lined up in a column have analogous function.

5 Structural Alignment
Protein structures superimposed by distance minimization establish a structural alignment.

6 Two examples of functional alignments: translation start-sites and codon alignments:

7 Two examples of functional alignments: translation start-sites and codon alignments:

8 Another example of a functional alignment: intron-exon boundaries

9 Evolutionary alignment algorithms weigh substitutions against indels trying to maximize a score
Matches/Mismatches are scored with amino acid score matrices like we learned about yesterday. Indels are scored with so-called gap-penalties. For pairwise sequence alignments, efficient algorithms are guaranteed to give optimal answers, weighing match scores against gap-penalties, in reasonable time. These rely on dynamic programming. For multiple alignments and for database searching, the algorithms that guarantee optimal answers are too slow, and so heuristics (“tricks”) are used that are not guaranteed optimal.

10 Dynamic Programming
To demonstrate the two main dynamic programming algorithms, we will align two sequences, PAWHEAE and HEAGAWGHEE. Dynamic programming is recursive: to align two sequences, you break them up into parts and align the parts. For these examples we will use linear gap penalties, where the penalty of an indel is proportional to its size. This is the simplest assumption.
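
As a minimal sketch of what a linear gap penalty means in practice, the function below scores an alignment that has already been made. The function name and the default `match`, `mismatch`, and `d` values are illustrative assumptions, not values from the lecture; a real protein alignment would take its match and mismatch scores from a substitution matrix.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, d=2):
    """Score a gapped pairwise alignment with a linear gap penalty.

    Every gap position costs d, so an indel of length k costs k * d.
    The default scores are illustrative assumptions only.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot consist only of gaps")
        if x == "-" or y == "-":
            total -= d          # one gap position, one penalty
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


# Four matches and one indel of length three: 4 * 1 - 3 * 2
print(score_alignment("AC---GT", "ACTTTGT"))
```

Because the penalty is linear, one indel of length three costs exactly as much as three separate indels of length one.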

11 Score matrix for the example: BLOSUM50
Durbin et al. 1998

12 A match score table indexed by the two sequences.
Durbin et al. 1998

13–28 Dynamic Programming: Needleman-Wunsch Optimal Global Pairwise Alignment (gap penalty d = 8, so each gap position scores –8)
Each cell F(i,j) is computed from its diagonal, upper, and left neighbors:
\[ F_{i,j} = \max\{\, F_{i-1,j-1} + s(A_i, B_j),\; F_{i-1,j} - d,\; F_{i,j-1} - d \,\} \]
The first row and column hold the cumulative gap penalties; the slides then fill the matrix one cell at a time. The cells shown so far:
        H    E    A    G    A    W    G    H    E    E
   0   -8  -16  -24  -32  -40  -48  -56  -64  -72  -80
P  -8   -2   -9
A -16  -10   -3
29 Needleman-Wunsch is for aligning entire sequences (globally)
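
The fill and traceback steps of Needleman-Wunsch can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name is hypothetical, `score` is any pairwise scoring function, and the four substitution values in the usage example are assumptions chosen to reproduce the cells filled on the slides (a full run would need the complete BLOSUM50 matrix).

```python
def needleman_wunsch(a, b, score, d=8):
    """Global alignment with linear gap penalty d (a positive number)."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = -d * i                      # a[:i] aligned to gaps only
    for j in range(1, m + 1):
        F[0][j] = -d * j                      # b[:j] aligned to gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(
                F[i - 1][j - 1] + score(a[i - 1], b[j - 1]),  # diagonal
                F[i - 1][j] - d,                              # gap in b
                F[i][j - 1] - d,                              # gap in a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] - d:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return F, "".join(reversed(top)), "".join(reversed(bottom))


# Assumed substitution scores for the first cells of the slide example.
SUB = {("P", "H"): -2, ("P", "E"): -1, ("A", "H"): -2, ("A", "E"): -1}
F, top, bottom = needleman_wunsch("PA", "HE", lambda x, y: SUB[(x, y)])
for row in F:
    print(row)
```

The score of the optimal global alignment is always the bottom-right cell, which is why the traceback starts there.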

30 Smith-Waterman is a variant that gives you the highest scoring local alignment (subsegment)

31 Smith-Waterman uses the exact same principle except the minimum score in any cell is zero
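
A minimal sketch of the Smith-Waterman variant, assuming simple match/mismatch scoring for DNA. The function name is hypothetical, and the defaults follow the convention of the DNA example that comes next (match = 1, mismatch = –5, gap = –3, with the gap given as a negative score rather than a positive penalty). The two changes from the global algorithm are the zero floor in every cell and the traceback, which starts at the highest-scoring cell and stops at the first zero.

```python
def smith_waterman(a, b, match=1, mismatch=-5, gap=-3):
    """Best local alignment of a and b; gap is a (negative) score per gap position."""
    n, m = len(a), len(b)
    F = [[0] * (m + 1) for _ in range(n + 1)]   # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(
                0,                       # start a fresh local alignment here
                F[i - 1][j - 1] + s,
                F[i - 1][j] + gap,
                F[i][j - 1] + gap,
            )
            if F[i][j] > best:
                best, best_pos = F[i][j], (i, j)
    # Trace back from the best cell until a zero is reached.
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and F[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if F[i][j] == F[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("GGGACGTCCC", "TTTACGTAAA"))
```

With these scores the flanking mismatches cost more than they could ever recover, so only the shared subsegment is reported.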

32 DNA Local Alignment Example (match = 1, gap = –3, mismatch = –5)

33 DNA Local Alignment Example (is wrong!) (match = 1, gap = –3, mismatch = –5)

34 Querying GenBank is like doing a local alignment (with repeats) against one very long sequence…
Aligning your query against it by dynamic programming would be way too slow… Why?

35 BLAST and FASTA: Widely used heuristic (not guaranteed optimal) Database Query Algorithms

36 BLAST and FASTA: Widely used heuristic (not guaranteed optimal) Database Query Algorithms
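
One ingredient these heuristics share is seeding: instead of filling a full dynamic programming matrix, they first look up short exact word matches between the query and the database and only examine the regions around them. The sketch below illustrates that idea only. The function names and the toy database are assumptions, and real tools add scoring thresholds, extension of the seeds, and significance statistics that are omitted here.

```python
from collections import defaultdict


def build_word_index(database, k):
    """Map every length-k word to the (sequence name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index


def find_seeds(query, index, k):
    """Return (sequence name, query position, database position) for each shared word."""
    seeds = []
    for qpos in range(len(query) - k + 1):
        for name, dpos in index.get(query[qpos:qpos + k], []):
            seeds.append((name, qpos, dpos))
    return seeds


# Toy database (hypothetical sequences).
database = {"s1": "TTACGTT", "s2": "GGGGGG"}
index = build_word_index(database, k=3)
print(find_seeds("ACGA", index, k=3))
```

The index is built once and reused for every query, which is where the speed comes from; the price is that alignments without any exact shared word are never found.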

37 Multiple alignment is also too expensive to do with dynamic programming.

38 So we rely on progressive multiple alignment methods (CLUSTAL) also not guaranteed optimal

39 Q: Getting back to structural or functional alignments, what can you do with them?
A: You can make consensus sequences…
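
A consensus sequence can be sketched as the most common state in each alignment column. The function name, the alphabetical tie-breaking rule, and the toy alignment are assumptions made for this illustration.

```python
from collections import Counter


def consensus(alignment):
    """Most common character per column; ties are broken alphabetically (an assumption)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("alignment rows must be non-empty and of equal length")
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        top = max(counts.values())
        result.append(min(r for r, c in counts.items() if c == top))
    return "".join(result)


print(consensus(["ACGT", "ACGA", "TCGA"]))
```

Note that the T in the first column and the T in the last column vanish from the result: every minority state is discarded.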

40 But better than consensus sequences, why throw out all the minority states? Use a “Profile” instead.
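
A profile in its simplest form is a table of per-column frequencies. The sketch below assumes a DNA alphabet and an ungapped toy alignment; the function name is hypothetical, and practical profiles usually add pseudocounts and convert frequencies to log-odds scores, which is left out here.

```python
from collections import Counter


def build_profile(alignment, alphabet="ACGT"):
    """Return one dict per column mapping each letter to its frequency in that column."""
    n = len(alignment)
    profile = []
    for column in zip(*alignment):
        counts = Counter(column)
        profile.append({letter: counts[letter] / n for letter in alphabet})
    return profile


profile = build_profile(["ACGT", "ACGA", "TCGA", "ACGA"])
for position, column in enumerate(profile, start=1):
    print(position, column)
```

Unlike the consensus, the first column still records that T occurs in a quarter of the sequences.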

41 Keep all the information in a “profile.”
EX: Sequence logos are like consensus sequences but show more of the profile.

42

43 Sequence logos
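
In a sequence logo, the total height of a column is its information content in bits, and each letter's height is its frequency times that total. The sketch below computes both for a single DNA column. The function name is hypothetical, gap characters are ignored, and the small-sample correction that logo software may apply is omitted; both are simplifying assumptions.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGT"):
    """Return (information in bits, {letter: height in bits}) for one alignment column."""
    counts = Counter(c for c in column if c in alphabet)   # gaps are ignored
    total = sum(counts.values())
    if total == 0:
        return 0.0, {}
    freqs = {r: counts[r] / total for r in alphabet if counts[r]}
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    information = math.log2(len(alphabet)) - entropy       # maximum is 2 bits for DNA
    heights = {r: f * information for r, f in freqs.items()}
    return information, heights


for column in ("AAAA", "AACC", "ACGT"):
    print(column, column_information(column))
```

A perfectly conserved DNA column reaches the full 2 bits, while a column with all four bases equally frequent carries no information and disappears from the logo.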

44 Profiles applied in BLAST: PSI-BLAST
For more sensitive searching of distant protein homologs, NCBI has PSI-BLAST. BLAST matches are aggregated into alignments and then a profile. The profile is then run on the database instead of a single sequence. New matches are added to the profile and the process continues until no more matches are found.
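
The iterate-until-nothing-new loop can be sketched with a toy scoring scheme. Nothing below is PSI-BLAST's actual scoring or interface: the function names, the equal-length ungapped sequences, the average-frequency score, and the 0.6 threshold are all assumptions chosen to keep the example small.

```python
from collections import Counter


def build_profile(members):
    """Per-column letter frequencies of equal-length, ungapped sequences."""
    n = len(members)
    return [{r: c / n for r, c in Counter(col).items()} for col in zip(*members)]


def profile_score(profile, seq):
    """Average profile frequency of the sequence's letters (toy score between 0 and 1)."""
    return sum(col.get(r, 0.0) for col, r in zip(profile, seq)) / len(profile)


def iterative_profile_search(seed, database, threshold=0.6, max_rounds=10):
    """Rebuild the profile from all matches and search again until no new match appears."""
    members = [seed]
    for _ in range(max_rounds):
        profile = build_profile(members)
        hits = [s for s in database
                if s not in members
                and len(s) == len(seed)
                and profile_score(profile, s) >= threshold]
        if not hits:
            break
        members.extend(hits)
    return members


database = ["ACGA", "ACTA", "TTTT"]
print(iterative_profile_search("ACGT", database, max_rounds=1))
print(iterative_profile_search("ACGT", database))
```

In this toy run, ACTA scores below the threshold against the seed alone and is only picked up in the second round, after ACGA has been folded into the profile. That gain in sensitivity is the reason for iterating.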

45 Profiles applied in Clustal
You don’t need to realign everything when you want to add sequences to an existing alignment! Run Clustal in “profile mode.” Supply your existing alignment and your unaligned sequences as separate inputs, and ClustalW will add the new sequences to the alignment. The progressive algorithm in Clustal is based on profile-sequence alignment.

