Doug Raiford, Lesson 5: Dynamic programming methods, Needleman-Wunsch (global alignment), Smith-Waterman (local alignment), BLAST


1 Doug Raiford Lesson 5

2
• Dynamic programming methods
• Needleman-Wunsch (global alignment)
• Smith-Waterman (local alignment)
• BLAST
Running time:
• Fixed: best
• Linear: next best
• Polynomial (n^2): not bad
• Exponential (3^n): very bad
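As a reminder of why global alignment falls in the polynomial (n^2) class, here is a minimal sketch of the Needleman-Wunsch score computation. It fills one table cell per pair of prefixes. The function name and the scoring values (match=1, mismatch=-1, gap=-1) are assumptions chosen for illustration and do not come from the slides.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    # First column and first row: a prefix aligned entirely against gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    # Each cell looks at three neighbours, so total work is O(len(a) * len(b)).
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap              # a[i-1] against a gap
            left = score[i][j - 1] + gap            # b[j-1] against a gap
            score[i][j] = max(diagonal, up, left)
    return score[-1][-1]
```

Smith-Waterman (local alignment) uses the same table but clamps each cell at zero and reports the maximum cell instead of the bottom-right corner.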

3
• BLAST is fast (linear)
• But not as sensitive
Figure labels: Speed, Sensitivity

4
• Similarity matrix
• Especially with amino acids
• Some amino acids have similar chemical characteristics
• Similarity to all 8,000 (20^3) possible 3-mers calculated
• Usually ~50 are above a threshold
• All of these ~50 are considered hits when searching
Matrices
• PAM (Point Accepted Mutation): built from observed substitution rates in closely related proteins
• BLOSUM (BLOcks SUbstitution Matrix): built from observed substitution rates in evolutionarily divergent proteins
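The word-neighborhood step can be sketched as follows: score a query 3-mer against every one of the 8,000 possible 3-mers and keep those at or above a threshold. The scoring function here is a toy stand-in (4 for identity, 1 for the same chemical group, -2 otherwise), and the groupings in GROUPS are a hypothetical simplification. A real search would use a PAM or BLOSUM matrix, so the hit counts below will not match the ~50 quoted on the slide.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical chemical groupings, for illustration only.
GROUPS = ["AVLIM", "FWY", "KRH", "DE", "STNQ", "C", "G", "P"]
GROUP_OF = {aa: index for index, members in enumerate(GROUPS) for aa in members}


def toy_score(x, y):
    """Toy similarity score; stands in for a PAM/BLOSUM lookup."""
    if x == y:
        return 4
    if GROUP_OF[x] == GROUP_OF[y]:
        return 1
    return -2


def neighborhood(query_word, threshold):
    """Return every word whose summed score against query_word >= threshold."""
    hits = []
    # product(...) enumerates all 20**len(query_word) candidate words.
    for letters in product(AMINO_ACIDS, repeat=len(query_word)):
        total = sum(toy_score(q, c) for q, c in zip(query_word, letters))
        if total >= threshold:
            hits.append("".join(letters))
    return hits
```

With this toy scheme, `neighborhood("LKE", 9)` keeps the word itself plus the words that differ by one same-group substitution, 8 words in total. Lowering the threshold widens the neighborhood.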

5
• PSI-BLAST (Position-Specific Iterative BLAST)
• Align using default similarity matrix
• At each query location build a Position Specific Scoring Matrix (PSSM) based upon observed search and alignment results
• Repeat with new matrix until results no longer change
Builds sensitivity by specifying allowed similarity at each position
Slower than standard BLAST, but still faster than full local alignment

6
• Central to bioinformatics
• Need for
• Phylogeny
• Protein function
• Protein structure
  ▪ Structure → function
• Drug discovery

7
• Some parts of proteins are very important to maintain function
• Must be similar from species to species
• Can we spot these regions through alignment?
atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctcgatacgtgccgcaggagatcaggactttcacct--tggatcatgcgaccgtacctac

8
• Often conserved regions are near active sites
• Ligand binding sites (docking)
• Protein-to-protein interface
• Important regions for tertiary structure
Ligand: a small molecule that binds to a protein, e.g. O2 is the ligand for hemoglobin
Substrate: a molecule upon which an enzyme acts

9
• What if we look at more proteins?
• Increase our confidence?
• But how to go about performing multiple sequence alignment?
atgccgca-actgccgcaggagatcaggactttcatgaatatcatcatgcgtggga-ttcag
acctccatacgtgccccaggagatctggactttcacc---tggatcatgcgaccgtacctac
t-atgg-t-cgtgccgcaggagatcaggactttca-gt--g-aatcatctgg-cgc--c-aa
t--tcgt-ac-tgccccaggagatctggactttcaaa---ca-atcatgcgcc-g-tc-tat
aattccgtacgtgccgcaggagatcaggactttcag-t--a-tatcatctgtc-ggc--tag

10
• Hyper-dimensional dynamic programming
• Becomes exponential with respect to number of sequences
• O(n^L), with n = sequence length and L = number of sequences
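To make the growth concrete, the short sketch below counts table cells for sequences of length 100. The function name is an assumption for illustration, and the count ignores the extra border row in each dimension. Each cell also has up to 2^L - 1 predecessor cells to consider, so the real cost grows even faster than the cell count.

```python
def dp_cells(n, num_sequences):
    """Approximate number of cells in an L-dimensional alignment table."""
    return n ** num_sequences


# Cell counts for 2 to 5 sequences of length 100.
for L in range(2, 6):
    print(L, dp_cells(100, L))
```

Going from two sequences to five takes the table from ten thousand cells to ten billion, which is why exact multiple alignment is rarely attempted directly.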

11
• Determine all pairwise distances
  ▪ Fast: number of l-mer matches
  ▪ Slower: full global alignments
• Start with the closest pair and align them
• Then align the next closest sequence to those two
• And so on
ClustalW: cluster alignment
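The ordering step above can be sketched as follows, using the fast option (shared l-mers) as the distance. This is a simplified illustration and not ClustalW's actual procedure, which builds a guide tree from the distances. The function names, the distance formula, and the default k=3 are all assumptions. The sketch only decides the order in which sequences would be added; the alignment itself is left out.

```python
from itertools import combinations


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def kmer_distance(a, b, k=3):
    """1 minus the fraction of shared k-mers (0.0 = every k-mer shared)."""
    ka, kb = kmers(a, k), kmers(b, k)
    if not ka or not kb:          # a sequence shorter than k has no k-mers
        return 1.0
    return 1.0 - len(ka & kb) / min(len(ka), len(kb))


def joining_order(seqs, k=3):
    """Order in which named sequences would be added to the alignment."""
    names = list(seqs)
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]], k)
            for pair in combinations(names, 2)}
    # Start with the closest pair.
    closest = min(dist, key=dist.get)
    order = sorted(closest)
    remaining = [name for name in names if name not in closest]
    # Repeatedly add the sequence nearest to anything already placed.
    while remaining:
        nxt = min(remaining,
                  key=lambda n: min(dist[frozenset((n, m))] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

For example, `joining_order({"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"})` places `a` and `b` first because they share every 3-mer of `a`, and adds `c` last.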

12
• Profile: matrix of real values representing the probability of each amino acid at each position in a corresponding multiple sequence alignment
• Uses a modification of the Smith-Waterman algorithm
• Degree to which an amino acid is preferred is the degree of match between the profile and the sequence
Consensus1 M.ERS.HLPEG.PFAAALSGARFAAQSSGN.ASVL..DWNVLP.E 38
| : : : || : ::::: : |: | ::|: : | :
OPSD_XENLA 1 MNG.GTE..EGPN.NFYVP.PMS...SN.NKTGVVRSP.P..PFD 33
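A profile of this kind can be sketched as one probability table per alignment column, with the match between profile and sequence taken as the sum of log probabilities. The function names, the pseudocount of 1.0, and the ungapped scoring are assumptions for illustration. A real profile aligner also handles gaps through the modified Smith-Waterman recurrence, which is not shown here.

```python
import math
from collections import Counter


def build_profile(alignment, alphabet, pseudocount=1.0):
    """One dict per column mapping each residue to its estimated probability."""
    profile = []
    for column in zip(*alignment):
        # Count only alphabet symbols, so gap characters are ignored.
        counts = Counter(c for c in column if c in alphabet)
        total = sum(counts.values()) + pseudocount * len(alphabet)
        profile.append({a: (counts[a] + pseudocount) / total for a in alphabet})
    return profile


def match_score(profile, segment):
    """Sum of log2 probabilities of segment's residues, column by column."""
    return sum(math.log2(column[residue])
               for column, residue in zip(profile, segment))
```

With `profile = build_profile(["ACG", "ACG", "ATG"], "ACGT")`, the segment `"ACG"` scores higher than `"TTT"` because its residues are the preferred ones in every column. The pseudocount keeps unseen residues from getting probability zero.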

13
• Mistakes made early in a progressive approach are propagated throughout the process
• Once aligned, sequences are not revisited
• Iterative methods were devised to revisit earlier alignments
• Newest version of ClustalW (version 2) includes iteration
Other MSA apps: T-Coffee, PSalign, DIALIGN, MUSCLE

14
• Height of letter represents how prevalent that letter is at that position
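The slide's figure is not in the transcript, so the following is a sketch under an assumption: that it showed a sequence logo drawn with the common convention where a letter's height is its frequency times the information content of the column in bits. The function name and the default alphabet size of 4 (DNA) are illustrative, and the small-sample correction used by some logo tools is omitted.

```python
import math
from collections import Counter


def logo_heights(column, alphabet_size=4):
    """Letter heights in bits for one alignment column (no gaps expected)."""
    counts = Counter(column)
    n = len(column)
    freqs = {letter: count / n for letter, count in counts.items()}
    # Shannon entropy of the column, in bits.
    entropy = -sum(f * math.log2(f) for f in freqs.values())
    # Information content: maximum possible entropy minus observed entropy.
    information = math.log2(alphabet_size) - entropy
    return {letter: f * information for letter, f in freqs.items()}
```

A fully conserved DNA column such as `"AAAA"` gives A the full 2 bits, while a column with all four bases equally represented gives every letter a height of zero.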

15

16 Database Searches
• Scores are affected by sequence lengths
• To compare scores across different query lengths, they need to be normalized
• The term "bit" comes from the fact that probabilities are stored as log2 values (binary, bit)
• Done so that values can be added across the length of the sequence instead of multiplied
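The point about adding instead of multiplying can be checked with a short sketch. The function name and the probability values are made up for illustration. Summing log2 values gives the same answer as multiplying the probabilities, and it also avoids floating-point underflow on long sequences.

```python
import math


def sum_of_log2(probs):
    """Add log2 probabilities instead of multiplying raw probabilities."""
    return sum(math.log2(p) for p in probs)


probs = [0.5, 0.25, 0.125]
print(math.prod(probs))            # product of the raw probabilities
print(2 ** sum_of_log2(probs))     # same value recovered from the log2 sum

# A long run of small probabilities underflows when multiplied,
# while the log2 sum stays an ordinary finite number.
long_run = [0.05] * 1000
print(math.prod(long_run))
print(sum_of_log2(long_run))
```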

