CENTER FOR BIOLOGICAL SEQUENCE ANALYSIS, TECHNICAL UNIVERSITY OF DENMARK, DTU
Sequence motifs, information content, logos, and HMM’s
Morten Nielsen, CBS, BioCentrum, DTU

Presentation transcript:

Sequence motifs, information content, logos, and HMM’s
Morten Nielsen, CBS, BioCentrum, DTU

Outline
Multiple alignment and sequence motifs
Weight matrix construction and consensus sequence
–Sequence weighting
–Low (pseudo) counts
Information content
–Sequence logos
–Mutual information
Example from the real world
HMM’s and profile HMM’s
–TMHMM (trans-membrane protein)
–Gene finding
Links to HMM packages

Multiple alignment and sequence motifs
Core
Consensus sequence
Weight matrices
Problems
–Sequence weights
–Low counts
--------MLEFVVEADLPGIKA
--------MLEFVVEFALPGIKA
--------MLEFVVEFDLPGIAA
-----------YLQDSDPDSFQD
-GSDTITLPCRMKQFINMWQE
-RNQEERLLADLMQNYDPNLR
-----YDPNLRPAERDSDVVNVSLK
--------NVSLKLTLTNLISLNEREEA
--EREEALTTNVWIEMQWCDYR
--------WCDYRLRWDPRDYEGLWVLR
LWVLRVPSTMVWRPDIVLEN
----------IVLENNVDGVFEVALYCNVL
-----------YCNVLVSPDGCIYWLPPAIF
-------PPAIFRSACSISVTYFPFDW
           *********
           FVVEFDLPG  Consensus

Sequence weighting 1 - Clustering (slow, but accurate)
Homologous sequences in the alignment (MLEFVVEADLPGIKA, MLEFVVEFALPGIKA, MLEFVVEFDLPGIAA) form one cluster
Weight = 1/n for each of the n sequences in a cluster (here 1/3)
Consensus sequence: YRQELDPLV
Previous (unweighted) consensus: FVVEFDLPG

Sequence weighting 2 - Henikoff & Henikoff (fast)
Sequence   w
FVVEADLPG  0.37
FVVEFALPG  0.43
FVVEFDLPG  0.32
YLQDSDPDS  0.59
MKQFINMWQ  0.90
LMQNYDPNL  0.68
PAERDSDVV  0.75
LKLTLTNLI  0.85
VWIEMQWCD  0.84
YRLRWDPRD  0.51
WRPDIVLEN  0.71
VLENNVDGV  0.59
YCNVLVSPD  0.71
FRSACSISV  0.75
w'_aa = 1/(r*s)
r: number of different amino acids in a column
s: number of occurrences of the amino acid in that column
Normalize so that Σ w_aa = 1 for each column
Sequence weight is the sum of w_aa over the sequence
Example, first column: r=7 (FYMLPVW)
F: s=4, w'=1/28, w =
Y: s=3, w'=1/21, w =
M,P,W: s=1, w'=1/7, w =
L,V: s=2, w'=1/14, w = 0.109
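The position-based weighting on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: the name `henikoff_weights` is hypothetical, and it assumes ungapped sequences of equal length. Each residue contributes w' = 1/(r*s) for its column, and a sequence's weight is the sum of these contributions.

```python
from collections import Counter

# The 14 core 9-mers listed on the slide.
SEQS = [
    "FVVEADLPG", "FVVEFALPG", "FVVEFDLPG", "YLQDSDPDS", "MKQFINMWQ",
    "LMQNYDPNL", "PAERDSDVV", "LKLTLTNLI", "VWIEMQWCD", "YRLRWDPRD",
    "WRPDIVLEN", "VLENNVDGV", "YCNVLVSPD", "FRSACSISV",
]

def henikoff_weights(seqs):
    """Return one weight per sequence (hypothetical helper name)."""
    weights = [0.0] * len(seqs)
    for column in zip(*seqs):          # iterate over alignment columns
        counts = Counter(column)       # s for every residue type in the column
        r = len(counts)                # number of different residue types
        for k, aa in enumerate(column):
            weights[k] += 1.0 / (r * counts[aa])
    return weights

if __name__ == "__main__":
    for seq, w in zip(SEQS, henikoff_weights(SEQS)):
        print(seq, round(w, 2))
```

Because the w' values of every column add up to 1, the weights sum to the number of columns (9 here). On these sequences the first three weights round to 0.37, 0.43 and 0.32, matching the values listed on the slide.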

Low count correction
Limited number of data
Poor sampling of sequence space
I is not found at position P1 of the alignment core. Does this mean that I can never be found at P1?
No! Use Blosum matrix to estimate pseudo frequency of I at P1

Low count correction using Blosum matrices
[Table: Blosum62 substitution frequencies for I, L and V; values lost in extraction]
Every time for instance L/V is observed, I is also likely to occur
Estimate low (pseudo) count correction using this approach
As more data are included the pseudo count correction becomes less important
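As a rough illustration of the idea, the sketch below derives a pseudo frequency for I from observed L and V. Everything specific in it is an assumption: the substitution values are made-up placeholders over a three-letter toy alphabet (they are not Blosum62 frequencies), and the mixing rule p = (alpha*f + beta*g)/(alpha + beta) with alpha = N - 1 and a fixed beta is one common choice, not something stated on the slide.

```python
# Hypothetical conditional substitution frequencies q(a | b); rows sum to 1.
# These are placeholder numbers for illustration, NOT Blosum62 values.
SUBST = {
    "I": {"I": 0.6, "L": 0.2, "V": 0.2},
    "L": {"I": 0.2, "L": 0.6, "V": 0.2},
    "V": {"I": 0.2, "L": 0.2, "V": 0.6},
}

def pseudo_frequencies(observed):
    """g_a = sum over b of f_b * q(a | b)."""
    return {a: sum(observed[b] * SUBST[b][a] for b in observed) for a in SUBST}

def corrected_frequencies(observed, n_seqs, beta=50.0):
    """Mix observed and pseudo frequencies (assumed weighting scheme)."""
    alpha = n_seqs - 1                 # weight on the observed data
    g = pseudo_frequencies(observed)
    return {a: (alpha * observed[a] + beta * g[a]) / (alpha + beta)
            for a in observed}

if __name__ == "__main__":
    observed = {"I": 0.0, "L": 0.5, "V": 0.5}   # I never seen at this position
    print(corrected_frequencies(observed, n_seqs=10))
    print(corrected_frequencies(observed, n_seqs=1000))
```

With 10 sequences the corrected frequency of I is clearly above zero; with 1000 sequences it shrinks towards the observed value, which is the behaviour described in the last bullet.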

Information content
Information and entropy
–Conserved amino acid regions contain high degree of information (high order == low entropy)
–Variable amino acid regions contain low degree of information (low order == high entropy)
Shannon information
D = log2(N) + Σ_i p_i log2(p_i)   (for proteins N=20, DNA N=4)
Conserved residue: p_A = 1, p_i = 0 for i <> A, D = log2(N) ( = 4.3 for proteins)
Variable region: p_A = 0.05, p_C = 0.05, .., D = 0
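The two limiting cases on this slide are easy to check numerically. The following is a small sketch; the function name `information_content` is hypothetical, and terms with p = 0 are skipped using the usual convention 0*log(0) = 0.

```python
import math

def information_content(freqs, alphabet_size=20):
    """D = log2(N) + sum_i p_i * log2(p_i), in bits."""
    entropy_term = sum(p * math.log2(p) for p in freqs if p > 0)
    return math.log2(alphabet_size) + entropy_term

if __name__ == "__main__":
    conserved = [1.0] + [0.0] * 19   # one residue always present
    variable = [0.05] * 20           # all residues equally likely
    print(round(information_content(conserved), 2))  # about 4.32 bits
    print(round(information_content(variable), 2))   # 0 bits
```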

Sequence logo
Height of a column equal to D
Relative height of a letter is p_A
Highly useful tool to visualize sequence motifs
[Figure: sequence logo for MHC class I, with the high information positions indicated]

More on logos
Information content
D = Σ_i p_i log2(p_i/q_i)
Shannon, q_i = 1/N = 0.05
D = Σ_i p_i log2(p_i) - Σ_i p_i log2(1/N) = log2(N) + Σ_i p_i log2(p_i)
Kullback-Leibler, q_i = background frequency
–V/L/A more frequent than for instance C/H/W
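The relation between the two definitions can be verified directly. The sketch below uses a four-letter alphabet to keep the numbers small; the function name `relative_entropy` and the non-uniform background values are illustrative assumptions, not data from the slides.

```python
import math

def relative_entropy(p, q):
    """Kullback-Leibler information D = sum_i p_i * log2(p_i / q_i), in bits."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

if __name__ == "__main__":
    p = [0.7, 0.1, 0.1, 0.1]
    flat = [0.25] * 4
    # With a flat background, D reduces to log2(N) + sum_i p_i log2(p_i).
    shannon = math.log2(4) + sum(pi * math.log2(pi) for pi in p)
    print(relative_entropy(p, flat), shannon)

    # Hypothetical uneven background: the same p is less surprising when
    # the dominant letter is also common in the background.
    uneven = [0.4, 0.2, 0.2, 0.2]
    print(relative_entropy(p, uneven))
```

A position whose frequencies equal the background carries zero information under the Kullback-Leibler definition, whatever the background is.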

Mutual information
ALWGFFPVA
ILKEPVHGV
ILGFVFTLT
LLFGYPVYV
GLSPTVWLS
YMNGTMSQV
GILGFVFTL
WLSLLVPFV
FLPSDFFPS
Positions compared: P1 and P6
P(G1) = 2/9 = 0.22, ..
P(V6) = 4/9 = 0.44, ..
P(G1,V6) = 2/9 = 0.22
P(G1)*P(V6) = 8/81 = 0.10
log(0.22/0.10) > 0
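The slide evaluates a single term, log(P(G1,V6) / (P(G1)*P(V6))). Summing such terms over all residue pairs, each weighted by its joint probability, gives the mutual information between two positions. The sketch below assumes that standard definition and base-2 logarithms; the function name is hypothetical and positions are 0-based, so P1 and P6 are indices 0 and 5.

```python
import math
from collections import Counter

PEPTIDES = [
    "ALWGFFPVA", "ILKEPVHGV", "ILGFVFTLT", "LLFGYPVYV", "GLSPTVWLS",
    "YMNGTMSQV", "GILGFVFTL", "WLSLLVPFV", "FLPSDFFPS",
]

def mutual_information(peptides, i, j):
    """I(i;j) = sum_ab P(a,b) * log2(P(a,b) / (P(a) * P(b))), in bits."""
    n = len(peptides)
    p_i = Counter(p[i] for p in peptides)
    p_j = Counter(p[j] for p in peptides)
    p_ij = Counter((p[i], p[j]) for p in peptides)
    return sum(
        (c / n) * math.log2((c / n) / ((p_i[a] / n) * (p_j[b] / n)))
        for (a, b), c in p_ij.items()
    )

if __name__ == "__main__":
    print(mutual_information(PEPTIDES, 0, 5))
```

With only nine peptides the estimate is strongly inflated by sampling noise, which is why a comparison against random peptides is needed before reading anything into the value.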

Mutual information
[Figure panels: 313 binding peptides; 313 random peptides]

Learning higher order correlation
Neural networks can learn higher order correlations!
–What does this mean?
0 0 => 0
1 0 => 1
0 1 => 1
1 1 => 0
No linear function can learn this pattern
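To make the claim concrete, the sketch below takes the pattern to be exclusive-or (XOR). It searches a grid of weights for a single linear threshold unit that reproduces XOR and finds none, while a tiny network with two hidden units and hand-picked weights succeeds. The grid search only illustrates the point and is not a proof; all names and weights are chosen for this example.

```python
from itertools import product

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}

def step(x):
    """Threshold activation."""
    return 1 if x > 0 else 0

def linear_unit(w1, w2, bias, a, b):
    """A single linear threshold unit."""
    return step(w1 * a + w2 * b + bias)

def hidden_layer_net(a, b):
    """Two hidden units (OR and AND) followed by one output unit."""
    h_or = step(a + b - 0.5)
    h_and = step(a + b - 1.5)
    return step(h_or - h_and - 0.5)   # fires for OR but not AND

def linear_solutions(grid):
    """All (w1, w2, bias) on the grid for which one unit reproduces XOR."""
    return [
        (w1, w2, bias)
        for w1, w2, bias in product(grid, repeat=3)
        if all(linear_unit(w1, w2, bias, a, b) == out
               for (a, b), out in XOR.items())
    ]

GRID = [x / 4 for x in range(-12, 13)]   # -3.0 .. 3.0 in steps of 0.25

if __name__ == "__main__":
    print("linear solutions found:", len(linear_solutions(GRID)))
    print("network output:", {k: hidden_layer_net(*k) for k in XOR})
```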

End of first part
Take a deep breath
Smile to your neighbor

Weight matrices
Estimate amino acid frequencies from alignment including sequence weighting and pseudo counts
Construct a weight matrix as W_ij = log(p_ij/q_j)
Here i is a position in the motif, and j an amino acid. q_j is the prior frequency for amino acid j.
W is an L x 20 matrix, where L is the motif length
Score a sequence against the weight matrix by looking up and adding L values from the matrix
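The construction and scoring steps can be sketched as follows, using the ten 9-mer peptides from the "Example from real life" slide. Several simplifications are assumptions of this sketch and not the method in the slides: a flat background q_j = 0.05, an add-one pseudo count in place of sequence weighting and Blosum-based pseudo counts, and base-2 logarithms. Function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"

PEPTIDES = [
    "ALAKAAAAM", "ALAKAAAAN", "ALAKAAAAR", "ALAKAAAAT", "ALAKAAAAV",
    "GMNERPILT", "GILGFVFTM", "TLNAWVKVV", "KLNEPVLLL", "AVVPFIVSV",
]

def build_weight_matrix(peptides, pseudocount=1.0, background=0.05):
    """Return a list of L dicts with W[i][aa] = log2(p_i(aa) / q)."""
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    matrix = []
    for column in zip(*peptides):
        counts = Counter(column)       # missing residues count as 0
        matrix.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return matrix

def score(matrix, peptide):
    """Look up and add one value per position."""
    return sum(row[aa] for row, aa in zip(matrix, peptide))

if __name__ == "__main__":
    W = build_weight_matrix(PEPTIDES)
    for pep in ("ALAKAAAAV", "WWWWWWWWW"):
        print(pep, round(score(W, pep), 2))
```

A peptide that resembles the training set gets a positive log-odds score, while one built from residues that are rare in the training data scores below zero.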

Weight matrix 2
What are log-odds scores?
–Does a monthly income of 2000 $ mean that you are rich?
–Depends on where you live
In Denmark no
In Argentina yes
–You must always compare your measured value to a background
For proteins the background is either the flat distribution (0.05) or the distribution in Swiss-Prot
In nature not all amino acids are found equally often
–P_A = 0.070, P_W =
–Finding 6% A is hence not significant, but 6% W is highly significant

Scoring sequences to a weight matrix
[Table: weight matrix with columns A R N D C Q E G H I L K M F P S T W Y V; values lost in extraction]
ILYQVPFSV
ALPYWNFAT
MTAQWWLDA
Which peptide binds the best? Which peptide is second?

Example from real life
10 peptides from MHCpep database
Bind to the MHC complex
Relevant for immune system recognition
Estimate sequence motif and weight matrix
Evaluate motif “correctness” on 528 peptides
ALAKAAAAM
ALAKAAAAN
ALAKAAAAR
ALAKAAAAT
ALAKAAAAV
GMNERPILT
GILGFVFTM
TLNAWVKVV
KLNEPVLLL
AVVPFIVSV

Prediction accuracy
Pearson correlation 0.45

Example (cont.)
Raw sequence counting
–No sequence weighting
–No pseudo count
–Prediction accuracy 0.45
Sequence weighting
–No pseudo count
–Prediction accuracy 0.5

Example (cont.)
Sequence weighting and pseudo count
–Prediction accuracy 0.60
Motif found on all data (485)
–Prediction accuracy 0.79

Hidden Markov Models
Weight matrices do not deal with insertions and deletions
In alignments, this is done in an ad-hoc manner by optimization of the two gap penalties for first gap and gap extension
HMM is a natural framework where insertions/deletions are dealt with explicitly

HMM (a simple example)
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
Example from A. Krogh
Core region (core of alignment) defines the number of states in the HMM (red)
Insertion and deletion statistics are derived from the non-core part of the alignment (black)

HMM construction
[Figure: HMM state diagram; each state carries an emission distribution over A, C, G, T]
ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC
5 matches in the gap region: A, 2xC, T, G
5 transitions in the gap region: C out, G out, A-C, C-T, T out
Out transition 3/5
Stay transition 2/5
ACA---ATG: 0.8x1x0.8x1x0.8x0.4x1x1x0.8x1x0.2 = 3.3x10^-2

Align sequence to HMM
ACA---ATG: 0.8x1x0.8x1x0.8x0.4x1x1x0.8x1x0.2 = 3.3x10^-2
TCAACTATC: 0.2x1x0.8x1x0.8x0.6x0.2x0.4x0.4x0.4x0.2x0.6x1x1x0.8x1x0.8 = 0.0075x10^-2
ACAC--AGC = 1.2x10^-2
AGA---ATC = 3.3x10^-2
ACCG--ATC = 0.59x10^-2
Consensus:
ACAC--ATC = 4.7x10^-2
ACA---ATC = 13.1x10^-2
Exceptional:
TGCT--AGG = 0.0023x10^-2
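The products on this slide can be reproduced with a small script. The sketch below hard-codes the parameters read off the five-sequence alignment: match-state emissions from the six core columns, insert-state emissions from the five letters in the gap region, and the insert transitions (3/5 of the sequences enter the insert state, 2/5 of the insert-state transitions stay). The variable and function names, and the fixed 3 + 3 + 3 column layout, are assumptions of this example.

```python
# Emission probabilities of the six match states (core columns 1-3 and 7-9).
MATCH_EMIT = [
    {"A": 0.8, "T": 0.2},
    {"C": 0.8, "G": 0.2},
    {"A": 0.8, "C": 0.2},
    {"A": 1.0},
    {"T": 0.8, "G": 0.2},
    {"C": 0.8, "G": 0.2},
]
# Insert state: letters A, C, C, T, G observed in the gap region.
INSERT_EMIT = {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2}
ENTER_INSERT, SKIP_INSERT = 0.6, 0.4   # transitions out of match state 3
STAY_INSERT, LEAVE_INSERT = 0.4, 0.6   # transitions out of the insert state

def path_probability(aligned_row):
    """Probability of one aligned row (9 columns, '-' for gaps)."""
    head = aligned_row[:3]
    gap = aligned_row[3:6].replace("-", "")
    tail = aligned_row[6:]
    p = 1.0
    for emit, base in zip(MATCH_EMIT[:3], head):
        p *= emit.get(base, 0.0)
    if gap:
        p *= ENTER_INSERT
        for base in gap:
            p *= INSERT_EMIT[base]
        p *= STAY_INSERT ** (len(gap) - 1) * LEAVE_INSERT
    else:
        p *= SKIP_INSERT
    for emit, base in zip(MATCH_EMIT[3:], tail):
        p *= emit.get(base, 0.0)
    return p

if __name__ == "__main__":
    for row in ("ACA---ATG", "TCAACTATC", "ACAC--AGC", "ACA---ATC", "TGCT--AGG"):
        print(row, f"{path_probability(row):.3g}")
```

All other transitions have probability 1 in this model, so they are left out of the product.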

Align sequence to HMM - Null model
Score depends strongly on length
Null model is a random model. For length L the score is 0.25^L
Log-odds score for sequence S: log( P(S)/0.25^L )
Positive score means more likely than Null model
ACA---ATG = 4.9
TCAACTATC = 3.0
ACAC--AGC = 5.3
AGA---ATC = 4.9
ACCG--ATC = 4.6
Consensus:
ACAC--ATC = 6.7
ACA---ATC = 6.3
Exceptional:
TGCT--AGG = -0.97 (Note: negative!)
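The log-odds values follow from the model probabilities and the sequence lengths. A minimal sketch is given below; it assumes the natural logarithm, which is the base that reproduces the numbers on this slide, and counts only emitted nucleotides (gaps excluded) in the length L. The function name is hypothetical.

```python
import math

def log_odds(p_model, length, alphabet_size=4):
    """ln( P(S) / (1/alphabet_size)^L ) for a sequence of L emitted letters."""
    p_null = (1.0 / alphabet_size) ** length
    return math.log(p_model / p_null)

if __name__ == "__main__":
    # Model probabilities computed from the products on the previous slide.
    examples = {
        "ACA---ATG": (0.8**4 * 0.4 * 0.2, 6),
        "ACA---ATC": (0.8**5 * 0.4, 6),
        "TGCT--AGG": (0.2**6 * 0.6 * 0.6, 7),
    }
    for row, (p, length) in examples.items():
        print(row, round(log_odds(p, length), 2))
```

Dividing by the null model removes most of the length dependence: the raw probability of a longer sequence is smaller simply because more factors below 1 are multiplied together.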

HMM’s and weight matrices
In the case of un-gapped alignments HMM’s become simple weight matrices
It still might be useful to apply an HMM tool package to estimate a weight matrix
–Sequence weighting
–Pseudo counts

Profile HMM’s
Alignments based on conventional scoring matrices (BLOSUM62) score all positions in a sequence in an equal manner
Some positions are highly conserved, some are highly variable (more than what is described in the BLOSUM matrix)
Profile HMM’s are ideally suited to describe such position specific variations

Example: Sequence profiles
Alignment of protein sequences 1PLC._ and 1GYC.A
–E-value > 1000
Profile alignment
–Align 1PLC._ against Swiss-prot
–Make position specific weight matrix from alignment
–Use this matrix to align 1PLC._ against 1GYC.A
E-value <
Rmsd=3.3

Example continued
Score = 97.1 bits (241), Expect = 9e-22
Identities = 13/107 (12%), Positives = 27/107 (25%), Gaps = 17/107 (15%)
Query: 3 ADDGSLAFVPSEFSISPGEKI------VFKNNAGFPHNIVFDEDSIPSGVDASKIS 56
F + G++ N+ + +G + +
Sbjct: VFPSPLITGKKGDRFQLNVVDTLTNHTMLKSTSIHWHGFFQAGTNWADGP 79
Query: 57 MSEEDLLNAKGETFEVAL---SNKGEYSFYCSP--HQGAGMVGKVTV 98
A G +F G + ++ G+ G V
Sbjct: 80 AFVNQCPIASGHSFLYDFHVPDQAGTFWYHSHLSTQYCDGLRGPFVV 126
Rmsd=3.3 Å
[Figure: structural superposition; model in red, template in blue]

Profile HMM’s
ADDGSLAFVPSEF--SISPGEKIVFKNNAGFPHNIVFDEDSIPSGVDASKISMSEEDLLN
TVNGAI--PGPLIAERLKEGQNVRVTNTLDEDTSIHWHGLLVPFGMDGVPGVSFPG---I
-TSMAPAFGVQEFYRTVKQGDEVTVTIT-----NIDQIED-VSHGFVVVNHGVSME---I
IE--KMKYLTPEVFYTIKAGETVYWVNGEVMPHNVAFKKGIV--GEDAFRGEMMTKD---
-TSVAPSFSQPSF-LTVKEGDEVTVIVTNLDE------IDDLTHGFTMGNHGVAME---V
ASAETMVFEPDFLVLEIGPGDRVRFVPTHK-SHNAATIDGMVPEGVEGFKSRINDE----
TKAVVLTFNTSVEICLVMQGTSIV----AAESHPLHLHGFNFPSNFNLVDPMERNTAGVP
TVNGQ--FPGPRLAGVAREGDQVLVKVVNHVAENITIHWHGVQLGTGWADGPAYVTQCPI
Regions annotated on the slide: Insertion, Deletion, Conserved

Profile HMM’s
All M/D pairs must be visited once

TMHMM (trans-membrane HMM)
(Sonnhammer, von Heijne, and Krogh)
Model TM length distribution. Power of HMM. Difficult in alignment.

Combination of HMM’s - Gene finding
x: Inter-genic region
xxxxxxxxATGccc: Region around start codon (start codon ATG)
ccc: Coding region
cccTAAxxxxxxxx: Region around stop codon (stop codon TAA)

HMM packages
HMMER
–S.R. Eddy, WashU St. Louis. Freely available.
SAM
–R. Hughey, K. Karplus, A. Krogh, D. Haussler and others, UC Santa Cruz. Freely available to academia, nominal license fee for commercial users.
META-MEME
–William Noble Grundy, UC San Diego. Freely available. Combines features of PSSM search and profile HMM search.
NET-ID, HMMpro
–Freely available to academia, nominal license fee for commercial users.
–Allows HMM architecture construction.

trainanhmm
Copyright (C) 1998 by Anders Krogh
Header {alphabet ACGT;}
begin {trans s1 s2;}
S1 {trans s2 End;}
S2 {trans s1 End;}
End {letter NULL;}
States: B, S1, S2, End
Emission probabilities (listed twice on the slide): A 0.25, C 0.25, G 0.25, T 0.25