Hidden Markov Models Probabilistic model of a Multiple sequence alignment. No indel penalties are needed Experimentally derived information can be incorporated.


1 Hidden Markov Models Probabilistic model of a multiple sequence alignment. No indel penalties are needed. Experimentally derived information can be incorporated. Parameters are adjusted to represent observed variation. Requires at least 20 sequences.

2 The Evolution of a Sequence Over long periods of time a sequence will acquire random mutations. –These mutations may result in a new amino acid at a given position, the deletion of an amino acid, or the introduction of a new one. –Over VERY long periods of time two sequences may diverge so much that their relationship cannot be seen through the direct comparison of their sequences.

3 Hidden Markov Models Pair-wise methods rely on direct comparisons between two sequences. In order to overcome the differences in the sequences, a third sequence is introduced, which serves as an intermediate. A high-scoring hit between the first and third sequences, as well as a high-scoring hit between the second and third sequences, implies a relationship between the first and second sequences (a transitive relationship).

4 Introducing the HMM The intermediate sequence is kind of like a missing link. The intermediate sequence does not have to be a real sequence. The intermediate sequence becomes the HMM.

5 Introducing the HMM The HMM is a mix of all the sequences that went into its making. The score of a sequence against the HMM shows how well the HMM serves as an intermediate of the sequence. –How likely it is to be related to all the other sequences, which the HMM represents.

6 Match State with no Indels. Alignment: MSGL, MTNL. States: B → M1 → M2 → M3 → M4 → E. Each arrow indicates a transition probability, in this case 1 for each step.

7 Match State with no Indels. Alignment: MSGL, MTNL. States: B → M1 → M2 → M3 → M4 → E. Each match state also has a probability for the residue at its position: M=1 at M1; S=0.5, T=0.5 at M2.

8 Alignment: MSGL, MTNL. States: B → M1 → M2 → M3 → M4 → E. M=1, S=0.5, T=0.5. Typically want to incorporate a small probability for all other amino acids.
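
The match-state model on slides 6 to 8 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the pseudocount value of 0.1 are not from the slides, and a flat pseudocount stands in for the "small probability for all other amino acids" (a Dirichlet mixture would be a more refined choice).

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def match_emissions(alignment, pseudocount=0.0):
    """Per-column residue probabilities for an ungapped alignment.

    pseudocount is a hypothetical flat count added to every amino acid.
    """
    columns = []
    for col in zip(*alignment):
        counts = Counter(col)
        total = len(col) + pseudocount * len(AMINO_ACIDS)
        columns.append({aa: (counts[aa] + pseudocount) / total
                        for aa in AMINO_ACIDS})
    return columns

def log_prob(seq, emissions):
    """Log probability of seq along B -> M1 -> ... -> E.

    All transition probabilities are 1, so only emissions contribute.
    """
    if len(seq) != len(emissions):
        raise ValueError("sequence length must equal the number of match states")
    total = 0.0
    for aa, probs in zip(seq, emissions):
        if probs[aa] == 0.0:
            return float("-inf")  # residue never seen in this column
        total += math.log(probs[aa])
    return total

alignment = ["MSGL", "MTNL"]
raw = match_emissions(alignment)                        # M=1, S=0.5, T=0.5
smoothed = match_emissions(alignment, pseudocount=0.1)  # assumed value

print(raw[0]["M"], raw[1]["S"], raw[1]["T"])
print(log_prob("MAGL", raw), log_prob("MAGL", smoothed))
```

Without pseudocounts, a sequence with an unseen residue (MAGL here) gets probability zero; with them it gets a low but finite score, which is the point of slide 8.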

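9 Permit insertion states. Alignment: MS.GL, MT.NL, MSANI. States: B → M1 → M2 → M3 → M4 → E, with insert states I0, I1, I2, I3, I4. Transition probabilities may not be 1.

10 Permit insertion states. Alignment: MS..GL, MT..NL, MSA.NI, MTARNL. States: B → M1 → M2 → M3 → M4 → E, with insert states I0, I1, I2, I3, I4.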
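
One way to decide which alignment columns become match states and which are handled by insert states is a simple occupancy rule. The sketch below assumes a column is a match column when more than half of the sequences have a residue there; that threshold and the function name are assumptions for illustration, not something the slides specify.

```python
GAP_CHARS = ".-"

def assign_columns(alignment, min_residue_fraction=0.5):
    """Label each alignment column as a match state (Mk) or insert state (Ik).

    A column is a match column when the fraction of sequences with a residue
    exceeds min_residue_fraction (an assumed heuristic). Insert columns are
    labeled with the index of the preceding match state, so columns before
    the first match state map to I0.
    """
    labels = []
    match_index = 0
    for col in zip(*alignment):
        residues = sum(1 for c in col if c not in GAP_CHARS)
        if residues / len(col) > min_residue_fraction:
            match_index += 1
            labels.append(f"M{match_index}")
        else:
            labels.append(f"I{match_index}")
    return labels

alignment = ["MS..GL", "MT..NL", "MSA.NI", "MTARNL"]
print(assign_columns(alignment))
# ['M1', 'M2', 'I2', 'I2', 'M3', 'M4']
```

With the slide 10 alignment, the two sparsely occupied columns are assigned to insert state I2, between M2 and M3.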

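11 Alignment: MS..GL--, MT..NLAG, MSA.NIAG, MTARNLAG. Delete states permit incorporation of the last two sites of Seq1. [Figure: profile HMM with begin state B, match states M1-M7, insert states I0-I7, delete states D1-D7, and end state E. Residue labels on the match states: M; S, T; A; G, N; I, L; A; G.]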

12 The bottom line of states are the main states (M). These model the columns of the alignment. The second row of diamond-shaped states are called the insert states (I). These are used to model the highly variable regions in the alignment. The top row of circles are delete states (D). These are silent or null states because they do not match any residues; they simply allow the skipping over of main states. [Figure: states B, M1-M6, E; insert states I0-I6; delete states D1-D6.]

13 Dirichlet Mixtures Additional information to expand potential amino acids in individual sites. Observed frequency of amino acids seen in certain chemical environments –aromatic –acidic –basic –neutral –polar

14 Structures: α helix, β sheet, coils, turns. Structures are used to build domains: the Legos of evolution.

15

16 Rotation around the peptide bond

17

18 Ramachandran plot for glycine. Areas not permitted for other amino acids. Axes: phi angles, psi angles.

19 Introduction to Protein Structure, Branden and Tooze, Garland Publishing Co., 1991, p. 13

20 From: http://bioweb.ncsa.uiuc.edu/~bioph254/Class-slides/Lect12/figure13.html

21

22 From: http://bioweb.ncsa.uiuc.edu/~bioph254/Class-slides/Lect12/figure14.html Longitudinal and Transverse image of alpha helix

23 Turn connecting two helices. Introduction to Protein Structure, Branden and Tooze, Garland Publishing Co., 1991, p. 17

24 Hemoglobin - ribbon representation

25 Proline Because of its structure, proline is typically excluded from α helices except in the first three positions at the amino end.

26 β Structure. β strand - single run of amino acids in β conformation. β sheet - multiple β strands which are hydrogen bonded to yield a sheet-like structure. β bulge - disruption of normal hydrogen bonding in a β sheet by amino acid(s) that will not fit into the sheet, for example proline.

27 β sheets - Parallel. Introduction to Protein Structure, Branden and Tooze, Garland Publishing Co., 1991, p. 17.

28 β sheet - longitudinal and transverse view. Side chains stick “out”. From: http://bioweb.ncsa.uiuc.edu/~bioph254/Class-slides/Lect12/figure22.html

29 Superoxide dismutase - β sheet

30

31 Six classes of structure. Class α - bundled α helices connected by loops. Class β - sandwich or barrel comprised entirely of β sheets, typically anti-parallel. Class α/β - mainly parallel β sheets with intervening α helices. Class α+β - segregated α helices and anti-parallel β sheets. Multi-domain. Membrane proteins.

32 CD8 - all β

33 Thioredoxin - α/β

34 Endonuclease - Class α+β

35 Rhodopsin - 7TM protein

36 Common Hairpin Loop between two β Strands. Introduction to Protein Structure, Branden and Tooze, Garland Publishing Co., 1991, p. 17

37 Turn - short, regular loop. –Difference in frequency of amino acids at positions 1-4 of the turn. Coils (not coiled coil) –Random turns or irregular structure.

38 Disulfide bridges Crosslink of two cysteine residues. Distance between the sulfur atoms is approximately 2 Angstroms (the S-S bond length is about 2.05 Angstroms).

39 Coiled coil - two α helices bundled side by side. From: http://catt.poly.edu/~jps/coilcoil.html

40 Heptad positions a and d are internal; the remaining amino acids are solvent exposed. From: http://catt.poly.edu/~jps/coilcoil.html

41 Coiled Coil Two or more adjacent α helices

42 Prediction of potential Coiled coil domain in Groucho

43 MMFPQSRHSGSSHLPQQLKFTTSDSCDRIKDEFQLLQAQYHSL KLECDKLASEKSEMQRHYVMYYEMSYGLNIEMHKQAEIVKR LNGICAQVLPYLSQEHQQQVLGAIERAKQVTAPELNSIIRQQL QAHQLSQLQALALPLTPLPVGLQPPSLPAVSAGTGLLSLSALG SQTHLSKEDKNGHDGDTHQEDDGEKSD Potential Residues involved in Coiled Coil

44 Triple helix coiled coil - built from α helices

45 Backbone of triple coiled coil

46 E. coli Nucleotide exchange factor

47 Domains. Single domain proteins - Epidermal growth factor; Serine Proteases - Trypsin. Multi domain proteins - Factor IX: one Ca2+ binding, two EGF, one protease domain. Permit building of novel functions by swapping of domains.

48 Factor IX Domain Structure. Ca - Calcium binding domain. EGF - Epidermal growth factor domain. CT - Chymotrypsin domain.

49 Chou - Fasman Prediction of Secondary Structure Based upon analysis of known structures (1974). Frequency of occurrence of each amino acid in: –α helix –β strand –turn

50 Chou - Fasman Prediction List is then analyzed for stretches of amino acids that have a common tendency to form a given secondary structure. Extend until a region of high probability for a turn, or a region with a low probability of both α and β, is encountered. Window is typically <10
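
The window scan described on slide 50 can be sketched as follows. This is a simplified illustration, not the published Chou - Fasman procedure: it only averages a propensity over a short window and merges the windows that pass a threshold. The propensity values, window size of 6, threshold of 1.05, and all names are placeholders chosen for the example, not the published table.

```python
# Placeholder helix propensities (illustrative values only, NOT the published
# Chou - Fasman table). Residues missing from the table default to 1.0.
HELIX_PLACEHOLDER = {"A": 1.4, "E": 1.5, "L": 1.2, "G": 0.6, "P": 0.6}

def window_means(seq, propensity, window=6, default=1.0):
    """Mean propensity of every full-length window along the sequence."""
    values = [propensity.get(aa, default) for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_regions(seq, propensity, window=6, threshold=1.05):
    """Merge overlapping or adjacent windows whose mean reaches the threshold.

    Returns (start, end) pairs, 0-based with end exclusive.
    """
    regions = []
    for start, mean in enumerate(window_means(seq, propensity, window)):
        if mean < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], end)  # extend the current region
        else:
            regions.append((start, end))
    return regions

seq = "GPGPAELAELAEGPGP"  # hypothetical test sequence
print(candidate_regions(seq, HELIX_PLACEHOLDER))
```

In this sketch a region stops growing as soon as the window mean drops below the threshold, which loosely mirrors "extend until a region with a low probability is encountered". A real implementation would use the full propensity tables for helix, strand, and turn.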

51 GOR prediction Similar to Chou - Fasman –More recent (1988) tabulation of amino acid preferences. –Uses a larger window: 17

52

53

54 More Recent Prediction Programs Make use of library of 3d structures to predict structure. Most use a Neural Net approach for prediction. Examples –Nnpredict –PredictProtein

55 Neural Net Programs “trained” on structures. Window - within the window each position is predicted based upon knowledge. Rules also applied (alpha helix 4 AA long). [Figure: network with input layer (sequence window), hidden layer, and output layer (α, β, coil).]
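
The input layer of such a network is usually fed a numeric encoding of the sequence window. The sketch below shows one common way to build it, one-hot encoding with an extra padding slot for window positions that fall off the ends of the sequence. The window size of 13 and the function name are assumptions for illustration; the slides do not state them.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def encode_windows(seq, window=13):
    """One-hot encode a sliding window centered on each residue.

    Each window position uses 21 slots: 20 amino acids plus one padding slot.
    Raises ValueError for residues outside AMINO_ACIDS.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it has a central residue")
    half = window // 2
    width = len(AMINO_ACIDS) + 1
    pad_index = len(AMINO_ACIDS)
    encoded = []
    for center in range(len(seq)):
        vector = [0] * (window * width)
        for offset in range(-half, half + 1):
            pos = center + offset
            if 0 <= pos < len(seq):
                idx = AMINO_ACIDS.index(seq[pos])
            else:
                idx = pad_index  # window extends past the sequence end
            vector[(offset + half) * width + idx] = 1
        encoded.append(vector)
    return encoded

vectors = encode_windows("MSGL", window=3)
print(len(vectors), len(vectors[0]))
```

Each vector would be one training or prediction example, with the network's three outputs (α, β, coil) referring to the central residue of the window.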

56 PredictProtein Uses an alignment approach. Submitted sequence is compared to database and alignment is generated Profile is generated for further database searching. Alignment is then used for prediction of secondary structure. Confidence predicted - based upon number of residues of given type at a given position in the alignment

57 Kyte and Doolittle Hydropathy Average of hydropathy index for each residue. Example of Hydropathy index: F +2.8 R -4.5
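
A hydropathy profile is a sliding-window average of the per-residue index. The sketch below uses the Kyte and Doolittle scale; the default window of 19 is an assumption chosen to match the transmembrane length quoted on slide 58, and the function name is illustrative.

```python
# Kyte and Doolittle hydropathy index per residue.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(seq, window=19):
    """Average hydropathy of every full-length window along the sequence.

    Returns an empty list when the sequence is shorter than the window.
    """
    values = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Two-residue example using the values quoted on the slide: (2.8 - 4.5) / 2
print(hydropathy_profile("FR", window=2))
```

Stretches where the windowed average stays strongly positive are candidates for buried or membrane-spanning segments; the cutoff to use is a separate choice not made here.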

58 Transmembrane Domain Characteristics make them easier to predict: –α helix structure –Hydrophobic amino acids –19 or more amino acids long –charged residue will typically have an opposing charge for neutralization. Difficulty in predicting ends of transmembrane domains.

59 Caveat Local secondary structure can be influenced by tertiary structure. Identical string of residues can be an α helix in one protein but a β strand in another protein.

60 3D structural prediction

61 >gi|14769656|ref|XP_010270.4| coagulation factor IX [Homo sapiens] MQRVNMIMAESPGLITICLLGYLLSAECTVFLDHENANKILNRPKRYNSGKLEEFVQGNLERECMEEKCSFEEAREVFEN TERTTEFWKQYVDGDQCESNPCLNGGSCKDDINSYECWCPFGFEGKNCELDVTCNIKNGRCEQFCKNSADNKVVCSCTEG YRLAENQKSCEPAVPFPCGRVSVSQTSKLTRAETVFPDVDYVNSTEAETILDNITQSTQSFNDFTRVVGGEDAKPGQFPW QVVLNGKVDAFCGGSIVNEKWIVTAAHCVETGVKITVVAGEHNIEETEHTEQKRNVIRIIPHHNYNAAINKYNHDIALLE LDEPLVLNSYVTPICIADKEYTNIFLKFGSGYVSGWGRVFHKGRSALVLQYLRVPLVDRATCLRSTKFTIYNNMFCAGFH EGGRDSCQGDSGGPHVTEVEGTSFLTGIISWGEECAMKGKYGIYTKVSRYVNWIKEKTKLT Pfam Protein Information Resource KFHU

62 Tertiary Structure Still challenging Focus upon core structure for prediction Hydrophobic interactions that stabilize structure.

63 Approach Determine “fit” of a query sequence to library of known structures. –Threading - examine compatibility of amino acid side groups with known structures –Two approaches: environmental template; contact potential

64 Environmental Template Each amino acid in known core evaluated for: –secondary structure –area of side chain buried –types of nearby AA side chains

65 Arginine (basic AA) and isoleucine have different propensities to be in a hydrophobic environment. A charge might be accommodated by an opposite charge.

66 Environmental Query sequence is submitted to previously analyzed database of structures –How well does your sequence fit these protein cores?

67 Contact Potential Number and closeness between each AA pair determined. Query sequence examined to determine if potential AA interactions match those of known cores.

68 Structural Profile Structural position specific scoring matrix. Identify which amino acids fit into a specific position in the core of each known structure –each position is assigned to one of the 18 classes of structural environment –scores reflect suitability of AA for that position –log odds matrix. Use profile to examine query sequence
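
A log odds score of this kind is commonly written as below. The notation is an assumption introduced here, not taken from the slides: a is an amino acid, e is one of the 18 environment classes, P(a | e) is the frequency of a in environment e among known structures, P(a) is the overall background frequency of a, and q is a query sequence of length L threaded without gaps onto positions with environments e_1 ... e_L.

```latex
s(a, e) = \log \frac{P(a \mid e)}{P(a)},
\qquad
S(q) = \sum_{i=1}^{L} s(q_i, e_i)
```

A positive s(a, e) means the amino acid is found in that environment more often than expected by chance; the total S(q) is the kind of raw score that is then converted to an E value or Z score.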

69 Z score Many return an E value or a Z score. The Z score is the number of standard deviations from the mean score for all sequences. The higher the Z score, the more significant the model; a typical good score is >5.
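
The Z score calculation itself is short. In this sketch the background scores are made-up numbers for illustration, and the use of the population standard deviation is an assumption; individual programs may define the background set and the deviation differently.

```python
import statistics

def z_score(score, background_scores):
    """Number of standard deviations by which score exceeds the background mean.

    Uses the population standard deviation (an assumed convention).
    """
    mean = statistics.mean(background_scores)
    sd = statistics.pstdev(background_scores)
    if sd == 0:
        raise ValueError("background scores have zero variance")
    return (score - mean) / sd

background = [10, 12, 14, 16, 18]  # hypothetical scores for all sequences
z = z_score(30, background)
print(round(z, 2), z > 5)
```

Here the background mean is 14 and the query score of 30 lies more than 5 standard deviations above it, so it would pass the ">5" rule of thumb from the slide.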

