Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning sequences.


1 Alignment & Secondary Structure You have learned about: Data & databases Tools Amino Acids Protein Structure Today we will discuss: Aligning sequences After this: You know how to perform structural alignments You are ready to apply this knowledge in your bioinformatics research project!

2 ©CMBI 2011 Why align sequences? The problem: There are lots of sequences with unknown structure and/or function. There are a few sequences with known structure and/or function. Alignment can help: If one of them has known structure/function, then alignment gives us insight into structural and/or functional aspects of the aligned sequence(s). Transfer of information!

3 ©CMBI 2011 Sequence Alignment (1) A sequence alignment is a representation of a whole series of evolutionary events, which left traces in the sequences. The purpose of a sequence alignment is to line up all residues in the sequence that were derived from the same residue position in the ancestral gene or protein.

4 ©CMBI 2009 Sequence Alignment (2) gap = insertion or deletion (indel) A B B A

5 ©CMBI 2011 Structural alignment To carry over structural information, we need a structural alignment. The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the equivalent position in the 3D structures of the corresponding proteins!!

6 ©CMBI 2009 Examples 1) The 3 active site residues H, D, S of the serine protease we saw earlier 2) Cysteine bridges (disulfide bridges):
STCTKGALKLPVCRK
TSCTEG--RLPGCKR
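The cysteine example can be checked mechanically. The sketch below lists the alignment columns in which both rows carry a cysteine; the function name `shared_columns` is made up for illustration and is not part of the course material.

```python
def shared_columns(row_a: str, row_b: str, residue: str = "C") -> list[int]:
    """Return the 1-based alignment columns where both rows hold `residue`."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return [
        column
        for column, (a, b) in enumerate(zip(row_a, row_b), start=1)
        if a == residue and b == residue
    ]


# The two aligned rows from the slide.
row_1 = "STCTKGALKLPVCRK"
row_2 = "TSCTEG--RLPGCKR"

print(shared_columns(row_1, row_2))  # columns where the cysteines line up
```

With the gap placed as on the slide, both cysteines of each sequence end up in the same columns, which is what one would expect if they form equivalent disulfide bridges.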

7 ©CMBI 2009 Transfer of information Such information can be: Phosphorylation sites Glycosylation sites Stabilizing mutations Membrane anchors Ion binding sites Ligand binding residues Cellular localization Typically what one finds in the feature (FT) records of Swissprot!

8 ©CMBI 2011 Significance of alignment One can only transfer information if the similarity between the two sequences is sufficiently high. The “threshold curve” for transferring structural information from one known protein structure to another protein sequence: If the sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information. Structure is much more conserved than sequence!
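The rule of thumb on this slide can be written down as a small check. This is a minimal sketch that assumes a flat cutoff; the real threshold curve also covers shorter sequences, for which the slide gives no numbers, so the function simply answers False there. The name `can_transfer_structure` is hypothetical.

```python
def can_transfer_structure(aligned_length: int, percent_identity: float) -> bool:
    """Apply the slide's rule: > 80 aa and > 25% identity.

    Assumption: anything at or below 80 residues is rejected, because the
    slide does not state the (higher) identity needed for short alignments.
    """
    return aligned_length > 80 and percent_identity > 25.0


print(can_transfer_structure(150, 31.0))
print(can_transfer_structure(60, 31.0))
```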

9 ©CMBI 2009 Significance of alignment (2)

10 ©CMBI 2009 Aligning sequences by hand Examples: which is the better alignment (left or right)?
1) CPISRTWASIFRCW    CPISRTWASIFRCW
   CPISRT---LFRCW    CPISRTL---FRCW
2) CPISRTRASEFRCW    CPISRTRASEFRCW
   CPISRTK---FRCW    CPISRT---KFRCW
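One way to reason about these two examples is to count, per alignment, the identical columns and the columns that pair chemically similar residues. The sketch below does that with a small, hand-picked set of similarity groups; these groups are an assumption made for illustration, not a substitution matrix taken from the slides.

```python
# Hand-picked similarity groups (an assumption, not from the slides).
SIMILAR_GROUPS = [set("ILVM"), set("KR"), set("DE"), set("ST"), set("FYW")]


def column_counts(top: str, bottom: str) -> tuple[int, int]:
    """Return (identical, similar) column counts, skipping gap columns."""
    identical = similar = 0
    for a, b in zip(top, bottom):
        if "-" in (a, b):
            continue  # a gap column is neither identical nor similar
        if a == b:
            identical += 1
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    return identical, similar


# Example 1: does the L pair with I (left) or with W (right)?
print(column_counts("CPISRTWASIFRCW", "CPISRT---LFRCW"))
print(column_counts("CPISRTWASIFRCW", "CPISRTL---FRCW"))

# Example 2: does the K pair with R (left) or with E (right)?
print(column_counts("CPISRTRASEFRCW", "CPISRTK---FRCW"))
print(column_counts("CPISRTRASEFRCW", "CPISRT---KFRCW"))
```

Under this grouping all four alignments have the same number of identical columns, and only the placement that pairs L with I, or K with R, gains a similar column.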

11 ©CMBI 2011 Aligning sequences by hand (2) Procedure of aligning depends on information available: 1) In most cases you will start with an alignment program (e.g. CLUSTAL). 2) Then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps. 3) Also explicitly use the secondary structure preferences of the amino acids, especially for N-termini of helices and beta-turns. 4) Use 3D information if one or more of the structures in the alignment are known.

12 ©CMBI 2009 Helix

13 ©CMBI 2009 Positional preferences in helices (1)
Dataset of good helices from PDB files. Count all Asp residues in & before helices. Identify preferential positions for Asp residues.
Position   -4   -3   -2   -1    1    2    3    4    5   total
State       -    -    -    -    H    H    H    H    H
ASP        98  110  121  260   98  197  167   49   86    1186
Position 1 in helix

14 ©CMBI 2009 Aligning 2 sequences when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G

15 ©CMBI 2009 Positional preferences in helices (2)
Fill this table for all 20 amino acids. Use this information when aligning helices that have a low percentage of sequence identity.
Position   -4   -3   -2   -1    1    2    3    4    5   total
State       -    -    -    -    H    H    H    H    H
ALA       143  148   99   58  189  205  187  241  268    1538
CYS        24   31   29   22   14   17   18   33   17     205
ASP        98  110  121  260   98  197  167   49   86    1186
GLU        91  100   71   71  152  287  269   70  147    1258
(…)
TRP        29   25   29   14   30   26   28   30   29     240
TYR        66   65   75   33   58   44   56   72   48     517
Position 1 in helix
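The counts in this table can be turned into per-position fractions to see where each residue type is found most often. The sketch below uses only the rows shown on the slide; the helper names are made up, and dividing by the row total is a simple assumption (it does not correct for how often each position occurs overall).

```python
POSITIONS = [-4, -3, -2, -1, 1, 2, 3, 4, 5]

# Counts copied from the slide; rows elided there with "(…)" are not filled in.
COUNTS = {
    "ALA": [143, 148, 99, 58, 189, 205, 187, 241, 268],
    "CYS": [24, 31, 29, 22, 14, 17, 18, 33, 17],
    "ASP": [98, 110, 121, 260, 98, 197, 167, 49, 86],
    "GLU": [91, 100, 71, 71, 152, 287, 269, 70, 147],
    "TRP": [29, 25, 29, 14, 30, 26, 28, 30, 29],
    "TYR": [66, 65, 75, 33, 58, 44, 56, 72, 48],
}


def position_fractions(residue: str) -> dict[int, float]:
    """Fraction of the residue's observations found at each position."""
    counts = COUNTS[residue]
    total = sum(counts)
    return {pos: count / total for pos, count in zip(POSITIONS, counts)}


def preferred_position(residue: str) -> int:
    """Position with the highest count (first one wins on a tie)."""
    counts = COUNTS[residue]
    return POSITIONS[counts.index(max(counts))]


for name in COUNTS:
    print(name, preferred_position(name))
```

For Asp this picks position -1, the residue just before the helix starts, which is the kind of preference the slides suggest using when placing helix N-termini.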

16 Protein threading The word threading implies that one drags the sequence (ACDEFG...) step by step through each location on the template ©CMBI 2009
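The "step by step through each location" idea can be sketched as a sliding window. Real threading scores how well the sequence fits the structural environment of the template; the sketch below only counts identical residues against a made-up template sequence, so both the scoring and the template `MKTACDEFGLLV` are stand-in assumptions.

```python
from typing import Iterator


def slide(query: str, template: str) -> Iterator[tuple[int, int]]:
    """Yield (offset, identities) for every ungapped placement of query on template."""
    for offset in range(len(template) - len(query) + 1):
        window = template[offset:offset + len(query)]
        yield offset, sum(q == t for q, t in zip(query, window))


query = "ACDEFG"           # the sequence named on the slide
template = "MKTACDEFGLLV"  # hypothetical template, for illustration only

best_offset, best_score = max(slide(query, template), key=lambda item: item[1])
print(best_offset, best_score)
```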

17 Aligning 2 helices when sequence identity is low
S G V S P D Q L A A L K L I L E L A L K
G T S L E T A L L M Q I A Q K L I A G
[Figure: numbers from -4 to 5 stacked under each residue of both sequences]

18 ©CMBI 2009 Aligning 2 helices when sequence identity is low
S G V S P D Q L A A L K L I L E L A L K
G T S L E T A L L M Q I A Q K L I A G
[Figure: numbers from -4 to 5 stacked under each residue of both sequences]
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G
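To see how low the sequence identity in this final alignment actually is, one can count the identical columns. The sketch below assumes that identity is measured over the columns in which neither row has a gap; other conventions for the denominator exist.

```python
def percent_identity(row_a: str, row_b: str) -> float:
    """Percent identical residues over the columns without a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# The final alignment from the slide, written without spaces.
helix_1 = "SGVSPDQLAALKLILELALK"
helix_2 = "-GTSLETALLMQIAQKLIAG"

print(round(percent_identity(helix_1, helix_2), 1))  # 15.8
```

Only three of the nineteen aligned columns are identical, so this alignment rests on the positional preferences rather than on sequence identity.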

19 ©CMBI 2009 Use of 3D structure info (1) If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside: Where does the Arg in structure 2 go? (and what will CLUSTAL choose?) A B

20 Use of 3D-structure info (2)
Sequence A:  FDICRLPGSAEAV
Sequence B1: FNVCRMP---EAI
Sequence B2: FNVCR---MPEAI
[Figure: residue letters of the sequences placed on a sketch of the structure; an arrow marks the correct alignment]

21 ©CMBI 2011 What you have learned today A good sequence alignment is necessary to carry over information between proteins. Putting amino acids below each other in a sequence alignment implies that you predict that they are at equivalent positions in both proteins. Alignments can be optimized by using: secondary structure preferences (especially for helix positioning and prediction of beta-turns); 3D structure info. If the aligned sequences are > 80 aa long, then >25% sequence identity is enough to reliably transfer structural information.

22 ©CMBI 2011 Alignment videos Swift.cmbi.ru.nl/teach/B1M => Seminars => Link to Aligning video page

23 ©CMBI 2009 You are ready to… Apply these lessons to the practical exercises. Perform your own bioinformatics research project! Take home lesson: Please remember to always use all structural information available to you to optimize a sequence alignment. This can be real 3D data, but can also be “just” your own knowledge about the properties and preferences of the amino acids.

