Aligning Sequences
You have learned about:
- Data & databases
- Tools
- Amino acids
- Protein structure
Today we will discuss:
- Aligning sequences
After this, you are ready to carry out a bioinformatics research project!
©CMBI 2009

Why align sequences?
The problem:
- There are lots of sequences with unknown structure and/or function
- There are a few sequences with known structure and/or function
Alignment can help:
- If sequences align well, they are likely to be similar
- If they are similar, then they very likely share structural and/or functional aspects
- If one of them has known structure/function, then alignment gives us insight into structural and/or functional aspects of the aligned sequence(s)
TRANSFER OF INFORMATION!

Sequence Alignment (1)
A sequence alignment is a representation of a whole series of evolutionary events, which left traces in the sequences. Things that are more likely to happen during evolution should be most prominently observed in your alignment. The purpose of a sequence alignment is to line up all residues in the sequences that were derived from the same residue position in the ancestral gene or protein.

Sequence Alignment (2)
[Figure: alignment of sequences A and B; gap = insertion or deletion]

Structural alignment
To carry over information from a well-studied protein sequence and its structure to a newly discovered protein sequence, we need a sequence alignment that represents the protein structures today: a structural alignment. The implicit meaning of placing amino acid residues below each other in the same column of a protein (multiple) sequence alignment is that they are at the equivalent position in the 3D structures of the corresponding proteins!

Examples
1) The 3 active site residues H, D, S of the serine protease we saw earlier
2) Cysteine bridges (disulfide bridges):
   STCTKGALKLPVCRK
   TSCTEG--RLPGCKR

Transfer of information
Such information can be:
- Phosphorylation sites
- Glycosylation sites
- Stabilizing mutations
- Membrane anchors
- Ion binding sites
- Ligand binding residues
- Cellular localization
Typically what one finds in the feature (FT) records of Swiss-Prot!

Significance of alignment
One can only transfer information if the similarity between the two sequences is significantly high. Schneider (group of Sander) determined the “threshold curve” for transferring structural information from one known protein structure to another protein sequence:
- If the sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.
- If the sequences are shorter, a higher percentage of identity is needed.
Structure is much more conserved than sequence!
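The rule of thumb above can be sketched in a few lines of Python. This is a minimal illustration, not the actual threshold curve: the function names are hypothetical, percent identity is counted over columns where neither sequence has a gap (one of several conventions in use), and because the slide gives no number for alignments of 80 residues or fewer, the check returns None for those.

```python
from typing import Optional

GAP = "-"

def aligned_pairs(a: str, b: str):
    """Return the columns in which both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [(x, y) for x, y in zip(a, b) if x != GAP and y != GAP]

def percent_identity(a: str, b: str) -> float:
    """Identical residues as a percentage of the gap-free columns (assumed convention)."""
    pairs = aligned_pairs(a, b)
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def transfer_is_reliable(a: str, b: str) -> Optional[bool]:
    """Apply the '> 80 aa and > 25% identity' rule from the slide.

    Returns None for alignments of 80 residues or fewer, because the slide
    only says that a higher (unspecified) identity is needed there.
    """
    if len(aligned_pairs(a, b)) <= 80:
        return None
    return percent_identity(a, b) > 25.0

# The short cysteine-bridge example from the earlier slide: 6 of 13 pairs identical.
print(round(percent_identity("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"), 1))
print(transfer_is_reliable("STCTKGALKLPVCRK", "TSCTEG--RLPGCKR"))
```

For real work the full length-dependent curve should be used rather than this single cut-off.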

Significance of alignment (2)

Aligning sequences by hand
Most information that enters the alignment procedure comes from the physico-chemical properties of the amino acids.
Examples: which is the better alignment (left or right)?
1)  CPISRTWASIFRCW    CPISRTWASIFRCW
    CPISRT---LFRCW    CPISRTL---FRCW
2)  CPISRTRASEFRCW    CPISRTRASEFRCW
    CPISRTK---FRCW    CPISRT---KFRCW
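One way to make the physico-chemical argument concrete is a toy score that rewards identical residues and residues from the same property group. The grouping and the score values below are assumptions made for this sketch, not part of the slides; alignment programs normally use a substitution matrix instead.

```python
# Assumed, simplified property groups covering the 20 standard amino acids.
GROUPS = {
    "aliphatic": "AVLIM",
    "aromatic": "FWY",
    "positive": "KRH",
    "negative": "DE",
    "polar": "STNQ",
    "special": "CGP",
}
GROUP_OF = {aa: name for name, members in GROUPS.items() for aa in members}

def column_score(x: str, y: str) -> int:
    """Toy score: 2 for identity, 1 for same group, -1 otherwise, 0 for a gap."""
    if x == "-" or y == "-":
        return 0
    if x == y:
        return 2
    return 1 if GROUP_OF[x] == GROUP_OF[y] else -1

def alignment_score(top: str, bottom: str) -> int:
    """Sum the column scores of two aligned sequences of equal length."""
    return sum(column_score(x, y) for x, y in zip(top, bottom))

examples = {
    "1 left": ("CPISRTWASIFRCW", "CPISRT---LFRCW"),
    "1 right": ("CPISRTWASIFRCW", "CPISRTL---FRCW"),
    "2 left": ("CPISRTRASEFRCW", "CPISRTK---FRCW"),
    "2 right": ("CPISRTRASEFRCW", "CPISRT---KFRCW"),
}
for name, (top, bottom) in examples.items():
    print(name, alignment_score(top, bottom))
```

Under this toy scoring the left alignment scores higher in both examples, because it pairs L with I (both aliphatic) and K with R (both positively charged) instead of L with W and K with E.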

Aligning sequences by hand (2)
The procedure of aligning depends on the information available:
- Use “only” the identity of the amino acids and their physico-chemical properties. This is more or less what alignment programs do.
- Also use explicitly the secondary structure preference of the amino acids. Example: aligning 2 helices when sequence identity is low.
- Use 3D information if one or more of the structures in the alignment are known.
In most cases you will start with an alignment program (e.g. CLUSTAL) and then use your knowledge of the amino acids to improve the alignment, for instance by correcting the position of gaps.

Helix

Positional preferences in helices (1)
- Dataset of good helices from PDB files
- Count all Asp residues in and before helices
- Identify preferential positions for Asp residues
[Table: ASP counts per position, with a "total" column and helix positions marked H; the first H is position 1 in the helix]

Positional preferences in helices (2)
- Fill this table for all 20 amino acids
- Use this information when aligning helices that have a low percentage of sequence identity
[Table: rows ALA, CYS, ASP, GLU, (…), TRP, TYR; columns "total" and helix positions marked H, the first H being position 1 in the helix]
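Filling such a table amounts to counting residues at fixed offsets from each helix start. The sketch below assumes each protein is given as a sequence plus a secondary-structure string of equal length in which "H" marks helix residues; the two records are invented toy data, and the function names and window sizes are hypothetical.

```python
from collections import Counter, defaultdict

def helix_starts(ss: str):
    """Indices where a run of 'H' begins."""
    return [i for i, s in enumerate(ss) if s == "H" and (i == 0 or ss[i - 1] != "H")]

def positional_counts(records, before=3, inside=5):
    """Count residues per position relative to helix starts.

    Positions -before..-1 lie before the helix; positions 1..inside are the
    first helix residues (position 1 is the first 'H').
    """
    table = defaultdict(Counter)
    for seq, ss in records:
        for start in helix_starts(ss):
            for offset in range(-before, inside):
                i = start + offset
                if i < 0 or i >= len(seq):
                    continue
                if offset >= 0 and ss[i] != "H":
                    break  # the helix is shorter than the window
                position = offset + 1 if offset >= 0 else offset
                table[seq[i]][position] += 1
    return table

# Invented toy records: (sequence, secondary structure).
records = [
    ("GSPDELLKKG", "---HHHHHH-"),
    ("ASDAELKG", "--HHHHH-"),
]
table = positional_counts(records)
print(dict(table["D"]))  # Asp counts per position in the toy data
```

With a real dataset of helices taken from PDB files, the same table can be read off for each of the 20 amino acids.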

Aligning 2 helices when sequence identity is low
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G

Aligning 2 helices when sequence identity is low (2)
Helix 1: S G V S P D Q L A A L K L I L E L A L K
Helix 2: G T S L E T A L L M Q I A Q K L I A G
Final alignment:
S G V S P D Q L A A L K L I L E L A L K
- G T S L E T A L L M Q I A Q K L I A G

Use of 3D structure info (1)
If you know that in structure 1 the Ala is pointing outside and the Ser is pointing inside: where does the Arg in structure 2 go? (And what will CLUSTAL choose?)

Use of 3D structure info (2)
A   ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1  VAL CYS ARG THR PRO GLU ALA ILE
B2  VAL CYS ARG THR PRO GLU ALA ILE

An even more real example
A   ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL
B1  VAL CYS ARG THR PRO GLU ALA ILE
B2  VAL CYS ARG THR PRO GLU ALA ILE
Aligned columns (one-letter code):
A   I C R L P G S A E A V
B1  V C R T P - - - E A I
B2  V C R - - - T P E A I

We have seen that alignments ...
- Are crucial for being able to transfer information
- Can be optimized by using secondary structure preferences (e.g. helix positioning)
- Can be optimized by using 3D structure info

Multiple sequence alignments
If we have more than two sequences aligned, the alignment is called a multiple sequence alignment (MSA).
MSAs can:
- confirm or improve pair-wise sequence alignments
- reveal structural information (e.g. cys-bridges)
- validate PROSITE search results

MSA and cysteine bridges
Multiple sequence alignments can reveal structural information:
ASCTRGCIKLPTCKKMGRCTGY
STCTKGALKLPVCRKMGKSSAY
ATSTHGCMKLPCSRRFGKCSSY
TSCTEGCLRLPGCKRFGRCTSY
TTCTKGLLKLPGCKRFGKSSAY
ASSTKGCMKLPVSRRFGRCTAY
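A simple way to look for such a signal is to find pairs of columns in which cysteines appear and disappear together across the sequences. The sketch below uses that heuristic on the alignment above; the function names and the "identical presence pattern in at least two sequences" criterion are assumptions of this example, and the output lists candidates only, not confirmed bridges.

```python
from itertools import combinations

MSA = [
    "ASCTRGCIKLPTCKKMGRCTGY",
    "STCTKGALKLPVCRKMGKSSAY",
    "ATSTHGCMKLPCSRRFGKCSSY",
    "TSCTEGCLRLPGCKRFGRCTSY",
    "TTCTKGLLKLPGCKRFGKSSAY",
    "ASSTKGCMKLPVSRRFGRCTAY",
]

def cysteine_patterns(msa):
    """Map each 1-based column containing a Cys to the set of sequences that have it."""
    patterns = {}
    for col in range(len(msa[0])):
        present = frozenset(i for i, seq in enumerate(msa) if seq[col] == "C")
        if present:
            patterns[col + 1] = present
    return patterns

def covarying_cysteine_columns(msa, min_seqs=2):
    """Column pairs whose Cys occurs in exactly the same sequences (heuristic)."""
    patterns = cysteine_patterns(msa)
    return [
        (a, b)
        for a, b in combinations(sorted(patterns), 2)
        if patterns[a] == patterns[b] and len(patterns[a]) >= min_seqs
    ]

print(covarying_cysteine_columns(MSA))  # [(3, 13), (7, 19)]
```

Columns 3 and 13 carry a cysteine in the same four sequences, and so do columns 7 and 19, which is the pattern expected when two cysteines form a bridge and are lost together.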

MSA to validate PROSITE results (1)
PROSITE glycosylation pattern: N-{P}-[ST]-{P}, where N is the glycosylation site.
PROSITE syntax: A-[BC]-X-D(2,5)-{EFG}-H
Means:
- A: A
- [BC]: B or C
- X: anything
- D(2,5): 2-5 D's
- {EFG}: not E, F or G
- H: H
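The syntax elements listed above map almost one-to-one onto regular expressions. The converter below is a minimal sketch covering only those elements (literals, [..], {..}, the wildcard written x or X, and repeat counts); the function name is hypothetical and other PROSITE features such as the < and > anchors are not handled.

```python
import re

# One pattern element: a literal, [..], {..} or x/X, optionally followed by (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Za-z])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core in ("x", "X"):
            regex = "."                        # anything
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # none of the listed residues
        else:
            regex = core                       # a literal or a [..] choice
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)

print(prosite_to_regex("N-{P}-[ST]-{P}"))           # N[^P][ST][^P]
print(prosite_to_regex("A-[BC]-X-D(2,5)-{EFG}-H"))  # A[BC].D{2,5}[^EFG]H
```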

MSA to validate PROSITE results (2)
The chance of finding N-{P}-[ST]-{P} is rather high. So how can you be sure? Look at the multiple sequence alignment:
ASLRNASTVVTIGDTITGNLTLASYHW
GSIKNGSSVITLPGTMEGNLSTTTYHY
ATLRNASTVMEINGTITGDLTLASFHW
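The check can be sketched by scanning each sequence for the pattern and keeping only the columns where every sequence matches. The regular expression is a hand translation of N-{P}-[ST]-{P}; because this alignment has no gaps, sequence positions are used directly as column numbers, which would not hold for a gapped alignment. Names are hypothetical.

```python
import re

MSA = [
    "ASLRNASTVVTIGDTITGNLTLASYHW",
    "GSIKNGSSVITLPGTMEGNLSTTTYHY",
    "ATLRNASTVMEINGTITGDLTLASFHW",
]

# Lookahead so that overlapping matches are also reported.
SITE = re.compile(r"(?=N[^P][ST][^P])")

def site_columns(seq: str) -> set:
    """1-based positions of the N in every match of the pattern."""
    return {m.start() + 1 for m in SITE.finditer(seq)}

hits = [site_columns(seq) for seq in MSA]
conserved = set.intersection(*hits)

print(hits)       # per-sequence matches: {5, 19}, {5, 19}, {5, 13}
print(conserved)  # {5}
```

Only the match at column 5 is present in all three sequences, so it is a more credible candidate site than the matches at columns 13 and 19, which are not conserved across the alignment.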

What you have learned today (and will need for your own project)
- A good sequence alignment is necessary to carry over information between proteins.
- Putting amino acids below each other in a sequence alignment implies that you predict that they are at equivalent positions in both proteins.
- If the aligned sequences are > 80 aa long, then > 25% sequence identity is enough to reliably transfer structural information.
- You need to use all structural information available to you to optimize the sequence alignment. This can be real 3D data, but can also be “just” your own knowledge about the properties and preferences of the amino acids.
