1 (PSI-)BLAST & MSA via Max-Planck

2 General Issues
Where? (to find homologues)
- Structural templates: search against the PDB
- Sequence homologues: search against SwissProt or UniProt (recommended!)
How many?
- As many as possible, as long as the MSA looks good (next week…)

3 General Issues
How long? (length of homologues)
- Fragments: short homologues (less than 50-60% of the query’s length) give a bad alignment
- Ensure your sequences contain the wanted domain(s)
- N- and C-termini tend to vary in length between homologues
How close? (distance from query sequence)
- All too close: no information
- Too many too far: bad alignment
- Ensure that you have a balanced collection!
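The fragment rule above can be sketched in a few lines of Python. This is only an illustration: the function name, the 0.6 default (the upper end of the 50-60% range on the slide) and the toy sequences are assumptions, not part of the original course material.

```python
def drop_fragments(query_len, hits, min_fraction=0.6):
    """Keep hits whose length is at least min_fraction of the query length."""
    cutoff = min_fraction * query_len
    return {name: seq for name, seq in hits.items() if len(seq) >= cutoff}


# Toy data (hypothetical names and sequences, for illustration only).
hits = {"full_homologue": "A" * 260, "fragment": "A" * 90}
kept = drop_fragments(273, hits)  # 273 aa is the query length given later in the slides
print(sorted(kept))
```

A length filter like this only catches fragments; it does not check that the wanted domain is present, which still has to be verified on the alignment itself.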

4 General Issues
From whom? (which species the sequence belongs to)
- Don’t care, all homologues are welcome
- Orthologues/paralogues may be helpful
- Sequences from distant/close species provide different types of information
Which method? (BLAST/PSI-BLAST)
- Depends on the protein, the available homologues, the goal in mind…

5 General Issues
Rules For Choosing Sequences
- Very similar sequences carry little information
- Very different sequences cause trouble: those <30% identical with more than half of the other sequences in the set
- Choose sequences as distantly related as possible
- Prefer sequences that are 30-80% identical with more than half of the sequences in the set
- The more sequences the better
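One possible way to apply the 30-80% rule to an existing alignment is sketched below. The identity definition (matches over columns where both sequences have a residue) is an assumption; other tools may normalise differently. All names and toy sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def balanced_members(msa, low=30.0, high=80.0):
    """Names of sequences that are low..high % identical to more than half of the others."""
    keep = []
    for name, seq in msa.items():
        others = [s for n, s in msa.items() if n != name]
        in_range = sum(low <= percent_identity(seq, o) <= high for o in others)
        if others and in_range > len(others) / 2:
            keep.append(name)
    return keep


# Toy alignment (hypothetical sequences, already aligned, no gaps).
msa = {
    "query": "ACDEFGHIKL",
    "hom_a": "ACDEFWWWWW",
    "hom_b": "ACDWWGHWWW",
    "far":   "WYWYWYWYWY",
}
print(balanced_members(msa))  # "far" is dropped: it is <30% identical to all others
```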

6 Overall work steps
1. Run the search:
   1. Select database
   2. E-value threshold
   3. BLAST or PSI-BLAST: how many rounds?
2. Take out sequences: HSP (slider region) or full sequences
3. Align sequences: choose alignment program
4. View alignment with BioEdit or another program
5. Calculate trees, conservation scores (ConSurf) etc…

7 (PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/sections/search
- Databases: SwissProt, TrEMBL, NR, env, PDB or any combination for proteins, but only NT for DNA
- All BLAST programs
- Main advantage: you can easily extract and filter the HSPs, in addition to full sequences

8 The Query Protein
Name: Dihydrodipicolinate reductase
Enzyme reaction: (figure on the slide)
Molecular process: Lysine biosynthesis (early stages)
Organism: E. coli
Sequence length: 273 aa

9 The Query Protein
Query: DAPB_ECOLI
>DAPB_ECOLI
MHDANIRVAIAGAGGRMGRQLIQAALALEGVQLGAALEREGSSLLGSDAGELAGAGKTGVTVQSSLDAV
KDDFDVFIDFTRPEGTLNHLAFCRQHGKGMVIGTTGFDEAGKQAIRDAAADIAIVFAANFSVGVNVMLKLL
EKAAKVMGDYTDIEIIEAHHRHKVDAPSGTALAMGEAIAHALDKDLKDCAVYSREGHTGERVPGTIGFATV
RAGDIVGEHTAMFADIGERLEITHKASSRMTFANGAVRSALWLSGKESGLFDMRDVLDLNNL
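The query is given in FASTA format: a header line starting with ">" followed by one or more sequence lines. A minimal parser, useful for checking a sequence's length before submitting it, might look like the sketch below. The function name and the toy record are hypothetical; established libraries such as Biopython offer more robust parsing.

```python
def parse_fasta(text):
    """Return {identifier: sequence} from FASTA text, joining wrapped sequence lines."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # The identifier is the first word after ">".
            fields = line[1:].split()
            name = fields[0] if fields else "unnamed"
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


# Toy record (hypothetical), wrapped over two lines like the query above.
toy = ">toy_seq demo record\nACDE\nFGH\n"
for identifier, seq in parse_fasta(toy).items():
    print(identifier, len(seq), "aa")
```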

10 (PSI-)BLAST via Max-Planck
http://toolkit.tuebingen.mpg.de/psi_blast/
- Choose a database or several databases (select more than one using CTRL)
- Upload a sequence or an MSA

11 (PSI-)BLAST via Max-Planck


15 (PSI-)BLAST via Max-Planck
The E-value threshold can be assessed using the distribution of E-values among the hits
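If the hit E-values are exported, their distribution can also be inspected offline. The sketch below bins E-values by order of magnitude; a visible gap between bins often suggests a natural threshold. The function name, the -200 floor used for E-values reported as 0, and the toy values are all assumptions.

```python
import math
from collections import Counter


def evalue_histogram(evalues, floor_exp=-200):
    """Count hits per order of magnitude of E-value (floor of log10)."""
    counts = Counter()
    for e in evalues:
        # E-values reported as 0 cannot be log-transformed; put them in the lowest bin.
        exp = floor_exp if e <= 0 else max(floor_exp, math.floor(math.log10(e)))
        counts[exp] += 1
    return counts


# Toy E-values (hypothetical): two strong hits, one intermediate, two weak.
counts = evalue_histogram([3e-50, 5e-50, 2e-5, 0.5, 4.0])
for exp in sorted(counts):
    print(f"1e{exp}: {'#' * counts[exp]}")
```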

16 Forward results to MSA http://toolkit.tuebingen.mpg.de/sections/alignment

17 Forward results to MSA
- All marked hits, or filter by E-value
- HSP (slider region) or full sequences
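The difference between forwarding an HSP and forwarding the full sequence can be illustrated as follows. The sketch assumes 1-based, inclusive start and end coordinates on the hit sequence, as BLAST reports them; the function name and toy sequence are hypothetical.

```python
def hsp_region(full_sequence, start, end):
    """Return the HSP sub-sequence given 1-based, inclusive coordinates on the hit."""
    if not (1 <= start <= end <= len(full_sequence)):
        raise ValueError("HSP coordinates fall outside the sequence")
    return full_sequence[start - 1:end]


# Toy hit (hypothetical): only residues 3-6 aligned to the query.
full = "MKTAYIAKQR"
print(hsp_region(full, 3, 6))  # the aligned region only
print(full)                    # the full sequence, including unaligned flanks
```

Forwarding HSPs keeps only the region that matched the query, which avoids unrelated flanking domains but may cut off parts of the wanted domain at the edges.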

18 Forward results to MSA

19 Align via Max-Planck
Alignment results: save the alignment

20 Alignment viewing & editing
BioEdit http://www.mbio.ncsu.edu/BioEdit/BioEdit.html
- Easy-to-use sequence alignment editor
- View and manipulate alignments of up to 20,000 sequences
- Four modes of manual alignment: select and slide, dynamic grab and drag, gap insert and delete by mouse click, and on-screen typing which behaves like a text editor
- Reads and writes GenBank, FASTA, PHYLIP 3.2, PHYLIP 4, and NBRF/PIR formats
- Also reads GCG and Clustal formats

21 Alignment viewing & editing
Easiest using BioEdit http://www.mbio.ncsu.edu/BioEdit/bioedit.html

22 Alignment viewing & editing
Easiest using BioEdit http://www.mbio.ncsu.edu/BioEdit/bioedit.html
- Find a specific sequence: “Edit-> search -> in titles”
- Erase/add sequences: “Edit-> cut/paste/delete sequence”
- “Sequence Identity matrix” under “Alignment”: useful for a rough evaluation of distances within the alignment
- After taking out sequences, “Minimize Alignment” under “Alignment” removes unnecessary gaps
- Save an image using “File -> Graphic View” and then “Edit -> Copy page as BITMAP”
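What "Minimize Alignment" does after sequences are removed can be approximated outside BioEdit as well. The sketch below drops every column that contains only gaps; it is an assumption about the behaviour, not BioEdit's actual code, and the names and toy alignment are hypothetical.

```python
def minimize_alignment(msa):
    """Drop columns that contain only gaps ('-') in every remaining sequence."""
    names = list(msa)
    if len({len(msa[n]) for n in names}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    columns = zip(*(msa[n] for n in names))
    kept = [col for col in columns if any(ch != "-" for ch in col)]
    # Transpose the kept columns back into rows.
    rows = zip(*kept) if kept else [()] * len(names)
    return {n: "".join(row) for n, row in zip(names, rows)}


# Toy alignment (hypothetical): columns 3 and 4 became gap-only after a sequence was removed.
msa = {"seq_a": "AC--D", "seq_b": "A---E"}
print(minimize_alignment(msa))
```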

23 A little of ConSurf
- Computes conservation scores
- Give it an MSA, or it will compute one for you (given a FASTA sequence, it runs BLAST & MSA)
- Main advantage: filters short HSPs, removes redundant sequences
- Shows conservation scores on the sequence or on a protein structure (if available)
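Redundancy removal can be illustrated with a simple greedy filter: a sequence is kept only if it is less than a chosen percent identity to every sequence already kept. This is a sketch of the general idea, not ConSurf's own procedure; the 95% default, the identity definition and all names are assumptions.

```python
def percent_identity(a, b):
    """Percent identity over columns where both aligned sequences have a residue."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def remove_redundant(msa, max_identity=95.0):
    """Greedily keep sequences that are < max_identity % identical to all kept ones."""
    kept = {}
    for name, seq in msa.items():
        if all(percent_identity(seq, k) < max_identity for k in kept.values()):
            kept[name] = seq
    return kept


# Toy alignment (hypothetical): s2 duplicates s1 and is therefore dropped.
msa = {"s1": "ACDEFGHIKL", "s2": "ACDEFGHIKL", "s3": "ACDEFWWWWW"}
print(list(remove_redundant(msa)))
```

Because the filter is greedy, the result depends on input order; putting the query first guarantees that it is retained.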

24 ConSurf http://consurf.tau.ac.il/

25 ConSurf

26 http://consurf.tau.ac.il/results/1321532763/output.php

27 ConSurf http://consurf.tau.ac.il/results/1321532763/output.php

28 ConSurf
- MSA colored by conservation
- PSI-BLAST result
- MSA
- Phylogenetic tree
- Sequences used
- Sequence conservation

29 ConSurf

30 Jmol- Easy web-based viewer

31 WebLogo http://weblogo.berkeley.edu/logo.cgi

32 WebLogo http://weblogo.berkeley.edu/logo.cgi
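The letter-stack heights in a sequence logo reflect the information content of each alignment column, in bits. The sketch below computes log2(20) minus the Shannon entropy per column for a protein alignment. It ignores gaps and omits the small-sample correction that logo tools normally apply, so the numbers are an approximation; the function name and toy rows are hypothetical.

```python
import math
from collections import Counter


def column_information(msa_rows, alphabet_size=20):
    """Information content (bits) per alignment column, gaps ignored."""
    info = []
    for col in zip(*msa_rows):
        residues = [c for c in col if c != "-"]
        if not residues:
            info.append(0.0)  # gap-only column carries no information
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        info.append(math.log2(alphabet_size) - entropy)
    return info


# Toy alignment (hypothetical): column 1 fully conserved, column 2 split evenly.
rows = ["AC", "AD", "AC", "AD"]
print([round(bits, 2) for bits in column_information(rows)])
```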

33 Each sequence is a different story, so adjust parameters:
- BLAST: E-value, substitution matrix, gap penalties, database, minimum length, redundancy level, fragment overlap…
- PSI-BLAST: BLAST parameters + PSSM inclusion threshold (or choose manually), number of rounds…
- Try using HSP or full sequences, different MSA programs…
No “Miracle solution”
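For readers who run the search locally rather than through the Max-Planck toolkit, the same parameters map onto command-line options of NCBI BLAST+ `psiblast`. The sketch below only builds the argument list and does not run anything. It assumes a local BLAST+ installation and a formatted database; the file names, database name and default values are hypothetical.

```python
def psiblast_command(query, db, rounds=3, evalue=1e-3, inclusion=1e-3,
                     matrix="BLOSUM62", out="hits.tsv"):
    """Build an argument list for NCBI BLAST+ psiblast (nothing is executed here)."""
    return [
        "psiblast",
        "-query", query,                       # FASTA file with the query sequence
        "-db", db,                             # name of a locally formatted database
        "-num_iterations", str(rounds),        # number of PSI-BLAST rounds
        "-evalue", str(evalue),                # E-value threshold for reported hits
        "-inclusion_ethresh", str(inclusion),  # E-value threshold for PSSM inclusion
        "-matrix", matrix,                     # substitution matrix
        "-outfmt", "6",                        # tabular output
        "-out", out,
    ]


# Hypothetical file and database names, for illustration only.
cmd = psiblast_command("DAPB_ECOLI.fasta", "swissprot", rounds=5)
print(" ".join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` once the query file and database actually exist.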

