Wellcome Trust graduate course. - Computational Methods series. --- Sequence-based bioinformatics. Dr. Hyunji Kim Department of Biochemistry, University.

1 Wellcome Trust graduate course, Computational Methods series: Sequence-based bioinformatics. Dr. Hyunji Kim, Department of Biochemistry, University of Oxford, South Parks Road, Oxford, OX1 3QU, UK. Email: hyunji.kim@bioch.ox.ac.uk

2 Basic Tools
1) BLAST/WU-BLAST: A search engine for finding sequences of interest. A BLAST search can be refined by varying the substitution matrix and filtering options against a specified database. http://www.ncbi.nlm.nih.gov/BLAST/, http://www.ebi.ac.uk/blast2/
2) ClustalW/T-Coffee/Muscle: Help us make sense of a set of unaligned sequences by generating pairwise or multiple sequence alignments. Use a progressive-alignment method. http://www.ebi.ac.uk/clustalw/
3) HMMer/PSI-BLAST: Builds a profile hidden Markov model (pHMM) from a set of aligned sequences. Aligns sequences to a pHMM, searches a sequence database, and can help assign a function to a given sequence. http://hmmer.wustl.edu/
4) Phylip/TreeDyn: Calculates a distance matrix from a set of sequences. Derives phylogenetic trees, taking such a matrix as input, based on minimum evolution, parsimony and other approaches. http://evolution.genetics.washington.edu/phylip.html
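Item 4 above mentions calculating a distance matrix from a set of sequences. As a rough illustration only (this is not PHYLIP's implementation, which normally applies an evolutionary model), the sketch below computes the simplest such measure, the uncorrected p-distance, i.e. the fraction of differing sites between aligned sequences. The toy sequences and the function names are made up for this example.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned: dict[str, str]) -> dict[tuple[str, str], float]:
    """Symmetric pairwise p-distance matrix keyed by (name, name)."""
    names = list(aligned)
    dist = {(n, n): 0.0 for n in names}
    for a, b in combinations(names, 2):
        d = p_distance(aligned[a], aligned[b])
        dist[(a, b)] = dist[(b, a)] = d
    return dist


# Hypothetical, already-aligned toy sequences (not course data).
toy = {"seqA": "TVGYGD", "seqB": "TVGYGE", "seqC": "TLGFGD"}
matrix = distance_matrix(toy)

for (a, b), d in sorted(matrix.items()):
    print(f"{a} {b} {d:.3f}")
```

A matrix like this is the kind of input that distance-based tree-building programs take; real analyses would usually correct the raw p-distance for multiple substitutions.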

3 5) Databases
Nucleotide databases: EMBL, GenBank & DDBJ
Protein databases: fully annotated, e.g. Swiss-Prot v52.3 (264,492 entries, as of 17 April 2007); computer-annotated, e.g. TrEMBL v35.3
Genomics databases: Ensembl (v44) and Eukaryota, Bacteria and Archaea genomes: 20+14, 51, 445 and 40 respectively, as of 20 April 2007
http://www.ebi.ac.uk/uniprot/index.html, http://www.ensembl.org/, http://www.ebi.ac.uk/genomes/index.html
6) Major bioinformatics centres around the globe
http://www.ebi.ac.uk/, http://www.ncbi.nlm.nih.gov/, http://www.ddbj.nig.ac.jp/, http://us.expasy.org/, http://www.sanger.ac.uk/, http://geneontology.org/

4 Searching for sequences by homology - BLAST

5 [Figure: axes x and y, indices i and j]

8 Reference: Gish, W. (1996-2006) http://blast.wustl.edu
Query=  KcsA
        (160 letters)
>Filtered+0
MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI
TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
RRGHFVRHSEKXXXXXXXXXXXXLHERFDRLERMLDDNRR
Database:  swissprot
           223,100 sequences; 81,965,973 total letters.
Searching....10....20....30....40....50....60....70....80....90....100% done
                                                                Smallest Sum
                                                        High    Probability
Sequences producing High-scoring Segment Pairs:         Score   P(N)       N
SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.     615   3.0e-60    1
SW:KCSA_STRLI P0A334 Voltage-gated potassium channel.     615   3.0e-60    1
>SW:KCSA_STRCO P0A333 Voltage-gated potassium channel.
        Length = 160
 Score = 615 (221.5 bits), Expect = 3.0e-60, P = 3.0e-60, Group = 1
 Identities = 120/160 (75%), Positives = 120/160 (75%)
Query:   1 MPPMXXXXXXXXXXXXXGRHGSALHWRXXXXXXXXXXXXXXXGSYLAVLAERGAPGAQLI 60
           MPPM             GRHGSALHWR               GSYLAVLAERGAPGAQLI
Sbjct:   1 MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI 60
Query:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
           TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE
Sbjct:  61 TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE 120
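The report above shows a filtered query, in which masked residues appear as X. Masked positions cannot count as identities, which is one way to read the reported "Identities = 120/160 (75%)". The sketch below illustrates the arithmetic on the sequences printed in the report. The function name is hypothetical, and the counting rule (an X never matches) is a simplifying assumption rather than a description of WU-BLAST internals.

```python
def count_identities(query: str, sbjct: str, mask: str = "X") -> int:
    """Count aligned columns where both residues agree and the query is not masked."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    return sum(q == s and q != mask for q, s in zip(query, sbjct))


# First alignment block of the report (columns 1-60).
query_block = "MPPM" + "X" * 13 + "GRHGSALHWR" + "X" * 15 + "GSYLAVLAERGAPGAQLI"
sbjct_block = "MPPMLSGLLARLVKLLLGRHGSALHWRAAGAATVLLVIVLLAGSYLAVLAERGAPGAQLI"
block_identities = count_identities(query_block, sbjct_block)

# Full filtered query as printed under ">Filtered+0" (160 letters).
filtered_query = (
    query_block
    + "TYPRALWWSVETATTVGYGDLYPVTLWGRLVAVVVMVAGITSFGLVTAALATWFVGREQE"
    + "RRGHFVRHSEK" + "X" * 12 + "LHERFDRLERMLDDNRR"
)
masked = filtered_query.count("X")
unmasked = len(filtered_query) - masked

print(f"identities in columns 1-60: {block_identities}/{len(query_block)}")
print(f"unmasked query residues: {unmasked}/{len(filtered_query)} "
      f"({100 * unmasked / len(filtered_query):.0f}%)")
```

The number of unmasked residues equals the identity count in the report, which is what one would expect if every unmasked position matches the subject.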

9 Multiple sequence alignment – ClustalW

10 *****************************************************
CLUSTAL W (1.83) Multiple Sequence Alignments
*****************************************************
1. Sequence Input From Disc
2. Multiple Alignments
3. Profile / Structure Alignments
4. Phylogenetic trees
S. Execute a system command
H. HELP
X. EXIT (leave program)
Your choice: 2
****** MULTIPLE ALIGNMENT MENU ******
1. Do complete multiple alignment now (Slow/Accurate)
2. Produce guide tree file only
3. Do alignment using old guide tree file
4. Toggle Slow/Fast pairwise alignments = SLOW
5. Pairwise alignment parameters
6. Multiple alignment parameters
7. Reset gaps before alignment? = OFF
8. Toggle screen display = ON
9. Output format options
S. Execute a system command
H. HELP
or press [RETURN] to go back to main menu
Your choice:

12 CLUSTAL W (1.82) multiple sequence alignment
KVAP_AERPE  FDALW-WAVVTATTVGYGDVVP-ATPIGKVIGIAVMLTGISALTLLIGTVSNMF------ 79
MVP_METJA   FDAFY-FTTISITTVGYGDITP-KTDAGKLI---IIFS---VLFFISGLITS-------- 70
O28600      FDSLY-MTVITITTTGYGEVKP-MGPGGRVISMLLMFVGVGTF----------------- 64
Q8TXQ4      LTCLY-FTAATITTVGYGDVVP-TTEAGRLLSVIVMFSGIGVASYAL------------- 73
Q6L2S2      FTSLW-WTMQTITTVGYGDTPV-YGFYGRINGMLIMVFGIGTIGYVTASLAT-------- 79
Q979Z2      FTAIW-FTMETVTTVGYGDVVP-VSNLGRVVAMLIMVSGIGLLGTLTATISAYLF----Q 80
O26605      EDSLW-YVLQTITTVGYGDIVP-VTSLGRFTGMVIMFSAIASTSLITASATSTLLERGEQ 114
Q9HIA8      GNAFY-YTGEVITTLGFGDILP-VTMDAKIFTISLAFLGVAIFFSSITALILPSVERRLG 94
Q97CK5      GTALY-YTGETVTTLGFGDILP-VDLESRLFTISLAFLGVAIFFSAMTALITPTIERRVG 84
Colour key:
Colour    Property                                          Residues
Gray      Others
Green     Hydroxyl, Amine                                   STYHCNGQ
Magenta   Basic                                             RHK
Blue      Acidic                                            DE
Red       Small (small + hydrophobic (incl. aromatic -Y))   AVFPMILW

13 Profile alignment & pattern recognition: HMMer. More sensitive homology search: PSI-BLAST & HMMer

14 [Figure labels: DNA sequence; Amino acid sequence]

16 PSI-BLAST

17 Phylogeny: Phylip & TreeDyn

18 Saitou N and Nei M, The neighbour-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol, 4(4):406-425, 1987
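The neighbour-joining method cited above builds a tree by repeatedly choosing a pair of taxa to join. In the commonly taught Q-matrix formulation, the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k) is joined first. The sketch below shows only that selection step, under stated assumptions: the function names are hypothetical, the distance matrix is a toy example rather than course data, and the branch-length and matrix-update steps of the full algorithm are omitted.

```python
def nj_q_matrix(d: list[list[float]]) -> list[list[float]]:
    """Q(i, j) = (n - 2) * d[i][j] - sum(d[i]) - sum(d[j]) for i != j, else 0."""
    n = len(d)
    row_sums = [sum(row) for row in d]
    return [
        [
            (n - 2) * d[i][j] - row_sums[i] - row_sums[j] if i != j else 0
            for j in range(n)
        ]
        for i in range(n)
    ]


def pair_to_join(d: list[list[float]]) -> tuple[int, int]:
    """Indices (i, j), i < j, of the pair with the smallest Q value."""
    q = nj_q_matrix(d)
    n = len(d)
    candidates = ((i, j) for i in range(n) for j in range(i + 1, n))
    return min(candidates, key=lambda p: q[p[0]][p[1]])


# Toy symmetric distance matrix for five taxa a..e (illustrative values only).
toy = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

q = nj_q_matrix(toy)
first_pair = pair_to_join(toy)
print("Q(a, b) =", q[0][1], "Q(d, e) =", q[3][4], "join:", first_pair)
```

Note that the pair with the smallest raw distance (d and e here) is not necessarily the pair joined first; the row-sum terms correct for taxa that are far from everything else.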

21 TreeDyn

22 Protein secondary structure prediction: two consensus methods

23 http://sbcb.bioch.ox.ac.uk/TM_noj/TM_noj.html

24 Example Output. The Transmembrane Prediction Server was developed by Dr. Jonathan Cuthbertson.
640 650 660 670 680 690 700
| | | | | | |
MFAKGYGKNNEPLRGYILTFLIALGFILIAELNVIAPIISNFFLASYALINFSVFHASLAKSPGWRPAFK
ALOM2      *****************
DAS        ****************************************
HMMTOP2    ****************** *************************
MEMSAT1.5  *************************
PHD        *************************
SPLIT4     **************** ***************************
TMAP       *****************************
TMFINDER   ****************************************
TMHMM2     *********************** ******************
TMPRED     *************************
TOPPRED2   ********************* *********************
Consensus  ------------???hhhhHHHHHHHHHHHHHHHHHhHHhhhhhhhhh???????????-----------
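The Consensus row above summarises, residue by residue, how far the individual predictors agree. The slide does not state the rule the server uses, so the following is only a sketch of one plausible per-column voting scheme: the symbols imitate those in the Consensus row, but the thresholds (0.75, 0.5, 0.25), the function name and the toy predictions are all hypothetical.

```python
def consensus(predictions: list[str]) -> str:
    """Per-column vote over equal-length prediction strings.

    '*' marks a residue predicted as transmembrane; any other character
    counts as "not transmembrane". Thresholds are illustrative assumptions.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all predictions must have the same length")
    n = len(predictions)
    symbols = []
    for column in zip(*predictions):
        fraction = sum(c == "*" for c in column) / n
        if fraction >= 0.75:
            symbols.append("H")  # strong agreement
        elif fraction >= 0.5:
            symbols.append("h")  # majority
        elif fraction >= 0.25:
            symbols.append("?")  # minority support
        else:
            symbols.append("-")  # little or no support
    return "".join(symbols)


# Four hypothetical predictors over an eight-residue toy window.
toy_predictions = [
    "--****--",
    "-*****--",
    "--***---",
    "---**--*",
]
result = consensus(toy_predictions)
print(result)
```

A voting scheme of this kind tends to be most reliable in the core of a helix, where methods agree, and least reliable at the helix ends, where the individual predictions differ in length.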

25 Pongo: http://pongo.biocomp.unibo.it/pongo

26 Example Output by Pongo

27 Background for practical sessions

28 Introduction to your input sequence
Ion channels; Potassium channels; Voltage-gated potassium channels
Ion channels are a diverse class of transmembrane proteins that are responsible for the diffusion of ions across cell membranes. There are several major families of ion channels, for instance K+, Na+, Ca2+ and Cl- channels, as well as ligand-gated ion channels (LGICs). Many human neurological and muscular disorders have been traced to defects in voltage-gated and ligand-gated ion channels.
[Figure with labels TM and T1: Fig 2. A. Long et al., Science, Vol. 309, p897, 2005]

29 Your expected blastp output: K+ channels, blastp. Homologues are visualised in BLIXEM.

30 Alignment you are about to build, not necessarily as big.
[Figure: tree with families Kv, BK, SK, Erg, Kir, CNG, AKT; branch labels Kv1.x, Shab Kv2.x, Shal Kv4.x, Kv5.6.8.9., Shaw Kv3.x, Kir2.x, Kir6.2, Kir3.x, Kir4.x, Kir1.1, Kir6.1, Kir2.3. Fig 4. Shealy et al., Biophysical Journal, Vol 84, p2929, 2003]

31 Example of pHMM-related output
hmmsearch - search a sequence database with a profile HMM
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
HMM file:            Kv.hmm [Kv_homologues]
Sequence database:   infile_comb
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query HMM:   Kv_homologues
  [HMM has been calibrated; E-values are empirical estimates]
Scores for complete sequences (score includes all domains):
Sequence         Description   Score    E-value   N
--------         -----------   -----    -------  ---
CIKS_DROME                     241.2    3.2e-71    1
Q9VX00_DROME                   234.3    3.9e-69    1
CIKB_DROME                     159.3    1.5e-46    1
O62350_Celegans                156.7    8.8e-46    1
Q9VLC6_DROME                   156.6    9.6e-46    1
CIKW_DROME                     156.5      1e-45    1
Q8SYL2_DROME                   156.5      1e-45    1
Q22012_Celegans                155.3    2.4e-45    1
Filtered_5DROME                140.5    6.6e-41    1
Filtered_6DROME                140.5    6.6e-41    1
Q9XXD1_Celegans                125.0    3.1e-36    1
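The hit table above can be post-processed with a few lines of code, for example to keep only hits below an E-value cutoff. The sketch below parses rows of the form shown on the slide (name first, then score, E-value and N as the last three fields). The names `Hit` and `parse_hits` and the 1e-50 cutoff are hypothetical choices for illustration, and the parser is not a complete reader for hmmsearch reports.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    name: str
    score: float
    evalue: float
    n_domains: int


def parse_hits(rows: str) -> list[Hit]:
    """Parse 'name [description] score E-value N' rows into Hit records."""
    hits = []
    for line in rows.strip().splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # The description column may be empty, so read the numbers from the end.
        score, evalue, n = fields[-3:]
        hits.append(Hit(fields[0], float(score), float(evalue), int(n)))
    return hits


# A few rows copied from the table on the slide.
ROWS = """
CIKS_DROME       241.2   3.2e-71   1
Q9VX00_DROME     234.3   3.9e-69   1
CIKB_DROME       159.3   1.5e-46   1
Q9XXD1_Celegans  125.0   3.1e-36   1
"""

hits = parse_hits(ROWS)
CUTOFF = 1e-50  # hypothetical threshold, chosen only for this example
strong = [h.name for h in hits if h.evalue < CUTOFF]
print(strong)
```

Reading the numeric fields from the end of each row keeps the parser working whether or not a description is present, at the cost of assuming the last three columns are always score, E-value and N.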

32 Raw tree-files produced by PHYLIP
[Figure: tree with labels Kir, Kv, BK, SK, AKT, CNG/HErg, KcsA, MthK, Kv1.2, KvAP]

33 Phylogenetic trees modified in TreeDyn

