File formats and conversions. Important formats How Fasta Raw/Peptide Tab.


1 File formats and conversions

2 Important formats
How
Fasta
Raw/Peptide
Tab

3 How
1. One or more entries
   1. First line
      1. Length of sequence (6 digits, right-aligned)
      2. Name of sequence
   2. Next lines
      1. Sequence, usually 80 characters per line
   3. Last lines
      1. Assignments of the positions in the sequence

4 How file
   553 ATP0_BOVIN_1E79.C
MLSVRVAAAVARALPRRAGLVSKNALGSSFIAARNLHASNSRLQKTGTAEVSSILEERILGADTSVDLEETGRVLSIGDG
IARVHGLRNVQAEEMVEFSSGLKGMSLNLEPDNVGVVVFGNDKLIKEGDIVKRTGAIVDVPVGEELLGRVVDALGNAIDG
KGPIGSKARRRVGLKAPGIIPRISVREPMQTGIKAVDSLVPIGRGQRELIIGDRQTGKTSIAIDTIINQKRFNDGTDEKK
KLYCIYVAIGQKRSTVAQLVKRLTDADAMKYTIVVSATASDAAPLQYLAPYSGCSMGEYFRDNGKHALIIYDDLSKQAVA
YRQMSLLLRRPPGREAYPGDVFYLHSRLLERAAKMNDAFGGGSLTALPVIETQAGDVSAYIPTNVISITDGQIFLETELF
YKGIRPAINVGLSVSRVGSAAQTRAMKQVAGTMKLELAQYREVAAFAQFGSDLDAATQQLLSRGVRLTELLKQGQYSPMA
IEEQVAVIYAGVRGYLDKLEPSKITKFENAFLSHVISQHQALLSKIRTDGKISEESDAKLKEIVTNFLAGFEA
-------------------------------------------------------------...SS.TTTEEEEEEEETT
EEEEEE.TT.BTTEEEEETTS.EEEEEEE.SS.EEEEESS.GGG..TT.EEEEEEEESEEE.SGGGTT.EE.TTS.B.SS
S.....S.EEETT.....STTB....SB...S.HHHHHHS..BTT.B.EEEESTTSSHHHHHHHHHHHTHHHHSSS.GGG..EEEEEEES..HHHHHHHHHHHHHHT.GGGEEEEEE.TTS.HHHHHHHHHHHHHHHHHHHHTT.EEEEEEETHHHHHHH
HHHHHHHTT....GGGS.TTHHHHHHHHHTT..BB.GGGTS.EEEEEEEEE.STT.TTSHHHHHHHTTSSEEEEE.HHHH
HHT.SS.B.TTT.EESSGGGGS.HHHHHHHTTHHHHHHHHHHHHHHHTT.....HHHHHHHHHHHHHHHHT...SS....
HHHHHHHHHHHHTSTTTTS.GGGHHHHHHHHHHHHHHH.HHHHHHHHHHTS..HHHHHHHHHHHHHHHHHHH.
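The How layout (a header with length and name, then the sequence, then one assignment character per position) could be read along the following lines. This is a minimal Python sketch, not a CBS tool: the function name `parse_how` and the toy entries are made up for illustration, and it assumes that the sequence and the assignment string each contain exactly as many characters as the length stated in the header.

```python
def parse_how(text):
    """Parse How-formatted text into (name, sequence, assignment) tuples.

    Assumption: after the header line, the residues come first and the
    per-position assignments follow, each `length` characters in total.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    entries = []
    i = 0
    while i < len(lines):
        # Header: length (right-aligned number) followed by the entry name.
        length_field, name = lines[i].split(None, 1)
        length = int(length_field)
        name = name.strip()
        i += 1
        chars = ""
        # Collect lines until sequence plus assignment are complete.
        while i < len(lines) and len(chars) < 2 * length:
            chars += lines[i].strip()
            i += 1
        if len(chars) != 2 * length:
            raise ValueError(
                f"entry {name!r}: expected {2 * length} characters, got {len(chars)}"
            )
        entries.append((name, chars[:length], chars[length:]))
    return entries


# Toy data (invented, only to show the layout).
example = "    10 TOY_ENTRY\nMLSVRVAAAV\n...HHHHH..\n     4 SECOND\nACDE\n.EE.\n"
for name, seq, assignment in parse_how(example):
    print(name, seq, assignment)
```

Because the parser counts characters rather than lines, it does not depend on the 80-character line width.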

5 Fasta
1. One or more entries
   1. First line
      1. The character ">"
      2. The name
      3. Optional description, not read by all readers
   2. Rest of lines
      1. The sequence, usually 50-80 characters per line
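As an illustration of the Fasta rules above, the following Python sketch splits each header into a name and an optional description and joins the remaining lines into one sequence. The function name `parse_fasta` and the toy records are hypothetical; real readers may treat the description differently, as the slide notes.

```python
def parse_fasta(text):
    """Return a list of (name, description, sequence) tuples."""
    entries = []
    name, description, chunks = None, "", []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous entry, if there is one.
            if name is not None:
                entries.append((name, description, "".join(chunks)))
            header = line[1:].split(None, 1)
            name = header[0] if header else ""
            description = header[1] if len(header) > 1 else ""
            chunks = []
        elif name is not None:
            # Sequence lines are concatenated regardless of their width.
            chunks.append(line)
    if name is not None:
        entries.append((name, description, "".join(chunks)))
    return entries


# Toy data (invented).
fasta_text = ">pep1 toy description\nMLSV\nRVAA\n>pep2\nACDE\n"
print(parse_fasta(fasta_text))
```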

6 Raw/peptide
Short sequences
One peptide per line

7 Tab format
1. One or more entries
   1. One entry per line
   2. Tab delimited fields
      1. Name
      2. Sequence
      3. Assignments/features
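A tab-delimited entry maps directly onto a split on the tab character. The Python sketch below assumes the field order given on the slide (name, sequence, assignments/features) and treats a missing third field as empty; `parse_tab` and `format_tab` are hypothetical helper names.

```python
def parse_tab(text):
    """Return (name, sequence, assignment) tuples, one per non-empty line."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        name = fields[0]
        sequence = fields[1] if len(fields) > 1 else ""
        assignment = fields[2] if len(fields) > 2 else ""
        entries.append((name, sequence, assignment))
    return entries


def format_tab(entries):
    """Write entries back as one tab-delimited line per entry."""
    return "".join("\t".join(entry) + "\n" for entry in entries)


# Toy data (invented).
tab_text = "toy1\tACDE\t.EE.\ntoy2\tMLSV\n"
print(parse_tab(tab_text))
```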

8 Converters
Saco_convert
  – From/To
    How
    Fasta
    Tab
Makefsa
  – Raw peptides to fasta peptides
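To show what a raw-to-fasta conversion involves, here is a minimal Python sketch. It is not the Makefsa implementation: the function name `raw_to_fasta`, the generated entry names (`pep_1`, `pep_2`, ...) and the default line width are assumptions made for illustration, since raw peptide files carry no names of their own.

```python
def raw_to_fasta(raw_text, prefix="pep", width=60):
    """Convert one-peptide-per-line text to Fasta text.

    Names are generated as <prefix>_<number> because the raw format has none.
    """
    peptides = [ln.strip() for ln in raw_text.splitlines() if ln.strip()]
    records = []
    for number, peptide in enumerate(peptides, start=1):
        records.append(f">{prefix}_{number}")
        # Wrap long sequences at `width` characters per line.
        for start in range(0, len(peptide), width):
            records.append(peptide[start:start + width])
    return "".join(line + "\n" for line in records)


print(raw_to_fasta("SIINFEKL\nGILGFVFTL\n"), end="")
```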

9 Databases at CBS

10 Databases - ready for BLAST
SwissProt
PDB
GenBank
nr
  – Non-redundant set of proteins from the above plus TREMBL, PIR and others
sptr_nrdb
  – Non-redundant set of proteins from SwissProt and TREMBL

11 BLAST routines - single search
blastp
  – aadb aaquery
blastn
  – ntdb ntquery
blastx
  – aadb ntquery
tblastn
  – ntdb aaquery
tblastx
  – ntdb ntquery
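The pairing of database type and query type listed above can be written as a small lookup table. In this Python sketch, "aa" stands for amino acid and "nt" for nucleotide, following the slide's abbreviations; the names `BLAST_PROGRAMS` and `programs_for` are hypothetical.

```python
# program -> (database type, query type), as listed on the slide
BLAST_PROGRAMS = {
    "blastp": ("aa", "aa"),
    "blastn": ("nt", "nt"),
    "blastx": ("aa", "nt"),
    "tblastn": ("nt", "aa"),
    "tblastx": ("nt", "nt"),
}


def programs_for(db_type, query_type):
    """Return the programs that accept this database/query combination."""
    return [
        program
        for program, types in BLAST_PROGRAMS.items()
        if types == (db_type, query_type)
    ]


print(programs_for("aa", "nt"))
print(programs_for("nt", "nt"))
```

A nucleotide query against a nucleotide database matches both blastn and tblastx, so the function returns a list rather than a single name.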

12 Blastpgp - iterative blast
Repetitive searches with an AA query through an AA database
Results in hits plus an optional position-specific scoring matrix

13 The actual search
Query is a single file in FASTA format
Custom databases need to be initially formatted from sets in FASTA format
  – Use setdb program for protein sequence databases (i.e., blastp and blastx)
  – Use pressdb program for nucleotide sequence databases (i.e., blastn and tblastn)
  – Use formatdb for blastpgp (psiblast)
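The choice of formatting program stated above can be captured in a lookup, sketched here in Python. Only the pairings named on the slide are included; tblastx is deliberately left out because the slide does not list a formatter for it. `FORMATTERS` and `formatter_for` are hypothetical names.

```python
# search program -> database formatting program, as stated on the slide
FORMATTERS = {
    "blastp": "setdb",
    "blastx": "setdb",
    "blastn": "pressdb",
    "tblastn": "pressdb",
    "blastpgp": "formatdb",
}


def formatter_for(program):
    """Return the formatter listed for a search program."""
    try:
        return FORMATTERS[program]
    except KeyError:
        raise ValueError(f"no formatter listed for {program!r}") from None


print(formatter_for("blastx"))
```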

14 Exercises

15 Conversion exercise
Convert the file A1.rsee.test to fasta format
Convert the file ss_sub300.how to fasta format

16 Blast
Take the first entry in ss_sub300.how and blastp it against ss_sub300.how and PDB
Make a position-specific scoring matrix for the entry using psiblast and nr, and save the profile as binary and readable matrices
Use the binary matrix to search against PDB and ss_sub300.how
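One possible way to approach the profile steps is sketched below in Python. It only assembles command lines and does not run anything. It assumes the legacy NCBI blastpgp options `-i` (query), `-d` (database), `-j` (number of iterations), `-C` (binary checkpoint output), `-Q` (readable matrix output) and `-R` (restart from a checkpoint); the file names, the database names and the iteration count are placeholders, and the local installation may expect different paths or options.

```python
def build_profile_command(query, database, iterations,
                          checkpoint, ascii_matrix):
    """Command line for building a profile and saving it in both forms."""
    return [
        "blastpgp",
        "-i", query,
        "-d", database,
        "-j", str(iterations),
        "-C", checkpoint,      # binary profile (checkpoint file)
        "-Q", ascii_matrix,    # readable position-specific scoring matrix
    ]


def build_restart_command(query, database, checkpoint):
    """Command line for searching a database with a saved binary profile."""
    return ["blastpgp", "-i", query, "-d", database, "-R", checkpoint]


# Placeholder file and database names, for illustration only.
profile_cmd = build_profile_command("entry1.fsa", "nr", 3,
                                    "entry1.chk", "entry1.pssm")
search_cmd = build_restart_command("entry1.fsa", "pdb", "entry1.chk")
print(" ".join(profile_cmd))
print(" ".join(search_cmd))
```

Building the commands as argument lists keeps them ready for `subprocess.run` once the names have been checked against the local setup.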

