
©CMBI 2008 Databases


1 Databases
Data must be in a defined format for software to recognize it. Every database can have its own format, but some data elements are essential for every database:
1. Unique identifier, or accession code
2. Name of depositor
3. Literature references
4. Deposition date
5. The real data
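As a rough illustration, the five essential elements can be modelled as a small record type. This is only a sketch: the class name `DatabaseEntry`, its field names, and the helper `missing_fields` are hypothetical and do not come from any of the databases discussed here. The depositor value in the example is a placeholder, not real data.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DatabaseEntry:
    """Hypothetical container for the five essential elements of an entry."""
    accession: str                 # 1. unique identifier / accession code
    depositor: str                 # 2. name of depositor
    deposition_date: date          # 4. deposition date
    data: str                      # 5. the real data (e.g. a sequence)
    references: List[str] = field(default_factory=list)  # 3. literature references

    def missing_fields(self) -> List[str]:
        """Return the names of essential elements that are empty."""
        missing = []
        if not self.accession:
            missing.append("accession")
        if not self.depositor:
            missing.append("depositor")
        if not self.references:
            missing.append("references")
        if not self.data:
            missing.append("data")
        return missing


# Example entry; the depositor and data values are placeholders.
entry = DatabaseEntry(
    accession="1CRN",
    depositor="(placeholder depositor)",
    deposition_date=date(1981, 4, 30),
    data="(placeholder data)",
)
print(entry.missing_fields())
```

A check like `missing_fields` mirrors the kind of automatic validation a database might run at submission time, although the real checks are not described in these slides.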

2 Quality of Data
SwissProt: data is entered only by annotation experts.
EMBL, PDB: “everybody” can submit data. There is no human intervention at submission, only some automatic checks.

3 SwissProt database
Database of protein sequences.
399749 entries (Oct 2008).
Ca. 200 annotation experts worldwide.
Keyword-organised flat file.
Obligatory deposit in SwissProt before publication.
Presently, databases are being merged into UniProt.

4 Important records in SwissProt (1)
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
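Because every line of the entry starts with a two-letter line code (ID, AC, DT, DE), the record can be grouped by that code with a few lines of Python. The following is a minimal sketch, not an official parser: the function names `group_by_line_code` and `accessions` are made up for this example, and the text is assumed to have one record line per text line.

```python
from collections import defaultdict

RECORD = """\
ID   HBA_HUMAN               Reviewed;         142 AA.
AC   P69905; P01922; Q3MIF5; Q96KF1; Q9NYR7;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   23-JAN-2007, sequence version 2.
DT   23-SEP-2008, entry version 63.
DE   RecName: Full=Hemoglobin subunit alpha;
DE   AltName: Full=Hemoglobin alpha chain;
DE   AltName: Full=Alpha-globin;
"""


def group_by_line_code(text):
    """Map each line code (ID, AC, DT, ...) to the list of its line contents."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        # The line code is everything before the first space.
        code, _, value = line.partition(" ")
        fields[code].append(value.strip())
    return dict(fields)


def accessions(fields):
    """Split the AC lines into individual accession codes."""
    result = []
    for value in fields.get("AC", []):
        result.extend(a.strip() for a in value.split(";") if a.strip())
    return result


fields = group_by_line_code(RECORD)
entry_name = fields["ID"][0].split()[0]   # first word of the ID line
primary_accession = accessions(fields)[0]  # first code listed on the AC line
print(entry_name, primary_accession, len(fields["DT"]))
```

The first accession code on the AC line is conventionally treated as the primary one. The remaining codes are secondary accession numbers.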

5 Important records in SwissProt (2)
Cross references section: hyperlinks to all entries in other databases that are relevant for the protein sequence HBA_HUMAN.

6 Important records in SwissProt (3)
Features section: post-translational modifications, signal peptides, binding sites, enzyme active sites, domains, disulfide bridges, local secondary structure, sequence conflicts between references, etc.

7 And finally, the amino acid sequence!

8 Protein Data Bank (PDB)
Databank for macromolecular structure data (3-dimensional coordinates).
Started ca. 30 years ago (on punched cards!).
Obligatory deposit of coordinates in the PDB before publication.
~50000 entries (April 2008), of which ~2500 are “unique” structures.
A PDB file is a keyword-organised flat file (80 columns):
1) human readable
2) every line starts with a keyword (3-6 letters)
3) platform independent

9 PDB important records (1)
PDB nomenclature: filename = accession number = PDB code. The filename is 4 positions (often 1 digit & 3 letters, e.g. 1CRN).
HEADER describes the molecule and gives the deposition date:
HEADER    PLANT SEED PROTEIN    30-APR-81    1CRN
COMPND gives the name of the molecule:
COMPND    CRAMBIN
SOURCE gives the organism:
SOURCE    ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
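A HEADER line like the one above can be taken apart as follows. This is a sketch under stated assumptions: it splits on whitespace and takes the last two tokens as PDB code and deposition date, whereas the PDB format itself defines fixed column positions. The function name `parse_header` and the two-digit-year rule (70-99 read as 19xx, otherwise 20xx) are assumptions made for this example only.

```python
from datetime import date

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def parse_header(line):
    """Split a HEADER line into classification, deposition date and PDB code."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "HEADER":
        raise ValueError("not a HEADER line in the expected layout")
    id_code = tokens[-1]
    day, month, year = tokens[-2].split("-")
    # Assumption for this sketch: two-digit years 70-99 mean 19xx, else 20xx.
    full_year = 1900 + int(year) if int(year) >= 70 else 2000 + int(year)
    return {
        "classification": " ".join(tokens[1:-2]),
        "deposition_date": date(full_year, MONTHS[month.upper()], int(day)),
        "id_code": id_code,
    }


header = parse_header("HEADER    PLANT SEED PROTEIN    30-APR-81    1CRN")
print(header["id_code"], header["deposition_date"].isoformat())
```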

10 PDB important records (2)
SEQRES gives the sequence of the protein. Be aware: 3D coordinates are not always present for all the amino acids in SEQRES!
SEQRES   1     46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE    1CRN  51
SEQRES   2     46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS    1CRN  52
SEQRES   3     46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR    1CRN  53
SEQRES   4     46  CYS PRO GLY ASP TYR ALA ASN                            1CRN  54
SSBOND gives the disulfide bridges:
SSBOND   1 CYS      3    CYS     40
SSBOND   2 CYS      4    CYS     32
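The SEQRES lines can be converted to a one-letter sequence with a short script. This is a sketch that assumes the old record layout shown on the slide (no chain identifier, and the "1CRN nn" identification at the end of each line). The function name `seqres_to_sequence` is hypothetical, and tokens that are not one of the 20 standard residue names are simply skipped, so modified residues would be lost.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

SEQRES_LINES = [
    "SEQRES   1     46  THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE    1CRN  51",
    "SEQRES   2     46  ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS    1CRN  52",
    "SEQRES   3     46  ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR    1CRN  53",
    "SEQRES   4     46  CYS PRO GLY ASP TYR ALA ASN                            1CRN  54",
]


def seqres_to_sequence(lines):
    """Return (declared residue count, one-letter sequence) from SEQRES lines."""
    declared = None
    letters = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SEQRES":
            continue
        # tokens: keyword, line serial number, residue count, residue names...
        declared = int(tokens[2])
        # Keep only standard residue names; this drops the trailing "1CRN nn".
        letters.extend(THREE_TO_ONE[t] for t in tokens[3:] if t in THREE_TO_ONE)
    return declared, "".join(letters)


declared_count, sequence = seqres_to_sequence(SEQRES_LINES)
print(sequence)
print(len(sequence) == declared_count)
```

Comparing the length of the result with the residue count declared on each SEQRES line is a cheap consistency check. Comparing it with the residues that actually appear in the ATOM records shows which amino acids lack coordinates.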

11 PDB important records (3)
And at the end of the PDB file, the “real” data. ATOM: one line for each atom, with its unique name and its x, y, z coordinates.
ATOM      1  N   THR     1      17.047  14.099   3.625  1.00 13.79    1CRN  70
ATOM      2  CA  THR     1      16.967  12.784   4.338  1.00 10.80    1CRN  71
ATOM      3  C   THR     1      15.685  12.755   5.133  1.00  9.19    1CRN  72
ATOM      4  O   THR     1      15.268  13.825   5.594  1.00  9.85    1CRN  73
ATOM      5  CB  THR     1      18.170  12.703   5.337  1.00 13.02    1CRN  74
ATOM      6  OG1 THR     1      19.334  12.829   4.463  1.00 15.06    1CRN  75
ATOM      7  CG2 THR     1      18.150  11.546   6.304  1.00 14.23    1CRN  76
ATOM      8  N   THR     2      15.115  11.555   5.265  1.00  7.81    1CRN  77
ATOM      9  CA  THR     2      13.856  11.469   6.066  1.00  8.31    1CRN  78
ATOM     10  C   THR     2      14.164  10.785   7.379  1.00  5.80    1CRN  79
ATOM     11  O   THR     2      14.993   9.862   7.443  1.00  6.94    1CRN  80
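As an illustration of what software does with ATOM lines, the sketch below reads two of them and computes the distance between the atoms. It assumes the layout shown on the slide (no chain identifier) and splits on whitespace. Real PDB files use fixed column positions, and neighbouring fields can run together there, so a production parser should slice by column instead. The names `Atom`, `parse_atom` and `distance` are made up for this example. The two numbers after the coordinates are the occupancy and the temperature factor.

```python
import math
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int        # atom serial number
    name: str          # atom name, e.g. N, CA, C, O
    res_name: str      # residue name, e.g. THR
    res_seq: int       # residue sequence number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float    # temperature factor


def parse_atom(line: str) -> Atom:
    """Parse one whitespace-separated ATOM line in the slide's layout."""
    t = line.split()
    if len(t) < 10 or t[0] != "ATOM":
        raise ValueError("not an ATOM line in the expected layout")
    return Atom(int(t[1]), t[2], t[3], int(t[4]),
                float(t[5]), float(t[6]), float(t[7]),
                float(t[8]), float(t[9]))


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in the units of the file (Angstrom)."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


n = parse_atom("ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70")
ca = parse_atom("ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71")
print(round(distance(n, ca), 3))  # N-CA bond of THR 1, roughly 1.5 Angstrom
```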

12 MRS home page

13 MRS Search Steps
1. Select the database(s) of choice
2. Formulate your query
3. Hit “Search”
4. The result is a “query set” or “hitlist”
5. Analyze the results

14 MRS Search options
Simply type your keywords in the keyword field and choose SEARCH. If you know the fields of the database you are searching, you can specify your query further. But think about your query first!

