Molecular biology databases Based on Chapter 2 of Post-genome Informatics by Minoru Kanehisa, Oxford University Press, 2000 2.1 History 2.2 Information.


1 Molecular biology databases Based on Chapter 2 of Post-genome Informatics by Minoru Kanehisa, Oxford University Press, 2000 2.1 History 2.2 Information Technology 2.3 New generation databases

2 Evolution of molecular biology databases

3 The addresses for the major databases

4 New generation of molecular biology databases

5

6

7 Example of sequence database entry for GenBank

LOCUS       DRODPPC      4001 bp    INV       15-MAR-1990
DEFINITION  D.melanogaster decapentaplegic gene complex (DPP-C), complete cds.
ACCESSION   M30116
KEYWORDS    .
SOURCE      D.melanogaster, cDNA to mRNA.
  ORGANISM  Drosophila melanogaster
            Eukaryotae; mitochondrial eukaryotes; Metazoa; Arthropoda;
            Tracheata; Insecta; Pterygota; Diptera; Brachycera; Muscomorpha;
            Ephydroidea; Drosophilidae; Drosophila.
REFERENCE   1 (bases 1 to 4001)
  AUTHORS   Padgett, R.W., St Johnston, R.D. and Gelbart, W.M.
  TITLE     A transcript from a Drosophila pattern gene predicts a protein
            homologous to the transforming growth factor-beta family
  JOURNAL   Nature 325, 81-84 (1987)
  MEDLINE   87090408
COMMENT     The initiation codon could be at either 1188-1190 or 1587-1589
FEATURES             Location/Qualifiers
     source          1..4001
                     /organism="Drosophila melanogaster"
                     /db_xref="taxon:7227"
     mRNA            <1..3918
                     /gene="dpp"
                     /note="decapentaplegic protein mRNA"
                     /db_xref="FlyBase:FBgn0000490"
     gene            1..4001
                     /note="decapentaplegic"
                     /gene="dpp"
                     /allele=""
                     /db_xref="FlyBase:FBgn0000490"
     CDS             1188..2954
                     /gene="dpp"
                     /note="decapentaplegic protein (1188 could be 1587)"
                     /codon_start=1
                     /db_xref="FlyBase:FBgn0000490"
                     /db_xref="PID:g157292"
                     /translation="MRAWLLLLAVLATFQTIVRVASTEDISQRFIAAIAPVAAHIPLA
                     SASGSGSGRSGSRSVGASTSTALAKAFNPFSEPASFSDSDKSHRSKTNKKPSKSDANR
                     ……………………
                     LGYDAYYCHGKCPFPLADHFNSTNAVVQTLVNNMNPGKVPKACCVPTQLDSVAMLYL
                     NDQSTBVVLKNYQEMTBBGCGCR"
BASE COUNT     1170 a   1078 c    956 g    797 t
ORIGIN
        1 gtcgttcaac agcgctgatc gagtttaaat ctataccgaa atgagcggcg gaaagtgagc
       61 cacttggcgt gaacccaaag ctttcgagga aaattctcgg acccccatat acaaatatcg
      121 gaaaaagtat cgaacagttt cgcgacgcga agcgttaaga tcgcccaaag atctccgtgc
      181 ggaaacaaag aaattgaggc actattaaga gattgttgtt gtgcgcgagt gtgtgtcttc
      241 agctgggtgt gtggaatgtc aactgacggg ttgtaaaggg aaaccctgaa atccgaacgg
      301 ccagccaaag caaataaagc tgtgaatacg aattaagtac aacaaacagt tactgaaaca
      361 gatacagatt cggattcgaa tagagaaaca gatactggag atgcccccag aaacaattca
      421 attgcaaata tagtgcgttg cgcgagtgcc agtggaaaaa tatgtggatt acctgcgaac
      481 cgtccgccca aggagccgcc gggtgacagg tgtatccccc aggataccaa cccgagccca
      541 gaccgagatc cacatccaga tcccgaccgc agggtgccag tgtgtcatgt gccgcggcat
      601 accgaccgca gccacatcta ccgaccaggt gcgcctcgaa tgcggcaaca caattttcaa
      ………………………….
     3841 aactgtataa acaaaacgta tgccctataa atatatgaat aactatctac atcgttatgc
     3901 gttctaagct aagctcgaat aaatccgtac acgttaatta atctagaatc gtaagaccta
     3961 acgcgtaagc tcagcatgtt ggataaatta atagaaacga g
//
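The feature locations and base counts in this entry can be cross-checked with a little arithmetic. The following is a minimal sketch, not an official GenBank parser: the helper names `parse_location` and `span_length` are hypothetical, and only the simple `start..end` location form (optionally with `<` or `>` partial markers) is handled.

```python
import re


def parse_location(location):
    """Parse a simple 'start..end' feature location (1-based, inclusive).

    Only the plain form used in the entry above is supported; joins and
    complements are out of scope for this sketch.
    """
    match = re.fullmatch(r"<?(\d+)\.\.>?(\d+)", location)
    if match is None:
        raise ValueError(f"unsupported location: {location!r}")
    return int(match.group(1)), int(match.group(2))


def span_length(location):
    """Number of bases covered by a 'start..end' location."""
    start, end = parse_location(location)
    return end - start + 1


# Values copied from the entry above.
cds_length = span_length("1188..2954")       # bases in the CDS
codons = cds_length // 3                      # includes the stop codon
alt_cds_length = span_length("1587..2954")   # alternative start from COMMENT

base_count = {"a": 1170, "c": 1078, "g": 956, "t": 797}
total_bases = sum(base_count.values())

print(cds_length, codons, alt_cds_length, total_bases)
```

The CDS spans 1767 bases, that is 589 codons, which corresponds to a 588-residue protein plus a stop codon and agrees with the length given in the SWISS-PROT entry for the same protein. The base counts sum to the 4001 bp stated on the LOCUS line.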

8 Example of sequence database entry for SWISS-PROT

ID   DECA_DROME     STANDARD;      PRT;   588 AA.
AC   P07713;
DT   01-APR-1988 (REL. 07, CREATED)
DT   01-APR-1988 (REL. 07, LAST SEQUENCE UPDATE)
DT   01-FEB-1995 (REL. 31, LAST ANNOTATION UPDATE)
DE   DECAPENTAPLEGIC PROTEIN PRECURSOR (DPP-C PROTEIN).
GN   DPP.
OS   DROSOPHILA MELANOGASTER (FRUIT FLY).
OC   EUKARYOTA; METAZOA; ARTHROPODA; INSECTA; DIPTERA.
RN   [1]
RP   SEQUENCE FROM N.A.
RM   87090408
RA   PADGETT R.W., ST JOHNSTON R.D., GELBART W.M.;
RL   NATURE 325:81-84 (1987)
RN   [2]
RP   CHARACTERIZATION, AND SEQUENCE OF 457-476.
RM   90258853
RA   PANGANIBAN G.E.F., RASHKA K.E., NEITZEL M.D., HOFFMANN F.M.;
RL   MOL. CELL. BIOL. 10:2669-2677(1990).
CC   -!- FUNCTION: DPP IS REQUIRED FOR THE PROPER DEVELOPMENT OF THE
CC       EMBRYONIC DORSAL HYPODERM, FOR VIABILITY OF LARVAE AND FOR CELL
CC       VIABILITY OF THE EPITHELIAL CELLS IN THE IMAGINAL DISKS.
CC   -!- SUBUNIT: HOMODIMER, DISULFIDE-LINKED.
CC   -!- SIMILARITY: TO OTHER GROWTH FACTORS OF THE TGF-BETA FAMILY.
DR   EMBL; M30116; DMDPPC.
DR   PIR; A26158; A26158.
DR   HSSP; P08112; 1TFG.
DR   FLYBASE; FBGN0000490; DPP.
DR   PROSITE; PS00250; TGF_BETA.
KW   GROWTH FACTOR; DIFFERENTIATION; SIGNAL.
FT   SIGNAL        1      ?       POTENTIAL.
FT   PROPEP        ?    456
FT   CHAIN       457    588       DECAPENTAPLEGIC PROTEIN.
FT   DISULFID    487    553       BY SIMILARITY.
FT   DISULFID    516    585       BY SIMILARITY.
FT   DISULFID    520    587       BY SIMILARITY.
FT   DISULFID    552    552       INTERCHAIN (BY SIMILARITY).
FT   CARBOHYD    120    120       POTENTIAL.
FT   CARBOHYD    342    342       POTENTIAL.
FT   CARBOHYD    377    377       POTENTIAL.
FT   CARBOHYD    529    529       POTENTIAL.
SQ   SEQUENCE   588 AA;  65850 MW;  1768420 CN;
     MRAWLLLLAV LATFQTIVRV ASTEDISQRF IAAIAPVAAH IPLASASGSG SGRSGSRSVG
     ASTSTAGAKA FNRFSEPASF SDSDKSHRSK TNKKPSKSDA NRQFNEVHKP RTDQLENSKN
     KSKQLVNKPN HNKMAVKEQR SHHKKSHHHR SHQPKQASAS TESHQSSSIE SIFVEEPTLV
     LDREVASINV PANAKAIIAE QGPSTYSKEA LIKDKLKPDP STYLVEIKSL LSLFNMKRPP
     KIDRSKIIIP EPMKKLYAEI MGHELDSVNI PKPGLLTKSA NTVRSFTHKD SKIDDRFPHH
     HRFRLHFDVK SIPADEKLKA AELQLTRDAL SQQVVASRSS ANRTRYQBLV YDITRVGVRG
     QREPSYLLLD TKTBRLNSTD TVSLDVQPAV DRWLASPQRN YGLLVEVRTV RSLKPAPHHH
     VRLRRSADEA HERWQHKQPL LFTYTDDGRH DARSIRDVSG GEGGGKGGRN KRHARRPTRR
     KNHDDTCRRH SLYVDFSDVG WDDWIVAPLG YDAYYCHGKC PFPLADHRNS TNHAVVQTLV
     NNMNPGKBPK ACCBPTQLDS VAMLYLNDQS TVVLKNYQEM TVVGCGCR
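SWISS-PROT entries are line-oriented: each line starts with a two-letter line code (ID, AC, DR, FT, ...) followed by its content. The sketch below assumes the standard layout in which the content starts at column 6; the function names `parse_swissprot_lines` and `cross_references` are hypothetical, and the sample lines are taken from the entry above.

```python
def parse_swissprot_lines(lines):
    """Group line contents by their two-letter line code."""
    record = {}
    for line in lines:
        # Assumed layout: 2-character code, 3 spaces, then the content.
        code, content = line[:2], line[5:].rstrip()
        record.setdefault(code, []).append(content)
    return record


def cross_references(record):
    """Split each DR line into (database, [identifiers])."""
    xrefs = []
    for content in record.get("DR", []):
        fields = [f.strip() for f in content.rstrip(".").split(";")]
        xrefs.append((fields[0], fields[1:]))
    return xrefs


sample = [
    "ID   DECA_DROME     STANDARD;      PRT;   588 AA.",
    "AC   P07713;",
    "DR   EMBL; M30116; DMDPPC.",
    "DR   PIR; A26158; A26158.",
    "DR   FLYBASE; FBGN0000490; DPP.",
]

record = parse_swissprot_lines(sample)
xrefs = cross_references(record)
for database, identifiers in xrefs:
    print(database, identifiers)
```

The DR lines are what link this protein entry to other databases; the EMBL cross-reference M30116 is the same accession as the GenBank entry for the dpp gene.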

9 Functional classification of E. coli genes according to Monica Riley

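10 Relational database. A table (relation) is a set, and the three basic table operations shown here are extensions of the standard set operations. Figure: a paper table (columns MUID, Journal, Volume, Pages, Year; rows Paper 1, Paper 2, Paper 3, Paper 4, ...) and an author table (columns MUID, Author; rows Author 1-1, Author 1-2, Author 2-1, Author 2-2, Author 2-3, Author 3-1, ...). SELECT extracts rows, PROJECT extracts columns, and JOIN combines the two tables on MUID into one table with columns MUID, Journal, Volume, Pages, Year, Author.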
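The three operations can be tried out with Python's built-in `sqlite3` module. This is a minimal sketch: the table and column names (`paper`, `author`, `muid`, ...) are assumptions modelled on the slide, and the rows are the two references cited in the GenBank and SWISS-PROT example entries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE paper (muid INTEGER PRIMARY KEY, journal TEXT,
                        volume INTEGER, pages TEXT, year INTEGER);
    CREATE TABLE author (muid INTEGER, author TEXT);
    """
)
conn.executemany(
    "INSERT INTO paper VALUES (?, ?, ?, ?, ?)",
    [
        (87090408, "Nature", 325, "81-84", 1987),
        (90258853, "Mol. Cell. Biol.", 10, "2669-2677", 1990),
    ],
)
conn.executemany(
    "INSERT INTO author VALUES (?, ?)",
    [
        (87090408, "Padgett R.W."),
        (87090408, "St Johnston R.D."),
        (87090408, "Gelbart W.M."),
        (90258853, "Panganiban G.E.F."),
        (90258853, "Rashka K.E."),
        (90258853, "Neitzel M.D."),
        (90258853, "Hoffmann F.M."),
    ],
)

# SELECT: keep only the rows that satisfy a condition.
selected = conn.execute("SELECT * FROM paper WHERE year = 1987").fetchall()

# PROJECT: keep only some columns (DISTINCT gives set semantics).
projected = conn.execute(
    "SELECT DISTINCT muid, journal FROM paper ORDER BY muid"
).fetchall()

# JOIN: combine rows of both tables that share the same MUID.
joined = conn.execute(
    """
    SELECT p.muid, p.journal, p.volume, p.pages, p.year, a.author
    FROM paper AS p JOIN author AS a ON p.muid = a.muid
    ORDER BY p.muid, a.author
    """
).fetchall()

for row in joined:
    print(row)
```

The join yields one row per (paper, author) pair, so the paper's attributes are repeated for every author; keeping papers and authors in separate tables avoids storing that repetition.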

11

12

13 A history of database technology development: object-oriented programming (Kay, 1972); relational database (Codd, 1970); logic programming (Kowalski, 1972); object-oriented database (1986); deductive database (1977); deductive, object-oriented database (1989)

14

15 Multimedia in GenomeNet

16

17 Pancreatic trypsin inhibitor PDB: 4PTI ribbon model and variant with cylinder for alpha helix (figures from PDB)

18

19 The periodic table of chemical elements where the shaded elements are those normally found in biology.

20 Biologically important classes of organic compounds derived from the six basic elements

21 The 20 common amino acids

22 BLOSUM: BLO(ck)SU(bstitution)M(atrix) (Henikoff & Henikoff 1992). Derived from a set of about 2000 aligned, ungapped regions (blocks) from protein families. It emphasizes chemical similarity between residues rather than how easily one residue mutates into another. BLOSUMx is derived from the set of segments clustered at x% identity. Figure: BLOSUM62 matrix, log-odds representation.
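A log-odds score compares how often two residues are observed aligned with how often they would be aligned by chance. The sketch below follows the usual half-bit convention, score = round(2 * log2(observed / expected)); the function name `log_odds_scores`, the two-letter alphabet and the pair counts are made up for illustration and are not BLOSUM data.

```python
import math


def log_odds_scores(pair_counts, scale=2.0):
    """Compute rounded log-odds scores from counts of aligned residue pairs.

    pair_counts maps an unordered pair (a, b), written with a <= b, to the
    number of times the two residues were seen aligned in the blocks.
    """
    total = sum(pair_counts.values())
    observed = {pair: count / total for pair, count in pair_counts.items()}

    # Background frequency of each residue, derived from the pair table.
    background = {}
    for (a, b), freq in observed.items():
        if a == b:
            background[a] = background.get(a, 0.0) + freq
        else:
            background[a] = background.get(a, 0.0) + freq / 2
            background[b] = background.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        # An unordered mismatch pair can arise in two ways, hence the factor 2.
        expected = background[a] * background[b] * (1 if a == b else 2)
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


# Hypothetical counts for a toy two-residue alphabet.
toy_counts = {("A", "A"): 60, ("A", "B"): 20, ("B", "B"): 20}
toy_scores = log_odds_scores(toy_counts)
print(toy_scores)
```

Pairs seen more often than chance get positive scores and pairs seen less often get negative scores, which is why identical and chemically similar residues score positively in the BLOSUM62 table.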

23 Substitution/scoring matrices. PAM matrices (Dayhoff et al. 1978): phylogeny-based. PAM1 corresponds to an expected 1% of residues mutated (one accepted point mutation per 100 residues). Figure: PAM250 matrix, log-odds representation.
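Matrices for larger evolutionary distances are obtained by multiplying the PAM1 mutation probability matrix by itself, so PAM250 corresponds to the 250th power of PAM1. The sketch below shows the extrapolation on a hypothetical two-letter alphabet with a 1% change per step; the real PAM1 is a 20 x 20 matrix estimated from data and is not reproduced here.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(matrix, power):
    """Raise a square matrix to a non-negative integer power by squaring."""
    n = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    base = [row[:] for row in matrix]
    while power > 0:
        if power & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        power >>= 1
    return result


# Toy "PAM1": each residue changes with probability 0.01 per step.
pam1_toy = [[0.99, 0.01],
            [0.01, 0.99]]
pam250_toy = mat_pow(pam1_toy, 250)

for row in pam250_toy:
    print([round(p, 4) for p in row])
```

After 250 steps far fewer than 250% of positions differ, because repeated and reverse mutations hit the same site. In the toy model the two residues end up almost equally likely, while the real PAM250 still retains detectable similarity.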

24

25 A hidden Markov model for sequence analysis. Figure: states m0 (Start), m1, m2, m3, m4, m5 (End); I0, I1, I2, I3, I4; d1, d2, d3, d4. m = match state (output), I = insert state (output), d = delete state (no output)
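The state layout in the figure can be generated programmatically. This is a minimal sketch assuming the commonly used profile HMM topology, in which every state at position k can move to the insert state at k or to the match or delete state at k+1; the slide itself does not list the arrows, so the transition set here is an assumption, and `profile_hmm_transitions` is a hypothetical helper.

```python
def profile_hmm_transitions(n_match):
    """Return (states, transitions) for a profile HMM with n_match positions.

    Naming follows the slide: m0 is Start and m(n_match + 1) is End.
    Match and insert states emit a residue; delete states are silent.
    """
    end = n_match + 1
    states = [f"m{k}" for k in range(end + 1)]
    states += [f"I{k}" for k in range(n_match + 1)]
    states += [f"d{k}" for k in range(1, n_match + 1)]

    transitions = []
    for k in range(n_match + 1):
        sources = [f"m{k}", f"I{k}"]
        if k >= 1:
            sources.append(f"d{k}")          # there is no d0
        targets = [f"I{k}", f"m{k + 1}"]      # insert here, or match next
        if k + 1 <= n_match:
            targets.append(f"d{k + 1}")      # skip the next position
        for source in sources:
            for target in targets:
                transitions.append((source, target))
    return states, transitions


states, transitions = profile_hmm_transitions(4)
print(len(states), "states,", len(transitions), "transitions")
```

With four match positions this reproduces the 15 states named in the figure. A path that follows only match states emits one residue per position, insert states add extra residues, and delete states skip a position without emitting anything.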

26

27 Globin fold: α protein, myoglobin PDB: 1MBN

28 β sandwich: β protein, immunoglobulin PDB: 7FAB

29 TIM barrel: α/β protein, Triose phosphate IsoMerase PDB: 1TIM

30 A fold in an α + β protein: ribonuclease A PDB: 7RSA

31

32 434 Cro protein complex (phage) PDB: 3CRO

33 Zinc finger DNA recognition (Drosophila) PDB: 2DRP ..YRCKVCSRVY THISNFCRHY VTSH...

34 Leucine zipper (yeast) PDB: 1YSA ..RA RKLQRMKQLE DKVEE LLSKN YHLENEVARL...

35 The orthologue group table for F1-F0 ATP synthase (upper) and V-type ATP synthase (lower).

36

37 Reactions and interactions. Biochemical pathways. Genome diversity. Note the Enzyme Commission (EC) number, a four-level numerical classification of enzymes by the reaction they catalyze.
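An EC number has four dot-separated levels, and the first level gives the main enzyme class. The sketch below splits an EC number and looks up that class; the function name `parse_ec` is hypothetical, and the table lists the six main classes in use when the book was published. The example uses triose phosphate isomerase, the enzyme shown on slide 29, whose EC number is 5.3.1.1.

```python
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec_number):
    """Split an EC number such as 'EC 5.3.1.1' into its four levels."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    levels = text.split(".")
    if len(levels) != 4:
        raise ValueError(f"expected four levels, got: {ec_number!r}")
    main_class = EC_CLASSES.get(int(levels[0]), "unknown")
    return {"levels": levels, "main_class": main_class}


tim = parse_ec("EC 5.3.1.1")
print(tim["main_class"], tim["levels"])
```

Because the number describes the reaction rather than the protein, enzymes from different organisms that catalyze the same reaction share one EC number, which makes it a convenient key for linking genes to pathway maps.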

38 The tree of life showing the relationship of archaea, bacteria, and eukaryotes, as well as the relationship of fungi, plants and animals. Figure labels: Animals, Fungi, Plants, Protists, Eukaryotes, Archaea, Bacteria

39

