Jan Pačes, Institute of Molecular Genetics; Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry


1 Jan Pačes, Institute of Molecular Genetics; Jiří Vondrášek, Institute of Organic Chemistry and Biochemistry. Bioinformatics for PřF UK (Faculty of Science, Charles University), 2001

2 Databases: contents
  - principles
  - SQL
  - biological sequence formats
  - IUB codes
  - DNA databases
  - protein and genome databases
  - structural databases

3 Database organization: relational databases
  first table:
    c_id      identifier, number
    title     text
    journal   short text
    year      date
    …
  second table:
    a_id      identifier
    c_id      identifier
    name      short text
  third table:
    k_id      identifier
    c_id      identifier
    keyword   short text

4 SQL: Structured Query Language
    c_id      identifier, number
    title     text
    journal   short text
    year      date
    …

    CREATE TABLE article (
      c_id INTEGER,
      title TEXT,
      journal VARCHAR(30),
      year DATE
    );

5 SQL: Structured Query Language
    a_id      identifier
    c_id      identifier
    name      short text

    CREATE TABLE author (
      a_id INTEGER,
      c_id INTEGER,
      name VARCHAR(30)
    );

6 SQL: Structured Query Language
    -- INSERT ... SET is MySQL syntax; standard SQL uses INSERT INTO ... VALUES
    INSERT INTO article SET c_id='1', title='Something absolutely fantastic',
      journal='Bioinformatics', year='2002';
    INSERT INTO author SET a_id='1', c_id='1', name='Paces, Jan';
    INSERT INTO author SET a_id='2', c_id='1', name='Vondrasek, Jiri';

7 SQL: Structured Query Language
    SELECT article.title, article.journal, author.name
    FROM article, author
    WHERE article.c_id = author.c_id
      AND article.year > '2000'
      AND author.name LIKE 'Paces%';
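The CREATE TABLE, INSERT and SELECT statements from the SQL slides can be tried without a database server. The following is a minimal sketch using Python's built-in sqlite3 module, which is an assumption of this example and not something the slides prescribe. SQLite does not accept the MySQL-style INSERT ... SET form, so the standard VALUES form is substituted, and the query joins article with author. The helper names build_demo_db and find_articles are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database holding the article and author tables."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    cur.execute(
        "CREATE TABLE article (c_id INTEGER, title TEXT, "
        "journal VARCHAR(30), year DATE)"
    )
    cur.execute(
        "CREATE TABLE author (a_id INTEGER, c_id INTEGER, name VARCHAR(30))"
    )
    # Standard INSERT ... VALUES replaces the MySQL-only INSERT ... SET form.
    cur.execute(
        "INSERT INTO article (c_id, title, journal, year) VALUES (?, ?, ?, ?)",
        (1, "Something absolutely fantastic", "Bioinformatics", "2002"),
    )
    cur.executemany(
        "INSERT INTO author (a_id, c_id, name) VALUES (?, ?, ?)",
        [(1, 1, "Paces, Jan"), (2, 1, "Vondrasek, Jiri")],
    )
    conn.commit()
    return conn


def find_articles(conn, name_prefix, after_year):
    """Join article and author on c_id, filtering by year and author name."""
    query = (
        "SELECT article.title, article.journal, author.name "
        "FROM article, author "
        "WHERE article.c_id = author.c_id "
        "AND article.year > ? "
        "AND author.name LIKE ?"
    )
    return conn.execute(query, (after_year, name_prefix + "%")).fetchall()


if __name__ == "__main__":
    connection = build_demo_db()
    for row in find_articles(connection, "Paces", "2000"):
        print(row)
```

The shared c_id column is what links an author row to its article, so the condition article.c_id = author.c_id is the part of the query that performs the join.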

8 IUB codes

  Nucleotides
    code    nucleotides   complement
    A       A             T
    C       C             G
    G       G             C
    T (U)   T (U)         A
    M       AC            K
    R       AG            Y
    W       AT            W
    S       CG            S
    Y       CT            R
    K       GT            M
    V       ACG           B
    H       ACT           D
    D       AGT           H
    B       CGT           V
    N       ACGT          N
    -       gap           -

  Amino acids
    code    three-letter code   amino acid
    A       Ala                 alanine
    C       Cys                 cysteine
    D       Asp                 aspartic acid
    E       Glu                 glutamic acid
    F       Phe                 phenylalanine
    G       Gly                 glycine
    H       His                 histidine
    I       Ile                 isoleucine
    K       Lys                 lysine
    L       Leu                 leucine
    M       Met                 methionine
    N       Asn                 asparagine
    P       Pro                 proline
    Q       Gln                 glutamine
    R       Arg                 arginine
    S       Ser                 serine
    T       Thr                 threonine
    V       Val                 valine
    W       Trp                 tryptophan
    Y       Tyr                 tyrosine
    B       Asx                 aspartic acid or asparagine
    Z       Glx                 glutamic acid or glutamine
    X       Xxx                 any amino acid
    *       ---                 stop
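The nucleotide codes can be put to work directly when computing a reverse complement. The sketch below is illustrative only; the names COMPLEMENT and reverse_complement are hypothetical. Note that W (A or T) and S (C or G) are each their own complement, because swapping A with T, or C with G, leaves the set unchanged.

```python
# Complement of every IUB nucleotide code, including ambiguity codes.
COMPLEMENT = {
    "A": "T", "C": "G", "G": "C", "T": "A", "U": "A",
    "M": "K", "R": "Y", "W": "W", "S": "S", "Y": "R", "K": "M",
    "V": "B", "H": "D", "D": "H", "B": "V",
    "N": "N", "-": "-",
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence given in IUB codes."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(reverse_complement("aamr"))
```

An unknown character raises KeyError, which is a deliberate choice in this sketch: silently passing unrecognized symbols through would hide malformed input.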

9 Sequence formats
  binary
    with chromatograms: SCF, ALF, ABI
    for programs: internal database formats
  text
    minimal: text, fasta
    annotated: EMBL, GenBank, ASN, XML
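The text formats listed here can usually be told apart by the first non-empty line of a file. The following is a rough sketch under that assumption; the function name guess_sequence_format and the returned labels are hypothetical, and binary formats such as SCF, ALF and ABI are deliberately not handled.

```python
def guess_sequence_format(text):
    """Guess a text sequence format from the first non-empty line."""
    first = next((line for line in text.splitlines() if line.strip()), "")
    if first.startswith(">"):
        return "fasta"      # FASTA header line starts with '>'
    if first.startswith("ID "):
        return "embl"       # EMBL entries start with an ID line
    if first.startswith("LOCUS"):
        return "genbank"    # GenBank entries start with a LOCUS line
    if first.startswith("Seq-entry"):
        return "asn1"       # ASN.1 text dump of a Seq-entry
    if first.lstrip().startswith("<"):
        return "xml"
    return "unknown"


if __name__ == "__main__":
    print(guess_sequence_format(">seq1 example\nACGT\n"))
```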

10 Sequence formats - SCF: SCF (Standard Chromatogram Format)

11 Sequence formats - EMBL: EMBL (EMBL database format)
  ID   AF standard; RNA; ROD; 1379 BP.
  XX
  AC   AF031150;
  XX
  SV   AF
  XX
  DT   27-FEB-1998 (Rel. 54, Created)
  DT   27-FEB-1998 (Rel. 54, Last updated, Version 1)
  XX
  DE   Mus musculus paired-box transcription factor (Pax4) mRNA, complete cds.
  XX
  KW   .
  XX
  OS   Mus musculus (house mouse)
  OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
  OC   Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
  XX
  RN   [1]
  RP
  RA   Inoue H., Nomiyama J., Nakai K., Matsutani A., Tanizawa Y., Oka Y.;
  RT   Isolation of full-length cDNA of mouse PAX4 gene and identification of its
  RT   human homologue;
  RL   Biochem. Biophys. Res. Commun. 243: (1998).
  XX
  RN   [2]
  RP
  RA   Inoue H., Nomiyama J., Nakai K., Tanizawa Y., Oka Y.;
  RT   ;
  RL   Submitted (23-OCT-1997) to the EMBL/GenBank/DDBJ databases.
  RL   Third Dept. of Int. Med., Yamaguchi University, 1144 Kogushi, Ube,
  RL   Yamaguchi 755, Japan
  XX
  FH   Key Location/Qualifiers
  …

12 Sequence formats - EMBL: EMBL (EMBL database format)
  …
  FH   Key Location/Qualifiers
  FH
  FT   source
  FT       /db_xref=taxon:10090
  FT       /organism=Mus musculus
  FT       /cell_line=MIN6
  FT   CDS
  FT       /codon_start=1
  FT       /gene=Pax4
  FT       /product=paired-box transcription factor
  FT       /protein_id=AAC
  FT       /translation=MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDISR
  FT       SLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWEIQ
  FT       HQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCGAPR
  FT       GPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWFSNRR
  FT       AKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSPSFCQL
  FT       CCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLIGGPGQV
  FT       PSTHCSNWP
  XX
  SQ   Sequence 1379 BP; 327 A; 402 C; 347 G; 303 T; 0 other;
       aaaaaaaaaa aaaaagcggc cgctgaattc tagcagaagg ctgccctctg ctcctgagtg 60
       aaggctctgt gaagctctgg accccctggc aggactgaag cagctggagg ctgttacaag 120
       accagaccac cagcaaaccc tggagcctgc acaggaccct gagacctctt cctggaattc 180
       ccaccttttt tcctccatcc agaaccagtc ccaaagagaa acttccagaa ggagctctcc 240
       gttttcagtt tgccagttgg cttcctgtcc ttctgtgagg agtaccagtg tgaagcatgc 300
       agcaggacgg actcagcagt gtgaatcagc tagggggact ctttgtgaat ggccggcccc 360
       …
       gctgtgggac agcaccaggc agatgttcca gtgacacctc atcccaggcc tatctccaac 1200
       cctactggga ctgccaatcc ctccttcctg tggcttcctc ctcatatgtg gaatttgcct 1260
       ggccctgcct caccacccat cctgtgcatc atctgattgg aggcccagga caagtgccat 1320
       caacccattg ctcaaactgg ccataagagg cctctatttg acagtaataa aaacctttt 1379
  //
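Each line of an EMBL entry begins with a two-letter line code (ID, AC, DE, FT, SQ and so on), XX lines are spacers, and // ends the entry. A minimal sketch of a reader built on that convention follows; it assumes the entry is available as text with one record line per line and no leading indentation, and the function name parse_embl_lines is hypothetical. It is not a full EMBL parser.

```python
from collections import defaultdict


def parse_embl_lines(record):
    """Group EMBL lines by their two-letter code and collect the sequence."""
    fields = defaultdict(list)
    sequence = []
    in_sequence = False
    for line in record.splitlines():
        if line.startswith("//"):
            break  # end of entry
        if in_sequence:
            # Sequence lines hold blocks of bases followed by a running count;
            # keep only the letters.
            sequence.append("".join(ch for ch in line if ch.isalpha()))
            continue
        code, _, value = line.partition(" ")
        if not code or code == "XX":
            continue  # blank line or spacer line
        if code == "SQ":
            in_sequence = True
        fields[code].append(value.strip())
    return dict(fields), "".join(sequence)


if __name__ == "__main__":
    demo = (
        "ID   DEMO standard; RNA; ROD; 12 BP.\n"
        "XX\n"
        "DE   Example entry\n"
        "XX\n"
        "SQ   Sequence 12 BP;\n"
        "     aaaaaaaaaa aa        12\n"
        "//\n"
    )
    print(parse_embl_lines(demo))
```

Multi-line fields such as DE, OC or RT come back as a list of lines under one code, so the caller decides how to join them.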

13 Sequence formats - GenBank: GenBank
  LOCUS       AF bp mRNA ROD 23-OCT-1999
  DEFINITION  Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.
  ACCESSION   AF
  VERSION     AF GI:
  KEYWORDS    .
  SOURCE      house mouse.
    ORGANISM  Mus musculus
              Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
              Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
  REFERENCE   1 (bases 1 to 1360)
    AUTHORS   Kalousova,A., Benes,V., Paces,J., Paces,V. and Kozmik,Z.
    TITLE     DNA binding and transactivating properties of the paired and
              homeobox protein Pax4
    JOURNAL   Biochem. Biophys. Res. Commun. 259 (3), (1999)
    MEDLINE
    PUBMED
  REFERENCE   2 (bases 1 to 1360)
    AUTHORS   Kalousova,A., Paces,J. and Kozmik,Z.
    TITLE     Direct Submission
    JOURNAL   Submitted (23-APR-1999) Dept. of Transcription Regulation,
              Institute of Molecular Genetics, Videnska 1083, Prague , Czech
              Republic
  FEATURES             Location/Qualifiers
       source
                       /organism="Mus musculus"
                       /db_xref="taxon:10090"
       gene
                       /gene="Pax4"
       CDS
                       /gene="Pax4"
                       /note="DNA binding protein; paired box protein; homeobox
                       protein"
                       /codon_start=1
                       /product="transcription factor PAX4"
                       /protein_id="AAF "
  …

14 Sequence formats - GenBank: GenBank
       CDS
                       /gene="Pax4"
                       /note="DNA binding protein; paired box protein; homeobox
                       protein"
                       /codon_start=1
                       /product="transcription factor PAX4"
                       /protein_id="AAF "
                       /db_xref="GI: "
                       /translation="MQQDGLSSVNQLGGLFVNGRPLPLDTRQQIVQLAIRGMRPCDIS
                       RSLKVSNGCVSKILGRYYRTGVLEPKCIGGSKPRLATPAVVARIAQLKDEYPALFAWE
                       IQHQLCTEGLCTQDKAPSVSSINRVLRALQEDQSLHWTQLRSPAVLAPVLPSPHSNCG
                       APRGPHPGTSHRNRTIFSPGQAEALEKEFQRGQYPDSVARGKLAAATSLPEDTVRVWF
                       SNRRAKWRRQEKLKWEAQLPGASQDLTVPKNSPGIISAQQSPGSVPSAALPVLEPLSP
                       SFCQLCCGTAPGRCSSDTSSQAYLQPYWDCQSLLPVASSSYVEFAWPCLTTHPVHHLI
                       GGPGQVPSTHCSNWP"
  BASE COUNT  359 a 381 c 328 g 292 t
  ORIGIN
          1 tggcaggact gaagcagctg gaggctgtta caagaccaga ccaccagcaa accctggagc
         61 ctgcacagga ccctgagacc tcttcctgga attcccacct tttttcctcc atccagaacc
        121 agtcccaaag agaaacttcc agaaggagct ctccgttttc agtttgccag ttggcttcct
        181 gtccttctgt gaggagtacc agtgtgaagc atgcagcagg acggactcag cagtgtgaat
        …
       1081 tccagtgaca cctcatccca ggcctatctc caaccctact gggactgcca atccctcctt
       1141 cctgtggctt cctcctcata tgtggaattt gcctggccct gcctcaccac ccatcctgtg
       1201 catcatctga ttggaggccc aggacaagtg ccatcaaccc attgctcaaa ctggccataa
       1261 gaggcctcta tttgacagta ataaaaacct tttcttagat gttaaaaaaa aaaaaaaaaa
       1321 aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa aaaaaaaaaa
  //
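In a GenBank entry the sequence follows the ORIGIN keyword, with a position number at the start of each line, and the BASE COUNT line summarizes its composition. The sketch below extracts the sequence and recomputes the counts so they can be compared with BASE COUNT; it assumes ORIGIN and // start at the beginning of a line, and the function names genbank_origin_sequence and base_count are hypothetical.

```python
from collections import Counter


def genbank_origin_sequence(record):
    """Return the sequence found between ORIGIN and // in a GenBank entry."""
    parts = []
    in_origin = False
    for line in record.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
            continue
        if line.startswith("//"):
            break
        if in_origin:
            # Drop the leading position number and the spaces between blocks.
            parts.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(parts)


def base_count(seq):
    """Count each base, mirroring the BASE COUNT line of the entry."""
    return Counter(seq.lower())


if __name__ == "__main__":
    demo = "LOCUS       DEMO\nORIGIN\n        1 tggcaggact gaagc\n//\n"
    sequence = genbank_origin_sequence(demo)
    print(sequence, dict(base_count(sequence)))
```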

15 Sequence formats - FastA: fasta
  >gi| |gb|AF |AF Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds
  TGGCAGGACTGAAGCAGCTGGAGGCTGTTACAAGACCAGACCACCAGCAAACCCTGGAGCCTGCACAGGA
  CCCTGAGACCTCTTCCTGGAATTCCCACCTTTTTTCCTCCATCCAGAACCAGTCCCAAAGAGAAACTTCC
  AGAAGGAGCTCTCCGTTTTCAGTTTGCCAGTTGGCTTCCTGTCCTTCTGTGAGGAGTACCAGTGTGAAGC
  ATGCAGCAGGACGGACTCAGCAGTGTGAATCAGCTAGGGGGACTCTTTGTGAATGGCCGGCCCCTTCCTC
  TGGACACCAGGCAGCAGATTGTGCAGCTAGCAATAAGAGGGATGCGACCCTGTGACATTTCACGGAGCCT
  TAAGGTATCTAATGGCTGTGTGAGCAAGATCCTAGGACGCTACTACCGCACAGGTGTCTTGGAACCCAAG
  TGTATTGGGGGAAGCAAACCACGTCTGGCCACACCTGCTGTGGTGGCTCGAATTGCCCAGCTAAAGGATG
  AGTACCCTGCTCTTTTTGCCTGGGAGATCCAACACCAGCTTTGCACTGAAGGGCTTTGTACCCAGGACAA
  GGCTCCCAGTGTGTCCTCTATCAATCGAGTACTTCGGGCACTTCAGGAAGACCAGAGCTTGCACTGGACT
  CAACTCAGATCACCAGCTGTGTTGGCTCCAGTTCTTCCCAGTCCCCACAGTAACTGTGGGGCTCCCCGAG
  GCCCCCACCCAGGAACCAGCCACAGGAATCGGACTATCTTCTCCCCGGGACAAGCCGAGGCACTGGAGAA
  AGAGTTTCAGCGTGGGCAGTATCCAGATTCAGTGGCCCGTGGGAAGCTGGCTGCTGCCACCTCTCTGCCT
  GAAGACACGGTGAGGGTTTGGTTTTCTAACAGAAGAGCCAAATGGCGCAGGCAAGAGAAGCTGAAATGGG
  AAGCACAGCTGCCAGGTGCTTCCCAGGACCTGACAGTACCAAAAAATTCTCCAGGGATCATCTCTGCACA
  GCAGTCCCCCGGCAGTGTACCCTCAGCTGCCTTGCCTGTGCTGGAACCATTGAGTCCTTCCTTCTGTCAG
  CTATGCTGTGGGACAGCACCAGGCAGATGTTCCAGTGACACCTCATCCCAGGCCTATCTCCAACCCTACT
  GGGACTGCCAATCCCTCCTTCCTGTGGCTTCCTCCTCATATGTGGAATTTGCCTGGCCCTGCCTCACCAC
  CCATCCTGTGCATCATCTGATTGGAGGCCCAGGACAAGTGCCATCAACCCATTGCTCAAACTGGCCATAA
  GAGGCCTCTATTTGACAGTAATAAAAACCTTTTCTTAGATGTTAAAAAAAAAAAAAAAAAAAAAAAAAAA
  AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
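FASTA is the simplest of the text formats: a header line starting with > followed by one or more lines of sequence. A minimal reader might look like the sketch below; the function name read_fasta is hypothetical, and the input is assumed to be text with one line per row.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    demo = ">seq1 first example\nACGT\nTTGA\n>seq2\nGGCC\n"
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```

Sequence lines are simply concatenated, so the line width used in the file (70 characters in the entry shown on the slide) does not matter to the reader.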

16 Sequence formats - ASN: ASN
  Seq-entry ::= set {
    class nuc-prot,
    descr {
      title "Mus musculus transcription factor PAX4 (Pax4) mRNA, complete cds.",
      source {
        org {
          taxname "Mus musculus",
          common "house mouse",
          db {
            {
              db "taxon",
              tag id } },
          orgname {
            name binomial {
              genus "Mus",
              species "musculus" },
            lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus",
            gcode 1,
            mgcode 2,
            div "ROD" } } },
      pub {
        pub {
          sub {
            authors {
              names std

17 Bioinformatic Links

18 GenBank

19 Swiss-Prot

20 Entrez
  - Literature (PubMed)
  - Nucleotide (GenBank)
  - Protein (PIR)
  - Genome
  - Structure (PDB)
  - PopSet
  - Taxonomy
  - OMIM

21 Entrez

24 SRS

31 SRS - list

34 PDB

37 HEADER GENE REGULATION/DNA 22-APR-99 6PAX
  TITLE CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED DOMAIN-DNA
  TITLE 2 COMPLEX REVEALS A GENERAL MODEL FOR PAX PROTEIN-DNA
  TITLE 3 INTERACTIONS
  COMPND MOL_ID: 1;
  COMPND 2 MOLECULE: HOMEOBOX PROTEIN PAX-6;
  COMPND 3 CHAIN: A;
  COMPND 4 ENGINEERED: YES;
  COMPND 5 BIOLOGICAL_UNIT: MONOMER;
  COMPND 6 MOL_ID: 2;
  COMPND 7 MOLECULE: 26 NUCLEOTIDE DNA;
  COMPND 8 CHAIN: B;
  COMPND 9 ENGINEERED: YES;
  COMPND 10 BIOLOGICAL_UNIT: MONOMER;
  COMPND 11 MOL_ID: 3;
  COMPND 12 MOLECULE: 26 NUCLEOTIDE DNA;
  COMPND 13 CHAIN: C;
  COMPND 14 ENGINEERED: YES;
  COMPND 15 BIOLOGICAL_UNIT: MONOMER
  SOURCE MOL_ID: 1;
  SOURCE 2 ORGANISM_SCIENTIFIC: HOMO SAPIENS;
  SOURCE 3 ORGANISM_COMMON: HUMAN;
  SOURCE 4 GENE: PAX6;
  SOURCE 5 EXPRESSION_SYSTEM: ESCHERICHIA COLI;
  SOURCE 6 EXPRESSION_SYSTEM_STRAIN: BL21(DE3);
  SOURCE 7 MOL_ID: 2;
  SOURCE 8 SYNTHETIC: YES;
  SOURCE 9 MOL_ID: 3;
  SOURCE 10 SYNTHETIC: YES
  KEYWDS PAX, PAIRED DOMAIN, TRANSCRIPTION, PROTEIN-DNA INTERACTIONS,
  KEYWDS 2 GENE REGULATION/DNA
  EXPDTA X-RAY DIFFRACTION
  AUTHOR H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
  REVDAT 1 13-JUL-99 6PAX 0
  JRNL AUTH H.E.XU,M.A.ROULD,W.XU,J.A.EPSTEIN,R.L.MAAS,C.O.PABO
  JRNL TITL CRYSTAL STRUCTURE OF THE HUMAN PAX-6 PAIRED
  JRNL TITL 2 DOMAIN-DNA COMPLEX REVEALS SPECIFIC ROLES FOR THE
  JRNL TITL 3 LINKER REGION AND THE CARBOXY-TERMINAL SUBDOMAIN
  JRNL TITL 4 IN DNA BINDING

38 PDB
  SEQRES 1 A 133 SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL
  SEQRES 2 A 133 ASN GLY ARG PRO LEU PRO ASP SER THR ARG GLN ARG ILE
  SEQRES 3 A 133 VAL GLU LEU ALA HIS SER GLY ALA ARG PRO CYS ASP ILE
  SEQRES 4 A 133 SER ARG ILE LEU GLN VAL SER ASN GLY CYS VAL SER LYS
  SEQRES 5 A 133 ILE LEU GLY ARG TYR TYR ALA THR GLY SER ILE ARG PRO
  SEQRES 6 A 133 ARG ALA ILE GLY GLY SER LYS PRO ARG VAL ALA THR PRO
  SEQRES 7 A 133 GLU VAL VAL SER LYS ILE ALA GLN TYR LYS GLN GLU CYS
  SEQRES 8 A 133 PRO SER ILE PHE ALA TRP GLU ILE ARG ASP ARG LEU LEU
  SEQRES 9 A 133 SER GLU GLY VAL CYS THR ASN ASP ASN ILE PRO SER VAL
  SEQRES 10 A 133 SER SER ILE ASN ARG VAL LEU ARG ASN LEU ALA SER GLU
  SEQRES 11 A 133 LYS GLN GLN
  SEQRES 1 B 26 A A G C A T T T T C A C G
  SEQRES 2 B 26 C A T G A G T G C A C A G
  SEQRES 1 C 26 T T C T G T G C A C T C A
  SEQRES 2 C 26 T G C G T G A A A A T G C
  FORMUL 4 HOH *84(H2 O1)
  HELIX 1 1 ASP A 20 HIS A
  HELIX 2 2 PRO A 36 LEU A
  HELIX 3 3 ASN A 47 THR A
  HELIX 4 4 PRO A 78 GLU A
  HELIX 5 5 ALA A 96 SER A
  HELIX 6 6 VAL A 117 GLU A
  SHEET 1 A 2 SER A 3 VAL A 5 0
  SHEET 2 A 2 VAL A 11 VAL A N PHE A 12 O GLY A 4
  CRYST P
  ORIGX
  ORIGX
  ORIGX
  SCALE
  SCALE
  SCALE
  ATOM 1 N SER A N
  ATOM 2 CA SER A C
  ATOM 3 C SER A C
  ATOM 4 O SER A O
  ATOM 5 CB SER A C
  ATOM 6 OG SER A O
  ATOM 7 H SER A H
  ATOM 8 HG SER A H
  ATOM 9 N HIS A N
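The SEQRES records list residues per chain using three-letter amino acid codes (or single letters for nucleotides), so converting them to a one-letter sequence ties the PDB entry back to the sequence formats shown earlier. The sketch below is a simplification: it splits each record on whitespace and assumes a chain identifier is always present, whereas the real PDB format is column-based. The names THREE_TO_ONE and seqres_to_sequences are hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "CYS": "C", "ASP": "D", "GLU": "E", "PHE": "F",
    "GLY": "G", "HIS": "H", "ILE": "I", "LYS": "K", "LEU": "L",
    "MET": "M", "ASN": "N", "PRO": "P", "GLN": "Q", "ARG": "R",
    "SER": "S", "THR": "T", "VAL": "V", "TRP": "W", "TYR": "Y",
}


def seqres_to_sequences(pdb_text):
    """Return {chain_id: one-letter sequence} built from SEQRES records."""
    chains = {}
    for line in pdb_text.splitlines():
        fields = line.split()
        # Expected layout: SEQRES, serial number, chain, length, residues...
        if len(fields) < 5 or fields[0] != "SEQRES":
            continue
        chains.setdefault(fields[2], []).extend(fields[4:])

    def one_letter(residue):
        if residue in THREE_TO_ONE:
            return THREE_TO_ONE[residue]
        # Single-letter residues (nucleotides) pass through; others become X.
        return residue if len(residue) == 1 else "X"

    return {
        chain: "".join(one_letter(r) for r in residues)
        for chain, residues in chains.items()
    }


if __name__ == "__main__":
    demo = (
        "SEQRES   1 A  133  SER HIS SER GLY VAL ASN GLN LEU GLY GLY VAL PHE VAL\n"
        "SEQRES   1 B   26    A   A   G   C   A   T   T   T   T   C   A   C   G\n"
    )
    print(seqres_to_sequences(demo))
```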

39 SCOP

40 PDBsum

43 CATH

45 FSSP - Fold classification

46 Structural genomics

47 Bioinformatics web portals
  EBI: http://www.ebi.ac.uk/Tools
  ExPASy: http://www.expasy.ch
  Pasteur: http://bioweb.pasteur.fr
  Lyon: http://pbil.univ-lyon1.fr
  NCBI: http://ncbi.nlm.nih.gov

48 EBI

49 ExPASy

50 PBIL

51 Pasteur

52 Bioinformatic Links

