SWISS-PROT The SWISS-PROT database consists of sequence entries. It contains high-quality annotation, is non-redundant and cross-referenced to many other databases. Release 39.0 of SWISS-PROT contains 86,593 sequence entries. SWISS-PROT is accompanied by TrEMBL, a computer-annotated supplement to SWISS-PROT. TrEMBL contains the translations of all coding sequences (CDS) present in the EMBL Nucleotide Sequence Database that are not yet integrated into SWISS-PROT.
TrEMBL TrEMBL release 17 (June 2001) was created from the EMBL Nucleotide Sequence Database release 66 and updates up to and contains 540,195 sequence entries, comprising 155,771,315 amino acids. TrEMBL is split into two main sections: SP-TrEMBL and REM-TrEMBL. SP-TrEMBL (SWISS-PROT TrEMBL) contains the entries that should eventually be incorporated into SWISS-PROT. It can be considered a preliminary section of SWISS-PROT, as all SP-TrEMBL entries have been assigned SWISS-PROT accession numbers. REM-TrEMBL (REMaining TrEMBL) contains the entries that we do not want to include in SWISS-PROT. REM-TrEMBL entries have no accession numbers.
Protein Information Resource (PIR) The Protein Information Resource (PIR), in collaboration with MIPS and JIPID, produces the PIR-International Protein Sequence Database (PIR-PSD), a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. The PIR-PSD, iProClass and other PIR auxiliary databases integrate sequence, functional, and structural information to support genomics and proteomics research.
News: UniProt UniProt (Universal Protein Resource) is the world's most comprehensive catalogue of information on proteins. It is a central repository of protein sequence and function created by joining the information contained in Swiss-Prot, TrEMBL, and PIR. UniProt comprises three components, each optimised for different uses. The UniProt Knowledgebase (UniProt) is the central access point for extensive curated protein information, including function, classification, and cross-references. The UniProt Non-redundant Reference (UniRef) databases combine closely related sequences into a single record to speed up searches. The UniProt Archive (UniParc) is a comprehensive repository, reflecting the history of all protein sequences. The sequences and information in UniProt are accessible via text search, BLAST similarity search, and FTP.
Entrez at NCBI Entrez is a retrieval system for searching several linked databases. It provides access to:
  PubMed: the biomedical literature
  Nucleotide: sequence database (GenBank)
  Protein: sequence database
  Structure: three-dimensional macromolecular structures
  Genome: complete genome assemblies
  PopSet: population study data sets
  OMIM: Online Mendelian Inheritance in Man
  Taxonomy: organisms in GenBank
  Books: online books
  ProbeSet: Gene Expression Omnibus (GEO)
  3D Domains: domains from Entrez Structure
Database links in Entrez
SRS at EBI SRS is a powerful data integration platform, providing rapid, easy and user-friendly access to the large volumes of diverse and heterogeneous Life Science data stored in more than 400 internal and public domain databases. SRS enables the querying of diverse biological and Life Science data through a single interface. It also facilitates the rapid development of applications and algorithms, as well as bioinformatics portals for the Internet or an intranet, making the data efficiently available to entire organizations. Today, SRS is answering the most demanding requirements of modern Life Science companies and will truly add value to their research programs.
SRS enables:
  Fast access to diverse life science data (genetic, protein, cellular, molecular, and clinical) for researchers and bioinformaticians
  Integration of public and proprietary data through one interface
  Unique ability to perform cross-database queries
  Rapid string search of large volumes of data
  Scalability to the customer's specific requirements
EBI, the European Bioinformatics Institute (EMBL Outstation, Hinxton, UK)
Different sequence formats

Here is a sequence in GCG format:

EXTRACTPEPTIDE of frames: C from: caupol.map (Linear) MAP of: caupol.raw check: 2457 from: 1 to: 3957
Frame C from: 1 to: 1318
caupol.pep Length: 941 August 27, :35 Type: P Check:
MAYPLLVLVD GHALAYRAFF ALRESGLRSS RGEPTYAVFG FAQILLTALA
51 EYRPDYAAVA FDVGRTFRDD LYAEYKAGRA ETPEEFYPQF ERIKQLVQAL
101 NIPIYTAEGY EADDVIGTLA RQATERGVDT IILTGDSDVL QLVNDHVRVA
151 LANPYGGKTS VTLYDLEQVR KRYDGLEPDQ LADLRGLKGD TSDNIPGVRG

Here is another in FASTA format:

>ECPOLA V00317 E. coli gene polA coding for DNA polymerase I. 9/93
CACCGGGCAACGGCGGCAGAAGTGTTTGGTTTGCCACTGGAAACCGTCACCAGCGAGCAA
CGCCGTAGCGCGAAAGCGATCAACTTTGGTCTGATTTATGGCATGAGTGCTTTCGGTCTG
GCGCGGCAATTGAACATTCCACGTAAAGAAGCGCAGAAGTACATGGACCTTTACTTCGAA
CGCTACCCTGGCGTGCTGGAGTATATGGAACGCACCCGTGCTCAGGCGAAAGAGCAGGGC
TACGTTGAAACGCTGGACGGACGCCGTCTGTATCTGCCGGATATCAAATCCAGCAATGGT
GCTCGTCGTGCAGCGGCTGAACGTGCAGCCATTAACGCGCCAATGCAGGGAACCGCCGCC
GA

And this is an example of a plain text file:

CGCCGTAGCGCGAAAGCGATCAACTTTGGTCTGATTTATGGCATGAGTGCTTTCGGTCTG
GCGCGGCAATTGAACATTCCACGTAAAGAAGCGCAGAAGTACATGGACCTTTACTTCGAA
CGCTACCCTGGCGTGCTGGAGTATATGGAACGCACCCGTGCTCAGGCGAAAGAGCAGGGC
TACGTTGAAACGCTGGACGGACGCCGTCTGTATCTGCCGGATATCAAATCCAGCAATGGT
GCTCGTCGTGCAGCGGCTGAACGTGCAGCCATTAACGCGCCAATGCAGGGAACCGCCGCC
GA
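To make the FASTA layout concrete, here is a minimal Python sketch that reads FASTA text into (header, sequence) pairs. It assumes only the common convention that a record is a line starting with ">" followed by one or more sequence lines. The function name parse_fasta is hypothetical and not part of any tool mentioned in the slides.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Assumption: each record is a '>' header line followed by sequence lines.
    """
    records = []
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)  # sequence line belonging to the current record
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# The first two sequence lines of the FASTA example from the slide.
example = """>ECPOLA V00317 E. coli gene polA coding for DNA polymerase I. 9/93
CACCGGGCAACGGCGGCAGAAGTGTTTGGTTTGCCACTGGAAACCGTCACCAGCGAGCAA
CGCCGTAGCGCGAAAGCGATCAACTTTGGTCTGATTTATGGCATGAGTGCTTTCGGTCTG
"""

records = parse_fasta(example)
header, sequence = records[0]
print(header.split()[0], len(sequence))
```

The header keeps the identifier and the free-text description together; splitting on whitespace, as in the last line, is one simple way to pull out the identifier.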
How do you convert from one format to another? ReadSeq ReadSeq can convert between 21 different sequence formats.
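As an illustration of what such a conversion involves, here is a minimal Python sketch that converts between FASTA and a plain text sequence file. It is not ReadSeq and does not use ReadSeq's interface; the function names fasta_to_plain and plain_to_fasta, and the default line width of 60, are assumptions made for this example. It covers only these two formats.

```python
def fasta_to_plain(fasta_text):
    """Drop '>' header lines and keep the sequence lines as plain text."""
    lines = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            lines.append(line)
    return "\n".join(lines) + "\n"


def plain_to_fasta(plain_text, header, width=60):
    """Wrap a plain sequence under a '>' header line.

    The header must be supplied by the caller, because a plain text file
    carries no identifier or description of its own.
    """
    sequence = "".join(plain_text.split())  # remove all whitespace and line breaks
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + header + "\n" + "\n".join(wrapped) + "\n"


fasta = plain_to_fasta("ACGTACGT\nACGT", "demo example sequence", width=8)
print(fasta)
print(fasta_to_plain(fasta))
```

Going from FASTA to plain text loses the header, so the conversion is not reversible without supplying that information again. This is one reason a dedicated converter is useful when formats carry different amounts of annotation.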