Introduction to Databases

Presentation on theme: "Introduction to Databases"— Presentation transcript:

1 Introduction to Databases

2 INTRODUCTION

3 DATA Data is raw, unorganized facts that need to be processed. Example: each student's test score is one piece of data.
INFORMATION When data is processed, organized, structured or presented in a given context so as to make it useful, it is called information. Example: the average score of a class or of the entire school is information that can be derived from the given data.
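
The distinction can be shown with a few lines of Python. This is only an illustrative sketch: the student names and scores are made up, and the average is just one example of information derived from the data.

```python
from statistics import mean

# Hypothetical raw data: each test score is one piece of data.
scores = {"Asha": 72, "Ben": 85, "Chen": 91, "Dara": 64}

# Information: values derived from the data that are useful in context.
class_average = mean(list(scores.values()))
top_student = max(scores, key=scores.get)

print(f"Class average: {class_average}")
print(f"Highest score: {top_student}")
```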

4 Database A database is a collection of data in an organized manner, which is accessible in various ways. Biological databases serve a critical purpose in the collection and organization of data related to biological systems. They provide computational support and a user-friendly interface that let a researcher analyse biological data meaningfully.

5 A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria. Databases are composed of computer hardware and software for data management. The chief objective of the development of a database is to organize data in a set of structured records to enable easy retrieval of information. Each record, also called an entry, should contain a number of fields that hold the actual data items, for example, fields for names, phone numbers, addresses, dates.
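
A minimal sketch of the record-and-field idea, assuming a tiny in-memory "database". The `Record` class, the `search` helper and the sample entries are all hypothetical and exist only to illustrate retrieval by search criteria.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One entry; each attribute is a field holding a data item."""
    name: str
    phone: str
    address: str
    date: str


def search(records, **criteria):
    """Return the records whose fields equal every given criterion."""
    return [
        r for r in records
        if all(getattr(r, field) == value for field, value in criteria.items())
    ]


# Hypothetical entries.
records = [
    Record("A. Rao", "555-0101", "12 Hill Road", "2020-01-15"),
    Record("B. Lee", "555-0102", "34 Lake View", "2020-01-15"),
    Record("C. Diaz", "555-0103", "12 Hill Road", "2021-06-30"),
]

same_day = search(records, date="2020-01-15")
same_house = search(records, address="12 Hill Road", date="2021-06-30")
print([r.name for r in same_day])
print([r.name for r in same_house])
```

Real database systems add indexing and persistent storage on top of this idea, but the unit of retrieval is still a record made of fields.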

6 WHAT ARE BIOLOGICAL DATABASES?

7

8 Different classifications of databases
Type of data:
• nucleotide sequences
• protein sequences
• proteins
• sequence patterns or motifs
• macromolecular 3D structure
• gene expression data
• metabolic pathways

9

10 Different classifications of databases…
Primary or derived databases:
• Primary databases: experimental results submitted directly into the database
• Secondary databases: results of analysis of primary databases
• Aggregate of many databases: links to other data items, combination of data, consolidation of data

11 Different classifications of databases…
Availability:
• Publicly available, no restrictions
• Available, but with copyright
• Accessible, but not downloadable
• Academic, but not freely available
• Proprietary, commercial; possibly free for academics

12 TYPES OF DATABASES Primary Databases Secondary Databases

13 PRIMARY DATABASES
• Contain biomolecular data in its original form.
• Experimental results are submitted directly into the database by researchers, and the data are essentially archival in nature.
• Once given a database accession number, the data in primary databases are never changed.
• Examples: GenBank, EMBL and DDBJ for DNA/RNA sequences, SWISS-PROT and PIR for protein sequences, and PDB for molecular structures.

14 GenBank /genbank/ Database from NCBI, includes sequences from publicly available resources.

15 NCBI and Entrez
• One of the largest and most comprehensive databases, belonging to the NIH (National Institutes of Health, USA)
• Entrez is the search engine of NCBI
• Search for: genes, proteins, genomes, structures, diseases, publications and more.
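
Entrez searches can also be issued programmatically through NCBI's E-utilities web service, which is not covered in the slides. The sketch below only builds an `esearch` request URL and does not contact the server; the helper name `build_esearch_url` and the example query are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Public E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db, term, retmax=5):
    """Build (but do not send) an Entrez search request URL.

    db     -- Entrez database name, e.g. "protein", "nucleotide", "pubmed"
    term   -- free-text query
    retmax -- maximum number of record IDs to return
    """
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return ESEARCH_URL + "?" + urlencode(params)


url = build_esearch_url("protein", "carbonic anhydrase")
print(url)
```

Opening the printed URL in a browser, or fetching it with an HTTP client, should return a list of matching record IDs.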

16 GenBank
• An annotated collection of all publicly available nucleotide and protein sequences
• Set up in 1979 at LANL (Los Alamos)
• Maintained since 1992 by NCBI (Bethesda)

17 GenBank file format

18 GenBank file format

19

20 EMBL
• European Molecular Biology Laboratory
• Nucleic acid database from EBI (European Bioinformatics Institute)
• Produced in collaboration with DDBJ and GenBank
• Search engine: SRS (Sequence Retrieval System)

21 DDBJ
• DNA Data Bank of Japan
• Started in 1986 in collaboration with GenBank
• Produced and maintained at NIG (National Institute of Genetics)

22 SWISS-PROT
• Annotated sequence database established in 1986
• Consists of sequence entries made up of different line formats
• Similar format to EMBL …

23 PIR
• Protein Information Resource
• A division of the National Biomedical Research Foundation (NBRF) in the U.S.
• One can search for entries or do a sequence similarity search at the PIR site.

24 TrEMBL
• Translated European Molecular Biology Laboratory
• Computer-annotated supplement of SWISS-PROT.
• Contains all the translations of EMBL nucleotide sequence entries not yet integrated in SWISS-PROT.
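
"Translation" here means converting a coding nucleotide sequence into the protein it encodes. As a rough sketch, and not TrEMBL's actual pipeline, the function below translates a DNA string with the standard genetic code; the function name and the example sequence are made up for illustration.

```python
# Standard genetic code, with codons ordered by base in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a coding DNA sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")  # also accept RNA input
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")  # X = unknown codon
        if amino_acid == "*":  # stop codon
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))
```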

25 Protein Data Bank (PDB)
• Important in solving real problems in molecular biology
• Established in 1971 at Brookhaven National Laboratory (BNL)
• Sole international repository of macromolecular structure data
• Moved to the Research Collaboratory for Structural Bioinformatics (RCSB)

26 PDB: example
HEADER LYASE(OXO-ACID) 01-OCT-91 12CA 12CA 2
COMPND CARBONIC ANHYDRASE /II (CARBONATE DEHYDRATASE) (/HCA II) 12CA 3
SOURCE HUMAN (HOMO SAPIENS) RECOMBINANT PROTEIN CA 5
AUTHOR S.K.NAIR,D.W.CHRISTIANSON CA 6
REVDAT OCT-92 12CA CA 7
JRNL AUTH S.K.NAIR,T.L.CALDERONE,D.W.CHRISTIANSON,C.A.FIERKE 12CA 8
JRNL TITL ALTERING THE MOUTH OF A HYDROPHOBIC POCKET CA 9
JRNL TITL 2 STRUCTURE AND KINETICS OF HUMAN CARBONIC ANHYDRASE 12CA 10
JRNL TITL 3 /II$ MUTANTS AT RESIDUE VAL CA 11
JRNL REF J.BIOL.CHEM V CA 12
JRNL REFN ASTM JBCHA3 US ISSN CA 13
REMARK CA 14
REMARK AUTHORS HENDRICKSON,KONNERT CA 20
REMARK R VALUE CA 21
REMARK RMSD BOND DISTANCES ANGSTROMS CA 22
REMARK RMSD BOND ANGLES DEGREES CA 23
REMARK CA 24
REMARK 4 N-TERMINAL RESIDUES SER 2, HIS 3, HIS 4 AND C-TERMINAL CA 25
REMARK 4 RESIDUE LYS 260 WERE NOT LOCATED IN THE DENSITY MAPS AND, 12CA 26
REMARK 4 THEREFORE, NO COORDINATES ARE INCLUDED FOR THESE RESIDUES. 12CA 27
…
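
Each line of a PDB entry starts with a record name (HEADER, COMPND, AUTHOR, JRNL, REMARK and so on). The sketch below groups lines by that name. It is a simplification: real PDB files are fixed-column, with the record name in columns 1 to 6, whereas this version splits on whitespace because the excerpt above has lost its column layout. The helper name is hypothetical, and the trailing entry-ID and serial-number fields have been trimmed from the sample.

```python
from collections import defaultdict

# A few lines from the 12CA header shown above, trailing serial fields removed.
SAMPLE = """\
HEADER LYASE(OXO-ACID) 01-OCT-91 12CA
COMPND CARBONIC ANHYDRASE /II (CARBONATE DEHYDRATASE) (/HCA II)
AUTHOR S.K.NAIR,D.W.CHRISTIANSON
JRNL AUTH S.K.NAIR,T.L.CALDERONE,D.W.CHRISTIANSON,C.A.FIERKE
JRNL TITL ALTERING THE MOUTH OF A HYDROPHOBIC POCKET
"""


def group_records(text):
    """Map each record name to the list of its line contents."""
    records = defaultdict(list)
    for line in text.splitlines():
        fields = line.split(maxsplit=1)
        if not fields:  # skip blank lines
            continue
        name = fields[0]
        content = fields[1] if len(fields) > 1 else ""
        records[name].append(content)
    return dict(records)


records = group_records(SAMPLE)
for name, lines in records.items():
    print(name, len(lines))
```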

27 COMPOSITE DATABASES
• Collection of various primary database sequences
• Renders sequence searching highly efficient as it searches multiple resources
• Examples: NRDB (Non-Redundant Database), OWL, MIPSX, SWISS-PROT + TrEMBL

28

29 SECONDARY DATABASES
• Contain data derived from the results of analysing primary data
• Manually created or automatically generated
• Contain more relevant and useful information structured to specific requirements
• Examples: PROSITE, PRINTS, BLOCKS, Pfam

30 PROSITE
• Families of proteins
• Can be searched using regular expressions, similar to those used in Unix commands
• Families exhibit these patterns, so we can search over families
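
PROSITE patterns use their own notation, which maps closely onto regular expressions: elements are separated by `-`, `x` means any residue, `[ST]` means S or T, `{P}` means anything except P, and `(n)` or `(n,m)` gives a repeat count. The converter below is a minimal sketch covering only those features plus the `<` and `>` anchors at the ends of an element. The function name and the test sequences are assumptions, and the example pattern is the N-glycosylation site motif.

```python
import re

# One pattern element: optional "<", a residue spec, optional repeat, optional ">".
ELEMENT = re.compile(
    r"(<?)(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?(>?)"
)


def prosite_to_regex(pattern):
    """Convert a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        start, core, low, high, end = match.groups()
        if core == "x":
            core = "."                        # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any residue except these
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(("^" if start else "") + core + repeat + ("$" if end else ""))
    return "".join(parts)


regex = prosite_to_regex("N-{P}-[ST]-{P}.")
hit = re.search(regex, "MKNGSAL")    # made-up sequence containing the motif
miss = re.search(regex, "MKNPSAL")   # proline after N blocks the match
print(regex, hit.group(), miss)
```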

31 BLOCKS Motifs/blocks are created by automatically detecting the most conserved regions of each protein family.
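
To give a feel for "detecting the most conserved regions", here is a toy sketch that scans an existing multiple alignment for ungapped runs of conserved columns. It is not the method used to build BLOCKS. The scoring rule (fraction of sequences sharing the most common residue), the threshold, the minimum width and the sequences are all assumptions chosen for illustration.

```python
from collections import Counter


def conserved_blocks(aligned, threshold=0.8, min_width=3):
    """Return (start, end) column ranges, end exclusive, of conserved runs.

    A column counts as conserved when its most common residue is not a gap
    and is shared by at least `threshold` of the aligned sequences.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")

    blocks = []
    start = None
    for col in range(length + 1):  # one extra step closes a run at the end
        conserved = False
        if col < length:
            residue, count = Counter(seq[col] for seq in aligned).most_common(1)[0]
            conserved = residue != "-" and count / len(aligned) >= threshold
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_width:
                blocks.append((start, col))
            start = None
    return blocks


# Hypothetical aligned sequences from one family.
alignment = ["ACDEFGHIK", "TCDEFGQLM", "SCDEFGHVR"]
blocks = conserved_blocks(alignment)
print(blocks, [alignment[0][s:e] for s, e in blocks])
```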

32 PRIMARY VS SECONDARY DATABASES

