Creating NCBI The late Senator Claude Pepper recognized the importance of computerized information processing methods for the conduct of biomedical research and sponsored legislation that established the National Center for Biotechnology Information (NCBI) on November 4, 1988, as a division of the National Library of Medicine (NLM) at the National Institutes of Health (NIH).
What does NCBI do? Established in 1988 as a national resource for molecular biology information, NCBI creates public databases, conducts research in computational biology, develops software tools for analyzing genome data, and disseminates biomedical information, all for the better understanding of molecular processes affecting human health and disease.
OMIM, Online Mendelian Inheritance in Man. This database is a catalog of human genes and genetic disorders authored and edited by Dr. Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere, and developed for the World Wide Web by NCBI, the National Center for Biotechnology Information. The database contains textual information and references. It also contains copious links to MEDLINE and sequence records in the Entrez system, and links to additional related resources at NCBI and elsewhere.
Entrez is a search and retrieval system that integrates information from databases at NCBI.
GenBank is the NIH genetic sequence database. GenBank (at NCBI), the DNA DataBank of Japan (DDBJ), and the European Molecular Biology Laboratory (EMBL) together make up the International Nucleotide Sequence Database Collaboration. These three organizations exchange data on a daily basis. GenBank grows at an exponential rate, with the number of nucleotide bases doubling approximately every 14 months. Currently, GenBank contains more than 28 billion bases from over 250,000 species.
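The doubling claim above can be turned into simple arithmetic. The sketch below assumes the growth stays perfectly exponential with a fixed 14-month doubling time, which is a simplification; the function names `projected_bases` and `months_to_reach` are made up for this illustration and are not part of any NCBI tool.

```python
import math


def projected_bases(current_bases, months, doubling_months=14.0):
    """Project the database size after `months`, assuming steady doubling."""
    return current_bases * 2 ** (months / doubling_months)


def months_to_reach(current_bases, target_bases, doubling_months=14.0):
    """Months needed to grow from current_bases to target_bases."""
    return doubling_months * math.log2(target_bases / current_bases)


if __name__ == "__main__":
    start = 28e9  # "more than 28 billion bases", taken from the slide
    print(projected_bases(start, 14))      # size after one doubling period
    print(projected_bases(start, 28))      # size after two doubling periods
    print(months_to_reach(start, 100e9))   # months until 100 billion bases
```

Under this assumption, 28 billion bases become 56 billion after 14 months and 112 billion after 28 months. Real growth will deviate from the idealized curve.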
PubMed PubMed, a service of the National Library of Medicine, provides access to over 12 million MEDLINE citations dating back to the mid-1960s, as well as to additional life science journals. PubMed includes links to many sites providing full-text articles and other related resources.
What is BLAST? BLAST® (Basic Local Alignment Search Tool) is a set of similarity search programs designed to explore all of the available sequence databases regardless of whether the query is protein or DNA. The BLAST programs have been designed for speed, with a minimal sacrifice of sensitivity to distant sequence relationships. The scores assigned in a BLAST search have a well-defined statistical interpretation, making real matches easier to distinguish from random background hits. BLAST uses a heuristic algorithm that seeks local as opposed to global alignments and is therefore able to detect relationships among sequences that share only isolated regions of similarity (Altschul et al., 1990).
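To make the idea of a local alignment concrete, here is a minimal sketch of the Smith-Waterman dynamic programming score. This is not the BLAST algorithm: BLAST is a faster heuristic, while Smith-Waterman is the exhaustive method for the same kind of local alignment. The scoring values (match +2, mismatch -1, gap -2) and the function name `local_alignment_score` are assumptions chosen for illustration only.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty.
    Scores never drop below zero, so an alignment can start and end
    anywhere inside either sequence.
    """
    cols = len(b) + 1
    prev = [0] * cols  # previous row of the scoring matrix
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in sequence b
            left = curr[j - 1] + gap  # gap in sequence a
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    # The shared region ACGACG is found despite the unrelated flanks.
    print(local_alignment_score("TTTTACGACGTTTT", "ACGACG"))
    # Sequences with no characters in common score zero.
    print(local_alignment_score("AAAA", "TTTT"))
```

Because the score is reset to zero rather than allowed to go negative, the unrelated flanking regions do not penalize the shared segment. That is the sense in which a local method can detect isolated regions of similarity that a global alignment would dilute.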
Comparative Analysis Darwin’s comparison of morphological features of the Galapagos finches led him to postulate the theory of natural selection. When you compare the sequences of genes and proteins, you are performing the same type of analysis, just at another level.
The most common type of comparative method is sequence alignment. Comparison of one sequence to the entire database of known sequences is an important discovery technique for molecular biologists.
Explain: “One goal of sequence alignment is to enable the researcher to determine whether two sequences display sufficient similarity such that an inference of homology is justified.” Similarity = an observable quantity, often expressed as % identity. Homology = ? (Hint: there are no degrees of homology.)
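Since similarity is described as an observable quantity, it can be computed directly from an alignment. The sketch below assumes the two sequences are already aligned to equal length with `-` marking gaps, and it divides by the total number of alignment columns. Other conventions for the denominator exist, so treat this as one possible definition. The function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns where both sequences have the same residue.

    Assumes both inputs are rows of one pairwise alignment (equal length).
    Columns containing a gap never count as identical.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


if __name__ == "__main__":
    print(percent_identity("AAAA", "AAAT"))            # 3 of 4 columns match
    print(percent_identity("ACGT-ACGT", "ACGTTACGA"))  # 7 of 9 columns match
```

The function reports similarity only. Whether a given percentage justifies an inference of homology remains a judgment made by the researcher.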
BLAST tutorial: Introduction
Questions that might be answered from a BLAST search
1. How long is the sequence that you used to search the database?
2. What is the most likely identity of this sequence? What data supports this conclusion?
3. What organism is the source of the sequence? What is the common name for this organism?
Questions that might be answered from a BLAST search
4. What phylum contains this organism?
5. What is the accession number for this sequence?
6. Is this sequence expressed? How do you know?
7. If your sequence is expressed, where (tissue) and when is it expressed?
8. Is anything known about factors that cause your sequence to be expressed?
What is the difference between RefSeq and GenBank? The GenBank archival sequence database includes publicly available DNA sequences submitted from individual laboratories and large-scale sequencing projects. GenBank accession numbers are assigned to these submitted sequences. Submitted sequence data is exchanged between NCBI's GenBank, the EMBL Data Library (EMBL), and the DNA Data Bank of Japan (DDBJ) to achieve comprehensive worldwide coverage. As an archival database, GenBank can be very redundant for some loci. GenBank sequence records are owned by the original submitter and cannot be altered by a third party.
What is the difference between RefSeq and GenBank? RefSeq sequences are derived from GenBank and provide non-redundant curated data representing our current knowledge of known genes. Some records include additional sequence information that was never submitted to an archival database but is available in the literature. Some sequence records are provided through collaboration; the underlying primary sequence data is available in GenBank, but may not be available in any one GenBank record. RefSeq sequences are not submitted primary sequences. RefSeq records are owned by NCBI and therefore can be updated as needed to maintain current annotation or to incorporate additional sequence information.
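One practical way to tell the two kinds of records apart is the accession format: RefSeq accessions use a two-letter prefix followed by an underscore (for example NM_ or NP_), while GenBank accessions contain no underscore. The sketch below checks only this formatting convention. The regular expression is a simplified assumption rather than an official NCBI validator, the function name `looks_like_refseq` is hypothetical, and the accession strings are used purely as format examples.

```python
import re

# Two uppercase letters, an underscore, an alphanumeric body,
# and an optional ".version" suffix. Simplified for illustration.
REFSEQ_PATTERN = re.compile(r"^[A-Z]{2}_[A-Z0-9]+(\.\d+)?$")


def looks_like_refseq(accession):
    """Return True if the accession follows the RefSeq prefix_underscore style."""
    return bool(REFSEQ_PATTERN.match(accession.strip().upper()))


if __name__ == "__main__":
    for acc in ["NM_000546.6", "NP_000537", "U12345", "AF123456.1"]:
        kind = "RefSeq-style" if looks_like_refseq(acc) else "not RefSeq-style"
        print(acc, "->", kind)
```

A format check like this says nothing about whether a record actually exists or about its curation status. It only separates the two naming styles.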
UniGene UniGene is an experimental system for automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster contains sequences that represent a unique gene, as well as related information such as the tissue types in which the gene has been expressed and its map location.