Keeping Current: Genetics Resources. This workshop will provide an overview of NCBI resources for finding-- Background information & journal articles.


1 Keeping Current: Genetics Resources

2 This workshop will provide an overview of NCBI resources for finding-- Background information & journal articles Nucleotide sequences Protein sequences Three-dimensional structures Complete genomes and maps

3 Information Pyramid Start at the bottom of the pyramid for background information

4 Finding Background Information Genetics Home Reference Consumer information about genetic conditions Includes condition summaries, gene summaries, chromosome summaries, and glossary http://ghr.nlm.nih.gov

5 Finding Background Information OMIM--Online Mendelian Inheritance in Man database Catalog of human genes and genetic disorders Links to MEDLINE and sequence information http://www.ncbi.nlm.nih.gov/sites/entrez?db=omim

6 Finding Background Information NCBI Bookshelf Collection of biomedical books Searchable by topic http://www.ncbi.nlm.nih.gov/sites/entrez?db=books

7 Finding Journal Articles PubMed/MEDLINE Coverage from the early 1950s to the present, with over 17 million articles. Remote access to licensed full-text articles available via Get It @ VCU Helpful limits and links to other databases including OMIM, NCBI Books, and Entrez databases Import citations into RefWorks Tutorials available at http://www.nlm.nih.gov/bsd/disted/pubmed.html

8 Using Entrez to Find Molecular Biology Information The Data Many different types of data domains (databases) Sources are generally comprehensive databases Domain can include both primary (archival) and curated data Search Features Text term searches Related records Links across databases

9 Searching all databases at once

10 Searching Individual Entrez Databases Four Ways to Search Basic Advanced Method 1 Do a separate search for each term or phrase and combine searches using History Advanced Method 2 Stack your query one step at a time (iterative searching) using Preview/Index Complex Boolean Query
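
The "Complex Boolean Query" option above can be sketched as a small helper that assembles a query string from individual terms. This is only an illustration: the helper names `field_term` and `combine` are hypothetical, and the field tags shown (`[Gene Name]`, `[Organism]`) are assumed examples of Entrez field restrictions, so check the field list of the database you are searching.

```python
def field_term(term, field=None):
    """Return one search term, optionally restricted to a field tag."""
    # Multi-word terms are quoted so they are searched as a phrase.
    text = f'"{term}"' if " " in term else term
    return f"{text}[{field}]" if field else text


def combine(operator, *parts):
    """Join query parts with a Boolean operator and wrap them in parentheses."""
    operator = operator.upper()
    if operator not in {"AND", "OR", "NOT"}:
        raise ValueError("operator must be AND, OR, or NOT")
    return "(" + f" {operator} ".join(parts) + ")"


# Illustrative query: one gene name, restricted to either of two organisms.
organisms = combine(
    "OR",
    field_term("Homo sapiens", "Organism"),
    field_term("Mus musculus", "Organism"),
)
query = combine("AND", field_term("BRCA1", "Gene Name"), organisms)
print(query)
```

The resulting string can be pasted into the search box of an individual Entrez database; the parentheses make the order of evaluation explicit.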

11 Understanding Search Results Mixed types of data In Entrez Nucleotides, for example, search results can include: Archival data from GenBank/EMBL/DDBJ Complete & characterized sequences Sequence fragments Long genomic regions that are finished or in progress Sequences of different molecule types (e.g., DNA, mRNA) Curated sequence records (RefSeq) Sequence records drawn from PDB structure records

12 Entrez Nucleotide A collection of sequences from several sources GenBank RefSeq PDB Can limit by field (such as organism), molecule, gene location, source, and date

13 GenBank Archival database of nucleotide sequences from >160,000 organisms Records annotated with coding region (CDS) features also include amino acid translations Each record represents the work of a single lab Redundant; can have many sequence records for a single gene

14 RefSeq Database of reference sequences Curated Non-redundant A representative GenBank record is used as the source for a RefSeq record Value-added information is added by experts Each record offers an encapsulation of the current understanding of a gene or protein, similar to a review article Variety of accession number prefixes (NM, NP, etc.) and status codes (provisional, reviewed, etc.). Includes genomic DNA, mRNA, and protein sequences
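
As a rough illustration of the accession number prefixes mentioned above, the sketch below maps a few common RefSeq prefixes to the kind of molecule they denote. The function name `describe_refseq` is hypothetical, the table is a partial list rather than the full set of prefixes, and the accession strings in the usage lines are illustrative only.

```python
# Partial table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NC": "complete genomic molecule",
    "NG": "genomic region",
    "NM": "mRNA",
    "NR": "non-coding RNA",
    "NP": "protein",
    "XM": "predicted mRNA (model)",
    "XP": "predicted protein (model)",
}


def describe_refseq(accession):
    """Return the molecule type for a RefSeq-style accession, or None."""
    prefix, separator, _ = accession.partition("_")
    if not separator:
        # RefSeq accessions contain an underscore; GenBank accessions do not.
        return None
    return REFSEQ_PREFIXES.get(prefix.upper())


print(describe_refseq("NM_000546"))
print(describe_refseq("NP_000537"))
print(describe_refseq("U12345"))
```

The underscore is the quickest visual cue that a record is a curated RefSeq entry rather than an archival GenBank submission.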

15 Entrez Gene Genes that have been annotated on the complete genomes in Entrez Genomes (i.e., genes annotated on RefSeq NC_* records). Over 5086 organisms are represented in Entrez Gene Includes organisms with genomes that were completely sequenced or are in progress. Each record represents a single gene from a given organism Types and quantity of information present in an individual record depend upon what is available for a particular organism or gene A unique identifier or GeneID assigned by NCBI A preferred symbol Any one or more of: sequence information map information official nomenclature from an authority list alternate gene symbols summary of gene/protein function published references that provide additional information on function expression homology data

16 Representative rather than comprehensive information for a gene Reference list provides a selective rather than exhaustive list of articles Sequence records listed in an Entrez Gene record are representative rather than exhaustive Additional information can be retrieved by following links to the various Entrez databases, then displaying additional, "related records" within a database.

17 Entrez Genomes Over 3000 completely sequenced organisms in GenBank, including archaea, bacteria, eukaryotes, viruses, viroids, and plasmids. Includes organisms whose genomes are completed, as well as data from some genome sequencing projects that are currently in progress. Provides graphical overviews of complete genomes/chromosomes, and the ability to explore regions of interest in progressively greater detail

18 Sample Eukaryotic Genome Homo sapiens

19 BLAST Compares sequence similarities Use when you have a SNP sequence in FASTA format Run nucleotide and protein queries

20 Entering a BLAST Query Enter the sequence in FASTA Format FASTA definition line ("def line") that begins with a >, followed by single line description Up to 80 nucleotide bases or amino acids per line
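
The FASTA layout described above (a single definition line starting with `>`, then sequence lines of at most 80 characters) can be sketched as follows. The function names `to_fasta` and `parse_fasta` are hypothetical, and the sketch handles a single record only.

```python
def to_fasta(description, sequence, width=80):
    """Format one sequence as a FASTA record with lines of at most `width`."""
    if "\n" in description:
        raise ValueError("the definition line must be a single line")
    # Remove whitespace inside the sequence and normalize to upper case.
    seq = "".join(sequence.split()).upper()
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([">" + description] + lines) + "\n"


def parse_fasta(text):
    """Split a single FASTA record into (description, sequence)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("a FASTA record must start with a '>' definition line")
    return lines[0][1:], "".join(lines[1:])


record = to_fasta("demo sequence", "acgt" * 50)
print(record)
```

Wrapping at a fixed width is a convention for readability; the parser above simply joins the lines back together.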

21 What does BLAST tell you? Gives the putative identity and function of the sequence Helps to direct experimental design to prove the function Find similar sequences in model organisms (e.g., yeast, C. elegans, mouse), which can be used to further study the gene Compare complete genomes against each other to identify similarities and differences among organisms

22 BLAST Programs: Which One to Use? Depends on-- what type of query sequence you have (nucleotide or protein) what type of database you will search against (nucleotide or protein)
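
The two questions above (query type and database type) can be turned into a small lookup. This is a sketch: the function name `choose_blast_program` and its `translate_both` parameter are hypothetical, while the returned values are the standard BLAST program names.

```python
def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick a BLAST program from the query and database molecule types.

    Both types must be "nucleotide" or "protein". `translate_both` selects
    tblastx, which compares a translated nucleotide query against a
    translated nucleotide database.
    """
    programs = {
        ("nucleotide", "nucleotide"): "blastn",
        ("protein", "protein"): "blastp",
        ("nucleotide", "protein"): "blastx",   # query is translated
        ("protein", "nucleotide"): "tblastn",  # database is translated
    }
    key = (query_type.lower(), database_type.lower())
    if key not in programs:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if translate_both:
        if key != ("nucleotide", "nucleotide"):
            raise ValueError("tblastx needs a nucleotide query and database")
        return "tblastx"
    return programs[key]


print(choose_blast_program("nucleotide", "protein"))
```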

23 Jurassic Park DNA Sequence >DinoDNA "Dinosaur DNA" from Crichton JURASSIC PARK p. 103 nt 1-1200 GCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAAAATCGACGC GGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAAGCTCCCTCG TGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGGAAGCGTGGC TGCTCACGCTGTACCTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTGGGCTGTGTG CCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCAACCCGGTAA AGTAGGACAGGTGCCGGCAGCGCTCTGGGTCATTTTCGGCGAGGACCGCTTTCGCTGGAG ATCGGCCTGTCGCTTGCGGTATTCGGAATCTTGCACGCCCTCGCTCAAGCCTTCGTCACT CCAAACGTTTCGGCGAGAAGCAGGCCATTATCGCCGGCATGGCGGCCGACGCGCTGGGCT GGCGTTCGCGACGCGAGGCTGGATGGCCTTCCCCATTATGATTCTTCTCGCTTCCGGCGG CCCGCGTTGCAGGCCATGCTGTCCAGGCAGGTAGATGACGACCATCAGGGACAGCTTCAA CGGCTCTTACCAGCCTAACTTCGATCACTGGACCGCTGATCGTCACGGCGATTTATGCCG CACATGGACGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCCCTGACGAGCATCACAAA CAAGTCAGAGGTGGCGAAACCCGACAGGACTATAAAGATACCAGGCGTTTCCCCCTGGAA GCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCTTTCTCCCTTCGGG CTTTCTCAATGCTCACGCTGTAGGTATCTCAGTTCGGTGTAGGTCGTTCGCTCCAAGCTG ACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTCTTGAGTCCA ACACGACTTAACGGGTTGGCATGGATTGTAGGCGCCGCCCTATACCTTGTCTGCCTCCCC GCGGTGCATGGAGCCGGGCCACCTCGACCTGAATGGAAGCCGGCGGCACCTCGCTAACGG CCAAGAATTGGAGCCAATCAATTCTTGCGGAGAACTGTGAATGCGCAAACCAACCCTTGG CCATCGCGTCCGCCATCTCCAGCAGCCGCACGCGGCGCATCTCGGGCAGCGTTGGGTCCT

24

25 Search Results: Understanding the Output Reminders about your specific query RID Query sequence reminder (contains the information from your FASTA def line) What database you searched against Graphical summary Shows where the hits aligned to your query Colors indicate score range Mouse over a colored bar to see info about that hit Text summary (GI numbers and Def lines) GI links to complete record in Entrez Score links to pairwise alignment between your query sequence and the hit Pairwise alignments BLAST statistics for your search

26

27 Protein-Protein BLAST (blastp) Usually better to use than nucleotide-nucleotide BLAST (blastn) Since the genetic code is degenerate, blastn can often give less specific results than blastp Translated BLAST (blastx) One way to do a protein BLAST search if you have a nucleotide query sequence BLAST program does the translating for you, in all 6 reading frames
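
To make "all 6 reading frames" concrete, here is a minimal sketch of six-frame translation using the standard genetic code. It is not how BLAST itself is implemented; the function names and the frame labels (`+1` to `+3` for the given strand, `-1` to `-3` for the reverse complement) are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translation(seq):
    """Translate a DNA sequence in three forward and three reverse frames."""
    seq = seq.upper()
    reverse = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


frames = six_frame_translation("ATGGCCTAA")
for label in sorted(frames):
    print(label, frames[label])
```

Because several codons encode the same amino acid, two nucleotide sequences can differ while their translations match, which is why protein-level comparison is often more informative.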

28 Structures Molecular Modeling Database (MMDB) Drawn from PDB Includes only experimentally resolved structures (i.e., by NMR and X-ray crystallography) Includes value added information, such as: Explicit chemical graph showing the bonding between atoms Atomic substructure classification Direct 3-D structure comparisons using the VAST (Vector Alignment Search Tool) algorithm to identify structural neighbors Links to associated records in other Entrez databases

29 Fewer Structures than Protein Sequences Most protein sequences in the Entrez Protein database do not yet have a resolved structure The chances of finding a resolved structure for a given protein sequence are low. Therefore, if the traditional methods for searching an Entrez database do not find a structure for your protein of interest, you might need to use some creative "back doors" for retrieving structures that might have some relationship to your protein of interest.

30 "Front Doors" and "Back Doors" for Retrieving Structures If a protein sequence already HAS a resolved structure: Front Door #1: Search the Entrez Structure database directly Front Door #2: Search the Entrez Protein database and then "Display" the associated Structure records. If a protein sequence DOES NOT HAVE a resolved structure: Back Door #1: Find similar protein sequences and see if any of them have a resolved 3-D structure. Back Door #2: Identify the conserved domain(s) in the protein of interest and view the 3-D structure of the domain(s).

31 Cn3D ("See in 3-D") Viewing Software Before we can view a 3-D structure from the structure summary page, we need to install a structure viewing software program. A variety of public domain programs exist, such as: Cn3D ("See in 3-D") Rasmol

32 Structure Record Summary

33 PubChem Small organic molecules Structures and biological activities Search by descriptive terms, chemical properties, and structural similarity Links to other NCBI databases

34 PubChem Databases PubChem Substance Substance information includes chemical structures, synonyms, registration IDs, descriptions, related URLs Database cross-reference links to PubMed, protein 3D structures PubChem Compound There is one PubChem Compound record for each unique substance, and for each unique substance component. Multiple PubChem Substance records can be associated with a single compound record. Includes all standardized structures, mixture components, and precalculated structure neighboring links. Compound information includes structure, compound property information (molecular weight, formula, xLogP, count of the rotatable bonds, H bond donor, H bond acceptor, etc.), and structure description (SMILES, IUPAC name, InChI). PubChem BioAssay The assay database consists of deposited bioactivity data and descriptions of bioactivity assays used for screening of the chemical substances contained in PubChem Substance, including descriptions of the conditions and the readouts (bioactivity levels) specific to the screening procedure. The assay database includes DTP/NCI's 710 million lines of in vitro and in vivo data covering fields ranging from cancer and HIV to many others.

35 Education Resources Introduction to Molecular Biology Resources http://www.ncbi.nlm.nih.gov/Class/MLACourse/index.html NCBI Education Resources http://www.ncbi.nlm.nih.gov/Education/

