Beyond PubMed and BLAST: Exploring NCBI tools and databases
Kate Bronstad, David Flynn
Alumni Medical Library
Location
- 12th Floor, Instructional Building
- www.medlib.bu.edu
Services
- Electronic resources: full-text access through PubMed, Google Scholar, and Web of Science
- Reference: drop in or by reservation
- Instruction: request class sessions or the creation of web tutorials
- Learning resource center: lab space, hands-on instruction
NCBI: National Center for Biotechnology Information
- Built on the Entrez system
- The original database was Nucleotide; PubMed was built on this original structure
- PubMed, Gene, and other molecular databases are interconnected
- Gene discovery: related-data options in PubMed
- My NCBI works with multiple databases
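Because the Entrez databases are interconnected, the same PubMed-to-Gene links can also be requested programmatically. The sketch below assumes NCBI's E-utilities interface (not covered in these slides) and only builds an ELink request URL; the PubMed ID is a placeholder, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, ids: list[str]) -> str:
    """Build an ELink URL asking for records in `db` linked to `ids` in `dbfrom`."""
    params = {
        "dbfrom": dbfrom,     # source database, e.g. "pubmed"
        "db": db,             # target database, e.g. "gene"
        "id": ",".join(ids),  # comma-separated UIDs from the source database
    }
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Placeholder PubMed ID, used only to show the shape of the request.
    print(elink_url("pubmed", "gene", ["12345678"]))
```

Opening the printed URL in a browser would return the linked Gene IDs as XML; a real script would also follow NCBI's usage guidelines for E-utilities.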
Gene
- Gives sequence and expression information, plus information about protein structure and function
- Doesn't list all known and predicted genes; focuses on completely sequenced genomes or those where research communities are actively contributing genetic information
- Information comes from RefSeq and collaborating model organism databases
- Mix of curated and automatically updated information
- Pulls in, and links out to, resources outside NCBI
- 4.6 million records for 5,588 taxa
Gene Record
- Summary: official full name, gene type, lineage, summary, also-known-as names
- Genomic regions and transcripts: structure and exon-intron boundaries; the gene table gives a fuller display
- Bibliography: GeneRIFs (Gene References into Function)
  - Summaries of gene functions with specific references to PubMed articles about the function of the gene or protein, compiled by people at NCBI
  - Not comprehensive, but gives you the most relevant papers on the gene's function
  - Authors can contact NCBI to submit their citations
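Exon-intron boundaries follow directly from exon coordinates. As a minimal sketch, assuming you already have a transcript's exon start and end positions (1-based, inclusive; the values below are hypothetical), the introns are the gaps between consecutive exons:

```python
def intron_intervals(exons):
    """Return the introns between consecutive exons of one transcript.

    `exons` is a list of (start, end) pairs in 1-based, inclusive genomic
    coordinates; they are sorted first, so input order does not matter.
    """
    ordered = sorted(exons)
    introns = []
    for (_, exon_end), (next_start, _) in zip(ordered, ordered[1:]):
        # An intron begins one base after an exon ends and stops one base
        # before the next exon starts; abutting exons leave no intron.
        if next_start > exon_end + 1:
            introns.append((exon_end + 1, next_start - 1))
    return introns


# Hypothetical exon coordinates for a three-exon transcript.
print(intron_intervals([(100, 200), (301, 450), (600, 700)]))
# → [(201, 300), (451, 599)]
```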
RefSeq: Reference Sequences
- Nucleotide sequences and their protein translations
- Curated by NCBI or NCBI-approved programs
Difference between GenBank and RefSeq
- GenBank has raw data and duplicated records
- Metadata in GenBank can be incomplete
- RefSeq is annotated, curated, and non-redundant
- NCBI takes the best sequences from GenBank and curates them into RefSeq records
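To make "non-redundant" concrete, here is a toy sketch that collapses records with identical sequences into one representative. It is not NCBI's curation process: the record fields, accessions, and the "most complete metadata wins" rule are all hypothetical stand-ins for the judgment curators actually apply.

```python
from dataclasses import dataclass


@dataclass
class SequenceRecord:
    accession: str        # hypothetical identifier
    sequence: str
    metadata_fields: int  # hypothetical completeness score: filled-in annotation fields


def _better(candidate, incumbent):
    """Prefer more complete metadata; break ties by the smaller accession."""
    if candidate.metadata_fields != incumbent.metadata_fields:
        return candidate.metadata_fields > incumbent.metadata_fields
    return candidate.accession < incumbent.accession


def collapse_duplicates(records):
    """Keep one representative record per distinct sequence (case-insensitive)."""
    best = {}
    for rec in records:
        key = rec.sequence.upper()
        if key not in best or _better(rec, best[key]):
            best[key] = rec
    return sorted(best.values(), key=lambda r: r.accession)


records = [
    SequenceRecord("A2", "acgt", 3),
    SequenceRecord("A1", "ACGT", 5),  # same sequence as A2, richer metadata
    SequenceRecord("B1", "GGCC", 1),
]
print([r.accession for r in collapse_duplicates(records)])  # → ['A1', 'B1']
```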
RefSeq Record Numbers
mRNAs and proteins
- NM_123456: curated mRNA
- NP_123456: curated protein
- NR_123456: curated non-coding RNA
- XM_123456: predicted mRNA
- XP_123456: predicted protein
- XR_123456: predicted non-coding RNA
Gene records
- NG_123456: reference genomic sequence
Chromosomes
- NC_123455: microbial replicons, organelle genomes, human chromosomes
- AC_123455: alternate assemblies
Assemblies
- NT_123456: contig
- NW_123456: WGS supercontig
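Since the record type is encoded in the two-letter prefix, a lookup table is enough to interpret an accession. This sketch uses only the prefixes listed on the RefSeq Record Numbers slide; other RefSeq prefixes exist but are not handled here, and version suffixes such as ".1" are simply ignored.

```python
# Prefix meanings taken from the RefSeq Record Numbers slide.
REFSEQ_PREFIXES = {
    "NM": "curated mRNA",
    "NP": "curated protein",
    "NR": "curated non-coding RNA",
    "XM": "predicted mRNA",
    "XP": "predicted protein",
    "XR": "predicted non-coding RNA",
    "NG": "reference genomic sequence",
    "NC": "chromosome: microbial replicon, organelle genome, or human chromosome",
    "AC": "chromosome: alternate assembly",
    "NT": "assembly: contig",
    "NW": "assembly: WGS supercontig",
}


def describe_refseq(accession: str) -> str:
    """Return the record type implied by a RefSeq accession's prefix."""
    prefix, sep, _ = accession.partition("_")
    if not sep or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"not a recognized RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


print(describe_refseq("XP_123456.1"))  # → predicted protein
```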
OMIM: Online Mendelian Inheritance in Man
- Previously published in print (10 volumes, updated every 2 years)
- Catalog of human genes and genetic disorders
- Gives referenced explanations of cloning, allelic variants, inheritance, mapping, and molecular genetics
- Links to clinical and testing information
- OMIA (Online Mendelian Inheritance in Animals) is a separate database covering animals
Databases for Evidence
GEO Profiles: gene expression profiles from GEO (Gene Expression Omnibus), a public microarray data repository
- Archives and freely distributes microarray, next-generation sequencing, and other high-throughput functional genomic data
- Data are submitted by researchers
- Offers data storage plus web-based interfaces and applications to query and download content
Evidence Viewer: graphical display of the evidence supporting a gene model
Genome
- Sequence and map data from the whole genomes of over 1,000 organisms
- Includes organisms that are completely sequenced and those still in progress
- Graphical overviews of complete genomes and chromosomes
- Specialized genome BLAST search to see alignments in the context of the genome
- Good for microbial genomes
HomoloGene
- May be preferable to BLAST when looking for a model organism gene with the same function or when making an evolutionary comparison
- Allows downloads of genomic information; you can capture a regulatory region by including bases upstream or downstream
- Multiple and pairwise alignments
- Protein alignment scores: substitution rates, synonymous vs. non-synonymous, conservative vs. radical
- Polymorphisms: GeneView via the dbSNP link
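The synonymous vs. non-synonymous distinction comes down to whether a codon change alters the encoded amino acid. The sketch below only counts differing codons between two aligned, ungapped, in-frame DNA coding sequences using the standard genetic code; it is a naive count, not a rate (it does not normalize by synonymous and non-synonymous sites as methods such as Nei-Gojobori do), and it is not how HomoloGene computes its scores.

```python
# Standard genetic code, codons ordered T, C, A, G at each position ("*" = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codon_differences(seq1, seq2):
    """Return (synonymous, non_synonymous) codon differences between two
    aligned, ungapped, in-frame DNA coding sequences (A/C/G/T only)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = nonsynonymous = 0
    for pos in range(0, len(seq1), 3):
        c1, c2 = seq1[pos:pos + 3].upper(), seq2[pos:pos + 3].upper()
        if c1 == c2:
            continue
        # Same amino acid despite different codons -> synonymous change.
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
        else:
            nonsynonymous += 1
    return synonymous, nonsynonymous


# Hypothetical sequences: CTG->CTC keeps Leu; AAA->AGA changes Lys to Arg.
print(count_codon_differences("ATGCTGAAA", "ATGCTCAGA"))  # → (1, 1)
```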
Structure and Models
- Structure, MMDB (Molecular Modeling Database): access from the Protein record's Related Structure link
- Cn3D: application for viewing a structure from different angles and highlighting sequence in the structure
- VAST (Vector Alignment Search Tool): searches by geometric criteria
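VAST compares structures geometrically (it aligns secondary-structure elements treated as vectors). A standard geometric measure of how closely two aligned structures agree is the root-mean-square deviation (RMSD). This sketch computes RMSD only; it assumes the two coordinate lists are already matched atom-for-atom and superimposed, and it does not perform VAST's search or the superposition step.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z)
    points that are already superimposed and matched one-to-one."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical coordinates: every point shifted by 1 unit along z.
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # → 1.0
```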
BLink: BLAST Link
- Pre-run BLAST results: NCBI runs weekly searches for every new protein sequence
- Can be used instead of running your own BLAST search
- More information than in default BLAST: taxonomy report, multiple alignment views, search data against different
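A taxonomy report summarizes a hit list by organism. The sketch below shows that kind of summary on a hypothetical list of (organism, score) pairs; it is an illustration of the idea, not BLink's actual report format or scoring.

```python
from collections import defaultdict


def taxonomy_summary(hits):
    """Group (organism, score) hits by organism and return
    (organism, hit_count, best_score) rows, highest best score first."""
    by_org = defaultdict(list)
    for organism, score in hits:
        by_org[organism].append(score)
    summary = [(org, len(scores), max(scores)) for org, scores in by_org.items()]
    # Sort by best score (descending), then by organism name for stable output.
    summary.sort(key=lambda row: (-row[2], row[0]))
    return summary


# Hypothetical hits and scores.
hits = [("Homo sapiens", 950), ("Mus musculus", 880),
        ("Homo sapiens", 400), ("Danio rerio", 610)]
for row in taxonomy_summary(hits):
    print(row)
```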
Links to Outside Databases
- MGI (Mouse Genome Informatics)
- Ensembl
- KEGG (Kyoto Encyclopedia of Genes and Genomes): integrated pathway, disease, and drug databases; good for quick pathway and protein graphics
- UCSC Genome Browser: visualize tracks to compare information such as gene predictions, ESTs, and conserved regions
- BLAT (BLAST-Like Alignment Tool): quicker but not as sensitive as BLAST
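Much of BLAT's speed comes from indexing the target's non-overlapping k-mers once and then looking up every overlapping k-mer of the query, rather than scanning the whole database per search. The sketch below shows only that seeding step, with a tiny k for readability; real BLAT then extends and scores these seeds, which is omitted here, and the sequences are hypothetical.

```python
from collections import defaultdict


def build_index(target, k):
    """Index the target's non-overlapping k-mers: k-mer -> list of start positions."""
    index = defaultdict(list)
    for pos in range(0, len(target) - k + 1, k):
        index[target[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Look up every overlapping k-mer of the query; return (query_pos, target_pos) hits."""
    hits = []
    for qpos in range(len(query) - k + 1):
        # .get avoids inserting empty entries into the defaultdict.
        for tpos in index.get(query[qpos:qpos + k], []):
            hits.append((qpos, tpos))
    return hits


# Hypothetical target and query, with k = 4.
idx = build_index("ACGTACGTTTGCA", 4)
print(find_seeds("CGTACGT", idx, 4))  # → [(3, 0), (3, 4)]
```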
Gene Information from GO
Gene Ontology (GO) annotations for the gene
- Lists the terms assigned to the gene under Molecular Function, Biological Process, and Cellular Component
- Level of evidence and references are linked when available
- Links to the AmiGO browser for more ontology or evidence information
- You can search Gene for GO information by adding the [GO] tag after the search term, e.g., "vasodilation [GO]"
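The same "vasodilation [GO]" search can be expressed as a request URL. This sketch assumes NCBI's E-utilities ESearch service (not covered in these slides) and only builds the URL; the retmax value of 20 is an arbitrary choice, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Base URL of NCBI's E-utilities, the programmatic interface to Entrez.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def gene_search_url(term: str, retmax: int = 20) -> str:
    """Build an ESearch URL that runs `term` against the Gene database."""
    params = {
        "db": "gene",      # search the Gene database
        "term": term,      # Entrez query, field tags included
        "retmax": retmax,  # maximum number of IDs to return
    }
    # urlencode escapes spaces as "+" and brackets as %5B / %5D.
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


print(gene_search_url("vasodilation [GO]"))
```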
BU Resources
Biostatistics
- Dr. Mayetri Gupta: created statistical software for discovering transcription factor binding sites (motifs) and regulatory modules, gene regulatory networks, and phylogenetic inference
- Dr. Paola Sebastiani: created software for network modeling called Bayesware Discoverer, as well as CAGED and BAGED for analysis of gene expression data
Library Support
- Contact the library with suggestions or recommendations of resources we can list or promote for the BU community
- Software and datasets can be archived in BU's Digital Common
- If we don't have a resource you need, we may be able to procure it for you
- A hands-on BLAST workshop is offered