Published by Kelly Anthony. Modified over 9 years ago.
1
GENOME-CENTRIC DATABASES Daniel Svozil
2
NCBI Gene: search for the DUT gene in human
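The same search can also be issued programmatically through the NCBI E-utilities esearch endpoint. This is a minimal sketch, assuming Python 3 and the public esearch URL. It only builds the request URL (no network call is made), and `gene_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Public E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

def gene_search_url(symbol: str, organism: str) -> str:
    """Build an esearch URL equivalent to typing '<symbol>[gene] AND <organism>[orgn]' in NCBI Gene."""
    term = f"{symbol}[gene] AND {organism}[orgn]"
    return ESEARCH + "?" + urlencode({"db": "gene", "term": term})

print(gene_search_url("DUT", "human"))
# https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=DUT%5Bgene%5D+AND+human%5Borgn%5D
```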
3
Obtaining gene sequence: in the Genomic regions section of the full report, click on FASTA. To adjust the range to capture, modify the values in the Change region shown tool on the FASTA display and click Update View.
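Outside the web page, a comparable sub-range can be requested with E-utilities efetch, which accepts `seq_start`, `seq_stop`, and `strand` (1 = plus, 2 = minus) parameters. This is a sketch only: it builds the URL without fetching anything, `fasta_region_url` is a hypothetical helper, and the accession `NC_EXAMPLE.1` and the coordinates are made up for illustration.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

def fasta_region_url(accession: str, start: int, stop: int, minus_strand: bool = False) -> str:
    """Build an efetch URL returning FASTA for positions start..stop (1-based, inclusive)."""
    if start < 1 or stop < start:
        raise ValueError("expected 1 <= start <= stop")
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
        # strand=1 returns the plus strand, strand=2 the reverse complement
        "strand": 2 if minus_strand else 1,
    }
    return EFETCH + "?" + urlencode(params)

print(fasta_region_url("NC_EXAMPLE.1", 1001, 2000))  # hypothetical accession and range
```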
4
Obtaining gene sequence: in the Genomic regions section, click on Graphics. Place your cursor over this bar and click these arrows. Again, the region can be adjusted in the FASTA view.
5
Obtaining gene sequence: Genomic context section, Map Viewer. Click Download/View Sequence/Evidence in the upper right of the Map Viewer display, or click dl in the label for the gene.
6
On the plus/minus strands and numbering
[Figure: plus strand and minus strand, each marked with its 5’ and 3’ ends, both numbered 1 to 7 along the same axis]
A gene on the plus strand starts at 2 and ends at 5; a gene on the minus strand starts at 5 and ends at 2.
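The numbering convention can be expressed in a few lines: both strands share one coordinate system, and a minus-strand gene is written with start > end and read as the reverse complement. This is a minimal sketch with a made-up 7-base sequence; `gene_sequence` and `reverse_complement` are hypothetical helpers, not an NCBI API.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, to read the other strand 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

def gene_sequence(plus_strand: str, start: int, end: int) -> str:
    """Return a gene's 5'->3' sequence from 1-based plus-strand coordinates.

    start <= end: gene on the plus strand; start > end: gene on the minus strand.
    """
    if start <= end:
        return plus_strand[start - 1:end]
    # Minus-strand gene: same coordinates, read on the opposite strand.
    return reverse_complement(plus_strand[end - 1:start])

seq = "ACGTACG"                  # positions 1..7 (hypothetical)
print(gene_sequence(seq, 2, 5))  # CGTA  (plus strand, 2 -> 5)
print(gene_sequence(seq, 5, 2))  # TACG  (minus strand, 5 -> 2)
```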
7
Obtaining gene sequence: how many transcript variants exist for the human TP53 gene? Search for TP53[gene] AND human[orgn]; in the GenBank view, find the mRNAs in FEATURES. Answer: seven variants.
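Counting the mRNA features can also be done on a saved GenBank flat file. This sketch assumes the standard GenBank feature-table layout (feature keys start in column 6 after five spaces, locations in column 22, qualifier lines leave the key field blank). `count_features` is a hypothetical helper and the miniature record is made up.

```python
def count_features(genbank_text: str, key: str = "mRNA") -> int:
    """Count features of one type in the FEATURES table of a GenBank flat file."""
    count = 0
    in_features = False
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line[:1].strip():
            break  # the next top-level keyword (e.g. ORIGIN) ends the table
        # Feature keys sit in columns 6-21; qualifier lines leave that field blank.
        if line.startswith("     ") and line[5:21].strip() == key:
            count += 1
    return count

def feature_line(key: str, location: str) -> str:
    """Format one feature-table line (helper for the made-up sample below)."""
    return " " * 5 + key.ljust(16) + location

record = "\n".join([
    "LOCUS       EXAMPLE",
    "FEATURES             Location/Qualifiers",
    feature_line("gene", "1..100"),
    " " * 21 + '/gene="EXAMPLE"',
    feature_line("mRNA", "join(1..10,20..100)"),
    feature_line("mRNA", "join(1..10,30..100)"),
    feature_line("CDS", "join(5..10,20..90)"),
    "ORIGIN",
])
print(count_features(record))  # 2
```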
8
Obtaining gene sequence: for a limited number of genes in the human genome, gene-specific genomic RefSeqs, termed RefSeqGene, have been created. These have a RefSeq accession beginning with NG_ and can be retrieved from the nucleotide database using the query keyword refseqgene. What is the accession number of the RefSeqGene record for the TP53 gene?
9
GeneRIF (Gene Reference into Function): a GeneRIF is a concise phrase describing a function or functions of a gene, together with the PubMed citation supporting that assertion. The majority of GeneRIFs have been provided by a collaboration between the NLM's Index Section and NCBI. There is no limit on the number of independent GeneRIF submissions per PubMed ID, although those from non-NLM sources are reviewed by RefSeq staff.
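Since each GeneRIF pairs a short statement with its supporting PubMed citations, it maps naturally onto a small record type. This sketch groups GeneRIFs by gene, assuming a five-column tab-delimited layout (tax ID, Gene ID, comma-separated PubMed IDs, last-update timestamp, GeneRIF text). That layout is our assumption about NCBI's GeneRIF download, so check the file header before relying on it. `GeneRIF` and `generifs_by_gene` are hypothetical names, and the sample rows are invented.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class GeneRIF:
    tax_id: int
    gene_id: int
    pmids: list[int]   # supporting PubMed citations
    text: str          # the concise functional statement

def generifs_by_gene(lines):
    """Group GeneRIFs by Gene ID (assumed 5-column tab-delimited layout)."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        tax_id, gene_id, pmids, _updated, text = line.rstrip("\n").split("\t", 4)
        pmid_list = [int(p) for p in pmids.split(",") if p]
        grouped[int(gene_id)].append(GeneRIF(int(tax_id), int(gene_id), pmid_list, text))
    return dict(grouped)

# Invented sample rows, for illustration only.
sample = [
    "#tax_id\tgene_id\tpmids\tupdated\ttext",
    "9606\t1\t111,222\t2010-01-01 00:00\tExample statement A.",
    "9606\t1\t333\t2011-01-01 00:00\tExample statement B.",
]
rifs = generifs_by_gene(sample)
print(len(rifs[1]), rifs[1][0].pmids)  # 2 [111, 222]
```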
10
Phenotypes: this section reports the effect of the gene on phenotype, especially disease. For human genes, the first row links to the Phenotype-Genotype Integrator (PheGenI), a web portal providing a tabular display of genome-wide association study results relating the gene and/or its expression to a phenotype. Named phenotypes are provided in subsequent rows. Each phenotype row may be expanded, providing links to more information if available.
11
Interactions: two major subcategories of information are reported as Interactions, HIV-1 interactions and general interactions (TP53 has both). The HIV-1, Human Protein Interaction Database focuses on the human proteins that have been shown to interact with proteins from HIV-1. The interaction table shows the product of the gene that is part of the interaction, the other interactant, the source of these data, and a description of the interaction.
12
General gene information: several subcategories of information, including Pathways (a description of pathways that include this gene, with links to more information about each pathway); Homology (a partial listing, with links, of orthologs in other species); and Gene Ontology (GO) (the specific GO terms, listed by source of the information, category, term, evidence information, and links to supporting publications).
13
Gene Ontology (GO) I: the goal is to unify the representation of gene and gene product attributes across all species. Project aims: maintain and develop a controlled vocabulary of gene and gene product attributes; annotate genes and gene products; provide tools for easy access to all aspects of the data provided by the project.
14
Gene Ontology (GO) II: the ontology covers three domains. Molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis. Biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units (cells, tissues, organs, and organisms). Cellular component: the parts of a cell or its extracellular environment. Website: http://www.geneontology.org/ AmiGO browser: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
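GO terms are distributed in the OBO text format, where each `[Term]` stanza carries `id`, `name`, and `namespace` tags, the namespace being one of the three domains. This minimal sketch groups term IDs by domain. The parser is deliberately simplified (it keeps only the first value of repeated tags and skips non-Term stanzas), `terms_by_namespace` is a hypothetical helper, and the snippet is a hand-written excerpt rather than a download.

```python
from collections import defaultdict

def terms_by_namespace(obo_text: str) -> dict:
    """Group GO term IDs by namespace (molecular_function, biological_process, cellular_component)."""
    stanzas = []
    current = None
    for raw in obo_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Only [Term] stanzas are collected; [Typedef] and others are skipped.
            current = {} if line == "[Term]" else None
            if current is not None:
                stanzas.append(current)
        elif current is not None and ": " in line:
            tag, value = line.split(": ", 1)
            current.setdefault(tag, value)  # keep the first value of repeated tags
    groups = defaultdict(list)
    for stanza in stanzas:
        if "id" in stanza and "namespace" in stanza:
            groups[stanza["namespace"]].append(stanza["id"])
    return dict(groups)

GO_EXCERPT = """format-version: 1.2

[Term]
id: GO:0003824
name: catalytic activity
namespace: molecular_function

[Term]
id: GO:0005488
name: binding
namespace: molecular_function

[Term]
id: GO:0008150
name: biological_process
namespace: biological_process

[Term]
id: GO:0005575
name: cellular_component
namespace: cellular_component
"""
print(terms_by_namespace(GO_EXCERPT))
```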
15
NCBI Reference Sequences (RefSeqs) This section describes the gene-specific NCBI reference sequences (RefSeqs) that have been established for this gene.
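RefSeq accessions encode the molecule type in their prefix (NG_ for RefSeqGene, as noted earlier). This small lookup sketch lists common prefixes only, not an exhaustive set; `refseq_type` is a hypothetical helper and the accessions in the usage lines are made up.

```python
from typing import Optional

# Common RefSeq accession prefixes (not an exhaustive list).
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (e.g. a chromosome)",
    "NG_": "genomic region (e.g. RefSeqGene)",
    "NT_": "genomic contig",
    "NW_": "genomic contig (WGS)",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA",
    "XR_": "predicted non-coding RNA",
    "XP_": "predicted protein",
}

def refseq_type(accession: str) -> Optional[str]:
    """Return the molecule type implied by a RefSeq accession prefix, or None if not a known prefix."""
    return REFSEQ_PREFIXES.get(accession[:3].upper())

for acc in ["NG_123456.1", "NM_123456.1", "AB123456.1"]:  # made-up accessions
    print(acc, refseq_type(acc))
```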
16
Exercise: retrieve all records for human genes that are associated with OMIM and have been annotated on the genome. Use Advanced search + Limits (Homo sapiens). Full list of Entrez filters: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
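The same query can be typed as text, with each limit becoming a clause combined with AND. This minimal sketch assembles such a query string. `entrez_query` is a hypothetical helper, `gene_omim[filter]` is our assumption about the OMIM filter name, and the genome-annotation clause is a placeholder to be replaced with the real name from the filter list linked above.

```python
def entrez_query(*clauses: str) -> str:
    """AND together Entrez search clauses, wrapping each in parentheses."""
    parts = [c.strip() for c in clauses if c.strip()]
    if not parts:
        raise ValueError("at least one clause is required")
    return " AND ".join(f"({p})" for p in parts)

# Assumed/placeholder filter names: confirm them in the Entrez filter list.
OMIM_FILTER = "gene_omim[filter]"              # assumption
GENOME_FILTER = "ANNOTATED_ON_GENOME[filter]"  # placeholder, not a real filter name

print(entrez_query("Homo sapiens[orgn]", OMIM_FILTER, GENOME_FILTER))
```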
17
Selected Entrez filters http://www.ncbi.nlm.nih.gov/books/NBK3841/table/EntrezGene.T.filter_sets_partial_complet/?report=objectonly
18
Genome-centric databases: nucleotide sequences are routinely determined at the whole-genome or chromosome scale, at least for microorganisms. We now have information not only about individual gene sequences but also, e.g., about their relative positions or strand orientation. To take advantage of this more global information, researchers have had to design state-of-the-art genome-centric sequence-information management systems that connect specialized sequence collections with browsing tools.
19
The NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/): the term “map” refers to the position of a particular type of object in a particular coordinate system. This means that there is not one sequence map but a set of maps in various coordinate systems. Map Viewer is now used to present genetic, cytogenetic, sequence-based, … maps for many genomes. Details about genome assembly and annotation can be found here: http://www.ncbi.nlm.nih.gov/books/NBK21086/ Map Viewer integrates map and sequence data from a variety of sources.
20
The NCBI Map Viewer: Map Viewer is a powerful tool because it provides a mechanism to compare maps in different coordinate systems; a robust query interface; diverse options for configuring the display; multiple functions to report and download maps and annotated information; tools to manipulate nucleotide sequences, such as ModelMaker (for constructing mRNAs from putative exon sequences); connections to comprehensive data files for transfer by FTP; and detailed descriptions of the objects displayed on the maps.
21
Non-sequence-based maps: maps not based directly on sequence include published maps in the following coordinate systems: genetic linkage, radiation hybrid, cytogenetic, and ordinal (i.e., the order of clones). The primary sources of each map are described in the online help documentation of each genome-specific Map Viewer.
22
Sequence-based maps: sequence-based maps can be supplied by external sources and/or computed from features within NCBI. For example, when the annotated sequence for a complete genome is submitted to GenBank, a copy of the data may also be accessioned as Reference Sequences (RefSeqs). The gene, transcript, and other feature annotations of the submitted complete genome are processed for display in the Map Viewer. NCBI staff may then calculate and display the positions of other types of features, such as marker positions or points of variation, as separate maps.
23
Types of Map Viewer annotation provided by NCBI (source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1565/?report=objectonly)
24
NCBI data resources used in NCBI-generated annotation (source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1566/?report=objectonly)
25
Relationships: in addition to supporting the display of multiple maps in the same coordinate system (e.g., multiple sequence-based maps), Map Viewer also displays maps in different coordinate systems by calculating the correspondences among them (e.g., sequence to genetic). This is accomplished by identifying features that have been placed on maps in different coordinate systems (mainly STSs) and by using general conversion factors.
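The two strategies can be illustrated together: interpolate between anchor markers (e.g. STSs) placed on both maps, and fall back on a conversion factor outside the anchored interval. This is a sketch, not Map Viewer's actual algorithm. The default of about 1 Mb per cM is a commonly quoted rough human-genome average used here as an assumption, `genetic_to_sequence` is a hypothetical helper, and the anchor positions are invented.

```python
from bisect import bisect_left

def genetic_to_sequence(cm: float, anchors: list[tuple[float, int]],
                        bp_per_cm: float = 1_000_000) -> float:
    """Estimate a sequence position (bp) for a genetic-map position (cM).

    anchors: (cM, bp) pairs for markers placed on both maps, sorted by cM.
    """
    if not anchors:
        return cm * bp_per_cm  # no shared markers: conversion factor only
    positions = [a[0] for a in anchors]
    i = bisect_left(positions, cm)
    if i < len(anchors) and positions[i] == cm:
        return float(anchors[i][1])  # exactly on a shared marker
    if 0 < i < len(anchors):
        # Linear interpolation between the flanking shared markers.
        (cm0, bp0), (cm1, bp1) = anchors[i - 1], anchors[i]
        return bp0 + (cm - cm0) * (bp1 - bp0) / (cm1 - cm0)
    # Outside the anchored interval: extrapolate from the nearest marker.
    cm_ref, bp_ref = anchors[0] if i == 0 else anchors[-1]
    return bp_ref + (cm - cm_ref) * bp_per_cm

anchors = [(10.0, 5_000_000), (20.0, 15_000_000)]  # invented marker positions
print(genetic_to_sequence(15.0, anchors))  # 10000000.0
```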
26
Map Viewer (http://www.ncbi.nlm.nih.gov/mapview/): the genome can be searched from this page.
27
Position-based access: display a particular section of a genome by using a range of positions as a query. Select a particular chromosome first, then enter a value into Region Shown. This could be a numerical range (base pairs are the default if no units are entered), the names of clones, genes, markers, or SNPs, or any combination. Use the Maps & Options control.
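A numeric Region Shown value with optional units could be normalized to base pairs as follows, treating a bare number as base pairs. The accepted unit suffixes (bp, k/kb, M/Mb) are our assumption, not documented Map Viewer syntax; names of clones, genes, markers, or SNPs are not handled, and `parse_region` is a hypothetical helper.

```python
import re

# Assumed unit suffixes; no suffix means base pairs.
UNITS = {"": 1, "bp": 1, "k": 1_000, "kb": 1_000, "m": 1_000_000, "mb": 1_000_000}
RANGE = re.compile(r"^\s*([\d.]+)\s*([A-Za-z]*)\s*-\s*([\d.]+)\s*([A-Za-z]*)\s*$")

def _to_bp(number: str, unit: str) -> int:
    factor = UNITS.get(unit.lower())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(round(float(number) * factor))

def parse_region(text: str) -> tuple[int, int]:
    """Convert a numeric range such as '1.5M-2M' to (start, end) in base pairs."""
    match = RANGE.match(text)
    if match is None:
        raise ValueError(f"not a numeric range: {text!r}")
    start = _to_bp(match.group(1), match.group(2))
    end = _to_bp(match.group(3), match.group(4))
    if start > end:
        raise ValueError("start must not exceed end")
    return start, end

print(parse_region("1.5M-2M"))  # (1500000, 2000000)
```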
28
Maps: sv – Sequence Viewer, review the sequence; dl – download the sequence of interest; ev – Evidence Viewer, mRNA alignments in a region; hm – homology maps; mm – Model Maker, create a cDNA in real time. Individual maps.