gene-CENTRIC database


1 gene-CENTRIC database
Daniel Svozil

2 NCBI Nucleotide Exercise
Questions:
- How many nucleotide sequences are there from the bacterium Chlamydia trachomatis in the NCBI Sequence Database?
- How many mRNA sequences for collagen genes from nematode worms are there in the NCBI Database?
- How many nucleotide sequences from nematode worms are there in the RefSeq Database?
- How many structures containing nucleotide sequences from nematode worms are known?
- How many nucleotide sequences were submitted to NCBI by Matthew Berriman?
Answers:
- “Chlamydia trachomatis”[ORGN]. Found nucleotide sequences: Nucleotide (40417), GSS (148). (Ask students: what is GSS?)
- collagen AND nematode[Organism], Limits: mRNA (the count is also shown directly on the right) … 1041
- nematode[Organism], Limits: RefSeq (the count is also shown directly on the right) …, or Nematode[ORGN] AND srcdb_refseq[PROP]
- Nematode[ORGN], Limits: Source database PDB, or Nematode[ORGN] AND srcdb_pdb[PROP]: 5 sequences
- “Berriman M”[AU] … Found nucleotide sequences: Nucleotide (265333), EST (121075), GSS (88649)
Note that, unfortunately, the NCBI website does not allow us to search for “Berriman Matthew”[AU], so we cannot be sure that all of these sequences were submitted by Matthew Berriman. Note also that the search above finds sequences that were either submitted to the NCBI database by M. Berriman or described in a paper on which M. Berriman was an author. Therefore, not all of the sequences found were necessarily submitted by M. Berriman.
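The field-tagged queries in this exercise can also be assembled programmatically. The following is a minimal sketch, assuming the NCBI E-utilities ESearch endpoint with its `db`, `term` and `rettype=count` parameters; the helper names `field` and `esearch_url` are invented for illustration, and the code only builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ESearch service.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field(term: str, tag: str) -> str:
    """Attach an Entrez field tag, quoting terms that contain a space."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{tag}]"


def esearch_url(db: str, query: str) -> str:
    """Build (but do not send) an ESearch request that asks only for the hit count."""
    return EUTILS_ESEARCH + "?" + urlencode({"db": db, "term": query, "rettype": "count"})


# Queries taken from the exercise answers above.
chlamydia = field("Chlamydia trachomatis", "ORGN")
nematode_refseq = field("Nematode", "ORGN") + " AND " + field("srcdb_refseq", "PROP")
berriman = field("Berriman M", "AU")

# "nuccore" is the E-utilities name of the Nucleotide database.
url = esearch_url("nuccore", chlamydia)
print(url)
```

Opening the printed URL in a browser should return a small XML document with the current count, which will differ from the numbers on the slide because the database keeps growing.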

3 Gene-centric databases
Sequence databases are great tools when you want to come up with a bibliography for a particular sequence. However, they do not provide easy access to sequence data when your query deals with broader issues related to a gene or function.
The second-generation nucleotide-sequence databases have adopted a more gene-centric perspective: all the sequence information relevant to a given gene is made accessible at once.
NCBI Gene
Gene described in:
Gene Help:
Gene FAQ:

4 NCBI Gene
The central functions of Gene are to establish unique identifiers (GeneID) for genes that can be tracked and, in so doing, to support accurate connections with the defining sequences, nomenclature and other descriptors.
GeneID: an integer, species-specific (the GeneID assigned to dystrophin in human is different from that in any other species).
Exercises:
- Search for the DUT gene in human.
- How will you get from the sequence record U90223 to the gene record this sequence belongs to?
- Find human and mouse genes having reviewed RefSeq records.
Answers:
- Search DUT[gene] AND human[organism] in NCBI Gene, or use Advanced search in NCBI Gene.
- Search U90223 in NCBI Nucleotide, Display: Summary; in the right column under Related information, click Gene.
- Click on Limits, check Mus musculus and Homo sapiens, and set Limit by RefSeq Status to Reviewed.
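The step from a nucleotide record to its gene record can also be expressed as an E-utilities ELink request. This is a rough sketch, assuming the ELink parameters `dbfrom`, `db` and `id`; the function name is invented for illustration. E-utilities documents GI numbers and accession.version identifiers as valid `id` values for sequence databases, so the bare accession used here is an assumption and may need a version suffix or a prior ESearch step.

```python
from urllib.parse import urlencode

# Base address of the E-utilities ELink service.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def nucleotide_to_gene_url(sequence_id: str) -> str:
    """Build (but do not send) an ELink request from Nucleotide to Gene."""
    params = {"dbfrom": "nuccore", "db": "gene", "id": sequence_id}
    return ELINK + "?" + urlencode(params)


# Accession from the exercise; passed through unchanged (see the caveat above).
url = nucleotide_to_gene_url("U90223")
print(url)
```

The response, if the identifier is accepted, lists the GeneID(s) linked to the sequence record, which is the same connection the Related information link follows in the web interface.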

5 NCBI Gene
Gene does not claim to be comprehensive; rather, it serves as a guide to additional information in other databases. For example, a gene can be represented by multiple sequences, but not all are reported explicitly from Gene. Instead, connections are supplied from Gene to Entrez Nucleotide, Entrez Protein, and BLink (BLAST Link), where more sequences with significant similarity can be retrieved. In addition to the multiple links to NCBI databases, LinkOuts submitted to Gene from external databases support ready navigation to more gene-specific information.

6 NCBI Gene
Go to the DUT gene in human record.
Right column, table of contents (TOC) of the record: Additional links in the TOC … contain LinkOut. What is NCBI LinkOut?
Right column, Links: … contain connections to other databases.
Go to the Protein database link: right column, Find related data, Database: Protein. It runs BLAST.
LinkOut: a service that allows you to link directly from PubMed and other NCBI databases to a wide range of information and services beyond the NCBI systems. LinkOut aims to facilitate access to relevant online resources in order to extend, clarify, and supplement information found in NCBI databases.

7 NCBI Gene
Genomic context: location on the chromosome.
OMIM (MIM): Online Mendelian Inheritance in Man. OMIM is a directory of human genes and genetic disorders, with links to literature references, sequence records, maps, and related databases.

8 NCBI Gene
Location of the gene on the chromosome in non-sequence coordinates. If the gene has been included in a genomic annotation, the section also diagrams neighboring genes and indicates their orientations. The gene being shown on the diagram is in maroon.

9 NCBI Gene: three transcript variants
This portion is provided when a gene has been annotated on a genomic RefSeq, in other words, when the positions of its introns, exons and coding regions are available in some genomic coordinate system. How annotated structures are rendered is described in

10 Genomic context
You can use this section to:
- view the intron/exon/coding region organization on a genomic RefSeq
- identify the RefSeqs that correspond to any RNA or protein product and see an overview of the exons they represent
- alter the zoom level of the display
- move upstream and downstream in the sequence being displayed (just drag in the window)
- navigate to a full display of the genomic context via the link Go to nucleotide Graphics
- navigate to the genomic sequence of the gene in FASTA format
- navigate to the genomic sequence of the gene in GenBank format
- change the display of the genomic sequence on which the gene is annotated. The default display is the chromosome of the reference assembly; for some taxa there are alternate assemblies. For human, the RefSeqGene can also be selected.
RefSeqGene defines genomic sequences to be used as reference standards for well-characterized genes.

11 Obtaining gene sequence
Genomic regions section of the full report: click on FASTA. If you want to adjust the range to capture, modify the values in the Change region shown tool on the FASTA display and click on Update View. - from
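What adjusting the range does to a FASTA record can be reproduced offline on a downloaded file. The sketch below assumes 1-based, inclusive coordinates, as in the region fields of the FASTA display; the record and the function names are made up for illustration.

```python
def read_first_fasta(text: str) -> tuple[str, str]:
    """Return (header, sequence) of the first record in FASTA-formatted text."""
    header = ""
    chunks = []
    for line in text.strip().splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header:          # a second record starts here: stop reading
                break
            header = line[1:]
        elif line:
            chunks.append(line)
    return header, "".join(chunks)


def subrange(sequence: str, start: int, stop: int) -> str:
    """Extract positions start..stop (1-based, inclusive) from a sequence."""
    if not 1 <= start <= stop <= len(sequence):
        raise ValueError("range must lie within the sequence")
    return sequence[start - 1:stop]


# Made-up toy record, not a real database entry.
toy = """>toy_record made-up sequence
ACGTACGTAC
GGTTAACC
"""

header, seq = read_first_fasta(toy)
region = subrange(seq, 3, 8)
print(header, len(seq), region)
```

The subtraction of one on the start coordinate converts from the 1-based positions shown in sequence displays to Python's 0-based slicing.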

12 Obtaining gene sequence
Genomic regions section: click on Graphics.
Click these arrows.
Place your cursor over this bar.
- from
Again, the region can be adjusted in the FASTA view.

13 Obtaining gene sequence
Genomic context section – MapViewer Click on Download/View Sequence/Evidence in the upper right of Map Viewer display, or click on dl in the label for the gene.

14 Obtaining gene sequence
How many transcript variants exist for the human TP53 gene?
Search for TP53[gene] AND human[orgn]. In the GenBank view, find the mRNA entries in FEATURES: seven variants.
The sequence can also be obtained from the GenBank record of the TP53 gene by changing the view to FASTA.
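Counting mRNA entries in the FEATURES table can be automated on a saved GenBank flat file. This is a minimal sketch that relies only on the flat-file layout (feature keys indented by five spaces, qualifiers indented further); the record below is invented and heavily shortened, and `count_features` is a hypothetical helper, not part of any NCBI tool.

```python
def count_features(genbank_text: str, key: str = "mRNA") -> int:
    """Count feature-table entries with the given key in a GenBank flat file."""
    in_features = False
    count = 0
    for line in genbank_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            break  # next top-level section (e.g. ORIGIN) ends the table
        # Feature keys start in column 6; qualifier lines are indented further.
        if line.startswith("     ") and len(line) > 5 and line[5] != " ":
            if line.split()[0] == key:
                count += 1
    return count


# Invented, shortened record used only to exercise the parser.
toy_record = """LOCUS       TOY_RECORD               120 bp    DNA     linear   UNA 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..120
                     /organism="made-up organism"
     gene            1..120
                     /gene="TOY1"
     mRNA            join(1..40,81..120)
                     /product="toy protein, transcript variant 1"
     mRNA            join(1..40,61..120)
                     /product="toy protein, transcript variant 2"
     CDS             join(10..40,81..110)
                     /gene="TOY1"
ORIGIN
//
"""

n_mrna = count_features(toy_record)
print(n_mrna)
```

Because the parser checks the feature key column rather than searching for the text "mRNA", qualifier values that mention mRNA are not counted.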

15 Obtaining gene sequence
For a limited number of genes in the human genome, gene-specific genomic RefSeqs, termed RefSeqGene, have been created. These have a RefSeq accession beginning with NG_ and can be retrieved from the nucleotide database using the query keyword refseqgene. What is the accession number of RefSeqGene of TP53 gene?
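The accession prefix alone tells you what kind of RefSeq record you are looking at. The lookup below is a small sketch covering a subset of the commonly seen RefSeq prefixes; the accession strings are made up and deliberately do not answer the exercise.

```python
# Subset of RefSeq accession prefixes and the molecule type they denote.
REFSEQ_PREFIXES = {
    "NC_": "complete genomic molecule (e.g. chromosome)",
    "NG_": "genomic region (includes RefSeqGene)",
    "NM_": "mRNA",
    "NR_": "non-coding RNA",
    "NP_": "protein",
    "XM_": "predicted mRNA",
    "XR_": "predicted non-coding RNA",
    "XP_": "predicted protein",
}


def classify_accession(accession: str) -> str:
    """Describe a RefSeq accession by its three-character prefix."""
    prefix = accession[:3].upper()
    return REFSEQ_PREFIXES.get(prefix, "not a recognised RefSeq prefix")


# Made-up accessions for illustration only.
for acc in ("NG_000000.1", "NM_000000.1", "U00000"):
    print(acc, "->", classify_accession(acc))
```

A record whose accession starts with NG_ is therefore a candidate RefSeqGene, while a GenBank accession such as the third example falls outside the RefSeq scheme.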

16 GeneRIF Gene Reference into Function
What is GeneRIF?
A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion. The majority of GeneRIFs have been provided by a collaboration between the NLM's Index Section and NCBI. There is no constraint on the number of independent submissions of GeneRIFs per PubMed ID, although those from non-NLM sources are reviewed by RefSeq staff. - from

17 Phenotypes
This section reports the effect of the gene on phenotype, especially disease. For human genes, the first row links to the Phenotype-Genotype Integrator (PheGenI), a web portal providing a tabular display of genome-wide association study results relating the gene and/or its expression to a phenotype. Named phenotypes are provided in subsequent rows. Each phenotype row may be expanded, providing links to more information as available.
PheGenI is pronounced FEE-GEE-NEE.

18 Interactions
There are two major subcategories of information reported as Interactions: HIV-1 interactions and general interactions (TP53 has both). The HIV-1, Human Protein Interaction Database focuses on the human proteins that have been shown to interact with proteins from HIV-1.
Labels on the display:
- the other interactant
- product of the gene that is part of the interaction
- source of these data
- description of the interaction

19 General gene information
Several subcategories of information, including:
- Pathways: a description of pathways that include this gene, with links to more information about each pathway.
- Homology: a partial listing, with links, of orthologs in other species.
- GeneOntology (GO): the specific GO terms are listed by source of the information, category, term, evidence information, and links to supporting publications.

20 Gene Ontology (GO)
Unify the representation of gene and gene product attributes across all species.
Project aims:
- Maintain and develop a controlled vocabulary of gene and gene product attributes
- Annotate genes and gene products
- Provide tools for easy access to all aspects of the data provided by the project
The ontology covers three domains:
- molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis
- biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units (cells, tissues, organs, and organisms)
- cellular component: the parts of a cell or its extracellular environment
AmiGO browser -
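The three GO domains can be treated as a fixed vocabulary when organizing annotations. The following sketch groups a handful of GO terms by domain; the three terms are real GO entries chosen as generic examples, while the tuple layout and the function name are assumptions made for this illustration.

```python
# The three GO domains, written as they appear in GO namespace fields.
GO_DOMAINS = ("molecular_function", "biological_process", "cellular_component")

# (GO identifier, term name, domain): generic example terms.
terms = [
    ("GO:0003677", "DNA binding", "molecular_function"),
    ("GO:0006915", "apoptotic process", "biological_process"),
    ("GO:0005634", "nucleus", "cellular_component"),
]


def group_by_domain(go_terms):
    """Group (id, name, domain) tuples by domain, rejecting unknown domains."""
    grouped = {domain: [] for domain in GO_DOMAINS}
    for go_id, name, domain in go_terms:
        if domain not in grouped:
            raise ValueError(f"unknown GO domain: {domain}")
        grouped[domain].append((go_id, name))
    return grouped


grouped = group_by_domain(terms)
for domain, members in grouped.items():
    print(domain, members)
```

Rejecting any domain outside the three listed ones mirrors the idea of a controlled vocabulary: an annotation either fits one of the defined domains or is an error.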

21 NCBI Reference Sequences (RefSeqs)
This section describes the gene-specific NCBI reference sequences (RefSeqs) that have been established for this gene.

22 Exercise
Retrieve all records for human genes that are associated with OMIM and have been annotated on the genome.
Advanced search + Limits: Homo sapiens
Full list of Entrez filters:

23 Selected Entrez filters

24

25

26 Genome-centric databases
Nucleotide sequences are routinely determined at the whole-genome or chromosome scale, at least for microorganisms. We now have information not only about individual gene sequences, but also, for example, about their relative positions or strand orientation. To take advantage of this more global information, researchers have had to design state-of-the-art genome-centric sequence-information management systems that can connect specialized sequence collections with browsing tools.
MapViewer described in
MapViewer exercises from
UCSC Genome Browser video tutorials:
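Relative position and strand orientation are exactly the kind of information a genome-centric view adds to individual gene records. As a small sketch, the code below orders genes along a chromosome and labels each neighboring pair as tandem (same strand), convergent (pointing toward each other) or divergent (pointing away from each other); the gene names and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost coordinate on the chromosome
    end: int     # rightmost coordinate on the chromosome
    strand: str  # "+" or "-"


def orientation(left: Gene, right: Gene) -> str:
    """Classify two neighboring genes, given in left-to-right order."""
    if left.strand == right.strand:
        return "tandem"
    # left on "+" and right on "-" means the genes point toward each other
    return "convergent" if left.strand == "+" else "divergent"


def neighbour_pairs(genes):
    """Sort genes by start position and classify each adjacent pair."""
    ordered = sorted(genes, key=lambda g: g.start)
    return [(a.name, b.name, orientation(a, b)) for a, b in zip(ordered, ordered[1:])]


# Invented coordinates for illustration only.
genes = [
    Gene("geneC", 1500, 2000, "+"),
    Gene("geneA", 100, 500, "+"),
    Gene("geneD", 2300, 2600, "+"),
    Gene("geneB", 800, 1200, "-"),
]

pairs = neighbour_pairs(genes)
for pair in pairs:
    print(pair)
```

None of this can be derived from isolated sequence records; it requires all genes to be placed in one shared coordinate system, which is what genome browsers provide.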

