NCBI FieldGuide NCBI Molecular Biology Resources January 12, 2007 A Field Guide Part 1.

2 The NCBI Entrez System
NCBI Sequence Databases
– Primary data: GenBank
– Derivative data: RefSeq, Gene
Protein Structure and Function
Sequence Polymorphisms and Phenotypes
** Intermission **
NCBI Genomic Resources
BLAST
NCBI Resources

3 The National Center for Biotechnology Information (Bethesda, MD)
Created in 1988 as a part of the National Library of Medicine at NIH to:
– serve as a national resource for molecular biology information (biological information obtained directly from organisms)
– gather data both nationally and internationally
– develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease

4 Data sources: traditional literature and data obtained from the direct study of organisms
The information landscape in biological and medical research has grown far beyond the literature to include a wide variety of databases generated by research fields such as molecular biology and genomics. (Figure 1 from Geer RC. Broad issues to consider for library involvement in bioinformatics. J Med Libr Assoc. 2006 Jul;94(3):286-98, E152-5. PMID: 16888662.)
NCBI:
– accepts submissions of bibliographic records and primary research data (for example, the nucleotide sequence of the colon cancer gene MLH1)
– organizes the information into databases, maintains them, and makes them available to the world
– develops software to retrieve and analyze the data
– conducts basic research to make new biological discoveries using the databases and software tools

5 What does NCBI do?
– NCBI accepts submissions of primary data.
– NCBI develops tools to analyze these data.
– NCBI uses these tools to create derivative databases based on the primary data.
– NCBI provides free search, link, and retrieval of these data, primarily through the Entrez system.

6 Web Access to NCBI (www.ncbi.nlm.nih.gov)
[Diagram: a query can be text, a sequence, a protein structure, or a small-molecule structure; the entry points shown are Entrez, BLAST, VAST, and PubChem.]

7 The NCBI FTP site: 30,000 files per day; 620 gigabytes per day

8 Help for Programmers
NCBI Toolbox: in-house source code for developers who want to incorporate NCBI-like functionality into their own programs.
– Three main parts: Data Model, Data Encoding, and Programming Libraries
– Examples: BLAST, Cn3D, Sequin, data format conversion scripts
– http://www.ncbi.nlm.nih.gov/IEB/ToolBox/index.cgi
E-Utilities: guidelines for Entrez “URL calls” used to access data, designed for use in scripts.
– Examples: ESearch, EPost, ESummary, EFetch, and ELink
– http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
– Caution: overuse may result in blocked IPs!
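
The E-utilities are plain HTTP "URL calls," so the main thing a script has to get right is encoding the Entrez query into the URL. The C sketch below (no main(); call it from your own code) builds an ESearch URL. It assumes the base address https://eutils.ncbi.nlm.nih.gov/entrez/eutils/ from NCBI's current E-utilities documentation rather than the 2007 help page listed above, and the helper names percent_encode and build_esearch_url are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed base URL, taken from NCBI's current E-utilities documentation. */
#define EUTILS_BASE "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

/* Percent-encode src into dst. RFC 3986 unreserved characters are copied;
   everything else (spaces, brackets, quotes) becomes %XX.
   Returns 0 on success, -1 if dst is too small. */
int percent_encode(const char *src, char *dst, size_t size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    for (const unsigned char *p = (const unsigned char *)src; *p; p++) {
        if (isalnum(*p) || *p == '-' || *p == '_' || *p == '.' || *p == '~') {
            if (n + 1 >= size) return -1;
            dst[n++] = (char)*p;
        } else {
            if (n + 3 >= size) return -1;
            dst[n++] = '%';
            dst[n++] = hex[*p >> 4];
            dst[n++] = hex[*p & 0x0F];
        }
    }
    if (n >= size) return -1;
    dst[n] = '\0';
    return 0;
}

/* Build an ESearch URL for an Entrez query in database db.
   db is assumed to be a plain database name such as "nucleotide". */
int build_esearch_url(const char *db, const char *term, char *url, size_t size)
{
    char encoded[1024];
    int n;

    assert(db != NULL && term != NULL && url != NULL);
    if (percent_encode(term, encoded, sizeof encoded) != 0) return -1;
    n = snprintf(url, size, EUTILS_BASE "esearch.fcgi?db=%s&term=%s", db, encoded);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

For example, the query tpo[sym] AND human[orgn] in db=nucleotide becomes esearch.fcgi?db=nucleotide&term=tpo%5Bsym%5D%20AND%20human%5Borgn%5D. Fetching the URL is left to whatever HTTP library you use; in any case, space the requests out, since NCBI's current guidance is no more than three requests per second without an API key.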

9 Global Entrez Search Page: All[Filter]

10 What is Entrez?
– A system of 31 linked databases
– A text search engine
– A tool for finding biologically linked data
– A retrieval engine
– A virtual workspace for manipulating large datasets

11 Entrez Databases
Each record is assigned a UID
– a unique integer identifier for internal tracking
– for Nucleotide, the GI number serves as the UID
Each record is given a Document Summary (DocSum)
– a summary of the record’s content
Each record is assigned links to biologically related UIDs
Each record is indexed by data fields
– [author], [title], [organism], and many others

12 Linking in Entrez
Follow links to related data in the same database or in others!
Hard links: curated links based on biology
– nucleotide → taxonomy (based on organism identifier)
– protein → domain relatives (based on domain assignment)
– domains → pubmed (based on supporting literature)
– pcsubstance → structures/mmdb (based on source information)
Soft links: pre-computed analyses
– nucleotide → related sequences (BLAST neighbors)
– protein → conserved domains (CDD/RPS-BLAST search)
– pccompound → pccompound (structure-based neighboring)

13 Entrez: Database Integration
[Diagram: PubMed abstracts, nucleotide sequences, protein sequences, 3-D structures, genomes, and taxonomy, joined by hard links and by precomputed neighbors within each database: word weight (related articles), BLAST (related sequences), BLink and domains (related proteins), VAST (related structures), and phylogeny.]

14 Links: Database Integration at NCBI
[Link matrix among Gene, Nucleotide, Protein, Structure, CDD, SNP, Taxonomy, and PubMed; the kinds of links shown from each database:]
– Gene: HomoloGene; mRNAs and genome; all CDS products; protein function; SNPs and indels; source organism; literature
– Nucleotide: gene locus; BLASTn; CDS product; 3D DNA and 3D RNA; SNPs and indels; source organism; literature
– Protein: gene locus; cDNA transcript; BLASTp; 3D proteins; function; SNPs and indels; source organism; literature
– Structure: DNA sequence; protein sequence; VAST; protein function; SNP; BLASTp; source organism; literature
– CDD: gene loci; proteins with the CD; 3D templates; CDART; broadest taxon; literature
– SNP: gene locus; DNA sequence; protein sequence; 3D template; source organism; literature
– Taxonomy: genes, sequences, structures, CD spans, and SNPs for the taxon; common tree
– PubMed: gene loci, sequences, structures, CDs, and SNPs in the article; related articles

15 Types of Databases
Primary databases
– Original submissions by experimentalists
– Content controlled by the submitter
– Examples: GenBank, dbSNP, GEO, PubChem Substance, and PubChem BioAssay
Derivative databases
– Built from primary data
– Content controlled by a third party (NCBI)
– Examples: RefSeq, RefSNP, GEO DataSets, PubChem Compound

16 An Entrez Database: Nucleotide
GenBank: primary data (98.2%)
– original submissions by experimentalists
– submitters retain editorial control of records
– archival in nature
RefSeq: derivative data (1.8%)
– curated by NCBI staff
– NCBI retains editorial control of records
– record content is updated continually

17 Literature Databases

18 NM_000249: PubMed and Books links

19 Books Link

20

21 A part of the NCBI Bookshelf: Part 1. The Databases; Part 2. Data Flow and Processing; Part 3. Querying and Linking the Data; Part 4. User Support

22

23

24

25 PubMed Central
PubMed Central (PMC) is a digital archive of life sciences journal literature. Integrated into the Entrez retrieval system, PMC provides free and unrestricted access to the full text of over 160 life sciences journals, with more to come.

26 NCBI Journal Database: detailed journal information

27 OMIM
– A catalog of genes involved in human disease processes
– Detailed clinical and reference information
– Curated and maintained by Johns Hopkins University
– Links to PubMed and sequence databases

28 Primary vs. Derivative Databases
[Diagram: GenBank, with its traditional divisions (PRI, ROD, PLN, MAM, BCT, INV, VRT, PHG, VRL) and bulk divisions (EST, GSS, STS, HTG), receives sequences from labs and sequencing centers and is updated ONLY by submitters. Derivative databases (UniGene, UniSTS, and RefSeq, built by the Gene and Genomes pipelines and the Annotation pipeline) draw on GenBank through curators and algorithms and are updated continually by NCBI.]

29 What is GenBank?
NCBI’s primary sequence database
– Nucleotide-only sequence database
– Archival in nature
– Each record is assigned a stable accession number
GenBank data
– Direct submissions (traditional records)
– Batch submissions (EST, GSS, STS)
– FTP accounts (genome data)
Three collaborating databases
– GenBank
– DNA Data Bank of Japan (DDBJ)
– European Molecular Biology Laboratory (EMBL) Database

30 The International Nucleotide Sequence Database Collaboration
[Diagram: GenBank (NCBI, NIH), EMBL (EBI), and DDBJ (NIG, CIB) each receive submissions and updates; GenBank submissions arrive via Sequin, BankIt, and FTP. Each center offers its own retrieval system: Entrez (NCBI), SRS (EBI), and getentry (DDBJ).]

31 GenBank Releases
– Full release every two months
– Incremental and cumulative updates daily
– Available only through the Internet: ftp://ftp.ncbi.nih.gov/genbank/
Release 156, October 2006 (non-WGS):
– 62,765,195 records
– 66,925,938,907 nucleotides
– >150,000 species
– 245 gigabytes in 1,032 files

32 The Growth of GenBank (Release 152)
– Non-WGS: 59.8 billion bases
– WGS: 63.2 billion bases

33 GenBank Divisions
Traditional divisions: direct submissions (Sequin/BankIt); accurate (~1 error per 10,000 bp); well characterized; organized by taxonomy
– PRI Primate
– ROD Rodent
– MAM Mammalian
– VRT Other Vertebrate
– INV Invertebrate
– PLN Plant and Fungal
– BCT Bacterial/Archaeal
– VRL Viral
– PHG Phage
– SYN Synthetic
– UNA Unannotated
Bulk divisions: from sequencing projects; batch submissions (FTP/email); inaccurate; poorly characterized; organized by sequence type
– EST Expressed Sequence Tag
– GSS Genome Survey Sequence
– HTG High Throughput Genomic
– HTC High Throughput cDNA
– STS Sequence Tagged Site
– PAT Patent sequences
– CON Constructed entries

34 Entrez Nucleotide Subsets
– CoreNucleotide: 29,225,247
– EST: 39,288,168
– GSS: 15,655,087
– Total: 84,168,502

35 A Traditional GenBank Record: The Flatfile Format (Header, Feature Table, Sequence)
LOCUS       AY182241                1931 bp    mRNA    linear   PLN 04-MAY-2004
DEFINITION  Malus x domestica (E,E)-alpha-farnesene synthase (AFS1) mRNA,
            complete cds.
ACCESSION   AY182241
VERSION     AY182241.2  GI:32265057
KEYWORDS    .
SOURCE      Malus x domestica (cultivated apple)
  ORGANISM  Malus x domestica
            Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;
            Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots;
            rosids; eurosids I; Rosales; Rosaceae; Maloideae; Malus.
REFERENCE   1  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Cloning and functional expression of an (E,E)-alpha-farnesene
            synthase cDNA from peel tissue of apple fruit
  JOURNAL   Planta 219, 84-94 (2004)
REFERENCE   2  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (18-NOV-2002) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
REFERENCE   3  (bases 1 to 1931)
  AUTHORS   Pechous,S.W. and Whitaker,B.D.
  TITLE     Direct Submission
  JOURNAL   Submitted (25-JUN-2003) PSI-Produce Quality and Safety Lab,
            USDA-ARS, 10300 Baltimore Ave. Bldg. 002, Rm. 205, Beltsville, MD
            20705, USA
  REMARK    Sequence update by submitter
COMMENT     On Jun 26, 2003 this sequence version replaced gi:27804758.
FEATURES             Location/Qualifiers
     source          1..1931
                     /organism="Malus x domestica"
                     /mol_type="mRNA"
                     /cultivar="'Law Rome'"
                     /db_xref="taxon:3750"
                     /tissue_type="peel"
     gene            1..1931
                     /gene="AFS1"
     CDS             54..1784
                     /gene="AFS1"
                     /note="terpene synthase"
                     /codon_start=1
                     /product="(E,E)-alpha-farnesene synthase"
                     /protein_id="AAO22848.2"
                     /db_xref="GI:32265058"
                     /translation="MEFRVHLQADNEQKIFQNQMKPEPEASYLINQRRSANYKPNIWK
                     NDFLDQSLISKYDGDEYRKLSEKLIEEVKIYISAETMDLVAKLELIDSVRKLGLANLF
                     EKEIKEALDSIAAIESDNLGTRDDLYGTALHFKILRQHGYKVSQDIFGRFMDEKGTLE
                     NHHFAHLKGMLELFEASNLGFEGEDILDEAKASLTLALRDSGHICYPDSNLSRDVVHS
                     LELPSHRRVQWFDVKWQINAYEKDICRVNATLLELAKLNFNVVQAQLQKNLREASRWW
                     ANLGIADNLKFARDRLVECFACAVGVAFEPEHSSFRICLTKVINLVLIIDDVYDIYGS
                     EEELKHFTNAVDRWDSRETEQLPECMKMCFQVLYNTTCEIAREIEEENGWNQVLPQLT
                     KVWADFCKALLVEAEWYNKSHIPTLEEYLRNGCISSSVSVLLVHSFFSITHEGTKEMA
                     DFLHKNEDLLYNISLIVRLNNDLGTSAAEQERGDSPSSIVCYMREVNASEETARKNIK
                     GMIDNAWKKVNGKCFTTNQVPFLSSFMNNATNMARVAHSLYKDGDGFGDQEKGPRTHI
                     LSLLFQPLVN"
ORIGIN
        1 ttcttgtatc ccaaacatct cgagcttctt gtacaccaaa ttaggtattc actatggaat
       61 tcagagttca cttgcaagct gataatgagc agaaaatttt tcaaaaccag atgaaacccg
      121 aacctgaagc ctcttacttg attaatcaaa gacggtctgc aaattacaag ccaaatattt
      181 ggaagaacga tttcctagat caatctctta tcagcaaata cgatggagat gagtatcgga
      241 agctgtctga gaagttaata gaagaagtta agatttatat atctgctgaa acaatggatt
      ...
     1801 aataaatagc agcaaaagtt tgcggttcag ttcgtcatgg ataaattaat ctttacagtt
     1861 tgtaacgttg ttgccaaaga ttatgaataa aaagttgtag tttgtcgttt aaaaaaaaaa
     1921 aaaaaaaaaa a
//
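
The LOCUS line packs the record's summary fields into a single line. A minimal C sketch (no main()) that splits it with sscanf, assuming whitespace-separated tokens in the order shown for AY182241: name, length, "bp", molecule type, topology, division, and date. The LocusLine struct and parse_locus are hypothetical names, and older records whose LOCUS line omits the topology token would need different handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of a GenBank LOCUS line, in the order they appear in AY182241. */
typedef struct {
    char name[17];      /* locus name / accession */
    long length;        /* sequence length in bp */
    char moltype[17];   /* e.g. "mRNA" */
    char topology[17];  /* "linear" or "circular" */
    char division[4];   /* three-letter division code, e.g. "PLN" */
    char date[12];      /* modification date, DD-MMM-YYYY */
} LocusLine;

/* Parse a LOCUS line; returns 0 if all six fields were read, -1 otherwise.
   Field widths in the format string keep every copy inside its buffer. */
int parse_locus(const char *line, LocusLine *out)
{
    int n;

    assert(line != NULL && out != NULL);
    n = sscanf(line, "LOCUS %16s %ld bp %16s %16s %3s %11s",
               out->name, &out->length, out->moltype,
               out->topology, out->division, out->date);
    return n == 6 ? 0 : -1;
}
```

Applied to the record above, it yields name AY182241, length 1931, molecule type mRNA, topology linear, division PLN, and date 04-MAY-2004; a line that does not start with LOCUS is rejected.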

36 An Example Record: M17755
Indexing for Nucleotide UID 4680720 (field: indexed terms)
– [primary accession]: M17755
– [title]: Homo sapiens thyroid peroxidase (TPO) mRNA…
– [organism]: Homo sapiens
– [sequence length]: 3060
– [modification date]: 1999/04/26
– [properties]: biomol mrna, gbdiv pri, srcdb genbank
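
Field-qualified indexing can be pictured as a list of (field, term, UID) entries, where a query such as TPO[gene name] matches only entries filed under that field. The toy inverted index below is an illustration under that assumption, not how Entrez actually stores its indexes; the Posting type and lookup function are hypothetical, and the sample entries in the usage note come from the M17755 indexing shown here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* One entry of a toy index: a term filed under a field for one UID. */
typedef struct {
    const char *field;
    const char *term;
    long uid;
} Posting;

/* Case-insensitive string equality (Entrez text searches ignore case). */
static int same_text(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Collect the UIDs of entries matching term[field], writing at most
   max_uids of them to uids; returns how many were written. */
size_t lookup(const Posting *entries, size_t count, const char *term,
              const char *field, long *uids, size_t max_uids)
{
    size_t found = 0;

    assert(entries != NULL || count == 0);
    for (size_t i = 0; i < count && found < max_uids; i++)
        if (same_text(entries[i].field, field) && same_text(entries[i].term, term))
            uids[found++] = entries[i].uid;
    return found;
}
```

With entries for M17755 (UID 4680720) such as {"gene name", "TPO", 4680720}, the query tpo[gene name] returns that UID, while TPO[organism] returns nothing because the term is not filed under that field.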

37 M17755: Feature Table
Indexed from the feature table: CDS position in bp; TPO [gene name]; thyroiditis [text word]; thyroid peroxidase [protein name]; protein accession
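
CDS positions are 1-based and inclusive, so the encoded protein length follows from simple arithmetic. Taking the CDS 54..1784 from the AY182241 record on slide 35: 1784 - 54 + 1 = 1731 nt, or 577 codons; because a complete GenBank CDS includes the stop codon, the protein has 576 amino acids. A C sketch of that calculation (no main(); cds_protein_length is a hypothetical name), assuming a simple start..end location with /codon_start=1 and no join(), complement(), or partial (<, >) markers:

```c
#include <assert.h>
#include <stdio.h>

/* Protein length in amino acids for a simple CDS location "start..end".
   If stop_included is nonzero, the last codon is a stop and is not counted.
   Returns 0 on success, -1 if the location cannot be parsed or is not a
   whole number of codons (e.g. join(...) or partial locations). */
int cds_protein_length(const char *location, int stop_included, long *aa)
{
    long start, end, nt;

    assert(location != NULL && aa != NULL);
    if (sscanf(location, "%ld..%ld", &start, &end) != 2 || end < start) return -1;
    nt = end - start + 1;          /* inclusive coordinates */
    if (nt % 3 != 0) return -1;
    *aa = nt / 3 - (stop_included ? 1 : 0);
    return 0;
}
```

For "54..1784" with the stop codon counted in the CDS, this gives 576; locations that need join() or strand handling are rejected rather than guessed at.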

38 Sequence: 99.99% Accurate
The sequence itself is not indexed… Use BLAST for that!
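
What the accuracy figure means for a single record: the ~1 error per 10,000 bp quoted for the traditional divisions, applied to the 3,060 bp of M17755. This is a back-of-the-envelope estimate that assumes errors are spread evenly, not an NCBI statistic.

```latex
\text{accuracy} = 1 - \frac{1}{10^{4}} = 0.9999 = 99.99\%,
\qquad
\text{expected errors in M17755} \approx 3060 \times 10^{-4} \approx 0.31
```

In other words, a typical traditional record of this size is more likely than not to be error-free, but not guaranteed to be.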

39 Entrez Protein
– GenPept (DDBJ, EMBL, GenBank): 6,259,705
– RefSeq: 2,997,502
– Swiss-Prot: 236,666
– PDB: 86,934
– PIR: 30,413
– PRF: 12,079
– Third Party Annotation: 4,969
– Total: 9,628,271

40 Protein Sources and Links
[Diagram: protein records from PIR, RefSeq, SWISS-PROT, and GenPept, with links to the nucleotide records NM_000547 and M17755; callout: "no mRNA!"]

41 Sequence Revisions
– The version and GI change only if the sequence changes.
– The accession number alone always retrieves the most recent version.
– Revision dates show when a version was first seen at NCBI, not when it was first seen at GenBank!
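
Because the version travels as a suffix on the accession, scripts often need to split the two. A C sketch (no main(); split_accver is a hypothetical name) that treats a bare accession as "no version given," the case in which Entrez retrieves the most recent version:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split "ACCESSION.VERSION" (e.g. "AY182241.2") into its parts.
   A bare accession gets *version = 0, meaning "no version given".
   Returns 0 on success, -1 on a malformed input or a too-small buffer. */
int split_accver(const char *accver, char *acc, size_t acc_size, int *version)
{
    const char *dot;
    size_t len;

    assert(accver != NULL && acc != NULL && version != NULL);
    dot = strrchr(accver, '.');
    len = dot ? (size_t)(dot - accver) : strlen(accver);
    if (len == 0 || len >= acc_size) return -1;
    memcpy(acc, accver, len);
    acc[len] = '\0';

    if (dot == NULL) {
        *version = 0;
        return 0;
    }
    char *end;
    long v = strtol(dot + 1, &end, 10);
    if (end == dot + 1 || *end != '\0' || v < 1) return -1;  /* need digits >= 1 */
    *version = (int)v;
    return 0;
}
```

AY182241.2 splits into AY182241 and version 2; NM_000547 has no suffix and gets version 0; a trailing dot with no number is rejected.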

42 Update without a Sequence Change
June 15, 1989! GenBank came to NCBI in 1992!

43 Update with a Sequence Change

44 GenBank File Formats
– ASN.1: the raw data
– XML
– FASTA
– Flat file
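
Converting the flat file's ORIGIN block to FASTA is mostly a matter of discarding position numbers and spacing. The C sketch below (no main(); origin_residues and write_fasta are hypothetical names) assumes the input is the sequence lines that follow ORIGIN, upper-cases the residues, and takes the line width as a parameter; 70 residues per line is a common choice but is an assumption here, not something the slide specifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Copy residues from the sequence lines after ORIGIN, dropping position
   numbers and spaces and stopping at the "//" end-of-record line.
   Residues are upper-cased. Output is truncated if seq is too small.
   Returns the number of residues written. */
size_t origin_residues(const char *origin, char *seq, size_t size)
{
    size_t n = 0;

    assert(origin != NULL && seq != NULL && size > 0);
    for (const char *p = origin; *p; p++) {
        if (p[0] == '/' && p[1] == '/') break;
        if (isalpha((unsigned char)*p)) {
            if (n + 1 >= size) break;
            seq[n++] = (char)toupper((unsigned char)*p);
        }
    }
    seq[n] = '\0';
    return n;
}

/* Write one FASTA record: a ">" definition line, then the sequence
   wrapped at width residues per line. */
void write_fasta(FILE *out, const char *defline, const char *seq, int width)
{
    size_t len = strlen(seq);

    assert(out != NULL && width > 0);
    fprintf(out, ">%s\n", defline);
    for (size_t i = 0; i < len; i += (size_t)width)
        fprintf(out, "%.*s\n", width, seq + i);
}
```

For the last ORIGIN line of AY182241, "     1921 aaaaaaaaaa a", origin_residues extracts AAAAAAAAAAA (11 residues), which write_fasta can then emit under a definition line such as >AY182241.2.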

45 NCBI Toolbox
Toolbox sources: the slide shows the opening of asn2ff.c ("convert an ASN.1 entry to flat file format, using the FFPrintArray"), including its #include lines (asn2ff.h, asn2ffp.h, ffprint.h, and others, with an optional ENABLE_ID1 block) and the start of its command-line argument table:
– Filename for asn.1 input (-a, default stdin)
– Input is a Seq-entry (-e, default F)
– Input asnfile in binary mode (-b, default F)
– Output Filename (-o, default stdout)
– Show Sequence? (-h, default T)
Getting the sources by FTP:
ftp> open ftp.ncbi.nih.gov
ftp> cd toolbox
ftp> cd ncbi_tools
ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools

46 Text Queries in Entrez
term1[limit] OP term2[limit] OP …
where
– limit = Entrez indexing field (organism, author, …)
– OP = Boolean operator: AND, OR, NOT (written in uppercase)
Complex queries: ((A[limit1] OR B[limit2]) AND C[limit3]) NOT D[limit4]
Ranges: 1:200[MW]
Wildcards: cancer[title] vs. cancer*[title]
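
Queries built by a script follow the same term[limit] OP term[limit] pattern. A C sketch (no main(); add_clause is a hypothetical name) that appends one clause at a time and parenthesizes the query built so far, so the grouping stays explicit as in the complex-query example:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append "OP term[field]" to the Entrez query in query (size bytes).
   If query is empty, only "term[field]" is written and op is ignored.
   op should be "AND", "OR", or "NOT".
   Returns 0 on success, -1 if the result would not fit. */
int add_clause(char *query, size_t size, const char *op,
               const char *term, const char *field)
{
    char tmp[1024];
    int n;

    assert(query != NULL && term != NULL && field != NULL);
    assert(query[0] == '\0' || op != NULL);
    if (query[0] == '\0')
        n = snprintf(tmp, sizeof tmp, "%s[%s]", term, field);
    else   /* wrap the existing query so precedence is explicit */
        n = snprintf(tmp, sizeof tmp, "(%s) %s %s[%s]", query, op, term, field);
    if (n < 0 || (size_t)n >= sizeof tmp || (size_t)n >= size) return -1;
    memcpy(query, tmp, (size_t)n + 1);
    return 0;
}
```

Starting from an empty string and adding thyroid peroxidase[title], then AND human[orgn], then AND srcdb refseq[properties] gives ((thyroid peroxidase[title]) AND human[orgn]) AND srcdb refseq[properties], the same search as query #3 on slide 51 written out in full.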

47 Entrez Tabs
– Limits: provides a simple form for applying commonly used Entrez limits
– Preview/Index: allows access to the full indexing of each Entrez database and aids in constructing complex queries
– History: provides access to previous searches in the current Entrez database
– Clipboard: a temporary storage area for selected records
– Details: displays the detailed parsing of the current Entrez query, and lists errors and terms without matches

48 Programming Entrez: E-Utilities
http://www.ncbi.nih.gov/entrez/query/static/eutils_help.html
– ESearch: Entrez query in; UID list or History out
– EPost: UID list in; History out
– ESummary: UID list or History in; document summaries out
– EFetch: UID list or History in; formatted data out
– ELink: UID list or History in; UID list or History out
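
The History path avoids shipping long UID lists: ESearch called with usehistory=y returns a WebEnv string and a QueryKey, and EFetch can then be pointed at that stored result set. A C sketch of the two string-handling steps (no main(); xml_tag_text and build_efetch_url are hypothetical names, and the base URL is assumed from NCBI's current E-utilities documentation). The tag lookup is a plain string search that assumes the simple, unnested ESearch fields; it is not a general XML parser.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define EUTILS_BASE "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

/* Copy the text between <tag> and </tag> from an E-utilities XML reply.
   Returns 0 if found and it fits in out, -1 otherwise. */
int xml_tag_text(const char *xml, const char *tag, char *out, size_t size)
{
    char open_tag[64], close_tag[64];
    const char *start, *end;
    size_t len;

    assert(xml != NULL && tag != NULL && out != NULL && size > 0);
    snprintf(open_tag, sizeof open_tag, "<%s>", tag);
    snprintf(close_tag, sizeof close_tag, "</%s>", tag);
    start = strstr(xml, open_tag);
    if (start == NULL) return -1;
    start += strlen(open_tag);
    end = strstr(start, close_tag);
    if (end == NULL) return -1;
    len = (size_t)(end - start);
    if (len >= size) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}

/* Build an EFetch URL that retrieves the result set stored on the History
   server (WebEnv + query_key) instead of passing a UID list. */
int build_efetch_url(const char *db, const char *webenv, const char *query_key,
                     const char *rettype, char *url, size_t size)
{
    int n = snprintf(url, size,
                     EUTILS_BASE "efetch.fcgi?db=%s&query_key=%s&WebEnv=%s"
                     "&rettype=%s&retmode=text",
                     db, query_key, webenv, rettype);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

Given a reply containing <QueryKey>1</QueryKey> and a hypothetical <WebEnv>MCID_example</WebEnv>, it builds efetch.fcgi?db=nucleotide&query_key=1&WebEnv=MCID_example&rettype=fasta&retmode=text. Real WebEnv values are opaque strings issued by NCBI and should be passed back exactly as returned.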

49 Finding Primary Sequences
Search Entrez CoreNucleotide:
– 94.8% GenBank (primary data)
– 5.2% RefSeq (curated data)
Possible queries we’ve seen so far…
– M17755 [primary accession]
– TPO [gene name]
– thyroid peroxidase [title]
– thyroiditis [text word]
– Homo sapiens [organism]
– thyroid peroxidase [protein name]
– 3060 [sequence length]
– 1999/04/26 [modification date]
– biomol mrna [properties]
– gbdiv pri [properties]
– srcdb genbank [properties]

50 A Starting Query
Find nucleotide records for human thyroid peroxidase:
– human thyroid peroxidase is translated by Entrez as (("Homo sapiens"[Organism] OR human[All Fields]) AND thyroid peroxidase[All Fields]): 276 records
– human[organism] AND thyroid peroxidase is translated as ("Homo sapiens"[Organism] AND thyroid peroxidase[All Fields]): 262 records
Field limit! 14 records aren’t human sequences!

51 Limit by Title and Database
#1: thyroid peroxidase AND human[orgn] (262 records)
#2: thyroid peroxidase[title] AND human[orgn] (55)
#3: #2 AND srcdb refseq[properties] (5)
#4: #2 AND srcdb ddbj/embl/genbank[properties] (50)
In Entrez Nucleotide: GenBank (primary data) = srcdb ddbj/embl/genbank[properties]; RefSeq = srcdb refseq[properties]

52 Limit by Biomolecule Type
Genomic DNA: biomol genomic[prop]
mRNA / cDNA: biomol mrna[prop]
#1: thyroid peroxidase AND human[orgn] (262 records)
#2: thyroid peroxidase[title] AND human[orgn] (55)
#3: #2 AND srcdb refseq[properties] (5)
#4: #2 AND srcdb ddbj/embl/genbank[properties] (50)
#5: #4 AND biomol genomic[prop] (26)
#6: #4 AND biomol mrna[prop] (24)

53 Limit by Protein Name
thyroid peroxidase[protein name] AND human[orgn] AND gbdiv pri[prop] AND biomol mrna[prop]
Searching [title]: 24 records; searching [protein name]: 5 records

54 Entrez Document Summaries
Click the accession to view the record.
Links menu: links to other Entrez databases computed for M17755

55 Viewing M17755

56 GenBank Sequences for Human TPO
Which one is the best sequence?

57 RefSeq: NCBI’s Derivative Sequence Database
RefSeq benefits:
– Non-redundant
– Explicitly linked nucleotide and protein sequences
– Updated to reflect current sequence data and biology
– Validated by hand
– Format consistency
– Distinct accession series
– Stewardship by NCBI staff and collaborators
ftp://ftp.ncbi.nih.gov/refseq/release

58 RefSeq: NCBI’s Derivative Sequence Database (nucleotide → protein accessions)
Curated transcripts and proteins
– NM_123456 → NP_123456
– NR_123456 (non-coding RNA)
Model transcripts and proteins
– XM_123456 → XP_123456
– XR_123456 (non-coding RNA)
Assembled genomic regions (contigs)
– NT_123456 (BAC clones)
– NW_123456 (WGS)
Other genomic sequence
– NG_123456 (complex regions, pseudogenes)
– NZ_ABCD12345678 (WGS) → ZP_123456
Chromosome records in Entrez Genome
– NC_123456 (chromosome; microbial or organelle genome)
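
Because the accession prefix encodes the record type, a script can tell curated records from models before fetching anything. A lookup-table sketch in C (no main(); refseq_kind is a hypothetical name) covering only the prefixes listed on this slide; NCBI has added RefSeq prefixes since 2007, so treat the table as illustrative rather than complete.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* RefSeq accession prefixes as listed on this slide (2007). */
static const struct {
    const char *prefix;
    const char *kind;
} REFSEQ_PREFIXES[] = {
    {"NM_", "curated mRNA"},
    {"NP_", "curated protein"},
    {"NR_", "curated non-coding RNA"},
    {"XM_", "model mRNA"},
    {"XP_", "model protein"},
    {"XR_", "model non-coding RNA"},
    {"NT_", "genomic contig (BAC clones)"},
    {"NW_", "genomic contig (WGS)"},
    {"NG_", "genomic region (complex regions, pseudogenes)"},
    {"NZ_", "WGS genomic sequence"},
    {"ZP_", "protein annotated on NZ_ WGS sequence"},
    {"NC_", "chromosome or complete microbial/organelle genome"},
};

/* Return a description for a RefSeq accession, or NULL if its prefix is
   not in the table (e.g. GenBank accessions such as M17755). */
const char *refseq_kind(const char *accession)
{
    assert(accession != NULL);
    for (size_t i = 0; i < sizeof REFSEQ_PREFIXES / sizeof REFSEQ_PREFIXES[0]; i++)
        if (strncmp(accession, REFSEQ_PREFIXES[i].prefix, 3) == 0)
            return REFSEQ_PREFIXES[i].kind;
    return NULL;
}
```

refseq_kind("XM_124429.1") returns "model mRNA", a reminder that the record may be revised or removed with the next genome build; a GenBank accession such as M17755 returns NULL.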

59 NM/NP Records in Entrez
NM_000547 (variant 1): COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from M17755.2 and AW874082.1. On Feb 25, 2003 this sequence version replaced gi:21361188.
NM_175719 (variant 2): COMMENT REVIEWED REFSEQ: This record has been curated by NCBI staff. The reference sequence was derived from J02970.1, AW874082.1 and M17755.2.
AW874082.1: EST that completes the 3’ end

60 Annotating the Gene
[Diagram: from GenBank sequences to RefSeq: genomic DNA (NC, NT, NW); model mRNA (XM, XR) and model protein (XP); curated mRNA (NM, NR) and curated protein (NP).]

61 The Perils of the XM
XM records are models based only on genomic sequence, and are subject to revision or removal with each new build of that genome.
BLAST the XM against the RefSeq database to look for a replacement:
Query= gi|20850420|ref|XM_124429.1| Mus musculus expressed sequence AA553001 (AA553001), mRNA
gi|19527087|ref|NM_133873.1| Mus musculus DNA segment, Chr 4, Wayne State University 114, expressed (D4Wsu114e), mRNA
Length=1898
Score = 3701.55 bits (1867), Expect = 0
Identities = 1870/1871 (99%), Gaps = 0/1871 (0%)
Strand=Plus/Plus

62 Entrez Gene and RefSeq
– Entrez Gene is the central repository for information about a gene available at NCBI, and often provides links to sites beyond NCBI.
– Entrez Gene includes records for organisms that have NCBI Reference Sequences (RefSeqs).
– Entrez Gene records contain RefSeq mRNAs, proteins, and genomic DNA (if known) for a gene locus, plus links to other Entrez databases.
– NCBI RefSeqs are based on primary sequence data in GenBank.

63 Entrez Gene: RefSeq Annotations

64 NM/NP Records in Entrez Gene

65 Entrez Gene: RefSeq Graphics (NM, NP)

66 Getting the Annotation Details
Genomic sequence: ACCESSION NC_000002, REGION: 1396242..1525502

67 Genome Annotation in Entrez Nucleotide
[Diagram: GenBank components (clones, WGS); NT/NW contigs (assembly components); NC records (genome components); NM/XM records (master mRNA).]

68 Genome Annotation Links
– curated mRNA
– genomic contig on chromosome 2 transcribing NM_000547
– human chromosome 2
– the 18 contigs of the chromosome 2 assembly

69 Searching Entrez Gene
RefSeq status and variants: reviewed RefSeqs with transcript variants
– srcdb refseq reviewed[prop] AND has transcript variants[prop]
Gene symbol: human thyroid peroxidase (TPO)
– tpo [sym] AND human [organism]
Disease and Gene Ontology: membrane proteins linked to cancer
– integral to plasma membrane[gene ontology] AND cancer [dis]
Chromosome and links: genes on human chromosome 2 with OMIM links
– 2 [chromosome] AND gene omim [filter] AND human [organism]
Protein name: topoisomerase genes from Archaea
– topoisomerase[gene/protein name] AND archaea [organism]

70 Third Party Annotation (TPA) Database
NCBI now accepts the submission of new annotations of existing GenBank sequences.
– Submissions must be published in a peer-reviewed journal.
– Facilitates the annotation of sequences by experts.
Examples of sequences appropriate for TPA:
– Annotation of features on gene and/or mRNA sequences
– Assembled “full length” genes and/or mRNAs
What should not be submitted to TPA?
– Synthetic constructs (such as cloning vectors) that use well-characterized, publicly available genes, promoters, or terminators
– Updates or changes to existing sequence data
– Sequence annotations without experimental evidence

71 Linking Protein Sequence, Structure, and Function
– Protein: sequence → sequence
– Conserved Domains (CDD): sequence → function (Pfam, SMART); sequence → structure + function (cd)
– Structure (MMDB): sequence → structure; structure → structure (VAST)

72 Entrez Structure (MMDB: Molecular Modeling Database)
Derived from experimentally determined PDB records
Adds value to PDB records by:
– adding explicit chemical bonding information
– validating and indexing the sequences
– annotating 3D domains and secondary structure
– adding links to CDD, Taxonomy, PubMed
– converting PDB data to ASN.1
Structure neighbors are determined by the Vector Alignment Search Tool (VAST).

73 Structure Summary Page
Conserved Domains; VAST neighbors for chain C (domain 0); Cn3D; VAST neighbors for domain 2

74 Related Structures

75 VAST: Structure Neighbors
Vector Alignment Search Tool: for each 3D domain, locate SSEs (secondary structure elements) and represent them as individual vectors.
[Figure: human IL-4, with its SSEs numbered 1 to 6]
VAST uses 3D domains only! Whole polypeptides are assigned 3D domain 0 (zero).

76 VAST Neighbors: 1D2V and 1Q4G (3D domains!); Cn3D

77 Submitting a PDB File to VAST
New! Redesigned interface!
This is the best way to convert PDB into MMDB format!

78 Structure + Function
– VAST finds proteins that have similar 3D folds.
– CD-Search finds proteins that have similar sequences and similar functions.
– Curated CDs = VAST + CD-Search: proteins that have similar 3D folds, similar sequences, and similar functions.

79 Protein Links: Domains
Click on a colored bar to align your sequence to the CD.

80 CDD Record: Heme Peroxidases
Aligned query; red = high conservation, blue = low conservation

81 Curated CD Record: EGF
Annotated features; launch Cn3D; phylogenetic tree of aligned sequences; launch CDTree (new)

82 Curated CD Record: EGF
Annotated features; launch Cn3D; phylogenetic tree of aligned sequences; launch CDTree (new); Cn3D

83 Entrez PubChem
– PC Substance: primary database of chemical samples
– PC Compound: derived database of known chemicals from PC Substance records
– PC BioAssay: primary database of bioactivity screens of samples in PC Substance

84 Links from Structure: N-acetylglucosamine, heme, mannose, fucose

85 Sequence Polymorphisms
General polymorphisms: SNP
– Primary database of submitted SNPs
– Curated database of reference SNPs
– Contains more than just SNPs: true SNPs, MNPs (multiple nucleotide polymorphisms), insertions, deletions, microsatellites, mixed, no variation (constant)
Human phenotypes: OMIM
– Clinical literature database
– Curated at Johns Hopkins University
– Links human genes and genetic disorders to human disease
– Lists allelic variants that have clinical consequences
Variations in SNP are not necessarily in OMIM, and vice versa!

86 Linking to SNP
Entrez Gene: TPO
Links to SNP are also available from Nucleotide and Protein.

87 Entrez SNP
– Primary data: ss#
– SNP UID: rs#

88 Find Non-synonymous SNPs
#7 AND coding nonsynon[Function Class]

89 Non-synonymous TPO SNPs
Link to Map Viewer; view all SNPs in locus; link to related 3D structures

90 GeneView in dbSNP

91 Links to OMIM
Entrez Gene: TPO

92 OMIM Record

93 Explore a Disease SNP: 799

94 Curated CD Record
Launch Cn3D; phylogenetic tree of aligned sequences; launch CDTree; Cn3D (E799)

