Lecture 2.2 Retrieving Information: Using Entrez.

1 Lecture 2.2 Retrieving Information: Using Entrez

2 Retrieving information: how it works
– Servers hold the records you want.
– You need to understand what data they hold and how it is organized.
– There are often many ways to get to an answer. The route is not always obvious, so think about alternatives and traps.
– Use a query language; each system has its own.
– Retrieve the data in a specified format.
– Save it in a way that will be useful to you.
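The last three steps (query, retrieve in a specified format, save) can be sketched against NCBI's E-utilities web interface. This is only an illustration under assumptions: the slides use the Entrez web pages, not E-utilities, and the helper names `efetch_url` and `fetch_to_file`, the choice of the `nuccore` database, and the FASTA format are all picked here for the example. The accession L12345 is the sample accession used elsewhere in the slides.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address of NCBI E-utilities (assumed entry point for this sketch).
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db, uid, rettype="fasta", retmode="text"):
    """Build an EFetch URL asking for one record in a specified format."""
    params = {"db": db, "id": uid, "rettype": rettype, "retmode": retmode}
    return f"{EUTILS}/efetch.fcgi?{urlencode(params)}"


def fetch_to_file(url, path):
    """Download the record and save it locally (needs network access)."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())


# Building the URL needs no network; fetch_to_file is defined but not called here.
url = efetch_url("nuccore", "L12345")
print(url)
```

Calling `fetch_to_file(url, "L12345.fasta")` would then store the record under a name you can find again, which is the "save it in a useful way" step.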

3 What you may be looking for
– You did a BLAST search and need more information about some of the proteins it found similarities to.
– You heard about a recently discovered disease gene and want to know more about it.
– You want to build a dataset for local BLAST searches.
– A colleague wants you to align all sequences from a given protein family.

4 What you are looking for
– PubMed paper from author X
– Sequence from gene X in organism Y
– All information about organelle W in model organism Y
– All information about disease X in human
– Orthologs of that disease gene in other model organisms

5 Central Dogma: NCBI version
DNA → RNA → protein → write a paper about it

6 Entrez: Pathway to Discovery (1993)
Databases: MEDLINE abstracts; Nucleotide sequences; Protein sequences
Links: Term frequency statistics; Nucleotide sequence similarity; Amino acid sequence similarity; Coding region features; Literature citations in sequence databases

7 Related Articles
Type in your last name and find a paper from one of your teammates.

8 Hard link: DNA to protein
L12345

9 From Fig. 1 of "Entrez search and retrieval system", Jim Ostell, Chapter 14, The NCBI Handbook, 2003.

13 Ctrl-F

15 Getting started in Entrez

16 “ouellette bf” [au] AND yeast

20 MeSH: Medical Subject Headings

21 A query
Word: too many hits
– Add more words (the Boolean ‘AND’ is the default)
– Limit the query to a specified field
– Limit the query in time
– Combine earlier queries with Boolean operators: #1 AND #2, #3 NOT #5, #7 OR #8
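The narrowing options above amount to composing a query string. A minimal sketch follows, assuming the field-tag syntax shown in the slides (for example `[au]`). The helper names `tagged`, `combine`, and `esearch_url` are invented for this example, and the ESearch address is NCBI's E-utilities endpoint rather than the web form used in the lecture.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def tagged(term, field):
    """Attach a field tag such as [au]; quote terms that contain spaces."""
    quoted = f'"{term}"' if " " in term else term
    return f"{quoted}[{field}]"


def combine(operator, *parts):
    """Join query parts with one Boolean operator (AND, OR, or NOT)."""
    if operator not in ("AND", "OR", "NOT"):
        raise ValueError(f"unsupported operator: {operator}")
    return f" {operator} ".join(parts)


def esearch_url(term, db="pubmed"):
    """URL-encode the query for an ESearch request (no request is sent)."""
    return f"{ESEARCH}?{urlencode({'db': db, 'term': term})}"


query = combine("AND", tagged("hieter p", "au"), "yeast")
print(query)                       # "hieter p"[au] AND yeast
print(combine("NOT", "#3", "#5"))  # #3 NOT #5
print(esearch_url(query))
```

The `#N` strings refer to numbered searches in the web interface's history. Sending them through E-utilities would need extra session parameters that this sketch leaves out.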

22 hieter p [au]

23 Limit in time: 1993-01-01 to 1993-12-31
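The same date limit can be expressed in a scripted search. The sketch below assumes the E-utilities ESearch parameters `datetype`, `mindate`, and `maxdate`, with dates written as YYYY/MM/DD. The function name `date_limited_url` and the choice of publication date (`pdat`) are assumptions of this example, not something the slide specifies.

```python
from datetime import date
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def date_limited_url(term, start, end, db="pubmed"):
    """Build an ESearch URL restricted to a publication-date range."""
    if start > end:
        raise ValueError("start date must not be after end date")
    params = {
        "db": db,
        "term": term,
        "datetype": "pdat",                      # assumed: limit by publication date
        "mindate": start.strftime("%Y/%m/%d"),
        "maxdate": end.strftime("%Y/%m/%d"),
    }
    return f"{ESEARCH}?{urlencode(params)}"


# The slide's range, 1993-01-01 to 1993-12-31, applied to the previous query.
url = date_limited_url(
    "hieter p[au]",
    date.fromisoformat("1993-01-01"),
    date.fromisoformat("1993-12-31"),
)
print(url)
```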

25 No abstract; With abstract; Full text online; Full text in PubMed Central

26 boguski m [au]: 99; boguski ms [au]: 80

27 #24 NOT #23: 19
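Boolean operators on earlier queries behave like set operations on the result lists. The sketch below reproduces the counts from the slides with placeholder integers, which are not real PubMed IDs. It assumes that #24 is the broader search (boguski m) and #23 the narrower one (boguski ms), and that every record found by the narrower search is also found by the broader one. The slides do not state either point; together they would account for 99 minus 80 giving 19.

```python
# Placeholder record IDs standing in for search results (NOT real PubMed IDs).
boguski_m = set(range(1, 100))   # 99 records, assumed to be query #24
boguski_ms = set(range(1, 81))   # 80 records, assumed to be query #23

# NOT keeps what is in the first result set but not in the second.
only_other_initials = boguski_m - boguski_ms

# AND and OR correspond to intersection and union.
both = boguski_m & boguski_ms
either = boguski_m | boguski_ms

print(len(only_other_initials))  # 19
print(len(both), len(either))    # 80 99
```

Under these assumptions, the 19 remaining records are papers by authors indexed as "boguski m" with a different or missing second initial, which is the kind of trap the NOT query exposes.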

29 Other types of links in Entrez
The next slides explore other kinds of things linked to Entrez records.

30 “hieter p” [au] cdc16p

39 “Books”

40 (2)

46 Link to Genome View of Chromosome I

49 RefSeq
– RefSeq is NCBI's curated set of “reference sequences” for all ‘worked’ genomes.
– Historically, these were referred to as “GenBank-Gold”.
– RefSeq records are genomic, mRNA, or protein sequences.
– Not all sequences are in RefSeq.
– All RefSeq sequences are assembled or taken from records in GenBank.

50 Some of the features of RefSeq
– non-redundancy
– explicitly linked nucleotide and protein sequences
– updates to reflect current knowledge of sequence data and biology
– data validation and format consistency
– distinct accession series
– ongoing curation by NCBI staff and collaborators, with review status indicated on each record

51 Accession number space
GenBank:
– 1+5 (L12345, U00001)
– 2+6 (AF000001, AC000003)
– 4+2+6 (WGS)
All have accession.version
Protein:
– 1+5 (SwissProt/UniProt)
– 3+5 (GenPept)
All have accession.version
RefSeq:
– N*_12345
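The "letters + digits" shapes above are easy to check mechanically. This is a rough sketch only: the names `SHAPES`, `split_version`, and `accession_shape` are made up for this example, and the patterns match just the literal shapes listed on the slide, so real identifiers that deviate from them will come back as "unknown". The shape alone cannot tell a GenBank 1+5 nucleotide accession from a 1+5 protein accession.

```python
import re

# Ordered list of (label, pattern); the labels follow the slide's notation.
SHAPES = [
    ("RefSeq", re.compile(r"[A-Z]{2}_[A-Z]{0,4}\d+")),  # two letters, underscore, digits
    ("4+2+6", re.compile(r"[A-Z]{4}\d{8}")),            # WGS: 4 letters + 2 + 6 digits
    ("2+6", re.compile(r"[A-Z]{2}\d{6}")),
    ("3+5", re.compile(r"[A-Z]{3}\d{5}")),
    ("1+5", re.compile(r"[A-Z]\d{5}")),
]


def split_version(identifier):
    """Split 'accession.version' into (accession, version or None)."""
    accession, _, version = identifier.partition(".")
    return accession, (int(version) if version.isdigit() else None)


def accession_shape(identifier):
    """Return the slide's shape label for an identifier, or 'unknown'."""
    accession, _ = split_version(identifier.strip().upper())
    for label, pattern in SHAPES:
        if pattern.fullmatch(accession):
            return label
    return "unknown"


for example in ["L12345", "AF000001", "AC000003.2", "NM_123456"]:
    print(example, accession_shape(example))
```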

52 RefSeq Accession Number Space
Accession    Molecule  Description
NC_123456    Genomic   Complete genomic molecules including genomes, chromosomes, organelles, plasmids.
NG_123456    Genomic   Incomplete genomic region; supplied to support the NCBI Genome Annotation pipeline.
NM_123456    mRNA
NR_123456    RNA       Non-coding transcripts including structural RNAs, transcribed pseudogenes, and others.
NP_123456    Protein
NP_12345678  Protein   Planned expansion of accession series.

53 Automated Assemblies
Accession  Molecule  Description
NT_123456  Genomic   Intermediate genomic assemblies of BAC sequence data.
NW_123456  Genomic   Intermediate genomic assemblies of Whole Genome Shotgun sequence data.

54 Model RefSeq records
Accession  Molecule  Description
XM_123456  mRNA      Model mRNA provided by the Genome Annotation process; sequence corresponds to the genomic contig.
XR_123456  RNA       Model non-coding transcripts provided by the Genome Annotation process; sequence corresponds to the genomic contig.
XP_123456  Protein   Model proteins provided by the Genome Annotation process; sequence corresponds to the genomic contig.

55 WGS special case
Accession        Molecule  Description
NZ_ABCD12345678  Genomic   A collection of whole genome shotgun sequence data for a project. Accessions are not tracked between releases. The first four characters following the underscore (e.g. 'ABCD') identify a genome project.
ZP_12345678      Protein   Proteins annotated on NZ_ accessions (often via computational methods).
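The RefSeq prefixes from the preceding slides can be kept in a small lookup table, which helps when sorting a mixed list of accessions. In this sketch the table name `REFSEQ_PREFIXES`, the function `describe_refseq`, and the short group labels ("RefSeq", "automated assembly", "model", "WGS") are assumptions of the example, loosely based on the slide titles. Only the prefixes shown in the slides are included.

```python
# Prefix -> (molecule type, group), transcribed from the slides above.
REFSEQ_PREFIXES = {
    "NC": ("Genomic", "RefSeq"),
    "NG": ("Genomic", "RefSeq"),
    "NM": ("mRNA", "RefSeq"),
    "NR": ("RNA", "RefSeq"),
    "NP": ("Protein", "RefSeq"),
    "NT": ("Genomic", "automated assembly"),
    "NW": ("Genomic", "automated assembly"),
    "XM": ("mRNA", "model"),
    "XR": ("RNA", "model"),
    "XP": ("Protein", "model"),
    "NZ": ("Genomic", "WGS"),
    "ZP": ("Protein", "WGS"),
}


def describe_refseq(accession):
    """Return (molecule, group) for a RefSeq-style accession from the slides."""
    prefix, underscore, _ = accession.partition("_")
    if not underscore or prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix not listed in the slides: {accession!r}")
    return REFSEQ_PREFIXES[prefix]


for example in ["NM_123456", "XP_123456", "NZ_ABCD12345678"]:
    molecule, group = describe_refseq(example)
    print(f"{example}: {molecule} ({group})")
```

A dictionary keeps the mapping in one place, so a prefix that is missing from the slides only needs one new entry.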

56 Download all the data
Entrez and RefSeq

60 LocusLink

61 Things to watch out for:
