Competências Básicas de Investigação Científica e de Publicação (Basic Skills in Scientific Research and Publishing). Lecture 3: Searching the Literature. Ganesha Associates, 14 May 2013.

2 Types of scientific output
– Abstracts
– Primary journal articles: peer-reviewed interpretations of original research
– Reviews
– Book chapters, monographs
– Conference proceedings
– Lectures, seminars
– Sequences, data sets
– Patents, other forms of intellectual property
– Blogs, tweets…

3 Usage of output differs (Copyright: Ganesha Associates 2012)

4 Some sources of scientific content
– Google
– PubMed/Medline (NLM)
– Scopus (Elsevier)
– Web of Science (Thomson Reuters)
– Google Scholar
– PubMed Central, Europe PubMed Central
– SciELO, Biblioteca Virtual em Saúde
– ScienceDirect, Ovid, SpringerLink, Wiley Online Library, BioMed Central, Public Library of Science, SwetsWise…
– CAPES Portal de Periódicos

5 Each source is different
– Free: Google, Google Scholar, PubMed Central
– Subscription: Scopus, ScienceDirect
– Abstracts and citations only: PubMed, Web of Science
– Full text, single publisher: SpringerLink
– Full text, many publishers: PubMed Central, SwetsWise Online Content

6 Classify sources of content
– By content: abstract only or full text
– By access: free access or subscription

7 You can get access if…
– The journal is subscribed to by CAPES
– You have a personal subscription
– The journal is of the ‘Open Access’ type
  – Note: some journals only make their content ‘Open Access’ after six months or longer. Some journals contain a mixture of OA and non-OA articles. See for more info.
Journals in the ‘red’ categories are available anywhere. Most journals subscribed to by CAPES will be available from more than one source. CAPES journals are only available from computers within the University network unless you have remote access privileges.

8 So which sources should I use?
– No single source contains all of the articles relevant to your research
– Google has the broadest coverage, but not all of the documents you find will be peer-reviewed articles
– Scopus, WoS and PubMed give you the best balance between quality and quantity and, in theory, should link to all the content subscribed to by CAPES, plus OA content

9 So usually you will visit several sources to find the information you are looking for
Sources shown: Google, Scopus, Web of Science, PubMed, SciELO, HighWire, ScienceDirect, SpringerLink, national literature, CAPES Portal, OA (BMC or PLoS), other databases, e.g. NCBI

10 Components of a bibliographic database
– Content such as abstracts and full-text articles [or a pointer to where these may be found]
– Metadata [data about data]
– Index
– Search engine
– Ranking/relevance algorithm
– Plus many additional features

11 Content (Basic PDF)

12 Content (HTML)

13 Content (Page source)

14 Content (metadata)

15 Sources of article metadata
– Journal name, publisher, ISSN
– Date of publication, volume and page numbers
– Digital object identifier [DOI]
– Article title
– Authors' names
– Address, affiliation, contact details
– Article section identifiers
– Sources of funding
– Semantic tagging, e.g. protein name

16 The basis of search: Indexing
The purpose of an index is to optimize speed and performance in finding relevant documents for a search query. Without an index, the search engine would have to scan every document in the corpus, which would require considerable time and computing power. Metadata helps the indexing algorithm to select different classes of terminology from which to make an index, so a search can be carried out on just the authors' names, for example.
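One way to picture field-specific indexing is to keep a separate word index for each metadata field. The sketch below is only an illustration: the function names `build_field_indexes` and `search` are invented here, the first record reuses the citation quoted on slide 22, and the second record is a made-up placeholder.

```python
from collections import defaultdict

# Record 0 comes from the citation on slide 22; record 1 is a made-up placeholder.
records = [
    {"title": "Improved capsule endoscopy after bowel preparation",
     "authors": ["Dai N", "Gubler C", "Hengstler P", "Meyenberger C", "Bauerfeind P"]},
    {"title": "A note on the Gubler method",
     "authors": ["Example A"]},
]

def build_field_indexes(records):
    """Build one word -> record-number index per metadata field."""
    indexes = defaultdict(lambda: defaultdict(set))
    for rec_id, record in enumerate(records):
        for word in record["title"].lower().split():
            indexes["title"][word].add(rec_id)
        for name in record["authors"]:
            for word in name.lower().split():
                indexes["author"][word].add(rec_id)
    return indexes

def search(indexes, term, field=None):
    """Look a term up in one field, or in every field when field is None."""
    term = term.lower()
    fields = [field] if field else list(indexes)
    hits = set()
    for name in fields:
        hits |= indexes[name].get(term, set())
    return hits

indexes = build_field_indexes(records)
print(search(indexes, "Gubler", field="author"))  # author index only
print(search(indexes, "Gubler"))                  # every field
```

Restricting the lookup to the author index returns only the record where Gubler is an author, which is the behaviour a tag such as [AU] asks for.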

17 Example: Inverted index
Suppose we have three short documents:
– D[0] = "it is what it is"
– D[1] = "what is it"
– D[2] = "it is a banana"
The inverted index looks like:
– "a": {2}
– "banana": {2}
– "is": {0, 1, 2}
– "it": {0, 1, 2}
– "what": {0, 1}
So a search for “what” would produce a list of two documents, 0 and 1.
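The index above can be built in a few lines. This is a minimal sketch using the three documents from the slide; the function name `build_inverted_index` is just a label chosen for this example, and splitting on whitespace stands in for real tokenization.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each word to the set of document numbers that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            index[word].add(doc_id)
    return dict(index)

documents = ["it is what it is", "what is it", "it is a banana"]
index = build_inverted_index(documents)

# Print the index in alphabetical order of the words.
for word in sorted(index):
    print(word, sorted(index[word]))
```

A search for a single word is then one dictionary lookup, for example `index["what"]`, instead of a scan of every document.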

18 Search: stop words and stemming
– Not all search terms get used, and some others will get modified.
– Stop words such as “a”, “the”, “this” occur frequently. They are dropped at indexing time and thus ignored at search time.
– Often a stemming algorithm reduces words to their basic stem, e.g. "fishing", "fished", "fish", and "fisher" to the root word, "fish".
– Both strategies make search quicker, but at a cost: phrases made up of stop words become unsearchable, and distinct words that share a stem can no longer be told apart.
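The two steps can be sketched as follows. The stop-word set contains only the three examples from the slide, and the suffix rules are a toy rule set invented for this illustration; real search engines use much more careful stemmers.

```python
STOP_WORDS = {"a", "the", "this"}   # only the three examples from the slide
SUFFIXES = ("ing", "ed", "er")      # toy rules, not a real stemming algorithm

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def index_terms(text):
    """Drop stop words, then stem whatever is left."""
    return [stem(word) for word in text.lower().split() if word not in STOP_WORDS]

print([stem(w) for w in ["fishing", "fished", "fish", "fisher"]])
print(index_terms("the fisher fished this river"))
```

The second call also shows the cost: the crude rule that handles "fisher" turns "river" into "riv", so unrelated words can end up damaged or merged.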

19 Search: how the result list is ranked
– Date of publication
– Relevance
  – Frequency with which search terms occur in the document
  – Proximity of search terms
– Google’s PageRank algorithm uses “link popularity”: a document is ranked higher if there are more links to it
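Two of these signals are easy to sketch. The code below ranks the three documents from slide 17 by raw term frequency and counts inbound links for a small made-up link graph. Both functions are simplified illustrations written for this transcript: real relevance scoring and the real PageRank algorithm are considerably more elaborate (PageRank, for instance, also weights each link by the rank of the page it comes from).

```python
def rank_by_term_frequency(documents, query):
    """Return matching document numbers, most frequent query terms first."""
    query_terms = query.lower().split()
    scored = []
    for doc_id, text in enumerate(documents):
        words = text.lower().split()
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties are broken by document number.
    return [doc_id for score, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]

def link_popularity(links):
    """Count inbound links per page; links maps a page to the pages it links to."""
    counts = {page: 0 for page in links}
    for source, targets in links.items():
        for target in targets:
            counts[target] = counts.get(target, 0) + 1
    return counts

documents = ["it is what it is", "what is it", "it is a banana"]
print(rank_by_term_frequency(documents, "it is"))

# Hypothetical link graph: A links to B and C, B links to C.
print(link_popularity({"A": ["B", "C"], "B": ["C"], "C": []}))
```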


21 The question behind the query
Search engines think in terms of words, but users think in terms of sentences!
– How do you spell Bousfield?
– What do we know about BRCA1?
– Given these symptoms, what is the most likely diagnosis?
– What are the side effects of aspirin?
– Has this chemical structure been synthesized before?
“Cancer causes X” vs. “Y causes cancer”

22 What real queries look like - Google
– pharmacogenomics and disorders
– bacteria growth casein media
– effect waal pseudomonas
– TRPM2 PCR mouse
– Chitinases in carnivorous plants
– glycerophosphoinositol 4-phosphate
– Dai N, Gubler C, Hengstler P, Meyenberger C, Bauerfeind P. Improved capsule endoscopy after bowel preparation. Gastrointest Endosc 2005;61(1) 28-31.

23 What real queries look like - PubMed
– ATR1 HAL2
– Fuzzy[ALL] AND Hanage[AU] AND 2005[DP]
– arndt and rhabdomyosarcoma
– "Vorster HH"[Author]
– (rotavirus infections[majr] OR rotavirus[majr]) AND english[la] AND humans[mh] NOT (editorial[pt] OR letter[pt])

24 Query changes people actually make
Query series 1
– latrunculin
– latrunculin fm3a cell arrest
– latrunculin fm3a arrest
– latrunculin fm3a
– latrunculin FM3A
Query series 2
– cytokinin signalling in arabidopsis
– "cytokinin signalling in arabidopsis"
– cytokinin delta
– spindly arabidopsis
Results
– Remember to look beyond the first page. Compare the results of Query series 1 in PubMed and Google (add the term PubMed).

25 Improving search accuracy
– Wild card characters: "a * saved is a * earned"
– Operators: jaguar speed -car; Pandas; “ribosome”
– Synonyms: MeSH terms
– Boolean terms: AND, OR, NOT
– Faceted search: GO terms
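Boolean terms map directly onto set operations over an inverted index. The sketch below reuses the small index from slide 17; the helper name `lookup` is invented for this example.

```python
# Inverted index from slide 17: word -> set of document numbers.
index = {
    "a": {2},
    "banana": {2},
    "is": {0, 1, 2},
    "it": {0, 1, 2},
    "what": {0, 1},
}

def lookup(term):
    """Documents containing the term; an unknown term matches nothing."""
    return index.get(term.lower(), set())

what_and_banana = lookup("what") & lookup("banana")  # AND is set intersection
what_or_banana = lookup("what") | lookup("banana")   # OR is set union
it_not_what = lookup("it") - lookup("what")          # NOT is set difference

print(what_and_banana, what_or_banana, it_not_what)
```

AND narrows the result list, OR widens it, and NOT removes documents, which is why combining them gives finer control than a plain list of words.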

26 Anatomy of a query - PubMed
Query as typed: invasive fungal infections in young children
Query as translated by PubMed: invasive[All Fields] AND ("mycoses"[MeSH Terms] OR "mycoses"[All Fields] OR ("fungal"[All Fields] AND "infections"[All Fields]) OR "fungal infections"[All Fields]) AND ("Young Child"[Journal] OR ("young"[All Fields] AND "children"[All Fields]) OR "young children"[All Fields])
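The structure of the translated query can be reproduced with a small string-building sketch. This is not PubMed's actual code: the function `expand_term` and the hand-written `PHRASES` table (which splits the query into phrases, drops the stop word "in", and supplies the MeSH and journal matches) are assumptions made purely to show how each phrase becomes an OR-group and how the groups are joined with AND.

```python
def expand_term(phrase, mesh_term=None, journal=None):
    """Expand one phrase into the OR-group seen in the translated query."""
    words = phrase.split()
    if len(words) == 1 and mesh_term is None and journal is None:
        return f"{phrase}[All Fields]"
    alternatives = []
    if mesh_term:
        alternatives.append(f'"{mesh_term}"[MeSH Terms]')
        alternatives.append(f'"{mesh_term}"[All Fields]')
    if journal:
        alternatives.append(f'"{journal}"[Journal]')
    if len(words) > 1:
        each_word = " AND ".join(f'"{word}"[All Fields]' for word in words)
        alternatives.append(f"({each_word})")
    alternatives.append(f'"{phrase}"[All Fields]')
    return "(" + " OR ".join(alternatives) + ")"

# Hypothetical, hand-written mapping: (phrase, matching MeSH term, matching journal).
PHRASES = [
    ("invasive", None, None),
    ("fungal infections", "mycoses", None),
    ("young children", None, "Young Child"),
]

translated = " AND ".join(expand_term(p, mesh, journal) for p, mesh, journal in PHRASES)
print(translated)
```

Reading the output next to the slide shows where each part of the long query comes from: a synonym lookup adds the MeSH alternative, and an unexpected journal-title match adds "Young Child"[Journal].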


28 So…
Using the same search terms will produce different results in different databases because:
– The content is different
– The preparation of search terms is different, e.g. only PubMed uses MeSH terms
– The indexing process, the implementation of stemming and the removal of stop words are different
– The ranking algorithms are different

29 Search terms - summary
– Make sure you understand the search term syntax used by your preferred site, e.g. AND, +, “ ”
– Search engines ‘see’ only certain words, not sentences
– Do not use ‘stop’ words, e.g. a, the, of, before, unless they are part of “a text string search”
– Try to think of different ways to search for the same subject
– Look beyond the first page of search results

30 Other useful features
– Related articles: PubMed, Scopus
– Cited by…: WoS, Scopus
– Saved searches: PubMed, Scopus
– Reference management: EndNote, Zotero, Mendeley

31 PubMed Related Articles Algorithm
– The similarity between documents is measured by the words they have in common.
– A list of 310 common, but uninformative, stop words is eliminated from processing at this stage.
– Next, a limited amount of stemming of words is done.
– Words from the abstract of a document are classified as text words.
– Words from titles are also classified as text words, but words from titles are added in a second time to give them a small advantage in the local weighting scheme.
– MeSH terms are placed in a third category, and a MeSH term with a subheading qualifier is entered twice, once without the qualifier and once with it.
– These three categories of words (or phrases in the case of MeSH) comprise the representation of a document.
– No other fields, such as Author or Journal, enter into the calculations.
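The document representation described above can be sketched roughly as follows. Several things here are assumptions for illustration only: the tiny stop-word set stands in for the list of 310, stemming is skipped, the example documents are invented, and the similarity score is a plain dot product of term counts rather than PubMed's actual weighting scheme.

```python
from collections import Counter

STOP_WORDS = {"a", "the", "of", "in", "and"}   # stand-in for the list of 310

def represent(title, abstract, mesh_terms):
    """Build the bag of terms for one document."""
    terms = Counter()
    for word in abstract.lower().split():
        if word not in STOP_WORDS:
            terms[("text", word)] += 1
    for word in title.lower().split():
        if word not in STOP_WORDS:
            terms[("text", word)] += 2          # title words are added a second time
    for mesh in mesh_terms:
        heading = mesh.split("/")[0]
        terms[("mesh", heading)] += 1           # entered without the qualifier
        if "/" in mesh:
            terms[("mesh", mesh)] += 1          # and once more with it
    return terms

def similarity(doc_a, doc_b):
    """Assumed scoring: sum of count products over the shared terms."""
    return sum(count * doc_b[term] for term, count in doc_a.items() if term in doc_b)

# Invented example documents.
doc_a = represent("Fungal infections in children",
                  "invasive fungal infections affect young children",
                  ["Mycoses/drug therapy"])
doc_b = represent("Treatment of fungal infections",
                  "antifungal treatment of invasive infections",
                  ["Mycoses"])
doc_c = represent("Banana cultivation", "growing banana plants", ["Musa"])

print(similarity(doc_a, doc_b), similarity(doc_a, doc_c))
```

Authors and journal names never enter `represent`, matching the last point on the slide.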

32 Quick tour

56 Break

57 Other types of database
– Some databases contain mainly text, but others contain image, sequence or structural data.
– The technologies required to search and retrieve these different data types are very different.
– There is a growing amount of information in publicly available databases. For example, in 2013 the Nucleic Acids Research online Molecular Biology Database Collection listed 1512 databases.
– The National Center for Biotechnology Information (NCBI) and the European Bioinformatics Institute (EBI) host some of the most important databases used for biomedical research.

58 Linking different data types is a challenge
Diagram: a Gene Expression Warehouse linked to data types (Protein, Disease, SNP, Enzyme, Pathway, Known Gene, Sequence Cluster, Affy Fragment, Sequence, NMR, Metabolite) and to source databases (LocusLink, MGD, ExPASy SwissProt, PDB, OMIM, NCBI dbSNP, ExPASy Enzyme, KEGG, SPAD, UniGene, GenBank)

59 Databases available at NCBI

60 Asking meaningful questions is a challenge



64 Example of BLAST search results

65 PC Compound Record

66 UCSC Genome Browser
– The Genome Browser zooms and scrolls over chromosomes, showing the work of annotators worldwide.
– The Gene Sorter shows expression, homology and other information on groups of genes that can be related in many ways.
– Blat quickly maps your sequence to the genome.
– The Table Browser provides convenient access to the underlying database.
– VisiGene lets you browse through a large collection of in situ mouse and frog images to examine expression patterns.
– Genome Graphs allows you to upload and display genome-wide data sets.

68 For example: adding metadata with a gene ontology
– There is no universal standard terminology in biology and related domains, and terms may be specific to a species, a research area or even a particular research group. This makes communication and sharing of data difficult.
– The Gene Ontology project provides an ontology of defined terms representing gene product properties. Useful links: http://www.amigo.org
– The ontology covers three domains:
  – cellular component, the parts of a cell or its extracellular environment
  – molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis
  – biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms
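Once gene products carry terms from a shared ontology, they can be filtered by those terms (the faceted search mentioned on slide 25). The sketch below is a toy illustration: the gene names and their annotations are invented placeholders, and `faceted_search` is a name chosen for this example rather than part of any Gene Ontology tool.

```python
GO_DOMAINS = ("cellular component", "molecular function", "biological process")

# Invented annotations: gene -> domain -> set of terms.
annotations = {
    "gene_1": {
        "cellular component": {"extracellular region"},
        "molecular function": {"binding"},
        "biological process": {"immune response", "defense response"},
    },
    "gene_2": {
        "cellular component": {"nucleus"},
        "molecular function": {"binding"},
        "biological process": {"molting cycle"},
    },
    "gene_3": {
        "cellular component": {"extracellular region"},
        "molecular function": {"peptidase activity"},
        "biological process": {"immune response"},
    },
}

def faceted_search(filters):
    """Return the genes whose annotations contain every requested term."""
    for domain in filters:
        if domain not in GO_DOMAINS:
            raise ValueError(f"unknown domain: {domain}")
    return sorted(
        gene for gene, terms in annotations.items()
        if all(term in terms.get(domain, set()) for domain, term in filters.items())
    )

print(faceted_search({"biological process": "immune response"}))
print(faceted_search({"biological process": "immune response",
                      "molecular function": "binding"}))
```

Each added facet narrows the list, and because every group uses the same controlled terms, the filter works across species and laboratories.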

69 MicroArray data analysis with GO
Figure labels: attacked, time, control; Puparial adhesion; Molting cycle; hemocyanin; Defense response; Immune response; Response to stimulus; Toll regulated genes; JAK-STAT regulated genes; Amino acid catabolism; Lipid metabolism; Peptidase activity; Protein catabolism
Credit: Bregje Wertheim at the Centre for Evolutionary Genomics, Department of Biology, UCL and Eugene Schuster Group, EBI.


71 Learning points
– Google is a good place to start
– Learn to use several information resources
– Modify your search terms during the course of a search session
– Understand how the results are ranked and don’t just look on the first page
