Lindsay & Gordon’s Discovery Support Systems Model

John MacMullen
SILS Bioinformatics Journal Club, Fall 2002

Background
Specialization in science leads to fragmented, ‘complementary but disjoint’ or ‘non-interactive’ literatures
Attempt to find ‘undiscovered public knowledge’ in the biomedical literature
Tools are needed to integrate biomedical knowledge; ‘discovery support systems’ are one class
Replication and extension of Swanson & Smalheiser’s model (‘Arrowsmith’)

Complementary but Disjoint Literatures
[Diagram: Fish Oil and Raynaud’s (C) shown as ‘non-interactive’ literatures, connected only through intermediate B terms]
B1 – Blood Viscosity
B2 – Platelet Aggregation
B3 – Vascular Reactivity
Adapted from Swanson & Smalheiser, 1997, by Weeber et al., 2001
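The idea of two literatures that never cite each other yet share intermediate terms can be illustrated with a small sketch. The document ids and term sets below are invented toy data, and `linking_terms` and `are_disjoint` are hypothetical helper names, not part of Arrowsmith or of any published system.

```python
# Toy data (invented for illustration): document id -> terms mentioned in it.
fish_oil_lit = {
    "a1": {"fish oil", "blood viscosity"},
    "a2": {"fish oil", "platelet aggregation"},
    "a3": {"fish oil", "vascular reactivity"},
}
raynaud_lit = {
    "c1": {"raynaud", "blood viscosity"},
    "c2": {"raynaud", "vascular reactivity", "platelet aggregation"},
}

def terms_of(literature):
    """Collect every term mentioned anywhere in a literature."""
    out = set()
    for terms in literature.values():
        out |= terms
    return out

def linking_terms(lit_a, lit_c, topic_a, topic_c):
    """Terms found in both literatures, excluding the two topic terms."""
    return (terms_of(lit_a) & terms_of(lit_c)) - {topic_a, topic_c}

def are_disjoint(lit_a, lit_c, topic_a, topic_c):
    """True if no single document mentions both topics."""
    all_docs = list(lit_a.values()) + list(lit_c.values())
    return not any(topic_a in t and topic_c in t for t in all_docs)

b_terms = linking_terms(fish_oil_lit, raynaud_lit, "fish oil", "raynaud")
disjoint = are_disjoint(fish_oil_lit, raynaud_lit, "fish oil", "raynaud")
```

In this toy setup the two literatures are disjoint (no document mentions both topics), yet they share all three B terms from the slide.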

Swanson & Smalheiser’s method
Systematic trial-and-error method
Procedure 1: Citation acquisition
  Search MEDLINE for topical cites (‘C’ list)
  Apply stopword list and extract unique terms (‘B’ list)
  Search MEDLINE for ‘B’ term cites; prune list
  Perform MEDLINE searches for each ‘B’ term
  Classify results into likely categories
  Derive the intersection of each ‘B’ set with the restriction set, and the union of intersection sets (‘U’)
  Search the resulting terms of ‘U’ set in MEDLINE
  ‘U’ list becomes potential ‘A’ terms, with each ‘A’ term attached to the ‘B’ term that generated it
  Rank ‘A’ term results against ‘B’ co-occurrence
Procedure 2: Relationship Mining
  Search for pre-existing A→C &/or A→B→C relationships
  Search for novel A→C relationships
Output: Display of ‘A’ & ‘C’ cites by their common ‘B’ terms
Goal: a plausible testable hypothesis
Human relevance judgments in each step influence future steps
http://arrowsmith.psych.uic.edu
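The intersection, union and ranking steps of Procedure 1 can be sketched with plain set operations. This is only a minimal illustration under stated assumptions: the MEDLINE searches are replaced by an in-memory dictionary of invented results, `candidate_a_terms` and `rank_a_terms` are hypothetical names, and ranking by the number of generating ‘B’ terms is one assumed reading of "rank against ‘B’ co-occurrence", not the documented Arrowsmith behavior.

```python
from collections import defaultdict

def candidate_a_terms(b_term_results, restriction_set):
    """Map each candidate 'A' term to the 'B' terms that generated it.

    b_term_results: dict of B term -> set of terms found in its search results
                    (a stand-in for real MEDLINE searches).
    restriction_set: set of terms allowed as 'A' candidates.
    """
    a_to_b = defaultdict(set)
    for b_term, found in b_term_results.items():
        # Intersection of this 'B' set with the restriction set.
        for a_term in found & restriction_set:
            a_to_b[a_term].add(b_term)
    return dict(a_to_b)

def rank_a_terms(a_to_b):
    """Order 'A' terms by how many 'B' terms generated them (ties alphabetical)."""
    return sorted(a_to_b, key=lambda a: (-len(a_to_b[a]), a))

# Invented toy results, for illustration only.
b_term_results = {
    "blood viscosity": {"fish oil", "exercise"},
    "platelet aggregation": {"fish oil", "aspirin"},
    "vascular reactivity": {"fish oil", "smoking"},
}
restriction_set = {"fish oil", "aspirin"}

a_to_b = candidate_a_terms(b_term_results, restriction_set)
u_set = set(a_to_b)            # union of the intersection sets ('U')
ranked = rank_a_terms(a_to_b)
```

Keeping the A-to-B mapping, rather than only the ‘U’ set, is what lets the final display group ‘A’ and ‘C’ citations by their common ‘B’ terms.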

Lindsay & Gordon’s method
Limited to lexical statistics, no syntactic or semantic evaluation
Uses full MEDLINE record instead of title only
1. Identify a source literature (e.g., a topic)
   Download complete MEDLINE records for all docs “mentioning” (i.e., word match or index term) the source topic
2. Find all single words, bi-grams & tri-grams in source corpus; exclude stop words; normalize singular/plural
   General stop word list + common medical terminology
3. Calculate 4 statistics for each token (tf, df, rf, tf*idf)
4. Rank tokens by frequency of occurrence
5. Identify 1 or more intermediate literatures based on stats in step (3), starting with highest ranks from (4)
6. Run process from steps 1-4 for each intermediate literature
Lindsay & Gordon, 1999, pp. 576-577
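The token extraction step (single words, bi-grams and tri-grams, with stop words excluded and plurals normalized) might look roughly like the sketch below. The stop word list, the naive plural rule, and the choice to drop any n-gram that contains a stop word are all assumptions made for illustration; the slide does not specify how Lindsay & Gordon handled these details.

```python
import re

# Hypothetical, very short stop word list; the real one also covered
# common medical terminology.
STOP_WORDS = {"the", "of", "in", "and", "a", "is", "with"}

def normalize(word):
    """Naive singular/plural normalization (assumption): strip one trailing 's'."""
    if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word

def extract_tokens(text, max_n=3):
    """Return all 1- to max_n-grams that contain no stop word."""
    words = [normalize(w) for w in re.findall(r"[a-z0-9]+", text.lower())]
    tokens = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if any(w in STOP_WORDS for w in gram):
                continue  # skip n-grams that include a stop word
            tokens.append(" ".join(gram))
    return tokens

tokens = extract_tokens("Migraines and magnesium levels")
```

For the sample sentence this yields the unigrams "migraine", "magnesium" and "level" plus the bigram "magnesium level"; every n-gram spanning "and" is dropped.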

Term / Phrase Statistics
tf = token frequency in sub-corpus
df = doc frequency (# of docs in sub-corpus with this term)
rf = relative frequency (# of appearances in sub-corpus vs. whole corpus)
idf(t) = log(N / f(t)), where N = # of docs in whole corpus and f(t) = # of docs with t in them
tf*idf = token frequency * inverse doc frequency
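The four statistics can be computed in a few lines. The sketch below is a minimal reading of the definitions on this slide, with some assumptions: documents are lists of already-extracted tokens, the sub-corpus is contained in the whole corpus (so every sub-corpus token has f(t) >= 1), rf is taken as the ratio of sub-corpus occurrences to whole-corpus occurrences, and f(t) is counted over the whole corpus. `token_stats` is a hypothetical name.

```python
import math
from collections import Counter

def token_stats(sub_corpus, whole_corpus):
    """Compute tf, df, rf, idf and tf*idf for every token in the sub-corpus.

    Each corpus is a list of documents; each document is a list of tokens.
    Assumes sub_corpus documents are also part of whole_corpus.
    """
    tf = Counter(tok for doc in sub_corpus for tok in doc)
    df = Counter(tok for doc in sub_corpus for tok in set(doc))
    whole_tf = Counter(tok for doc in whole_corpus for tok in doc)
    whole_df = Counter(tok for doc in whole_corpus for tok in set(doc))
    n_docs = len(whole_corpus)  # N in idf(t) = log(N / f(t))

    stats = {}
    for tok in tf:
        idf = math.log(n_docs / whole_df[tok])
        stats[tok] = {
            "tf": tf[tok],
            "df": df[tok],
            "rf": tf[tok] / whole_tf[tok],
            "idf": idf,
            "tf_idf": tf[tok] * idf,
        }
    return stats

# Invented toy corpora, for illustration only.
sub = [["migraine", "magnesium"], ["migraine"]]
whole = sub + [["magnesium", "calcium"], ["calcium"]]
stats = token_stats(sub, whole)
```

Here "migraine" appears only in the sub-corpus, so its rf is 1.0, while "magnesium" appears equally often outside it, so its rf is 0.5.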

Experiments
Reproduction of Swanson & Smalheiser’s magnesium / migraine connection
Used their method to try to find the 11 intermediate literatures S&S found, +1 new
1,081 MEDLINE records from 1986-1988 on migraine

Outcome
Is ranking tokens by frequency effective?
Arbitrary ranking and population size cutoffs
Domain knowledge used (med student)
Lots of manual intervention. Is this reproducible?
Is it really all that different from S&S?

Hypotheses
H1: Intermediate literatures are best identified by absolute lexical frequencies
H2: Candidate discoveries are best generated by relative lexical frequencies
Results showed the opposite of H2 (pp. 582-583)
Search for the absence of connections between source and target literatures (Swanson’s disjointness)
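The two hypotheses contrast ranking by absolute frequency with ranking by relative frequency, and the two orderings can differ sharply. The sketch below uses invented numbers purely to show the mechanics; `rank_tokens` is a hypothetical helper and the values are not taken from the paper.

```python
def rank_tokens(stats, key, top_k=None):
    """Rank tokens by the given statistic, highest first (ties alphabetical)."""
    ranked = sorted(stats, key=lambda t: (-stats[t][key], t))
    return ranked if top_k is None else ranked[:top_k]

# Invented values: "tf" is an absolute count, "rf" a relative frequency.
toy_stats = {
    "patient": {"tf": 50, "rf": 0.01},
    "serotonin": {"tf": 20, "rf": 0.40},
    "spreading depression": {"tf": 5, "rf": 0.90},
}

by_absolute = rank_tokens(toy_stats, "tf")
by_relative = rank_tokens(toy_stats, "rf")
```

With these toy values the absolute ranking puts the generic word first, while the relative ranking puts the topic-specific phrase first. Any cutoff such as `top_k` is an arbitrary choice.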

Questions
Is this process more automatable?
Is 1 intermediate step enough?
What happens to system complexity when there are i intermediate literatures?
[Diagram: S, I1, Ii, T (source literature, intermediate literatures 1 through i, target literature)]
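One way to get a feel for the complexity question is a back-of-the-envelope count. The sketch below assumes, purely for illustration, that each analysis step keeps a fixed number k of top-ranked candidates and that no candidates are shared between branches; neither assumption comes from the paper, and both function names are hypothetical.

```python
def chains_to_examine(k, i):
    """Number of distinct S -> I1 -> ... -> Ii chains when each step keeps k candidates."""
    return k ** i

def analyses_needed(k, i):
    """Corpus analyses run: one for the source, then one per literature at each of i levels."""
    return sum(k ** level for level in range(i + 1))

one_step = analyses_needed(10, 1)    # source + 10 intermediate literatures
two_steps = analyses_needed(10, 2)   # adds 100 second-level literatures
```

Under these assumptions the work grows geometrically with i, which suggests that manual relevance judgments at every step would quickly become impractical as intermediate levels are added.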