Lexical Tools Briefing
The Lexical Systems Group, NLM / LHNCBC / CGSB
June 2006



Table of Contents
- Introduction
- Lexical Tools
- Lvg
- Norm
- Text Categorization
- Questions

Introduction

Introduction - LB

Introduction - Lexicon

Introduction - LC

Introduction - LA

Introduction - Numbers

Introduction - SCRT

Introduction - Lexical Tools

Introduction - GSpell

Introduction - Text Tools

Introduction - TC

Lexical Tools
A suite of text utilities that generate, mutate, and filter out lexical variants from the given input.
(Diagram: Input → Lexical Tools → Output.1, Output.2, Output.3, …)

Four Tools
Lvg, Norm, LuiNorm, and WordIndex. Each takes an input and produces one or more outputs (Output.1, Output.2, Output.3, …).

Tool Types
- Command line tools: lvg (Lexical Variants Generation), norm, luiNorm, wordInd
- Lexical GUI Tool (lgt)
- Web tools
- Java APIs

Functions
Used in natural language processing for:
- aggressive text pattern matching
- creating normalized and expanded terms
- making word, term, and phrase indexes
- matching queries with indexed entries
- increasing recall and/or precision

Facts
- Released annually
- 100% Java (since 2002)
- Distributed free with open source code
- Runs on different platforms
- One complete package
- Documentation and support

Lexical Variants Generation

LVG: flow components and 37 options
- input filter options (3)
- global behavior options (13)
- flow specific options (2)
- output filter options (19)

Flow Components
Example: the inflect component (i) takes "leave" and generates its inflectional variants: leave, leaves, leaving, left.

Command Line Tool
> lvg -f:i leave
leave|leave|128|1|i|1|
leave|leave|128|512|i|1|
leave|leaves|128|8|i|1|
leave|left|1024|64|i|1|
leave|left|1024|32|i|1|
leave|leave|1024|1|i|1|
leave|leave|1024|262144|i|1|
leave|leave|1024|1024|i|1|
leave|leaves|1024|128|i|1|
leave|leaving|1024|16|i|1|

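Fielded Output
Each line produced by > lvg -f:i leave has six fields separated by "|":
Input Term | Output Term | Categories | Inflections | Flow History | Flow Number
For example, in leave|leaves|128|8|i|1| the input term is "leave", the output term is "leaves", the category value is 128, the inflection value is 8, the flow history is i, and the flow number is 1.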
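
Because the output is fielded, it is easy to parse. The following is a minimal Java sketch of a parser for one line. It assumes the six "|"-separated fields shown above and treats categories as bit flags. The category bit values (1 = adj, 2 = adv, …, 128 = noun, 1024 = verb, so that 2047 means all eleven categories) are our assumption, and they should be checked against the lvg documentation. `FieldedRecord` and `categoryNames` are hypothetical names and are not part of the lvg Java API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser for one line of lvg fielded output (not the lvg Java API). */
final class FieldedRecord {
    // Assumed category bit values, lowest bit first: 1 = adj, 2 = adv, ..., 1024 = verb.
    private static final String[] CATEGORY_NAMES = {
        "adj", "adv", "aux", "compl", "conj", "det",
        "modal", "noun", "prep", "pron", "verb"
    };

    final String inputTerm;
    final String outputTerm;
    final long categories;
    final long inflections;
    final String flowHistory;
    final int flowNumber;

    private FieldedRecord(String inputTerm, String outputTerm, long categories,
                          long inflections, String flowHistory, int flowNumber) {
        this.inputTerm = inputTerm;
        this.outputTerm = outputTerm;
        this.categories = categories;
        this.inflections = inflections;
        this.flowHistory = flowHistory;
        this.flowNumber = flowNumber;
    }

    /** Parses "input|output|categories|inflections|flowHistory|flowNumber|". */
    static FieldedRecord parse(String line) {
        // A limit of -1 keeps empty fields, such as a blank inflection column.
        String[] f = line.split("\\|", -1);
        if (f.length < 6) {
            throw new IllegalArgumentException("expected at least 6 fields: " + line);
        }
        return new FieldedRecord(f[0], f[1], parseFlags(f[2]), parseFlags(f[3]),
                f[4], Integer.parseInt(f[5].trim()));
    }

    // A blank numeric field is read as 0 (no flags set).
    private static long parseFlags(String field) {
        String t = field.trim();
        return t.isEmpty() ? 0L : Long.parseLong(t);
    }

    /** Names of the category bits that are set, lowest bit first. */
    List<String> categoryNames() {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < CATEGORY_NAMES.length; i++) {
            if ((categories & (1L << i)) != 0) {
                names.add(CATEGORY_NAMES[i]);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        FieldedRecord r = parse("leave|left|1024|64|i|1|");
        System.out.println(r.outputTerm + " " + r.categoryNames() + " flow " + r.flowNumber);
    }
}
```

Treating the numeric columns as bit sets means that one value such as 2047 can encode several categories at once. This is why the sketch stores them as `long` and decodes them on demand.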

A Serial Flow
Input term → Remove possessive → Lowercase → Strip punctuation → Remove stop words → Strip diacritics → Word order sort → Output term
Flow components can be arranged so that the output of one is the input to the next.

A Serial Flow - Example
> lvg -f:l:q:g:t:p:w The Gougerot-Sjögren's Syndrome
The Gougerot-Sjögren's Syndrome|gougerotsjogren syndrome|2047| |l+q+g+t+p+w|1|
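
A serial flow works as function composition. Each component maps a string to a string, and each output feeds the next component. The following is a minimal Java sketch that uses simplified stand-ins for the six components in `-f:l:q:g:t:p:w`. The stop-word list, the genitive rule, and the ASCII-only punctuation rule are hypothetical simplifications. They are not lvg's lexicon-backed implementations, and `SerialFlow` is not an lvg class.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical sketch of an lvg-style serial flow (not the lvg Java API). */
final class SerialFlow {
    // Assumed toy stop-word list, for illustration only.
    private static final Set<String> STOP_WORDS = Set.of("the", "of", "and", "a", "an");

    // Simplified stand-ins for six flow components, keyed by their option letters.
    private static final Map<String, UnaryOperator<String>> COMPONENTS = new HashMap<>();
    static {
        COMPONENTS.put("l", s -> s.toLowerCase(Locale.ROOT));
        // q: decompose accented letters, then drop the combining marks
        COMPONENTS.put("q", s -> Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", ""));
        // g: drop a genitive 's at the end of a word
        COMPONENTS.put("g", s -> s.replaceAll("(?i)'s\\b", ""));
        // t: drop whole words found in the stop-word list
        COMPONENTS.put("t", s -> Arrays.stream(s.trim().split("\\s+"))
                .filter(w -> !STOP_WORDS.contains(w.toLowerCase(Locale.ROOT)))
                .collect(Collectors.joining(" ")));
        // p: delete ASCII punctuation characters
        COMPONENTS.put("p", s -> s.replaceAll("\\p{Punct}", ""));
        // w: sort the words alphabetically
        COMPONENTS.put("w", s -> Arrays.stream(s.trim().split("\\s+"))
                .sorted()
                .collect(Collectors.joining(" ")));
    }

    /** Applies the components named in flowSpec (e.g. "l:q:g:t:p:w") from left to right. */
    static String run(String flowSpec, String term) {
        String result = term;
        for (String name : flowSpec.split(":")) {
            UnaryOperator<String> component = COMPONENTS.get(name);
            if (component == null) {
                throw new IllegalArgumentException("unsupported component: " + name);
            }
            result = component.apply(result);
        }
        return result;
    }

    public static void main(String[] args) {
        // The escape stands for o-umlaut, avoiding source-encoding issues.
        System.out.println(run("l:q:g:t:p:w", "The Gougerot-Sj\u00f6gren's Syndrome"));
    }
}
```

With this input, the sketch yields `gougerotsjogren syndrome`, which is the output term on the slide. Keying components by their option letters keeps the flow specification string close to the command-line form.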

Parallel Flows
Multiple flows can be defined for the same input term, and each produces its own output terms.
(Diagram: Input term → noOperation → Output term; Input term → Uninflect → Synonyms → Output terms)

Parallel Flows - Example
> lvg -f:n -f:B:y ear
ear|ear|2047| |n|1|
ear|aural|1|1|B+y|2|
ear|auricularis|1|1|B+y|2|
ear|otic|1|1|B+y|2|
ear|otor|1|1|B+y|2|
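
Parallel flows apply several independent flows to the same input and label each output with its flow number. The following is a minimal Java sketch. It assumes a hypothetical in-memory synonym table that holds only the `ear` synonyms from the example above, and it uses an identity placeholder for uninflection. Real lvg draws both from its own data. For brevity, the output lines omit the category and inflection columns. `ParallelFlows` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of lvg-style parallel flows (not the lvg Java API). */
final class ParallelFlows {
    // Assumed synonym table, seeded only with the synonyms shown on the slide.
    private static final Map<String, List<String>> SYNONYMS =
            Map.of("ear", List.of("aural", "auricularis", "otic", "otor"));

    // n: no operation, so the term passes through unchanged.
    static List<String> noOperation(String term) {
        return List.of(term);
    }

    // B then y: uninflect, then look up synonyms. The uninflect step is a placeholder
    // (identity), which is enough here because "ear" is already a base form.
    static List<String> uninflectThenSynonyms(String term) {
        String base = term;
        return SYNONYMS.getOrDefault(base, List.of());
    }

    /** Runs every flow on the same term; flows are numbered from 1 in insertion order. */
    static List<String> run(String term, LinkedHashMap<String, Function<String, List<String>>> flows) {
        List<String> lines = new ArrayList<>();
        int flowNumber = 0;
        for (Map.Entry<String, Function<String, List<String>>> flow : flows.entrySet()) {
            flowNumber++;
            for (String output : flow.getValue().apply(term)) {
                // Simplified fielded line: input|output|flowHistory|flowNumber|
                lines.add(term + "|" + output + "|" + flow.getKey() + "|" + flowNumber + "|");
            }
        }
        return lines;
    }

    /** The two flows of "lvg -f:n -f:B:y", keyed by their flow-history labels. */
    static LinkedHashMap<String, Function<String, List<String>>> exampleFlows() {
        LinkedHashMap<String, Function<String, List<String>>> flows = new LinkedHashMap<>();
        flows.put("n", ParallelFlows::noOperation);
        flows.put("B+y", ParallelFlows::uninflectThenSynonyms);
        return flows;
    }

    public static void main(String[] args) {
        run("ear", exampleFlows()).forEach(System.out::println);
    }
}
```

A `LinkedHashMap` keeps the flows in command-line order, so the flow numbers come out stable.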

Input Filter Options
> lvg -f:u -t:7 -F:8:6
Input term: C |ENG|S|L |VW|S |Rheumatic carditis, acute
Output terms: acute Rheumatic carditis|S
-t:7 takes the term from field 7 of the input.
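
Taking the term from a fielded input line amounts to splitting on the separator and picking a 1-based field. The following is a minimal Java sketch. It assumes "|" as the separator and assumes that the result is appended after the input fields; the slide's `-F:8:6` suggests this layout, but the sketch does not confirm it. The `uninvert` transform (split at the last comma and move the tail to the front) is only an illustrative stand-in that reproduces the slide's result. It is not a description of what `-f:u` does internally. `FieldInput` and its methods are hypothetical.

```java
import java.util.StringJoiner;
import java.util.function.UnaryOperator;

/** Hypothetical sketch of fielded input handling (not the lvg Java API). */
final class FieldInput {
    /** Returns the 1-based field n of a '|'-separated line. */
    static String field(String line, int n) {
        String[] fields = line.split("\\|", -1);
        if (n < 1 || n > fields.length) {
            throw new IllegalArgumentException("no field " + n + " in: " + line);
        }
        return fields[n - 1];
    }

    /** Applies a transform to field n and appends the result as a new field. */
    static String transformField(String line, int n, UnaryOperator<String> transform) {
        String base = line.endsWith("|") ? line : line + "|";
        return base + transform.apply(field(line, n)) + "|";
    }

    /** Joins the chosen 1-based fields with '|'. */
    static String select(String line, int... fields) {
        StringJoiner joiner = new StringJoiner("|");
        for (int n : fields) {
            joiner.add(field(line, n));
        }
        return joiner.toString();
    }

    // Illustrative stand-in: "X, Y" becomes "Y X" (split at the last comma).
    static String uninvert(String term) {
        int comma = term.lastIndexOf(',');
        if (comma < 0) {
            return term;
        }
        return term.substring(comma + 1).trim() + " " + term.substring(0, comma).trim();
    }

    public static void main(String[] args) {
        String in = "C |ENG|S|L |VW|S |Rheumatic carditis, acute";
        String out = transformField(in, 7, FieldInput::uninvert);
        System.out.println(select(out, 8, 6));
    }
}
```

Using the slide's input line, with identifiers left blank as on the slide, selecting fields 8 and 6 of the extended line gives the output shown above.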

Global Behavior Options
> lvg -f:L -f:E -s:"\" otitis
Output terms:
otitis\otitis\128\513\L\1
otitis\E \128\513\E\2
-s:"\" changes the field separator to "\".

Output Filter Options
> lvg -f:L -SC -SI hot
hot|hot| |<base+positive+infinitive+pres1p23p>|L|1|
-SC and -SI show the category and inflection names.
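
Showing inflection names is a bit-flag decode. The following is a minimal Java sketch that formats a numeric inflection value in the `<name+name>` style shown above. The bit values are our assumption. They are consistent with the `leave` and `otitis` output on these slides (for example, 513 = base + singular), but the table covers only a subset of lvg's inflections and should be checked against the lvg documentation. `InflectionNames` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical decoder for lvg-style inflection bit flags (not the lvg Java API). */
final class InflectionNames {
    // Assumed subset of inflection bit values and their names, lowest bit first.
    private static final long[] BITS = {1L, 2L, 4L, 8L, 16L, 32L, 64L, 128L, 256L, 512L, 1024L, 262144L};
    private static final String[] NAMES = {
        "base", "comparative", "superlative", "plural", "presPart", "past",
        "pastPart", "pres3s", "positive", "singular", "infinitive", "pres1p23p"
    };

    /** Formats the set bits as "<name+name+...>", lowest bit first. */
    static String decode(long flags) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < BITS.length; i++) {
            if ((flags & BITS[i]) != 0) {
                names.add(NAMES[i]);
            }
        }
        return "<" + String.join("+", names) + ">";
    }

    public static void main(String[] args) {
        System.out.println(decode(1L + 256L + 1024L + 262144L));
        System.out.println(decode(513L));
    }
}
```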

Norm
Composed of 11 Lvg flow components to abstract away from:
- case
- punctuation
- possessive forms
- inflections
- spelling variants
- stop words
- diacritics & ligatures
- word order

Norm components
- g: remove genitives
- t: strip stop words
- o: replace punctuation with spaces
- l: lowercase
- B: uninflect each word in a term
- w: sort words into alphabetical order
- rs: remove parenthetic plural forms
- q: strip diacritics
- q2: split ligatures
- Ct: retrieve citations
- q4: get symbol names
(synonymy)

Norm: step-by-step example
Hodgkin's Diseases, NOS
→ Hodgkin Diseases, NOS (g: genitive removed)
→ Hodgkin Diseases NOS (o: punctuation replaced with spaces)
→ Hodgkin Diseases (t: stop word NOS removed)
→ hodgkin diseases (l: lowercased)
→ hodgkin disease (B: uninflected)
→ disease hodgkin (w: words sorted)

Norm: Example
Each of the following variants normalizes to "disease hodgkin":
- Hodgkin Disease
- HODGKINS DISEASE
- Hodgkin's Disease
- Disease, Hodgkin's
- HODGKIN'S DISEASE
- Hodgkin's disease
- Hodgkins Disease
- Hodgkin's disease NOS
- Hodgkin's disease, NOS
- Disease, Hodgkins
- Diseases, Hodgkins
- Hodgkins Diseases
- Hodgkins disease
- hodgkin's disease
- Disease;Hodgkins
- Disease, Hodgkin
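
The Hodgkin's disease example above can be reproduced with a short chain of string transforms. The following is a minimal Java sketch. It assumes that the steps run in the order suggested by the intermediate results on the slides (g, o, t, l, B, w) and omits the other Norm components (rs, q, q2, Ct, q4). It also uses a naive plural-stripping rule in place of lexicon-based uninflection and a hypothetical stop-word list. `NormSketch` is not the lvg Norm class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical, simplified sketch of a Norm-style pipeline (not the lvg Java API). */
final class NormSketch {
    // Assumed stop-word list; real Norm uses its own list.
    private static final Set<String> STOP_WORDS = Set.of("nos", "the", "of");

    // Steps in the order suggested by the slide trace.
    private static final Map<String, UnaryOperator<String>> STEPS = new LinkedHashMap<>();
    static {
        STEPS.put("g", s -> s.replaceAll("(?i)'s\\b", ""));
        STEPS.put("o", s -> s.replaceAll("\\p{Punct}", " ").replaceAll("\\s+", " ").trim());
        STEPS.put("t", s -> join(words(s).filter(w -> !STOP_WORDS.contains(w.toLowerCase(Locale.ROOT)))));
        STEPS.put("l", s -> s.toLowerCase(Locale.ROOT));
        STEPS.put("B", s -> join(words(s).map(NormSketch::naiveUninflect)));
        STEPS.put("w", s -> join(words(s).sorted()));
    }

    private static Stream<String> words(String s) {
        return Arrays.stream(s.trim().split("\\s+")).filter(w -> !w.isEmpty());
    }

    private static String join(Stream<String> words) {
        return words.collect(Collectors.joining(" "));
    }

    // Naive stand-in for uninflection: drop a final "s" (but not "ss") from words longer than 3 letters.
    static String naiveUninflect(String w) {
        return (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss"))
                ? w.substring(0, w.length() - 1) : w;
    }

    /** Returns the input followed by the string after each step. */
    static List<String> trace(String term) {
        List<String> out = new ArrayList<>();
        out.add(term);
        String s = term;
        for (UnaryOperator<String> step : STEPS.values()) {
            s = step.apply(s);
            out.add(s);
        }
        return out;
    }

    static String norm(String term) {
        List<String> t = trace(term);
        return t.get(t.size() - 1);
    }

    public static void main(String[] args) {
        trace("Hodgkin's Diseases, NOS").forEach(System.out::println);
        for (String v : List.of("HODGKINS DISEASE", "Disease, Hodgkin's", "Disease;Hodgkins")) {
            System.out.println(v + " -> " + norm(v));
        }
    }
}
```

Because every variant collapses to the same key, a normalized string like "disease hodgkin" can serve as a lookup key that ignores case, punctuation, possessives, plurals, and word order.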

Text Categorization
- Based on the Journal Descriptor Indexing (JDI) methodology
- Uses a small set of high-level descriptors, such as Journal Descriptors (JDs), Semantic Types (STs), and MeSH subcategories
- Used to categorize text, index content, retrieve records, and disambiguate word senses

Text Categorization
- Distributed free with open source code
- 100% Java
- Runs on different platforms
- One complete package
- Documentation and support
- Provides Java APIs, command line tools, GUI tools, and Web tools
- First release planned for 2007 (TC 2007)

Text Categorization: Word Sense Disambiguation (WSD)
Free text → MetaMap (MMTx) → candidate Metathesaurus concepts (Concept 1, Concept 2, …, Concept n) → TC → best concept

Text Categorization: Word Sense Disambiguation (WSD) Example
… transport … → MetaMap (MMTx) → Patient Transport (ST: Health Care Activity) or Biological Transport (ST: Cell Function) → TC → best concept
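
The selection step in the transport example can be pictured as scoring each candidate's semantic type against the surrounding text and keeping the highest-scoring candidate. The following is a minimal Java sketch. It averages per-word semantic-type scores over the context words. The word-to-ST score table is invented for illustration only; in a JDI-based system such scores would come from training data. `SenseChooser` and the example sentences are hypothetical, and this is not the TC API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Objects;

/** Hypothetical sketch of choosing a concept by semantic-type score (not the TC API). */
final class SenseChooser {
    /** A candidate concept: a name plus its semantic type (ST). */
    static final class Candidate {
        final String name;
        final String semanticType;

        Candidate(String name, String semanticType) {
            this.name = name;
            this.semanticType = semanticType;
        }
    }

    // Toy word -> (semantic type -> score) table, invented for illustration only.
    private static final Map<String, Map<String, Double>> WORD_ST_SCORES = Map.of(
            "ambulance", Map.of("Health Care Activity", 0.9, "Cell Function", 0.1),
            "membrane", Map.of("Health Care Activity", 0.1, "Cell Function", 0.8),
            "ions", Map.of("Health Care Activity", 0.0, "Cell Function", 0.7));

    /** Average score of one semantic type over the context words found in the table. */
    static double score(List<String> context, String semanticType) {
        return context.stream()
                .map(WORD_ST_SCORES::get)
                .filter(Objects::nonNull)
                .mapToDouble(scores -> scores.getOrDefault(semanticType, 0.0))
                .average()
                .orElse(0.0);
    }

    /** Returns the candidate whose semantic type scores highest (the first one wins ties). */
    static Candidate best(String text, List<Candidate> candidates) {
        List<String> context = Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+"));
        Candidate best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Candidate c : candidates) {
            double s = score(context, c.semanticType);
            if (s > bestScore) {
                bestScore = s;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Candidate> transport = List.of(
                new Candidate("Patient Transport", "Health Care Activity"),
                new Candidate("Biological Transport", "Cell Function"));
        System.out.println(best("Transport of ions across the membrane", transport).name);
        System.out.println(best("Ambulance transport to the hospital", transport).name);
    }
}
```

Scoring the semantic type rather than the concept itself keeps the descriptor set small, which matches the "small set of high-level descriptors" idea described earlier.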

Questions
Lexical Systems Group:
Lexical Tools:

Application
Metathesaurus English strings are processed with norm and wordInd to build the normalized string index (MRXNS.ENG) and the normalized word index (MRXNW.ENG).

Application
A query is normalized with norm. The normed term is then looked up in the normalized string index (or the normalized word index) to find SUIs and, through them, the Metathesaurus concepts that match the normalized query.

Example
Query: Dry Eyes Syndrome → norm → normed term: dry eye syndrome

Example (Cont.)
Normed term → SUIs:
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |
ENG|dry eye syndrome|C |L |S |

Example (Cont.)
SUIs → MRCON:
C |ENG|P|L |VS |S |Dry eye syndrome
C |ENG|P|L |VS |S |Dry Eye Syndrome
C |ENG|P|L |VS |S |dry eye syndrome
C |ENG|P|L |VWS|S |Syndrome, Dry Eye
C |ENG|P|L |VWS|S |Dry, eye syndrome
C |ENG|P|L |VW |S |Syndromes, Dry Eye
C |ENG|P|L |PF |S |Dry Eye Syndromes
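
The dry eye syndrome example above can be sketched as two maps keyed by normalized text: one keyed by the whole normalized string, and one keyed by each normalized word. The following is a minimal Java sketch. It assumes a simplified stand-in normalizer (lowercase, punctuation to spaces, naive plural stripping, word sort) and stores the original strings instead of SUIs, because the identifiers are not given here. `NormIndex` is hypothetical and does not reflect the actual MRXNS.ENG or MRXNW.ENG file layouts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical sketch of normalized string and word indexes (not the UMLS file formats). */
final class NormIndex {
    private final Map<String, List<String>> stringIndex = new HashMap<>();
    private final Map<String, Set<String>> wordIndex = new HashMap<>();

    // Simplified stand-in for norm: lowercase, punctuation to spaces, drop plural "s", sort words.
    static String normalize(String s) {
        return Arrays.stream(s.toLowerCase(Locale.ROOT).replaceAll("\\p{Punct}", " ").trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")
                        ? w.substring(0, w.length() - 1) : w)
                .sorted()
                .collect(Collectors.joining(" "));
    }

    /** Indexes a string under its normalized form and under each normalized word. */
    void add(String original) {
        String key = normalize(original);
        stringIndex.computeIfAbsent(key, k -> new ArrayList<>()).add(original);
        for (String word : key.split(" ")) {
            if (!word.isEmpty()) {
                wordIndex.computeIfAbsent(word, k -> new LinkedHashSet<>()).add(original);
            }
        }
    }

    /** Strings whose normalized form equals the normalized query. */
    List<String> matchString(String query) {
        return stringIndex.getOrDefault(normalize(query), List.of());
    }

    /** Strings containing the normalized form of a single query word. */
    Set<String> matchWord(String word) {
        return wordIndex.getOrDefault(normalize(word), Set.of());
    }

    public static void main(String[] args) {
        NormIndex index = new NormIndex();
        for (String s : List.of("Dry eye syndrome", "Dry Eye Syndrome", "dry eye syndrome",
                "Syndrome, Dry Eye", "Dry, eye syndrome", "Syndromes, Dry Eye", "Dry Eye Syndromes")) {
            index.add(s);
        }
        System.out.println(normalize("Dry Eyes Syndrome"));
        System.out.println(index.matchString("Dry Eyes Syndrome"));
    }
}
```

Normalizing both the indexed strings and the query with the same function is what lets "Dry Eyes Syndrome" match all seven strings, including the inverted "Syndrome, Dry Eye" forms.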
