Intelligent Information Directory System for Clinical Documents Qinghua Zou 6/3/2005 Dr. Wesley W. Chu (Advisor)

Similar presentations
Association Rule Mining
Huffman Codes and Association Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
A Phrase Mining Framework for Recursive Construction of a Topical Hierarchy Date : 2014/04/15 Source : KDD’13 Authors : Chi Wang, Marina Danilevsky, Nihit.
Frequent Closed Pattern Search By Row and Feature Enumeration
LOGO Association Rule Lecturer: Dr. Bo Yuan
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Zeev Dvir – GenMax From: “ Efficiently Mining Frequent Itemsets ” By : Karam Gouda & Mohammed J. Zaki.
1 Department of Information & Computer Education, NTNU SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets.
Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.
FP-Growth algorithm Vasiljevic Vladica,
FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.
Data Mining Association Analysis: Basic Concepts and Algorithms
Efficiently Mining Long Patterns from Databases Roberto J. Bayardo Jr. IBM Almaden Research Center.
Chapter 5: Mining Frequent Patterns, Association and Correlations
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Analysis: Basic Concepts and Algorithms.
Pattern Lattice Traversal by Selective Jumps Osmar R. Zaïane and Mohammad El-Hajj Department of Computing Science, University of Alberta Edmonton, AB,
Efficient Data Mining for Path Traversal Patterns CS401 Paper Presentation Chaoqiang chen Guang Xu.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
Fast Algorithms for Association Rule Mining
Mining Association Rules
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Association Rule Mining. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and closed patterns.
Performance and Scalability: Apriori Implementation.
1 KeyGraph, September 19, 2006, Kyung-Joong Kim. 2 Introduction 4 Explosion in the amount of machine readable text –Dealing with complete documents : Limited space and.
Abrar Fawaz AlAbed-AlHaq Kent State University October 28, 2011
Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.
Rule Generation [Chapter ]
Sequential PAttern Mining using A Bitmap Representation
Dr. Gary Blau, Sean HanMonday, Aug 13, 2007 Statistical Design of Experiments SECTION V SCREENING.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
Querying Structured Text in an XML Database By Xuemei Luo.
Mining High Utility Itemset in Big Data
Graph Indexing: A Frequent Structure- based Approach Alicia Cosenza November 26 th, 2007.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining III COMP Seminar GNET 713 BCB Module Spring 2007.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
LCM ver.3: Collaboration of Array, Bitmap and Prefix Tree for Frequent Itemset Mining Takeaki Uno Masashi Kiyomi Hiroki Arimura National Institute of Informatics,
© Tan, Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture note is modified based on Lecture Notes for.
Mining Document Collections to Facilitate Accurate Approximate Entity Matching Presented By Harshda Vabale.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.
Discriminative Frequent Pattern Analysis for Effective Classification By Hong Cheng, Xifeng Yan, Jiawei Han, Chih- Wei Hsu Presented by Mary Biddle.
National Yunlin University of Science and Technology Mining Generalized Associations of Semantic Relations from Textual Web Content Tao Jiang,
Δ-Tolerance Closed Frequent Itemsets James Cheng, Yiping Ke, and Wilfred Ng. ICDM '06. Presenter: Ching-Yi Lin, 2007/03/15.
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
University at BuffaloThe State University of New York Pattern-based Clustering How to cluster the five objects? qHard to define a global similarity measure.
CS685: Special Topics in Data Mining The UNIVERSITY of KENTUCKY Frequent Itemset Mining II Tree-based Algorithm Max Itemsets Closed Itemsets.
Reducing Number of Candidates
Frequent Pattern Mining
CARPENTER Find Closed Patterns in Long Biological Datasets
Association Analysis: Basic Concepts and Algorithms
Fractional Factorial Design
Frequent-Pattern Tree
Lecture 11 (Market Basket Analysis)
Association Analysis: Basic Concepts
Presentation transcript:

When searching clinical reports  Keyword search  Problems: Hard to compose good keywords. Keywords give no overview of the content. Interchangeable (synonymous) words are missed.

Intelligent Directory System  1. Overview  2. Extracting Key Concepts  3. Mining Topics  4. Building Directories  5. Searching  6. Conclusion

1. System Overview

2. Concept Extraction  2.1 Introduction  2.2 Our approach: IndexFinder Index Phase (Offline) Search Phase (Real Time)  2.3 Experiments  2.4 Summary

2.1 Motivation  Clinical texts are valuable in medical practice: search for relevant reports; search for similar patients.  What is the key information?  UMLS provides key medical concepts.  Our goal: extract UMLS concepts from clinical texts. (Diagram: clinical texts → extract key info → standard terms.)

2.1 Previous Approaches  (Diagram: free text → NLP parser → noun phrases → UMLS mapping → UMLS concepts; the example parse of "lambs will eat oats" yields the noun phrases "lambs" and "oats".)

2.1 Problems of Previous Approaches  Concepts cannot be discovered if they do not lie within a single noun phrase. E.g., in "second, third, and fourth ribs", "second rib" cannot be discovered.  Difficult to scale to large text collections: natural language processing requires significant computing resources.

2.2 Our Approach: IndexFinder  Previous: free text → NLP parser → noun phrases → UMLS mapping → concepts.  Our approach: UMLS → free text. Suppose UMLS contains only "Lung cancer"; we would discard all words in the text except "lung" and "cancer".  Index phase (offline): UMLS (2 GB) → indexing → index data (~80 MB).  Search phase (real time): free text → extracting → filtering → concepts.
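To make the index/search split concrete, here is a minimal Python sketch of the idea under the slide's toy assumption that UMLS contains only the phrase "Lung cancer"; the function names and index layout are illustrative, not the actual IndexFinder structures.

```python
from collections import defaultdict

def build_index(phrases):
    """Offline index phase: map each word to the phrases containing it."""
    index = defaultdict(set)
    for phrase in phrases:
        for word in phrase.lower().split():
            index[word].add(phrase)
    return index

def filter_words(text, index):
    """Search phase: discard every word that occurs in no UMLS phrase."""
    return [w for w in text.lower().split() if w in index]

index = build_index(["Lung cancer"])  # suppose UMLS contains only this phrase
print(filter_words("A small mass in the lung suggests cancer", index))
# -> ['lung', 'cancer']
```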

2.2 Our Approach: What's New?  A knowledge-based approach: Uses compact index data without any database system. Permutes words in a sentence to generate UMLS concept candidates. Uses filters to eliminate irrelevant concepts.

2.2 Concept Candidate Generation  Assumptions: The knowledge base provides a phrase table. Each phrase (concept) is a set of words. An input text T is represented as a set of words.  Goal: Combine words in T to generate concept candidates.  Example: T = {D, E, F}. Answer: 5 candidates (the phrase table is shown on the slide).
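A small sketch of candidate generation under these assumptions; the phrase table below is hypothetical (the slide's table is not in the transcript) and was chosen so that T = {D, E, F} yields the slide's answer of 5.

```python
def candidates(T, phrase_table):
    """Return every phrase whose words are all contained in the input text T."""
    T = set(T)
    return [p for p in phrase_table if set(p) <= T]

# Hypothetical phrase table; each phrase is a set of words.
phrase_table = [("D",), ("E",), ("F",), ("D", "E"), ("E", "F")]
print(len(candidates({"D", "E", "F"}, phrase_table)))  # -> 5
```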

2.2 Search Phase: Filtering  Use filters to eliminate irrelevant concepts.  Syntactic filter: word combinations are limited to a single sentence.  Semantic filter: filter out irrelevant concepts using semantic types (e.g. body part, disease, treatment, diagnosis); filter out general concepts using the ISA relationship and keep the more specific ones.
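A hedged sketch of the semantic filter; the semantic-type and ISA tables here are illustrative stand-ins for the UMLS data.

```python
# Keep only concepts with a relevant semantic type, then drop any kept
# concept that is a more general (ISA-ancestor) form of another kept concept.
RELEVANT_TYPES = {"body part", "disease", "treatment", "diagnosis"}

def semantic_filter(concepts, sem_type, isa_ancestors):
    """sem_type: concept -> semantic type; isa_ancestors: concept -> set of
    more general concepts it specializes."""
    kept = {c for c in concepts if sem_type[c] in RELEVANT_TYPES}
    too_general = set()
    for c in kept:
        too_general |= isa_ancestors.get(c, set())
    return kept - too_general

concepts = {"lung cancer", "cancer", "admission date"}
sem_type = {"lung cancer": "disease", "cancer": "disease",
            "admission date": "time"}
isa = {"lung cancer": {"cancer"}}
print(semantic_filter(concepts, sem_type, isa))  # -> {'lung cancer'}
```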

2.3 Experiment  Comparison with MetaMap [3]. Input: "A small mass was found in the left hilum of the lung." (Table comparing the MetaMap and IndexFinder outputs shown on the slide.)

2.4 Summary  An efficient method that maps from UMLS to free text to extract concepts, without using any database system.  Syntactic and semantic filters eliminate irrelevant candidates.  IndexFinder finds more specific concepts than NLP approaches.  IndexFinder is scalable and operates in real time.

3. Mining Topics: SmartMiner  3.1 Introduction  3.2 Search Space  3.3 SmartMiner  3.4 Experiment  3.5 Summary

3.1 Introduction  A topic (assumption): a set of concepts; a frequent pattern.  Finding topics by data mining: frequent patterns, or maximal frequent patterns.  Requires efficient data mining.

3.1 Data Mining Problem  Dataset (id: item set): 1: a b c d e; 2: a b c d; 3: b c d; 4: b e; 5: c d e. MinSup = 2.  Which itemsets are frequent itemsets (FI)? a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde, abcd.  Maximal frequent itemset (MFI): a frequent itemset no superset of which is frequent. Here the MFI are abcd, be, cde.
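The toy example can be checked by brute force (fine at 5 items; real miners such as SmartMiner avoid enumerating the whole lattice like this).

```python
from itertools import combinations

data = [set("abcde"), set("abcd"), set("bcd"), set("be"), set("cde")]
minsup = 2

def support(itemset):
    """Number of transactions containing the itemset."""
    return sum(itemset <= t for t in data)

fi = [set(c) for n in range(1, 6) for c in combinations("abcde", n)
      if support(set(c)) >= minsup]
# maximal = frequent itemsets with no frequent proper superset
mfi = [x for x in fi if not any(x < y for y in fi)]
print(len(fi), sorted("".join(sorted(m)) for m in mfi))
# -> 20 ['abcd', 'be', 'cde']
```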

3.1 Why MFI, not FI?  Mining FI is infeasible when long FI exist. E.g., suppose a 20-item set a1a2…a20 is frequent; all of its subsets are frequent, i.e., 2^20 = 1,048,576 itemsets.  Mining MFI is fast, and from the MFI we can generate all the FI.
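Conversely, the FI family can be regenerated from the MFI, since every frequent itemset is a subset of some maximal one (supports would still need one counting pass over the data).

```python
from itertools import combinations

def fi_from_mfi(mfi):
    """All nonempty subsets of the maximal frequent itemsets."""
    fi = set()
    for m in mfi:
        m = sorted(m)
        for n in range(1, len(m) + 1):
            fi.update(frozenset(c) for c in combinations(m, n))
    return fi

print(len(fi_from_mfi([set("abcd"), set("be"), set("cde")])))  # -> 20
```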

3.1 Previous work  Superset checking: a study shows that the CPU spends 40% of its time on superset checking.  The search tree is too large.  A large number of support countings.  A more efficient method is needed.

3.2 Search space  Given 5 items a, b, c, d, e, what is the search space? Ø, a, b, c, d, e, ab, ac, ad, ae, bc, …, abcde.  We use "head:tail" to denote the space: Ø:abcde, simplified to :abcde.  What is the space of ab:cd? ab, abc, abd, abcd.
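A short sketch of the head:tail notation: a space head:tail contains the head plus every subset of the tail.

```python
from itertools import chain, combinations

def space(head, tail):
    """All itemsets in head:tail -- head extended by any subset of tail."""
    subsets = chain.from_iterable(combinations(tail, n)
                                  for n in range(len(tail) + 1))
    return [head + "".join(s) for s in subsets]

print(space("ab", "cd"))        # -> ['ab', 'abc', 'abd', 'abcd']
print(len(space("", "abcde")))  # -> 32, the whole space :abcde (incl. the empty set)
```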

3.2 Space decomposition  For the space :abcde, if abc is frequent:  The known space: any subset of abc is frequent, so the known space is :abc.  The unknown space: any itemsets containing d or e, i.e., d:abce and e:abc.  Thus :abcde = d:abce + e:abc + :abc.
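One way to realize this decomposition in code (a sketch; SmartMiner's actual bookkeeping differs in detail): peel off the tail items outside the known frequent set one at a time, so each subspace holds the itemsets containing that item but none of the earlier ones.

```python
def decompose(head, tail, known):
    """Split head:tail once `known` (a frequent subset of tail) is settled:
    one subspace per item outside `known`, plus the known space head:known."""
    remaining, spaces = list(tail), []
    for item in [x for x in tail if x not in known]:
        remaining.remove(item)
        spaces.append((head + item, "".join(remaining)))
    spaces.append((head, known))  # already-known space
    return spaces

print(decompose("", "abcde", "abc"))
# -> [('d', 'abce'), ('e', 'abc'), ('', 'abc')]
```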

3.3 The basic idea  (a) Previous approach: B2, …, Bn are created before exploring B1.  (b) SmartMiner strategy: B' is created after exploring B1, so information from B1 can be used to prune the space at B'. SmartMiner takes advantage of the information from previous steps.

3.3 The tail information  For the space :abcde, if we know that abcf, abcg, and abfg are frequent, we project them onto the space: abcf → abc; abcg → abc; abfg → ab.  Thus Tinf(abcf, abcg, abfg | :abcde) = {abc}; ab is dropped because it is a subset of abc.
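A sketch of computing tail information by projection, reproducing the example above.

```python
def tinf(frequent, tail):
    """Project known frequent itemsets onto the tail and keep only the
    maximal projections (subsets of other projections carry no extra info)."""
    proj = {frozenset(f) & set(tail) for f in map(set, frequent)}
    proj.discard(frozenset())
    return [sorted(p) for p in proj if not any(p < q for q in proj)]

print(tinf(["abcf", "abcg", "abfg"], "abcde"))
# -> [['a', 'b', 'c']]  i.e. Tinf = {abc}; ab is dropped as a subset of abc
```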

3.4 Running time on Mushroom

3.5 Summary  SmartMiner uses tail information to guide the mining; it is efficient because of: A smaller search tree. No superset checking. A reduced number of support countings.

4. Building Directories  4.1 Introduction  4.2 Knowledge Hierarchies  4.3 User Specification  4.4 Directory Generation  4.5 Integrating Various Directories  4.6 Summary

4.1 Introduction  Three inputs: topics → key content; knowledge trees → meaningful; user specs → customized.

4.2 Knowledge Hierarchies  UMLS concept hierarchies: PA: parent-child relationship; RB: broader-than relationship.  Problems: A concept can have several parents at different granularities, e.g. [lung cancer] → [Neoplasms, Respiratory Tract] and [lung cancer] → [Neoplasms, Respiratory System]. A concept can have hundreds of paths to the roots: [lung cancer] has 233 different paths in UMLS via PA.

4.2 Select Proper Hierarchies  Set a source preference order, e.g. [disease]: ICD9 > SNOMED > MeSH; [body part]: SNOMED > ICD9.  Select the proper granularity. C: a set of concepts; n: a path node. Score function for selecting node n: S(n) = |{ci | ci falls under n, ci ∈ C}|.  Expert review.
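A sketch of the score function S(n): pick the hierarchy node covering the most extracted concepts. The hierarchy nodes and concept set below are hypothetical.

```python
def score(descendants_of_n, C):
    """S(n): number of extracted concepts in C that fall under node n."""
    return len(descendants_of_n & C)

# Hypothetical hierarchy nodes and extracted concepts:
nodes = {"Neoplasms, Respiratory Tract": {"lung cancer", "tracheal neoplasm"},
         "Neoplasms, Respiratory System": {"lung cancer"}}
C = {"lung cancer", "tracheal neoplasm", "rib fracture"}
print(max(nodes, key=lambda n: score(nodes[n], C)))
# -> 'Neoplasms, Respiratory Tract' (covers 2 of the 3 concepts)
```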

4.3 User Specifications  A good directory matches the usage pattern.  A user spec describes a usage pattern.  A user may have different specs.  A spec is a series of knowledge names: [disease] + [body part], or [body part] + [disease].  Build a directory for a spec following its ordering.

4.4 Directory Generation  An example: User spec 1: d + p, i.e. [disease] + [body part]. User spec 2: p + d, i.e. [body part] + [disease].

4.4 ~ An example  (Two directory trees shown on the slide: one ordered d + p, the other ordered p + d.)

4.4 ~ Algorithm
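The algorithm figure did not survive in the transcript. Below is a hedged Python sketch of the idea described in 4.3 and 4.4: order each topic's concepts by the user spec and insert the resulting path into a nested directory. All names here are illustrative.

```python
def build_directory(topics, kind_of, spec):
    """topics: iterable of concept sets; kind_of: concept -> knowledge-tree
    name; spec: ordered list of knowledge-tree names (the user spec)."""
    root = {}
    for topic in topics:
        node = root
        for kind in spec:                    # follow the spec's ordering
            for c in sorted(topic):
                if kind_of.get(c) == kind:
                    node = node.setdefault(c, {})
        # document ids for this topic would be attached at `node`
    return root

kind_of = {"pneumonia": "disease", "lung": "body part"}
print(build_directory([{"pneumonia", "lung"}], kind_of,
                      ["disease", "body part"]))  # {'pneumonia': {'lung': {}}}
print(build_directory([{"pneumonia", "lung"}], kind_of,
                      ["body part", "disease"]))  # {'lung': {'pneumonia': {}}}
```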

4.5 Integrating Various Directories  For each document Di, get all directory paths to Di.  The paths of a Di form a tree: store it as XML. Keywords can be associated with tree nodes. Query: XPath.  Redundant information exists.

4.5 Simplified model  Keep only the first-level knowledge trees.  For //d6//p6, we use the XPath query //doc[//d6 and //p6].  The size is smaller, but some computation is required.
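A runnable illustration of such a query using lxml; the element names are illustrative, and the predicate is written with relative paths (.//) so it is evaluated within each candidate doc rather than from the document root.

```python
from lxml import etree  # third-party: pip install lxml

xml = etree.fromstring(
    "<docs>"
    "<doc id='r1'><d6/><p6/></doc>"
    "<doc id='r2'><d6/></doc>"
    "</docs>")
# Find documents containing both d6 and p6:
hits = xml.xpath("//doc[.//d6 and .//p6]")
print([h.get("id") for h in hits])  # -> ['r1']
```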

4.6 Summary  Build directories from: Topics. Knowledge hierarchies. User specifications.  Map directories to XML by collecting the directory paths for each document.  Leverage existing XML technologies.