
1 Quality Assurance of NCI Thesaurus by Mining Structural-Lexical Patterns
Terminology Quality Evaluation S60
Rashmie Abeysinghe
Joint work with Michael A. Brooks, Jeffery Talbert, Licong Cui
University of Kentucky

2 Disclosure
Licong Cui is part of the startup called Synamtics Inc.
AMIA | amia.org

3 Outline
NCI Thesaurus
Terminology Quality Assurance
Non-lattice Subgraphs
Structural-Lexical Features
  Containment
  Union
  Intersection
  Union-Intersection
  Inference-Union
  Inference-Contradiction
Results
Evaluation
Conclusion and Future Directions

4 NCI Thesaurus (NCIt)
National Cancer Institute (NCI) Thesaurus
First published in 2000
Contains over 118,000 concepts
Hierarchically organized in 19 domains, including:
  Abnormal Cell
  Anatomic Structure, System, or Substance
  Biological Process
  Disease, Disorder or Finding
  Molecular Abnormality
  etc.
Maintained by a multidisciplinary team of editors
900 concepts added each month
Covers terminology for clinical care, translational and basic research, public information, and administrative activities

5 Terminology Quality Assurance (TQA)
Essential part of the terminology management lifecycle
Manual review is labor-intensive and time-consuming
Automating TQA is an active area of research
[Figure callout: Missing Relation!]

6 Non-lattice Subgraphs
Lattice: a desirable property for a well-formed terminology*
Lattice: a DAG in which any two nodes have a unique maximal common descendant as well as a unique minimal common ancestor
[Figure: a non-lattice subgraph, with its Upper Bounds (U) and Lower Bounds (L) labeled]
*Zhang GQ, Bodenreider O. Large-scale, exhaustive lattice-based structural auditing of SNOMED CT. AMIA Annual Symposium Proc. 2010;
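
The lattice definition on this slide can be illustrated with a minimal Python sketch. It assumes the hierarchy is given as a parent-to-children dictionary and checks only the descendant half of the definition (the ancestor half is symmetric). The function names and the toy graph are hypothetical and are not taken from the presentation or from NCIt.

```python
from itertools import combinations


def down_set(children, node):
    """Return node plus all of its descendants in a DAG {parent: [children]}."""
    seen = {node}
    stack = [node]
    while stack:
        cur = stack.pop()
        for child in children.get(cur, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def maximal_common_descendants(children, a, b):
    """Common descendants of a and b that lie below no other common descendant."""
    common = down_set(children, a) & down_set(children, b)
    return {
        c for c in common
        if not any(c in down_set(children, d) for d in common if d != c)
    }


def non_lattice_pairs(children):
    """Map each pair with more than one maximal common descendant to those descendants."""
    nodes = set(children) | {c for cs in children.values() for c in cs}
    result = {}
    for a, b in combinations(sorted(nodes), 2):
        lower = maximal_common_descendants(children, a, b)
        if len(lower) > 1:
            result[(a, b)] = lower
    return result


# Hypothetical toy hierarchy: A and B both have children C and D; C and D share child E.
toy = {"A": ["C", "D"], "B": ["C", "D"], "C": ["E"], "D": ["E"]}
pairs = non_lattice_pairs(toy)
```

On the toy graph, only the pair (A, B) is reported, with C and D as its maximal common descendants, so A and B play the role of upper bounds and C and D the role of lower bounds. This brute-force version recomputes descendant sets repeatedly and is meant for illustration only, not for a terminology with over 118,000 concepts.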

7 Structural-Lexical Features
Considering the label of a concept as a set of words in lower case:
Containment*: $U_i \subset U_j$ or $L_i \subset L_j$
Union*: $U_i \cup U_j = L_k$
Intersection*: $L_i \cap L_j = U_k$
Union-Intersection*: $U_i \cup U_j = L_s \cap L_t$
Inference-Union: $U_s \cup (L_i \cap L_j) = L_k$
Inference-Contradiction
*Cui L, Zhu W, Tao S, Case JT, Bodenreider O, Zhang GQ. Mining non-lattice subgraphs for detecting missing hierarchical relations and concepts in SNOMED CT. JAMIA Jul 1;24(4):
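
The set conditions above can be sketched in Python as follows. This is an illustrative reading of the formulas, not the authors' implementation: it takes the indices literally, adds no extra conditions such as distinctness, and omits Inference-Contradiction because the slide gives no formula for it. The function names and example labels are hypothetical.

```python
from itertools import combinations


def words(label):
    """Treat a concept label as a set of lower-case words."""
    return frozenset(label.lower().split())


def lexical_patterns(upper_labels, lower_labels):
    """Return the names of the patterns exhibited by one non-lattice subgraph."""
    U = [words(x) for x in upper_labels]
    L = [words(x) for x in lower_labels]
    found = set()

    # Containment: U_i is a proper subset of U_j, or L_i is a proper subset of L_j
    for group in (U, L):
        if any(a < b or b < a for a, b in combinations(group, 2)):
            found.add("containment")

    for ui, uj in combinations(U, 2):
        # Union: U_i ∪ U_j = L_k
        if (ui | uj) in L:
            found.add("union")
        # Union-Intersection: U_i ∪ U_j = L_s ∩ L_t
        if any((ui | uj) == (ls & lt) for ls, lt in combinations(L, 2)):
            found.add("union-intersection")

    for li, lj in combinations(L, 2):
        # Intersection: L_i ∩ L_j = U_k
        if (li & lj) in U:
            found.add("intersection")
        # Inference-Union: U_s ∪ (L_i ∩ L_j) = L_k
        if any((us | (li & lj)) in L for us in U):
            found.add("inference-union")

    return found


# Hypothetical toy labels, not taken from NCIt
example = lexical_patterns(["red car"], ["big red car", "red car wheel"])
```

With the toy labels, the two lower-bound labels share exactly the words of the upper-bound label, so only the intersection pattern is reported. Splitting on whitespace is an assumption; the presentation does not state how labels are tokenized.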

8 Containment
$U_i \subset U_j$ or $L_i \subset L_j$
[Figure: non-lattice subgraph with lower bounds $L_i$ and $L_j$, where $L_j \subset L_i$]

9 Containment
$U_i \subset U_j$ or $L_i \subset L_j$
[Figure: suggested fix]

10 Union
$U_i \cup U_j = L_k$
[Figure: non-lattice subgraph with upper bounds $U_i$ and $U_j$; word set shown for $L_k$: malignant, testicular, non-seminomatous, germ, cell, tumor]

11 Union
$U_i \cup U_j = L_k$
[Figure: suggested fix]

12 Intersection
$L_i \cap L_j = U_k$
[Figure: non-lattice subgraph with upper bound $U_k$ and lower bounds $L_i$ and $L_j$; $L_i \cap L_j$ = splenic, lymphoblastic, lymphoma]

13 Intersection
$L_i \cap L_j = U_k$
[Figure: suggested fix]

14 Union-Intersection
$U_i \cup U_j = L_s \cap L_t$
[Figure: non-lattice subgraph with upper bounds $U_i$ and $U_j$ and lower bounds $L_s$ and $L_t$; $U_i \cup U_j$ = $L_s \cap L_t$ = localized, adult liver, carcinoma]

15 Union-Intersection
$U_i \cup U_j = L_s \cap L_t$
[Figure: suggested fix]

16 Inference-Union
$U_s \cup (L_i \cap L_j) = L_k$
[Figure: non-lattice subgraph with upper bound $U_s$ and lower bounds $L_i$ and $L_j$; $L_i \cap L_j$ = gallbladder, papillary; $U_s \cup (L_i \cap L_j)$ = gallbladder, papillary, neoplasm = $L_i$]

17 Inference-Union
$U_s \cup (L_i \cap L_j) = L_k$
[Figure: suggested fix]

18 Inference-Contradiction
[Figure: non-lattice subgraph; annotation shown: anaplastic : neoplastic large]

19 Inference-Contradiction
[Figure: suggested fix]

20 Five Patterns!
Union, Union-Intersection, Inference-Union, Inference-Contradiction, Containment

21 Results
In total, 8,143 non-lattice subgraphs were identified
809 of those (about 9.9%) exhibited lexical patterns:
  678 with a single pattern
  131 with multiple patterns

22 Evaluation

23 Evaluation
Single-pattern non-lattice subgraphs: 44%
Multiple-pattern non-lattice subgraphs: 88%
Overall: 66%

24 Conclusion
We investigated a hybrid approach to identifying potential errors in NCIt
Remediations were automatically suggested
An effective way to detect and correct errors
Applicable to other biomedical terminologies

25 Future Work
Investigate larger non-lattice subgraphs for evaluation
Use concept synonyms to complement concept labels
Find new patterns to uncover more errors

26 Acknowledgement
This work was supported by:
National Institutes of Health National Center for Advancing Translational Sciences through grant UL1TR001998
National Science Foundation through grant IIS
I would like to thank Dr. Licong Cui for the guidance

27 Thank you!
Email me at: rashmie.abeysinghe@uky.edu

