Enhancing Text Classifiers to Identify Disease Aspect Information Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan.

1 Enhancing Text Classifiers to Identify Disease Aspect Information Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan

2 Outline
–Research background
–Problem definition
–The proposed approach: IDAI
–Empirical evaluation
–Conclusion
Disease Aspect Classification

3 Research Background

4 Disease Aspect Information (DAI)
An example from MedlinePlus: several passages about three aspects of kidney cancer (treatment, symptom and sign, and etiology). It also contains several passages not related to any aspect.
"You have two kidneys... Kidney cancer forms in the … Risk factors include smoking, having certain genetic conditions and …. Often, kidney cancer doesn't have early symptoms. However, see your health care provider if you notice
–Blood in your urine
–A lump in your abdomen …
–Pain in your side …
Treatment depends on your age, …. It might include surgery, radiation, chemotherapy …"

5 Disease Knowledge Map: An Application of DAI

6 Identification of DAI
[Diagram labels: Healthcare professionals & consumers; Disease info. query & aspect; Medical texts for specific diseases; Disease aspects classifier; Disease aspect information (symptoms, diagnosis, treatment, etiology, prevention); Healthcare decision support system; Disease info.; Cross-disease query; Medical information provider; Verified info.; Aspect info.]

7 Problem Definition

8 Goals
Modeling the identification of DAI as a text classification problem
–Disease aspects are predefined categories of interest, not brief descriptions of information needs
Developing a technique to enhance various kinds of text classifiers
–Given medical texts, the enhanced classifier can better identify those that discuss aspects of diseases

9 Related Work
Text classification (TC)
–Weakness: multi-aspect information in a text introduces noise for text classifiers
Segment extraction for topic detection
–Weakness: designed for specific descriptions (not for categories)
Passage extraction for TC
–Weakness: determining the location and length of the passages relevant to a specific category becomes another TC problem

10 The Proposed Approach: IDAI

11 IDAI: Revising Term Frequency (TF) to Improve Classifiers
[Diagram labels: Categories (aspects); Classifier development; Training; Testing; Underlying text classifier; IDAI; Classification; Training texts; A text (d); Assessing term frequencies (TF); TF of terms w.r.t. each category; Identifying term-category correlation type]

12 Two Strategies for TF Revision
Underlying classifier G (feature sets)
–P: set of positively correlated features, used for accepting relevant texts
–Q: set of negatively correlated features, used for rejecting irrelevant texts
Enhanced classifier G+IDAI (TF revision by IDAI)
–Strategy I: the TF of a feature f is amplified (reduced) if the neighbors of f have the same (a different) correlation type to the category
–Strategy II: the TF of a feature f in Q is reduced if f appears in a text segment that mainly mentions features in P
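The slides name the feature sets P and Q but do not say how the correlation type of a term is decided. As a rough illustration only, the sketch below assumes a simple rule: a term is positively correlated to a category if it occurs in a larger share of that category's training texts than of the other texts, and negatively correlated if the share is smaller. The function name `correlation_sets` and this rule are assumptions, not the method used in IDAI.

```python
from typing import List, Set, Tuple

def correlation_sets(
    docs: List[Tuple[str, List[str]]], category: str
) -> Tuple[Set[str], Set[str]]:
    """Split the vocabulary into P (positive) and Q (negative) for one category.

    ASSUMPTION: correlation type is decided by comparing document-frequency
    rates inside and outside the category. The slides do not specify the test.
    """
    in_docs = [set(tokens) for cat, tokens in docs if cat == category]
    out_docs = [set(tokens) for cat, tokens in docs if cat != category]
    vocab = set().union(*in_docs, *out_docs)

    positive, negative = set(), set()
    for term in vocab:
        rate_in = sum(term in d for d in in_docs) / max(len(in_docs), 1)
        rate_out = sum(term in d for d in out_docs) / max(len(out_docs), 1)
        if rate_in > rate_out:
            positive.add(term)
        elif rate_in < rate_out:
            negative.add(term)
        # Terms with equal rates are left in neither set.
    return positive, negative

docs = [
    ("treatment", ["surgery", "kidney"]),
    ("etiology", ["smoking", "kidney"]),
]
P, Q = correlation_sets(docs, "treatment")
```

In this toy input, "kidney" appears equally often in both categories, so it ends up in neither P nor Q.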

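13 Revised TF
\[
\mathrm{RevisedTF}(t,d,c) =
\begin{cases}
\mathrm{WindowTF}(t,d,c), & \text{if } t \text{ is positively correlated to } c \text{ (Strategy I)} \\
\max_{c' \neq c}\{\mathrm{WindowTF}(t,d,c')\} - \mathrm{InconsistencyTF}(t,d,c), & \text{if } t \text{ is negatively correlated to } c \text{ (Strategy II)}
\end{cases}
\]
\[
\mathrm{WindowTF}(t,d,c) = \sum_{k} \left(0.5 + P_{\mathrm{window},k}\right), \quad \text{for each occurrence of } t \text{ at position } k
\]
where \(P_{\mathrm{window},k}\) is the distance-based sum of the weights of other positively correlated terms in a window at \(k\).
\[
\mathrm{InconsistencyTF}(t,d,c) = \sum_{k} P_{\mathrm{inconsistency},k}, \quad \text{for each occurrence of } t \text{ at position } k
\]
where \(P_{\mathrm{inconsistency},k} = 0.5 \times\) (how much the text segment before \(k\) is dominated by the terms positively correlated to \(c\)).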
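The Revised TF definition above can be sketched in Python as follows. The slide leaves several details open, so this sketch fills them with labeled assumptions: the window size, the 1/distance decay, the per-term weights in `pos_weights`, the segment length, and "dominance" measured as the fraction of preceding tokens that are positively correlated are all hypothetical choices. The fallback to the raw count for uncorrelated terms is also an assumption.

```python
from typing import Dict, List, Set

def window_tf(tokens: List[str], term: str,
              pos_weights: Dict[str, float], window: int = 5) -> float:
    """Sum over occurrences k of `term` of (0.5 + P_window,k)."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        lo, hi = max(0, k - window), min(len(tokens), k + window + 1)
        # ASSUMPTION: weight of a neighbor decays as 1/distance.
        p_window = sum(
            pos_weights[tokens[j]] / abs(j - k)
            for j in range(lo, hi)
            if j != k and tokens[j] != term and tokens[j] in pos_weights
        )
        total += 0.5 + p_window
    return total

def inconsistency_tf(tokens: List[str], term: str,
                     pos_weights: Dict[str, float], segment: int = 10) -> float:
    """Sum over occurrences k of `term` of P_inconsistency,k."""
    total = 0.0
    for k, tok in enumerate(tokens):
        if tok != term:
            continue
        before = tokens[max(0, k - segment):k]
        # ASSUMPTION: dominance = share of preceding tokens that are positive for c.
        dominance = sum(t in pos_weights for t in before) / len(before) if before else 0.0
        total += 0.5 * dominance
    return total

def revised_tf(tokens: List[str], term: str, category: str,
               positive: Dict[str, Dict[str, float]],
               negative: Dict[str, Set[str]]) -> float:
    """Revised TF of `term` in a text with respect to `category`."""
    if term in positive[category]:  # Strategy I
        return window_tf(tokens, term, positive[category])
    if term in negative[category]:  # Strategy II
        best_other = max(
            (window_tf(tokens, term, w) for c, w in positive.items() if c != category),
            default=0.0,
        )
        return best_other - inconsistency_tf(tokens, term, positive[category])
    return float(tokens.count(term))  # ASSUMPTION: uncorrelated terms keep raw TF

tokens = ["surgery", "chemotherapy", "smoking"]
positive = {
    "treatment": {"surgery": 1.0, "chemotherapy": 1.0},
    "etiology": {"smoking": 1.0},
}
negative = {"treatment": {"smoking"}, "etiology": {"surgery", "chemotherapy"}}
tf_surgery = revised_tf(tokens, "surgery", "treatment", positive, negative)
tf_smoking = revised_tf(tokens, "smoking", "treatment", positive, negative)
```

In this toy text, "surgery" is amplified for the treatment category because a positively correlated neighbor sits next to it, while "smoking" is reduced because the segment before it is dominated by treatment terms.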

14 Empirical Evaluation

15 Experimental Data
Top-10 fatal diseases and top-20 cancers in Taiwan
–Total # of diseases: 28
–Source: Web sites of hospitals, healthcare associations, and the department of health in Taiwan
–Disease aspects (categories): 5 aspects (etiology, diagnosis, treatment, prevention, and symptom)
–Splitting the texts into aspects: 4669 texts about individual aspects
–Test data: randomly sampling 10% of the 4669 texts and merging them into test texts of 1 to 5 aspects
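The test-data construction can be illustrated with a short sketch. The slide only states that 10% of the single-aspect texts are sampled and merged into test texts covering 1 to 5 aspects. How texts were grouped is not stated, so the merging rule here (pick a random number of distinct aspects, take one held-out text per aspect, concatenate) and the function name `build_test_texts` are assumptions.

```python
import random
from collections import defaultdict
from typing import FrozenSet, List, Tuple

def build_test_texts(
    texts: List[Tuple[str, str]], fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Tuple[str, str]], List[Tuple[FrozenSet[str], str]]]:
    """Hold out a fraction of (aspect, text) pairs and merge them into
    multi-aspect test texts. Returns (training pairs, merged test texts)."""
    rng = random.Random(seed)
    order = list(range(len(texts)))
    rng.shuffle(order)
    n_test = round(len(texts) * fraction)
    held_out = [texts[i] for i in order[:n_test]]
    training = [texts[i] for i in order[n_test:]]

    pool = defaultdict(list)
    for aspect, text in held_out:
        pool[aspect].append(text)

    merged = []
    while pool:
        # ASSUMPTION: each test text covers 1 to 5 distinct aspects.
        n_aspects = rng.randint(1, min(5, len(pool)))
        chosen = rng.sample(sorted(pool), n_aspects)
        parts = [pool[aspect].pop() for aspect in chosen]
        for aspect in chosen:
            if not pool[aspect]:
                del pool[aspect]
        merged.append((frozenset(chosen), " ".join(parts)))
    return training, merged

aspects = ["etiology", "diagnosis", "treatment", "prevention", "symptom"]
corpus = [(a, f"{a} passage {i}") for i in range(20) for a in aspects]
training, test_texts = build_test_texts(corpus)
```

Each merged test text keeps the set of aspects it contains, which serves as the gold label set for evaluating a multi-aspect classifier.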

16 Underlying Classifiers & Experimental Baselines
Underlying classifier
–The Support Vector Machine (SVM) classifier
Baseline enhancer
–CTFA (Liu, 2010), which employs Strategy I for better TC
–CTFA does not consider Strategy II

17 Results

19 Conclusion

20 Disease knowledge map (Dmap)
–Supporting evidence-based medicine, health education, and healthcare decision support
A key step to build a Dmap: automatic identification of disease aspect information (DAI)
Identification of DAI as a text classification problem
Term proximity as key information to enhance existing classifiers to classify DAI