Published by Nancy Hunt. Modified over 6 years ago.
1
Improving Health Question Classification by Word Location Weights
Rey-Long Liu Dept. of Medical Informatics Tzu Chi University Taiwan
2
Outline
  Background
  Problem definition
  The proposed approach: WLW
  Empirical evaluation
  Conclusion
3
Background
4
Categories of Health Questions
5
Classification of Health Questions
Why health questions?
  Health questions provide health information that is both reliable and readable
Why classification of health questions?
  Given a health question q, retrieve related questions (and their answers)
6
Problem Definition
7
Goal & Motivation
Goal
  Target: Chinese Health Questions (CHQs)
  Contribution: Developing a technique, WLW (Word Location Weight), that estimates the weights of words in a CHQ based on their locations
Motivation
  Location weights can be used by classifiers (e.g., SVM) to improve classification
    Classifying in-space CHQs (cause, diagnosis, process)
    Filtering out-space CHQs (the "whatever" category)
8
Basic Idea
Words that are more related to the category of a CHQ tend to appear at the beginning and the end of the CHQ
Examples:
  如何(how to)克服(deal with)緊張(nervous)的情緒(mood)? (category: process)
  嬰兒(infant)體溫(body temperature)太低(too low)怎麼辦(what to do)? (category: process)
9
Related Work
Recognition of question types (e.g., when, where)
  Weakness: Question types are not the intended categories of CHQs
Classification by parsing
  Weakness I: Parsing Chinese is still challenging
  Weakness II: CHQs are NOT always well-formed
Classification by pattern matching
  Weakness: The string patterns are difficult to construct
10
The Proposed Approach: WLW
11
Main Challenges
(1) Defining the two weights of a location p in a CHQ q
12
Main Challenges (cont.)
(2) Encoding the location weights of a word w into two features for the underlying classifier
13
Interesting Behaviors of WLW
A word w in a question q has two features: Fvalue_front and Fvalue_rear
Applicable to different categories and languages (e.g., English)
When w is far from both the front and the rear
  Both features reduce to the term frequency (TF) of w
  WLW reduces to the traditional feature-encoding approach (using TF as the features)
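The behavior above can be illustrated with a small sketch. The transcript does not reproduce the actual weight formulas, so the decay function below (`location_weight`, with the hypothetical parameters `boost` and `span`) is an assumption chosen only so that an occurrence counts more near an end of the question and counts exactly 1 far from it. It is not the author's definition of WLW.

```python
def location_weight(distance, boost=2.0, span=3):
    """Hypothetical weight of one occurrence (NOT the formula from the talk).

    It equals 1 + boost at the edge of the question and decays linearly
    to 1 once the occurrence is `span` or more words away from that edge.
    """
    if distance >= span:
        return 1.0
    return 1.0 + boost * (span - distance) / span


def wlw_features(tokens, boost=2.0, span=3):
    """Return {word: (front_value, rear_value)} for one segmented question."""
    n = len(tokens)
    features = {}
    for i, word in enumerate(tokens):
        front, rear = features.get(word, (0.0, 0.0))
        front += location_weight(i, boost, span)         # distance from the front
        rear += location_weight(n - 1 - i, boost, span)  # distance from the rear
        features[word] = (front, rear)
    return features


# Tokens follow the glossed units of the first example question.
tokens = ["如何", "克服", "緊張", "的情緒"]
for word, (front, rear) in wlw_features(tokens).items():
    print(word, front, rear)
```

Because every occurrence that is far from both ends contributes exactly 1 to each sum, both values equal the term frequency of the word in that case, which mirrors the reduction to TF described above.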
14
Empirical Evaluation
15
Experimental Design
CHQs were downloaded from a health information provider
  864 in-space CHQs
    cause (category 1): 313
    diagnosis (category 2): 92
    process (category 3): 459
  100 out-space CHQs
    whatever (general description)
Five-fold cross validation
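As a rough illustration of five-fold cross validation over the 864 in-space CHQs, the sketch below partitions item indices into five folds and uses each fold once as the test set. The shuffling, the fixed seed, and the helper name `k_fold_indices` are assumptions; the transcript does not say how the folds were formed or how the out-space CHQs were distributed over them.

```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)       # assumed: random assignment to folds
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


for train, test in k_fold_indices(864):
    print(len(train), len(test))
```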
16
Underlying Classifiers
The Support Vector Machine (SVM) classifier
17
Results: Classification of In-Space CHQs
Evaluation criteria
  Micro-averaged F1 (MicroF1)
  Macro-averaged F1 (MacroF1)
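The two criteria can be sketched as follows, using the standard definitions: MicroF1 pools the true positives, false positives, and false negatives of all categories before computing F1, while MacroF1 averages the per-category F1 scores. The sketch assumes one gold label and one predicted label per question, which is a simplification; the function names are illustrative and not taken from the talk.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(gold, predicted, categories):
    """Return (MicroF1, MacroF1) for single-label predictions."""
    counts = {c: [0, 0, 0] for c in categories}  # per category: tp, fp, fn
    for g, p in zip(gold, predicted):
        for c in categories:
            if p == c and g == c:
                counts[c][0] += 1
            elif p == c:
                counts[c][1] += 1
            elif g == c:
                counts[c][2] += 1
    micro = f1(*(sum(counts[c][i] for c in categories) for i in range(3)))
    macro = sum(f1(*counts[c]) for c in categories) / len(categories)
    return micro, macro


gold = ["cause", "cause", "process", "diagnosis"]
pred = ["cause", "process", "process", "diagnosis"]
print(micro_macro_f1(gold, pred, ["cause", "diagnosis", "process"]))
```

MacroF1 gives each category equal influence, so it is more sensitive to performance on small categories such as diagnosis (92 CHQs), whereas MicroF1 is dominated by the larger categories.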
18
SVM+WLW is significantly better than SVM
19
Results: Filtering of Out-Space CHQs
Evaluation criteria
  Filtering ratio (FR) = (# out-space CHQs successfully rejected by all categories) / (# out-space CHQs)
  Average number of misclassifications (AM) = (# misclassifications for the out-space CHQs) / (# out-space CHQs)
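A minimal sketch of the two filtering criteria follows. It assumes that each out-space CHQ is represented by the set of categories that accepted it, and that every such acceptance counts as one misclassification; this reading of "misclassifications" and the name `filtering_metrics` are assumptions, not details confirmed by the transcript.

```python
def filtering_metrics(accepted_by):
    """Return (FR, AM) for a list of out-space CHQs.

    accepted_by[i] is the set of categories that (wrongly) accepted the
    i-th out-space CHQ; an empty set means every category rejected it.
    """
    n = len(accepted_by)
    if n == 0:
        raise ValueError("at least one out-space CHQ is required")
    rejected_by_all = sum(1 for cats in accepted_by if not cats)
    misclassifications = sum(len(cats) for cats in accepted_by)
    return rejected_by_all / n, misclassifications / n


decisions = [set(), {"cause"}, {"cause", "process"}, set()]
fr, am = filtering_metrics(decisions)
print(fr, am)
```

A higher FR and a lower AM both indicate better filtering: more out-space questions are rejected outright, and those that slip through are accepted by fewer categories.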
20
SVM+WLW achieves higher FR and lower AM
21
Conclusion
22
Healthcare consumers often read health information on the Internet
Health questions are valuable resources for healthcare consumers
  Providing health information that is both reliable and readable
Classification of health questions is the basis for the retrieval of related questions
  cause, diagnosis, process, whatever
WLW can help SVM to improve the classification of CHQs