Extracting Why Text Segment from Web Based on Grammar-gram


1 Extracting Why Text Segment from Web Based on Grammar-gram
Iulia Nagy, Master student,

2 Summary
Introduction
Related work: Rule Based Methods; Machine Learning Approach; “Bag of Function Words” method
“Bag of Function Words” method: Method outline; Adaptation of “Bag of Function Words” to English
Experiments and Evaluation
Conclusion and Remarks

3 Problem: the tremendous growth of the Internet makes information hard to find.

4 Solution: create a QA system, i.e. a system capable of giving an exact answer to an exact question by detecting the answer in arbitrary corpora.
Purpose: obtain viable information rapidly.

5 Purpose of our research
Goal: create a why-QA system with an automatically built classifier.
Classifier: use a model presented in the Japanese literature, built by machine learning with a Bag of Grammar approach.
Purpose of this paper: adapt the Japanese method to English and test its effectiveness on English.

6 Related work: two main trends
Rule based methods: preprocess the text, detect patterns, create a set of rules, apply the rules to identify why-answers in the text.
Machine learning methods: preprocess the text, identify and extract relevant features, create a classification scheme, classify.
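The last step of the rule-based pipeline, applying rules to a text segment, can be sketched in Python as a handful of lexical cue patterns. The patterns in `CAUSAL_RULES` are hypothetical illustrations, not rules taken from any of the cited systems, which derive their rules from patterns detected in preprocessed text.

```python
import re

# Hypothetical causal cue patterns, chosen only for illustration.
CAUSAL_RULES = [
    re.compile(r"\bbecause\b", re.IGNORECASE),
    re.compile(r"\bdue to\b", re.IGNORECASE),
    re.compile(r"\bas a result of\b", re.IGNORECASE),
    re.compile(r"\btherefore\b", re.IGNORECASE),
]


def is_why_answer(segment: str) -> bool:
    """Label a segment as a why-answer if any rule matches it."""
    return any(rule.search(segment) for rule in CAUSAL_RULES)


print(is_why_answer("The bridge closed because of the storm."))       # True
print(is_why_answer("A bridge is a structure spanning an obstacle."))  # False
```

Every new cue must be written by hand, which is where the labour cost of rule-based approaches comes from.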

7 Rule based methods in why-QA: Suzan Verberne’s approach
Method: improve performance by re-ranking; the score assigned to a QA pair by QAP is weighted with a number of syntactic features.
Pros: shows the importance of syntax; effective.
Cons: hardly adaptable to other languages; requires deep grammar knowledge; labour intensive.
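One simple reading of this re-ranking step is a linear combination: each candidate's QAP score is added to a weighted sum of its syntactic feature values, and the candidates are re-sorted. The sketch below assumes that reading; the feature names and weights in `WEIGHTS` are hypothetical, since the slide does not describe Verberne's actual features or weighting.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    text: str
    qap_score: float  # score assigned to the QA pair by the retrieval step
    features: dict = field(default_factory=dict)  # syntactic feature -> value


# Hypothetical syntactic features and weights (would be tuned on training data).
WEIGHTS = {"causal_subordinate_clause": 0.8, "subject_matches_question": 0.5}


def combined_score(c: Candidate, qap_weight: float = 1.0) -> float:
    """QAP score plus the weighted sum of the candidate's syntactic features."""
    syntactic = sum(WEIGHTS.get(name, 0.0) * value for name, value in c.features.items())
    return qap_weight * c.qap_score + syntactic


def rerank(candidates, qap_weight: float = 1.0):
    """Return candidates sorted by their combined score, best first."""
    return sorted(candidates, key=lambda c: combined_score(c, qap_weight), reverse=True)


cands = [
    Candidate("The dam failed.", 1.2),
    Candidate("The dam failed because the spillway was blocked.", 1.0,
              {"causal_subordinate_clause": 1.0}),
]
for c in rerank(cands):
    print(round(combined_score(c), 2), c.text)
```

Here the causal candidate overtakes the one with the higher raw QAP score because of its syntactic feature.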

8 Machine learning methods: Higashinaka and Isozaki’s approach
Method: acquire causal expressions from the Japanese EDR dictionary and train a ranker on clause structures extracted from EDR.
Pros: partially automated; effective.
Cons: hardly adaptable to other languages; not fully automated, since it relies on EDR; EDR is rather expensive.

9 Machine learning methods: Tanaka’s approach
Method: Bag of Function Words, i.e. build a why-classifier with function words as features.
Pros: adaptable to different languages; domain independent; scalable; effective; fully automated.

10 Bag of Function Words: function words
A machine learning approach that automatically builds a domain-independent why-classifier based on function words.
Conditions for domain independence: convergence and reasonable size of the feature space; generality of the features in the feature space; ability of the features to discriminate causality.
Class fulfilling these conditions: function words.

11 Bag of Function Words: method (same baseline for Japanese and English)
Input: text segments $Ts_1, Ts_2, \dots, Ts_n$.
Pipeline: label all words with a POS tagger → determine the POS of the function words → extract the function words → create the feature space → create the feature vectors $Fv_1, Fv_2, \dots, Fv_n$ by mapping with tf-idf on the function words → train → classify.
Example feature space: for, because, at, after, in, under, which, that, why, to, therefore.
Vector format: $(x_i, y_i)$, with $y_i \in \{\mathrm{true}, \mathrm{false}\}$.
Classification scheme: trainer LogitBoost with weak learners.
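The trainer in this pipeline can be sketched as the standard two-class LogitBoost algorithm with regression stumps as weak learners. This is a from-scratch illustration, not Weka's implementation: it assumes dense numeric feature vectors and boolean labels, and the clipping constant `Z_MAX` and the weight floor are choices made for this sketch. Each round fits a stump to the working responses $z_i$ by weighted least squares with weights $p_i(1-p_i)$ and adds half of its output to the additive score $F(x)$.

```python
import math

Z_MAX = 3.0  # cap on the working response (choice made for this sketch)


def _prob(f):
    """p = 1 / (1 + exp(-2F)), computed without overflow."""
    if f >= 0:
        return 1.0 / (1.0 + math.exp(-2.0 * f))
    e = math.exp(2.0 * f)
    return e / (1.0 + e)


def _working_response(label, p):
    """z = (y* - p) / (p(1 - p)) = 1/p or -1/(1 - p), clipped to [-Z_MAX, Z_MAX]."""
    if label:
        return Z_MAX if p * Z_MAX <= 1.0 else 1.0 / p
    return -Z_MAX if (1.0 - p) * Z_MAX <= 1.0 else -1.0 / (1.0 - p)


def _wmean(idx, z, w):
    sw = sum(w[i] for i in idx)
    return sum(w[i] * z[i] for i in idx) / sw if sw > 0 else 0.0


def fit_stump(X, z, w):
    """Weighted least-squares regression stump: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2.0  # split halfway between consecutive distinct values
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            lv, rv = _wmean(left, z, w), _wmean(right, z, w)
            err = (sum(w[i] * (z[i] - lv) ** 2 for i in left)
                   + sum(w[i] * (z[i] - rv) ** 2 for i in right))
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant stump
        m = _wmean(range(len(X)), z, w)
        return (0, math.inf, m, m)
    return best[1:]


def _stump_value(stump, row):
    j, t, lv, rv = stump
    return lv if row[j] <= t else rv


def logitboost_fit(X, y, iterations=50):
    """Two-class LogitBoost; y holds True (why text segment) or False."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(iterations):
        p = [_prob(f) for f in F]
        z = [_working_response(label, pi) for label, pi in zip(y, p)]
        w = [max(pi * (1.0 - pi), 1e-10) for pi in p]  # floor avoids zero weights
        stump = fit_stump(X, z, w)
        stumps.append(stump)
        F = [f + 0.5 * _stump_value(stump, row) for f, row in zip(F, X)]
    return stumps


def logitboost_predict(stumps, row):
    """Predict True when the additive score F(x) is positive."""
    return sum(0.5 * _stump_value(s, row) for s in stumps) > 0


X = [[0.0, 1.0], [0.1, 0.8], [0.9, 0.0], [1.0, 0.1]]  # toy feature vectors
y = [False, False, True, True]
model = logitboost_fit(X, y, iterations=10)
print(logitboost_predict(model, [0.8, 0.2]))
```

The number of boosting rounds corresponds to the "iterations" varied in the experiment.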

12 Adaptation to English: differences and adjustments
Japanese: forms phrases by adding new words at the end of the phrase; uses particles to define the syntactic roles in a phrase.
English: forms phrases by adding new words at the beginning of the phrase; a word does not necessarily belong to a single grammatical category.
Adjustment: identify the eligible function words in English.
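Because an English word can belong to several grammatical categories, one way to identify eligible function words is to key each feature on the word together with its POS tag. The sketch below assumes Penn Treebank tags and a hand-picked set of closed-class tags in `FUNCTION_TAGS`; the slides do not state which tagger, tagset or tag selection the authors used, and the example sentence is tagged by hand rather than by a tagger.

```python
# Assumed closed-class Penn Treebank tags treated as function words
# (a choice made for this sketch, not the authors' documented selection).
FUNCTION_TAGS = {"IN", "TO", "CC", "DT", "WDT", "WP", "WRB"}


def function_word_features(tagged_tokens):
    """Keep closed-class tokens as 'word/TAG', so each grammatical use of an
    ambiguous word (e.g. 'that') becomes a separate feature."""
    return [f"{word.lower()}/{tag}" for word, tag in tagged_tokens if tag in FUNCTION_TAGS]


# Hand-tagged example sentence (tags supplied manually, not by a tagger).
tagged = [("He", "PRP"), ("said", "VBD"), ("that", "IN"), ("the", "DT"),
          ("road", "NN"), ("that", "WDT"), ("floods", "VBZ"), ("closed", "VBD"),
          ("because", "IN"), ("of", "IN"), ("rain", "NN")]
print(function_word_features(tagged))
```

Here the complementizer *that* (IN) and the relative pronoun *that* (WDT) end up as two distinct features.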

13 Experiment
Dataset: 432 text segments (216 why-answers and 216 definitions).
Data processing: label all words with POS tags and extract the function words; calculate tf-idf for each function word; map the features from the feature set into feature vectors.
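The tf-idf mapping step can be sketched as follows. The slides do not specify the tf-idf variant; this sketch assumes raw counts for tf and $\mathrm{idf}(w) = \ln(N / \mathrm{df}(w))$, where $N$ is the number of segments and $\mathrm{df}(w)$ the number of segments containing function word $w$. Each vector is then paired with its label to form $(x_i, y_i)$.

```python
import math
from collections import Counter


def tfidf_vectors(segments, feature_space):
    """Map each segment (a list of its function words) to a tf-idf vector
    ordered like feature_space: tf = raw count, idf = ln(N / df)."""
    n = len(segments)
    counts = [Counter(seg) for seg in segments]
    df = {f: sum(1 for c in counts if c[f] > 0) for f in feature_space}
    idf = {f: math.log(n / df[f]) if df[f] else 0.0 for f in feature_space}
    return [[c[f] * idf[f] for f in feature_space] for c in counts]


feature_space = ["because", "of", "the", "which"]
segments = [["because", "of", "the"], ["the", "of"], ["which", "the"]]
labels = [True, False, False]  # y_i: is the segment a why text segment?
data = list(zip(tfidf_vectors(segments, feature_space), labels))  # pairs (x_i, y_i)
for x, y in data:
    print([round(v, 3) for v in x], y)
```

Under this variant a word such as *the*, which occurs in every segment, gets zero weight, so only function words that differ across segments contribute.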

14 Experiment: classifier and evaluation
Classifier: LogitBoost (Weka) with decision stumps as weak learners; 5 classifiers were created (50, 100, 150, 200 and 250 iterations).
Evaluation: 10-fold cross-validation, with models trained on 9 folds and tested on the remaining one; precision, recall and F-measure were measured.
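The evaluation protocol (10 folds, train on 9, test on 1, report precision, recall and F-measure per class) can be sketched as below. Stratifying the folds by class and pooling the predictions of all folds before computing the metrics are assumptions of this sketch; `train_fn` and `predict_fn` are hypothetical hooks into which a boosted classifier would be plugged.

```python
import random


def stratified_folds(labels, k=10, seed=1):
    """Assign each instance to one of k folds, keeping class ratios roughly equal."""
    rng = random.Random(seed)
    fold_of = [0] * len(labels)
    for cls in set(labels):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % k
    return fold_of


def precision_recall_f(gold, pred, positive=True):
    """Precision, recall and F-measure for the given positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def cross_validate(X, y, train_fn, predict_fn, k=10):
    """Train on k-1 folds, test on the held-out fold, then score pooled predictions."""
    fold_of = stratified_folds(y, k)
    pred = [None] * len(y)
    for fold in range(k):
        train = [i for i in range(len(y)) if fold_of[i] != fold]
        model = train_fn([X[i] for i in train], [y[i] for i in train])
        for i in range(len(y)):
            if fold_of[i] == fold:
                pred[i] = predict_fn(model, X[i])
    return {"WTS": precision_recall_f(y, pred, True),
            "NWTS": precision_recall_f(y, pred, False)}


# Toy run with a fixed threshold rule standing in for a trained classifier.
X = [[i / 20] for i in range(20)]
y = [row[0] > 0.5 for row in X]
scores = cross_validate(X, y, train_fn=lambda X_tr, y_tr: None,
                        predict_fn=lambda model, row: row[0] > 0.5)
print(scores)
```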

15 Results – why text segments (WTS)
[Chart: results plotted against the number of iterations]

16 Results – non-why text segments (NWTS)
[Chart: results plotted against the number of iterations]

17 Conclusion: the method is effective on English
Results: 321 of 432 instances correctly classified; 76.1% precision and 70.6% recall on why text segments (WTS); 72.6% precision and 77.9% recall on non-why text segments (NWTS).
[Chart: results by type of text segment]
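The reported figures can be cross-checked with a few lines of arithmetic. The F-measure values here are computed from the rounded precision and recall on this slide, so they may differ slightly from the values Weka reported in the charts.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)


accuracy = 321 / 432
print(f"accuracy = {accuracy:.1%}")                    # accuracy = 74.3%
print(f"F(WTS)   = {f_measure(0.761, 0.706):.3f}")    # F(WTS)   = 0.732
print(f"F(NWTS)  = {f_measure(0.726, 0.779):.3f}")    # F(NWTS)  = 0.752
```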

18 Future work
Experiment with a larger dataset (> 5000), extracted from the Yahoo! Answers database, in order to identify the optimal number of iterations and to make a better selection of the function words to be used.
Include causative constructions in the analysis: English often expresses cause through a closed set of verbs or nouns.
Interest: increase the accuracy of the classifier.

19 Thank you for your attention!
Questions and remarks

