Extracting Why Text Segment from Web Based on Grammar-gram

Iulia Nagy, Master student,

Summary
Introduction; Related work (Rule Based Methods, Machine Learning Approach); “Bag of Function Words” method (Method outline, Adaptation of “Bag of Function Words” to English); Experiments and Evaluation; Conclusion and Remarks

Problem
The tremendous growth of the Internet makes information hard to find.

Solution
Create a QA system: a system capable of giving an exact answer to an exact question and of detecting the answer in arbitrary corpora. Purpose: obtain viable information rapidly.

Purpose of our research
Create a why-QA system with an automatically built classifier. Classifier: uses a model presented in the Japanese literature, created using machine learning based on the Bag of Grammar approach. Purpose of this paper: adapt the Japanese method to English and test the effectiveness of the method on English.

Related work
Two main trends. Rule based methods: preprocess text; detect patterns; create a set of rules; apply the rules to identify the why-answer in the text. Machine learning methods: preprocess text; identify and extract relevant features; create a classification scheme; classify.

Rule based in why-QA
Suzan Verberne’s Approach: improve performance by re-ranking. Method: weight the score assigned to a QA-pair by QAP with a number of syntactic features. (+) Importance of syntax; effective. (-) Hardly adaptable to various languages; requires deep grammar knowledge; labour intensive.

Machine Learning method
Higashinaka and Isozaki’s Approach: acquire causal expressions from the Japanese EDR dictionary. Method: train a ranker based on clause structures extracted from EDR. (+) Partially automated; effective. (-) Hardly adaptable to various languages; not fully automated (based on EDR); EDR rather high priced.

Machine Learning method
Tanaka’s Approach: build a why-classifier with function words as features. Method: Bag of function words. Adaptable to different languages; domain independent; scalable; effective; fully automated.

Bag of function words
A machine learning approach to automatically build a domain independent why-classifier based on function words. Conditions to obtain domain-independence: convergence and reasonable size of the feature space; generality of the features in the feature space; ability of the features to discriminate causality. Class fulfilling these conditions: function words.

Bag of function words
Method – same baseline for Japanese and English
Input: text segments Ts 1, Ts 2, …, Ts n. Extract function words: label all words with a POS tagger; determine the POS for function words. Create feature space, for example: for, because, at, after, in, under, which, that, why, to, therefore. Create feature vectors Fv 1, Fv 2, …, Fv n: mapping using “tf-idf” on function words. Vectors' format: $(x_i, y_i)$, $y_i \in \{\text{true}, \text{false}\}$. Trainer: LogitBoost with weak learners, producing the classification scheme. Classify.
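The mapping step above can be sketched as follows. This is a minimal illustration, assuming a fixed vocabulary made of the example function words from the slide, a naive whitespace tokenizer, and the plain `tf * log(N / df)` weighting; the names `FUNCTION_WORDS`, `tokenize` and `tfidf_vectors` are hypothetical, and the exact tf-idf variant used in the original work is not stated in the slides.

```python
import math
from collections import Counter

# Hypothetical vocabulary: the example function words shown on the slide.
FUNCTION_WORDS = ["for", "because", "at", "after", "in", "under",
                  "which", "that", "why", "to", "therefore"]


def tokenize(segment):
    # Naive tokenizer (assumption); the slides use a POS tagger for this step.
    return [w.strip(".,;:!?\"'()").lower() for w in segment.split()]


def tfidf_vectors(segments, vocabulary=FUNCTION_WORDS):
    """Return one tf-idf vector per text segment, restricted to the vocabulary."""
    token_lists = [tokenize(s) for s in segments]
    n_segments = len(token_lists)
    # Document frequency: number of segments that contain each function word.
    df = {w: sum(1 for toks in token_lists if w in toks) for w in vocabulary}
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        total = len(toks) or 1
        vector = []
        for w in vocabulary:
            tf = counts[w] / total
            idf = math.log(n_segments / df[w]) if df[w] else 0.0
            vector.append(tf * idf)
        vectors.append(vector)
    return vectors


segments = [
    "The road was closed because the bridge collapsed after the storm.",
    "A bridge is a structure that spans a river.",
]
labels = [True, False]  # y_i: is the segment a why-answer?
training_pairs = list(zip(tfidf_vectors(segments), labels))
```

Each element of `training_pairs` has the (x_i, y_i) shape described on the slide and could be passed to any boosting trainer; the trainer itself is not sketched here.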

Adaptation to English
Differences. Japanese: forms phrases by adding new words at the end of the phrase; uses particles to define syntactic roles in a phrase. English: forms phrases by adding new words at the beginning of the phrase; words do not belong to a single grammatical category. Adjustments: identify eligible function words in English.

Experiment
Dataset: 432 text segments (216 why answers, 216 definitions). Data processing: label all words with POS and extract function words; calculate tf-idf for each function word; map features from the feature set into feature vectors.
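The extraction step can be illustrated with a small sketch. It assumes the tokens have already been POS-tagged with Penn Treebank tags, and it assumes a particular set of tags counts as "function word" categories; the slides do not list the tags that were actually selected, so `FUNCTION_POS` and `extract_function_words` are hypothetical.

```python
# Assumed set of Penn Treebank tags treated as function-word categories.
FUNCTION_POS = {"IN", "TO", "CC", "DT", "WRB", "WDT", "WP", "MD", "PRP", "RP"}


def extract_function_words(tagged_tokens, function_pos=FUNCTION_POS):
    """Keep only the words whose POS tag is in the function-word tag set."""
    return [word.lower() for word, tag in tagged_tokens if tag in function_pos]


# Hand-tagged example segment (the tags are supplied by hand, not by a tagger).
tagged = [("The", "DT"), ("road", "NN"), ("was", "VBD"), ("closed", "VBN"),
          ("because", "IN"), ("of", "IN"), ("the", "DT"), ("storm", "NN"),
          (".", ".")]
function_words = extract_function_words(tagged)
print(function_words)  # ['the', 'because', 'of', 'the']
```

A tag-based filter like this misses words such as "therefore", which Penn Treebank tags as an adverb (RB), one reason the choice of eligible English function words needs adjustment.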

Experiment
Classifier: used LogitBoost (Weka) with decision stumps; created 5 classifiers (50, 100, 150, 200, 250 iterations). Evaluation: 10-fold cross validation, with models trained on 9 folds and tested on 1; measured precision, recall and F-measure.
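The evaluation protocol can be sketched in plain Python. This is only an illustration of 10-fold splitting and of the three measures; it uses a simple interleaved, non-stratified split and does not reproduce Weka's own fold assignment. The helper names `k_fold_indices` and `precision_recall_f` are hypothetical.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train, test) index lists; fold i tests on items i, i+k, i+2k, ..."""
    indices = list(range(n_items))
    for fold in range(k):
        test = indices[fold::k]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


def precision_recall_f(gold, predicted, positive=True):
    """Precision, recall and F-measure for one class."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# 432 text segments, as in the experiment: each model trains on 9 folds, tests on 1.
folds = list(k_fold_indices(432, k=10))
```

Calling `precision_recall_f` once with `positive=True` and once with `positive=False` gives the per-class figures for why and non-why text segments.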

Results – why text segments (WTS)
[Chart; x-axis: number of iterations]

Results – non why text segments (NWTS)
[Chart; x-axis: number of iterations]

Conclusion
Results: 321 instances out of 432 correctly classified. By type of TS: 76.1% precision and 70.6% recall on WTS; 72.6% precision and 77.9% recall on NWTS. Method effective on English.
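As a quick check on these figures, overall accuracy and the per-class F-measure can be derived from the reported numbers. The F-measure values below are computed here with the standard harmonic-mean formula; they are not quoted from the slides.

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall (balanced F-measure).
    return 2 * precision * recall / (precision + recall)


accuracy = 321 / 432                # correctly classified / all instances
f_wts = f_measure(0.761, 0.706)     # why text segments
f_nwts = f_measure(0.726, 0.779)    # non why text segments

print(f"accuracy={accuracy:.3f} F(WTS)={f_wts:.3f} F(NWTS)={f_nwts:.3f}")
# accuracy=0.743 F(WTS)=0.732 F(NWTS)=0.752
```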

Future works
Experiment with an increased dataset (> 5000): use the Yahoo!Answers database to extract the dataset. Interest: to identify the optimal number of iterations; to make a better selection of the function words to be used. Include causative constructions in the analysis: English often expresses cause by a closed set of verbs or nouns; this should increase the accuracy of the classifier.

Questions and remarks
Thank you for your attention!
