Extracting Why Text Segment from Web Based on Grammar-gram


Extracting Why Text Segment from Web Based on Grammar-gram Iulia Nagy, Master student, 2010-02-27

Summary: Introduction; Related work (rule-based methods, machine learning approach); “Bag of Function Words” method (method outline, adaptation of “Bag of Function Words” to English); Experiments and evaluation; Conclusion and remarks

Problem: the tremendous growth of the Internet makes information hard to find.

Solution: create a QA system, that is, a system capable of giving an exact answer to an exact question and of detecting the answer in arbitrary corpora. Purpose: obtain viable information rapidly.

Purpose of our research: create a why-QA system with an automatically built classifier. Classifier: uses a model presented in the Japanese literature, created using machine learning and based on the Bag of Grammar approach. Purpose of this paper: adapt the Japanese method to English and test the effectiveness of the method on English.

Related work. Two main trends: rule-based methods and machine learning methods. Rule-based methods: preprocess text; detect patterns; create a set of rules; apply the rules to identify the why-answer in the text. Machine learning methods: preprocess text; identify and extract relevant features; create a classification scheme; classify.

Rule-based methods in why-QA: Suzan Verberne’s approach. Improve performance by re-ranking. Method: weight the score assigned to a QA-pair by QAP with a number of syntactic features. + Effective; shows the importance of syntax. - Hardly adaptable to various languages; requires deep grammar knowledge; labour intensive.

Machine learning method: Higashinaka and Isozaki’s approach. Acquire causal expressions from the Japanese EDR dictionary. Method: train a ranker based on clause structures extracted from EDR. + Partially automated; effective. - Hardly adaptable to various languages; not fully automated (based on EDR); EDR rather high priced.

Machine learning method: Tanaka’s approach. Build a why-classifier with function words as features. Method: bag of function words. + Adaptable to different languages; domain independent; scalable; effective; fully automated.

Bag of function words: a machine learning approach to automatically build a domain-independent why-classifier based on function words. Conditions to obtain domain independence: convergence and reasonable size of the feature space; generality of the features in the feature space; ability of the features to discriminate causality. Class fulfilling these conditions: function words.

Bag of function words: method (same baseline for Japanese and English). Input: text segments Ts 1, Ts 2, …, Ts n. Create feature space: label all words with a POS tagger; determine the POS for function words; extract function words (e.g. for, because, at, after, in, under, which, that, why, to, therefore). Create feature vectors Fv 1, Fv 2, …, Fv n: mapping using tf-idf on function words; vectors' format: $(x_i, y_i)$ with $y_i \in \{\mathrm{true}, \mathrm{false}\}$. Classify: classification scheme; trainer: LogitBoost with weak learners.
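The feature-space step above (tag, pick the function-word POS classes, extract) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input is assumed to be already tagged with Penn Treebank tags, and the set of tags treated as function-word classes (`FUNCTION_POS`) is an assumption, since the slides do not list the tags that were used.

```python
# Assumed closed-class Penn Treebank tags; the slides do not name the tags used.
FUNCTION_POS = {"IN", "TO", "CC", "DT", "MD", "WDT", "WP", "WRB", "PRP", "RP", "EX"}


def extract_function_words(tagged_segment):
    """Keep the lower-cased words whose POS tag is in the function-word set."""
    return [word.lower() for word, tag in tagged_segment if tag in FUNCTION_POS]


# Hypothetical pre-tagged text segment (the output of any POS tagger).
segment = [
    ("The", "DT"), ("match", "NN"), ("was", "VBD"), ("cancelled", "VBN"),
    ("because", "IN"), ("of", "IN"), ("the", "DT"), ("rain", "NN"), (".", "."),
]

function_words = extract_function_words(segment)
print(function_words)  # ['the', 'because', 'of', 'the']
```

A filter based only on tags would probably need a supplementary word list for adverbs such as "therefore", which appears among the slide's example function words.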

Adaptation to English. Differences: Japanese forms phrases by adding new words at the end of the phrase and uses particles to define syntactic roles in a phrase; English forms phrases by adding new words at the beginning of the phrase, and its words do not belong to a single grammatical category. Adjustment: identify the eligible function words in English.

Experiment. Dataset: 432 text segments (216 why-answers, 216 definitions). Data processing: label all words with POS and extract function words; calculate tf-idf for each function word; map features from the feature set into feature vectors.
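As a rough illustration of the tf-idf mapping step, the sketch below turns lists of function words into labelled feature vectors. The slides do not say which tf-idf variant was used, so the formula here (relative term frequency times log(N/df)) is an assumption, and the vocabulary, segments, and labels are made-up toy data.

```python
import math
from collections import Counter


def tfidf_vectors(segments, vocabulary):
    """Map each segment (a list of function words) to a tf-idf vector.

    tf  = occurrences of the word in the segment / segment length
    idf = log(N / df), df being the number of segments containing the word
    """
    n = len(segments)
    vocab_set = set(vocabulary)
    df = Counter()
    for seg in segments:
        df.update(set(seg) & vocab_set)
    vectors = []
    for seg in segments:
        counts = Counter(seg)
        total = len(seg) or 1  # avoid division by zero on empty segments
        vectors.append([
            (counts[w] / total) * math.log(n / df[w]) if df[w] else 0.0
            for w in vocabulary
        ])
    return vectors


# Toy data: function words already extracted from two text segments.
vocabulary = ["because", "for", "in", "that", "to", "which"]
segments = [
    ["because", "in", "that"],  # from a why-answer
    ["in", "which", "to"],      # from a definition
]
labels = [True, False]

# Pairs (x_i, y_i), with y_i true for why text segments.
dataset = list(zip(tfidf_vectors(segments, vocabulary), labels))
```

A word that occurs in every segment ("in" here) gets weight 0 under this variant, which is one reason a real pipeline might prefer a smoothed idf.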

Experiment. Classifier: LogitBoost (Weka) with decision stumps; 5 classifiers created (50, 100, 150, 200, 250 iterations). Evaluation: 10-fold cross-validation (models trained on 9 folds and tested on 1); measured precision, recall and F-measure.
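To make the classifier concrete, here is a small sketch of two-class LogitBoost with regression stumps as weak learners, following the textbook formulation (weighted least-squares fit to a working response). It is not Weka's implementation; the clamping constants and the toy data are assumptions chosen for numerical safety and illustration.

```python
import math


def fit_stump(X, z, w):
    """Weighted least-squares regression stump: (feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            sides = ([], [])
            for row, zi, wi in zip(X, z, w):
                sides[1 if row[j] > t else 0].append((zi, wi))
            means = [sum(zi * wi for zi, wi in s) / sum(wi for _, wi in s)
                     for s in sides]
            err = sum(wi * (zi - m) ** 2
                      for s, m in zip(sides, means) for zi, wi in s)
            if best is None or err < best[0]:
                best = (err, j, t, means[0], means[1])
    return best[1:] if best else None


def stump_predict(stump, row):
    j, t, left, right = stump
    return right if row[j] > t else left


def logitboost(X, y, iterations=50):
    """Return the list of stumps fitted by two-class LogitBoost."""
    F = [0.0] * len(X)
    stumps = []
    for _ in range(iterations):
        # Clamp F so that exp() cannot overflow (assumed safety bound).
        p = [1 / (1 + math.exp(-2 * max(-30.0, min(30.0, f)))) for f in F]
        w = [max(pi * (1 - pi), 1e-10) for pi in p]
        # Working response, clipped to [-4, 4] to limit the effect of outliers.
        z = [max(-4.0, min(4.0, ((1.0 if yi else 0.0) - pi) / wi))
             for yi, pi, wi in zip(y, p, w)]
        stump = fit_stump(X, z, w)
        if stump is None:  # no feature can be split
            break
        stumps.append(stump)
        F = [f + 0.5 * stump_predict(stump, row) for f, row in zip(F, X)]
    return stumps


def classify(stumps, row):
    return sum(0.5 * stump_predict(s, row) for s in stumps) > 0


# Toy feature vectors (e.g. tf-idf weights of two function words) and labels.
X = [[0.1, 0.0], [0.2, 0.3], [0.0, 0.1], [0.9, 0.2], [0.8, 0.0], [0.7, 0.3]]
y = [False, False, False, True, True, True]
stumps = logitboost(X, y, iterations=50)
```

The `iterations` parameter plays the role of the 50 to 250 boosting iterations compared in the experiment; each iteration adds one stump to the additive model.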

Results – why text segments (WTS) [chart; axis: number of iterations]

Results – non-why text segments (NWTS) [chart; axis: number of iterations]

Conclusion. Results: 321 instances out of 432 correctly classified; 76.1% precision and 70.6% recall on WTS; 72.6% precision and 77.9% recall on NWTS [chart; axis: type of TS]. The method is effective on English.
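For reference, the reported figures can be combined as below. The accuracy and F-measure values are derived here from the numbers on the slide, not read from the original charts, and the F-measure is assumed to be the balanced F1 (harmonic mean of precision and recall).

```python
def f_measure(precision, recall):
    """Balanced F-measure (F1): harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


accuracy = 321 / 432              # correctly classified / all instances
f_wts = f_measure(0.761, 0.706)   # why text segments
f_nwts = f_measure(0.726, 0.779)  # non-why text segments

print(f"accuracy={accuracy:.3f}  F(WTS)={f_wts:.3f}  F(NWTS)={f_nwts:.3f}")
# accuracy=0.743  F(WTS)=0.732  F(NWTS)=0.752
```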

Future work. Experiment with an increased dataset (> 5000), using the Yahoo! Answers database to extract the dataset. Interest: to identify the optimal number of iterations and to make a better selection of the function words to be used. Include causative constructions in the analysis, since English often expresses cause by a closed set of verbs or nouns. Interest: increase the accuracy of the classifier.

Thank you for your attention! Questions and remarks.