Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 A semantic approach for question classification using.




Presentation transcript:

Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
A semantic approach for question classification using WordNet and Wikipedia
Presenter: Cheng-Hui Chen
Authors: Santosh Kumar Ray, Shailendra Singh, B.P. Joshi
PRL, 2010

Outline
• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation
• The question classification module of a Question Answering System plays a very important role.
• Web pages retrieved by search engines often do not provide precise information and may contain irrelevant information even in top-ranked results.
• Moldovan et al. (2003) showed that 36.4% of the errors were due to incorrect question classification.

Objectives
• Propose a question classification method that exploits the semantic features of WordNet and the vast knowledge repository of Wikipedia to describe informative terms explicitly.
• Provide answers to user queries in succinct form.

Methodology
• A question classification algorithm that classifies questions using WordNet and Wikipedia.
• Details
─ Question database collection
─ Identification of question patterns
─ Question classification algorithm

Question database collection
• The question database consists of 5500 training and 500 test questions collected from English questions published by USC.
• All questions in the dataset have been manually labeled by Li and Roth according to coarse and fine-grained categories.
Coarse class: Fine classes
─ ABBREVIATION: abbreviation, expression abbreviated
─ ENTITY: animal, body, color, creative, currency, diseases and medical, event, food, instrument, lang, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word
─ DESCRIPTION: definition, description, manner, reason
─ HUMAN: group, ind, title, description
─ LOCATION: city, country, mountain, other, state
─ NUMERIC: code, count, date, distance, money, order, other, period, percent, speed, temp, size, weight

Identification of question patterns
• Functional Word Questions
─ (1) All non-Wh questions (except How). (2) They start with non-significant verb phrases.
─ Example: I don't know the man.
• When Questions
─ (1) They start with the “When” keyword and relate to a year or a day with month. (2) The general pattern is “When (do|does|did|AUX) NP VP X”.
─ Example: When did you write that book?
• Where Questions
─ (1) They start with the “Where” keyword and relate to a location.
─ Example: Where is my dog?
• Which Questions
─ (1) The general pattern is “Which NP X?”.
─ Example: Which company manufactures video-game hardware?
• Who/Whose/Whom Questions
─ (1) These questions generally ask about an individual or an organization.
─ Example: Who is Mary?

Identification of question patterns
• Why Questions
─ (1) These questions ask for certain reasons or explanations.
─ Example: Why do heavier objects travel downhill faster?
• How Questions
─ (1) The general pattern is “How [do|does|did|AUX] NP VP X?”. (1.1) The answer type is a description of some process.
─ (2) The pattern “How [big|fast|long|many|much|far|…] X?”. (2.1) This pattern returns some number as the answer.
─ Examples: (1) How do you know? (2) How long are you living in?
• What Questions
─ They can ask for virtually anything.
─ Example: What is your name?
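
The question types on these two slides can be approximated with keyword-anchored regular expressions. The sketch below is only an illustration under that assumption: the names PATTERNS and question_type are hypothetical, and it checks just the leading keyword rather than the full “NP VP X” structure, which would need a syntactic parser.

```python
import re

# Hypothetical pattern table (not from the slides). Order matters: the first
# match wins, so the quantity form of "How" is tested before the general one.
PATTERNS = [
    ("when", re.compile(r"^when\b", re.I)),
    ("where", re.compile(r"^where\b", re.I)),
    ("which", re.compile(r"^which\b", re.I)),
    ("who", re.compile(r"^(who|whose|whom)\b", re.I)),
    ("why", re.compile(r"^why\b", re.I)),
    ("how_quantity", re.compile(r"^how\s+(big|fast|long|many|much|far)\b", re.I)),
    ("how_process", re.compile(r"^how\b", re.I)),
    ("what", re.compile(r"^what\b", re.I)),
]


def question_type(question: str) -> str:
    """Return the name of the first matching pattern.

    Anything that matches no Wh/How keyword falls back to the
    functional-word category.
    """
    text = question.strip()
    for name, pattern in PATTERNS:
        if pattern.search(text):
            return name
    return "functional_word"
```

For instance, question_type("Where is my dog?") yields "where", while a sentence without a leading question word is treated as a functional word question.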

Question classification algorithm
• If any of the question patterns matches the given question, its entity type is determined using algorithm QC (question classification).
─ Example: “Where is my dog?” receives the Location label.
─ Example: “I don't know the man.”: delete “do” and return “the man”.

Question classification algorithm
• The algorithm takes a string as input and calls the procedure online to determine the expected entity type.
─ Example: “The man” gives Human, Vehicle.

Question classification algorithm
• The procedure online takes the input and uses online resources, Wikipedia and WordNet, to determine the type of the expected entity.
─ It was observed that a typical article in Wikipedia starts like “...X is a Y, Z,...”. Y, Z, etc. are synonyms, hypernyms, hyponyms, or other terms semantically related to X, and these are considered possible entity types.
─ If a sentence written in Wikipedia is “X is Y, Z,...”, the procedure online takes Y, Z,... as possible entity types of X.
─ Example: Vehicle, Human, Location (TE1); Human, Individual, Vehicle (TE2); Human, Vehicle (C).
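
A minimal sketch of the “X is a Y, Z,...” idea follows. It is not the authors' procedure: the function names are hypothetical, the regular expression is a rough stand-in for real sentence parsing, and the code assumes that C is formed from the terms common to TE1 and TE2, which agrees with the example on the slide but is not stated there.

```python
import re


def candidate_types_from_definition(sentence: str) -> list[str]:
    """Extract Y, Z, ... from a sentence shaped like 'X is a Y, Z, ...'."""
    match = re.search(r"\bis\s+(?:an?\s+)?(.+)", sentence, re.I)
    if not match:
        return []
    # Drop a trailing full stop, then split the remainder on commas and
    # on the conjunctions "and" / "or".
    tail = match.group(1).rstrip(". ")
    parts = re.split(r",|\band\b|\bor\b", tail)
    return [p.strip().lower() for p in parts if p.strip()]


def common_types(te1: list[str], te2: list[str]) -> list[str]:
    """Return the entity types present in both candidate lists, sorted.

    Assumption: the combined set C is the intersection of TE1 and TE2.
    """
    in_te2 = {t.lower() for t in te2}
    return sorted({t for t in te1 if t.lower() in in_te2})
```

With the lists from the slide, common_types(["Vehicle", "Human", "Location"], ["Human", "Individual", "Vehicle"]) gives ["Human", "Vehicle"].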

An application of question classification: answer validation
• The question: “In what year did Arundhati Roy receive a Booker Prize?”
1. Similarity computation
─ Similarity score: the question contains five tokens: “a Number”, “Arundhati Roy”, “received”, “Booker”, “Prize”.
• If a candidate answer sentence, when parsed, contains two of these five tokens, it has a similarity score of 0.4.
• The expanded query is: In what year did (“Arundhati Roy” OR Arundhati) (Receive OR Get) Booker (Prize OR Award)?
• The passage retrieval phase returns the top 10 answer sentences. Five of these 10 answer sentences reached the required similarity score.
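
The similarity score described on this slide is the fraction of question tokens found in a candidate sentence. The sketch below assumes that both sides have already been parsed into comparable tokens; the function name similarity_score is hypothetical, and the threshold for the “required” score is not given on the slide, so none is applied here.

```python
def similarity_score(question_tokens: list[str], sentence_tokens: list[str]) -> float:
    """Fraction of question tokens that also occur in the candidate sentence."""
    question = [t.lower() for t in question_tokens]
    sentence = {t.lower() for t in sentence_tokens}
    if not question:
        return 0.0
    matched = sum(1 for token in question if token in sentence)
    return matched / len(question)
```

A candidate sentence that contains “Booker” and “Prize” but none of the other three question tokens therefore scores 2/5 = 0.4.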

An application of question classification: answer validation
2. Entity type
• The question classification module computes “date” as the expected entity type for this question.
• Considering a date to be a number (optionally with a month name or the word “year”), four candidate answer sentences containing some number were sent to the next stage for further processing.
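
The entity-type filter for “date” can be sketched as a check for any number in the candidate sentence. This is an assumption-laden simplification: the names NUMBER and contains_date_like are hypothetical, and the optional month name or the word “year” is ignored because the slide treats them as optional.

```python
import re

# Any run of digits, optionally with thousands separators (hypothetical rule).
NUMBER = re.compile(r"\d[\d,]*")


def contains_date_like(sentence: str) -> bool:
    """Return True when the sentence contains some number."""
    return bool(NUMBER.search(sentence))
```

A filter this loose lets through any sentence with a number, including amounts of money, so a further validation stage is still needed.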

An application of question classification: answer validation
3. World Wide Web validation
• Four candidates passed the first two tests. Three contained “1997” as the answer and the fourth returned “£20,000”.
• Only the first answer (1997) was validated by the topmost documents returned by Google.
• Hence, the three candidate answer sentences containing this answer were validated as correct answers.
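
The slide does not say how the Google results are compared with the candidates, so the following is only one plausible reading: an answer counts as validated when it appears in at least one of the top documents. The function name validate_answers is hypothetical, and the documents are passed in as plain strings rather than fetched from a search engine.

```python
def validate_answers(candidate_answers: list[str], top_documents: list[str]) -> list[str]:
    """Return the distinct candidate answers found in any top document.

    Assumption: simple case-insensitive substring matching stands in for
    whatever comparison the original system performs.
    """
    documents = [doc.lower() for doc in top_documents]
    validated = []
    for answer in candidate_answers:
        if answer in validated:
            continue  # the same answer may come from several sentences
        if any(answer.lower() in doc for doc in documents):
            validated.append(answer)
    return validated
```

Given the four candidates above and a placeholder document that mentions only 1997, the function returns ["1997"], so the three sentences carrying that answer would be kept.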

Experiments (QC algorithm)

Experiments (Answer validation)
• Sources
─ TREC (Text REtrieval Conference)
─ WorldBook (The World Book)
─ World Factbook (CIA, The World Factbook)
─ Other standard resources

Experiments (Answer validation)

Conclusions
• The question classification algorithm achieves high accuracy.
• The proposed method seems promising for question classification in the field of open-domain question answering.
• The proposed method combines the World Wide Web with Natural Language Processing (NLP) techniques.

Comments
• Advantages
─ The distinctive points of the algorithm lie in its dynamic and extensible properties.
─ The proposed method is promising for question classification.
• Shortcomings
─ It has a few limitations.
• Applications
─ Information retrieval