Deep Questions without Deep Understanding

Deep Questions without Deep Understanding Paper Presentation

Abstract
The paper develops an approach for generating deep comprehension questions from novel text without creating a full semantic representation of the text.

Deep questions
Deep questions are open-ended problems that require deep thinking and recall. Answering them requires a significant amount of content rather than a single sentence.
Why deep questions?
They help the reader understand the text as fully as possible and provide the greatest educational value.
They are a key assessment mechanism for online educational options, including MOOCs.

Approach
The paper introduces an ontology-crowd-relevance workflow to generate high-level questions. It involves:
1. Decomposing the original text into a low-dimensional ontology.
2. Obtaining high-level question templates from the crowd.
3. Retrieving a subset of the collected templates for a target text segment based on its ontological categories, and ranking these questions.

Example question template
“Who were the key influences on <Person> in their childhood?” is a question template for the category Person and the section Early life.
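To make the template idea concrete, here is a minimal Python sketch of how such a template could be instantiated for a specific article. The record layout (`category`, `section`, `text`) and the helper `fill_template` are assumptions made for illustration; the slides do not describe how templates are stored or filled.

```python
# Hypothetical template record; the field names are illustrative, not from the paper.
TEMPLATE = {
    "category": "Person",
    "section": "Early life",
    "text": "Who were the key influences on <Person> in their childhood?",
}


def fill_template(template: dict, entity: str) -> str:
    """Replace the <Category> placeholder with a concrete entity name."""
    placeholder = "<" + template["category"] + ">"
    return template["text"].replace(placeholder, entity)


question = fill_template(TEMPLATE, "Albert Einstein")
print(question)
```

Keying the placeholder on the category name keeps one template reusable across every article in that category.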

Category-section ontology
The Freebase “notable type” of each Wikipedia article is used to find its high-level category. The authors took the 300 most common categories across Wikipedia and merged them into eight broad categories to reduce the crowdsourcing effort: Person, Location, Event, Organization, Art, Science, Health, and Religion. These eight categories cover 78% of Wikipedia articles.
For example, the category-section pairs for an article about Albert Einstein include (Person, Early life), (Person, Awards), and (Person, Political views).
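The category-section pairs can be pictured as lookup keys into a store of crowd-written templates. The sketch below is only an illustration under that assumption: the function names, the dictionary-based store, and the second template are hypothetical and not taken from the paper.

```python
# The eight broad categories named in the slides.
CATEGORIES = {
    "Person", "Location", "Event", "Organization",
    "Art", "Science", "Health", "Religion",
}


def category_section_pairs(category, sections):
    """Pair an article's broad category with each of its section titles."""
    if category not in CATEGORIES:
        raise ValueError("unknown category: " + category)
    return [(category, section) for section in sections]


def candidate_templates(pairs, store):
    """Collect the stored templates for each category-section pair (may be empty)."""
    return {pair: list(store.get(pair, [])) for pair in pairs}


# Hypothetical template store keyed by (category, section).
store = {
    ("Person", "Early life"): [
        "Who were the key influences on <Person> in their childhood?",
    ],
    # Made-up template, included only to show a second key.
    ("Person", "Awards"): ["Why did <Person> receive these awards?"],
}

pairs = category_section_pairs(
    "Person", ["Early life", "Awards", "Political views"]
)
candidates = candidate_templates(pairs, store)
print(pairs)
```

In the workflow described in the slides, templates retrieved this way would still need to be ranked by a relevance classifier before being shown.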

Crowdsourcing methodology
The authors designed a two-stage crowdsourcing pipeline:
1. Question generation task: create question templates targeted to a set of category-section pairs.
2. Question relevance rating task: obtain binary relevance judgments for the generated question templates in relation to a set of article segments with matching category-section labels.
Each question is rated on three dimensions: relevance, quality, and scope.

Model: category/section inference
Individual logistic regression classifiers were trained for the eight categories and the top 50 section types, using the default L2 regularization parameter in LIBLINEAR. The classifiers reached an accuracy of 83% for category and 95% for section.
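As a rough illustration of "one logistic regression classifier per category", the sketch below trains one-vs-rest L2-regularized logistic regression models in plain Python. It is not the authors' setup: it swaps LIBLINEAR for simple batch gradient descent, and the features (toy word counts), the regularization strength, the learning rate, and the epoch count are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, l2=0.01, lr=0.5, epochs=500):
    """Batch gradient descent on L2-regularized log loss; last weight is the bias."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[d])
            err = p - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[d] += err
        for j in range(d):
            w[j] -= lr * (grad[j] / n + l2 * w[j])  # penalty on weights only
        w[d] -= lr * grad[d] / n  # bias is not regularized
    return w


def score(w, x):
    """Probability that x belongs to the positive class of this model."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_one_vs_rest(X, labels):
    """Train one binary classifier per distinct label."""
    return {
        c: train_logreg(X, [1.0 if label == c else 0.0 for label in labels])
        for c in sorted(set(labels))
    }


def predict(models, x):
    """Pick the label whose classifier gives the highest probability."""
    return max(models, key=lambda c: score(models[c], x))


# Toy counts over a hypothetical three-word vocabulary.
X = [[1, 0, 0], [2, 0, 0], [0, 1, 0], [0, 2, 0], [0, 0, 1], [0, 0, 2]]
labels = ["Person", "Person", "Location", "Location", "Event", "Event"]
models = train_one_vs_rest(X, labels)
print(predict(models, [3, 0, 0]))
```

The same pattern would apply to the section classifiers, with one binary model per section type.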

Model: relevance classification
The features are a vector of component-wise squared Euclidean distances between the question and the article segment, f_i = (q_i − a_i)^2, where q_i and a_i are the i-th components of the question and article feature vectors.
The paper then augmented this vector by concatenating additional distance features between the target article segment and one specific instance of an entire article to which the question applied. In the resulting feature vector, the first k distances are between the question template and the target segment, and the next k are between the augmenting article and the target segment.
The relevance classifier is a single logistic regression model, trained using LIBLINEAR with default L2 regularization.
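The feature construction can be sketched in a few lines of Python. The function names and the toy vectors are assumptions for illustration; only the formula f_i = (q_i − a_i)^2 and the "first k, next k" layout come from the slides.

```python
def squared_diffs(u, v):
    """Component-wise squared differences: f_i = (u_i - v_i)^2."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return [(ui - vi) ** 2 for ui, vi in zip(u, v)]


def relevance_features(question, target, augmenting):
    """First k entries: question vs. target segment.
    Next k entries: augmenting article vs. target segment."""
    return squared_diffs(question, target) + squared_diffs(augmenting, target)


# Toy feature vectors with k = 3 (values are made up).
q = [1.0, 0.0, 2.0]    # question template
a = [0.0, 0.0, 1.0]    # target article segment
aug = [1.0, 1.0, 1.0]  # article for which the question is known to apply

features = relevance_features(q, a, aug)
print(features)  # [1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```

The resulting vector of length 2k is what a single logistic regression relevance classifier would take as input.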

Results

These slides are part of a paper review for the course CS671. This presentation in no way claims ownership of any content in the slides.