Table Cell Search for Question Answering Huan Sun


1 Table Cell Search for Question Answering
Table Cell Search for Question Answering. Huan Sun*, Hao Ma#, Xiaodong He#, Wen-Tau Yih#, Yu Su*, Xifeng Yan*. *University of California, Santa Barbara; #Microsoft Research. WWW'16. Presented by Qingxia Liu (Question Answering for Structured Data).

2 Why tables?
Tables are ubiquitous on the web and in enterprises. vs. text: tables have a schema and organize information in a more structured manner. vs. KBs: tables are complementary, more up to date, and more straightforward. Tasks: (1) find the table, (2) find the cell. Owing to the web's redundancy and broad coverage, it is more likely that a pair of column names matches a given question than a single predicate in a knowledge base.

3 Question Chain &amp; Relation Chain
Question pattern: the input question excluding the topic entity; it expresses the relation between the topic entity and the answer. (When a question has multiple entities, it may contain multiple question chains.) A chain runs from the topic cell to the answer cell.

4 Strategy
Assumptions: (1) at least one topic entity occurs in the question; (2) the relation in a question chain can be represented by the information in a single table; (3) the cell matched with a topic entity and the cell matched with an answer occur in the same row. This yields a joint entity and relation matching problem: a question may contain multiple topic entities (each corresponding to one chain), and the mappings of all chains are considered jointly when solving.

5 Table Cell Search
Candidate chains: NER + entity linking to Freebase [ACL'15] (using Freebase aliases) for each possible answer cell. Coarse-grained pruning by relevance: because a question contains few words, it is enriched with the top-50 search-engine (Bing) snippets to form a word vector wq; the table word vector wc is built from the caption, topic/answer cells, and column names. Pruning measures (keeping the top 3k×2 chains): cosine similarity and InterScore, where InterScore(wc, wq) counts the unique words the two have in common. Deep chain inference: a deep neural network then scores the matching degree between the question pattern and each candidate chain.
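The two pruning measures can be sketched as follows; this is a minimal illustration assuming plain bag-of-words Counters (the paper enriches the question with top-50 Bing snippets to build wq, and builds wc from the table caption, topic/answer cells, and column names — the sample texts here are made up).

```python
# Minimal sketch of the coarse-grained relevance pruning measures.
import math
from collections import Counter

def cosine(wq, wc):
    """Cosine similarity between two word-frequency vectors."""
    common = set(wq) & set(wc)
    dot = sum(wq[w] * wc[w] for w in common)
    nq = math.sqrt(sum(v * v for v in wq.values()))
    nc = math.sqrt(sum(v * v for v in wc.values()))
    return dot / (nq * nc) if nq and nc else 0.0

def inter_score(wq, wc):
    """InterScore: number of unique words shared by question and table."""
    return len(set(wq) & set(wc))

# Hypothetical enriched question and table word vectors.
wq = Counter("what language do people in france speak".split())
wc = Counter("france country mainlanguage french people".split())
relevance = (cosine(wq, wc), inter_score(wq, wc))
```

Candidate chains would be ranked by these scores and only the top 3k×2 kept for the expensive deep matching step.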

6 Chain Inference
Perspectives (aspects) of a candidate chain: (1) answer type ca: the column name of the answer cell, e.g. MainLanguage; (2) pseudo-predicate cp: the relation connecting the topic cell and the answer cell, i.e. the two column names, e.g. Country-MainLanguage; (3) entity pairs ce: entity pairs drawn from the two columns, which complement the pseudo-predicate and implicitly represent the relation between the two entities in a question chain. Each aspect is matched against the question pattern qp via text embedding.

7 What language do people in <e> speak?
Convolutional Deep Structured Semantic Model (CDSSM) [WWW'14]. Each word is converted to an N×1 vector of letter-trigram frequencies (N = the number of unique letter trigrams); e.g. "speak" yields #-s-p, s-p-e, p-e-a, e-a-k, a-k-#, so ft("speak") = [..., tf(s-p-e), ...]. A convolutional layer ht = tanh(Wc[ft-1, ft, ft+1]) extracts local features; max pooling then extracts the most salient of them and forms a fixed-length global feature vector v, and the semantic layer computes y = tanh(Ws v). One such model is trained for each aspect. Training set: relation chains relevant to existing QA pairs are constructed via the pairs' mapping to tables (top-20 cosine score + manual checking), and the three aspects plus the question pattern are extracted; 5% is held out. CDSSM is not very sensitive to hyperparameters within a reasonable range (e.g., nodes in the semantic layer, learning rate).
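The letter-trigram hashing, convolution, and max pooling described above can be sketched as follows; this is a toy forward pass with a made-up trigram vocabulary and random weights, not the trained model from the paper.

```python
# Minimal sketch of CDSSM's letter-trigram hashing, convolution,
# max pooling, and semantic layer (illustration only).
import numpy as np

def letter_trigrams(word):
    """Trigrams of '#word#', e.g. 'speak' -> #sp, spe, pea, eak, ak#."""
    w = "#" + word + "#"
    return [w[i:i + 3] for i in range(len(w) - 2)]

def trigram_vector(word, vocab):
    """N x 1 vector of letter-trigram frequencies for one word."""
    v = np.zeros(len(vocab))
    for tg in letter_trigrams(word):
        if tg in vocab:
            v[vocab[tg]] += 1
    return v

def cdssm_forward(words, vocab, Wc, Ws):
    """ht = tanh(Wc [f_{t-1}; f_t; f_{t+1}]), max pooling, semantic layer."""
    n = len(vocab)
    # Pad with zero vectors so every position has a 3-word context window.
    fs = [np.zeros(n)] + [trigram_vector(w, vocab) for w in words] + [np.zeros(n)]
    H = [np.tanh(Wc @ np.concatenate(fs[t - 1:t + 2]))
         for t in range(1, len(fs) - 1)]
    v = np.max(np.stack(H), axis=0)  # most salient local features, fixed length
    return np.tanh(Ws @ v)           # semantic vector y

words = "what language do people speak".split()
vocab = {}
for w in words:
    for tg in letter_trigrams(w):
        vocab.setdefault(tg, len(vocab))
rng = np.random.default_rng(0)
Wc = rng.standard_normal((8, 3 * len(vocab)))  # toy convolution weights
Ws = rng.standard_normal((4, 8))               # toy semantic-layer weights
y = cdssm_forward(words, vocab, Wc, Ws)
```

In the actual model, the semantic vectors of the question pattern and of each aspect are compared (e.g. by cosine similarity) to give the matching degree.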

8 The Last Step
Each candidate chain (3k×2 in total) is represented as a question-related feature vector combining shallow and deep features, including a sentence feature: topic entity + pseudo-predicate cp* (a special case of cp). The chains are then ranked with MART, a gradient-boosted decision tree model.
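MART itself is a full gradient-boosted decision tree ranker; the pointwise sketch below uses one-level regression stumps and made-up two-dimensional feature vectors (a cosine score and a deep matching score, purely hypothetical) just to illustrate how boosted trees score and rank candidate chains.

```python
# Toy pointwise stand-in for MART: gradient boosting with regression stumps.

def fit_stump(X, residuals):
    """Best (feature, threshold, left_mean, right_mean) stump by squared error."""
    best, best_err = None, float("inf")
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm = sum(left) / len(left) if left else 0.0
            rm = sum(right) / len(right) if right else 0.0
            err = (sum((r - lm) ** 2 for r in left)
                   + sum((r - rm) ** 2 for r in right))
            if err < best_err:
                best_err, best = err, (j, t, lm, rm)
    return best

def boost(X, y, n_rounds=20, lr=0.3):
    """Fit stumps to residuals, shrinking each stump by the learning rate."""
    preds, model = [0.0] * len(X), []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        j, t, lm, rm = fit_stump(X, residuals)
        model.append((j, t, lm, rm))
        preds = [p + lr * (lm if row[j] <= t else rm)
                 for p, row in zip(preds, X)]
    return model

def score(model, row, lr=0.3):
    return sum(lr * (lm if row[j] <= t else rm) for j, t, lm, rm in model)

# Hypothetical candidate chains: [cosine score, deep matching score];
# label 1 marks chains whose answer cell is correct.
X = [[0.9, 0.8], [0.2, 0.1], [0.4, 0.3], [0.8, 0.9]]
y = [1, 0, 0, 1]
model = boost(X, y)
ranked = sorted(range(len(X)), key=lambda i: score(model, X[i]), reverse=True)
```

The top-ranked chain's answer cell would be returned as the system's answer.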

9 Experiments Data
Table sets: WikiTables (5 million tables extracted from Wikipedia) and AllTables (99 million tables from the Web), obtained with the extractor of [47] (keeping HTML relational tables). QA evaluation sets: WebQ (WebQuestions, 5,810 QA pairs, derived from Freebase) and BingQ (from search-engine logs, answered by crowdsourcing; answers are not necessarily in the KB). Unanswerable questions are removed according to the assumption constraints: entity number, same table / same row, matched by top-1 string similarity. Measures: P = nq/K, recall, and F1. Systems compared: Sempre [EMNLP'13], ParaSempre [ACL'14], and TableCell.
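Under one reading of the slide's notation (P = nq/K as precision over the K answered questions, with recall over all questions — an interpretation, not the paper's exact code), the measures can be computed as:

```python
# Precision, recall, and F1 over a QA evaluation set (interpretation of
# the slide's P = nq/K notation; counts below are made up).

def precision_recall_f1(n_correct, n_answered, n_total):
    p = n_correct / n_answered if n_answered else 0.0
    r = n_correct / n_total if n_total else 0.0
    f = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f

p, r, f = precision_recall_f1(n_correct=30, n_answered=40, n_total=60)
```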

10 Experiments Different Feature Groups
Shallow features complement deep features; the answer-type and pseudo-predicate aspects are important.

11 Experiments Comparison with KB-based QA systems
WebQ: questions and answers come from Freebase; BingQ: answers are not necessarily in Freebase. Merging top-1 content (=> k=2?).

12 Experiments Different table sources
Top-k results using all the features. WebQ is derived from Freebase and WikiTables from Wikipedia, so they align well; AllTables is noisier; WebQ questions may have multiple answers.

13 Conclusion
TableCell outperforms the baselines on real-world questions from search-engine query logs; web tables are a good complement to KBs. Limitations: complex questions and multi-table reasoning.

14 Thank you ~

