ProBase: Common Sense Concept KB and Short Text Understanding

1 ProBase: Common Sense Concept KB and Short Text Understanding
Wentao Ding

2 Term explanation
Common short text: search query, document title, ad keyword, caption, anchor text, question, image tag, tweet/Weibo.
Common sense KB vs. encyclopedia KB:
Common Sense Knowledge Base: common sense linguistic knowledge among terms; relations such as isA, isPropertyOf, co-occurrence, …; typicality, basic level of categorization; examples: WordNet*, KnowItAll, NELL, Probase (MSCG), ConceptNet, …
Encyclopedia Knowledge Base: entities and facts; relations such as DayOfBirth, LocatedIn, SpouseOf, …; black or white, precision; examples: Freebase, Yago, DBPedia, Google knowledge graph, Wikidata, …

3 ProBase/Microsoft Concept Graph
A probabilistic taxonomy for text understanding, harnessed from billions of web pages and years' worth of search logs.
The 2016 version contains 5,401,933 unique concepts, 12,551,613 unique instances, and 87,603,947 IsA relations.

4

5 Probabilistic taxonomy

6 Concept Distribution
X axis: the 5.4 million concepts ordered by their size. Y axis: the number of instances each concept contains (logarithmic scale).

7 Quality Evaluation
Coverage: Analyzed Bing’s query log from a two-year period, with the queries sorted in decreasing order of frequency.
Precision: Measured on a benchmark dataset containing 40 concepts in various domains. The concept size varies from 21 instances (for aircraft model) to 85,391 (for company), with a median of 917.

8 Coverage
Taxonomy Coverage: The query contains at least one concept or instance within the taxonomy.
Concept Coverage: The query contains at least one concept in the taxonomy.
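
The two coverage definitions can be made concrete with a small sketch. It assumes the taxonomy is available as two plain sets of lowercase strings (concepts and instances) and that a query "contains" a term when the term matches a contiguous run of whitespace-separated tokens; both the data layout and the matching rule are assumptions for illustration, not the evaluation procedure used for ProBase.

```python
def contains_term(query, terms):
    """Return True if any contiguous token span of the query is in `terms`."""
    tokens = query.lower().split()
    for length in range(1, len(tokens) + 1):
        for start in range(len(tokens) - length + 1):
            if " ".join(tokens[start:start + length]) in terms:
                return True
    return False


def coverage(queries, concepts, instances):
    """Return (taxonomy_coverage, concept_coverage) as fractions of queries."""
    if not queries:
        return 0.0, 0.0
    all_terms = concepts | instances
    # Taxonomy coverage: the query mentions a concept OR an instance.
    taxonomy_hits = sum(contains_term(q, all_terms) for q in queries)
    # Concept coverage: the query mentions a concept.
    concept_hits = sum(contains_term(q, concepts) for q in queries)
    return taxonomy_hits / len(queries), concept_hits / len(queries)


# Toy data, made up for this example.
toy_concepts = {"hotel", "company"}
toy_instances = {"paris", "apple"}
toy_queries = ["cheap hotel in paris", "apple", "asdf qwerty", "best company"]
taxonomy_cov, concept_cov = coverage(toy_queries, toy_concepts, toy_instances)
```

On the toy data, three of the four queries mention a concept or instance and two mention a concept, so concept coverage can never exceed taxonomy coverage.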

9 Precision
Knowledge Base    P on average
ProBase           92.8%
KnowItAll         64%
NELL              74%
TextRunner        80%
WikiTaxonomy      86%
YAGO              95%

10 Constructing ProBase
Extract superordinate-subordinate pairs from sentences.
Merge nodes of the same sense.
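
As a rough illustration of the extraction step, the sketch below pulls (superordinate, subordinate) pairs out of sentences using a single "X such as A, B and C" pattern and counts how often each pair is seen. The regular expressions, the one-word concept assumption, and the function names are all illustrative assumptions; the actual pipeline is considerably more careful, and the sense-merging step is not attempted here.

```python
import re
from collections import Counter

# Illustrative pattern only: a one-word concept followed by "such as" and a list.
SUCH_AS = re.compile(r"\b(\w+) such as ([^.;]+)")
# List separators: commas (optionally followed by and/or) or a bare and/or.
SEPARATORS = re.compile(r"\s*,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def extract_isa_pairs(sentence):
    """Return (concept, instance) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        concept = match.group(1).lower()
        for item in SEPARATORS.split(match.group(2)):
            item = item.strip()
            if item:
                pairs.append((concept, item))
    return pairs


def count_pairs(sentences):
    """Count how often each (concept, instance) pair is extracted."""
    counts = Counter()
    for sentence in sentences:
        counts.update(extract_isa_pairs(sentence))
    return counts


pairs = extract_isa_pairs("We studied companies such as IBM, Nokia, and Apple.")
counts = count_pairs([
    "We studied companies such as IBM, Nokia, and Apple.",
    "He prefers companies such as IBM.",
])
```

Keeping the pair counts, rather than a bare set of pairs, is what later allows the taxonomy to attach probabilities to isA relations.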

11 Conceptualization
The Microsoft Concept Tagging model (a.k.a. the Conceptualization model) aims to map entities in text form to semantic concept categories, each with an associated probability.
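
One simple way to obtain such probabilities is to normalize isA co-occurrence counts per instance, i.e. P(concept | instance) = n(concept, instance) / Σ n(·, instance). The sketch below does exactly that; the count table is made up for the example, and the function name and data layout are assumptions rather than the model's actual interface.

```python
def concept_distribution(isa_counts, instance):
    """Return {concept: P(concept | instance)}, most probable concept first."""
    counts = {
        concept: n
        for (concept, inst), n in isa_counts.items()
        if inst == instance
    }
    total = sum(counts.values())
    if total == 0:
        return {}  # unknown instance: no concepts to report
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return {concept: n / total for concept, n in ranked}


# Made-up counts of how often each (concept, instance) pair was observed.
toy_counts = {
    ("company", "apple"): 6,
    ("fruit", "apple"): 3,
    ("brand", "apple"): 1,
    ("company", "ibm"): 5,
}
apple_concepts = concept_distribution(toy_counts, "apple")
```

With these toy counts, "apple" maps to company, fruit, and brand with probabilities 0.6, 0.3, and 0.1, which shows how one ambiguous term yields a ranked set of candidate concepts instead of a single label.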

12 Concept Labeling for understanding short texts
Challenges of short text understanding.

13 Concept Labeling for understanding short texts
Concept coherence

14 Head, Modifier, and Constraint Detection in Short Texts
[popular]modifier [iphone 5s]constraint [smart cover]head
To solve this, we need to know:
(Instance-level head-modifier knowledge) “smart cover” is the head, and “iphone 5s” is the constraint.
(Conceptual knowledge) “smart cover” is an accessory, and “iphone 5s” is a device.
(Concept-level head-modifier knowledge) When an accessory and a device appear together, the device is the constraint and the accessory is the head.
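
The way conceptual knowledge and concept-level head-modifier knowledge combine can be sketched as two lookups: first map each term to a concept, then check whether the concept pair matches a known (head concept, constraint concept) pattern. The tables and the function below are hypothetical and hold only the slide's own example; a real system would learn both from data.

```python
# Hypothetical conceptual knowledge: term -> concept.
INSTANCE_CONCEPT = {
    "smart cover": "accessory",
    "iphone 5s": "device",
}

# Hypothetical concept-level patterns: (head concept, constraint concept).
HEAD_CONSTRAINT_PATTERNS = {("accessory", "device")}


def detect_head_and_constraint(term_a, term_b):
    """Return {'head': ..., 'constraint': ...} or None if no pattern applies."""
    concept_a = INSTANCE_CONCEPT.get(term_a)
    concept_b = INSTANCE_CONCEPT.get(term_b)
    if (concept_a, concept_b) in HEAD_CONSTRAINT_PATTERNS:
        return {"head": term_a, "constraint": term_b}
    if (concept_b, concept_a) in HEAD_CONSTRAINT_PATTERNS:
        return {"head": term_b, "constraint": term_a}
    return None


result = detect_head_and_constraint("iphone 5s", "smart cover")
```

Because the pattern is stored at the concept level, the same rule would apply to any other accessory and device pair once those terms are present in the term-to-concept table.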

15 Head, Modifier, and Constraint Detection in Short Texts

16 Reference
ProBase.
Microsoft Concept Graph For Short Text Understanding.
Understanding Short Texts. Zhongyuan Wang and Haixun Wang, in the Association for Computational Linguistics (ACL), August
Probase. Haixun Wang, APWeb s/Haixun-APWeb13-Tutorial.pdf
Probase: A Probabilistic Taxonomy for Text Understanding. Wentao Wu, Hongsong Li, Haixun Wang, Kenny Q. Zhu, ACM International Conference on Management of Data (SIGMOD) | May 2012
Short Text Understanding Through Lexical-Semantic Analysis. Wen Hua, Zhongyuan Wang, Haixun Wang, Kai Zheng, Xiaofang Zhou, International Conference on Data Engineering (ICDE) | April 2015
Head, Modifier, and Constraint Detection in Short Texts. Zhongyuan Wang, Haixun Wang, Zhirui Hu, International Conference on Data Engineering (ICDE) | January 2014

17 Thanks for listening Q & A

