Question Identification on Twitter

Baichuan Li¹, Xiance Si², Michael R. Lyu¹, Irwin King¹,³, and Edward Y. Chang²
¹ The Chinese University of Hong Kong, Shatin, N.T., Hong Kong
² Google Research, Beijing, China
³ AT&T Labs Research, San Francisco, CA, USA
bcli@cse.cuhk.edu.hk, sxc@google.com, lyu@cse.cuhk.edu.hk, {king@cse.cuhk.edu.hk, irwin@research.att.com}, edchang@google.com

I. Motivations

People Ask Questions on Twitter!
- 10% of Twitter users have asked questions on Twitter.
- 13% of tweets contain questions.
- 1,600 tweets were posted per second, so about 200 questions were asked per second on Twitter.

How to Find Questions Automatically?
- Tweets are short and noisy.
- Tweets containing questions are not always asking questions.

II. Approach

Interrogative tweets: tweets that contain questions.
Qweets: interrogative tweets that require information or help.

Figure 1. Two-phase classification model
Phase 1 detects interrogative tweets; phase 2 identifies qweets among them.

Interrogative Tweet Detection
Rule-based approach
- Question marks
- 5W1H words and refined 5W1H words
  - H1: They must appear at the beginning of a sentence.
  - H2: Auxiliary words are added to the original words, e.g., we change “what” to “what is” and “what are”.
- Heuristic rules (Efron and Winget, 2010)
Learning-based approach
- Mining frequent question patterns
- One-class SVM

Taxonomy of Interrogative Tweets
- Advertisement
  Incorporating your business this year? Call us today for a free consultation with one of our attorneys. 855-529-8753. http://buz.tw/FjJCV
- Article or News Title on the Web
  New post: Pregnancy Miracle - A Miracle or a Scam? http://articlescontentonline.com/pregnancy-miracle-a-miracle-or-a-scam/
- Question with Answer
  I even tried staying away from my using my Internet for a couple hours. The result? Insanity!
- Question as Quotation
  I think Brian’s been drinking in there because I’m hearing him complain about girls, and then he goes “Wright,are you sure you’re not gay?”
- Rhetorical Question
  You ruined my life and I’m supposed to like you?
- Qweet
  What’s your favorite Harry Potter scene?

Qweet Detection using a Random Forest classifier

Feature                     Description
Question features (Q)
  Quoted question           Whether the question sentence is quoted from other sources
  Strong feeling            Whether the question sentence contains strong feeling such as “???” and “?!”
Context features (C)
  URL                       Whether the context contains any URL
  Phone number or email     Whether the context contains any phone number or email address
  Strong feeling            Whether any strong feeling such as “!” follows the question sentence
  Declarative sentence      Whether any declarative sentence follows the question sentence
  after question sentence
  Word features             Unigram words that appear in the contexts of tweets
Question-Context features (QC)
  Self ask self answer      Whether the tweet contains an obvious self-ask-self-answer pattern, e.g., Q:...A:...
  Question-URL sameness     Whether the question sentence is the same as the title of the web page linked through the URL
Tweet-specific features (T)
  @username                 Whether the tweet mentions another user's name
  Retweet                   Whether the tweet is a retweet
  Hashtag                   Whether the tweet contains any hashtag
Table 1. Features extracted for qweet extraction

III. Experiments

Data Set 1 – Twitter stream from 11:00am to 12:00am on April 18, 2011
- Objective: discovering interrogative tweets and qweets
- Content: 2,045 English tweets (227 interrogative tweets and 127 qweets)
Data Set 2 – QA pairs from Yahoo! Answers and WikiAnswers
- Objective: extracting frequent question patterns
- Content: over 850,000 question titles and the corresponding best answers

Experimental Results: Interrogative Tweet Detection

Method                              Precision  Recall  F1
QM                                  0.969*     0.846   0.903
QM or 5W1H                          0.547      0.973*  0.700
QM or refined 5W1H (H1)             0.878      0.916   0.899
QM or refined 5W1H (H2)             0.875      0.925   0.899
QM or refined 5W1H (H1 and H2)      0.954      0.907   0.930*
Rules in (Efron and Winget, 2010)   0.960      0.855   0.904
Question Patterns (Confidence≥0.7)  0.576      0.899   0.702
Question Patterns (Confidence≥0.8)  0.715      0.872   0.786
Question Patterns (Confidence≥0.9)  0.857      0.846   0.851
Table 2. Accuracies of interrogative tweet detection for various methods (QM: question mark; the best result in each column is marked with *)

Experimental Results: Qweet Extraction

Figure 2. Influence of feature sets on qweet extraction

IV. Conclusions
- Propose a novel problem: automatically identifying questions on Twitter.
- Provide a two-phase classification model to discover interrogative tweets and qweets.
- Investigate the influence of different feature sets on qweet extraction, especially Tweet-specific features such as @username, retweet, and hashtag.
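The best-F1 rule-based detector in Table 2, "QM or refined 5W1H (H1 and H2)", combines the question-mark test with the H1 and H2 refinements. The sketch below shows one way to wire those rules together. It is an illustration under stated assumptions, not the authors' code: the word lists `WH_WORDS` and `AUXILIARIES`, the contraction handling and the naive sentence splitter are all hypothetical choices.

```python
import re

# Hypothetical word lists: the poster names 5W1H words and auxiliary
# expansion (H2) but does not publish the exact lists it used.
WH_WORDS = ["what", "who", "when", "where", "why", "how"]
AUXILIARIES = ["is", "are", "was", "were", "do", "does", "did",
               "can", "could", "will", "would", "should"]

# H2: expand each 5W1H word with auxiliaries, e.g. "what" -> "what is", "what are".
REFINED_5W1H = {f"{wh} {aux}" for wh in WH_WORDS for aux in AUXILIARIES}

# Assumption: expand contractions such as "what's" -> "what is" so H2 can match them.
CONTRACTION_RE = re.compile(r"\b(what|who|when|where|why|how)['’]s\b")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def starts_with_refined_5w1h(sentence: str) -> bool:
    """H1 + H2: the sentence must *begin* with an expanded 5W1H phrase."""
    expanded = CONTRACTION_RE.sub(r"\1 is", sentence.lower())
    words = re.findall(r"[a-z']+", expanded)
    return len(words) >= 2 and f"{words[0]} {words[1]}" in REFINED_5W1H


def is_interrogative(tweet: str) -> bool:
    """QM or refined 5W1H (H1 and H2): a question mark anywhere, or some
    sentence that starts with an expanded 5W1H phrase."""
    if "?" in tweet:
        return True
    return any(starts_with_refined_5w1h(s) for s in split_sentences(tweet))


if __name__ == "__main__":
    for tweet in ["What's your favorite Harry Potter scene?",
                  "what is the best pizza place in town",
                  "I know what is going on.",
                  "What a game!"]:
        print(is_interrogative(tweet), "|", tweet)
```

H1 rejects "I know what is going on." because the 5W1H phrase is not sentence-initial, and H2 rejects "What a game!" because "what a" is not an expanded phrase. This is the kind of false positive that makes plain "QM or 5W1H" reach only 0.547 precision in Table 2.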
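The learning-based approach mines frequent question patterns from Data Set 2 and keeps those above a confidence threshold (0.7, 0.8 or 0.9 in Table 2). The poster does not define a pattern or its confidence, so the sketch below makes two loudly hypothetical choices. A "pattern" is simply the first two words of a sentence. Confidence is taken as association-rule confidence, using question titles as positives and best answers as negatives: count(pattern in questions) / count(pattern in questions and answers). The toy `QUESTIONS` and `ANSWERS` lists are invented for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def leading_ngram(sentence, n=2):
    """Hypothetical pattern: the first n lowercased words of a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return tuple(words[:n]) if len(words) >= n else None


def mine_question_patterns(questions, answers, min_support=2,
                           min_confidence=0.8, n=2):
    """Keep patterns that are frequent in question titles and rare in answers.

    Assumed confidence: count(pattern in questions) /
    (count(pattern in questions) + count(pattern in answers)).
    """
    q_counts = Counter(filter(None, (leading_ngram(s, n)
                                     for q in questions for s in split_sentences(q))))
    a_counts = Counter(filter(None, (leading_ngram(s, n)
                                     for a in answers for s in split_sentences(a))))
    patterns = {}
    for pattern, q in q_counts.items():
        confidence = q / (q + a_counts[pattern])
        if q >= min_support and confidence >= min_confidence:
            patterns[pattern] = confidence
    return patterns


def matches_question_pattern(tweet, patterns, n=2):
    """A tweet counts as interrogative if any sentence starts with a kept pattern."""
    return any(leading_ngram(s, n) in patterns for s in split_sentences(tweet))


# Toy data, invented for illustration (not from Yahoo! Answers or WikiAnswers).
QUESTIONS = ["How do I reset my router", "How do you cook rice", "How do birds fly",
             "What is the capital of France", "What is a good laptop",
             "Is it going to rain"]
ANSWERS = ["How do you do? I am fine.", "What is done is done.",
           "It is going to rain tomorrow.", "Unplug it for ten seconds."]

if __name__ == "__main__":
    for threshold in (0.6, 0.7):
        kept = mine_question_patterns(QUESTIONS, ANSWERS, min_confidence=threshold)
        print(threshold, kept,
              matches_question_pattern("what is your favorite scene", kept))
```

Raising the threshold keeps fewer, more reliable patterns. In this toy run ("what", "is") drops out at 0.7 while ("how", "do") survives. That is the same trade-off visible in Table 2, where precision rises from 0.576 to 0.857 and recall falls from 0.899 to 0.846 as the threshold goes from 0.7 to 0.9.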
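The two-phase classification model (Figure 1) is a cascade: phase 1 keeps interrogative tweets, and phase 2 decides which of those are qweets. The sketch below shows only that wiring, with both phases passed in as plain callables. The stand-ins in the usage example are hypothetical. Phase 1 is the question-mark rule from Table 2, and phase 2 is a placeholder ("no URL"), not the Random Forest the poster uses.

```python
from typing import Callable, List, Tuple

# A classifier here is any function from tweet text to a yes/no decision.
Classifier = Callable[[str], bool]


def two_phase(tweets: List[str], is_interrogative: Classifier,
              is_qweet: Classifier) -> Tuple[List[str], List[str]]:
    """Phase 1 keeps interrogative tweets; phase 2 runs only on those."""
    interrogative = [t for t in tweets if is_interrogative(t)]
    qweets = [t for t in interrogative if is_qweet(t)]
    return interrogative, qweets


TWEETS = [
    "Incorporating your business this year? Call us today for a free "
    "consultation with one of our attorneys. 855-529-8753. http://buz.tw/FjJCV",
    "What’s your favorite Harry Potter scene?",
    "Great game tonight.",
]

if __name__ == "__main__":
    # Hypothetical stand-ins: QM rule for phase 1, "no URL" placeholder for phase 2.
    interrogative, qweets = two_phase(TWEETS,
                                      lambda t: "?" in t,
                                      lambda t: "http" not in t)
    print(len(interrogative), "interrogative;", qweets)
```

Phase 2 never sees a tweet that phase 1 rejected, so phase-1 recall is an upper bound on the recall of the whole pipeline. Phase 2 can only remove false positives, not recover missed questions.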
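Table 1 lists the features the Random Forest uses for qweet detection, but the poster does not say how each one is computed. The sketch below approximates every row with a regular expression, using hypothetical definitions throughout. The "question sentence" is the first clause ending in "?", and the "context" is the rest of the tweet. The `page_title` argument stands in for the title of the linked web page, which a caller would have to fetch separately.

```python
import re

# Hypothetical regexes approximating the Table 1 features.
URL_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")      # skips the "@" inside e-mail addresses
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")
RETWEET_RE = re.compile(r"\bRT @")
# First clause ending in "?", plus trailing "?" or "!" (covers "???" and "?!").
QUESTION_RE = re.compile(r"[^.!?]*\?[?!]*")
QUOTED_QUESTION_RE = re.compile(r'["“][^"”]*\?[^"”]*["”]')
SELF_QA_RE = re.compile(r"\bQ\s*:.*\bA\s*:", re.DOTALL)


def normalize(text):
    """Lowercase and collapse non-alphanumerics, for loose title matching."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def extract_features(tweet, page_title=None):
    """Return Q/C/QC/T features (plus context unigrams), or None if no '?'."""
    m = QUESTION_RE.search(tweet)
    if m is None:
        return None
    question = m.group().strip()
    after = tweet[m.end():]
    context = (tweet[:m.start()] + " " + after).strip()
    after_sentences = [s for s in re.split(r"(?<=[.!?])\s+",
                                           URL_RE.sub("", after).strip()) if s]
    return {
        # Question features (Q)
        "q_quoted": bool(QUOTED_QUESTION_RE.search(tweet)),
        "q_strong_feeling": "???" in question or "?!" in question,
        # Context features (C)
        "c_url": bool(URL_RE.search(context)),
        "c_phone_or_email": bool(PHONE_RE.search(context) or EMAIL_RE.search(context)),
        "c_strong_feeling": "!" in after,
        "c_declarative_after": any(re.search(r"[A-Za-z]", s) and not s.endswith("?")
                                   for s in after_sentences),
        "c_words": set(re.findall(r"[a-z']+", URL_RE.sub(" ", context).lower())),
        # Question-context features (QC)
        "qc_self_ask_self_answer": bool(SELF_QA_RE.search(tweet)),
        # Loose match: the title appearing inside the question counts as "same",
        # which tolerates prefixes such as "New post:".
        "qc_question_url_same": bool(page_title)
                                and normalize(page_title) in normalize(question),
        # Tweet-specific features (T)
        "t_mention": bool(MENTION_RE.search(tweet)),
        "t_retweet": bool(RETWEET_RE.search(tweet)),
        "t_hashtag": bool(HASHTAG_RE.search(tweet)),
    }
```

On the Advertisement example from the taxonomy, this sketch sets `c_url`, `c_phone_or_email` and `c_declarative_after`. Those are the context cues that should push a classifier away from labeling the tweet a qweet. The binary entries can be fed to any classifier as 0/1 columns, and `c_words` as a bag of words.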
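Table 2 reports precision, recall and F1 for each detector. The poster does not spell out the metric definitions, so the helper below assumes the standard ones: precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R). It also re-derives F1 from the reported precision and recall as a consistency check, and finds the best method in each column.

```python
def precision_recall_f1(predicted, gold):
    """P/R/F1 for the positive class from two parallel lists of booleans."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 2.
TABLE_2 = {
    "QM": (0.969, 0.846, 0.903),
    "QM or 5W1H": (0.547, 0.973, 0.700),
    "QM or refined 5W1H (H1)": (0.878, 0.916, 0.899),
    "QM or refined 5W1H (H2)": (0.875, 0.925, 0.899),
    "QM or refined 5W1H (H1 and H2)": (0.954, 0.907, 0.930),
    "Rules in (Efron and Winget, 2010)": (0.960, 0.855, 0.904),
    "Question Patterns (Confidence≥0.7)": (0.576, 0.899, 0.702),
    "Question Patterns (Confidence≥0.8)": (0.715, 0.872, 0.786),
    "Question Patterns (Confidence≥0.9)": (0.857, 0.846, 0.851),
}

if __name__ == "__main__":
    for method, (p, r, reported) in TABLE_2.items():
        print(f"{method:36s} computed F1={f1_from(p, r):.3f} reported={reported:.3f}")
    for i, column in enumerate(["precision", "recall", "F1"]):
        best = max(TABLE_2, key=lambda m: TABLE_2[m][i])
        print(f"best {column}: {best}")
```

Every recomputed F1 lies within 0.005 of the reported value. The column maxima are "QM" for precision, "QM or 5W1H" for recall and "QM or refined 5W1H (H1 and H2)" for F1. The high-recall, low-precision "QM or 5W1H" row shows why F1 rather than recall alone is used to compare the rules.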
