1 Natural Language Emory
Eugene Agichtein (Math & Computer Science and CCI)
Andrew Post (CCI and Biomedical Engineering) (?)

2 Projects in the IR Lab (Agichtein Lab)

3 NLP & Text Mining Projects in IRLab
EMText: Information Extraction from Text in Electronic Medical Records
Other projects:
–Collaborative filtering for medical literature
–Recognizing textual entailment (TAC 2008 RTE track)
–Web-scale semantic network extraction

4 Information Extraction from EMR Text
Electronic Medical Records (EMRs) contain important metadata for analysis, data mining, and decision support
–Example: a patient who has had diabetes should have a different interpretation of MPI results, depending on how long, how severe, and how long since it has been controlled
–This information often resides in the text of the EMR (physician/nurse reports, notes, discharge summaries)
Challenges:
–Access to data
–Inconsistent information
–Little or no manually labeled data

5 I2B2 NLP 2008 Obesity Challenge (SUNY/MIT/Partners Healthcare)
Participated in the I2B2 NLP Obesity Challenge
–The challenge: build systems that correctly replicate the textual and intuitive judgments of obesity experts on obesity and 15 co-morbidities, based on narrative patient records
Our approach: machine learning over lexical, semantic, and statistical features
–Words, phrases, UMLS terms in text
–Negation
–Corpus co-occurrence statistics
–SVM, boosting, and TBL to combine predictions
Outcome:
–Much room for improvement exists in both accuracy and efficiency; a great learning experience
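A minimal sketch of the general approach above (lexical bag-of-words features fed to a linear SVM). The two toy records and labels are invented for illustration; UMLS term lookup, negation handling, and the challenge's actual label scheme are omitted.

    # Hypothetical sketch only: lexical features + linear SVM over narrative text.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.svm import LinearSVC
    from sklearn.pipeline import make_pipeline

    records = [
        "Patient with long-standing obesity and type 2 diabetes, poorly controlled.",
        "No history of obesity; weight within normal limits.",
    ]
    labels = ["present", "absent"]  # per-comorbidity textual judgment

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2), lowercase=True),  # words and phrases
        LinearSVC(),                                           # linear SVM classifier
    )
    clf.fit(records, labels)
    print(clf.predict(["Obese patient, BMI 41, on metformin."]))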

6 I2B2 NLP Challenge 2010

7 User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
–Published content: 4 GB/day; social media: 10 GB/day
–Page views: 100 GB/day [Andrew Tomkins, Yahoo! Search, 2007]

8 Web Search User Behavior: A Goldmine of Noisy Data
Relative clickthrough for queries with known relevant results in positions 1 and 3, respectively
Higher clickthrough at the top non-relevant document than at the top relevant document

9 Approach: Go Beyond Clickthrough/Download Counts
Presentation features:
–ResultPosition: position of the URL in the current ranking
–QueryTitleOverlap: fraction of query terms in the result title
Clickthrough features:
–DeliberationTime: seconds between the query and the first click
–ClickFrequency: fraction of all clicks landing on the page
–ClickDeviation: deviation from the expected click frequency
Browsing features:
–DwellTime: result page dwell time
–DwellTimeDeviation: deviation from the expected dwell time for the query
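A hedged sketch of how a few of the clickthrough features above could be computed from a query log; the tuple-based log format and the uniform "expected" click distribution are assumptions for illustration, not the system's actual definitions.

    # Illustrative only: DeliberationTime, ClickFrequency, ClickDeviation from a toy log.
    from collections import Counter

    # (query, clicked_url, seconds_from_query_to_click)
    log = [
        ("emory library hours", "http://web.library.emory.edu", 4.2),
        ("emory library hours", "http://web.library.emory.edu", 3.1),
        ("emory library hours", "http://example.org/other", 9.8),
    ]

    clicks_per_url = Counter(url for _, url, _ in log)
    total_clicks = sum(clicks_per_url.values())
    expected = 1.0 / len(clicks_per_url)  # naive expectation: uniform over clicked URLs

    for url, n in clicks_per_url.items():
        click_frequency = n / total_clicks
        click_deviation = click_frequency - expected
        print(url, round(click_frequency, 2), round(click_deviation, 2))

    deliberation_time = min(t for _, _, t in log)  # seconds to the first click
    print("DeliberationTime:", deliberation_time)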

10 Example Results: Predicting User Preferences
Baseline < SA+N < CD << UserBehavior
Rich user behavior features result in a dramatic improvement

11 User Behavior Complements Content and Web Topology
RN (Content + Links) vs. RN + All (User Behavior): +10%
BM25 vs. BM25 + All: +31%

12 Instrumenting the Emory Library and Beyond
Evaluate effectiveness of search/discovery with behavioral, task-specific metrics
–Perform aggregate, longitudinal studies
Develop tools for usability studies "in the wild"
–Scale (hundreds/thousands of "participants")
–Realistic behavior and tasks
–On-demand playback of "interesting" sessions
Unified analysis/query framework for internal and external resource access and usage statistics
–Web-based query and statistics interface
–Access auditing, privacy, and anonymity enforced

13 Emory User Behavior Analysis System (EUBA)
EUBA components:
–Client-side instrumentation (Firefox toolbar)
–Data mining/machine learning components
–Log DB management system; web-based interface for querying, playback, and annotation
Plan: release the system to the research/library community (Q2 2009?)

14 Simple Features
Basic features:
–Trajectory length
–Horizontal range
–Vertical range
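A minimal sketch of the three basic features, assuming the mouse trajectory is available as a list of (x, y) samples.

    # Illustrative sketch: basic mouse-trajectory features from (x, y) samples.
    import math

    def basic_features(points):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        # trajectory length: sum of distances between consecutive samples
        length = sum(
            math.hypot(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(points, points[1:])
        )
        return {
            "trajectory_length": length,
            "horizontal_range": max(xs) - min(xs),
            "vertical_range": max(ys) - min(ys),
        }

    print(basic_features([(0, 0), (3, 4), (3, 10)]))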

15 Mouse Movement Representation Features (Intelligent Information Access Lab)
Second representation:
–5 segments: initial, early, middle, late, and end
–Each segment: speed, acceleration, rotation, slope, etc.
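A sketch of the second representation, assuming timestamped (x, y, t) samples split into five equal-length segments; only speed and slope are shown, a simplification of the full per-segment feature set listed above.

    # Illustrative sketch: 5 segments (initial, early, middle, late, end) with simple stats.
    import math

    def segment_features(samples, n_segments=5):
        # samples: list of (x, y, t) tuples, t in seconds
        size = max(1, len(samples) // n_segments)
        names = ["initial", "early", "middle", "late", "end"]
        features = {}
        for i, name in enumerate(names):
            seg = samples[i * size:(i + 1) * size] or samples[-2:]
            dist = sum(math.hypot(b[0] - a[0], b[1] - a[1]) for a, b in zip(seg, seg[1:]))
            dt = max(seg[-1][2] - seg[0][2], 1e-6)
            dx, dy = seg[-1][0] - seg[0][0], seg[-1][1] - seg[0][1]
            features[name] = {
                "speed": dist / dt,           # pixels per second within the segment
                "slope": math.atan2(dy, dx),  # overall direction of the segment
            }
        return features

    samples = [(i, i * i * 0.1, i * 0.05) for i in range(50)]  # synthetic trajectory
    print(segment_features(samples)["middle"])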

16 Summary of Experimental Results
Client-side behavior mining significantly outperforms aggregate, server-side measures for user intent detection and satisfaction prediction tasks
Can be used even if the user does not generate a server-trackable action (e.g., click or download)
Feasible to perform inference on a single search instance rather than aggregating across different users/searchers

17 Outline
Overview of Intelligent Information Access Lab research
–Information retrieval & extraction, text mining, and data integration
–User behavior modeling, interactions, and collaborative filtering
Mining user-generated content
Current and future collaborations

18 User Generated Content

19

20 Some Goals of Mining Social Media
Find high-quality content
Find relevant and high-quality content
Use millions of interactions to:
–Understand complex information needs
–Model subjective information seeking
–Understand cultural dynamics

21

22

23

24

25

26

27

28

29 Community

30

31

32

33

34

35 Editorial Quality != User Perception!

36 Lifecycle of a Question
User: choose a category, compose the question, open the question
Community: answer
Asker examines the answers: found the answer?
–Yes: the asker closes the question, chooses the best answer, and gives ratings
–No: the question is closed by the system and the best answer is chosen by voters

37 Yahoo! Answers: The Good News
Active community of millions of users in many countries and languages
Has accumulated a great number of questions and answers
Effective for subjective information needs
–Great forum for socialization/chat
(Can be) invaluable for hard-to-find information not available on the web

38

39 Yahoo! Answers: The Bad News
May have to wait a long time to get a satisfactory answer
May never obtain a satisfying answer
[Chart: time to close a question (hours) for sample categories: 2006 FIFA World Cup, Optical, Poetry, Football (American), Scottish Football (Soccer), Medicine, Winter Sports, Special Education, General Health Care, Outdoor Recreation]

40 The Problem of Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the asker will be satisfied with the answers contributed by the community
–"Satisfied" is defined as: the asker personally has closed the question AND selected the best answer AND provided a rating of at least 3 "stars" for the best answer
–Otherwise, the asker is "Unsatisfied"
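The labeling rule above, written out as a small sketch; the record field names are hypothetical placeholders for whatever the crawled question data actually contains.

    # Illustrative: the "satisfied" label as defined above. Field names are placeholders.
    def is_satisfied(question):
        return (
            question.get("closed_by_asker", False)              # asker personally closed it
            and question.get("best_answer_chosen_by_asker", False)
            and question.get("asker_rating", 0) >= 3             # at least 3 "stars"
        )

    example = {"closed_by_asker": True,
               "best_answer_chosen_by_asker": True,
               "asker_rating": 4}
    print(is_satisfied(example))  # True -> "Satisfied"; anything else -> "Unsatisfied"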

41 Satisfaction Prediction Framework
Approach: classification algorithms from machine learning
Feature groups: Question, Answer, Asker History, Answerer History, Category, Textual
Classifiers: Support Vector Machines, Decision Tree, Boosting, Naïve Bayes
Output: asker is satisfied / asker is not satisfied

42 Question-Answer Features
Q: length, posting time, ...
Q: votes
Q: terms
QA: length, KL divergence
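One plausible way to compute the question-answer KL divergence feature, using unigram distributions with add-one smoothing; the exact formulation used in the original system may differ.

    # Illustrative: KL divergence between question and answer unigram distributions.
    import math
    from collections import Counter

    def kl_divergence(question_text, answer_text):
        q_counts = Counter(question_text.lower().split())
        a_counts = Counter(answer_text.lower().split())
        vocab = set(q_counts) | set(a_counts)
        q_total = sum(q_counts.values()) + len(vocab)   # add-one smoothing
        a_total = sum(a_counts.values()) + len(vocab)
        kl = 0.0
        for w in vocab:
            p = (q_counts[w] + 1) / q_total   # smoothed question distribution
            q = (a_counts[w] + 1) / a_total   # smoothed answer distribution
            kl += p * math.log(p / q)
        return kl

    print(kl_divergence("how do I reset my router",
                        "unplug the router wait ten seconds and plug it back in"))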

43 User Features
U: member since
U: total points
U: #questions
U: #answers

44 Category Features
CA: average time to close a question
CA: average # answers per question
CA: average asker rating
CA: average voter rating
CA: average # questions per hour
CA: average # answers per hour
Example (General Health category): #Q, #A, #A per Q, % satisfied, average asker rating, time to close by asker (about a day and 13 hours)
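A sketch of computing per-category aggregates like the CA features above with pandas; the column names and toy values are assumptions about how the crawled data might be laid out.

    # Illustrative: per-category aggregates (CA features). Column names are assumed.
    import pandas as pd

    questions = pd.DataFrame({
        "category":       ["General Health", "General Health", "Mathematics"],
        "num_answers":    [4, 7, 2],
        "hours_to_close": [37.0, 20.5, 0.4],
        "asker_rating":   [4, 2, 5],
    })

    ca = questions.groupby("category").agg(
        avg_hours_to_close=("hours_to_close", "mean"),
        avg_answers_per_q=("num_answers", "mean"),
        avg_asker_rating=("asker_rating", "mean"),
    )
    print(ca)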

45 Classification Algorithms (Weka implementations)
Decision Tree
–C4.5 (confidence factor): Ross Quinlan (1993)
–RandomForest: Leo Breiman (2001)
Support Vector Machine: J. Platt (1999)
Boosting (AdaBoost): Yoav Freund and Robert E. Schapire (1996)
Naïve Bayes: George H. John and Pat Langley (1995)

46 Methods
Heuristic: # answers
Baseline: simply predicts the majority class (satisfied)
ASP_SVM: our system with the SVM classifier
ASP_C4.5: with the C4.5 classifier
ASP_RandomForest: with the RandomForest classifier
ASP_Boosting: with the AdaBoost algorithm combining weak learners
ASP_NaiveBayes: with the Naive Bayes classifier

47 Evaluation Metrics
Precision
–The fraction of predicted satisfied asker information needs that were indeed rated satisfactory by the asker
Recall
–The fraction of all rated-satisfied questions that were correctly identified by the system
F-score
–The harmonic mean of precision and recall, computed as 2*(precision*recall)/(precision+recall)
Accuracy
–The overall fraction of instances classified correctly into the proper class
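The four metrics above, written as a small self-contained sketch that treats "satisfied" as the positive class; this is a minimal illustration, not the original evaluation script.

    # Illustrative: precision, recall, F1, and accuracy for the "satisfied" class.
    def evaluate(gold, predicted, positive="satisfied"):
        tp = sum(g == positive and p == positive for g, p in zip(gold, predicted))
        fp = sum(g != positive and p == positive for g, p in zip(gold, predicted))
        fn = sum(g == positive and p != positive for g, p in zip(gold, predicted))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        accuracy = sum(g == p for g, p in zip(gold, predicted)) / len(gold)
        return precision, recall, f1, accuracy

    gold = ["satisfied", "satisfied", "unsatisfied", "satisfied"]
    pred = ["satisfied", "unsatisfied", "unsatisfied", "satisfied"]
    print(evaluate(gold, pred))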

48 Dataset
Crawled from Yahoo! Answers in early 2008
Data is available at
Columns: #Questions, #Answers, #Askers, #Categories, % Satisfied
Values: 216,170 questions; 1,963,615 answers; 158,515 askers; 100 categories

49 Dataset (cont.)
Realistic prediction task: given an asker's previous history, predict satisfaction with her current (most recent) question
216,170 questions, 1,963,615 answers, 158,515 askers, 100 categories
Split: the most recent 10,000 questions are randomized; 5,000 random questions for training and the rest for test
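A sketch of the split as reconstructed above (keep each asker's most recent question, take the newest pool, randomize, and hold out a random subset for training); the field names and exact protocol are assumptions based on this slide.

    # Illustrative: most-recent-question-per-asker pool, randomized train/test split.
    import random

    def split(questions, n_pool=10_000, n_train=5_000, seed=0):
        # questions: list of dicts with hypothetical "asker_id" and "timestamp" fields
        latest = {}
        for q in questions:
            prev = latest.get(q["asker_id"])
            if prev is None or q["timestamp"] > prev["timestamp"]:
                latest[q["asker_id"]] = q          # most recent question per asker
        pool = sorted(latest.values(), key=lambda q: q["timestamp"], reverse=True)[:n_pool]
        random.Random(seed).shuffle(pool)
        return pool[:n_train], pool[n_train:]      # train, test

    toy = [{"asker_id": i % 50, "timestamp": i} for i in range(500)]
    train, test = split(toy, n_pool=40, n_train=20)
    print(len(train), len(test))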

50 Dataset Statistics
Columns: Category, #Q, #A, #A per Q, % Satisfied, Avg asker rating, Time to close by asker
Sample rows: 2006 FIFA World Cup (TM) (closed within minutes), Mental Health (about a day and 13 hours), Mathematics (minutes), Diet & Fitness (days)
Asker satisfaction varies significantly across different categories
#Q, #A, time to close, ... -> asker satisfaction

51 Human Satisfaction Prediction
Truth: the asker's rating
A random sample of 130 questions
Annotated by researchers to calibrate asker satisfaction
–Agreement: 0.82
–F1: 0.45

52 Human Satisfaction Prediction (cont'd): Amazon Mechanical Turk
A service provided by Amazon: workers submit responses to a Human Intelligence Task (HIT) for a small fee
HIT setup:
–Used the same 130 questions
–For each question, list the best answer, as well as the four other answers ordered by votes
–Five independent raters for each question
–Agreement: 0.9; F1:
–Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be "satisfied" (otherwise labeled "unsatisfied")
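The at-least-4-of-5 rater aggregation rule mentioned above, as a tiny sketch.

    # Illustrative: label a question "satisfied" only if >= 4 of the 5 raters agreed.
    def aggregate(ratings, threshold=4):
        # ratings: list of 5 booleans, one per independent Mechanical Turk rater
        return "satisfied" if sum(ratings) >= threshold else "unsatisfied"

    print(aggregate([True, True, True, True, False]))    # satisfied
    print(aggregate([True, True, False, False, False]))  # unsatisfied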

53 Amazon Mechanical Turk

54 Comparison of Classifiers (F-score)
Columns: Classifier, With Text, Without Text, Selected Features
Rows: ASP_SVM, ASP_C4.5, ASP_RandomForest, ASP_Boosting, ASP_NB, Human (0.61), Baseline (0.66)
C4.5 is the most effective classifier for this task
Human F1 performance is lower than the naïve baseline!

55 F1 (Satisfied) with Varying Training Sizes
ASP_C4.5 substantially outperforms the others
2,000 training questions are sufficient to achieve 0.75 F1

56 Features by Information Gain (Satisfied)
Q: asker's previous rating
Q: average past rating by asker
UH: member since (interval)
UH: average # answers for past questions
UH: previous question resolved for the asker
CA: average asker rating for the category
UH: total number of answers received
CA: average voter rating
Q: question posting time
CA: average # answers per question

57 "Offline" vs. "Online" Prediction
Offline prediction:
–All features (question, answer, asker & category)
–F1: 0.77
Online prediction:
–All answer features
–Question features (stars, #comments, sum of votes, ...)
–F1: 0.74

58 Feature Ablation
Columns: Precision, Recall, F1
Rows: Selected features; No question-answer features; No answerer features; No category features; No asker features; No question features
Asker and question features are the most important
Answer quality, answerer expertise, and category characteristics may not be as important: caring or supportive answers might sometimes be preferred

59 Satisfaction with Varying Experience
Group together questions from askers with the same number of previous questions
Prediction accuracy increases dramatically with experience
Reaches an F1 of 0.9 for askers with >= 5 previous questions
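A sketch of the grouping step above, assuming each record carries a hypothetical count of the asker's previous questions and a per-question prediction outcome.

    # Illustrative: bucket questions by asker experience, then report per-bucket accuracy.
    from collections import defaultdict

    records = [
        {"prev_questions": 0, "correct": True},
        {"prev_questions": 0, "correct": False},
        {"prev_questions": 5, "correct": True},
        {"prev_questions": 7, "correct": True},
    ]

    buckets = defaultdict(list)
    for r in records:
        key = r["prev_questions"] if r["prev_questions"] < 5 else ">=5"
        buckets[key].append(r["correct"])

    for key, outcomes in sorted(buckets.items(), key=str):
        print(key, sum(outcomes) / len(outcomes))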

60 Summary
Asker satisfaction is predictable
–Can achieve higher-than-human accuracy by exploiting history
The user's experience is important
General model: one-size-fits-all
–2,000 questions are enough to train the model
Current work:
–Personalized satisfaction prediction
–Y. Liu and E. Agichtein. You've Got Answers: Towards Personalized Models for Predicting Success in Community Question Answering (ACL 2008)

61 ACL08
Textual features only become helpful for users with more than 20 questions
Personalized classifiers achieve surprisingly good accuracy
For users with only 1 previous question, personalized classifiers work very well
A simple strategy of grouping users by the number of previous questions is even more effective than other methods for users with a moderate amount of history
For users with few questions, non-textual features are dominant
For users with many questions, textual features are more significant

62 Some Personalized Models

63

64 Other Tasks
Subjectivity and sentiment analysis
–B. Li, Y. Liu, and E. Agichtein. CoCQA: Co-Training Over Questions and Answers with an Application to Predicting Question Subjectivity Orientation. EMNLP 2008
Discourse analysis
Cross-cultural comparisons
CQA vs. web search comparison

65 Outline
Overview of Intelligent Information Access Lab research
–Information retrieval & extraction, text mining, and data integration
–User behavior modeling, interactions, and collaborative filtering
Mining user-generated content
Current and future research