
1 Natural Language Processing @ Emory
Eugene Agichtein, Math & Computer Science and CCI
Andrew Post, CCI and Biomedical Engineering (?)

2 Projects in the IR Lab (Agichtein Lab)

3 NLP & Text Mining Projects in IRLab
EMText: Information Extraction from Text in Electronic Medical Records
Other projects:
– Collaborative filtering for medical literature
– Recognizing textual entailment (TAC 2008 RTE track)
– Web-scale semantic network extraction

4 Information Extraction from EMR Text
Electronic Medical Records (EMRs) contain important metadata for analysis, data mining, and decision support
– Example: a patient who has had diabetes should have a different interpretation of MPI results, depending on how long, how severe, and how long since it has been controlled
– This information often resides in the text of the EMR (physician/nurse reports, notes, discharge summaries)
Challenges:
– Access to data
– Inconsistent information
– Little or no manually labeled data

5 I2B2 NLP 2008 Obesity Challenge (SUNY/MIT/Partners Healthcare)
Participated in the I2B2 2008 NLP Obesity Challenge
– The challenge: build systems that correctly replicate the textual and intuitive judgments of obesity experts on obesity and 15 co-morbidities, based on narrative patient records
Our approach: machine learning over lexical, semantic, and statistical features
– Words, phrases, UMLS terms in text
– Negation
– Corpus co-occurrence statistics
– SVM, boosting, TBL to combine predictions
Outcome:
– Much room for improvement remains in both accuracy and efficiency; a great learning experience

6 I2B2 NLP Challenge 2010

7 User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
– Published: 4 GB/day; social media: 10 GB/day
– Page views: 100 GB/day
[Andrew Tomkins, Yahoo! Search, 2007]

8 Web search user behavior: goldmine of noisy data
Relative clickthrough for queries with known relevant results in positions 1 and 3, respectively
Higher clickthrough at top non-relevant than at top relevant document

9 Approach: go beyond clickthrough/download counts

Presentation
– ResultPosition: position of the URL in current ranking
– QueryTitleOverlap: fraction of query terms in result title
Clickthrough
– DeliberationTime: seconds between query and first click
– ClickFrequency: fraction of all clicks landing on page
– ClickDeviation: deviation from expected click frequency
Browsing
– DwellTime: result page dwell time
– DwellTimeDeviation: deviation from expected dwell time for query
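For concreteness, the clickthrough features above can be sketched from a per-query click log. This is a minimal illustration; the record fields below are assumptions for the sketch, not the actual log schema used in the work:

```python
from dataclasses import dataclass

# Hypothetical click-log record; field names are illustrative only.
@dataclass
class ClickEvent:
    query: str
    url: str
    position: int            # rank of the clicked result
    seconds_after_query: float

def deliberation_time(events):
    """DeliberationTime: seconds between the query and the first click."""
    return min(e.seconds_after_query for e in events)

def click_frequency(events, url):
    """ClickFrequency: fraction of all clicks for this query landing on `url`."""
    return sum(e.url == url for e in events) / len(events)

def click_deviation(events, url, expected_freq):
    """ClickDeviation: observed click frequency minus the frequency expected
    for that result (the expectation would come from aggregate logs)."""
    return click_frequency(events, url) - expected_freq
```

The dwell-time features on the Browsing row would be computed the same way from page-view timestamps rather than click timestamps.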

10 Example results: Predicting User Preferences
Baseline < SA+N < CD << UserBehavior
Rich user behavior features result in dramatic improvement

11 User Behavior Complements Content and Web Topology

Method                   | P@1   | Gain
RN (Content + Links)     | 0.632 |
RN + All (User Behavior) | 0.693 | 0.061 (10%)
BM25                     | 0.525 |
BM25 + All               | 0.687 | 0.162 (31%)

12 Instrumenting the Emory Library and Beyond
Evaluate effectiveness of search/discovery with behavioral, task-specific metrics
– Perform aggregate, longitudinal studies
Develop tools for usability studies "in the wild"
– Scale (hundreds/thousands of "participants")
– Realistic behavior and tasks
– On-demand playback of "interesting" sessions
Unified analysis/query framework for internal and external resource access and usage statistics
– Web-based query and statistics interface
– Access auditing, privacy, anonymity enforced

13 Emory User Behavior Analysis System (EUBA)
EUBA:
– Client-side instrumentation (Firefox toolbar)
– Data mining/machine learning components
– Log DB management system; web-based interface for querying, playback, annotation
Plan: release the system to the research/library community (Q2 2009?)

14 Simple features
Basic features:
– Trajectory length
– Horizontal range
– Vertical range

15 Mouse Movement Representation Features
Intelligent Information Access Lab, http://ir.mathcs.emory.edu/
Second representation:
– 5 segments: initial, early, middle, late, and end
– Each segment: speed, acceleration, rotation, slope, etc.
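A minimal sketch of both feature sets on slides 14 and 15, assuming the trajectory arrives as (x, y, t) samples; the sampling format and segment boundaries are assumptions for illustration:

```python
import math

def basic_features(points):
    """Basic features: trajectory length, horizontal range, vertical range.
    `points` is a list of (x, y, t) mouse samples."""
    xs = [x for x, _, _ in points]
    ys = [y for _, y, _ in points]
    length = sum(math.hypot(x2 - x1, y2 - y1)
                 for (x1, y1, _), (x2, y2, _) in zip(points, points[1:]))
    return {"trajectory_length": length,
            "horizontal_range": max(xs) - min(xs),
            "vertical_range": max(ys) - min(ys)}

def segment_speeds(points, n_segments=5):
    """Second representation: split the trajectory into 5 segments
    (initial, early, middle, late, end) and compute per-segment speed.
    Acceleration, rotation, slope, etc. would be derived analogously."""
    n = len(points)
    bounds = [round(i * (n - 1) / n_segments) for i in range(n_segments + 1)]
    speeds = []
    for s in range(n_segments):
        seg = points[bounds[s]:bounds[s + 1] + 1]
        dist = sum(math.hypot(x2 - x1, y2 - y1)
                   for (x1, y1, _), (x2, y2, _) in zip(seg, seg[1:]))
        dt = seg[-1][2] - seg[0][2]
        speeds.append(dist / dt if dt > 0 else 0.0)
    return speeds
```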

16 Summary of Experimental Results
Client-side behavior mining significantly outperforms aggregate, server-side measures for user intent detection and satisfaction tasks
Can be used even if the user does not generate a server-trackable action (e.g., click or download)
Feasible to perform inference on a single search instance vs. aggregating across different users/searchers

17 Outline
Overview of Intelligent Information Access Lab research
– Information retrieval & extraction, text mining, and data integration
– User behavior modeling, interactions, and collaborative filtering
Mining user-generated content
Current and future collaborations

18 User Generated Content

19 http://answers.yahoo.com/question/index;_ylt=3?qid=20071008115118AAh1HdO

20 Some goals of mining social media
Find high-quality content
Find relevant and high-quality content
Use millions of interactions to:
– Understand complex information needs
– Model subjective information seeking
– Understand cultural dynamics


29 Community


35 Editorial Quality != User Perception!

36 Lifecycle of a Question
Asker: choose a category, compose the question, open the question
Community: contribute answers; asker examines them
Found the answer? Yes: asker closes the question, chooses the best answer, gives ratings
No: question is closed by the system; best answer is chosen by voters

37 Yahoo! Answers: The Good News
Active community of millions of users in many countries and languages
Accumulated a great number of questions and answers
Effective for subjective information needs
– Great forum for socialization/chat
(Can be) invaluable for hard-to-find information not available on the web


39 Yahoo! Answers: The Bad News
May have to wait a long time to get a satisfactory answer
May never obtain a satisfying answer
[Chart: time to close a question (hours) for sample question categories: 1. 2006 FIFA World Cup, 2. Optical, 3. Poetry, 4. Football (American), 5. Scottish Football (Soccer), 6. Medicine, 7. Winter Sports, 8. Special Education, 9. General Health Care, 10. Outdoor Recreation]

40 The Problem of Asker Satisfaction
Given a question submitted by an asker in CQA, predict whether the user will be satisfied with the answers contributed by the community.
– Where "Satisfied" is defined as: the asker personally has closed the question, AND selected the best answer, AND provided a rating of at least 3 "stars" for the best answer
– Otherwise, the asker is "Unsatisfied"
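The definition above is mechanical enough to encode directly. A sketch with illustrative field names (not an actual Yahoo! Answers schema):

```python
def is_satisfied(question):
    """Label a question "Satisfied" per the definition above: the asker
    closed it, selected a best answer, and rated that answer >= 3 stars.
    The dict keys are illustrative assumptions."""
    return (question["closed_by_asker"]
            and question["asker_selected_best_answer"]
            and question["asker_rating"] is not None
            and question["asker_rating"] >= 3)
```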

41 Satisfaction Prediction Framework
Approach: classification algorithms from machine learning
Features: question features, answer features, textual features, category features, asker history features, answerer history features
Classifiers: Support Vector Machines, decision trees, boosting, Naïve Bayes
Output: asker is satisfied / asker is not satisfied

42 Question-Answer Features
Q: length, posting time, votes, terms, …
QA: length, KL divergence between question and answer
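The KL-divergence feature compares the question's and answer's word distributions. A sketch with add-alpha smoothing; the unigram tokenization and smoothing scheme are assumptions, not the authors' exact formulation:

```python
import math
from collections import Counter

def kl_divergence(question_text, answer_text, alpha=0.01):
    """Smoothed KL divergence D(Q || A) between the question's and the
    answer's unigram distributions; small values mean the answer's
    vocabulary closely matches the question's.  `alpha` is add-alpha
    smoothing so the divergence stays finite when the answer lacks a
    question term."""
    q = Counter(question_text.lower().split())
    a = Counter(answer_text.lower().split())
    vocab = set(q) | set(a)
    qn = sum(q.values()) + alpha * len(vocab)
    an = sum(a.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        pq = (q[w] + alpha) / qn
        pa = (a[w] + alpha) / an
        kl += pq * math.log(pq / pa)
    return kl
```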

43 User Features
U: Member since
U: Total points
U: #Questions
U: #Answers

44 Category Features
CA: Average time to close a question
CA: Average # answers per question
CA: Average asker rating
CA: Average voter rating
CA: Average # questions per hour
CA: Average # answers per hour

Category       | #Q  | #A  | #A per Q | Satisfied | Avg asker rating | Time to close by asker
General Health | 134 | 737 | 5.46     | 70.4%     | 4.49             | 1 day and 13 hours

45 Classification Algorithms
Weka implementation
– http://www.cs.waikato.ac.nz/ml/weka
Decision trees
– C4.5: confidence factor 0.05; Ross Quinlan (1993)
– RandomForest: Leo Breiman (2001)
Support Vector Machine: J. Platt (1999)
Boosting (AdaBoost): Yoav Freund, Robert E. Schapire (1996)
Naïve Bayes: George H. John, Pat Langley (1995)

46 Methods
Heuristic: # answers
Baseline: simply predict the majority class (satisfied)
ASP_SVM: our system with the SVM classifier
ASP_C4.5: with the C4.5 classifier
ASP_RandomForest: with the RandomForest classifier
ASP_Boosting: with the AdaBoost algorithm combining weak learners
ASP_NaiveBayes: with the Naive Bayes classifier
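The baseline and heuristic above are simple enough to state directly. A minimal sketch; the answer-count threshold is an illustrative assumption, not a value from the slides:

```python
def majority_baseline(train_labels):
    """Baseline: always predict the most frequent class seen in training
    (here, "satisfied")."""
    return max(set(train_labels), key=train_labels.count)

def answer_count_heuristic(num_answers, threshold=1):
    """Heuristic: predict satisfaction from the answer count alone.
    The threshold of 1 is an assumption for the sketch."""
    return "satisfied" if num_answers > threshold else "unsatisfied"
```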

47 Evaluation metrics
Precision
– The fraction of predicted-satisfied asker information needs that were indeed rated satisfactory by the asker
Recall
– The fraction of all rated-satisfied questions that were correctly identified by the system
F-score
– The harmonic mean of precision and recall, computed as 2*(precision*recall)/(precision+recall)
Accuracy
– The overall fraction of instances classified correctly into the proper class
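These four metrics can be computed in a few lines, exactly as defined above, treating "satisfied" as the positive class:

```python
def prf_accuracy(y_true, y_pred, positive="satisfied"):
    """Precision, recall, F1 (harmonic mean of P and R), and accuracy
    for the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1, correct / len(y_true)
```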

48 Dataset
Crawled from Yahoo! Answers in early 2008
Data is available at http://ir.mathcs.emory.edu/QuestionAnswerAsker

Questions | Answers   | Askers  | Categories | % Satisfied
216,170   | 1,963,615 | 158,515 | 100        | 50.7%

49 Dataset (cont.)
Realistic prediction task: given an asker's previous history, predict satisfaction with her current (most recent) question
216,170 questions, 1,963,615 answers, 158,515 askers, 100 categories
Split: the most recent 10,000 questions and a random 5,000 questions, randomized into training and test sets
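One plausible reading of the split diagram, sketched in code; the `timestamp` field and the exact train/test proportions are assumptions, not a confirmed protocol:

```python
import random

def realistic_split(questions, pool_size=10000, test_size=5000):
    """Take the most recent `pool_size` questions, shuffle them, and hold
    out `test_size` for testing; the remainder trains the model.  This is
    a sketch of the slide's diagram, not the authors' exact procedure."""
    recent = sorted(questions, key=lambda q: q["timestamp"])[-pool_size:]
    random.shuffle(recent)
    return recent[test_size:], recent[:test_size]  # (train, test)
```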

50 Dataset Statistics

Category                | #Q    | #A     | #A per Q | Satisfied | Avg asker rating | Time to close by asker
2006 FIFA World Cup(TM) | 11943 | 356593 | 29.86    | 55.4%     | 2.63             | 47 minutes
Mental Health           | 151   | 1159   | 7.68     | 70.9%     | 4.30             | 1 day and 13 hours
Mathematics             | 651   | 2329   | 3.58     | 44.5%     | 4.48             | 33 minutes
Diet & Fitness          | 450   | 2436   | 5.41     | 68.4%     | 4.30             | 1.5 days

Asker satisfaction varies significantly across different categories.
#Q, #A, time to close, … -> asker satisfaction

51 Human Satisfaction Prediction
Truth: asker's rating
A random sample of 130 questions
Annotated by researchers to calibrate asker satisfaction
– Agreement: 0.82
– F1: 0.45

52 Human Satisfaction Prediction (cont'd): Amazon Mechanical Turk
A service provided by Amazon; workers submit responses to a Human Intelligence Task (HIT) for a small fee
HIT:
– Used the same 130 questions
– For each question, listed the best answer, as well as the four other answers ordered by votes
– Five independent raters for each question
– Agreement: 0.9; F1: 0.61
– Best accuracy achieved when at least 4 out of 5 raters predicted the asker to be "satisfied" (otherwise, labeled "unsatisfied")

53 Amazon Mechanical Turk

54 Comparison of Classifiers (F-score)

Classifier       | With Text | Without Text | Selected Features
ASP_SVM          | 0.69      | 0.72         | 0.62
ASP_C4.5         | 0.75      | 0.76         | 0.77
ASP_RandomForest | 0.70      | 0.74         | 0.68
ASP_Boosting     | 0.67      | 0.67         | 0.67
ASP_NB           | 0.61      | 0.65         | 0.58
Human            | 0.61      |              |
Baseline         | 0.66      |              |

C4.5 is the most effective classifier in this task
Human F1 performance is lower than the naïve baseline!

55 F1 (Satisfied) with varying training sizes
ASP_C4.5 substantially outperforms the others
2,000 questions are sufficient to achieve 0.75 F1

56 Features by Information Gain (Satisfied)

0.14219 | Q: Asker's previous rating
0.13965 | Q: Average past rating by asker
0.10237 | UH: Member since (interval)
0.04878 | UH: Average # answers for past Q
0.04878 | UH: Previous Q resolved for the asker
0.04381 | CA: Average asker rating for the category
0.04306 | UH: Total number of answers received
0.03274 | CA: Average voter rating
0.03159 | Q: Question posting time
0.02840 | CA: Average # answers per Q
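The ranking above scores each feature by information gain with respect to the satisfied/unsatisfied label. For a discrete feature it can be computed directly (a generic sketch, not the authors' code):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a label list."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG = H(label) - sum_v p(v) * H(label | feature = v):
    how much knowing the feature's value reduces label uncertainty."""
    n = len(labels)
    by_value = {}
    for v, y in zip(feature_values, labels):
        by_value.setdefault(v, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in by_value.values())
    return entropy(labels) - conditional
```

A continuous feature such as "member since (interval)" would first be discretized into bins before applying this.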

57 "Offline" vs. "Online" Prediction
Offline prediction:
– All features (question, answer, asker & category)
– F1: 0.77
Online prediction:
– All answer features
– Question features (stars, #comments, sum of votes …)
– F1: 0.74

58 Feature Ablation

Features                     | Precision | Recall | F1
Selected features            | 0.80      | 0.73   | 0.77
No question-answer features  | 0.76      | 0.74   | 0.75
No answerer features         | 0.76      | 0.75   |
No category features         | 0.75      | 0.76   | 0.75
No asker features            | 0.72      | 0.69   | 0.71
No question features         | 0.68      | 0.72   | 0.70

Asker & question features are most important.
Answer quality / answerer expertise / category characteristics may not be important: caring or supportive answers might sometimes be preferred.
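The ablation protocol behind this table is a simple loop: drop one feature group at a time and re-train. `train_and_eval` below is a hypothetical callable standing in for the full train/test pipeline:

```python
def ablation_study(feature_groups, train_and_eval):
    """Re-train with each feature group removed in turn.
    `feature_groups` is a list of group names; `train_and_eval` maps a
    list of kept group names to a score (e.g., F1).  Both the interface
    and the group names are illustrative assumptions."""
    results = {"all": train_and_eval(list(feature_groups))}
    for group in feature_groups:
        kept = [g for g in feature_groups if g != group]
        results[f"no {group}"] = train_and_eval(kept)
    return results
```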

59 Satisfaction with varying experience
Group together questions from askers with the same number of previous questions
Accuracy of prediction increases dramatically
Reaches F1 of 0.9 for askers with >= 5 questions

60 Summary
Asker satisfaction is predictable
– Can achieve higher-than-human accuracy by exploiting history
User's experience is important
General model: one-size-fits-all
– 2,000 questions are enough for training the model
Current work:
– Personalized satisfaction prediction
– Y. Liu, E. Agichtein. You've Got Answers: Towards Personalized Models for Predicting Success in Community Question Answering (ACL 2008)

61 ACL08
Textual features only become helpful for users with more than 20 questions
Personalized classifiers achieve surprisingly good accuracy
For users with only 1 previous question, personalized classifiers work very well
The simple strategy of grouping users by number of previous questions is even more effective than other methods for users with a moderate amount of history
For users with few questions, non-textual features are dominant
For users with many questions, textual features are more significant

62 Some personalized models


64 Other tasks
Subjectivity, sentiment analysis
– B. Li, Y. Liu, and E. Agichtein. CoCQA: Co-Training over Questions and Answers with an Application to Predicting Question Subjectivity Orientation, in EMNLP 2008
Discourse analysis
Cross-cultural comparisons
CQA vs. web search comparison

65 Outline
Overview of Intelligent Information Access Lab research
– Information retrieval & extraction, text mining, and data integration
– User behavior modeling, interactions, and collaborative filtering
Mining user-generated content
Current and future research


