Mining User Behavior. Eugene Agichtein, Mathematics & Computer Science, Emory University.

Presentation transcript:

1 Mining User Behavior
Eugene Agichtein, Mathematics & Computer Science, Emory University

2 The Big Picture: Intelligent Information Access

3 Text Mining for Patient Medical Care
with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)
Rule Discovery from Medical Literature (MERLIN project):
– Identify articles containing useful clinical knowledge
– Extract new expert system rules, test/modify based on patient DB
Personalized diagnosis and care (PRETEX project):
– Extract relevant clinical variables from text in patient records
– Personalize expert system rules for a given patient or population
– Automatically identify harmful drug interactions and side effects

4 Mining Textual Data in Patient Electronic Medical Records

5 More info: Archana Bhattarai et al., poster at reception this evening

6 From Medical Literature to Structured Clinical Knowledge
Example rule: IF LV_stress_perfusion_is_abnormal THEN STRONG POSITIVE EVIDENCE THAT Diseased_coronary_is(LAD)
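
A rule of this IF/THEN form could be held as plain data and matched against a patient's findings. The sketch below is only an illustration: the `Rule` class, the `fire` function, and the findings dictionary are hypothetical names, not part of MERLIN or PRETEX.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    """One IF/THEN rule (hypothetical representation, not the MERLIN format)."""
    condition: str   # name of a boolean clinical finding
    strength: str    # evidence strength, e.g. "STRONG POSITIVE"
    conclusion: str  # what the evidence supports


# The example rule from the slide, encoded as data.
RULES = [
    Rule("LV_stress_perfusion_is_abnormal", "STRONG POSITIVE",
         "Diseased_coronary_is(LAD)"),
]


def fire(rules: List[Rule], findings: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return (conclusion, strength) for every rule whose condition holds."""
    return [(r.conclusion, r.strength)
            for r in rules
            if findings.get(r.condition, False)]
```

Calling `fire(RULES, {"LV_stress_perfusion_is_abnormal": True})` yields the LAD conclusion with its evidence strength; a missing finding is treated as false.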

7 Baoli Li et al., poster at reception this evening

8 This study claims WHAT?!?
If it’s printed, must be true
– Published studies are never disproven
– Experimental study data is never massaged
Big Pharma funding → overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine
How to evaluate quality/soundness of literature?

10 Challenges
Authority and trust
Privacy of contributors vs. authority
Many dimensions of quality
– Equipment sensitivity
– Recency (studies grow obsolete)
– Size of the clinical trial
– Correlational vs. controlled
– Randomization
– …
Work in progress

11 The Big Picture: Intelligent Information Access

12 Social media: Planetary-scale user behavior experiment
Real information needs and subjective relevance judgments
Traces of many interactions recorded
Allows shared, reproducible experiments
Some semantic organization (tags, categories)

13 Social Media (emerging)

14 Traditional vs. social media

26 Community

34 How to find relevant and high-quality content in social media?

35 Learning-based Approach
[Diagram: content features and community interaction features → relevance, quality → unified ranking function]

36 Ranking Algorithm – GBrank [Zheng 2007]
Start with an initial guess h_0; for k = 1, 2, …
– Using h_{k-1} as the current approximation of h, we separate S into two disjoint sets
– Fit a regression function g_k(x) using Gradient Boosting Tree [Friedman 2001] and the following training data
– Form the new ranking function
[equations not preserved in transcript]
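
The loop on this slide can be sketched in a few lines. The slide's formulas are not preserved, so the details below are assumptions drawn from the usual statement of GBrank: S is a set of preference pairs (better, worse); a pair counts as mis-ordered when h(better) < h(worse) + tau; the regression targets are h(worse) + tau for the better item and h(better) - tau for the worse item; and the update is h_k = (k·h_{k-1} + eta·g_k) / (k + 1). The names `tau`, `eta`, `fit_stump`, and `gbrank` are illustrative, and a single regression stump stands in for the Gradient Boosting Tree to keep the example dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]          # (preferred item, less preferred item)
Scorer = Callable[[Vector], float]


def fit_stump(xs: List[Vector], ts: List[float]) -> Scorer:
    """Least-squares regression stump (stand-in for a gradient boosting tree)."""
    best = None
    for j in range(len(xs[0])):
        for thr in sorted({x[j] for x in xs}):
            left = [t for x, t in zip(xs, ts) if x[j] <= thr]
            right = [t for x, t in zip(xs, ts) if x[j] > thr]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = (sum((t - lm) ** 2 for t in left)
                   + sum((t - rm) ** 2 for t in right))
            if best is None or err < best[0]:
                best = (err, j, thr, lm, rm)
    if best is None:                   # no valid split: predict the mean
        mean = sum(ts) / len(ts)
        return lambda x: mean
    _, j, thr, lm, rm = best
    return lambda x: lm if x[j] <= thr else rm


def _blend(prev: Scorer, g: Scorer, k: int, eta: float) -> Scorer:
    """Assumed update: h_k = (k * h_{k-1} + eta * g_k) / (k + 1)."""
    return lambda x: (k * prev(x) + eta * g(x)) / (k + 1)


def gbrank(pairs: List[Pair], rounds: int = 20,
           tau: float = 1.0, eta: float = 1.0) -> Scorer:
    h: Scorer = lambda x: 0.0          # initial guess h_0
    for k in range(1, rounds + 1):
        xs: List[Vector] = []
        ts: List[float] = []
        for better, worse in pairs:
            hb, hw = h(better), h(worse)
            if hb < hw + tau:          # pair not yet ordered with margin tau
                xs += [better, worse]
                ts += [hw + tau, hb - tau]
        if not xs:                     # every pair is correctly ordered
            break
        h = _blend(h, fit_stump(xs, ts), k, eta)
    return h
```

For example, `gbrank([((3.0,), (1.0,)), ((2.0,), (0.0,))])` returns a scoring function that places the first item of each pair above the second.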

37 Experimental Results
[Chart legend: Removing textual features; Removing community interaction features; Baseline; GBrank]

38 Intelligent Information Access

39 User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
– Published: 4 Gb/day; Social media: 10 Gb/day
– Page views: 100 Gb/day
[Andrew Tomkins, Yahoo! Search, 2007]

40 Clickthrough for Queries with Known Position of Top Relevant Result
[Chart: relative clickthrough for queries with known relevant results in positions 1 and 3, respectively]
Higher clickthrough at the top non-relevant document than at the top relevant document
E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
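
One way to account for the position bias noted on this slide is to compare a query's clickthrough at each rank with the background clickthrough at that rank across all queries. The sketch below assumes a toy log format of (query, clicked positions) tuples; `background_ctr` and `click_deviation` are illustrative names, and the code is not the feature extraction from the cited paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Impression = Tuple[str, List[int]]   # (query, clicked result positions, 1-based)


def background_ctr(log: List[Impression], max_pos: int = 10) -> Dict[int, float]:
    """Fraction of impressions with a click at each position."""
    if not log:
        return {}
    clicks = Counter(p for _, positions in log
                     for p in set(positions) if 1 <= p <= max_pos)
    return {p: clicks[p] / len(log) for p in range(1, max_pos + 1)}


def click_deviation(log: List[Impression], query: str,
                    max_pos: int = 10) -> Dict[int, float]:
    """Observed clickthrough for `query` minus the all-query background,
    per position. Positive values mean more clicks than position alone
    would predict."""
    expected = background_ctr(log, max_pos)
    observed = background_ctr([imp for imp in log if imp[0] == query], max_pos)
    return {p: observed[p] - expected[p] for p in observed}
```

A result at position 3 that draws far more clicks than position 3 usually does gets a large positive deviation, even if its raw clickthrough is lower than that of position 1.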

41 Full Search Engine, User Behavior: NDCG, MAP
Method      MAP     Gain
RN          0.270
RN+ALL              (19.13%)
BM25
BM25+ALL            (23.71%)
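
For reference, the two metrics named on this slide can be computed as below. This is a sketch of the standard textbook definitions, with two assumptions: average precision is normalized by the relevant results present in the ranked list, and DCG uses the (2^gain - 1) / log2(rank + 1) form. The function names are illustrative, and the evaluation code behind the slide's numbers may differ.

```python
import math
from typing import Sequence


def average_precision(rels: Sequence[int]) -> float:
    """rels[i] is 1 if the result at rank i+1 is relevant, else 0."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank       # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: Sequence[Sequence[int]]) -> float:
    """MAP: mean of average precision over queries."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top k results."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains[:k], start=1))


def ndcg(gains: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0


def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement over a baseline, as a fraction."""
    return (improved - baseline) / baseline
```

The "Gain" column on the slide is a relative improvement of this kind, expressed as a percentage of the baseline score.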

42 User Behavior Complements Content and Web Topology
Method                        Score   Gain
RN (Content + Links)          0.632
RN + All (User Behavior)              (10%)
BM25
BM25+All                              (31%)

43 Fine grained behavior analysis

44 Data captured with Tobii eye tracker, courtesy Andy Edmonds,

45 Preliminary results on using mouse trajectories to infer user intent Q. Guo and E. Agichtein, to appear in SIGIR 2008
