Published by Deborah Russell. Modified over 8 years ago.
1 Mining User Behavior
Eugene Agichtein, Mathematics & Computer Science, Emory University
2 The Big Picture: Intelligent Information Access
3 Text Mining for Patient Medical Care
with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)
Rule Discovery from Medical Literature (MERLIN project):
– Identify articles containing useful clinical knowledge
– Extract new expert system rules, test/modify based on patient DB
Personalized diagnosis and care (PRETEX project):
– Extract relevant clinical variables from text in patient records
– Personalize expert system rules for a given patient or population
– Automatically identify harmful drug interactions and side effects
4 Mining Textual Data in Patient Electronic Medical Records
5 More info: Archana Bhattarai et al., poster at reception this evening
6 From Medical Literature to Structured Clinical Knowledge
Example rule: IF LV_stress_perfusion_is_abnormal THEN STRONG POSITIVE EVIDENCE THAT Diseased_coronary_is(LAD)
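One way to hold a rule like this in a program is a small record plus a matcher. This is only an illustrative sketch: the `Rule` class, its field names, and the `fire` helper are hypothetical and are not taken from the MERLIN project.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """A single IF-THEN rule; the field names are hypothetical."""
    condition: str   # name of a boolean finding that must hold
    strength: str    # evidence label copied from the rule text
    conclusion: str  # hypothesis that the evidence supports


# The example rule from the slide, encoded with the assumed fields.
EXAMPLE_RULE = Rule(
    condition="LV_stress_perfusion_is_abnormal",
    strength="STRONG POSITIVE",
    conclusion="Diseased_coronary_is(LAD)",
)


def fire(rules: List[Rule], findings: Dict[str, bool]) -> List[str]:
    """Return one readable line for every rule whose condition is true."""
    return [
        f"{r.strength} EVIDENCE THAT {r.conclusion}"
        for r in rules
        if findings.get(r.condition, False)
    ]
```

Calling `fire([EXAMPLE_RULE], {"LV_stress_perfusion_is_abnormal": True})` returns the conclusion line; a patient record without that finding returns an empty list.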
7 Baoli Li et al., poster at reception this evening
8 This study claims WHAT?!?
If it’s printed, must be true
– Published studies are never disproven
– Experimental study data is never massaged
Big Pharma funding overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine
How to evaluate quality/soundness of literature?
9 www.falsemed.org
10 Challenges
Authority and trust
Privacy of contributors vs. authority
Many dimensions of quality
– Equipment sensitivity
– Recency (studies grow obsolete)
– Size of the clinical trial
– Correlational vs. controlled
– Randomization
– …
Work in progress
11 The Big Picture: Intelligent Information Access
12 Social media: Planetary-scale user behavior experiment
Real information needs and subjective relevance judgments
Traces of many interactions recorded
Allows shared, reproducible experiments
Some semantic organization (tags, categories)
13 Social Media (emerging)
14 Traditional vs. social media
26 Community
34 How to find relevant and high-quality content in social media?
35 Learning-based Approach
Diagram labels: content features; community interaction features; relevance; quality; unified ranking function
36 Ranking Algorithm – GBrank [Zheng 2007]
Start with an initial guess h_0; for k = 1, 2, …
– Using h_{k-1} as the current approximation of h, we separate S into two disjoint sets [equation not preserved]
– Fit a regression function g_k(x) using Gradient Boosting Tree [Friedman 2001] and the following training data [equation not preserved]
– Form the new ranking function as [equation not preserved]
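The loop can be sketched in a few lines of Python. The slide's equations are not in this transcript, so the details below are assumptions based on the commonly cited form of GBrank: S is a set of preference pairs (x preferred over y), a pair counts as violated when h(x) < h(y) + tau, violated pairs produce regression targets h(y) + tau for x and h(x) - tau for y, and the update is h_k = (k·h_{k-1} + eta·g_k) / (k + 1). The margin `tau`, shrinkage `eta`, and all function names are hypothetical, and a one-split regression stump stands in for the gradient boosting tree to keep the sketch dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (preferred item, less preferred item)
Scorer = Callable[[Vector], float]


def fit_stump(X: List[Vector], y: List[float]) -> Scorer:
    """Least-squares regression stump; a stand-in for a boosted tree."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            left = [v for x, v in zip(X, y) if x[j] <= t]
            right = [v for x, v in zip(X, y) if x[j] > t]
            if not right:
                continue  # a threshold at the maximum gives no split
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, ml, mr)
    if best is None:  # every feature is constant: predict the mean target
        mean = sum(y) / len(y)
        return lambda x: mean
    _, j, t, ml, mr = best
    return lambda x: ml if x[j] <= t else mr


def blend(prev: Scorer, g: Scorer, k: int, eta: float) -> Scorer:
    """Assumed update: h_k = (k * h_{k-1} + eta * g_k) / (k + 1)."""
    return lambda x: (k * prev(x) + eta * g(x)) / (k + 1)


def gbrank(pairs: List[Pair], rounds: int = 20,
           tau: float = 1.0, eta: float = 0.5) -> Scorer:
    h: Scorer = lambda x: 0.0  # initial guess h_0
    for k in range(1, rounds + 1):
        X: List[Vector] = []
        y: List[float] = []
        for better, worse in pairs:
            hb, hw = h(better), h(worse)
            if hb < hw + tau:  # pair is mis-ordered or inside the margin
                X.append(better)
                y.append(hw + tau)  # push the preferred item up
                X.append(worse)
                y.append(hb - tau)  # push the other item down
        if not X:
            break  # every pair already satisfies the margin
        h = blend(h, fit_stump(X, y), k, eta)
    return h
```

Given toy pairs such as `[((0.9, 0.1), (0.2, 0.3))]`, the returned scorer should rank the first vector of each pair above the second. A production version would replace `fit_stump` with a real gradient boosted tree learner.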
37 Experimental Results Removing textual features Removing community interaction features Baseline GBrank
38 Intelligent Information Access
39 User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
– Published: 4 Gb/day; Social media: 10 Gb/day
– Page views: 100 Gb/day
[Andrew Tomkins, Yahoo! Search, 2007]
40 Clickthrough for Queries with Known Position of Top Relevant Result
Relative clickthrough for queries with known relevant results in position 1 and 3, respectively
Higher clickthrough at top non-relevant than at top relevant document
E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
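A per-position clickthrough distribution of this kind can be computed from a click log roughly as follows. The log format, a list of (query id, clicked position) tuples, and the function name are assumptions for illustration; the toy data is invented and does not come from the study.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def relative_clickthrough(clicks: Iterable[Tuple[str, int]]) -> Dict[int, float]:
    """Fraction of all clicks that landed on each result position."""
    counts = Counter(position for _, position in clicks)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: n / total for pos, n in sorted(counts.items())}


# Invented toy log: the relevant result sits at position 3,
# yet most clicks still go to position 1.
toy_log = [("q1", 1), ("q1", 3), ("q2", 1), ("q3", 1)]
shares = relative_clickthrough(toy_log)
```

Comparing these shares between queries whose top relevant result is at position 1 and those where it is at position 3 is one way to expose the position bias the slide describes.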
41 Full Search Engine, User Behavior: NDCG, MAP
Method      MAP     Gain
RN          0.270
RN+ALL      0.321   0.052 (19.13%)
BM25        0.236
BM25+ALL    0.292   0.056 (23.71%)
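For reference, MAP and NDCG can be computed as in the sketch below. The talk does not say which variants were used, so two assumptions apply: average precision is normalized by the number of relevant results in the ranked list, and DCG uses the common (2^gain - 1) / log2(rank + 1) form. Function names are illustrative.

```python
import math
from typing import List, Sequence


def average_precision(rels: Sequence[int]) -> float:
    """AP for one ranked list of binary labels (1 = relevant)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this relevant rank
    return total / hits if hits else 0.0


def mean_average_precision(runs: List[Sequence[int]]) -> float:
    """MAP: mean of per-query average precision."""
    return sum(average_precision(r) for r in runs) / len(runs)


def dcg(gains: Sequence[int], k: int) -> float:
    """Discounted cumulative gain over the top k graded labels."""
    return sum((2 ** g - 1) / math.log2(i + 1)
               for i, g in enumerate(gains[:k], start=1))


def ndcg(gains: Sequence[int], k: int) -> float:
    """DCG divided by the DCG of the ideal (sorted) ordering."""
    ideal = dcg(sorted(gains, reverse=True), k)
    return dcg(gains, k) / ideal if ideal > 0 else 0.0
```

For example, a ranking with relevant results at ranks 1 and 3 has average precision (1/1 + 2/3) / 2, and any ranking already sorted by gain has NDCG of 1.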
42 User Behavior Complements Content and Web Topology
Method                      P@1     Gain
RN (Content + Links)        0.632
RN + All (User Behavior)    0.693   0.061 (10%)
BM25                        0.525
BM25+All                    0.687   0.162 (31%)
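The Gain column can be checked with a little arithmetic: the absolute gain is the difference between the two P@1 scores, and the percentage is that difference divided by the baseline. The helper names below are illustrative, and the inputs are the rounded values from the table.

```python
from typing import List, Sequence, Tuple


def precision_at_1(runs: List[Sequence[int]]) -> float:
    """Fraction of queries whose top-ranked result is relevant."""
    return sum(1 for r in runs if r and r[0]) / len(runs)


def gain(baseline: float, improved: float) -> Tuple[float, float]:
    """Absolute gain and gain relative to the baseline."""
    diff = improved - baseline
    return diff, diff / baseline


rn_abs, rn_rel = gain(0.632, 0.693)      # RN -> RN + All
bm_abs, bm_rel = gain(0.525, 0.687)      # BM25 -> BM25+All
```

With these inputs the absolute gains round to 0.061 and 0.162, and the relative gains to about 10% and 31%, matching the table. Small mismatches elsewhere, such as a reported 0.052 where the rounded scores differ by 0.051, would be expected if the reported gains were computed from unrounded scores.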
43 Fine grained behavior analysis
44 Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/
45 Preliminary results on using mouse trajectories to infer user intent Q. Guo and E. Agichtein, to appear in SIGIR 2008
46 Summary
http://www.ir.mathcs.emory.edu/