Mining User Behavior
Eugene Agichtein, Mathematics & Computer Science, Emory University

2 1 Mining User Behavior
Eugene Agichtein, Mathematics & Computer Science, Emory University

3 2 The Big Picture: Intelligent Information Access

4 3 Text Mining for Patient Medical Care
with E. V. Garcia (Emory SoM) and A. Ram (Georgia Tech)
Rule Discovery from Medical Literature (MERLIN project):
–Identify articles containing useful clinical knowledge
–Extract new expert system rules, test/modify based on patient DB
Personalized diagnosis and care (PRETEX project):
–Extract relevant clinical variables from text in patient records
–Personalize expert system rules for a given patient or population
–Automatically identify harmful drug interactions and side effects

5 4 Mining Textual Data in Patient Electronic Medical Records

6 5 More info: Archana Bhattarai et al., poster at reception this evening

7 6 From Medical Literature to Structured Clinical Knowledge
Example rule: IF LV_stress_perfusion_is_abnormal THEN STRONG POSITIVE EVIDENCE THAT Diseased_coronary_is(LAD)
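One way such an extracted rule could be held as structured data is sketched below. This is only an illustration: the `Rule` class, its field names, and `apply_rules` are hypothetical and are not taken from the MERLIN or PRETEX systems.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Rule:
    """Hypothetical container for one IF/THEN evidence rule."""
    condition: str   # name of a boolean clinical finding
    strength: str    # e.g. "STRONG POSITIVE"
    conclusion: str  # hypothesis the finding supports


def apply_rules(rules: List[Rule], findings: Dict[str, bool]) -> List[str]:
    """Return the evidence statements whose condition holds for this patient."""
    return [
        f"{r.strength} EVIDENCE THAT {r.conclusion}"
        for r in rules
        if findings.get(r.condition, False)  # a missing finding counts as not observed
    ]


# The example rule from the slide, encoded with the assumed fields.
lad_rule = Rule(
    condition="LV_stress_perfusion_is_abnormal",
    strength="STRONG POSITIVE",
    conclusion="Diseased_coronary_is(LAD)",
)

print(apply_rules([lad_rule], {"LV_stress_perfusion_is_abnormal": True}))
```

Keeping the condition as a named finding makes it straightforward to evaluate the same rule against variables extracted from different patient records.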

8 7 Baoli Li et al., poster at reception this evening

9 8 This study claims WHAT?!?
If it’s printed, must be true
–Published studies are never disproven
–Experimental study data is never massaged
Big Pharma funding → overstated claims
R. Smith, 2005: Medical journals are an extension of the marketing arm of pharmaceutical companies, PLoS Medicine
How to evaluate quality/soundness of literature?

10 9 www.falsemed.org

11 10 Challenges
Authority and trust
Privacy of contributors vs. authority
Many dimensions of quality
–Equipment sensitivity
–Recency (studies grow obsolete)
–Size of the clinical trial
–Correlational vs. controlled
–Randomization
–…
Work in progress

12 11 The Big Picture: Intelligent Information Access

13 12 Social media: Planetary-scale user behavior experiment
Real information needs and subjective relevance judgments
Traces of many interactions recorded
Allows shared, reproducible experiments
Some semantic organization (tags, categories)

14 13 Social Media (emerging)

15 14 Traditional vs. social media

27 26 Community

35 34 How to find relevant and high-quality content in social media?

36 35 Learning-based Approach
[Diagram labels: Content features; Community interaction features; Relevance; Quality; Unified Ranking Function]

37 36 Ranking Algorithm – GBrank [Zheng 2007]
Start with an initial guess h_0; for k = 1, 2, …
–Using h_{k-1} as the current approximation of h, we separate S into two disjoint sets [set definitions not preserved in transcript]
–Fit a regression function g_k(x) using Gradient Boosting Tree [Friedman 2001] and the following training data [training data not preserved in transcript]
–Form the new ranking function as [equation not preserved in transcript]
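The loop above can be sketched in a few lines of Python. The slide's equations did not survive in the transcript, so the split rule, regression targets, and update used here follow the usual published description of GBrank as best recalled and should be checked against [Zheng 2007]: S is taken to be a set of preference pairs (x preferred over y); a pair counts as violated when h(x) < h(y) + tau; violated pairs yield targets (x, h(y) + tau) and (y, h(x) - tau); and h_k = (k·h_{k-1} + eta·g_k) / (k + 1). The names `tau`, `eta`, `fit_stump`, and `gbrank` are assumptions. A one-split regression stump stands in for the Gradient Boosting Tree to keep the sketch dependency-free.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[Vector, Vector]  # (preferred item, less preferred item)


def fit_stump(xs: List[Vector], ys: List[float]) -> Callable[[Vector], float]:
    """Least-squares regression stump (a stand-in for a gradient boosting tree)."""
    mean_all = sum(ys) / len(ys)
    best_err, best_split = float("inf"), None
    for f in range(len(xs[0])):
        for t in sorted({x[f] for x in xs}):
            left = [y for x, y in zip(xs, ys) if x[f] <= t]
            right = [y for x, y in zip(xs, ys) if x[f] > t]
            if not left or not right:
                continue
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            err = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if err < best_err:
                best_err, best_split = err, (f, t, lm, rm)
    if best_split is None:  # no usable split: predict the mean target
        return lambda x: mean_all
    f, t, lm, rm = best_split
    return lambda x: lm if x[f] <= t else rm


def gbrank(pairs: List[Pair], rounds: int = 20, tau: float = 1.0,
           eta: float = 1.0) -> Callable[[Vector], float]:
    learners: List[Callable[[Vector], float]] = []  # g_1, g_2, ...

    def h(x: Vector) -> float:
        # Unrolls h_k = (k * h_{k-1} + eta * g_k) / (k + 1), starting from h_0 = 0.
        score = 0.0
        for k, g in enumerate(learners, start=1):
            score = (k * score + eta * g(x)) / (k + 1)
        return score

    for _ in range(rounds):
        xs: List[Vector] = []
        ys: List[float] = []
        for better, worse in pairs:
            hb, hw = h(better), h(worse)
            if hb < hw + tau:  # pair is not yet ordered with margin tau
                xs += [better, worse]
                ys += [hw + tau, hb - tau]
        if not xs:  # every pair is satisfied: stop early
            break
        learners.append(fit_stump(xs, ys))
    return h


# Toy usage: three items with one feature, where a larger value is preferred.
a, b, c = (3.0,), (2.0,), (1.0,)
ranker = gbrank([(a, b), (b, c), (a, c)], rounds=2)
print(sorted([c, a, b], key=ranker, reverse=True))
```

Only the violated pairs contribute training targets in each round, which is what distinguishes this scheme from fitting a regression to absolute relevance labels.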

38 37 Experimental Results
[Chart legend: Removing textual features; Removing community interaction features; Baseline; GBrank]

39 38 Intelligent Information Access

40 39 User Behavior: The 3rd Dimension of the Web
Amount exceeds web content and structure
–Published: 4Gb/day; Social Media: 10Gb/day
–Page views: 100Gb/day
[Andrew Tomkins, Yahoo! Search, 2007]

41 40 Clickthrough for Queries with Known Position of Top Relevant Result
[Figure: relative clickthrough for queries with known relevant results in position 1 and 3 respectively]
Higher clickthrough at top non-relevant than at top relevant document
E. Agichtein, E. Brill, and S. Dumais, SIGIR 2006
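A minimal sketch of how such a per-position clickthrough breakdown might be computed from a click log. The log format (one tuple per click, giving the known position of the top relevant result and the position that was clicked) and the function name are assumptions, and the toy clicks are made up for illustration; they are not data from the SIGIR 2006 study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def clickthrough_by_position(
    clicks: Iterable[Tuple[int, int]],
) -> Dict[int, Dict[int, float]]:
    """Group clicks by the position of the top relevant result and return,
    for each group, the fraction of clicks that landed on each position."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for relevant_pos, clicked_pos in clicks:
        counts[relevant_pos][clicked_pos] += 1
    return {
        rel: {pos: n / sum(c.values()) for pos, n in sorted(c.items())}
        for rel, c in counts.items()
    }


# Made-up log: (position of top relevant result, clicked position).
toy_clicks = [(1, 1), (1, 1), (1, 2), (1, 1),
              (3, 1), (3, 3), (3, 1), (3, 2)]
for rel, dist in clickthrough_by_position(toy_clicks).items():
    print(f"top relevant result at position {rel}: {dist}")
```

In the toy group where the relevant result sits at position 3, position 1 still collects the largest share of clicks, which is the kind of position bias the slide describes.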

42 41 Full Search Engine, User Behavior: NDCG, MAP
Method      MAP     Gain
RN          0.270
RN+ALL      0.321   0.052 (19.13%)
BM25        0.236
BM25+ALL    0.292   0.056 (23.71%)

43 42 User Behavior Complements Content and Web Topology
Method                      P@1     Gain
RN (Content + Links)        0.632
RN + All (User Behavior)    0.693   0.061 (10%)
BM25                        0.525
BM25+All                    0.687   0.162 (31%)
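For reference, the metrics named in these tables can be computed per query roughly as follows. This is a sketch using common textbook definitions, not necessarily the exact variants used in the reported experiments: NDCG here uses gain 2^rel - 1 with a log2(rank + 1) discount, and average precision assumes every relevant document appears in the ranked list. MAP is the mean of average precision over queries. The function names are illustrative.

```python
import math
from typing import List


def precision_at_1(rels: List[int]) -> float:
    """1.0 if the top-ranked result is relevant, else 0.0."""
    return 1.0 if rels and rels[0] > 0 else 0.0


def average_precision(rels: List[int]) -> float:
    """Mean of precision values at each rank holding a relevant result."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels, start=1):
        if rel > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def ndcg(rels: List[int], k: int = 10) -> float:
    """DCG of the ranking divided by the DCG of the ideal reordering."""
    def dcg(vals: List[int]) -> float:
        return sum((2 ** v - 1) / math.log2(i + 1)
                   for i, v in enumerate(vals[:k], start=1))
    ideal = dcg(sorted(rels, reverse=True))
    return dcg(rels) / ideal if ideal else 0.0


def relative_gain(base: float, improved: float) -> float:
    """Relative improvement, as in the Gain columns' percentages."""
    return (improved - base) / base


# Relevance labels for one ranked list (graded: 0 = not relevant).
print(precision_at_1([0, 1]), average_precision([1, 0, 1]), ndcg([1, 2, 3]))
# Relative gain of BM25+All over BM25 on P@1, using the table's values.
print(round(100 * relative_gain(0.525, 0.687)))
```

Recomputing the gains from the rounded table values can differ slightly from the printed figures, since the originals were presumably derived from unrounded scores.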

44 43 Fine grained behavior analysis

45 44 [Figure: eye-tracking data with numbered fixation points]
Data captured with Tobii eye tracker, courtesy Andy Edmonds, http://www.alwaysbetesting.com/

46 45 Preliminary results on using mouse trajectories to infer user intent Q. Guo and E. Agichtein, to appear in SIGIR 2008

47 46 Summary
http://www.ir.mathcs.emory.edu/

