
1 Jingjing Liu jingjing@sc.edu A talk at School of Information and Library Science University of North Carolina – Chapel Hill November 30, 2012 Beyond Google: Improving Searchers’ Experience & Satisfaction Using their Behaviors and Contexts

2 Outline: Background; Two approaches that help systems understand and make use of users' behaviors and contexts, and provide desired search results or search assistance: o Personalization of search using behaviors and contextual factors o Understanding and predicting search task difficulty from behaviors; Future directions; Wrap-up

3 Search engines: Do a decent job with simple and unambiguous search tasks. For example: where is the University of North Carolina at Chapel Hill? Google search features: o Automatic query completion o Instant results o Address right there in the search result snippet o Maps, pictures o …

4 [Screenshot: Google search results for the example query]

5 However… Search engines do not do as well for other tasks, which can be ambiguous, e.g., o “jaguar” by a kid looking for fun pictures of the big animal, or o by a car buyer looking for information about the car make

6 Or… Tasks can be complex or difficult, e.g., o Collect information about good rental apartments in Chapel Hill, NC. Why is this not easy? Could be due to… o Searchers' lack of knowledge about the neighborhood o No single webpage available to complete this task o Analysis needs to be done, and decisions need to be made o …

7 The problem: Traditional search systems return results based almost solely on matching keywords against search queries, with little consideration of who, when, where, and why (for what purpose), or the searcher's task at hand. Current search engines have started to incorporate some of these factors into search algorithms, e.g., o location detection o search history o peer use… But there is more that needs to be done

8 Two approaches: Towards personalization of search systems o Tailor search results to specific users and/or user groups o Understanding users' behaviors & the roles of search contexts in interpreting user behaviors for their search interests o Predicting document usefulness based on user behaviors and contextual factors. Understanding and predicting search task difficulty o Characterizing user behaviors in difficult vs. easy tasks o Predicting search task difficulty from behaviors o Understanding why a task is difficult for users o Providing desired assistance

9 Approach 1: Towards Personalization of Search: Understanding the Roles of Contexts. Related publications: Liu, J., & Belkin, N. J. (journal article in process). Exploring the roles of contextual factors in personalizing information search. Liu, J., & Belkin, N. J. (2010). Personalizing information retrieval for multi-session tasks: The roles of task stage and task type. SIGIR '10. Liu, J., & Belkin, N. J. (2010). Personalizing information retrieval for people with different levels of topic knowledge. JCDL '10.

10 Rationale: Tailoring search toward specific users or user groups to improve users' search experience and satisfaction. Further understanding of users & their contexts: o preferences & interests: what information is desired and useful? o a person's individual characteristics: knowledge level, cognitive ability, etc. o tasks at hand: type, difficulty level, complexity, etc. o current situation: time, location, stage, etc. Building models of document usefulness prediction. Techniques: result re-ranking or query reformulation

11 Context: Plays important roles in user-information interaction. An umbrella term covering multiple aspects: individual background, time, tasks, location, culture, economics, …

12 How to learn about user interests? Explicitly asking o Users are unwilling to do so. Implicitly inferring from user behaviors o Not accurate enough

13 Problems with interpreting reading time: Reading time (a behavior) cannot reliably predict document usefulness (user interest), because it varies with contextual factors: tasks, knowledge, cognitive abilities, …

14 A three-way relationship: [Diagram: user behaviors (observable), document usefulness, and context (can be learned explicitly or implicitly)]

15 Research questions: Reading time (behavioral data) alone cannot reliably predict document usefulness (user interest); the contextual factors examined are task stage and task type. RQ1: Does the stage of the user's task help in interpreting time as an indicator of document usefulness? RQ2: If task stage helps, does this role vary across task types?

16 Multi-session tasks: Often seen in everyday life o 25% of web users conducted multi-session searches (Donato, Bonchi, Chi, & Maarek, 2010). Could occur due to o time constraints o difficulties in locating desired information o complexity of the tasks. Provide a natural, reasonable, simple, and meaningful way to identify stages in task completion o Has been used in lab experiment settings (Lin, 2001)

17 General Design & Data Collection: 3-session (stage) lab experiment, one sub-task per session; 24 Journalism/Media Studies undergraduates; 2 task types according to sub-task relationship:
         Task type   Session 1    Session 2    Session 3
S1-S6    Dependent   sub-task 1   sub-task 2   sub-task 3
S7-S12   Dependent   sub-task 1   sub-task 2   sub-task 3
S13-S18  Parallel    sub-task 1   sub-task 2   sub-task 3
S19-S24  Parallel    sub-task 1   sub-task 2   sub-task 3

18 Parallel vs. Dependent Tasks. Scenario: Suppose you are a journalist writing a feature story about hybrid cars. You want to write the article in three sections, one at a time. In a Parallel Task, each sub-task covers one car model: Honda Civic, Toyota Camry, Nissan Altima. In a Dependent Task, the sub-tasks build on one another: collect information on which manufacturers have hybrid cars; select three models to mainly focus on in the article; compare the pros and cons of the three models of hybrid cars.

19 General Design & Data Collection: Each session about an hour: 40 min. searching & report writing; webpage usefulness judgments after report submission (7-point scale); questionnaires eliciting knowledge, task difficulty, etc.

20 General Design & Data Collection: System version by session:
         Task type   Session 1   Session 2   Session 3
S1-S6    Dependent   IE          IE          IE Plus
S7-S12   Dependent   IE          IE          IE
S13-S18  Parallel    IE          IE          IE Plus
S19-S24  Parallel    IE          IE          IE
System: normal IE vs. IE Plus (system-recommended keywords). No system effect on users' reading time

21 Search systems: 2 versions: IE (normal) vs. IE Plus (term recommendation); the IE Plus version displays system-recommended terms. Note: no effect on users' reading time

22 General Design & Data Collection: Logging software: Morae, capturing mouse and keyboard activity, time stamps of each action event, and screen video

23 Data analysis: Reading time (behavioral data) cannot reliably predict document usefulness (user interest); task stage examined as the contextual factor

24 Data analysis method: General Linear Model: Time = α·usefulness + β·stage + γ·usefulness×stage. Examination of the relationship among three variables: 1. time: first reading time on a page (revisits not counted) or total reading time on a page (revisits counted); 2. usefulness: 7-point ratings collapsed into 3 levels (little, somewhat, very useful); 3. stage: 1, 2, 3. Analyses conducted on o both tasks combined o the dependent task o the parallel task
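The descriptive side of the GLM analysis above can be sketched in a few lines. This is a minimal illustration only: the study fit the model in statistical software, and the records and field names below are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def cell_means(observations):
    """Average reading time per (usefulness, stage) cell.

    observations: iterable of (usefulness_level, stage, reading_time).
    Comparing cell means across usefulness levels and stages is the
    descriptive counterpart of the GLM
    Time = a*usefulness + b*stage + c*usefulness*stage.
    """
    cells = defaultdict(list)
    for usefulness, stage, t in observations:
        cells[(usefulness, stage)].append(t)
    return {cell: mean(times) for cell, times in cells.items()}

# Hypothetical page-view records: (usefulness level, session/stage, seconds)
data = [("very", 1, 42.0), ("very", 1, 38.0),
        ("little", 1, 12.0), ("very", 3, 55.0), ("little", 3, 10.0)]
print(cell_means(data)[("very", 1)])  # mean of 42.0 and 38.0: 40.0
```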

25 Total reading time (both tasks): GLM p values: S = .187, U = .000, S×U = .514. [Chart: time index by usefulness level (U) and stage (S)]

26 Total reading time (dependent task): GLM p values: S = .276, U = .000, S×U = .507. [Chart: time index by usefulness level (U) and stage (S)]

27 Total reading time (parallel task): GLM p values: S = .621, U = .000, S×U = .791. [Chart: time index by usefulness level (U) and stage (S)]

28 Summary of these findings: Strong correlation between usefulness and time o Possible reason: writing reports in parallel with searching and reading information o Implication: this time alone can be a reliable indicator of usefulness. No significant difference among stages o Stage did not play a role. No difference between task types o Task type did not play a role. However, total reading time cannot easily be obtained by the system in real time

29 First reading time (both tasks): GLM p values: S = .722, U = .116, S×U = .006. [Chart: time index by usefulness level (U) for stages 1, 2, 3]

30 Summary: both tasks combined: Usefulness and first reading time did not have a significant correlation o First reading time alone is not a reliable indicator of usefulness. Stage and usefulness had a significant interaction on time o Stage played a role

31 First reading time (parallel task): GLM p values: S = .639, U = .869, S×U = .043. [Chart: time index by usefulness level (U) for stages 1, 2, 3]

32 Summary: parallel task: Usefulness and first reading time did not have a significant correlation o First reading time alone is not a reliable indicator of usefulness. Stage and usefulness had a significant interaction on time o Possible explanation: the sub-tasks differed only in car type; users in later sessions could have gained knowledge about what kinds of pages were useful

33 First reading time (dependent task): GLM p values: S = .454, U = .036, S×U = .180. [Chart: time index by usefulness level (U) for stages 1, 2, 3]

34 Summary: dependent task: Usefulness and first reading time had a significant correlation o First dwell time alone could reliably indicate usefulness. Stage and usefulness did not have a significant interaction on time o Stage did not play a role o Possible explanation: the sub-tasks were independent of each other; users' knowledge did not increase across them

35 Summary of findings for reading time: First reading time (which can be easily captured by the system) is not always a reliable indicator of usefulness on its own; stage can help; task type matters

36 The factor of user knowledge: In addition to task stage, we also examined users' knowledge as a contextual factor. It showed similar patterns to the stage factor. When knowledge and stage are both considered, knowledge plays the more significant role; however, stage is more easily determined in practice

37 Significance of the study: Found that contexts do matter in inferring document usefulness from behaviors o task stage o task type o user knowledge. Created a method to explore the effects of contextual factors through lab experiments. Has implications for search system design o Taking task stage, user knowledge, and task type into account could help infer users' interests and accordingly tailor search results to specific searchers

38 Limitations & follow-up studies: Limitations o lab experiment o effect size. Future studies o other contextual factors: other task types, etc. o behaviors other than dwell time: clickthrough, revisits, etc. o naturalistic study o building models of usefulness prediction based on behaviors and contextual factors o prototype building and evaluation

39 Approach 2: Understanding and Predicting Search Task Difficulty. Related publications: Liu, J., & Kim, C. (2012). Search tasks: Why do people feel them difficult? HCIR 2012. Liu, J., Liu, C., Cole, M., Belkin, N. J., & Zhang, X. (2012). Examining and predicting search task difficulty. CIKM 2012. Liu, J., Liu, C., Yuan, X., & Belkin, N. J. (2011). Understanding searchers' perception of task difficulty: Relationships with task type. ASIS&T 2011. Liu, J., Gwizdka, J., Liu, C., & Belkin, N. J. (2010). Predicting task difficulty in different task types. ASIS&T 2010. Liu, J., Liu, C., Gwizdka, J., & Belkin, N. J. (2010). Can search systems detect users' task difficulty? Some behavioral signals. SIGIR 2010.

40 Rationale: Search systems need to be improved for better performance on “difficult” search tasks. It is important for the system to detect when users are having “difficult” tasks: whether and when to intervene and/or provide assistance; prevent users from getting frustrated or switching to other search engines (from an individual search engine's perspective)

41 Task difficulty & search behaviors: Previous studies found that more difficult tasks are associated with users visiting more webpages (Kim, 2006; Gwizdka & Spence, 2006), issuing more queries (Kim, 2006; Aula et al., 2010), and spending more time on Search Engine Result Pages (SERPs) (Aula et al., 2010)

42 Behaviors as task difficulty predictors: Some good predictors of task difficulty (Gwizdka, 2008) o task completion time o number of queries, etc. The problem: these are whole-session level factors, which cannot be obtained until the end of the search

43 Levels of user behaviors: Whole-task-session level o e.g., task completion time, total number of queries, total number of webpage visits o cannot be captured until the end of a session and is therefore not good for real-time system adaptation. Within-task-session level o e.g., dwell (reading) time, number of content pages viewed per query o can be captured in real time
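The within-task-session features above can, in principle, be computed on the fly from a timestamped interaction log. A rough sketch follows; the event tuples and field names are invented for illustration and are not the actual Morae log schema.

```python
def within_session_features(events):
    """Compute within-session features in real time from a timestamped log.

    events: chronological list of (timestamp_seconds, kind, detail),
    with kind in {"query", "page"}. A page's dwell time ends at the
    next logged event; the final page view (with no following event)
    contributes no dwell time, as in live capture.
    Returns mean dwell time on content pages and mean pages per query.
    """
    dwell_times, pages_per_query = [], []
    pages_in_query, queries = 0, 0
    last_page_t = None
    for t, kind, _ in events:
        if last_page_t is not None:  # close out the previous page view
            dwell_times.append(t - last_page_t)
            last_page_t = None
        if kind == "query":
            if queries:
                pages_per_query.append(pages_in_query)
            queries += 1
            pages_in_query = 0
        elif kind == "page":
            pages_in_query += 1
            last_page_t = t
    if queries:
        pages_per_query.append(pages_in_query)
    mean_dwell = sum(dwell_times) / len(dwell_times) if dwell_times else 0.0
    mean_ppq = sum(pages_per_query) / len(pages_per_query) if pages_per_query else 0.0
    return {"mean_dwell": mean_dwell, "pages_per_query": mean_ppq}

# Hypothetical session: three queries, three page views
events = [(0, "query", "q1"), (5, "page", "a"), (15, "page", "b"),
          (25, "query", "q2"), (30, "page", "c"), (40, "query", "q3")]
print(within_session_features(events))
```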

44 The current study: Revisits the relations between task difficulty and user behaviors. Explores which behaviors are significant in predicting task difficulty o especially within-session behaviors

45 Methodology: Controlled lab experiment. 48 student participants o found useful webpages and saved (bookmarked & tagged) them. Search tasks o 12-task pool: 6 pairs o each participant worked on 6 of the 12, choosing one from each pair according to preference o 3 types: FF-S (fact finding, single item), FF-M (fact finding, multiple items), IG-M (information gathering, multiple items) o Task type showed effects on user behaviors, but this presentation does not focus on them

46 FF-S task example: “Everybody talks these days about global warming. By how many degrees (Celsius or Fahrenheit) is the temperature predicted to rise by the end of the XXI century?” One piece of fact

47 FF-M task example: “A friend has just sent an email from an Internet café in the southern USA where she is on a hiking trip. She tells you that she has just stepped into an anthill of small red ants and has a large number of painful bites on her leg. She wants to know what species of ants they are likely to be, how dangerous they are and what she can do about the bites. What will you tell her?” 3 pieces of fact

48 IG-M task example: “You recently heard about the book "Fast Food Nation," and it has really influenced the way you think about your diet. You note in particular the amount and types of food additives contained in the things that you eat every day. Now you want to understand which food additives pose a risk to your physical health, and are likely to be listed on grocery store labels.” Information gathering on 2 concepts

49 Data collection: Post-task questionnaires o ratings of task difficulty, etc. o 5-point scale collapsed to binary (scores 1-3 vs. 4-5). Morae logging software o mouse o keyboard o webpages o time stamp of each activity o screen video

50 Whole-session level behaviors: Pages visited o number of content pages (all, unique) o number of SERPs (all, unique). Queries o number of all queries issued o queries leading to saved pages (number, ratio) o queries not leading to saved pages (number, ratio). Time o task completion time o total time on content pages o total time on SERPs

51 Results: whole-session behaviors in difficult vs. easy tasks. [Charts comparing each whole-session measure between difficult and easy tasks, with p values, e.g., p = .276, p = .000]

52 Within-session level behaviors: Time o first dwell (reading) time (content pages, SERPs) o mean dwell (reading) time (content pages, SERPs). Number of pages per query (all, unique)

53 Within-session behaviors in difficult vs. easy tasks. [Charts comparing each within-session measure between difficult and easy tasks; p values: .778, .386, .217, .660, .000]

54 Prediction models in general: Logistic regression (forward conditional method) o good for binary data prediction o automatically selects the significant variables for the model. Two models o whole-session level model: whole-session level variables considered o within-session level model: only within-session level variables considered

55 Whole-session level model:
Variable                                 p value   B value
Number of unique SERPs                   .000      -.434
Number of queries leading to saved pages .013       .381
Total dwell time on unique SERPs         .008      -.023
log(p/(1-p)) = c - .434 × (number of unique SERPs) + .381 × (number of queries leading to saved pages) - .023 × (total dwell time on unique SERPs), where p is the probability of a task being difficult
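Applying the fitted model means plugging the session's feature values into the inverse logit. A minimal sketch using the coefficients reported on the slide; note the intercept c was not reported, so the value below is a made-up placeholder.

```python
import math

# Coefficients as reported for the whole-session model.
# The intercept c was not reported on the slide; 3.0 is a placeholder.
C = 3.0
B_SERPS, B_SAVING_QUERIES, B_SERP_DWELL = -0.434, 0.381, -0.023

def p_difficult(unique_serps, queries_with_saves, total_serp_dwell):
    """Predicted probability that the task is difficult (inverse logit)."""
    logit = (C + B_SERPS * unique_serps
               + B_SAVING_QUERIES * queries_with_saves
               + B_SERP_DWELL * total_serp_dwell)
    return 1.0 / (1.0 + math.exp(-logit))

# More unique SERPs lowers the predicted difficulty (negative coefficient).
print(p_difficult(10, 2, 60.0) < p_difficult(2, 2, 60.0))  # True
```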

56 Whole-session level model accuracy:
                     Predicted difficult   Predicted easy
Observed difficult           54                  46
Observed easy                20                 168
Prediction accuracy: 77.1%
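The reported accuracy follows directly from the confusion matrix; a quick check:

```python
def accuracy(tp, fn, fp, tn):
    """Fraction of correctly classified sessions in a 2x2 confusion matrix:
    tp = observed difficult, predicted difficult; fn = difficult, predicted easy;
    fp = easy, predicted difficult; tn = easy, predicted easy."""
    return (tp + tn) / (tp + fn + fp + tn)

# Whole-session model counts from the slide: 54/46 (difficult row), 20/168 (easy row).
print(round(accuracy(54, 46, 20, 168) * 100, 1))  # 77.1
```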

57 Within-session level model:
Variable: first dwell time on unique SERPs (p = .027, B = -.028)
log(p/(1-p)) = c - .028 × (first dwell time on unique SERPs), where p is the probability of a task being difficult

58 Within-session level model accuracy:
                     Predicted difficult   Predicted easy
Observed difficult            0                  100
Observed easy                 7                  181
Prediction accuracy: 62.8%

59 Summary: Whole-session level indicators showed higher prediction accuracy, but they cannot be obtained in real time for ongoing sessions. Within-session level indicators o can be used for ongoing sessions o but more behaviors need to be identified to improve overall prediction accuracy

60 A further study (Liu, Liu, Cole, Belkin, & Zhang, 2012, CIKM): Controlled lab experiment using a system built on the TREC 2004 Genomics track collection o documents from 2000-2004 (n = 1.85 million). 38 students in medical/health-related majors o undergraduates to postdocs (treated as graduates). Tasks: TREC 2004 Genomics track topic pool o a pool of 5 topics, selected by their difficulty (using the topic title as the query in our system) o 4 topics per participant. Questionnaires o elicited task difficulty ratings, etc.

61 More behavioral variables: First-round level (8 variables) o captured in the first query round o e.g., first query length, first dwell time on the first SERP, first dwell time on the first viewed document… Accumulated level (16 variables) o captured during the search process o e.g., mean dwell time over all documents, average rank of viewed documents, average query length… Whole-session level (21 variables) o captured by the end of the search session o e.g., task completion time, total number of pages, total number of queries…

62 Difficulty prediction: 8 models in total. Plain models (all variables at each level) o first-round level variables (FR) o accumulated level variables (AC) o first-round + accumulated level variables (FA) o whole-session level variables (WS). Forward conditional (FC) models (significant variables selected using the FC method; 10 runs on 80% samples) o FC_FR o FC_AC o FC_FA o FC_WS. 80/20 cross-validation
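The forward-conditional idea, greedily adding whichever variable most improves the model, and the 80/20 splits can be sketched generically. This is an illustrative stand-in: the scoring callback below is arbitrary, not the study's actual logistic-regression significance criterion.

```python
import random

def forward_select(features, score, max_features=None):
    """Greedy forward selection: repeatedly add the candidate feature that
    most improves score(selected) until no candidate helps.

    score is any callable mapping a feature subset to a number
    (e.g., held-out model accuracy); it stands in here for the
    forward-conditional criterion used with logistic regression.
    """
    selected, best = [], score([])
    remaining = list(features)
    while remaining and (max_features is None or len(selected) < max_features):
        top_score, top_f = max((score(selected + [f]), f) for f in remaining)
        if top_score <= best:  # no remaining feature improves the score
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected

def split_80_20(rows, seed=0):
    """One shuffled 80/20 train/test split, as in the validation runs."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(0.8 * len(rows))
    return rows[:cut], rows[cut:]
```

For example, with a toy score that rewards only features "a" and "b", forward_select(["a", "b", "c"], score) picks exactly those two and stops.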

63 Plain models: recall-precision graph. [Graph]

64 FC models: recall-precision graph. [Graph]

65 Findings summary: User behavior differences were found between easy and difficult tasks at all 3 levels. The real-time prediction model (FA) had fairly good performance (accuracy 83%; precision 88%). Using a limited number of significant factors in the model (FC_FA) gave comparable performance (accuracy 79%; precision 88%). Some significant behavioral predictors of task difficulty can be captured early in the search process

66 Other Research on Task Difficulty

67 Task difficulty and task type: Liu et al., ASIS&T 2011: Understanding searchers' perception of task difficulty: Relationships with task type o post- vs. pre-task difficulty: increase, no change, decrease o users' background factors were not usually related to the change o task type matters

68 Why are tasks difficult? Liu & Kim, HCIR 2012: Search tasks: Why are they difficult? Examples: “I have no knowledge of the subject so I won't know exactly what I'm looking for.” “It was difficult in that I was not familiar with certain cites it was on, it was easier over time after learning more information about the topic.” Implications: o help increase result-page readability o build interfaces that are easy to learn and use

69 Future research directions: Continue the ongoing project on understanding task difficulty. Propose and validate a task difficulty taxonomy. Explore and test ways of helping with different types of difficult tasks/difficulties

70 Acknowledgments: Funding agencies. Mentors & collaborators: Nick Belkin, Xiangmin Zhang, Jacek Gwizdka, Diane Kelly, Chang Liu, Xiaojun Yuan, Michael Cole, Chang Suk Kim, …

71 Thank You! Questions & Comments? Contact: jingjing@sc.edu

72 Selected references: Aula, A., Khan, R., & Guan, Z. (2010). How does search behavior change as search becomes more difficult? Proceedings of CHI '10, 35-44. Donato, D., Bonchi, F., Chi, T., & Maarek, Y. (2010). Do you want to take notes? Identifying research missions in Yahoo! Search Pad. Proceedings of WWW 2010. Gwizdka, J., & Spence, I. (2006). What can searching behavior tell us about the difficulty of information tasks? A study of Web navigation. ASIST '06. Kim, J. (2006). Task difficulty as a predictor and indicator of web searching interaction. CHI '06, 959-964. Lin, S.-J. (2001). Modeling and Supporting Multiple Information Seeking Episodes over the Web. Unpublished dissertation, Rutgers University.

