Web Behavior Analysis. Your Last Words? (in the 22nd century) To family? To your best friend?

Presentation transcript:

1 Web Behavior Analysis

2 Your Last Words? (in the 22nd century) To family? To your best friend?

3 Web Behavior Analysis: Why important? Why scary?

4 Part I: Why Important?

5 Q. In the past six months, have you used a search engine to help inform your decisions for the following tasks? 66% of people are using search more frequently to make decisions. We rely more and more on search for our real-life decisions: –Opportunities for business –Concerns for privacy

6 Length of Sessions by Type. What should be done? Focus on new territory.

7 Taxonomy of Web queries: Navigational (we are good at this) –to reach a particular site, e.g., searching for the top page of a company. Informational –to acquire pages that provide knowledge for the user's information need (conventional ad hoc retrieval). Transactional –to perform a Web-mediated activity, e.g., online shopping.

8 Navigational Queries, Pseudo-Navigational Queries. Example: Good and Bad

9 Examples of Hard Queries (Informational/Transactional): "Car GPS around $300"; "Four-day trip to Bhutan from Delhi to visit important Buddhist places"

10 Game Consoles; Party Site

11 What we want?

12 Current research directions: How to classify queries? Then what? –Search engines are trying to reduce clicks for hard queries –Extracting info from forums

13 Importance of query classification: "obama" –Informational: people may search to learn more about Barack Obama –Navigational: to visit his official website –Transactional: perhaps the user's goal is to donate money online to support Mr. Obama's campaign

14 Yahoo numbers: ~25 informational (content text?), ~40 navigational (anchor text?), ~35 transactional (site template?)

15 Can you tell if a query is navigational or not?

16 Lee et al. [WWW05]: Overview. Analyze how the query term is used in anchor texts. For Q = WWW2008, the anchors point to the top page of WWW2008: destinations are identical, so the query is navigational. For Q = search, the anchors point to a description in Wikipedia, a search engine, and so on: destinations are diverse, so the query is informational.

17 Anchor-link distribution (ALD): the probability that the page linked by anchor text t is d. For t = WWW2008, the ALD is skewed (concentrated on the top page of WWW2008): navigational. For t = search, the ALD is uniform (spread over Google, Yahoo!, Wikipedia, ...): informational.
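
A minimal sketch of the ALD idea, assuming the Web's anchors are available as hypothetical (anchor text, destination) pairs. It estimates P(d | t) by counting. To measure skew, it uses the share of the most-linked destination with a hypothetical 0.5 threshold. Lee et al.'s exact statistic and threshold may differ.

```python
from collections import Counter

# Hypothetical (anchor text, destination) pairs; labels echo the slide's figure.
LINKS = [
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "WWW2008 top page"),
    ("WWW2008", "Wikipedia"),
    ("search", "Google"),
    ("search", "Yahoo!"),
    ("search", "Wikipedia"),
]


def anchor_link_distribution(links, term):
    """Estimate P(d | t) from anchors whose text is exactly `term`."""
    counts = Counter(dest for text, dest in links if text == term)
    total = sum(counts.values())
    return {dest: n / total for dest, n in counts.items()} if total else {}


def classify(links, term, threshold=0.5):
    """Skewed ALD -> navigational, spread-out ALD -> informational.

    `threshold` is a hypothetical cut-off on the top destination's share.
    """
    ald = anchor_link_distribution(links, term)
    if not ald:
        return None  # no anchor text equals the query: ALD cannot be computed
    return "navigational" if max(ald.values()) >= threshold else "informational"


print(classify(LINKS, "WWW2008"))  # navigational (top share 0.75)
print(classify(LINKS, "search"))   # informational (top share 1/3)
```

The `None` branch is exactly the weakness raised on the next slide: a query with no identical anchor text gets no ALD at all.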

18 Lee et al.: Problem. It targets only anchor texts that are exactly the same as the query –If no anchor text identical to the query exists on the Web, the ALD cannot be computed. Problematic queries: –Long phrases, e.g., "information retrieval system research" –Multiple keywords, e.g., "trec, nist, test collection"

19 Multi-query solution: Query Q = "trec, test collection" → terms T = {trec, test, collection} (t = trec, t = test, t = collection) → destinations D = {d1, d2, …}. Compute the ALD on a term-by-term basis and integrate the results.

20 Computation of classification score: entropy of D; entropy of a single term t; weighted average
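
The slide's formulas did not survive extraction, so this is only a sketch of the term-by-term idea under stated assumptions. It splits the query into terms and computes the entropy of each term's anchor-link distribution. It then averages those entropies, weighting each term by its number of anchor links; that weighting scheme is a hypothetical choice, not necessarily the slide's. Low scores suggest a navigational query and high scores an informational one.

```python
import math
import re
from collections import Counter


def terms_of(query):
    """Split a query such as 'trec, test collection' into lower-cased terms."""
    return [t for t in re.split(r"[,\s]+", query.lower()) if t]


def entropy(counts):
    """Shannon entropy (bits) of a destination-count distribution."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)


def query_entropy(links, query):
    """Weighted average of per-term ALD entropies (hypothetical weighting:
    each term counts in proportion to its number of anchor links)."""
    per_term = []
    for t in terms_of(query):
        counts = Counter(dest for text, dest in links if text.lower() == t)
        if counts:  # terms that never appear as anchor text are skipped
            per_term.append((sum(counts.values()), entropy(counts)))
    total = sum(n for n, _ in per_term)
    if total == 0:
        return None
    return sum(n * h for n, h in per_term) / total


# Hypothetical anchors: "trec" always points to one page; the others are spread out.
LINKS = [("trec", "TREC home")] * 4 + [
    ("test", "d1"), ("test", "d2"),
    ("collection", "d3"), ("collection", "d4"),
]
print(query_entropy(LINKS, "trec, test collection"))  # (4*0 + 2*1 + 2*1) / 8 = 0.5
```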

21 Now what? For "WWII": –Google: &output=search&tbs=ww:1 –Microsoft: ?fwd=1&qpvt=wwii&src=abop&q=wwii –Wolfram: Can you tell informational vs. transactional?

22 Challenges/Opportunities: intent is subtle and interleaved, but there is huge advertising revenue (yet to be explored)! Classic query logs + clicks on the surface Web are not enough. Any ideas?

23 More signals? Eye movement? Brain signals?

24 More corpora? (A social corpus for polls? Expert advice?)

25 More signals

26 CS: Client Simple. First representation: –Trajectory length –Horizontal range –Vertical range
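
A minimal sketch of the three "Client Simple" features. It assumes the signal is a cursor trajectory recorded as (x, y) pixel positions in time order; the slide does not say how the positions are sampled.

```python
import math


def client_simple_features(points):
    """Trajectory length plus horizontal and vertical range of a 2-D path.

    `points` is a list of (x, y) positions in time order (assumed format).
    """
    if not points:
        return {"trajectory_length": 0.0, "horizontal_range": 0, "vertical_range": 0}
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Sum of straight-line distances between consecutive samples.
    length = sum(math.dist(p, q) for p, q in zip(points, points[1:]))
    return {
        "trajectory_length": length,
        "horizontal_range": max(xs) - min(xs),
        "vertical_range": max(ys) - min(ys),
    }


print(client_simple_features([(0, 0), (3, 4), (3, 10)]))
# {'trajectory_length': 11.0, 'horizontal_range': 3, 'vertical_range': 10}
```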

27 CF: Client Full. Second representation: –5 segments: initial, early, middle, late, and end –Each segment: speed, acceleration, rotation, slope, etc.
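
A sketch of the "Client Full" idea under stated assumptions. The trajectory is taken to be (t_seconds, x, y) samples, and it is cut into five segments of roughly equal sample count. Each segment gets a speed, an acceleration, and a slope. The segmentation rule and the acceleration formula (last step speed minus first step speed, over the segment's duration) are hypothetical choices. Rotation and the other "etc." features are omitted.

```python
import math

SEGMENT_NAMES = ["initial", "early", "middle", "late", "end"]


def _step_speeds(seg):
    """Speed of each consecutive step inside one segment."""
    speeds = []
    for (t0, x0, y0), (t1, x1, y1) in zip(seg, seg[1:]):
        dt = t1 - t0
        speeds.append(math.dist((x0, y0), (x1, y1)) / dt if dt > 0 else 0.0)
    return speeds


def client_full_features(samples):
    """Per-segment features for a list of (t_seconds, x, y) samples in time order."""
    n = len(samples)
    if n < 6:
        raise ValueError("need at least 6 samples for 5 segments")
    # Integer boundaries; neighbouring segments share their boundary sample.
    bounds = [i * (n - 1) // 5 for i in range(6)]
    features = []
    for name, lo, hi in zip(SEGMENT_NAMES, bounds, bounds[1:]):
        seg = samples[lo:hi + 1]
        (t0, x0, y0), (t1, x1, y1) = seg[0], seg[-1]
        duration = t1 - t0
        path = sum(math.dist(p[1:], q[1:]) for p, q in zip(seg, seg[1:]))
        speeds = _step_speeds(seg)
        features.append({
            "segment": name,
            "speed": path / duration if duration > 0 else 0.0,
            "acceleration": (speeds[-1] - speeds[0]) / duration if duration > 0 else 0.0,
            "slope": (y1 - y0) / (x1 - x0) if x1 != x0 else None,  # None = vertical
        })
    return features


# A cursor moving right at a constant 10 px/s.
for f in client_full_features([(t, 10.0 * t, 0.0) for t in range(6)]):
    print(f)
```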

28 Navigational query: facebook

29 Informational query: spanish wine

30 Transactional query: integrator

31 More corpora: cQA has been successful as an additional corpus, not as an additional means. Challenges?

32 cQA (Yahoo Answers)




36 How Yahoo Answers works






42 Good questions draw good answers

43 Good Q/A? -- Text. Check also:

44 Good Q/A? -- Clicks

45 Good Q/As? -- Community

46 Why scary?

47 Useful beyond imagination: –Spell checker: SIGMOD → "Did you mean: sigmoid?" –Entity relation: SIGMOD ~ SIGIR –Translation: SIGMOD, –Query suggestion: –Rank learning: a top-10 entry is visited all the time; what should we do? –Reason for migraines?
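
The SIGMOD → "sigmoid" example shows how a log-driven spell checker can misfire. This sketch is a hypothetical frequency-plus-edit-distance rule, not any search engine's actual speller. It suggests a much more frequent logged query within a small Levenshtein distance. With the made-up counts below, the rare but correct "sigmod" is "corrected" to the popular "sigmoid".

```python
def edit_distance(a, b):
    """Levenshtein distance via a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def did_you_mean(query, query_counts, max_distance=2, min_ratio=10):
    """Suggest a logged query that is close in spelling and at least
    `min_ratio` times more frequent (both thresholds are hypothetical)."""
    q = query.lower()
    own = query_counts.get(q, 0)
    best = None
    for cand, count in query_counts.items():
        if cand == q:
            continue
        if edit_distance(q, cand) <= max_distance and count >= min_ratio * max(own, 1):
            if best is None or count > query_counts[best]:
                best = cand
    return best


# Made-up query-log frequencies for illustration only.
COUNTS = {"sigmoid": 50000, "sigmod": 1200, "sigir": 900}
print(did_you_mean("SIGMOD", COUNTS))   # sigmoid
print(did_you_mean("sigmoid", COUNTS))  # None
```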

48 Companies need YOUR HELP: AOL released its logs. Guess what happened?

49 More scientific observations (Yahoo Research): X = {query1, query2, query3}, Y = (age, gender, area); X → Y (how likely?). Validate with ground-truth info (Yahoo account).

50 See if you can do it: observe yourself.

51 Gender: Female: fanfiction, bridal, makeup, womens, knitting, hair, ecards, glitter, yoga, and diet. Male: nfl, poker, espn, ufc, railroad, prostate, football, golf, male, wrestling, compusa, as well as a variety of adult terms. Accuracy: 80+%
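
One way to frame "X → Y (how likely?)" is as a text classifier over a user's queries. This sketch is a multinomial naive Bayes with add-one smoothing. It is a swapped-in standard technique, not necessarily the method behind the Yahoo Research numbers. The training users are hypothetical and reuse a few terms from the gender slide. Words never seen in training are ignored.

```python
import math
from collections import Counter, defaultdict


class QueryNaiveBayes:
    """Multinomial naive Bayes over the words of a user's queries."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                     # Laplace smoothing constant
        self.class_docs = Counter()            # users per label
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def fit(self, users):
        """`users` is a list of (list_of_queries, label) pairs."""
        for queries, label in users:
            self.class_docs[label] += 1
            for q in queries:
                for w in q.lower().split():
                    self.word_counts[label][w] += 1
                    self.vocab.add(w)
        return self

    def log_posteriors(self, queries):
        """Unnormalized log P(label | queries) for every label."""
        words = [w for q in queries for w in q.lower().split() if w in self.vocab]
        n_users = sum(self.class_docs.values())
        v = len(self.vocab)
        scores = {}
        for label, n in self.class_docs.items():
            total = sum(self.word_counts[label].values())
            s = math.log(n / n_users)
            for w in words:
                s += math.log((self.word_counts[label][w] + self.alpha) / (total + self.alpha * v))
            scores[label] = s
        return scores

    def predict(self, queries):
        scores = self.log_posteriors(queries)
        return max(scores, key=scores.get)


# Hypothetical training users, labeled with (made-up) ground truth.
USERS = [
    (["knitting patterns", "yoga classes"], "female"),
    (["bridal makeup"], "female"),
    (["nfl scores", "poker"], "male"),
    (["golf clubs", "espn"], "male"),
]
model = QueryNaiveBayes().fit(USERS)
print(model.predict(["yoga for beginners"]))  # female
print(model.predict(["nfl draft"]))           # male
```

Validating such a model against account data, as the slide describes, would mean holding out labeled users and measuring accuracy on them.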

52 Age: YOUNG: myspace, pregnancy, wikipedia, lyrics, quotes, apartments, torrent, baby, wedding, mall, soundtrack; OLD: aarp, telephone, lottery, retirement, funeral, senior, mapquest, medicare, newspapers, repair.

53 Place: A user's zip code (US postal code) or other identifier of location may be detectable from place names used in queries. Check out the Yahoo GEO APIs.

54 Name? 50+% of users issued their own name (but other names too). Ref: "Vanity Fair: Privacy in Querylog Bundles"

55 User solutions? –TrackMeNot (TMN): an extension to the Firefox web browser that issues randomized search queries in the background to a number of commercial search engines. –Tor: changes IP/cookies (prevents aggregation). Drawback: losing services, e.g., personalization.
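
A minimal sketch of the decoy idea behind TrackMeNot. Each real query is hidden among randomly chosen decoys, so a profile built from the log is diluted. This does not reproduce TMN's actual query-generation or timing strategy. The decoy pool, the ratio, and the `obfuscate` helper are all hypothetical.

```python
import random


def obfuscate(real_queries, decoy_pool, decoys_per_query=3, seed=None):
    """Interleave each real query with randomly sampled decoys.

    Real queries keep their relative order; within each batch the
    position of the real query is shuffled.
    """
    rng = random.Random(seed)
    stream = []
    for q in real_queries:
        batch = [q] + rng.sample(decoy_pool, decoys_per_query)
        rng.shuffle(batch)
        stream.extend(batch)
    return stream


DECOYS = ["weather boston", "pasta recipe", "nba scores", "used cars", "tax forms"]
print(obfuscate(["migraine causes", "sigmod deadline"], DECOYS, decoys_per_query=2, seed=7))
```

The trade-off on the slide shows up directly: the more decoys per real query, the less a server can learn, and the less it can personalize.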

56 Company solution: k-anonymity (bundling). Reported to be unsafe against vanity searches + geo-queries and long-tail keywords [so far, it is considered TOO RISKY]
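
A sketch of why bundling can fail. Users are grouped into bundles of at least k, so each released query is attributed only to a bundle. However, a query issued by fewer than k distinct users across the whole log can still point to one person, such as a vanity search combined with a zip code. The helpers and the sample log are hypothetical. A real system would likely form bundles randomly rather than by sorted ID.

```python
from collections import defaultdict


def bundle_users(user_ids, k):
    """Group users into bundles of size >= k (sorted IDs for determinism)."""
    ids = sorted(user_ids)
    if len(ids) < k:
        raise ValueError("need at least k users")
    bundles = [ids[i:i + k] for i in range(0, len(ids), k)]
    if len(bundles) > 1 and len(bundles[-1]) < k:
        last = bundles.pop()
        bundles[-1].extend(last)  # merge an undersized tail into the previous bundle
    return bundles


def long_tail_queries(log, k):
    """Queries issued by fewer than k distinct users: risky even inside a bundle."""
    users = defaultdict(set)
    for uid, q in log:
        users[q.lower()].add(uid)
    return {q for q, u in users.items() if len(u) < k}


# Hypothetical log of (user_id, query) pairs.
LOG = [("u1", "weather"), ("u2", "weather"), ("u3", "weather"),
       ("u1", "jane doe 90210")]
print(bundle_users(["u1", "u2", "u3", "u4", "u5"], 2))  # [['u1', 'u2'], ['u3', 'u4', 'u5']]
print(long_tail_queries(LOG, 2))                        # {'jane doe 90210'}
```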

57 Summary: You are leaving trails in the cyber world, which align more and more with real-life trails. Companies are interested in predicting as much of your next behavior as possible. More signals? More corpora? Can you hide as much as possible to protect privacy, while revealing as much as possible to enable such prediction? (the privacy dilemma) But it is OK even if we can't know (product state of the art)

58 Search UI? Visualization?

59 What are query aspects?

60 Challenge: Intentions are hidden –Omission of key information makes intent in queries ambiguous –e.g., omitting "reviews" when searching for reviews of the Canon EOS 40D SLR –e.g., omitting the location/city when searching for jobs. Queries are often too broad.

61 Goal: Mine broad latent aspects from search logs –Formulate the problem based on a real-world model of user interaction with the search engine (session = 10 mins) –Bring interesting aspects to the attention of editors, who can then determine saliency and usefulness
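
A sketch of the "session = 10 mins" step. This reads it as an inactivity rule: a user's next query starts a new session when more than 10 minutes have passed since their previous query. A fixed 10-minute window from the session start is another possible reading. The log format of (user_id, timestamp, query) tuples is an assumption.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=10)


def sessionize(log):
    """Split a query log into per-user sessions using a 10-minute inactivity gap.

    `log` is a list of (user_id, datetime, query); returns a list of
    sessions, each a list of queries in time order.
    """
    sessions = []
    last_seen = {}
    current = {}
    for uid, ts, q in sorted(log, key=lambda r: (r[0], r[1])):
        if uid not in last_seen or ts - last_seen[uid] > SESSION_GAP:
            current[uid] = []            # start a new session for this user
            sessions.append(current[uid])
        current[uid].append(q)
        last_seen[uid] = ts
    return sessions


t = datetime(2009, 1, 1, 10, 0)
LOG = [
    ("u1", t, "canon eos 40d"),
    ("u1", t + timedelta(minutes=3), "canon eos 40d reviews"),
    ("u1", t + timedelta(minutes=30), "weather"),
    ("u2", t + timedelta(minutes=1), "jobs"),
]
print(sessionize(LOG))
# [['canon eos 40d', 'canon eos 40d reviews'], ['weather'], ['jobs']]
```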

62 User interaction model: The user enters the original query "Canon EOS 40D"; the search engine (SE) returns general results; the user reformulates the query by adding the qualifier "reviews"; the SE returns reviews of the camera; the user's query is satisfied (e.g., she clicks on a result). With aspects: the SE returns general results + query aspects, and the user reformulates the query by selecting the "reviews" aspect. Query aspects are learned from reformulations.
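
A deliberately simple sketch of "learning query aspects from reformulations", not the method behind these slides. Within each session, if a later query keeps every word of the earlier query and adds more, the added words are counted as a candidate aspect of the earlier query. Candidates seen at least a hypothetical `min_count` times are kept. Sessions are assumed to be lists of queries in time order.

```python
from collections import Counter, defaultdict


def mine_aspects(sessions, min_count=2):
    """Count qualifier words added in consecutive reformulations.

    Returns {original query: [aspects, most frequent first]}.
    """
    aspects = defaultdict(Counter)
    for session in sessions:
        for q1, q2 in zip(session, session[1:]):
            w1, w2 = q1.lower().split(), q2.lower().split()
            kept = set(w1)
            extra = [w for w in w2 if w not in kept]
            if extra and kept <= set(w2):  # q2 = q1 plus a qualifier
                aspects[" ".join(w1)][" ".join(extra)] += 1
    return {q: [a for a, c in counts.most_common() if c >= min_count]
            for q, counts in aspects.items()}


SESSIONS = [
    ["canon eos 40d", "canon eos 40d reviews"],
    ["Canon EOS 40D", "reviews canon eos 40d"],
    ["canon eos 40d", "canon eos 40d price"],
]
print(mine_aspects(SESSIONS))  # {'canon eos 40d': ['reviews']}
```

Surfacing these counts to editors, rather than trusting them directly, matches the goal stated on the previous slide.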

63 Results: Examples of aspects found

64 New directions might be: –Taking the clicked target web page into account while constructing aspects –Visualization techniques that help to visually/perceptually/cognitively mine such aspects –Visualization/refinement iterations to narrow down. Tomorrow 4:15pm (B2 102): Title: "Using Information Visualization to Understand Data", Dr. Bongshin Lee (Microsoft Research). Abstract: Information Visualization is the art and science of representing abstract information in a visual form that enables users to understand data through their perceptual and cognitive capabilities.
