Making Your Spider Outperform Google Enhancing Search Precision through Manually-Selected “Best Bets” Richard Wiggins Michigan State University

1 Making Your Spider Outperform Google Enhancing Search Precision through Manually-Selected “Best Bets” Richard Wiggins Michigan State University May 2003

2 Poll: Who owns a piece of luggage with built-in wheels?
- Wheel was invented thousands of years ago…
- … We just figured out how to add them to luggage about 20 years ago
  - Also needed enabling technology of the inline skate wheel…

3 Innovations Are Obvious: After You Innovate
- “The Accidental Thesaurus”: a thesaurus driven purely from log analysis
- The accidental thesaurus represents the wheels you should add to your search engine

4 Thesis
- By analyzing search logs, you engage in a conversation with your customers
- At best, it’s a two-way conversation:
  - Your users tell you what they seek
  - You tune your search engine (and your site) to give them what they seek the most
- Search is too important to leave in the hands of robots

5 Access 98 Conference Proposal: The Accidental Thesaurus

6 Agenda: The Accidental Thesaurus
- Why We Needed It
- How We Built It
- Why It Works So Well
- Fellow Travelers: “Best Bets”
- Conclusions

7 Why We Needed It

8 The Wonderful Things Search Engines Do
- Help harness massive amounts of content
  - Thousands, millions, billions of URLs
- Cut across barriers
  - Document structure
  - Topical structure
  - Institutional structure

9 The Horrible Things Search Engines Do
- Confuse low-value content with vital content
  - And obsolete content
  - And humorous content
  - And draft, internal, duplicative content
- Rank leaf pages ahead of starting points
- Rank popular or personal pages ahead of official content

10 How People Approach a Home Page
- Some people just start clicking -- browsers
- Others look for a search box and immediately type a search word or phrase -- searchers
- Browsers convert to searchers when:
  - Too many mouse clicks fail to yield a result
  - Information architecture is poor
- Therefore, search engines serve two groups of users:
  - Those who prefer to search in the first place
  - Those who have tried browsing, and are now somewhat frustrated

11 What Users Expect of Search Engines
- They type in the word(s) they think of, not the labels we assign
  - "Jobs" instead of "Employment Office"
- They expect official sites to float to the top of the hit list
  - If it’s not in the first 10 hits, often the user gives up
- They expect complete coverage
- They expect disambiguation: "Human Resources"
  - Finds the right office at MSU
  - …or degree programs
- They do not iterate or use complicated syntax

12 Searching Webspace at Michigan State University is a big, complicated space
- MSU Webspace in general is much, much more complicated
  - Hundreds of official servers
  - Perhaps 2,000,000 URLs including many personal pages
- Result: browsing to find the page you want is ever more futile
- Both browsers and searchers increasingly frustrated

13 One user view of Academics application for graduation overseas study ordering catalog School of Music Computer Science human ecology department psychology 101

14 Another user view of Virtual Library DNA sequencing climate change beam theory feline brain tumor PRL and sequencing

15 Another user view of Extension livestock pavilion wildlife fisheries bathtub removal and installation Round Bale Storage

16 Google Versus AltaVista
- MSU AltaVista launched in 1996
- Google indexed MSU in 2000
  - Google is now default search engine at hundreds of universities
- Google exploits simplicity
  - Heavy weighting on link popularity
  - Assumed "and" among search terms
  - Result: desired item much more likely to appear at top of hit list

17 MSU AltaVista vs. Google’s MSU Index
Search phrases entered as-is, no quotes, no caps.

Search Phrase                   MSU AltaVista hit list position   Google hit list position
human resources                 2                                 1
breslin center                  12                                1
admissions                      1                                 1
sky calendar                    3                                 2
manual of business procedures   6                                 1
anatomy                         35                                1
blackboard                      8                                 5
orientation                     Not in index                      2
remote sensing                  Not in index                      2

18 MSU AltaVista Search: Grades Student wants to find semester grades – None of these relevant!

19 MSU Google: “grades” Even the mighty Google can’t find the best “grades” site

20 Hell Hath No Fury Like a Content Provider Scorned
- You think your users get upset when they can’t find things…
- Your content providers are really full of rage
- We decided we needed to do something…
  - To help our users
  - And to mollify our content providers

21 How About a Registered Keywords Approach?
- Student needs to find the Registrar’s home page
- Registrar wants students to find the home page
- Let’s just map a search for “registrar” to a known, hand-picked URL
- A la AOL Keywords

22 AOL Keywords Example: Search for “survivor” and you get…

23 ESPN Keywords Example:

24 Why Not Pick the Most Popular Keywords First? Note: These searches are for starting points!

25 How We Did It

26 MSU Keywords Features
- We store popular search phrases in a database:
  - We map key words to the “best” URLs
- For each popular search phrase, we look up “the” best URL, and enter it into the database
- We manage it all through a Web interface

27 What the User Sees
- When a user searches, we query the database, then query the search engine
- Present results for both on one screen
- What we hand-pick is always at the top of the hit list
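
The flow on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the MSU Keywords implementation: the `BEST_BETS` dictionary, the `example.edu` URLs, the `normalize` helper, and the `fake_engine` stand-in are all hypothetical, and dropping duplicate URLs from the engine's hit list is a design choice of this sketch rather than something the slides describe.

```python
from typing import Callable, Dict, List

# Hypothetical hand-picked entries (search phrase -> "best" URLs).
# The real database contents are not shown in the slides.
BEST_BETS: Dict[str, List[str]] = {
    "grades": ["https://example.edu/stuinfo"],
    "registrar": ["https://example.edu/registrar"],
}


def normalize(phrase: str) -> str:
    """Lower-case and collapse whitespace (an assumed matching rule)."""
    return " ".join(phrase.lower().split())


def search(phrase: str, engine: Callable[[str], List[str]]) -> List[str]:
    """Query the best-bets table first, then the search engine.

    Hand-picked URLs always lead the combined hit list; engine hits that
    repeat a hand-picked URL are dropped.
    """
    picks = BEST_BETS.get(normalize(phrase), [])
    hits = [url for url in engine(phrase) if url not in picks]
    return picks + hits


def fake_engine(phrase: str) -> List[str]:
    """Stand-in for the spider's hit list (illustrative only)."""
    return [
        "https://example.edu/~someone/grades-rant",
        "https://example.edu/stuinfo",
    ]
```

Calling `search("  Grades ", fake_engine)` puts the hand-picked Stuinfo URL first and the personal page second; a phrase with no entry simply falls through to the engine's own ranking.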

28 User Searches for “grades” “Stuinfo” is the place where a student finds grades

29 Functional View

30 Web-Based Management

31 Why It Works So Well

32 Evidence that MSU Keywords Helps
- Fewer complaints from users
  - Far fewer complaints such as “I can’t find how to apply for a job”
- Fewer complaints from content providers
- Positive feedback from both
- Testing confirms that people do use MSU Keywords

33 Backwards Scientific Method
- First build the thing
- Find out it works well
- Now form hypothesis as to why

34 What Search Logs Reveal

35 A Classic Zipf Distribution
[Figure: search-phrase frequency curve, running from the most commonly-used search phrases to the least commonly-used search phrases]

36 Why the Approach Works So Well
- To understand success, you must understand the Zipf curve
- A small number of unique search phrases…
  - …accounts for a large number of all searches performed
- Out of 200,000 searches:
  - The top 500 account for 40%!
  - The top 1000 account for 50%!
- A database with only 1000 entries can assist your customers with 50% of their searches

37 How Big Should Your Accidental Thesaurus Be?
Out of 200,000 searches:

Percent Coverage   Unique Key Words Needed
10                 14
20                 60
30                 166
40                 402
50                 895
60                 2,041
70                 5,101
80                 13,360
90                 32,455
100                53,035
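
A coverage table like this one can be derived from any search log. The sketch below assumes the log has already been reduced to a list of normalized search phrases, one per search; the function name `entries_needed` and the toy log are hypothetical and are not the MSU data.

```python
from collections import Counter
from typing import Iterable


def entries_needed(phrases: Iterable[str], percent: int) -> int:
    """Return how many of the most frequent unique phrases are needed
    to cover at least `percent` percent of all searches in the log."""
    counts = Counter(phrases)
    total = sum(counts.values())
    covered = 0
    # Walk the phrases from most to least frequent (the Zipf curve, head first).
    for rank, (_, count) in enumerate(counts.most_common(), start=1):
        covered += count
        # Integer comparison avoids floating-point rounding at the threshold.
        if covered * 100 >= percent * total:
            return rank
    return len(counts)


# Toy log of 10 searches, for illustration only.
toy_log = ["grades"] * 5 + ["jobs"] * 3 + ["map", "bathtub removal"]
for pct in (50, 80, 100):
    print(pct, entries_needed(toy_log, pct))
```

On the toy log, one entry covers 50% of searches, two cover 80%, and all four unique phrases are needed for 100%, which is the same head-heavy shape the table shows at a much larger scale.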

38 The Perfect Marriage: Google and Best Bets
- 50% of unique searches are rarely entered
- MSU Keywords works great for popular searches
- Google’s relevancy works great for the uncommon search
  - calculate GPA
  - student change of address
  - guest policy

39 Fellow Travelers

40 Bristol-Myers Squibb
- Built a “best bets” service at the same time we did
- Served intranet needs
- Federated search with two existing intranet search engines
- Very successful
- Inspired by work information architect Vivian Bliss has done at Microsoft
- Work by Mike Rogers, Lydia Bauer, et al.

41 BBC – “Best Links”

42 Techstreet
- Ann Arbor, Michigan-based company
- Sells engineering standards online
- 90% of site visitors enter a search immediately
  - A technical standard
  - A Techstreet document number
  - An area of interest

43 Techstreet Log Analysis

Number   Key Words
315      astm
188      standards
161      api
141      steel
138      water
126      test
125      standard
117      code
116      concrete
99       design
91       systems
84       electrical
81       handbook
79       power
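
A ranked list like this can be produced with a simple word count over the logged searches. The sketch below assumes that each search phrase is split on whitespace and counted case-insensitively; the slides do not say how Techstreet tallied its key words, and `top_words` and the sample phrases are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_words(phrases: Iterable[str], n: int) -> List[Tuple[str, int]]:
    """Count individual words across all logged search phrases and
    return the n most frequent as (word, count) pairs."""
    counts = Counter(
        word
        for phrase in phrases
        for word in phrase.lower().split()
    )
    return counts.most_common(n)


# Made-up sample log, for illustration only.
sample_log = ["ASTM steel", "astm a36", "steel pipe standards", "astm"]
print(top_words(sample_log, 2))
```

Counting words rather than whole phrases surfaces broad areas of interest (here "astm" and "steel"), while counting whole phrases, as in the coverage analysis, is what identifies candidate Best Bets entries.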

44 Other Universities
- Compared with Ohio State and Northwestern
- Same curve
- Similar search terms
- Similar rankings!
- But vastly different Web sites

45 Conclusions

46 Every Web Site Can Benefit
- At least any non-trivial site
- Listen to your customers
- Tune your search engine to deliver the right results for the high-frequency part of your Zipf curve

47 Best Bets Also Serves What the Institution Wants to Convey
- Breaking news
  - New product
  - Management of bad news
- Dealing with controversy
- Example: graduate students want to form a union
  - We added the University’s position to MSU Keywords
  - Also the Union’s position
  - We drove traffic to their Web site!

48 Pro-Active Best Bets

49 Think Pro-Actively
- When breaking news occurs …
- … people will come to your Web site and search for information
- Think the way they would
- Every press release should be examined for Best Bets material

50 University of Alabama: Coach Fired, Google Still Doesn’t Know
- Coach was fired May 5
- Today is May 7
- Google/ua points to story from May 1

51 Challenges
- Guard against overpopulating
  - Keywords can scale indefinitely
- But the "good" words need to map to what users want…
- … not what MSU webmasters want!
- .com analogy
  - A-Z index can only scale so far
- Judicious use of non-public aliases helps
- Too many cooks
  - Need single editorial "voice", style sheet
  - Need consistency

52 New Interface: Integrate with Google

53 Question: Is This Rational Behavior?

54 Remember:
- Search is too important to leave in the hands of robots
- The search experience you deliver is part of your information architecture
  - You can control the top of the hit list

55 Credits
- MSU Keywords conceived by Richard Wiggins
- Implemented by Mathew Schuster
- Recent revisions by Ryan Simmons and Mike Zakhem
- Other Best Bets projects conceived and implemented by wise people at many places
