The Wild Thing Goes Mobile And Local Kenneth Church and Bo Thiesson Text Mining, Search and Navigation (TMSN) Microsoft Corporation.


1 The Wild Thing Goes Mobile And Local
Kenneth Church and Bo Thiesson
Text Mining, Search and Navigation (TMSN), Microsoft Corporation

2 Standard Word Wheeling (T9): Better with Wild Cards!
Find the k-best regex matches, subject to a language model.
Wild Thing Goes Mobile

3 Search
Given an input pattern (regex) and a language model (LM), i.e., a list of queries and their popularities in the MSN logs, find the k-best (most popular) matches.
Conceptually: grep pattern LM | sort -nr | head
Heuristic speed-ups
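To make the conceptual pipeline concrete, here is a brute-force sketch in Python, assuming the LM is simply a list of (query, count) pairs and that matching follows grep's unanchored semantics. The function name k_best and the toy log are hypothetical, and the counts are made up for illustration; the heuristic speed-ups are not attempted here.

```python
import heapq
import re

def k_best(pattern, lm, k):
    """Brute force, mirroring `grep pattern LM | sort -nr | head`.

    lm is an iterable of (query, count) pairs; returns the k most popular
    queries that the regex matches anywhere (grep semantics).
    """
    rx = re.compile(pattern)
    matches = ((count, query) for query, count in lm if rx.search(query))
    return [query for count, query in heapq.nlargest(k, matches)]

# Hypothetical toy log; the real LM is built from the MSN query logs.
LM = [("chicago", 500), ("columbus ohio", 300), ("toledo ohio", 200),
      ("cleveland ohio", 120), ("cincinnati ohio", 80)]
print(k_best(r"c.* oh.*", LM, 2))  # ['columbus ohio', 'cleveland ohio']
```

Using heapq.nlargest instead of a full sort keeps only k items in memory, but every query is still scanned, which is what the speed-ups are meant to avoid.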

4 Wild Thing > Word Wheeling
Useful for surnames, filenames, and URLs.
Regexes are more general than prefix matching: /C.* OH.*/ >> /C.*/
Challenge: will users enter wild cards?
Implicit wild cards are added after each “word”.
Initials: K F C, N Y C
Two implicit wild cards
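A minimal sketch of implicit wild cards, assuming words are separated by spaces and that a * typed by the user is an explicit wild card. The function name add_implicit_wildcards is hypothetical, not taken from the talk.

```python
import re

def add_implicit_wildcards(user_input):
    """Turn typed words into a regex with a wild card after each word.

    A '*' typed by the user is treated as an explicit wild card.
    """
    parts = []
    for word in user_input.split():
        literal = re.escape(word).replace(r"\*", ".*")  # explicit wild card
        parts.append(literal + ".*")                     # implicit wild card
    return " ".join(parts)

print(add_implicit_wildcards("C OH"))  # C.* OH.*
```

Escaping each word first means characters such as the dot in a filename are matched literally rather than as regex metacharacters.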

5 Phone Mode and Local
Phone mode: regex notation, e.g., 7#6 → /[PQRS].* [MNO].*/
Local: a different language model, Pr(query) → Pr(query | location)
Local queries are different:
Local: restaurants (pizza)
Non-local: web services (e-mail, shopping), entertainment (adult)
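Phone mode can be sketched the same way. This assumes the standard phone keypad letter assignments, that # separates words (as in 7#6), and that each key within a word becomes one character class followed, at the end of the word, by an implicit wild card. The names KEYPAD and phone_to_regex are hypothetical.

```python
import re

# Standard keypad letters (assumption: no handling for 0, 1, or '*').
KEYPAD = {"2": "ABC", "3": "DEF", "4": "GHI", "5": "JKL",
          "6": "MNO", "7": "PQRS", "8": "TUV", "9": "WXYZ"}

def phone_to_regex(keys):
    """One character class per key; '#' separates words; an implicit
    wild card follows each word."""
    words = keys.split("#")
    return " ".join("".join(f"[{KEYPAD[d]}]" for d in word) + ".*" for word in words)

print(phone_to_regex("7#6"))  # [PQRS].* [MNO].*
rx = re.compile(phone_to_regex("7#6"), re.IGNORECASE)
print(bool(rx.fullmatch("pizza menu")))  # True
```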

6 Goal: All Forms Go Wild
But with different language models for different contexts

8 Demos
Condoleezza Rice, Arnold Schwarzenegger, Hot-mail programs

9 Wild Thing + Virtual Earth: Better Together
Going Local

10 Different Expansions in Different Locations
B C → British Columbia, Boeing Company, Baptist Church, Bible College
* beach → Waikiki, Narragansett, Pebble Beach, Old Orchard
F Detroit New London
* high * school, * univ, * hospital, * airport, * river
One-letter queries

11 Conclusions: Why Go Local?
That’s where the money is.
All politics is local; ditto for classified ads.
It is nice to be able to search the world, but I often want stuff near me.
It is nice to be able to drive my car anywhere, but most accidents are not far from home.
Geo-tagging URLs and queries:
Method 1: parse docs (hard)
Method 2: logs (easy)

12 Wild Thing Goes Local
Wild Thing → find the k-best matches.
Non-local: k-best ≡ Pr(query)
Local: k-best ≡ Pr(query | location)
Probabilities are based on query logs.
Non-local case: conceptually, search the list of queries in frequency order and stop after finding k matches.
Local case: ditto, but store a different list for each location.
Local queries are different from non-local queries. There are lots of requests for pizza near x. There are also lots of requests for Britney Spears, but these are not local searches. Apparently, not so many people want her nearby???
Heuristic speed-ups
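The non-local and local cases differ only in which frequency-ordered list is scanned. A minimal sketch, assuming each list is already sorted by decreasing Pr(query) or Pr(query | location) and that unknown locations fall back to the global list; the names and the toy lists are hypothetical.

```python
import re

def k_best_in_order(pattern, queries_by_freq, k):
    """Scan a list sorted by decreasing frequency; stop after k matches."""
    if k <= 0:
        return []
    rx = re.compile(pattern)
    found = []
    for query in queries_by_freq:
        if rx.search(query):
            found.append(query)
            if len(found) == k:
                break  # every later query is less popular
    return found

# Hypothetical toy lists, each sorted by decreasing Pr(query) or Pr(query | location).
LISTS = {
    "global": ["hotmail", "britney spears", "pizza hut", "pike place market"],
    "seattle": ["pike place market", "pizza seattle", "hotmail", "pizza hut"],
}

def k_best_local(pattern, k, location="global"):
    """Same scan, over the list stored for this location (assumption:
    unknown locations use the global list)."""
    return k_best_in_order(pattern, LISTS.get(location, LISTS["global"]), k)

print(k_best_local(r"pi.*", 2))             # ['pizza hut', 'pike place market']
print(k_best_local(r"pi.*", 2, "seattle"))  # ['pike place market', 'pizza seattle']
```

Storing one full list per location is exactly what becomes infeasible at scale, which is the motivation for smoothing over a kd-tree of locations.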

13 Smoothing
There are computational and statistical motivations: we can’t store, or estimate, Pr(query | location) for all queries everywhere.
Locations are defined by a kd-tree.
Smoothing rule: counts → parent, unless significantly larger than the sibling’s counts.
One parameter: p (significance level).
After smoothing, most counts → 0, and a leaf inherits counts from its ancestors.
[Figure: kd-tree with splits by latitude and by longitude, showing example counts before and after smoothing]
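The slide gives the rule and its single parameter p but not the significance test itself. The sketch below assumes a one-sided exact binomial test, in which each of the a + b events is equally likely to have landed in either sibling, and tracks the count of a single query per kd-tree cell. The names Node, significantly_larger, smooth, and inherited_count are hypothetical.

```python
from dataclasses import dataclass
from math import comb
from typing import List, Optional

@dataclass
class Node:
    """A kd-tree cell holding the count of one query in that cell."""
    count: int = 0
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def significantly_larger(a: int, b: int, p: float) -> bool:
    """Hypothetical test: one-sided exact binomial test that a > b, assuming
    each of the a + b events is equally likely to fall in either sibling."""
    n = a + b
    if n == 0:
        return False
    tail = sum(comb(n, i) for i in range(a, n + 1)) / 2 ** n  # Pr(X >= a)
    return tail < p

def smooth(node: Node, p: float) -> None:
    """Bottom-up: move each child's count to its parent unless that count
    is significantly larger than its sibling's count."""
    if node.left is None or node.right is None:
        return
    smooth(node.left, p)
    smooth(node.right, p)
    l, r = node.left.count, node.right.count
    if not significantly_larger(l, r, p):
        node.count += l
        node.left.count = 0
    if not significantly_larger(r, l, p):
        node.count += r
        node.right.count = 0

def inherited_count(path: List[Node]) -> int:
    """Count used at a leaf: its own plus all ancestors' (path runs root to leaf)."""
    return sum(n.count for n in path)

root = Node(left=Node(count=100), right=Node(count=2))
smooth(root, p=0.05)
print(root.count, root.left.count, root.right.count)  # 2 100 0
print(inherited_count([root, root.left]), inherited_count([root, root.right]))  # 102 2
```

With p = 0.05, a 100-versus-2 split keeps the 100 on the busy side (a local query), while a 5-versus-4 split would be pushed entirely up to the parent. Summing the exact tail is fine for toy counts; large counts would call for a normal approximation.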

14 Search Speed-Ups
Baseline: grep pattern LM | sort -nr | head
Heuristic speed-up: generate candidates that might match, then filter the candidates with a standard regex tool.
Generating candidates (suffix array): regex → substring, e.g., /C.* OH.*/ → OH
Popularity modification: suffix arrays are designed to find all matches (not the k-best).
Single sort order → two: alphabetic order + popularity, alternating on odd and even levels (like a kd-tree).
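The generate-then-filter idea can be sketched as follows, under two simplifying assumptions: the regex consists of literal pieces joined by .* (so phone-mode character classes are out of scope), and the longest literal piece is the substring looked up in the suffix array. The names are hypothetical; this is not the authors' implementation, and it still sorts all candidates rather than using the popularity modification.

```python
import bisect
import re

def build_suffix_array(queries):
    """Sorted (suffix, query index) pairs over all queries. For clarity this
    stores suffix strings; a real suffix array stores integer offsets."""
    pairs = sorted((q[i:], qi) for qi, q in enumerate(queries) for i in range(len(q)))
    return [s for s, _ in pairs], [qi for _, qi in pairs]

def longest_literal(pattern):
    """regex -> substring. Assumes the pattern is literal text joined by '.*'."""
    return max(pattern.split(".*"), key=len)

def k_best_fast(pattern, queries, counts, suffixes, ids, k):
    sub = longest_literal(pattern)
    lo = bisect.bisect_left(suffixes, sub)
    hi = bisect.bisect_left(suffixes, sub + chr(0x10FFFF))
    candidates = {ids[j] for j in range(lo, hi)}          # queries that might match
    rx = re.compile(pattern)                               # filter with a standard regex tool
    ranked = sorted(candidates, key=lambda i: -counts[i])  # most popular first
    return [queries[i] for i in ranked if rx.search(queries[i])][:k]

queries = ["chicago", "columbus ohio", "toledo ohio", "cleveland ohio", "cincinnati ohio"]
counts = [500, 300, 200, 120, 80]
suffixes, ids = build_suffix_array(queries)
print(k_best_fast(r"c.* oh.*", queries, counts, suffixes, ids, 2))  # ['columbus ohio', 'cleveland ohio']
```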

15 Standard Suffix Arrays

16 Sort
Suffix arrays are designed to find the frequency and location of a pattern (substring).
First “To Be”, Last “To Be”
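A small illustration of how a sorted suffix array yields both frequency and location: all suffixes that start with the pattern sit in one contiguous run, so the ranks of the first and last give the count, and the array entries in that run give the positions. The helper names are hypothetical.

```python
import bisect

def suffix_array(text):
    """Start offsets of all suffixes, in alphabetical order of the suffixes."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def first_last(text, sa, pattern):
    """Ranks of the first and last suffixes that start with pattern."""
    m = len(pattern)
    # Truncated suffixes stay sorted, so binary search works on them.
    keys = [text[i:i + m] for i in sa]
    first = bisect.bisect_left(keys, pattern)
    last = bisect.bisect_right(keys, pattern) - 1
    return first, last

text = "to be or not to be"
sa = suffix_array(text)
first, last = first_last(text, sa, "to be")
print(last - first + 1)             # frequency: 2
print(sorted(sa[first:last + 1]))  # locations: [0, 13]
```

Building the keys list is linear and only there for readability; a real implementation compares against the text directly inside the binary search to keep lookups at O(log N) comparisons.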

17 Single Sort Order → Two: Alphabetic and Popularity
Standard application: find all matches.
Modify the data structure to find the k-best.
Search: on alphabetic splits, do the standard thing. On popularity splits, go left (popular) first, and stop if you have found k matches; otherwise, go right if you have to.
[Figure: tree levels alternately sorted by the 1st order, the 2nd order, and the 1st order]
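One way to sketch the modified data structure, assuming each item is a (suffix, popularity, query id) triple: even levels split alphabetically, odd levels split by popularity with the popular half on the left, and leaves are small buckets. The search returns the k most popular distinct queries in a subtree. The optional accept predicate is an assumption that lets a regex filter run at the leaves, so the k-best are exact after filtering. All names and LEAF_SIZE are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Item = Tuple[str, int, int]  # (suffix, popularity, query id)
LEAF_SIZE = 2                # hypothetical bucket size
MAX_CHAR = chr(0x10FFFF)

@dataclass
class Node:
    items: Optional[List[Item]] = None  # set only on leaves
    alpha: bool = True                  # alphabetic or popularity split
    key: str = ""                       # smallest suffix in the right half
    left: Optional["Node"] = None
    right: Optional["Node"] = None

def build(items: List[Item], depth: int = 0) -> Node:
    if len(items) <= LEAF_SIZE:
        return Node(items=items)
    mid = len(items) // 2
    if depth % 2 == 0:  # even levels: 1st order (alphabetic)
        items = sorted(items, key=lambda it: it[0])
        return Node(alpha=True, key=items[mid][0],
                    left=build(items[:mid], depth + 1), right=build(items[mid:], depth + 1))
    items = sorted(items, key=lambda it: -it[1])  # odd levels: 2nd order, popular half left
    return Node(alpha=False, left=build(items[:mid], depth + 1), right=build(items[mid:], depth + 1))

def top_k(hits: List[Tuple[int, int]], k: int) -> List[Tuple[int, int]]:
    """Deduplicate (popularity, query id) pairs and keep the k most popular."""
    return sorted(set(hits), reverse=True)[:k]

def search(node: Node, sub: str, k: int,
           accept: Callable[[int], bool] = lambda q: True) -> List[Tuple[int, int]]:
    """k most popular accepted queries having a suffix that starts with sub."""
    if node.items is not None:
        return top_k([(p, q) for s, p, q in node.items if s.startswith(sub) and accept(q)], k)
    if node.alpha:  # standard step: visit only halves that can hold matches
        hits = []
        if node.key >= sub:
            hits += search(node.left, sub, k, accept)
        if node.key < sub + MAX_CHAR:
            hits += search(node.right, sub, k, accept)
        return top_k(hits, k)
    hits = search(node.left, sub, k, accept)  # popular half first
    if len(hits) < k:                          # go right only if we have to
        hits = top_k(hits + search(node.right, sub, k, accept), k)
    return hits

queries = ["chicago", "columbus ohio", "cleveland ohio", "cincinnati ohio", "toledo ohio"]
pops = [500, 300, 120, 80, 200]
tree = build([(q[i:], pops[qi], qi) for qi, q in enumerate(queries) for i in range(len(q))])
print(search(tree, "oh", 2))  # [(300, 1), (200, 4)]
rx = re.compile(r"c.* oh.*")
print(search(tree, " oh", 2, accept=lambda q: rx.search(queries[q]) is not None))  # [(300, 1), (120, 2)]
```

Stopping early on a popularity split is safe because every item in the right half is at most as popular as every item in the left half, so k matches from the left cannot be beaten by anything on the right.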

18 Modified Suffix Array: Time Complexity
$O(\log N) \to O(\sqrt{N})$
Worst case: a pattern with 0 matches.
Alphabetic splits are the same as before.
Unfortunately, popularity splits don’t help: with 0 matches, we have to go both left and right everywhere.
Let P(N) be the work to process N items on popularity splits, and A(N) the work to process N items on alphabetic splits. In the worst case,
$A(N) = P(N/2) + C_2$
$P(N) = 2A(N/2) + C_1$
Substituting the first into the second gives $P(N) = 2P(N/4) + 2C_2 + C_1$: each time N shrinks by a factor of 4, the work halves. Therefore,
$P(N) = C_3\sqrt{N} + C_4$
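The closed form can be checked numerically. This sketch picks hypothetical values for C_1, C_2, and the base case P(1), which the slide leaves open, and compares the recurrences against C_3 sqrt(N) + C_4 on powers of 4. Unrolling P(N) = 2P(N/4) + 2C_2 + C_1 gives C_3 = P(1) + 2C_2 + C_1 and C_4 = -(2C_2 + C_1).

```python
from functools import lru_cache
from math import isqrt

# Hypothetical constants; the slide leaves C1, C2 and the base case open.
C1, C2, BASE = 1, 1, 1

@lru_cache(maxsize=None)
def P(n: int) -> int:
    """Worst-case work on a popularity split over n items."""
    return BASE if n <= 1 else 2 * A(n // 2) + C1

@lru_cache(maxsize=None)
def A(n: int) -> int:
    """Worst-case work on an alphabetic split over n items."""
    return BASE if n <= 1 else P(n // 2) + C2

# Closed form for N = 4**j.
C3, C4 = BASE + 2 * C2 + C1, -(2 * C2 + C1)
for j in range(6):
    n = 4 ** j
    print(n, P(n), C3 * isqrt(n) + C4)  # the last two columns agree
```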

19 Conclusions
Personalization and collaborative filtering: to find stuff you search for a lot, or stuff other people search for a lot, you shouldn’t have to type a lot.
Wild Thing: users enter wild cards anywhere, implicitly or explicitly, and the system finds the k-best expansions matching their favorites and hot stuff.
Favorites (personalization); hot stuff

20 Simple, Uniform Look-and-Feel
Simple and easy to use, even if you can’t spell or type… Even Bo’s 3-year-old can do it.
Goal: All Forms Go Wild, with a uniform look-and-feel.
Currently, different systems behave differently:
The Internet browser address bar remembers where you’ve been.
Forms auto-populate names, credit card numbers, etc.
Outlook remembers your favorite e-mail addresses.

21 Wild Thing
Means different things to different people.
Encourage the use of wild cards, implicit as well as explicit.
A children’s story (with apologies to Hippos Go Berserk!): Wild Thing Goes Mobile! Wild Thing Goes Local! All Forms Go Wild!!!
For the young adult: Wild Thing: You Make My Phone Sing!

22 © 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
