
1 Dr. Susan Gauch When is a rock not a rock? Conceptual Approaches to Personalized Search and Recommendations Nov. 8, 2011 TResNet

2 Outline
Background
Motivation
Collecting User Information
Building Conceptual Profiles
Using User Profiles in Search – MiSearch
Using User Profiles in Recommender Systems – MyCiteSeerX
Issues with User Profiles

3 Background
Information retrieval (IR) studies the indexing and retrieval of textual documents
Searching for pages on the World Wide Web is the most recent "killer app"
Concerned with retrieving documents relevant to a query
Concerned with retrieving efficiently from large sets of documents

4 Web Search System
[Diagram: a Web spider crawls pages into a document corpus; the IR system takes a query string and returns ranked documents (1. Page1, 2. Page2, 3. Page3, …)]

5 The Vector-Space Model
Assume t distinct terms remain after preprocessing; call them index terms or the vocabulary.
These "orthogonal" terms form a vector space: Dimension = t = |vocabulary|
Each term i in a document or query j is given a real-valued weight, w_ij.
Both documents and queries are expressed as t-dimensional vectors: d_j = (w_1j, w_2j, …, w_tj)
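The slide does not specify a weighting scheme, so as a minimal sketch the weights w_ij below are raw term frequencies (real systems typically use tf-idf); the vocabulary and sample text are invented for illustration:

```python
from collections import Counter

def term_vector(text, vocabulary):
    """Map a text to a t-dimensional vector of raw term-frequency weights."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Toy vocabulary of t = 4 index terms (illustrative only)
vocab = ["salsa", "recipe", "dance", "music"]
d = term_vector("salsa recipe salsa sauce", vocab)
print(d)  # [2, 1, 0, 0]
```

Terms outside the vocabulary (like "sauce" here) are simply dropped, mirroring the preprocessing step the slide assumes.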

6 Graphic Representation
Example in a 3-term space (T_1, T_2, T_3):
Q = 0·T_1 + 0·T_2 + 2·T_3
D_1 = 2·T_1 + 3·T_2 + 5·T_3
D_2 = 3·T_1 + 7·T_2 + 1·T_3
Is D_1 or D_2 more similar to Q? How to measure the degree of similarity? Distance? Angle?

7

8 Cosine Similarity Measure
Cosine similarity measures the cosine of the angle between two vectors: the inner product normalized by the vector lengths.
CosSim(d_j, q) = (d_j · q) / (|d_j| |q|)
Using the vectors from the previous slide, D_1 is 6 times better a match to Q than D_2 using cosine similarity.
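A short sketch of the measure, using the example vectors Q, D_1, and D_2 from the previous slide, reproduces the roughly 6-to-1 result:

```python
import math

def cos_sim(d, q):
    """Cosine similarity: inner product normalized by the vector lengths."""
    dot = sum(di * qi for di, qi in zip(d, q))
    norms = math.sqrt(sum(x * x for x in d)) * math.sqrt(sum(x * x for x in q))
    return dot / norms

Q  = [0, 0, 2]   # Q   = 2*T3
D1 = [2, 3, 5]   # D_1 = 2*T1 + 3*T2 + 5*T3
D2 = [3, 7, 1]   # D_2 = 3*T1 + 7*T2 + 1*T3

print(cos_sim(D1, Q))  # ≈ 0.81
print(cos_sim(D2, Q))  # ≈ 0.13, so D1 scores about 6 times higher
```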

9 Motivation
Search engines contain very large collections
– Google reports over 1 trillion web pages
Search engines receive very short queries
– 68% are 3 words long or less
Users examine few results
– rarely go beyond the first page
– rarely examine more than 1 result
– exacerbated by small mobile screens

10 Ambiguity
How to return precise results for ambiguous queries?
Results are returned based on simple keyword matches, with no consideration of differing meanings.
If the query is "salsa", is it…

11

12 Dealing with Ambiguity
Expand user queries using a thesaurus
– "An Expert System for Searching in Full-Text," Susan Gauch, 1990
– Basically, make query vectors longer so they are more likely to match documents
Represent documents and queries using high-level concepts instead of keywords
– "Conceptual Search with KeyConcept," Susan Gauch, 2010
– Basically, reduce the dimensions in the vectors to provide a conceptual match
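The first approach can be sketched in a few lines; the thesaurus entries below are invented for illustration, not taken from the cited system:

```python
# Toy thesaurus mapping a term to related terms (illustrative only)
THESAURUS = {
    "rock": ["stone", "boulder"],
    "salsa": ["sauce", "dip"],
}

def expand_query(terms):
    """Lengthen the query by appending related thesaurus terms."""
    expanded = list(terms)
    for term in terms:
        expanded.extend(THESAURUS.get(term, []))
    return expanded

print(expand_query(["salsa", "recipe"]))  # ['salsa', 'recipe', 'sauce', 'dip']
```

The longer query vector overlaps with more documents, which is exactly the trade-off the slide notes: higher recall at the cost of pulling in every sense of an ambiguous term.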

13 Ontologies A structured set of concepts Where do ontologies come from?

14 Semantic Web Manually build ontologies Experts manually tag data items Very “intelligent” but not scalable

15 IR Community Use implicit ontologies –Wikipedia –Open Directory Project Develop automated techniques to tag items Not as “intelligent” but much more scalable

16 Need for Personalization
All users get identical results for identical queries
No distinction between a veterinarian and a child for the query "beagle puppy"
Need for personalized results based on background and current context
How to pick the best 10 (or 1!) results for *you*?

17 How to Personalize
Build a user profile that represents user interests
– Collect information
– Construct the user profile
– Use the user profile for personalized interactions

18 Collecting User Information
Explicit user information
– Users fill in site-specific surveys
– Users are too busy (or lazy)
– Data may be deliberately or accidentally inaccurate
– Information becomes out of date

19 Implicit user information
– Software collects information about user activity as users perform their regular activities
– Information is indirect and noisy
– Various approaches used by well-known applications

20 Implicit Sources
Browsing histories
– User connects to the Internet via a proxy
– User periodically shares history
– Pros: captures browsing activity at multiple sites
– Cons: captures history from only one computer

21 My Browsing History

22 Used to Autofill URLs

23 Implicit Sources
Desktop toolbar
– User must install the desktop toolbar
– Communication between toolbar and site
– Pros: interactions tracked across multiple sites; access to desktop windows, file system
– Cons: user must install software; fine line between toolbar and spyware

24 Google’s Toolbar

25 Used to Personalize Search

26 Implicit Sources
User account
– User activity is tracked via cookies/session variables
– Best if the user signs in, to retain the same profile across multiple machines
– Pros: users tracked across all interactions
– Cons: only works at one site; users must create an account

27 Amazon’s Login

28 Used for Recommendations

29 Our Approach
Personalization based on implicit data
Represent the profile using a weighted conceptual taxonomy
Use the profile for personalization in many different ways:
– OBIWAN – Web browsing
– MiSearch – Web search
– MyCiteSeerX – recommender system

30 Building a Conceptual Profile
Need an ontology for the domain
Need a collection of text that represents the user's interests
Need a classification technique
– train the classifier with training data
– classify user texts w.r.t. the ontology/taxonomy/concept hierarchy/thesaurus/knowledge base
– accumulate weights

31 Building the User Profile

32 User Profile Representation
[Diagram: a weighted concept hierarchy rooted at Root, with weighted concepts such as Entertainment 0.01, Homemaking 0.04, Cooking 0.49, Lessons 0.3, Videos 0.1]
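The classify-and-accumulate steps from slide 30 can be sketched as follows. The "classifier" here is a stand-in that scores a text against each concept by keyword overlap; the real system trains a text classifier (e.g., on ODP category data), and the concept names and keywords below are invented for illustration:

```python
# Toy per-concept keyword sets standing in for a trained classifier
CONCEPT_KEYWORDS = {
    "Homemaking/Cooking": {"recipe", "sauce", "bake"},
    "Arts/Music":         {"band", "concert", "album"},
}

def classify(text):
    """Score one text against each concept (here: keyword overlap)."""
    words = set(text.lower().split())
    return {c: len(words & kw) for c, kw in CONCEPT_KEYWORDS.items()}

def build_profile(texts):
    """Accumulate per-concept weights over all of the user's texts."""
    profile = {c: 0 for c in CONCEPT_KEYWORDS}
    for text in texts:
        for concept, score in classify(text).items():
            profile[concept] += score
    return profile

print(build_profile(["salsa sauce recipe", "bake a cake recipe"]))
```

Accumulating over many texts is what turns noisy implicit evidence into a stable weighted profile: concepts the user returns to repeatedly dominate the weights.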

33 MiSearch
User search histories
– information available to the search engine itself
– collect the user's queries and clicked-on search results
– no software installed
Users create accounts
– login, or just track a userid in a cookie during the session
– similar to Amazon, eBay, etc.

34 Personalizing Search Results
Submit the query to an Internet search engine (e.g., Google)
Categorize each result into the same concept hierarchy to create result profiles
– top 3 levels of ODP, ~3,000 categories
Calculate the similarity between each result profile and the user profile
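The re-ranking step can be sketched as below, treating each profile as a sparse dict of concept weights and using cosine similarity as the similarity measure; the URLs, concept names, and weights are invented to echo the "canon book" example:

```python
import math

def sparse_cosine(p, q):
    """Cosine similarity between two sparse concept-weight dicts."""
    dot = sum(w * q.get(c, 0.0) for c, w in p.items())
    norms = math.sqrt(sum(w * w for w in p.values())) * \
            math.sqrt(sum(w * w for w in q.values()))
    return dot / norms if norms else 0.0

def rerank(results, user_profile):
    """Reorder search results by profile similarity, best match first."""
    return sorted(results,
                  key=lambda r: sparse_cosine(r["profile"], user_profile),
                  reverse=True)

# A photography-oriented user profile (illustrative weights)
user = {"Arts/Photography": 0.8, "Arts/Literature": 0.1}
results = [
    {"url": "canon-of-literature.example", "profile": {"Arts/Literature": 0.9}},
    {"url": "canon-cameras.example",       "profile": {"Arts/Photography": 0.9}},
]
print([r["url"] for r in rerank(results, user)])
```

For the photography user the camera page rises to the top; a classics user's profile would flip the order, which is the point of the two example profiles on the following slides.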

35 Ambiguous: “canon book”

36 User Profile (Classics)

37

38 User Profile (Photography)

39

40 MyCiteSeerX
Categorize the contents of CiteSeerX with respect to the ACM CCS topic hierarchy
Users create an account
Capture their queries and clicked-on documents
Build a conceptual profile
Compare user concepts to document concepts to create recommendations
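The compare-and-recommend step can be sketched as below: documents categorized into a topic hierarchy (ACM CCS in the talk) are scored by the overlap between their concept weights and the user's, and the top-scoring documents are recommended. All titles, concept names, and weights here are illustrative:

```python
def recommend(user_profile, documents, k=2):
    """Return the titles of the k documents whose concept weights
    overlap most with the user's conceptual profile."""
    def overlap(doc):
        return sum(w * user_profile.get(c, 0.0)
                   for c, w in doc["concepts"].items())
    ranked = sorted(documents, key=overlap, reverse=True)
    return [d["title"] for d in ranked[:k]]

# A user whose clicks concentrate in IR (illustrative weights)
user = {"Information Systems/IR": 0.7, "Information Systems/Databases": 0.2}
docs = [
    {"title": "Query expansion survey", "concepts": {"Information Systems/IR": 0.9}},
    {"title": "Codec design notes",     "concepts": {"Multimedia": 0.8}},
    {"title": "Join optimization",      "concepts": {"Information Systems/Databases": 0.8}},
]
print(recommend(user, docs, k=2))
```

An IR-interested user gets IR and database papers; a multimedia-interested user with weight on "Multimedia" would instead pull the codec paper to the top, matching the two example users on the following slides.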

41 User interested in IR

42 Their recommendations

43 User interested in multimedia

44 Their recommendations

45 Recent Work
Bridge the gap between the Semantic Web and Information Retrieval
– Semi-automatically build domain-specific ontologies
– Do text mining from a domain-specific literature collection

46 Conclusions
Information on which to base user profiles can be collected via interactions with a specific site
Conceptual profiles can be used to improve search (MiSearch)
Conceptual profiles can be used to provide conceptual recommendations for the CiteSeerX collection
Creates issues for profile sharing and user privacy
Leads to work on how to reuse/expand/build ontologies for narrow domains

