Combining Content-Based and Collaborative Filters in an Online Newspaper Mark Claypool, Anuja Gokhale, Tim Miranda, Pavel Murnikov, Dmitry Netes and Matthew.


1 Combining Content-Based and Collaborative Filters in an Online Newspaper Mark Claypool, Anuja Gokhale, Tim Miranda, Pavel Murnikov, Dmitry Netes and Matthew Sartin Computer Science Department Worcester Polytechnic Institute

2 Outline
• Introduction
• Approach
• System
• Experiments
• Conclusions

3 Information Overload
• Newspapers
  – 1/2 dozen delivered daily
  – 2500 daily via Web
  – Thousands of articles
  – Personalization (Bogart, 1989)
• Need Filters! Quantity! Quality?
• Filters
  – Usenet news: GroupLens, NewsWeeder, PHOAKS
  – Aggregate: CRAYON, Fishwrap, My Yahoo!
  – Layout: Krakatoa Chronicle

4 Information Filtering
• Apply power of computers to filtering
• How do we filter information?
  – Get recommendation from friend
  – Most popular newspapers
• Opinions
  – Peers
  – Aggregate opinions
• Collaborative Filters

5 Collaborative Filtering Problems
• Early Rater Problem
• Sparsity Problem
• “Gray Sheep” Problem
• Changing Interests Problem
[Figures: one diagram per problem; the node labels are not recoverable from the transcript]

6 Information Filtering
• How else do we filter information?
  – Skimming the newspaper
  – Picking newspaper section
  – Reading byline
• Item characteristics
  – Like sports
  – Hate field hockey
  – Like reporter
• Content-Based Filters

7 Other Approaches
• ProfBuilder
  – (Wasfi, 1999)
• GroupLens
  – (Sarwar, Konstan et al., 1998)
• Basu, Hirsh and Cohen
  – (1998)
• Fab
  – (Balabanovic and Shoham, 1997)

8 Research Approach
• Combine Collaborative and Content-Based using Weighted Average
• Per-User Weights
  – address “gray sheep” problem
• Per-Item Weights
  – address “early rater” problem
• Other benefits
  – realize individual algorithm improvements
  – extensible
  – hierarchical

9 Collaborative Filtering Algorithm
$$U_x = \bar{U} + \frac{\sum_{J} (J_x - \bar{J})\, r_{uj}}{\sum_{J} |r_{uj}|}$$
• Article x
• User U
• Pearson’s Correlation $r_{uj}$
• (GroupLens, 1994)
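The prediction formula on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the P-Tango implementation: the dict-of-dicts `ratings` layout, the function names `pearson` and `predict`, and the choices to correlate over co-rated items only, to average each user over all of that user's ratings, and to fall back to the user's own mean when no neighbor is usable are all assumptions made for the example.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation over items rated by both users (0.0 if undefined)."""
    common = [item for item in a if item in b]
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def predict(ratings, user, item):
    """U_x = mean(U) + sum_J (J_x - mean(J)) * r_uj / sum_J |r_uj|."""
    own = ratings[user]
    user_mean = sum(own.values()) / len(own)
    numerator = 0.0
    denominator = 0.0
    for other, theirs in ratings.items():
        # Only other users who rated the target article contribute.
        if other == user or item not in theirs:
            continue
        r = pearson(own, theirs)
        other_mean = sum(theirs.values()) / len(theirs)
        numerator += (theirs[item] - other_mean) * r
        denominator += abs(r)
    # Assumed fallback: with no correlated neighbors, predict the user's mean.
    if denominator == 0:
        return user_mean
    return user_mean + numerator / denominator
```

For example, with `ratings = {"u": {"a": 1, "b": 2, "c": 3}, "j": {"a": 2, "b": 3, "c": 4, "x": 5}}`, the two users correlate perfectly on their shared articles, so `predict(ratings, "u", "x")` shifts user `u`'s mean of 2 by user `j`'s deviation of 1.5 and returns 3.5.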

10 Content-Based Filter
• Selectable sections
  – business, sports, entertainment …
• Keywords
  – explicit
  – implicit
  – article: stop list, word stems, top 50%
• Keyword match by Overlap Coefficient
$$M = \frac{2 \times |A \cap P|}{\min(|A|, |P|)}$$
• Combine via weighted average (1/3 each)
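The keyword match and the equal-weight average on this slide can be sketched as follows. The reading of A as the article's keyword set and P as the profile's keyword set, the function names, and the assumption that the three averaged components are the section match, the explicit-keyword match, and the implicit-keyword match are illustrative guesses; the slide does not spell them out.

```python
def overlap_match(article_keywords, profile_keywords):
    """M = 2 * |A ∩ P| / min(|A|, |P|), with the factor of 2 as on the slide."""
    a = set(article_keywords)
    p = set(profile_keywords)
    if not a or not p:
        # Assumed convention: an empty keyword set matches nothing.
        return 0.0
    return 2 * len(a & p) / min(len(a), len(p))


def content_score(section_match, explicit_match, implicit_match):
    """Equal-weight average (1/3 each) of three component scores (assumed components)."""
    return (section_match + explicit_match + implicit_match) / 3
```

With the factor of 2 kept as written, M ranges from 0 to 2 rather than 0 to 1, so any rescaling before averaging would be a further assumption.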

11 Combination Filter
• Linear combination of scores
  – (Vogt and Cottrell, 1996, 1998)
• Weights are based on previous accuracy
• Reorder
• “Top-10” Section
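One way to picture the combination filter is the sketch below. The slide says only that weights are based on previous accuracy; making each weight inversely proportional to that filter's past mean absolute error is an assumption chosen for illustration, as are the function names and the small `eps` guard against division by zero.

```python
def accuracy_weights(collab_errors, content_errors, eps=1e-9):
    """Return (w_collab, w_content) from past absolute prediction errors.

    Assumed rule: weight is proportional to 1 / mean absolute error,
    normalized so the two weights sum to 1.
    """
    mae_collab = sum(collab_errors) / len(collab_errors)
    mae_content = sum(content_errors) / len(content_errors)
    inv_collab = 1.0 / (mae_collab + eps)
    inv_content = 1.0 / (mae_content + eps)
    total = inv_collab + inv_content
    return inv_collab / total, inv_content / total


def combine(collab_pred, content_pred, w_collab, w_content):
    """Linear combination of the two filters' scores."""
    return w_collab * collab_pred + w_content * content_pred


def top_n(predictions, n=10):
    """Reorder article ids by combined score, highest first, and keep n."""
    return sorted(predictions, key=predictions.get, reverse=True)[:n]
```

Keeping one pair of weights per user, or per article, would give the per-user and per-item variants the deck mentions; how the two are reconciled is not specified in the slides.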

12 P-Tango System Architecture
[Architecture diagram; component labels: Database, Tango, Web Browser, Front End, User Profile, Login, Ratings, Back End, Correlations, Import, Keywords, Predictions]

13 P-Tango User Profile

14 P-Tango Interface

15 Experiment
• 18 Users
• 3-week trial
• 1300 Articles
• Density: 0.5%
  – EachMovie: 2%
• User correlation: -0.2 / 0.02 / 1
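To make the density figure concrete, the sketch below assumes density means the fraction of user-article pairs that carry a rating; the slide does not define the term, and `rating_density` is a hypothetical helper.

```python
def rating_density(num_ratings, num_users, num_items):
    """Fraction of user-item cells that hold a rating (assumed definition)."""
    return num_ratings / (num_users * num_items)


# 18 users x 1300 articles gives 23400 possible user-article pairs.
cells = 18 * 1300
# Under the assumed definition, 0.5% density corresponds to roughly this many ratings.
implied_ratings = 0.005 * cells
```

Under that reading, the trial's matrix would hold on the order of a hundred ratings, which illustrates how little overlap the user-to-user correlations have to work with.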

16 Results

17 Conclusions
• Need to combine content-based with collaborative predictions
• Linear mixture of predictions
  – simple
  – effective
  – extensible
  – realizes benefits of individual algorithm improvements
• Online newspapers are a promising domain for exploration

18 Future Work
• More Experiments
  – more users, more time, real newspaper readers
• Confidence in prediction
  – per-item weights
  – better restructuring
  – information to user
• Implicit Ratings
• Restructuring

