Online Learning by Projecting: From Theory to Large Scale Web-spam filtering Yoram Singer Koby Crammer (Upenn), Ofer Dekel (Google/HUJI), Vineet Gupta.

1 Online Learning by Projecting: From Theory to Large Scale Web-spam filtering Yoram Singer Based on joint work with: Koby Crammer (Upenn), Ofer Dekel (Google/HUJI), Vineet Gupta (Google), Joseph Keshet (HUJI), Andrew Ng (Stanford), Shai Shalev-Shwartz (HUJI) UT Austin AIML Seminar, Jan. 27, 2005

2 Online Binary Classification No animal eats bees Pearls melt in vinegar Dr. Seuss finished Dartmouth There are weapons of mass destruction in Iraq True False True

3 Binary Classification Instances (documents, signals): Labels (true/false, good/bad): Classification and Prediction: Mistakes and losses:

4 Online Binary Classification Initialize your classifier ( ) For t = 1,2,3,…,T,… Receive an instance: Predict label: Receive true label: [suffer “loss”/error] Update classifier ( ) Goal: suffer small losses while learning
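The protocol on this slide can be sketched as a short loop. This is only an illustration under stated assumptions: the names `run_online` and `perceptron_update` are hypothetical, the classifier is assumed to be a linear one that predicts the sign of w·x (the deck introduces linear classifiers two slides later), and the classic Perceptron rule stands in for the update step, which the transcript does not spell out.

```python
import numpy as np

def run_online(examples, update, dim):
    """Run the online protocol once over `examples`.

    Returns the final weight vector and the number of prediction mistakes.
    """
    w = np.zeros(dim)                      # initialize the classifier
    mistakes = 0
    for x, y in examples:                  # rounds t = 1, 2, ..., T
        y_hat = 1 if w @ x >= 0 else -1    # predict a label for the instance
        if y_hat != y:                     # true label is revealed; count the error
            mistakes += 1
        w = update(w, x, y)                # update the classifier
    return w, mistakes

def perceptron_update(w, x, y):
    # Assumption: a stand-in update (classic Perceptron), not the PA rule.
    return w + y * x if y * (w @ x) <= 0 else w

examples = [
    (np.array([1.0, 0.0]), 1),
    (np.array([0.0, 1.0]), -1),
    (np.array([1.0, 0.0]), 1),
]
w_final, n_mistakes = run_online(examples, perceptron_update, dim=2)
```

Any rule with the same `(w, x, y) -> w` signature can be passed as `update`, which is why the loop is kept separate from the update itself.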

5 Why Online? Adaptive Simple to implement Fast, small memory footprint Can be converted to batch learning (O2B) Formal guarantees But: might not be as effective as a well-designed batch learning algorithm

6 Linear Classifiers & Margins The prediction is formed as follows: The margin of an example w.r.t Positive Margin Negative Margin

7 Separability Assumption

8 Classifier Update - Passive Mode

9 Prediction & Margin Errors

10 Hinge Loss
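The slide's formula did not survive in the transcript. As a hedged illustration, the sketch below uses the standard hinge loss with a unit margin, max(0, 1 - y(w·x)), which is an assumption about what the slide showed. The function names `hinge_loss` and `round_outcome` are hypothetical; the second one separates a prediction mistake (negative margin) from a margin error (correct sign, but margin below 1), following the previous slide's title.

```python
import numpy as np

def hinge_loss(w, x, y):
    """Hinge loss with unit margin for a label y in {-1, +1}."""
    margin = y * float(np.dot(w, x))
    return max(0.0, 1.0 - margin)

def round_outcome(w, x, y):
    """Classify a round as 'mistake', 'margin error', or 'correct'."""
    margin = y * float(np.dot(w, x))
    if margin <= 0:
        return "mistake"          # wrong (or undecided) sign
    if margin < 1:
        return "margin error"     # right sign, but not confident enough
    return "correct"              # zero hinge loss

w = np.array([1.0, 0.0])
loss_confident = hinge_loss(w, np.array([2.0, 0.0]), 1)
loss_weak = hinge_loss(w, np.array([0.5, 0.0]), 1)
loss_wrong = hinge_loss(w, np.array([1.0, 0.0]), -1)
```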

11 Version Space In case of a prediction mistake then must reside

12 Mistake → Aggressive Mode is projected onto the feasible (dual) space

13 Passive-Aggressive Update
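The update formula itself is missing from the transcript. A minimal sketch, assuming the closed form from the published Passive-Aggressive paper for the separable case: stay passive when the hinge loss is zero, otherwise project w onto the half-space where the example attains margin 1, using step size tau = loss / ||x||². The function name `pa_update` is hypothetical.

```python
import numpy as np

def pa_update(w, x, y):
    """One Passive-Aggressive step for binary classification (y in {-1, +1})."""
    loss = max(0.0, 1.0 - y * float(w @ x))
    sq_norm = float(x @ x)
    if loss == 0.0 or sq_norm == 0.0:
        return w                      # passive: constraint already satisfied
    tau = loss / sq_norm              # smallest step that reaches margin 1
    return w + tau * y * x            # aggressive: project onto the half-space

x = np.array([3.0, 4.0])
w_new = pa_update(np.zeros(2), x, 1)
w_same = pa_update(np.array([1.0, 0.0]), np.array([2.0, 0.0]), 1)
```

After an aggressive step the updated vector meets the margin constraint on the current example with equality, which is what distinguishes this projection from a fixed-step Perceptron update.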

14 Three Decision Problems: A Unified View Classification Regression Uniclass

15 The Generalized PA Algorithm Each example induces a set of consistent hypotheses (half-space, hyper-slab, ball) The new vector is set to be the projection of onto the set of consistent hypotheses Classification Regression Uniclass
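For the regression and uniclass cases, the projections onto a hyper-slab and a ball also have closed forms. The sketch below follows the epsilon-insensitive formulations in the published Passive-Aggressive paper; treating those as the formulas on this slide is an assumption, and the function names and the `eps` parameter are hypothetical.

```python
import numpy as np

def pa_regression_update(w, x, y, eps):
    """Project w onto the hyper-slab {v : |v.x - y| <= eps}."""
    err = float(w @ x) - y
    loss = max(0.0, abs(err) - eps)       # epsilon-insensitive loss
    sq_norm = float(x @ x)
    if loss == 0.0 or sq_norm == 0.0:
        return w                          # already inside the slab
    tau = loss / sq_norm
    return w - np.sign(err) * tau * x     # move the prediction toward y

def pa_uniclass_update(w, y, eps):
    """Project the center w onto the ball of radius eps around the point y."""
    diff = y - w
    dist = float(np.linalg.norm(diff))
    loss = max(0.0, dist - eps)
    if loss == 0.0:
        return w                          # y is already within eps of w
    return w + loss * diff / dist         # step of length `loss` toward y

w_reg = pa_regression_update(np.zeros(2), np.array([1.0, 1.0]), 3.0, eps=1.0)
w_uni = pa_uniclass_update(np.zeros(2), np.array([3.0, 4.0]), eps=1.0)
```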

16 Loss Bound (Classification) If there exists such that Then where PA makes a bounded number of mistakes

17 Proof Sketch Define: Upper bound: Lower bound: Lipschitz Condition

18 Proof Sketch (Cont.) Combining upper and lower bounds L=B for classification and regression L=1 for uniclass

19 Unrealizable Case ???

20 Unrealizable Case (Classification) PA-I PA-II

21 (Not-really) Aggressive Updates
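The step sizes for the unrealizable case are not recoverable from the transcript. As a hedged sketch, the published Passive-Aggressive paper gives the three variants below, where `C` is an aggressiveness parameter; the helper name `pa_step_size` and its argument names are hypothetical.

```python
def pa_step_size(loss, sq_norm, variant="PA", C=1.0):
    """Step size tau for one round, given the hinge loss and ||x||^2.

    PA    : tau = loss / ||x||^2                 (separable case)
    PA-I  : tau = min(C, loss / ||x||^2)         (step size capped at C)
    PA-II : tau = loss / (||x||^2 + 1 / (2C))    (step size smoothly damped)
    """
    if variant == "PA":
        return loss / sq_norm
    if variant == "PA-I":
        return min(C, loss / sq_norm)
    if variant == "PA-II":
        return loss / (sq_norm + 1.0 / (2.0 * C))
    raise ValueError(f"unknown variant: {variant!r}")

tau_pa = pa_step_size(2.0, 1.0, "PA")
tau_pa1 = pa_step_size(2.0, 1.0, "PA-I", C=0.5)
tau_pa2 = pa_step_size(2.0, 1.0, "PA-II", C=0.5)
```

Both PA-I and PA-II take a smaller step than plain PA on the same round, so a single noisy example cannot move the classifier arbitrarily far.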

22 Mistake Bound for PA-I Loss suffered by PA-I on round t: Loss suffered by any fixed vector: #Mistakes made by PA-I is at most:

23 Loss Bound for PA-II Loss suffered by PA-II on round t: Loss suffered by any fixed vector: Cumulative loss ( ) of PA-II is at most:

24 Beyond Binary Decision Problems Applications and generalizations of PA: Multiclass categorization Topic ranking and filtering Hierarchical classification Sequence learning (Markov Networks) Segmentation of sequences Learning of pseudo-metrics

25 Movie Recommendation System Recommender System

26 Recommending by Projecting 1 2 3 4 Project Apply Thresholds

27 PRank Update w 513 24 Rank Levels Thresholds

28 PRank Update w 513 24

29 PRank w 513 24 Correct Rank Interval

30 PRank Update w 513 24 {2, 3}

31 PRank Update w
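The figures for these slides are lost, so the sketch below follows the PRank algorithm as published in "PRanking with Ranking" rather than anything recoverable from the transcript: project the instance onto w, compare the score against ordered thresholds, and on a mistake move w and the offending thresholds toward each other. The function names and the 1-based rank convention are assumptions.

```python
import numpy as np

def prank_predict(w, b, x):
    """Return the smallest rank r (1-based) with score < b[r-1], else k."""
    score = float(w @ x)
    for r, b_r in enumerate(b, start=1):
        if score < b_r:
            return r
    return len(b) + 1                      # top rank k (implicit threshold +inf)

def prank_update(w, b, x, y):
    """One PRank round for true rank y; b holds the k-1 finite thresholds."""
    if prank_predict(w, b, x) == y:
        return w, b                        # no change on a correct prediction
    score = float(w @ x)
    tau = np.zeros(len(b))
    for r in range(1, len(b) + 1):
        y_r = -1.0 if y <= r else 1.0      # side of threshold r the score belongs on
        if (score - b[r - 1]) * y_r <= 0:  # threshold r is on the wrong side
            tau[r - 1] = y_r
    return w + tau.sum() * x, b - tau      # move w and thresholds toward each other

x = np.array([1.0])
w1, b1 = prank_update(np.zeros(1), np.zeros(3), x, y=3)
w2, b2 = prank_update(w1, b1, x, y=3)
rank_after = prank_predict(w2, b2, x)
```

In this toy run with four rank levels, two rounds on the same example are needed before the score falls inside the correct rank interval.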

32 w x w

33 74424 registered Viewers 1648 listed Movies Viewers rated subsets of movies Demo: online movie recommendation EachMovie Database

34 PA@Google: Web Spam Filtering [With Vineet Gupta] Query: “hotels palo alto” Spammers: Cardinal Hotel - Palo Alto - Reviews of Cardinal Hotel... Palo Alto, California 94301 United States. Deals on Palo Alto hotels.... More Palo Altohotels.... Research other Palo Alto hotels. Is this hotel not right for you?... www.tripadvisor.com/Hotel_Review-g32849-d79154-… Cardinal Hotel - Palo Alto - Reviews of Cardinal Hotel Palo Alto Hotels - Cheap Hotels - Palo Alto Hotels... Book Palo Alto Hotels Online or Call Toll Free 1-800-359-7234.... Keywords: Palo AltoHotel Discounts - Cheap Hotels in Palo Alto. Hotels In Palo Alto.... www.hotelsbycity.com/california/hotels-palo-alto-… Palo Alto Hotels - Cheap Hotels - Palo Alto Hotels...

35 Enhancements for Web Spam Various “signals” → features Design of special kernels Multi-tier feedback (label): +2 navigational site (e.g. www.stanford.edu) +1 on topic -1 off topic -2 nuke the spammer Loss is sensitive to site label Algorithmic modifications due to scale: Online-to-batch conversions Re-projections of old examples Part of a recent revision to search (Google3)

36 Web Spam Filtering - Results Specific queries and domains are heavily spammed: Over 50% of the returned URLs for travel search Certain countries are more spam prone Training set size: over half a million domains Training time: 2 hours to 5 days Test set size: the entire web crawled by Google (over 100 million domains) A few hours to filter all domains on hundreds of CPUs Current reduction achieved (estimate): 50% of spammers

37 Summary Unified online framework for decision problems Simple and efficient algorithms (“kernelizable”) Analyses for realizable and unrealizable cases Numerous applications Batch learning conversions & generalization Generalizations using general Bregman projections Approximate projections for large scale problems Applications of PA to other decision problems

38 Related Work Projections Onto Convex Sets (POCS): Y. Censor & S.A. Zenios, “Parallel Optimization” (Hildreth’s projection algorithm), Oxford UP, 1997 H.H. Bauschke & J.M. Borwein, “On Projection Algorithms for Solving Convex Feasibility Problems”, SIAM Review, 1996 Online Learning: M. Herbster, “Learning additive models online with fast evaluating kernels”, COLT 2001 J. Kivinen, A. Smola, and R.C. Williamson, “Online learning with kernels”, IEEE Trans. on SP, 2004

39 Relevant Publications Online Passive Aggressive Algorithms, CDSS’03 CSKSS’05 Family of Additive Online Algorithms for Category Ranking, CS’03 Ultraconservative Online Algorithms for Multiclass Problems, CS’02 CS’03 On the algorithmic implementation of Multiclass SVM, CS’03 PRanking with Ranking, CS’01 CS’04 Large Margin Hierarchical Classification, DKS’04 Learning to Align Polyphonic Music, SKS’04 Online and Batch Learning of Pseudo-metrics, SSN’04 The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees, DSS’04 A Temporal Kernel-Based Model for Tracking Hand-Movements from Neural Activities, SCPVS’04

40 Hierarchical Classification: Motivation Phonetic transcription of DECEMBER Gross error Small errors T ix s eh m bcl b er d AE s eh m bcl b er d ix s eh NASAL bcl b er

41 Phonetic Hierarchy b g PHONEMES Sonorants Silences Obstruents Nasals Liquids Vowels Plosives Fricatives Front Center Back n m ng d k p t f v sh s th dh zh z l y w r Affricates jh ch oy ow uh uw aa ao er aw ay iy ih ey eh ae

42 Common Constructions Ignore the hierarchy - solve as multiclass A greedy approach: solve a multiclass problem at each node

43 Hierarchical Classifier Assume and Associate a prototype with each label Classification rule: W4 W5 W6 W7 W8 W9 W10 W1 W0 W2 W3

44 Hierarchical Classifier (cont.) Define W4 W5 W6 W7 W8 W9 W10 W1 W0 W2 W3

45 A Metric Over Labels b a A given hierarchy defines a metric over the set of labels via graph distance
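The graph distance between two labels in a tree can be computed by walking both labels up to their lowest common ancestor and counting edges. The sketch below is an illustration only: the `parent` dictionary is a toy fragment loosely modeled on the phonetic hierarchy slide, its exact parent links are assumptions, and the function names are hypothetical.

```python
def path_to_root(parent, v):
    """List the nodes from v up to the root (inclusive)."""
    path = [v]
    while parent[v] is not None:
        v = parent[v]
        path.append(v)
    return path

def tree_distance(parent, a, b):
    """Number of edges on the unique path between labels a and b."""
    path_a = path_to_root(parent, a)
    steps_from_a = {v: i for i, v in enumerate(path_a)}
    for steps_from_b, v in enumerate(path_to_root(parent, b)):
        if v in steps_from_a:              # first shared node: lowest common ancestor
            return steps_from_a[v] + steps_from_b
    raise ValueError("labels are not in the same tree")

# Assumption: a toy fragment of a phoneme hierarchy, for illustration only.
parent = {
    "PHONEMES": None,
    "Sonorants": "PHONEMES",
    "Obstruents": "PHONEMES",
    "Nasals": "Sonorants",
    "Plosives": "Obstruents",
    "n": "Nasals",
    "m": "Nasals",
    "b": "Plosives",
}
d_small = tree_distance(parent, "n", "m")
d_gross = tree_distance(parent, "n", "b")
```

Under this metric, confusing two sibling labels is a small error while confusing labels from different top-level branches is a gross one, which matches the motivation given for hierarchical classification.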

46 From PA to Hieron Replace a simple margin constraint with a tree-based margin constraint: - correct label - predicted label

47 Hieron - Update w4 w5 w6 w7 w8 w9 w10 w1 w2 w3

48 Hieron - Update w6 w7 w10

49 Sample Run on Synthetic Data The hierarchy given to the algorithm An edge indicates that prototypes are “close”

50 Experiments with Hieron Datasets used:
Dataset              # train   # test   # labels   depth
DMOZ (web pages)        8576    4-FCV        316       8
Speech (phonemes)      80000    20000         40       4
Synthetic data         12100     6050        121       4
Compared two models: Hieron with knowledge of the correct hierarchy; Hieron without knowledge of the correct hierarchy (flat)

51 Experimental Results Each graph shows the difference between the error histograms of the two models Hieron makes fewer “gross” mistakes State-of-the-art results for frame-based phoneme classification Panels: DMOZ, Phoneme (TIMIT), Synthetic

