
1 Ranked Recall: Efficient Classification by Learning Indices That Rank
Omid Madani, with Michael Connor (UIUC)

2 Many-Category Learning (e.g., the Yahoo! Directory)
(Slide figure: example directory paths such as Arts & Humanities > Photography > Magazines > Contests; Education > History; Business & Economy; Recreation & Sports > Sports > Amateur > College Basketball)
- Over 100,000 categories in the Yahoo! directory
- Given a page, quickly categorize it
- Other settings are larger still: vision, text prediction, ... (millions of categories and beyond)

3 Supervised Learning
- Often two phases: training, then execution/testing
- Training produces a learned classifier f (categorizer); at test time, f maps an (unseen) instance to class prediction(s)
- (Slide figure: a small training matrix with features x1, x2, x3 and a class column per instance, plus an unseen instance whose class is unknown)
- Often binary classifiers are learned

4 Massive Learning
Lots of:
- Instances (millions, unbounded)
- Dimensions (thousands and beyond)
- Categories (thousands and beyond)
Two questions:
1. How to categorize quickly?
2. How to efficiently learn to categorize efficiently?

5 Efficiency
1. Two phases (combined when online): learning, and classification time/deployment
2. Resource requirements: memory, time, and sample efficiency

6 Idea
- Cues in the input may quickly narrow down the possibilities => "index" the categories
- Like a search engine, but learn a good index
- Goal: learn to strike a good balance between accuracy and efficiency

7 Summary of Findings
- Very fast. Training time: minutes versus hours/days (compared against one-versus-rest and top-down). Classification time: roughly O(|x|)
- Memory efficient
- Simple to use (runs on a laptop)
- Competitive accuracy!

8 Problem Formulation

9 Input-Output Summary
- Input: a tripartite graph over instances, features, and categories
- Output: an index, i.e., a sparse weighted directed bipartite graph from features to categories (a sparse matrix)
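To make the index concrete, here is a minimal sketch (not the authors' implementation) that represents it as a dict-of-dicts sparse matrix; the feature and category names are hypothetical.

    from collections import defaultdict

    # index[feature][category] = weight; only non-negative weights are kept,
    # and each feature's out-degree (number of categories) stays small.
    index = defaultdict(dict)

    # Example edges (illustrative feature/category identifiers):
    index["basketball"]["Recreation&Sports"] = 0.7
    index["basketball"]["Education"] = 0.1
    index["photography"]["Arts&Humanities"] = 0.9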

10 Scheme
- Learn a weighted bipartite graph
- Rank the retrieved categories
- For category assignment, one could use the rank, define thresholds, map scores to probabilities, etc.

11 Three Parts of the Online Solution
- How to use the index?
- How to update (learn) it?
- When to update it?

12 Retrieval (Ranked Recall)
(Slide figure: a bipartite graph with features f1..f4 on one side and categories c1..c5 on the other)
1. The instance's features are "activated"
2. Their edges are activated
3. The receiving categories are activated (scores accumulate)
4. Categories are sorted/ranked by score
Notes: like the use of inverted indices; amounts to sparse dot products
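A minimal sketch of this retrieval step, assuming the dict-of-dicts index above and instances given as sparse feature-value maps; retrieve() is an illustrative name, not the authors' API.

    from collections import defaultdict

    def retrieve(index, instance):
        """index: dict feature -> dict category -> weight.
        instance: dict feature -> feature value (e.g., a tf-idf weight).
        Returns (category, score) pairs ranked by accumulated score."""
        scores = defaultdict(float)
        for feature, value in instance.items():                      # 1. active features
            for category, weight in index.get(feature, {}).items():  # 2. active edges
                scores[category] += value * weight                   # 3. sparse dot product
        # 4. sort/rank categories by score, highest first
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)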

13 Computing the Index
- Efficiency: impose a constraint on every feature's maximum out-degree
- Accuracy: connect features to categories and compute weights so that some measure of accuracy is maximized

14 Measure of Accuracy: Recall
- Measure average performance per instance
- Recall at k: the proportion of instances for which the right category ended up in the top k
- Recall at k = 1 (R1), 5 (R5), 10, ...
- R1 = "accuracy" when "multiclass"
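A small sketch of recall at k under these definitions; recall_at_k() and the toy data below are illustrative only.

    def recall_at_k(ranked_predictions, true_categories, k):
        """ranked_predictions: one ranked category list per instance.
        true_categories: the correct category for each instance.
        Returns the fraction of instances whose true category is in the top k."""
        hits = sum(1 for ranking, true in zip(ranked_predictions, true_categories)
                   if true in ranking[:k])
        return hits / len(true_categories)

    # Example: R1 and R5 over three instances.
    ranked = [["c2", "c1", "c5"], ["c3", "c4"], ["c1", "c3", "c2"]]
    truth = ["c2", "c4", "c2"]
    print(recall_at_k(ranked, truth, 1))  # 0.33... (1 of 3 in the top 1)
    print(recall_at_k(ranked, truth, 5))  # 1.0 (all 3 in the top 5)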

15 Computational Complexity
NP-hard! The decision problem: given a finite set of instances (Boolean features), with exactly one category per instance, is there an index with maximum out-degree 1 such that R1 on the training set exceeds a threshold t?
- Reduction from set cover
- Approximation? (not known)

16 How About Practice?
We devised two main learning algorithms:
- IND treats features independently
- Feature Normalize (FN) does not make an independence assumption; it is online
Only non-negative weights are learned.

17 Feature Normalize (FN) Algorithm
Begin with an empty index.
Repeat:
- Input an instance (features + categories), then retrieve and rank candidate categories
- If the margin threshold is not met, update the index
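A sketch of this loop under the assumptions above; retrieve() is from the earlier retrieval sketch, while update_index() and compute_margin() are sketched after the later slides on updating and margins. None of this is the authors' code.

    def train_fn(stream, margin_threshold=0.1, max_out_degree=25, min_weight=0.01):
        index = {}                                   # begin with an empty index
        for features, true_categories in stream:     # online: one pass over instances
            ranking = retrieve(index, features)      # retrieve and rank candidates
            margin = compute_margin(ranking, true_categories)
            if margin < margin_threshold:            # margin not met -> update the index
                update_index(index, features, true_categories,
                             max_out_degree, min_weight)
        return index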

18 Three Parts (Online Setting)
- How to use the index?
- How to update it?
- When to update it?

19 Index Updating
For each active feature:
- Strengthen the weight between the active feature and the true category
- Weaken the feature's other connections
Strengthening = increasing the weight by addition or multiplication

20 Updating
(Slide figure: the feature-category bipartite graph, features f1..f4 and categories c1..c5)
1. Identify the connection (active feature to true category)
2. Increase its weight
3. Normalize/weaken the feature's other weights
4. Drop small weights
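A sketch of one such update, assuming additive strengthening, per-feature normalization, and pruning by a minimum weight and a maximum out-degree; the default parameter values mirror the experimental settings mentioned later, but the code itself is illustrative, not the authors' implementation.

    def update_index(index, features, true_categories, max_out_degree=25,
                     min_weight=0.01, boost=1.0):
        for feature, value in features.items():            # each active feature
            edges = index.setdefault(feature, {})
            for category in true_categories:
                # 1-2. identify the connection and increase its weight (additive)
                edges[category] = edges.get(category, 0.0) + boost * value
            # 3. normalize from the feature's side; this weakens the other weights
            total = sum(edges.values())
            for category in list(edges):
                edges[category] /= total
                # 4. drop small weights
                if edges[category] < min_weight:
                    del edges[category]
            # enforce the maximum out-degree by keeping only the heaviest edges
            if len(edges) > max_out_degree:
                keep = sorted(edges.items(), key=lambda kv: kv[1],
                              reverse=True)[:max_out_degree]
                index[feature] = dict(keep)
        return index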

21 Three Parts
- How to use an index?
- How to update it?
- When to update it?

22 A Tradeoff
1. To achieve stability (which helps accuracy), we need to keep updating (think of the single-feature scenario)
2. To "fit" more instances, we need to stop updating on instances we already get "right"
Using a margin threshold strikes a balance.

23 Margin Definition
Margin = score of the true (positive) category MINUS score of the highest-ranked negative category
Choice of margin threshold:
- Fixed, e.g., 0, 0.1, 0.5, ...
- Online average (e.g., the average of the last 10,000 margins + 0.1)
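A sketch of the margin and of one way to realize the online-average threshold; when an instance has several true categories, this sketch assumes the margin is measured from the best-scoring true category (an assumption, not stated on the slide).

    from collections import deque

    def compute_margin(ranking, true_categories):
        """ranking: (category, score) pairs, highest score first."""
        true_score = max((s for c, s in ranking if c in true_categories), default=0.0)
        neg_score = max((s for c, s in ranking if c not in true_categories), default=0.0)
        return true_score - neg_score

    class OnlineMarginThreshold:
        """Threshold = average of the last `window` observed margins + offset."""
        def __init__(self, window=10000, offset=0.1):
            self.margins = deque(maxlen=window)
            self.offset = offset

        def observe(self, margin):
            self.margins.append(margin)

        def threshold(self):
            if not self.margins:
                return self.offset
            return sum(self.margins) / len(self.margins) + self.offset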

24 Salient Aspects of FN
- "Differentially" updates; attempts to improve the retrieved ranking (in "context")
- Normalizes, but from the "feature's side"
- No explicit weight demotion/punishment! (normalization/weakening achieves demotion/reordering)
- Memory- and efficiency-conscious design from the outset
- Very dynamic/adaptive: edges are added and dropped, weights adjusted, categories reordered
- Extensions/variations exist (e.g., each feature's out-degree may adjust dynamically)

25 Domain Statistics
(Slide table: per-domain statistics for Reuters 21578, Reuters RCV1 (industry), Web, Ads, Jane Austen, and 20 Newsgroups: number of instances, number of features, |C| = number of classes, L = average vector length, and Cavg = average number of categories per instance. The largest domains have over ten thousand categories and hundreds of thousands of instances.)
Experiments are the average of 10 runs; each run is a single pass, with 90% of the data used for training and 10% held out.

26 Smaller Domains
(Results compared against Keerthi and DeCoste, 2006: a fast linear SVM)
- FN settings: max out-degree = 25, minimum allowed weight = 0.01, tested with margins 0, 0.1, and 0.5, and up to 10 passes
- 90-10 random splits
- This domain: 10 categories, 10k instances

27 Three Smaller Domains
20 categories, 20k instances

28 Three Smaller Domains
104 categories, 10k instances

29 Three Large Data Sets (top-down comparisons)
- ~500 categories, 20k instances
- ~12.6k categories, ~370k instances
- ~14k categories, ~70k instances

30 Accuracy vs. Max Out-Degree
(Slide plots: accuracy as a function of the maximum out-degree allowed, for web page categorization, Ads, and RCV1)

31 Accuracy vs. Passes and Margin
(Slide plot: accuracy as a function of the number of passes, for different margin settings)

32 Related Work and Discussion
- Multiclass learning/categorization algorithms (top-down, nearest neighbors, perceptron, Naïve Bayes, MaxEnt, SVMs, online methods, ...)
- Speed-up methods (trees, indices, ...)
- Feature selection/reduction
- Evaluation criteria
- Fast categorization in the natural world
- Prediction games! (see poster)

33 Summary
- A scalable supervised learning method for huge class sets (and instance sets, ...)
- Idea: learn an index (a sparse weighted bipartite graph mapping features to categories)
- Online, time- and memory-efficient algorithms
- Current/future work: more algorithms, theory, other domains/applications

