Ranked Recall: Efficient Classification by Learning Indices That Rank
Omid Madani, with Michael Connor (UIUC)

Many-Category Learning (e.g., the Yahoo! Directory)
[Figure: a slice of the directory hierarchy, e.g. Arts & Humanities > Photography, Magazines, Contests, Education, History; Business & Economy; Recreation & Sports > Sports > Amateur > College basketball]
- Over 100,000 categories in the Yahoo! directory
- Given a page, quickly categorize it
- Even larger category sets arise in vision, text prediction, ... (millions and beyond)

Supervised Learning
- Often two phases: training, then execution/testing
- Training produces a learned classifier f (a categorizer); at test time, f maps an unseen instance (a feature vector x1, x2, x3, ...) to class prediction(s)
- Often binary classifiers are learned, one per class

Massive Learning
Lots of:
- Instances (millions, unbounded)
- Dimensions (thousands and beyond)
- Categories (thousands and beyond)
Two questions:
1. How to categorize quickly?
2. How to efficiently learn to categorize efficiently?

Efficiency
1. Two phases (combined when online):
   - Learning
   - Classification time / deployment
2. Resource requirements:
   - Memory
   - Time
   - Sample efficiency

Idea
- Cues in the input may quickly narrow down the possibilities => “index” the categories
- Like a search engine, but learn a good index
- Goal: learn to strike a good balance between accuracy and efficiency

Summary of Findings
- Very fast:
  - Training time: minutes versus hours/days (compared against one-versus-rest and top-down)
  - Classification time: O(|x|)?
- Memory efficient
- Simple to use (runs on a laptop)
- Competitive accuracy!

Problem Formulation

Input-Output Summary
- Input: a tripartite graph over instances, features, and categories (each instance connects to its features and to its categories)
- Output: an index, i.e., a sparse weighted directed bipartite graph from features to categories (a sparse matrix)
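As a concrete representation, such an index can be stored as a map from each feature to its non-zero outgoing edges. This is only a sketch; the feature and category names below are hypothetical:

```python
# Sketch: the index as a sparse adjacency map.
# index[feature][category] = edge weight (only non-zero edges stored).
index = {
    "f1": {"c1": 0.7, "c3": 0.3},
    "f2": {"c1": 0.2, "c2": 0.8},
}

def out_degree(index, feature):
    """Number of categories a feature points to (its out-degree)."""
    return len(index.get(feature, {}))
```

Storing only non-zero edges is what makes the per-feature out-degree constraint (introduced later) a direct memory bound.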

Scheme
- Learn a weighted bipartite graph
- Rank the retrieved categories
- For category assignment, one could use the rank, define thresholds, map scores to probabilities, etc.

Three Parts to the Online Solution
1. How to use the index?
2. How to update (learn) it?
3. When to update it?

Retrieval (Ranked Recall)
[Figure: bipartite graph from features f1..f4 to categories c1..c5]
1. The instance's features are “activated”
2. Their edges are activated
3. The receiving categories are activated (accumulate scores)
4. Categories are sorted/ranked by score
This is like the use of inverted indices; scoring amounts to sparse dot products.
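In code, the four retrieval steps reduce to a sparse dot product followed by a sort. A minimal sketch (the toy index and instance are hypothetical):

```python
from collections import defaultdict

def retrieve(index, instance):
    """Ranked recall: accumulate category scores via the active
    features' edges (a sparse dot product), then sort descending.
    index: feature -> {category: weight}; instance: feature -> value."""
    scores = defaultdict(float)
    for feature, value in instance.items():       # steps 1-2: activate features/edges
        for category, weight in index.get(feature, {}).items():
            scores[category] += value * weight    # step 3: activate categories
    return sorted(scores.items(), key=lambda cw: -cw[1])  # step 4: rank

index = {"f1": {"c1": 0.9, "c2": 0.1}, "f2": {"c2": 0.5}}
ranking = retrieve(index, {"f1": 1.0, "f2": 1.0})
# c1 scores 0.9; c2 scores 0.1 + 0.5 = 0.6, so c1 ranks first.
```

Only categories reachable from the active features are ever touched, which is what keeps classification time roughly linear in the instance size.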

Computing the Index
- Efficiency: impose a constraint on every feature's maximum out-degree
- Accuracy: connect features to categories and compute weights so that some measure of accuracy is maximized

Measure of Accuracy: Recall
- Measure average performance per instance
- Recall at k: the proportion of instances for which the right category ended up in the top k
- Recall at k = 1 (R1), 5 (R5), 10, ...
- R1 = “accuracy” in the “multiclass” setting
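Recall at k is a simple per-instance average; a sketch with a toy three-instance example:

```python
def recall_at_k(rankings, true_cats, k):
    """Proportion of instances whose true category appears in the
    top k of that instance's ranked category list."""
    hits = sum(true in ranked[:k]
               for ranked, true in zip(rankings, true_cats))
    return hits / len(true_cats)

rankings = [["c1", "c2", "c3"], ["c2", "c1", "c3"], ["c3", "c2", "c1"]]
truth = ["c1", "c1", "c1"]
# R1 = 1/3 (only the first instance ranks c1 on top); R3 = 1.0.
```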

Computational Complexity
- NP-hard! The decision problem: given a finite set of instances (Boolean features), with exactly one category per instance, is there an index with maximum out-degree 1 such that R1 on the training set is greater than a threshold t?
- Proof by reduction from set cover
- Approximation? (not known)

How About Practice?
We devised two main learning algorithms:
- IND treats features independently.
- Feature Normalize (FN) does not make an independence assumption; it is online.
Only non-negative weights are learned.

The Feature Normalize (FN) Algorithm
- Begin with an empty index
- Repeat:
  - Input an instance (features + categories); retrieve and rank candidate categories
  - If the margin threshold is not met, update the index
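The loop can be sketched as follows. This is a simplified outline, not the paper's exact algorithm: the update step here is reduced to pure strengthening of the true category's edges (the full normalize-and-prune update is described on the later slides), and the margin threshold is a free parameter:

```python
from collections import defaultdict

def fn_train(stream, margin_threshold=0.1):
    """Skeleton of the FN online loop: start from an empty index;
    retrieve and rank for each incoming instance; update only when
    the margin threshold is not met."""
    index = defaultdict(dict)            # feature -> {category: weight}
    for features, true_cat in stream:
        scores = defaultdict(float)      # retrieve: sparse dot product
        for f, v in features.items():
            for c, w in index[f].items():
                scores[c] += v * w
        best_neg = max((s for c, s in scores.items() if c != true_cat),
                       default=0.0)
        if scores.get(true_cat, 0.0) - best_neg < margin_threshold:
            for f in features:           # simplified strengthening update
                index[f][true_cat] = index[f].get(true_cat, 0.0) + 1.0
    return index

idx = fn_train([({"f1": 1.0}, "c1"), ({"f1": 1.0}, "c1")])
# The second instance already meets the margin, so only one update fires.
```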

Three Parts (Online Setting)
1. How to use the index?
2. How to update it?
3. When to update it?

Index Updating
For each active feature:
- Strengthen the weight between the active feature and the true category
- Weaken the feature's other connections
Strengthening = increasing the weight, by addition or multiplication.

Updating
[Figure: bipartite graph from features f1..f4 to categories c1..c5]
1. Identify the connection (active feature to true category)
2. Increase its weight
3. Normalize/weaken the feature's other weights
4. Drop small weights
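A sketch of this per-feature update; the multiplicative boost and the pruning threshold below are illustrative values, not taken from the paper:

```python
def update_feature(out_edges, true_cat, boost=1.5, w_min=0.01):
    """Update one feature's outgoing edges: strengthen the edge to the
    true category, renormalize (which weakens every other edge), and
    drop edges whose normalized weight falls below w_min.
    out_edges maps category -> weight for a single feature."""
    out_edges = dict(out_edges)                        # work on a copy
    w = out_edges.get(true_cat, 0.0)                   # 1: identify
    out_edges[true_cat] = w * boost if w > 0 else 1.0  # 2: increase
    total = sum(out_edges.values())
    return {c: wt / total for c, wt in out_edges.items()  # 3: normalize
            if wt / total >= w_min}                       # 4: prune

edges = update_feature({"c1": 0.6, "c2": 0.4}, "c1")
# c1's share grows (0.9/1.3) while c2's shrinks (0.4/1.3).
```

Note how normalization alone demotes the competing categories; no explicit punishment step is needed, matching the "no explicit demotion" design of FN.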

Three Parts
1. How to use an index?
2. How to update it?
3. When to update it?

A Tradeoff
1. To achieve stability (which helps accuracy), we need to keep updating (think of the single-feature scenario)
2. To “fit” more instances, we need to stop updating on instances that we already get “right”
Using a margin threshold strikes a balance.

Margin Definition
Margin = score of the true (positive) category MINUS score of the highest-ranked negative category.
Choice of margin threshold:
- Fixed, e.g., 0, 0.1, 0.5, ...
- Online average (e.g., the average of the last margins + 0.1)
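As a sketch, with scores as produced by retrieval:

```python
def margin(scores, true_cat):
    """Score of the true category minus the score of the
    highest-ranked negative category (0 if there are none)."""
    negatives = [s for c, s in scores.items() if c != true_cat]
    return scores.get(true_cat, 0.0) - max(negatives, default=0.0)

m = margin({"c1": 0.9, "c2": 0.6}, "c1")
# Positive: the true category c1 outranks every negative.
```

An update is triggered whenever this value falls below the chosen threshold, so a negative margin (a ranking mistake) always triggers one.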

Salient Aspects of FN
- Updates “differentially”: attempts to improve the retrieved ranking (in “context”)
- Normalizes, but from the “feature's side”
- No explicit weight demotion/punishment! (normalization/weakening achieves demotion and reordering)
- Memory- and efficiency-conscious design from the outset
- Very dynamic/adaptive: edges are added and dropped, weights adjusted, categories reordered
- Extensions/variations exist (e.g., each feature's out-degree may adjust dynamically)

Domain Statistics
[Table (garbled in the transcript): for each dataset, Reuters RCV1, Web, Ads, industry sector, Jane Austen, and 20 Newsgroups, the number of instances, number of features, number of classes |C|, average vector length L, and average number of labels per instance Cavg]
Experiments are averages over 10 runs; each run is a single pass, with 90% of the data used for training and 10% held out.

Smaller Domains
- SVM baseline: Keerthi and DeCoste, 2006 (fast linear SVM)
- Settings: max out-degree = 25, minimum allowed weight = 0.01; tested with margins 0, 0.1, and 0.5, and up to 10 passes over random splits
[Results figure: 10 categories, 10k instances]

Three Smaller Domains
[Results figure: 20 categories, 20k instances]

Three Smaller Domains
[Results figure: 104 categories, 10k instances]

Three Large Data Sets (top-down comparisons)
[Results figures: ~500 categories, 20k instances; ~12.6k categories, ~370k instances; ~14k categories, ~70k instances]

Accuracy vs. Max Out-Degree
[Plot: accuracy as a function of the maximum out-degree allowed, for Web page categorization, Ads, and RCV1]

Accuracy vs. Passes and Margin
[Plot: accuracy as a function of the number of passes, for different margin settings]

Related Work and Discussion
- Multiclass learning/categorization algorithms (top-down, nearest neighbors, perceptron, Naïve Bayes, MaxEnt, SVMs, online methods, ...)
- Speed-up methods (trees, indices, ...)
- Feature selection/reduction
- Evaluation criteria
- Fast categorization in the natural world
- Prediction games! (see poster)

Summary
- A scalable supervised learning method for huge class sets (and instance sets)
- Idea: learn an index (a sparse weighted bipartite graph mapping features to categories)
- Online, time- and memory-efficient algorithms
- Current/future work: more algorithms, theory, other domains/applications