
1
**Cost-Sensitive Learning for Large-Scale Hierarchical Classification of Commercial Products**

Jianfu Chen, David S. Warren (Stony Brook University)

2
**Classification is a fundamental problem in information management.**

[Figure: a textual product description is mapped into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), e.g. Vehicles and their Accessories and Components (25) → Motor vehicles (10) → Passenger motor vehicles (15) → Automobiles or cars (03).]

The classification problem: given a product description, find the commodity node in the taxonomy it corresponds to.

3
**How should we design a classifier for a given real-world task?**

Off-the-shelf classifiers are an abstraction that ignores the application context, but that context may matter when solving a real-life problem. We can improve a classifier's usefulness by examining the details of the particular problem instance.

4
**Try Off-the-shelf Classifiers**

Method 1: no design. Train f(x) on a training set, evaluate on a test set, and try a variety of off-the-shelf classifiers — SVM, logistic regression, decision tree, neural network, ... — to see which one works best. Classification is a well-studied problem; there is no need to reinvent the wheel, and simplicity is beauty.

But this skips some simple questions: What is the classifier for? Why do we care about this particular classification task in the first place? How do we measure the performance of a classifier according to our interests? Most standard classifiers answer these implicitly: they assume we care about error rate and try to minimize it — equivalently, to maximize accuracy. For many real-world tasks that is appropriate, but for others minimizing error rate is not exactly what we want to achieve in practice.

5
**Method 2. Optimize what we really care about**

What is the classifier for? How do we evaluate its performance according to our interests? Method 2 quantifies what we really care about and then optimizes it directly, tightly coupling performance evaluation and learning.

6
**Hierarchical classification of commercial products**

[Figure: a textual product description is classified into the four-level UNSPSC taxonomy (Segment → Family → Class → Commodity), with nodes such as Vehicles and their Accessories and Components (25) → Motor vehicles (10) → Passenger motor vehicles (15) → Automobiles or cars (03).]

7
**Product taxonomy helps customers to find desired products quickly.**

A taxonomy facilitates exploring similar products, helps product recommendation, and facilitates corporate spend analysis. Looking for gift ideas for a kid? Browsing Toys & Games (dolls, puzzles, building toys, ...) complements keyword search.

8
**We assume misclassification of products leads to revenue loss.**

[Figure: a textual product description of a mouse. Classified correctly under Desktop computer and accessories (mouse, keyboard), the product realizes its expected annual revenue; misclassified under Pet, the vendor loses part of the potential revenue.]

9
**What do we really care about?**

A vendor's business goal is to maximize profit — that is, to maximize revenue, or equivalently, to minimize revenue loss.

10
**Observation 1: the misclassification cost of a product depends on its potential revenue.**

11
**Observation 2: the misclassification cost of a product depends on how far apart the true class and the predicted class are in the taxonomy.**

[Figure: the textual product description of a mouse again. Predicting Pet instead of a class near Desktop computer and accessories (mouse, keyboard) is a more costly mistake, because the classes are farther apart in the taxonomy.]
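The hierarchical distance d(y, y′) between the true and predicted class can be taken as the number of edges between the two leaves in the taxonomy tree. A minimal sketch — the `parent` map below is a hypothetical toy taxonomy, not the real UNSPSC tree:

```python
def tree_distance(y, y_prime, parent):
    """Number of edges between two nodes in a taxonomy tree, where
    `parent` maps each node to its parent (the root maps to None)."""
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node]
            path.append(node)
        return path

    path_y = path_to_root(y)
    path_yp = path_to_root(y_prime)
    on_yp_path = set(path_yp)
    # The first node on y's root path that also lies on y_prime's root
    # path is the lowest common ancestor; sum the two depths to it.
    for depth, node in enumerate(path_y):
        if node in on_yp_path:
            return depth + path_yp.index(node)
    raise ValueError("nodes are not in the same tree")

# Toy 4-level taxonomy: Segment -> Family -> Class -> Commodity.
parent = {
    "root": None,
    "vehicles": "root", "food": "root",
    "motor": "vehicles", "marine": "vehicles",
    "passenger": "motor",
    "cars": "passenger", "buses": "passenger", "limos": "passenger",
}
print(tree_distance("cars", "buses", parent))   # sibling commodities: 2
print(tree_distance("cars", "marine", parent))  # different families: 4
```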

12
**The proposed performance evaluation metric: average revenue loss**

The average revenue loss over a dataset D of m products:

$$R_{em} = \frac{1}{m} \sum_{(x,y,y') \in D} v(x) \cdot L_{y,y'}$$

- Example weight $v(x)$: the potential annual revenue of product x.
- Error function $L_{y,y'}$: the loss ratio — the percentage of the potential revenue a vendor will lose due to misclassification from class y to class y′. It is a non-decreasing monotonic function of the hierarchical distance between y and y′: $L_{y,y'} = f(d(y, y'))$, e.g.

| $d(y,y')$ | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| $L_{y,y'}$ | 0.2 | 0.4 | 0.6 | 0.8 |
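The metric is straightforward to compute from predictions. A sketch under the slide's setup — the loss ratio f(d) = 0.2·d matches the table above, and the function names are illustrative:

```python
def loss_ratio(d):
    """Loss ratio f(d): non-decreasing in the hierarchical distance d,
    following the slide's table (0.2, 0.4, 0.6, 0.8 for d = 1..4).
    Capping at 1.0 is our assumption (a vendor cannot lose more than
    the full potential revenue)."""
    return 0.0 if d == 0 else min(0.2 * d, 1.0)

def average_revenue_loss(examples):
    """R_em = (1/m) * sum of v(x) * L_{y,y'} over the dataset.

    `examples` is a list of (revenue, distance) pairs: the potential
    annual revenue v(x) of each product and the hierarchical distance
    d(y, y') between its true and predicted class."""
    return sum(v * loss_ratio(d) for v, d in examples) / len(examples)

# Two products: a $5,000 item misclassified at distance 2 loses 40% of
# its revenue; a $20,000 item classified correctly loses nothing.
print(average_revenue_loss([(5000, 2), (20000, 0)]))  # 1000.0
```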

13
**Learning – minimizing average revenue loss**

$$R_{em} = \frac{1}{m} \sum_{(x,y,y') \in D} v(x) \cdot L_{y,y'}$$

This risk is intractable to minimize directly, so we minimize a convex upper bound instead.

14
**Multi-class SVM with margin re-scaling**

For each training example i and each class y′, require the score of the true class to beat the score of y′ by a margin rescaled to the loss:

$$\theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') = v(x_i) \cdot L_{y_i, y'}$$

The full program:

$$\min_{\theta,\xi} \; \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \;\; \forall i, \forall y': \; \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') - \xi_i, \quad \xi_i \ge 0$$

15
**Multi-class SVM with margin re-scaling**

The objective is a convex upper bound of the empirical risk $\frac{1}{m}\sum_{i=1}^{m} L(x_i, y_i, y')$:

$$\min_{\theta,\xi} \; \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \;\; \forall i, \forall y': \; \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') - \xi_i, \quad \xi_i \ge 0$$

We can plug in any loss function:

| Name | $L(x_i, y_i, y')$ | Optimizes |
|---|---|---|
| 0-1 | $[y_i \ne y']$ | error rate (standard multi-class SVM) |
| VALUE | $v(x_i) \cdot [y_i \ne y']$ | product revenue |
| TREE | $D(y_i, y')$ | hierarchical distance |
| REVLOSS | $v(x_i) \cdot L_{y_i, y'}$ | revenue loss |
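At the optimum, the slack ξᵢ of example i equals its margin-rescaled hinge loss: the largest amount by which any class y′ violates its margin constraint. A sketch with made-up scores and loss values, just to show the computation:

```python
def margin_rescaled_hinge(scores, y_true, loss):
    """Slack xi for one example in the margin-rescaling multi-class SVM:
    xi = max(0, max over y' of [L(y_true, y') - (score(y_true) - score(y'))]).
    `scores[y]` stands for theta_y^T x; `loss[(y, y')]` is the rescaled
    margin L(x, y, y'), with loss[(y, y)] == 0."""
    s_true = scores[y_true]
    return max(
        0.0,
        max(loss[(y_true, yp)] - (s_true - s) for yp, s in scores.items()),
    )

# Toy example with three classes; losses are illustrative: a distant
# class (pet) demands a much larger margin than a nearby one (keyboard).
scores = {"mouse": 2.0, "keyboard": 1.5, "pet": 0.5}
loss = {("mouse", "mouse"): 0.0,
        ("mouse", "keyboard"): 1.0,
        ("mouse", "pet"): 4.0}
print(margin_rescaled_hinge(scores, "mouse", loss))  # 2.5, driven by pet
```

Note how the large loss for the distant class dominates the slack even though its raw score is the lowest — exactly the behavior margin re-scaling is meant to induce.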

16
Dataset: UNSPSC (United Nations Standard Products and Services Code) dataset.

- Product revenues are simulated: revenue = price × sales.
- Data source: multiple online marketplaces oriented toward DoD and Federal government customers (GSA Advantage, DoD EMALL).
- Taxonomy structure: 4-level balanced tree (UNSPSC taxonomy).
- #examples: 1.4M; #leaf classes: 1073.

17
**Average revenue loss (in K$) of different algorithms**

Experimental results. [Table: average revenue loss (in K$) of the different algorithms.]

18
**What's wrong?**

$$\min_{\theta,\xi} \; \frac{1}{2}\|\theta\|^2 + \frac{C}{m}\sum_{i=1}^{m}\xi_i \quad \text{s.t.} \;\; \forall i, \forall y' \ne y_i: \; \theta_{y_i}^T x_i - \theta_{y'}^T x_i \ge L(x_i, y_i, y') - \xi_i, \quad \xi_i \ge 0$$

With $L(x_i, y_i, y') = v(x_i) \cdot L_{y_i, y'}$, the required margin is a revenue loss — and revenue loss ranges from a few K$ to several M$.

19
Loss normalization: linearly scale the loss function to a fixed range $[1, M_{max}]$, say $[1, 10]$:

$$L^s(x, y, y') = 1 + \frac{L(x, y, y') - L_{min}}{L_{max} - L_{min}} \cdot (M_{max} - 1)$$

The objective now upper bounds both the 0-1 loss and the average normalized loss.
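The normalization is a simple affine rescale. A sketch, where $L_{min}$ and $L_{max}$ would be computed over the training set (the dollar figures below are illustrative):

```python
def normalize_loss(L, L_min, L_max, M_max=10.0):
    """Linearly scale a loss value from [L_min, L_max] to [1, M_max]:
    L^s = 1 + (L - L_min) / (L_max - L_min) * (M_max - 1)."""
    return 1.0 + (L - L_min) / (L_max - L_min) * (M_max - 1.0)

# Raw revenue losses spanning a few K$ to several M$ all land in [1, 10].
print(normalize_loss(2_000, 2_000, 5_000_000))      # 1.0  (smallest loss)
print(normalize_loss(5_000_000, 2_000, 5_000_000))  # 10.0 (largest loss)
```

Compressing the loss range this way keeps the margin constraints on a comparable scale across examples, instead of letting a few multi-million-dollar products dominate the objective.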

20
**Average revenue loss (in K$) of different algorithms**

Final results: a 7.88% reduction in average revenue loss! [Table: average revenue loss (in K$) of the different algorithms.]

21
**Conclusion**

The empirical risk is the average misclassification cost:

$$R_{em} = \frac{1}{m}\sum_{(x,y,y') \in D} L(x, y, y') = \frac{1}{m}\sum_{(x,y,y') \in D} w(x) \cdot \Delta(y, y')$$

- Performance evaluation metric: what do we really care about for this task — minimizing error rate, or minimizing revenue loss?
- Regularized empirical risk minimization: how do we approximate the performance evaluation metric to make it tractable? Model + tractable loss function. A general method: multi-class SVM with margin re-scaling and loss normalization.
- Optimization: find the best parameters.

22
Thank you! Questions?
