

2 Feature Shaping for Linear SVM Classifiers
George Forman, Martin Scholz, Shyam Rajaram
HP Labs, Palo Alto, CA, USA
© 2008 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

3 Linear SVMs?
In reality:
- High-dimensional data
- Features of varying predictiveness
- Heterogeneous features
- Feature selection is common

4 Example: Useful Non-linear Feature

5 Feature Transformations and SVMs
Change to single feature / Effect:
- Affine transformations: no
- Linear transformation: relative
- Distance between examples: yes
- Non-monotonic transformations: yes

6 Wishlist: Raw Data - Things to Fix
- Detection of irrelevant features
- Appropriate scaling of feature ranges
  − Blood pressure vs. BMI: scale = importance?
- Linear dependence of feature on target
  − FIX: Speeding - death rate doubles every 10 mph
- Monotonic relationships with the target
  − FIX: blood pressure etc. healthy in a specific interval

7 The Transformation Landscape
Raw feature x_i → transformed x_i', ordered by increasing complexity & costs:
- Individual features: Feature Selection (x_i' := w_i x_i, w_i ∈ {0, 1}), Feature Scaling (w_i ∈ R+), …, Feature Shaping
- Feature sets: Non-linear kernels, Feature Construction, Kernel Learning
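A minimal numpy sketch of the per-feature weight view used on this slide: selection restricts w_i to {0, 1}, scaling allows any positive weight. All names and values are illustrative, not taken from the paper.

```python
# Feature selection and scaling as per-feature weights: x_i' := w_i * x_i.
import numpy as np

X = np.array([[1.0, 3.0, 0.5],
              [2.0, 0.0, 1.5]])        # two examples, three raw features

w_select = np.array([1.0, 0.0, 1.0])   # feature selection: w_i in {0, 1}
w_scale  = np.array([0.7, 2.3, 0.1])   # feature scaling:   w_i in R+

X_selected = X * w_select              # second feature is dropped
X_scaled   = X * w_scale               # every feature is reweighted
```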

8 Feature Selection Metrics [Forman CIKM'08]

9 BNS for feature selection [Forman, JMLR'02]
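For reference, Bi-Normal Separation (BNS) scores a binary feature as |F⁻¹(tpr) − F⁻¹(fpr)|, where F⁻¹ is the inverse standard normal CDF. A hedged sketch follows; the clipping constant is an assumption to keep the inverse CDF finite, not a value from the slide.

```python
# BNS = |F^-1(tpr) - F^-1(fpr)|, with F^-1 the inverse standard normal CDF.
from scipy.stats import norm

def bns(tp, fp, pos, neg, eps=0.0005):
    tpr = min(max(tp / pos, eps), 1 - eps)   # true positive rate, clipped
    fpr = min(max(fp / neg, eps), 1 - eps)   # false positive rate, clipped
    return abs(norm.ppf(tpr) - norm.ppf(fpr))

# Example: a word occurring in 40 of 100 positive and 5 of 900 negative documents
print(bns(tp=40, fp=5, pos=100, neg=900))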

10-11 Scaling beats selection [Forman CIKM'08]
(Plots comparing F-measure for BNS scaling of binary features, BNS selection, and IG selection.)

12 Shaping Example

13 Estimating Class Distributions
Input: labeled examples projected to feature x_i
Goal: estimate p_i := P(y | x_i = v)
Large variety of cases:
− Nominal, binary features
− Ordinal features
− Continuous features
Output: p_i : R → [0, 1]
(Compute the blue curve in the plot.)
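As a hedged illustration of this estimation step, the sketch below uses simple quantile binning to estimate P(y=1 | x_i = v) for a continuous feature. The slide covers several feature types and does not prescribe this particular estimator; the helper name and bin count are assumptions.

```python
# Binning sketch for estimating p_i(v) = P(y=1 | x_i = v) from labeled data.
import numpy as np

def estimate_local_probability(x, y, n_bins=10):
    """Return bin edges and the empirical P(y=1) per bin for one feature x."""
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)
    p = np.array([y[bin_idx == b].mean() if np.any(bin_idx == b) else 0.5
                  for b in range(n_bins)])
    return edges, p

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = (np.abs(x) < 1).astype(int)          # non-monotonic relationship with the target
edges, p = estimate_local_probability(x, y)
print(p)                                  # high in the middle bins, low at the extremes
```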

14 Reshaping Features
Input: p_i : R → [0, 1]
Goal: make x_i "more linearly dependent"
Local probability (LP) shaper:
− x_i' := p_i(x_i)
− non-monotonic transformation
Monotonic transformations:
− Use rank as new feature value
− Derive values from ROC plots
Output: a function for each i, mapping x_i to x_i'
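A sketch of two of these shapers, building on the binned p_i estimate above: the LP shaper maps each value to its estimated class probability, and a rank transform serves as a simple monotonic alternative. Helper names are hypothetical; the ROC-based variant from the slide is not shown.

```python
import numpy as np
from scipy.stats import rankdata

def lp_shape(x, edges, p):
    """Local-probability shaper: x_i' := p_i(x_i); non-monotonic in general."""
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(p) - 1)
    return p[bin_idx]

def rank_shape(x):
    """Monotonic shaper: replace each value by its rank, scaled to [0, 1]."""
    return (rankdata(x) - 1) / (len(x) - 1)
```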

15 Coherent Data Processing Blocks
- PDF estimation
- Reshaping Features
- Feature Scaling
- Normalization
- Preserving sparsity

16 Feature Scaling
- Scale of features should reflect importance
- BNS scaling for binary features
- For the continuous case:
  − use the BNS score of the best binary split
- Diffing: scale each feature to [0, |BNS(x_i')|]
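A hedged sketch of this scaling step for a continuous (possibly already shaped) feature, reusing the bns() helper from the slide-9 sketch. Searching candidate thresholds over the unique feature values is an assumption about what "best binary split" means here.

```python
import numpy as np

def bns_scale(x, y):
    """Rescale feature x into [0, |BNS|], where BNS comes from its best binary split."""
    pos, neg = int((y == 1).sum()), int((y == 0).sum())
    best = 0.0
    for t in np.unique(x)[:-1]:                      # candidate split thresholds
        above = x > t
        score = bns(tp=int((above & (y == 1)).sum()),
                    fp=int((above & (y == 0)).sum()),
                    pos=pos, neg=neg)
        best = max(best, score)
    span = x.max() - x.min()
    x01 = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    return x01 * best                                # feature range becomes [0, |BNS|]
```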

17 Normalization
Options tested in our experiments:
- L2 normalization – standard in text mining
- L1 normalization – sparse solutions
- No normalization
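For concreteness, a small sketch of the three per-example normalization options; the function name and signature are illustrative only.

```python
import numpy as np

def normalize_rows(X, norm="l2"):
    """Normalize each example (row) with L1, L2, or no normalization."""
    if norm == "none":
        return X
    p = 1 if norm == "l1" else 2
    lengths = np.linalg.norm(X, ord=p, axis=1, keepdims=True)
    return X / np.where(lengths == 0, 1, lengths)    # avoid division by zero
```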

18 Preserving Sparsity
- Text data usually very sparse
- Substantial impact on complexity
- Discussed transformations: not sparsity-preserving
- Solution:
  − Affine transformation → no effect on SVMs
  − Adapt f_i so that f_i(x_i,m) = 0 if x_i,m is the mode of x_i
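One way to read this adaptation, shown as a hedged sketch: shift each feature's shaper by a constant so that the feature's mode (typically 0 for text counts) maps exactly to 0, keeping transformed vectors sparse while only applying an affine change. Wrapper and argument names are assumptions.

```python
import numpy as np

def sparsity_preserving(f, x_train):
    """Wrap a per-feature shaper f so that the feature's mode maps to 0."""
    values, counts = np.unique(x_train, return_counts=True)
    mode = values[np.argmax(counts)]
    offset = f(np.array([mode]))[0]                  # value the shaper assigns to the mode
    return lambda v: f(v) - offset                   # constant shift only
```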

19 Experiments
Benchmarks:
− Text: news articles, TREC, Web data, …
− UCI: 11 popular datasets, mixed attribute types
− Used as binary classification problems, 50+ positives
Learner:
− Linear SVM (SMO)
− 5x cross-validation to determine C (out of {.01, .1, 1, 10, 100})
− No internal normalization of input
− Logistic scaling activated for output
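A hedged sketch of this learner setup: a linear SVM with C chosen by 5-fold cross-validation over the listed grid. The paper used an SMO-based SVM; scikit-learn's LinearSVC is used here only as a convenient stand-in, and logistic output scaling is omitted.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

def train_linear_svm(X, y):
    """Fit a linear SVM, selecting C by 5-fold cross-validation."""
    grid = GridSearchCV(LinearSVC(),
                        param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                        cv=5)
    grid.fit(X, y)            # no internal normalization of the input here
    return grid.best_estimator_
```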

20 Text: Accuracy vs. training set size

21 UCI data: AUC vs. training set size

22 Overview: All binary UCI tasks

23 Lesion Study on UCI Data
- PDF estimation
- Reshaping Features
- Feature Scaling
- Normalization
- Preserving sparsity

24 Conclusions
- Data representation is crucial in data mining
- "Feature Shaping":
  − expressive, local technique for transforming features
  − generalizes selection and scaling
  − computationally cheap, very practical
  − tuned locally for each feature
- Simplistic implementation → decent improvements
- Case-dependent, smart implementation → ?
Questions?


