Database Implementation of a Model-Free Classifier Konstantinos Morfonios ADBIS 2007 University of Athens.

1 Database Implementation of a Model-Free Classifier. Konstantinos Morfonios, University of Athens. ADBIS 2007

2-3 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

4 Introduction: Classification. ω1 = [image], ω2 = [image], x = [image]; ω = f(x)

5 Introduction: “Lazy” (Nearest Neighbors) vs. “Eager” (Decision Trees) classifiers. Eager: (+) Faster decisions; (-) Large/complex datasets; (-) Dynamic datasets; (-) Dynamic models

6-7 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

15 Motivation: Large/complex datasets; Dynamic datasets; Dynamic models; Lazy (model-free); Nearest Neighbors; Disk-based

16 Motivation: Nearest Neighbors suffers from the “curse of dimensionality”: not reliable [Beyer et al., ICDT 1999], not indexable [Shaft et al., ICDT 2005]. LOCUS (Lazy Optimal Classifier of Unlimited Scalability)

17-24 Motivation: LOCUS (Lazy Optimal Classifier of Unlimited Scalability)
- Category? Lazy
- Scaling? Based on simple SQL queries
- Accuracy? Converges to the optimal Bayes classifier
- Other features? Parallelizable

25-26 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

27 LOCUS: Example. x = [image], ω1 = [image], ω2 = [image]; features f1 ∈ [0, 20], f2 ∈ [0, 10]. [Figure: scatter plot, axes f1 and f2]

28-30 LOCUS: Ideally: dense space. ω(x) = ? The class of x is then answered directly (answer shown as an image on the slide). [Figure: scatter plot, axes f1 and f2]

31-32 LOCUS: Reality: many features, large domains ⇒ sparse space. ω(x) = ? [Figure: scatter plot, axes f1 and f2]

33-34 LOCUS: ω(x) = ? With 3-NN, the three nearest neighbors give ω1: 2, ω2: 1, so the majority class is ω1. [Figure: scatter plot, axes f1 and f2]

35-37 LOCUS: ω(x) = ? With LOCUS, the neighborhood of x gives ω1: 7, ω2: 3, so the majority class is ω1. [Figure: scatter plot, axes f1 and f2]

38 LOCUS: Disk-based implementation. [Figure: scatter plot, axes f1 and f2]

39 2δ12δ1 2δ22δ2 SELECT ω, count(*) FROM R WHERE f 1 ≥x 1 -δ 1 AND f 1 ≤x 1 +δ 1 AND f 2 ≥x 2 -δ 2 AND f 2 ≤x 2 +δ 2 GROUP BY ω R(f 1, f 2, ω) ω 1 : 7 ω 2 : 3 ω( ) = 

40 LOCUS: What if R is large? The query of slide 39 is a well-known type of aggregate query, so classical optimization techniques apply: indexing, presorting, materialized views.
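As one illustration of the indexing option, the sketch below adds a composite index over the feature and class columns and checks that the range-count query returns the same answer before and after. It assumes SQLite via Python's sqlite3 module; the index name, the column name `w` (for ω), and the random data are hypothetical, and nothing here reflects the physical design used in the paper. Presorting and materialized views are not covered.

```python
import random
import sqlite3

QUERY = """
SELECT w, COUNT(*)
FROM R
WHERE f1 >= ? AND f1 <= ?
  AND f2 >= ? AND f2 <= ?
GROUP BY w
"""

def class_counts(conn, x1, x2, d1, d2):
    """Run the range-count query and return {class: count}."""
    args = (x1 - d1, x1 + d1, x2 - d2, x2 + d2)
    return dict(conn.execute(QUERY, args).fetchall())

# Synthetic rows (invented): f1 in [0, 20], f2 in [0, 10], two classes.
random.seed(0)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 REAL, w TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [
        (random.uniform(0, 20), random.uniform(0, 10), random.choice(["w1", "w2"]))
        for _ in range(10_000)
    ],
)

before = class_counts(conn, 10, 5, 2, 1)

# Composite index that also includes w, so the query could in principle be
# answered from the index alone (whether it is depends on the query planner).
conn.execute("CREATE INDEX idx_r_f1_f2_w ON R (f1, f2, w)")

after = class_counts(conn, 10, 5, 2, 1)

# The plan can be inspected to see what the planner decided.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (8, 12, 4, 6)).fetchall()
index_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
]
print(before, after, plan)
```

An index changes only how the rows are found, never which rows qualify, so the two results must be identical.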

41 LOCUS: Method reliability? LOCUS converges to the optimal Bayes classifier as the size of the dataset increases (proof in the paper).
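The intuition behind this claim can be written down informally as follows. This is only a sketch of the standard argument, not the proof from the paper; the notation n_ω(x) for the count the query returns for class ω and B_δ(x) for the query box is introduced here for illustration.

```latex
% n_\omega(x): count returned by the query for class \omega   (notation assumed)
% B_\delta(x): the box [x_1-\delta_1, x_1+\delta_1] \times [x_2-\delta_2, x_2+\delta_2]
\[
  \hat{\omega}(x) = \arg\max_{\omega} \, n_\omega(x),
  \qquad
  \frac{n_\omega(x)}{\sum_{\omega'} n_{\omega'}(x)}
  \approx P\bigl(\omega \mid x' \in B_\delta(x)\bigr)
\]
% The Bayes classifier picks the class with the largest posterior at x:
\[
  \omega^{*}(x) = \arg\max_{\omega} \, P(\omega \mid x)
\]
```

With the counts from the slides (ω1: 7, ω2: 3), the estimated posterior of ω1 is 7 / (7 + 3) = 0.7. As the dataset grows, the box can shrink while still containing many items, so the estimate over B_δ(x) approaches P(ω | x) and the two arg max rules agree.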

42-43 LOCUS: What if a feature, say f2, is categorical (e.g. sex)? The range predicate on f2 becomes an equality:
    SELECT ω, COUNT(*)
    FROM R
    WHERE f1 >= x1 - δ1 AND f1 <= x1 + δ1
      AND f2 = x2
    GROUP BY ω
Not a problem, since generally in practice:
- Datasets combine categorical and numeric features
- Categorical features have small domains
- Hence, they do not contribute to sparsity
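A small sketch of how the two kinds of predicate might be generated side by side is shown below. The helper `build_locus_query`, the column name `w` (for ω), and the toy rows are hypothetical and not taken from the paper. Values are passed as bound parameters, while column names are interpolated into the SQL text and are therefore assumed to come from a trusted schema.

```python
import sqlite3

def build_locus_query(numeric, categorical, table="R", class_col="w"):
    """Build the counting query for mixed feature types.

    numeric:     {column: (x, delta)}  -> range predicate [x - delta, x + delta]
    categorical: {column: value}       -> equality predicate
    Column and table names must be trusted; only values are parameterized.
    """
    clauses, params = [], []
    for col, (x, delta) in numeric.items():
        clauses.append(f"{col} >= ? AND {col} <= ?")
        params += [x - delta, x + delta]
    for col, value in categorical.items():
        clauses.append(f"{col} = ?")
        params.append(value)
    where = " AND ".join(clauses) or "1 = 1"  # no features: count everything
    sql = (
        f"SELECT {class_col}, COUNT(*) FROM {table} "
        f"WHERE {where} GROUP BY {class_col}"
    )
    return sql, params

# Toy data (invented): f1 is numeric, f2 is categorical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (f1 REAL, f2 TEXT, w TEXT)")
conn.executemany(
    "INSERT INTO R VALUES (?, ?, ?)",
    [
        (30, "F", "w1"), (32, "F", "w1"), (29, "F", "w2"),
        (31, "M", "w2"), (33, "M", "w2"), (50, "F", "w2"),
    ],
)

sql, params = build_locus_query(numeric={"f1": (31, 3)}, categorical={"f2": "F"})
counts = dict(conn.execute(sql, params).fetchall())
print(sql, params, counts)
```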

44-45 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

46 Parallel Execution: R = R1 ∪ R2 ∪ R3 ∪ R4. [Figure: the SELECT query is sent to partitions R1, R2, R3, R4]

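47 Parallel Execution: Partial counts from the four partitions R1 to R4: (ω1: 5, ω2: 2), (ω1: 7, ω2: 1), (ω1: 5, ω2: 1), (ω1: 6, ω2: 0). Count is a distributive function, so the partial counts are simply summed (running totals 5/2, 12/3, 18/3, 23/4), giving ω1: 23, ω2: 4.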
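Because count is distributive, the merge step on the main server reduces to adding up per-class counts. The sketch below uses the four partial results from slide 47; the function name `merge_partial_counts` and the class labels `w1`/`w2` (for ω1/ω2) are hypothetical, and in a real deployment each partial result would come from running the query on one partition.

```python
from collections import Counter

def merge_partial_counts(partials):
    """Sum per-class counts returned by the individual partitions."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts, it does not overwrite
    return total

# Partial results as listed on slide 47 (one dict per partition).
partials = [
    {"w1": 5, "w2": 2},
    {"w1": 7, "w2": 1},
    {"w1": 5, "w2": 1},
    {"w1": 6, "w2": 0},
]

total = merge_partial_counts(partials)
predicted = max(total, key=total.get)
print(dict(total), predicted)
```

Each partition ships only one small count per class, which is consistent with the low network traffic and the light load on the main server that the talk points out.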

48 Parallel Execution: Small network traffic; load balancing; lightweight operations on the main server. [Figure: the SELECT runs on R1 to R4 and the partial counts are summed to ω1: 23, ω2: 4]

49-50 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

51 Experimental Evaluation: LOCUS vs. DTs and NNs (Weka)
- Synthetic datasets: ten functions [Agrawal et al., IEEE TKDE 1993]; D = 9; N ∈ [5 × 10^3, 5 × 10^6]
- Real-world datasets: UCI Repository

52 Experimental Evaluation: [Figure] Classification error rate (synthetic datasets, N = 5 × 10^4)

53 Experimental Evaluation: [Figure] Effect of dataset size on classification error rate of LOCUS (synthetic datasets, N ∈ [5 × 10^3, 5 × 10^6])

54 Experimental Evaluation: [Figure] Effect of dataset size on time scalability of LOCUS (synthetic datasets, N ∈ [5 × 10^3, 5 × 10^6])

55 Experimental Evaluation: [Figure] Classification error rate (real-world datasets)

56 Experimental Evaluation: [Figure] Effect of dataset size on classification error rate (dataset CovType, N ∈ [5 × 10^3, 5 × 10^5])

57-58 Outline: Introduction, Motivation, LOCUS, Parallel Execution, Experimental Evaluation, Conclusions & Future Work

59 Conclusions & Future Work: LOCUS
- Lazy (complex/dynamic datasets and models)
- Efficient (based on simple SQL queries)
- Reliable (converging to optimal)
- Parallelizable

60 Conclusions & Future Work:
- Similar techniques for feature selection and regression
- Implementation of a parallel version

61 Questions?

62 Thank you!

