1 Building Hierarchical Classifiers Using Class Proximity
Ke Wang, Senqiang Zhou, Shiang Chen Liew
National University of Singapore


2 Hierarchical classification (VLDB99, Sept 6-10, Edinburgh)
Given
– a class hierarchy
– a collection of pre-classified documents
– a document is a set of terms
Build
– a classifier that assigns a relevant class to a new document
Key
– extract features of classes

3 Yahoo classes
Yahoo
– recreation
  – automotive
  – sports
    – skating
    – cycling
– science

4 ACM classes
Level 1: Hardware
Level 2: General, Memory_structure
Level 3: General, Design_style
Level 4: Cache_memories

5 Existing local approaches
– build one classifier at each split of the class hierarchy
– determine features locally at each node
– classify a document by going through a path of classifiers starting from the root
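The local approach on slide 5 can be pictured with a minimal Python sketch. The hierarchy follows the Yahoo example on slide 3, but the feature terms, the overlap-count scoring, and all function names are assumptions made for illustration; the slides do not say which classifier is used at each split.

```python
from typing import Dict, List, Set

# Class hierarchy: each internal class maps to its child classes (slide 3 layout).
HIERARCHY: Dict[str, List[str]] = {
    "Yahoo": ["recreation", "science"],
    "recreation": ["automotive", "sports"],
    "sports": ["skating", "cycling"],
}

# Hypothetical features, chosen locally at each split (not taken from the slides).
LOCAL_FEATURES: Dict[str, Dict[str, Set[str]]] = {
    "Yahoo": {"recreation": {"hobby", "game"}, "science": {"theory", "experiment"}},
    "recreation": {"automotive": {"car", "engine"}, "sports": {"team", "race"}},
    "sports": {"skating": {"ice", "rink"}, "cycling": {"bike", "bicycle"}},
}


def local_classifier(node: str, doc: Set[str]) -> str:
    """Pick the child whose local features overlap the document most.

    Ties go to the child listed first in HIERARCHY[node].
    """
    features = LOCAL_FEATURES[node]
    return max(HIERARCHY[node], key=lambda child: len(features[child] & doc))


def classify_top_down(doc: Set[str], root: str = "Yahoo") -> List[str]:
    """Follow one path of local classifiers from the root down to a leaf."""
    path = [root]
    node = root
    while node in HIERARCHY:  # stop when the current class has no children
        node = local_classifier(node, doc)
        path.append(node)
    return path
```

With these assumed features, `classify_top_down({"game", "race", "bike"})` walks from Yahoo through recreation and sports to cycling, while `classify_top_down({"theory", "bike"})` stops at science: once the root-level classifier picks the wrong branch, the lower levels never get a chance to correct it.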

6 Diminishing of high level structure
– local approaches rely on classification at high levels
– but high level structures are usually weak, i.e., topics diverge
– e.g., “car” is a feature at Recreation: Automotive, but not at Recreation

7 Bias of misclassification
– sibling classes vs. nephew classes
– misclassification at high levels vs. at low levels
– specialisation vs. generalisation

8 Features should be
– determined wrt the target class
– determined at all concept levels
– correlated
The solution: generalised association rules (SA95, HF95)
– {sql, IO} → DB
– {language, performance} → CS
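One way to read the two rules on slide 8 is that a rule body may use terms at any level of a term hierarchy, and a document satisfies the body if it contains each term or one of its specialisations. The sketch below illustrates that reading in Python. The parent links (sql under language, IO under performance) and the function names are assumptions for illustration; the transcript does not show the actual term hierarchy.

```python
from typing import Dict, Iterable, Set

# Hypothetical term hierarchy: child term -> parent term (assumed, not from the slides).
TERM_PARENT: Dict[str, str] = {"sql": "language", "IO": "performance"}


def generalize(terms: Iterable[str]) -> Set[str]:
    """Return the terms together with all of their ancestors in TERM_PARENT."""
    closed = set(terms)
    for term in list(closed):
        while term in TERM_PARENT:  # climb until a term with no parent
            term = TERM_PARENT[term]
            closed.add(term)
    return closed


def matches(rule_body: Iterable[str], doc_terms: Iterable[str]) -> bool:
    """A generalised rule body matches if every body term is in the
    document or is an ancestor of some document term."""
    return set(rule_body) <= generalize(doc_terms)
```

Under these assumptions a document containing sql and IO matches both {sql, IO} and the more general {language, performance}, whereas a document containing only the general terms does not match the specific rule.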

9 Our approach
– class proximity
– global classifier
– term hierarchy
– use the “best” generalised association rule T → C to determine the class

10 Rank association rules
– Biased confidence
– Biased J-measure
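The transcript names the two biased measures but does not define them. As background only, the Python sketch below computes the standard, unbiased confidence of a rule T → C and the standard J-measure of Smyth and Goodman; how class proximity biases either measure is not shown here, and the function and parameter names are assumptions.

```python
import math


def confidence(n_t_and_c: int, n_t: int) -> float:
    """Standard confidence of T -> C: fraction of documents matching T
    that belong to class C. Returns 0.0 if no document matches T."""
    return n_t_and_c / n_t if n_t else 0.0


def _term(p: float, q: float) -> float:
    """p * log2(p / q), with the usual convention that 0 * log(0) = 0."""
    return 0.0 if p == 0.0 else p * math.log2(p / q)


def j_measure(p_t: float, p_c: float, p_c_given_t: float) -> float:
    """Standard (unbiased) J-measure of the rule T -> C.

    p_t          probability that a document matches T
    p_c          prior probability of class C
    p_c_given_t  probability of class C among documents matching T
    """
    return p_t * (
        _term(p_c_given_t, p_c) + _term(1.0 - p_c_given_t, 1.0 - p_c)
    )
```

A rule whose body tells nothing about the class (p_c_given_t equal to p_c) scores zero, and the score grows both with how often the body applies and with how far it shifts the class distribution.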

11 An example
Term hierarchy: author, story, writer, editor, fiction, poem
Class hierarchy: Arts, Music, Literature, A_Music, A_Literature ...

12 Term hierarchy (T) = Yes, Class proximity (B) = Yes
R0: author, story → Literature (Conf_B = 1, Clist = d6, d7)
R1: author → Literature (Conf_B = 1)
R2: story → Literature (Conf_B = 0.67, Wlist = d5(1))
R4: hall → Music (Conf_B = 0.4, Clist = d1, d2, Wlist = d3(1))
R3: States → A_Literature (Conf_B = 0.33, Clist = d4, d5)
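Given a ranked rule list such as the one on slide 12, choosing the “best” rule for a new document can be sketched as taking the highest-ranked rule whose body the document contains. The Python below uses the five rules and their biased confidence values from the slide; the exact-term matching, the tie-break by listed order, and the names `Rule` and `best_rule` are simplifying assumptions, and the term hierarchy is ignored.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Set


class Rule(NamedTuple):
    name: str
    body: FrozenSet[str]
    target: str
    conf_b: float  # biased confidence as listed on slide 12


RULES: List[Rule] = [
    Rule("R0", frozenset({"author", "story"}), "Literature", 1.0),
    Rule("R1", frozenset({"author"}), "Literature", 1.0),
    Rule("R2", frozenset({"story"}), "Literature", 0.67),
    Rule("R4", frozenset({"hall"}), "Music", 0.4),
    Rule("R3", frozenset({"States"}), "A_Literature", 0.33),
]


def best_rule(doc: Set[str], rules: List[Rule] = RULES) -> Optional[Rule]:
    """Return the highest-ranked rule whose body is contained in doc.

    Ranking is by conf_b, descending. sorted() is stable, so rules with
    equal conf_b keep their listed order (an assumed tie-break).
    """
    ranked = sorted(rules, key=lambda r: r.conf_b, reverse=True)
    for rule in ranked:
        if rule.body <= doc:
            return rule
    return None  # no rule fires; a default class would be needed
```

For a document containing story and hall, R2 (0.67) outranks R4 (0.4), so the document is assigned to Literature even though a Music rule also fires.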


14 Experiment I
– http://www.acm.org/dl/toc.html/
– 26,515 papers, 78 classes, 14,754 terms
– class hierarchy = Level-1 and level-2 categories
– term hierarchy = Level-3 and level-4 categories
– document = Title and level-4 categories

15 Best rules found by (B,T)
CSO:
– vector, stream, processor, parallel → Processor_Architectures
– multiple_instruction_stream → Processor_Architectures
– data_flow, architectur → Processor_Architectures
– internet, architectur → Computer_Communication_Networks
– mode, atm → Computer_Communication_Networks
– network, circuit_switching → Computer_Communication_Networks
– tecniqu, model, attribut → Performance_of_Systems
Software:
– program, function, application → Programming_Techniques
– object_oriented_programming → Programming_Techniques
– reusable_software → Software_Engineering
– software, methodologie → Software_Engineering
– organization, distributed_system → Operating_Systems

16 [Figure: chart comparing the configurations (), (T), (B), (B,T), (CDAR97,T) and (CDAR97); only the legend survives in the transcript]


18 Experiment II
– http://dir.yahoo.com/recreation/sports
– 7,550 documents
– 367 classes, 7 levels
– 10,747 terms
– 90% of the terms occur in no more than 10 documents, and many documents contain only such terms

19 Best rules found by (B,T)
Sports:Cycling:
– page, mountain → Mountain_Biking
– product, bike → Mountain_Biking
– mtb, mountain → Mountain_Biking
– held, bicycl → Races
– classic, bicycl → Races
– trip, tour → Travelogues
– trip, canada → Travelogues
– bicycl, alaska → Travelogues
Sports:Auto_Racing:
– team, result, driver → Formula_one
– model, featur → Tracks_and_Speedways
– oval → Tracks_and_Speedways
– raceway → Tracks_and_Speedways
