1
Building Hierarchical Classifiers Using Class Proximity
Ke Wang, Senqiang Zhou, Shiang Chen Liew
National University of Singapore
2
VLDB99, Sept 6-10, Edinburgh
Hierarchical classification
Given
– a class hierarchy
– a collection of pre-classified documents
– a document is a set of terms
Build
– a classifier that assigns a relevant class to a new document
Key
– extract features of classes
3
Yahoo classes
Class hierarchy figure, levels from top to bottom:
– Yahoo
– recreation, science
– automotive, sports
– skating, cycling
4
ACM classes
Level 1: Hardware
Level 2: General, Memory_structure
Level 3: General, Design_style
Level 4: Cache_memories
5
Existing local approaches
– build one classifier at each split of the class hierarchy
– determine features locally at each node
– classify a document by going through a path of classifiers starting from the root
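The local, top-down scheme described on this slide can be sketched roughly as follows. This is a minimal illustration, not the method of any cited system: the parent-child links are one assumed reading of the Yahoo example, the feature sets in LOCAL_FEATURES are invented, and the "classifier" at each split is a simple term-overlap count standing in for whatever learner a real system would train.

```python
# Assumed class hierarchy (parent -> children), one reading of the Yahoo example.
CHILDREN = {
    "Yahoo": ["recreation", "science"],
    "recreation": ["automotive", "sports"],
    "sports": ["skating", "cycling"],
}

# Hypothetical feature terms, chosen locally for each class at its own split.
LOCAL_FEATURES = {
    "recreation": {"game", "hobby", "car", "bike"},
    "science": {"physics", "biology"},
    "automotive": {"car", "engine"},
    "sports": {"bike", "skate", "race"},
    "skating": {"skate", "ice"},
    "cycling": {"bike", "race"},
}


def classify_top_down(doc_terms, root="Yahoo"):
    """Follow one path of local classifiers from the root down to a leaf."""
    node = root
    path = [node]
    while node in CHILDREN:
        # Stand-in local classifier: pick the child whose feature set
        # overlaps most with the document (first child wins on ties).
        node = max(CHILDREN[node],
                   key=lambda child: len(doc_terms & LOCAL_FEATURES[child]))
        path.append(node)
    return path


print(classify_top_down({"bike", "race"}))
```

Because each step commits to one child, a wrong choice at a high-level split cannot be corrected further down the path.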
6
Diminishing of high-level structure
– local approaches rely on classification at high levels
– but high-level structures are usually weak, i.e., topics diverge
– e.g., “car” is a feature at Recreation: Automotive, but not at Recreation
7
Bias of misclassification
– sibling classes vs. nephew classes
– misclassification at high levels vs. at low levels
– specialisation vs. generalisation
8
Features should be
– determined with respect to the target class
– determined at all concept levels
– correlated
The solution: generalised association rules (SA95, HF95)
{sql, IO} → DB
{language, performance} → CS
9
Our approach
– class proximity
– global classifier
– term hierarchy
– use the “best” generalised association rule T → C to determine the class
10
Rank association rules
– Biased confidence
– Biased J-measure
11
An example
Term hierarchy (figure): author, story, writer, editor, fiction, poem
Class hierarchy (figure): Music, Literature, A_Music, A_Literature, Arts
...
12
Term hierarchy (T) = Yes, Class proximity (B) = Yes
R0: author, story → Literature (Conf_B = 1, Clist = d6, d7)
R1: author → Literature (Conf_B = 1)
R2: story → Literature (Conf_B = 0.67, Wlist = d5(1))
R4: hall → Music (Conf_B = 0.4, Clist = d1, d2, Wlist = d3(1))
R3: States → A_Literature (Conf_B = 0.33, Clist = d4, d5)
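How a ranked rule list like this one might be used to classify a new document can be sketched as below. Several things here are assumptions rather than content from the slides: the child-to-parent links in TERM_PARENT are a guess at the term hierarchy of the example, the Conf_B values are copied from the rules above without recomputing them (the slides do not give the formula), and ties are broken by listing order.

```python
# Assumed term hierarchy (child -> parent); a guess, not taken from the slides.
TERM_PARENT = {
    "writer": "author",
    "editor": "author",
    "fiction": "story",
    "poem": "story",
}

# Rules as listed in the example: (name, body terms, class, Conf_B).
RULES = [
    ("R0", {"author", "story"}, "Literature", 1.0),
    ("R1", {"author"}, "Literature", 1.0),
    ("R2", {"story"}, "Literature", 0.67),
    ("R4", {"hall"}, "Music", 0.4),
    ("R3", {"States"}, "A_Literature", 0.33),
]


def generalise(doc_terms):
    """Add every ancestor of each term, so rules can match at any concept level."""
    expanded = set(doc_terms)
    for term in doc_terms:
        while term in TERM_PARENT:
            term = TERM_PARENT[term]
            expanded.add(term)
    return expanded


def classify(doc_terms, rules=RULES):
    """Return (rule name, class) of the highest-ranked rule whose body matches."""
    terms = generalise(doc_terms)
    # sorted() is stable, so rules with equal Conf_B keep their listed order.
    for name, body, cls, _conf in sorted(rules, key=lambda r: -r[3]):
        if body <= terms:
            return name, cls
    return None


print(classify({"writer", "fiction"}))
print(classify({"poem", "hall"}))
```

A document containing "poem" and "hall" matches both R2 (through the assumed poem-to-story link) and R4; the higher-ranked R2 decides the class.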
14
Experiment I
http://www.acm.org/dl/toc.html/
– 26,515 papers, 78 classes, 14,754 terms
– class hierarchy = level-1 and level-2 categories
– term hierarchy = level-3 and level-4 categories
– document = title and level-4 categories
15
Best rules found by (B,T)
CSO:
– vector, stream, processor, parallel → Processor_Architectures
– multiple_instruction_stream → Processor_Architectures
– data_flow, architectur → Processor_Architectures
– internet, architectur → Computer_Communication_Networks
– mode, atm → Computer_Communication_Networks
– network, circuit_switching → Computer_Communication_Networks
– tecniqu, model, attribut → Performance_of_Systems
Software:
– program, function, application → Programming_Techniques
– object_oriented_programming → Programming_Techniques
– reusable_software → Software_Engineering
– software, methodologie → Software_Engineering
– organization, distributed_system → Operating_Systems
16
Results plot (figure not preserved). Legend: (), (T), (B), (B,T), (CDAR97,T), (CDAR97)
18
Experiment II
http://dir.yahoo.com/recreation/sports
– 7,550 documents
– 367 classes, 7 levels
– 10,747 terms
– 90% of the terms occur in no more than 10 documents, and many documents contain only such terms
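The term-sparsity statistic on this slide is a document-frequency count. The sketch below shows one way such a figure could be computed; the function name rare_term_stats, the toy documents, and the threshold used in the example are all illustrative and are not the experiment's data.

```python
from collections import Counter


def rare_term_stats(docs, max_df=10):
    """Return (fraction of terms occurring in <= max_df documents,
    number of documents made up only of such rare terms)."""
    doc_freq = Counter()
    for terms in docs:
        doc_freq.update(set(terms))  # count each term once per document
    rare = {term for term, n in doc_freq.items() if n <= max_df}
    frac_rare = len(rare) / len(doc_freq) if doc_freq else 0.0
    only_rare = sum(1 for terms in docs if terms and set(terms) <= rare)
    return frac_rare, only_rare


# Toy collection (invented); max_df=1 keeps the example small.
toy_docs = [
    {"bike", "race"},
    {"bike", "tour"},
    {"bike", "alaska"},
    {"oval"},
]
print(rare_term_stats(toy_docs, max_df=1))
```

In the toy collection, four of the five terms are rare and one document consists only of rare terms.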
19
Best rules found by (B,T)
Sports:Cycling:
– page, mountain → Mountain_Biking
– product, bike → Mountain_Biking
– mtb, mountain → Mountain_Biking
– held, bicycl → Races
– classic, bicycl → Races
– trip, tour → Travelogues
– trip, canada → Travelogues
– bicycl, alaska → Travelogues
Sports:Auto_Racing:
– team, result, driver → Formula_one
– model, featur → Tracks_and_Speedways
– oval → Tracks_and_Speedways
– raceway → Tracks_and_Speedways