1 Parameter Optimized Vertical, Nearest Neighbor-Vote and Boundary Based Classification
Outline: Introduction, Background, Our Approach, Experimental Results, Conclusions
1/1/2019

2 Parameter Optimized Vertical Classification
Introduction
Computer Aided Detection (CAD) offers interesting data mining applications. Typical medical image data sets are large, extremely unbalanced between the positive (+) and negative (-) classes, and have a large number of “irrelevant” features. They also have noisy labels, based on human decisions that take only a few features into consideration because of the limits of the human mind (~5 ± 2 contexts).
Major requirement: extremely high performance thresholds for clinical acceptance (high Negative Predictive Values).

3 Introduction (Cont.)
Pulmonary Embolism (PE): 650,000 cases per year in the US (the root cause can be anything that stresses the cardiovascular system). PE occurs when thrombi (blood clots), usually from the legs, travel through the ever-enlarging venous system, into and through the heart, and on into the ever-narrowing pulmonary arterial system, where they lodge and block the lung arteries.
It is a highly lethal condition. Symptoms are often detected in an emergency room setting, and a true-positive diagnosis has to be followed by swift treatment, which usually involves a blood thinner (e.g., warfarin).
False negatives are very bad. The symptoms can resemble those of a brain aneurysm, for which the immediate treatment is the opposite of that for an embolism, and giving warfarin to a patient with a brain aneurysm can be fatal.
The Holy Grail of PE CAD is fast, accurate detection of negatives (high Negative Predictive Value, or NPV).
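NPV is the fraction of negative calls that are truly negative: NPV = TN / (TN + FN). A high NPV means a "no PE" result can be trusted. The sketch below computes NPV from label lists. It assumes binary labels with 1 = PE present and 0 = no PE. The function name `npv` is hypothetical. The slides do not say whether the KDD Cup scored NPV per candidate or per patient, and this sketch simply scores whatever units it is given.

```python
from typing import Sequence


def npv(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Negative predictive value TN / (TN + FN); NaN if nothing was predicted negative."""
    # True negatives: predicted 0 and actually 0.
    tn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 0)
    # False negatives: predicted 0 but actually 1 (a missed embolism).
    fn = sum(1 for t, p in zip(y_true, y_pred) if p == 0 and t == 1)
    predicted_negative = tn + fn
    return tn / predicted_negative if predicted_negative else float("nan")
```

For example, if five candidates are called negative and one of them is actually positive, the NPV is 4 / 5 = 0.8.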

4 Introduction (Cont.)
Several hundred classification attributes are automatically generated from a large number of radiological or magnetic resonance images (e.g., Computed Tomography Angiography (CTA) images).
Objective of a PE CAD system: identify the sick patients from the available descriptive features with high accuracy (especially NPV accuracy).
We applied Parameter Optimized Vertical, Nearest Neighbor-Vote and Boundary Based Classification. The approach was used successfully in the ACM KDD Cup 2006 data mining competition, where it won the NPV task with a score twice as high as that of the nearest competitor.

5 KDD 2006 PE Data
67 CTA cases (patients); 4424 PE candidates (lung spots); 116 features generated from Computed Tomography Angiography.

6 Our Approach
Our Attribute Selection (AS) step was followed by a combination of Gaussian Nearest Neighbor (GNN) and Local Class Boundary (LCB) based classification. Classification parameters were optimized with a Genetic Algorithm (GA).
The training set was structured vertically into Predicate-trees, or P-trees (losslessly compressed, data-mining-ready vertical structures). Attribute relevance analysis was performed, nearest neighbor sets were created, and class boundary analysis was performed. Because P-trees are compressed, processing can be done in compressed form (no need to uncompress, process, and then compress again).
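The slides do not define the GNN step precisely. The sketch below assumes that "Gaussian Nearest Neighbor" means a k-nearest-neighbor vote in which each neighbor's vote is weighted by a Gaussian kernel of its distance, exp(-d² / (2σ²)). The names `gaussian_nn_score` and `classify` and the parameters `k`, `sigma` and `threshold` are hypothetical. Parameters like these are what a GA could tune. The sketch works on plain Python lists rather than P-trees and does not model the Local Class Boundary (LCB) component.

```python
import math
from typing import List, Sequence, Tuple


def gaussian_nn_score(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    k: int = 5,
    sigma: float = 1.0,
) -> float:
    """Gaussian-weighted share of positive (label 1) votes among the k nearest neighbors."""
    # Squared Euclidean distance from the query to every training point, paired with its label.
    dists: List[Tuple[float, int]] = [
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    ]
    dists.sort(key=lambda t: t[0])
    # Each of the k nearest neighbors votes for its class with weight exp(-d^2 / (2 sigma^2)).
    votes = {0: 0.0, 1: 0.0}
    for d2, y in dists[:k]:
        votes[y] += math.exp(-d2 / (2.0 * sigma ** 2))
    total = votes[0] + votes[1]
    return votes[1] / total if total > 0.0 else 0.0


def classify(
    train_x: Sequence[Sequence[float]],
    train_y: Sequence[int],
    query: Sequence[float],
    k: int = 5,
    sigma: float = 1.0,
    threshold: float = 0.5,
) -> int:
    """Predict 1 (positive) when the weighted positive vote exceeds `threshold`, else 0."""
    return int(gaussian_nn_score(train_x, train_y, query, k, sigma) > threshold)
```

Lowering `threshold` labels more candidates positive, so fewer real positives slip through as negatives. This tends to raise NPV at the cost of more false positives.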

7 P-tree* Vertical Data Structure
Predicate-trees (P-trees): lossless, compressed, and data-mining-ready. They have been used successfully in KNN (k-nearest neighbor), ARM (association rule mining), Bayesian classification, SVM, and more. P-tree processing speed allowed for multiple rounds of attribute relevance analysis, including information-gain-based rounds, statistics-based rounds, and heuristic rounds.
* Predicate Tree (P-tree) technology is patented by North Dakota State University (William Perrizo, primary inventor of record); patent number 6,941,303, issued September 6, 2005.
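The idea behind a vertical, compressed, data-mining-ready structure can be illustrated with a simplified sketch. It is not the patented NDSU implementation. Each attribute is split into bit columns, one per bit position (`bit_slice`). Each column becomes a 1-D tree that recursively halves the column and stores the count of 1-bits in each segment (`build`). Pure-0 and pure-1 segments are kept as leaves, and that is where the compression comes from. ANDing two trees (`p_and`) skips pure segments without expanding them, and the root count answers "how many records satisfy both predicates". Those are the counts that relevance measures such as information gain are built from. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class PNode:
    """Node of a simplified 1-D P-tree: count of 1-bits in one segment of a bit column."""
    count: int
    size: int
    left: Optional["PNode"] = None
    right: Optional["PNode"] = None

    def is_pure0(self) -> bool:
        return self.count == 0

    def is_pure1(self) -> bool:
        return self.count == self.size


def bit_slice(values: Sequence[int], bit: int) -> List[int]:
    """Vertical decomposition: bit `bit` of every non-negative integer attribute value."""
    return [(v >> bit) & 1 for v in values]


def build(bits: Sequence[int]) -> PNode:
    """Recursively halve the column; pure segments become leaves (the compression)."""
    n, c = len(bits), sum(bits)
    if c == 0 or c == n:
        return PNode(c, n)
    mid = n // 2
    return PNode(c, n, build(bits[:mid]), build(bits[mid:]))


def p_and(a: PNode, b: PNode) -> PNode:
    """AND two P-trees over equally long columns without expanding pure segments."""
    if a.is_pure0() or b.is_pure0():
        return PNode(0, a.size)
    if a.is_pure1():
        return b
    if b.is_pure1():
        return a
    # Both nodes are mixed, so both have children split at the same midpoint.
    left = p_and(a.left, b.left)
    right = p_and(a.right, b.right)
    count = left.count + right.count
    if count == 0:
        return PNode(0, a.size)  # collapse an all-zero result into a pure leaf
    return PNode(count, a.size, left, right)


if __name__ == "__main__":
    attr = [5, 3, 7, 1, 4, 6, 2, 0]   # one integer-coded attribute
    label = [1, 0, 1, 0, 0, 1, 1, 0]  # class column (1 = positive)
    both = p_and(build(bit_slice(attr, 2)), build(label))
    print(both.count)  # records with bit 2 set AND positive class: 3
```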

8 Method Overview
[Diagram: Horizontal Training Data; Vertical Training Data (P-trees); Attribute Relevance Analysis; Relevant Attributes; Gaussian Near Neighbor; Local Class Boundary; Combination Classifier; Genetic Algorithm (Param., Fitness); Test Data; Optimized Classifier; Final Results]
Multiple iterations of the GA produce an optimized classifier that is used to classify the unknown test cases.
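The overview shows the GA supplying parameters to the classifier and receiving a fitness score back. The sketch below is a minimal real-valued GA that maximizes a user-supplied fitness function over box-bounded parameters. The operators (tournament selection, blend crossover, Gaussian mutation, and elitism) are common textbook choices and are assumptions, not the authors' documented operators. The name `genetic_optimize` and its defaults are hypothetical.

```python
import random
from typing import Callable, List, Sequence, Tuple


def genetic_optimize(
    fitness: Callable[[List[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 30,
    generations: int = 60,
    mutation_rate: float = 0.2,
    seed: int = 0,
) -> List[float]:
    """Maximize `fitness` over real-valued parameters within `bounds` using a simple GA."""
    rng = random.Random(seed)

    def clip(v: float, lo: float, hi: float) -> float:
        return min(hi, max(lo, v))

    def tournament(scored: List[Tuple[float, List[float]]]) -> List[float]:
        # Return the fitter of two randomly chosen individuals.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(((fitness(ind), ind) for ind in population),
                        key=lambda t: t[0], reverse=True)
        next_gen = [scored[0][1]]  # elitism: carry the best individual over unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            child = []
            for g1, g2, (lo, hi) in zip(p1, p2, bounds):
                w = rng.random()
                gene = w * g1 + (1 - w) * g2  # blend crossover of the two parents
                if rng.random() < mutation_rate:
                    # Gaussian mutation scaled to 10% of this parameter's range.
                    gene += rng.gauss(0.0, 0.1 * (hi - lo))
                child.append(clip(gene, lo, hi))
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```

In a pipeline like the one in the diagram, `fitness` could be a function that builds a classifier from a parameter vector (for example k, sigma and a decision threshold) and returns its NPV on held-out training data. Each fitness call is then a full classification pass, which is one reason fast vertical processing matters here.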

9 Results: Quality
KDD 2006 data set: best submission in the KDD Cup NPV task, by a factor of 2.

