# Treatment Learning: Implementation and Application

Ying Hu, Electrical & Computer Engineering, University of British Columbia

## Presentation Transcript

Ying Hu http://www.ece.ubc.ca/~yingh

### Outline

1. An example
2. Background review
3. TAR2 treatment learner (TARZAN: Tim Menzies; TAR2: Ying Hu & Tim Menzies)
4. TAR3: an improved TAR2 (TAR3: Ying Hu)
5. Evaluation of treatment learning
6. Applications of treatment learning
7. Conclusion

### First Impression

Boston Housing dataset (506 examples, 4 classes ordered from low to high).

[Figure: C4.5's decision tree for the dataset, shown beside the treatment learner's output.]

Treatment learner output:

- 6.7 <= rooms < 9.8 and 12.6 <= parent-teacher ratio < 15.9
- 0.6 <= nitric oxide < 1.9 and 17.16 <= living standard < 39

### Review: Background

- What is KDD?
  - KDD = Knowledge Discovery in Databases [fayyad96]
  - Data mining: one step in the KDD process
  - Machine learning: the learning algorithms
- Common data mining tasks
  - Classification
    - Decision tree induction (C4.5) [quinlan86]
    - Nearest neighbors [cover67]
    - Neural networks [rosenblatt62]
    - Naive Bayes classifier [duda73]
  - Association rule mining
    - APRIORI algorithm [agrawal93]
    - Variants of APRIORI

### Treatment Learning: Definition

- Input: a classified dataset
  - Assumption: the classes are ordered
- Output: Rx, a conjunction of attribute-value pairs
  - Size of Rx = number of pairs in Rx
- confidence(Rx w.r.t. Class) = P(Class | Rx)
- Goal: find Rx whose confidence differs across the classes
- Evaluating Rx: lift
- Output presented in visual form
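
A minimal Python sketch of the two measures on this slide. `confidence` follows the slide's definition P(Class | Rx). The slide does not define lift, so the scoring here is an assumption: each ordered class gets a hypothetical weight that grows with its rank, and lift is the mean weight of the rows selected by Rx divided by the mean weight of all rows. The toy rows, class names, and weights are made up for illustration.

```python
from typing import Dict, List, Tuple

# A row pairs discretized attribute values with a class label.
Row = Tuple[Dict[str, str], str]

# Hypothetical weights for the ordered classes (worst to best);
# an assumption for this sketch, not taken from the slides.
CLASS_WEIGHTS: Dict[str, float] = {"low": 1.0, "medium": 2.0, "high": 4.0}


def matches(values: Dict[str, str], rx: Dict[str, str]) -> bool:
    """True if the row satisfies every attribute-value pair in the treatment."""
    return all(values.get(attr) == val for attr, val in rx.items())


def selected_labels(rows: List[Row], rx: Dict[str, str]) -> List[str]:
    """Class labels of the rows selected by Rx."""
    return [label for values, label in rows if matches(values, rx)]


def confidence(rows: List[Row], rx: Dict[str, str], cls: str) -> float:
    """P(cls | Rx), or 0.0 when Rx selects no rows."""
    labels = selected_labels(rows, rx)
    return labels.count(cls) / len(labels) if labels else 0.0


def mean_weight(labels: List[str]) -> float:
    return sum(CLASS_WEIGHTS[c] for c in labels) / len(labels)


def lift(rows: List[Row], rx: Dict[str, str]) -> float:
    """Assumed lift: mean class weight under Rx relative to the whole dataset."""
    labels = selected_labels(rows, rx)
    if not labels:
        return 0.0
    return mean_weight(labels) / mean_weight([label for _, label in rows])


# Toy data (hypothetical).
rows: List[Row] = [
    ({"rooms": "many", "ratio": "low"}, "high"),
    ({"rooms": "many", "ratio": "high"}, "medium"),
    ({"rooms": "few", "ratio": "low"}, "low"),
    ({"rooms": "few", "ratio": "high"}, "low"),
]
rx = {"rooms": "many"}  # an Rx of size 1

print(confidence(rows, rx, "high"))  # 0.5
print(lift(rows, rx))                # 1.5
```

Here Rx selects one "high" and one "medium" row, so P(high | Rx) = 0.5, and the selected rows' mean weight (3.0) is 1.5 times the dataset's (2.0).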

### Motivation: Narrow Funnel Effect

- When is enough learning enough?
  - Using fewer than 50% of the attributes decreases accuracy by 3-5% [shavlik91]
  - A 1-level decision tree is comparable to C4 [Holte93]
  - Data engineering: ignoring 81% of the features results in a 2% increase in accuracy [kohavi97]
  - Scheduling: random sampling outperforms complete (depth-first) search [crawford94]
- Narrow funnel effect
  - Control variables vs. derived variables
  - Treatment learning: finding the funnel variables

### TAR2: The Algorithm

- Search + attribute utility estimation
  - Estimation heuristic: confidence1
  - Search: depth-first search
  - Search space: attribute-value pairs with confidence1 > threshold
- Discretization: equal-width interval binning
- Reporting Rx
  - Lift(Rx) > threshold
- Software package and online distribution
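
A sketch of two mechanical pieces named on this slide, under stated assumptions: equal-width binning of a numeric attribute, and a depth-first enumeration of conjunctions over candidate items assumed to be already filtered by confidence1 > threshold. The `score` callback stands in for lift; TAR2's actual pruning rules and scoring are not reproduced, and all function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Item = Tuple[str, int]  # (attribute name, bin index)


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Map each value to one of n_bins equal-width intervals over [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant attribute: a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land at index n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def dfs_treatments(
    candidates: Sequence[Item],
    max_size: int,
    score: Callable[[Tuple[Item, ...]], float],
    threshold: float,
) -> List[Tuple[Tuple[Item, ...], float]]:
    """Depth-first search over conjunctions of candidate items.

    Each attribute appears at most once per conjunction. Every conjunction
    of up to max_size items whose score exceeds threshold is reported.
    """
    reported: List[Tuple[Tuple[Item, ...], float]] = []

    def extend(start: int, current: List[Item]) -> None:
        if current:
            s = score(tuple(current))
            if s > threshold:
                reported.append((tuple(current), s))
        if len(current) == max_size:
            return
        used = {attr for attr, _ in current}
        for i in range(start, len(candidates)):
            attr, _ = candidates[i]
            if attr in used:
                continue  # two ranges of one attribute cannot both hold
            current.append(candidates[i])
            extend(i + 1, current)
            current.pop()

    extend(0, [])
    return reported


print(equal_width_bins([0, 2.5, 5, 7.5, 10], 4))
print(dfs_treatments([("a", 0), ("b", 1), ("c", 2)], 2, len, 1))
```

Using `len` as the score is only a placeholder that reports every size-2 conjunction; in practice the callback would compute lift over the dataset.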

### The Pilot Case Study

- Requirement optimization
  - Goal: find the optimal set of mitigations in a cost-effective manner

[Figure: model linking Risks, Mitigations, Requirements, Cost and Benefit (edge labels: reduce, relates, incur, achieve).]

- Iterative learning cycle

### The Pilot Study (continued)

- Cost-benefit distribution (30/99 mitigations)
- Comparison with simulated annealing

### Problem of TAR2

- Runtime vs. Rx size
- To generate Rx of size r: (formula not preserved in the transcript)
- To generate Rx of sizes [1..N]: (formula not preserved in the transcript)
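
The formulas on this slide were not preserved. As a hedged illustration of why runtime grows with Rx size, assume N candidate attribute-value pairs and ignore the constraint that two values of the same attribute cannot co-occur. There are then C(N, r) candidate Rx of size r, and summing over r = 1..N gives 2^N - 1 candidates. The function names are hypothetical.

```python
from math import comb


def rx_candidates_of_size(n: int, r: int) -> int:
    """Number of size-r conjunctions drawn from n candidate items."""
    return comb(n, r)


def rx_candidates_up_to(n: int) -> int:
    """Total candidates over all sizes 1..n; equals 2**n - 1."""
    return sum(comb(n, r) for r in range(1, n + 1))


for n in (10, 20, 30):
    print(n, rx_candidates_of_size(n, 3), rx_candidates_up_to(n))
```

Under these assumptions, exhaustively generating every Rx size is exponential in N, which motivates sampling.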

### TAR3: The Improvement

- Random sampling
  - Key idea: treat the confidence1 distribution as a probability distribution and sample Rx from it
  - Steps:
    1. Place the items a_i in increasing order of confidence1 value
    2. Compute the CDF of each a_i
    3. Sample a uniform value u in [0, 1]
    4. The sample is the least a_i whose CDF > u
  - Repeat until we get an Rx of the given size
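
A minimal sketch of the sampling steps on this slide. It assumes confidence1 values are already computed and non-negative, and it keeps at most one range per attribute in an Rx; both are assumptions, as are the names `build_cdf`, `sample_item`, and `sample_rx`. Scoring each sampled Rx (for example by lift) and repeating over rounds are left out.

```python
import bisect
import random
from itertools import accumulate
from typing import Dict, List, Optional, Tuple

Item = Tuple[str, str]  # (attribute, value range)


def build_cdf(conf1: Dict[Item, float]) -> Tuple[List[Item], List[float]]:
    """Steps 1-2: order items by increasing confidence1 and compute their CDF."""
    items = sorted(conf1, key=lambda it: conf1[it])
    cumulative = list(accumulate(conf1[it] for it in items))
    total = cumulative[-1]
    return items, [c / total for c in cumulative]  # last entry is exactly 1.0


def sample_item(items: List[Item], cdf: List[float], rng: random.Random) -> Item:
    """Steps 3-4: draw u uniformly and return the least item whose CDF > u."""
    u = rng.random()  # uniform in [0.0, 1.0)
    idx = bisect.bisect_right(cdf, u)  # first index with cdf[idx] > u
    return items[min(idx, len(items) - 1)]


def sample_rx(
    conf1: Dict[Item, float], size: int, rng: random.Random, max_draws: int = 1000
) -> Optional[List[Item]]:
    """Repeat draws until the Rx reaches the requested size, or give up."""
    items, cdf = build_cdf(conf1)
    rx: List[Item] = []
    for _ in range(max_draws):
        if len(rx) == size:
            break
        item = sample_item(items, cdf, rng)
        if all(item[0] != attr for attr, _ in rx):  # one range per attribute
            rx.append(item)
    return rx if len(rx) == size else None


conf1 = {
    ("rooms", "high"): 5.0,
    ("ratio", "low"): 3.0,
    ("nox", "low"): 1.0,
    ("rooms", "low"): 0.5,
}
print(sample_rx(conf1, 2, random.Random(0)))
```

Items with large confidence1 occupy wide slices of [0, 1) and are drawn most often, while low-confidence items are still occasionally tried.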

### Comparison of Efficiency

- Runtime vs. data size
- Runtime vs. Rx size
- Runtime vs. TAR2

### Comparison of Results

- Mean and standard deviation in each round
- Final Rx: TAR2 = 19, TAR3 = 20
- 10 UCI domains: identical best Rx
- pilot2 dataset (58 * 30k)

### External Evaluation

- FSS (feature subset selection) framework, on 10 UCI datasets:
  - Learning with all attributes
  - Learning with the subset of attributes chosen by a feature subset selector (TAR2less)
  - Comparing accuracy using C4.5 and Naive Bayes

### The Results

- Accuracy using Naive Bayes (average increase = 0.8%)
- Number of attributes
- Accuracy using C4.5 (average decrease = 0.9%)

### Comparison with Other FSS Methods

- Number of attributes selected (C4.5)
- Number of attributes selected (Naive Bayes)
- 17/20: fewest attributes selected
- Further evidence for funnels

### Applications of Treatment Learning

- Download site: http://www.ece.ubc.ca/~yingh/
- Collaborators: JPL, WV, Portland, Miami
- Application examples
  - Pair programming vs. conventional programming
  - Identifying software metrics that are superior error indicators
  - Identifying attributes that make FSMs easy to test
  - Finding the best software inspection policy for a particular software development organization
- Other applications:
  - 1 journal, 4 conference, and 6 workshop papers

### Main Contributions

- New learning approach
- A novel mining algorithm
- Algorithm optimization
- Complete package and online distribution
- Narrow funnel effect
- Treatment learner as FSS
- Application to various research domains

---

Some notes follow.

### Rx Definition Example

- Input example: a classified dataset
- Output example: Rx = conjunction of attribute-value pairs
  - confidence(Rx w.r.t. C) = P(C | Rx)

### TAR2 in Practice

- Domains containing narrow funnels show:
  - A tail in the confidence1 distribution
  - A small number of variables with disproportionately large confidence1 values
  - Satisfactory Rx of small size (< 6)

### Background: Classification

- 2-step procedure
  - The learning phase
  - The testing phase
- Strategies employed
  - Eager learning
    - Decision tree induction (e.g. C4.5)
    - Neural networks (e.g. backpropagation)
  - Lazy learning
    - Nearest neighbor classifiers (e.g. k-nearest neighbor classifier)

### Background: Association Rule

- Possible rule: B => C,E [support = 2%, confidence = 80%]
  - where support(X => Y) = P(X and Y), the fraction of transactions containing every item of X and Y
  - and confidence(X => Y) = P(Y | X)
- Representative algorithms
  - APRIORI: uses the Apriori property of large itemsets (every subset of a large itemset is also large)
  - Max-Miner: a more concise representation of the discovered rules; different pruning strategies

| ID | Transactions |
|----|--------------|
| 1 | A, B, C, E, F |
| 2 | B, C, E |
| 3 | B, C, D, E |
| 4 | … |
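
A small sketch computing support and confidence over the three transactions listed on the slide (the fourth row is elided there, so the slide's 2% / 80% figures, which refer to a larger database, cannot be reproduced). The function names are hypothetical.

```python
from typing import FrozenSet, List, Set

# The three transactions listed on the slide.
transactions: List[FrozenSet[str]] = [
    frozenset({"A", "B", "C", "E", "F"}),
    frozenset({"B", "C", "E"}),
    frozenset({"B", "C", "D", "E"}),
]


def support(itemset: Set[str], db: List[FrozenSet[str]]) -> float:
    """Fraction of transactions containing every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def rule_confidence(lhs: Set[str], rhs: Set[str], db: List[FrozenSet[str]]) -> float:
    """confidence(lhs => rhs) = support(lhs | rhs) / support(lhs) = P(rhs | lhs)."""
    base = support(lhs, db)
    return support(lhs | rhs, db) / base if base else 0.0


print(support({"B", "C", "E"}, transactions))
print(rule_confidence({"B"}, {"C", "E"}, transactions))
print(rule_confidence({"C"}, {"D"}, transactions))
```

On these three rows, B => C,E has support and confidence 1.0, while C => D has confidence 1/3 because only one of the three transactions containing C also contains D.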

### Background: Extension

- CBA classifier
  - CBA = Classification Based on Associations
  - Rules X => Y, where Y is a class label
  - More accurate than C4.5 (16/26)
- JEP classifier
  - JEP = Jumping Emerging Pattern: Support(X w.r.t. D1) = 0, Support(X w.r.t. D2) > 0
  - Model: a collection of JEPs
  - Classification: maximum collective impact
  - More accurate than both C4.5 and CBA (15/25)
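
The JEP condition on this slide can be checked directly. This is a minimal sketch with hypothetical toy datasets D1 and D2; mining all JEPs and the classifier's collective-impact scoring are not shown.

```python
from typing import FrozenSet, List, Set

Dataset = List[FrozenSet[str]]


def support(itemset: Set[str], db: Dataset) -> float:
    """Fraction of transactions in db containing every item of itemset."""
    return sum(1 for t in db if itemset <= t) / len(db)


def is_jep(itemset: Set[str], d1: Dataset, d2: Dataset) -> bool:
    """Jumping emerging pattern from D1 to D2: absent in D1, present in D2."""
    return support(itemset, d1) == 0 and support(itemset, d2) > 0


# Hypothetical toy datasets.
d1 = [frozenset({"a", "b"}), frozenset({"b", "c"})]
d2 = [frozenset({"a", "c"}), frozenset({"a", "b", "c"})]

print(is_jep({"a", "c"}, d1, d2))  # True
print(is_jep({"b"}, d1, d2))       # False
```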

### Background: Standard FSS Methods

- Information gain attribute ranking
- Relief
- Principal Component Analysis (PCA)
- Correlation-based feature selection
- Consistency-based subset evaluation
- Wrapper subset evaluation

### Comparison

- Relation to classification
  - Class boundary / class density
  - Class weighting
- Relation to association rule mining
  - Multiple classes / no class
  - Confidence-based pruning
- Relation to change detection algorithms
  - support: |P(X|y=c1) - P(X|y=c2)|
  - confidence: |P(y=c1|X) - P(y=c2|X)|
  - Bayes' rule
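
One way to see how Bayes' rule links the two measures on this slide (a short derivation added here, not taken from the slides): the confidence difference is the prior-weighted support difference divided by P(X).

```latex
P(y=c_i \mid X) = \frac{P(X \mid y=c_i)\,P(y=c_i)}{P(X)}, \qquad i = 1, 2

\left|P(y=c_1 \mid X) - P(y=c_2 \mid X)\right|
  = \frac{\left|P(X \mid y=c_1)\,P(y=c_1) - P(X \mid y=c_2)\,P(y=c_2)\right|}{P(X)}

P(y=c_1) = P(y=c_2) = \pi
  \;\Rightarrow\;
  \left|P(y=c_1 \mid X) - P(y=c_2 \mid X)\right|
  = \frac{\pi}{P(X)}\,\left|P(X \mid y=c_1) - P(X \mid y=c_2)\right|
```

With equal class priors, the two measures differ only by the factor π / P(X).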

### Confidence Property

- Universal-existential upward closure
  - R1: Age.young -> Salary.low
  - R2: Age.young, Gender.m -> Salary.low
  - R3: Age.young, Gender.f -> Salary.low
- Longer rules tend to have higher confidence
- Larger Rx tend to have higher lift values
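
For the R1/R2/R3 example, the upward-closure property follows from the law of total probability, assuming Gender takes only the values m and f (so P(m | young) + P(f | young) = 1). The confidence of R1 is a weighted average of the confidences of R2 and R3, so at least one of them is at least as confident as R1:

```latex
P(\text{low} \mid \text{young})
  = P(\text{low} \mid \text{young}, m)\,P(m \mid \text{young})
  + P(\text{low} \mid \text{young}, f)\,P(f \mid \text{young})
  \le \max\bigl(P(\text{low} \mid \text{young}, m),\; P(\text{low} \mid \text{young}, f)\bigr)
```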

### TAR3: Usability

- Usability: more user-friendly
  - Intuitive default settings
