Published by Teagan Hickey. Modified about 1 year ago.

1
View Learning: An extension to SRL
An application in Mammography
Jesse Davis, Beth Burnside, Inês Dutra, Vítor Santos Costa, David Page, Jude Shavlik & Raghu Ramakrishnan

2
Background
- Breast cancer is the most common cancer
- Mammography is the only proven screening test
- At this time approximately 61% of women have had a mammogram in the last 2 years
- Translates into 20 million mammograms per year

3
The Problem
- Radiologists interpret mammograms
- Variability among radiologists
  - Differences in training and experience
- Experts have higher cancer detection and fewer benign biopsies
- Shortage of experts

4
Common Mammography findings
- Microcalcifications
- Masses
- Architectural distortion

5
Calcifications

6
Mass

7
Architectural distortion

8
Other important features
- Microcalcifications
  - Shape, distribution, stability
- Masses
  - Shape, margin, density, size, stability
- Associated findings
- Breast Density

9
Other variables influence risk
- Demographic risk factors
  - Family History
  - Hormone therapy
  - Age

10
Standardization of Practice
- Passage of the Mammography Quality Standards Act (MQSA)
  - Requires tracking of patient outcomes through regular audits of mammography interpretations and cases of breast cancer
- Standardized lexicon: BI-RADS was developed, incorporating 5 categories that include 43 unique descriptors

11
BI-RADS
Mass
- Density: high, equal, low, fat containing
- Shape: round, oval, lobular, irregular
- Margins: circumscribed, microlobulated, obscured, indistinct, spiculated
Calcifications
- Higher Probability Malignancy: pleomorphic, fine/linear/branching
- Intermediate: amorphous
- Typically Benign: skin, vascular, coarse/popcorn, rod-like, round, lucent-centered, eggshell/rim, milk of calcium, suture, dystrophic, punctate
- Distribution: clustered, linear, segmental, regional, diffuse/scattered
Architectural Distortion
Special Cases
- Focal Asymmetric Density, Asymmetric Breast Tissue, Lymph Node, Tubular Density
Associated Findings
- Trabecular Thickening, Skin Thickening, Nipple Retraction, Skin Retraction, Skin Lesion, Axillary Adenopathy

12
Mammography Database
- Radiologist interpretation of mammogram
- Patient may have multiple mammograms
- A mammogram may have multiple abnormalities
- Expert defined Bayes net for determining whether an abnormality is malignant

13
Original Expert Structure

14
Mammography Database

Patient  Abnormality  Date  Mass Shape  …  Mass Size  Loc  Be/Mal
P1       1            5/02  Spic        …  0.03       RU4  B
P1       2            5/04  Var         …  0.04       RU4  M
P1       3            5/04  Spic        …  0.04       LL3  B
…        …            …     …           …  …          …    …

15
Types of Learning
- Hierarchy of ‘types’ of learning that we can perform on the Mammography database

16
Level 1: Parameters
Bayes net nodes: Be/Mal, Shape, Size
- Given: Features (node labels, or fields in database), Data, Bayes net structure
- Learn: Probabilities
- Note: probabilities needed are Pr(Be/Mal), Pr(Shape | Be/Mal), Pr(Size | Be/Mal)
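A minimal sketch of Level 1 parameter learning, assuming the structure is fixed and the probabilities are plain maximum-likelihood counts (the slides do not say which estimator was used). The toy rows and the helper name `estimate_parameters` are illustrative and not taken from the study.

```python
from collections import Counter

# Hypothetical toy rows of (be_mal, shape); not data from the study.
rows = [("B", "Spic"), ("M", "Var"), ("B", "Spic"), ("B", "Oval")]

def estimate_parameters(rows):
    """Maximum-likelihood estimates of Pr(Be/Mal) and Pr(Shape | Be/Mal)."""
    class_counts = Counter(label for label, _ in rows)
    pair_counts = Counter(rows)
    prior = {c: n / len(rows) for c, n in class_counts.items()}
    # Key is (shape, class); value is count(shape, class) / count(class).
    cond = {(shape, c): n / class_counts[c] for (c, shape), n in pair_counts.items()}
    return prior, cond

prior, cond = estimate_parameters(rows)
print(prior)
print(cond)
```

Pr(Size | Be/Mal) would be estimated the same way once Size is discretized.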

17
Level 2: Structure
Bayes net nodes: Be/Mal, Shape, Size
- Given: Features, Data
- Learn: Bayes net structure and probabilities
- Note: with this structure, we now need Pr(Size | Shape, Be/Mal) instead of Pr(Size | Be/Mal)
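The change in required probabilities can be read off the two factorizations of the joint distribution. The symbols C (Be/Mal), Sh (Shape) and Sz (Size) are shorthand introduced here, and the Level 2 line assumes the learned structure adds an arc from Shape to Size, which is what the note on the slide implies.

```latex
\begin{align*}
\text{Level 1 structure:}\quad P(C, Sh, Sz) &= P(C)\, P(Sh \mid C)\, P(Sz \mid C) \\
\text{Level 2 structure:}\quad P(C, Sh, Sz) &= P(C)\, P(Sh \mid C)\, P(Sz \mid Sh, C)
\end{align*}
```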

21
Level 3: Aggregates
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date
- Given: Features, Data, Background knowledge: aggregation functions such as average, mode, max, etc.
- Learn: Useful aggregate features, Bayes net structure that uses these features, and probabilities. New features may use other rows/tables.
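A small sketch of how an aggregate feature such as "Avg size this date" could be computed. The three rows reuse the values shown in the Mammography Database table; the tuple layout and the function name `add_avg_size_this_date` are assumptions made for illustration.

```python
from collections import defaultdict

# (patient, abnormality, date, size), values as shown in the database slide.
rows = [
    ("P1", 1, "5/02", 0.03),
    ("P1", 2, "5/04", 0.04),
    ("P1", 3, "5/04", 0.04),
]

def add_avg_size_this_date(rows):
    """Append the mean mass size over all abnormalities of the same patient and date."""
    groups = defaultdict(list)
    for patient, _, date, size in rows:
        groups[(patient, date)].append(size)
    extended = []
    for patient, abnormality, date, size in rows:
        sizes = groups[(patient, date)]
        extended.append((patient, abnormality, date, size, sum(sizes) / len(sizes)))
    return extended

extended = add_avg_size_this_date(rows)
for row in extended:
    print(row)
```

The new column depends on other rows of the table, which is what separates Level 3 from Levels 1 and 2.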

25
Level 4: View Learning
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date, Shape change in abnormality at this location, Increase in average size of abnormalities
- Given: Features, Data, Background knowledge: aggregation functions and intensionally-defined relations such as “increase” or “same location”
- Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities

26
Structure Learning Algorithms
- Three different algorithms
  - Naïve Bayes
  - Tree Augmented Naïve Bayes (TAN)
  - Sparse Candidate Algorithm

27
Naïve Bayes Net
- Simple, computationally efficient
[Figure: Class Value node with an arc to each of Attr 1, Attr 2, Attr 3, …, Attr N-2, Attr N-1, Attr N]
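A minimal sketch of classification with a Naïve Bayes net, where the posterior is the prior times one likelihood term per attribute, then normalized. All probability values below are made up for illustration, and `naive_bayes_posterior` is a hypothetical helper name.

```python
def naive_bayes_posterior(prior, likelihood, observed):
    """Return Pr(class | observed attributes) under the naive independence assumption."""
    scores = {}
    for c, p in prior.items():
        score = p
        for attr, value in observed.items():
            score *= likelihood[c][attr][value]
        scores[c] = score
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Hypothetical parameters, not estimates from the mammography data.
prior = {"M": 0.2, "B": 0.8}
likelihood = {
    "M": {"shape": {"Spic": 0.5}, "size": {"large": 0.5}},
    "B": {"shape": {"Spic": 0.125}, "size": {"large": 0.25}},
}
posterior = naive_bayes_posterior(prior, likelihood, {"shape": "Spic", "size": "large"})
print(posterior)
```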

28
Example TAN Net
- Also computationally efficient [Friedman, Geiger & Goldszmidt ’97]
[Figure: Class Value node and attributes Attr 1, Attr 2, Attr 3, …, Attr N-2, Attr N-1, Attr N]

29
TAN
- Arc from class variable to each attribute
- Less restrictive than Naïve Bayes
  - Each attribute permitted at most one extra parent
- Polynomial time bound on constructing network
  - O((# attributes)^2 * |training set|)
- Guaranteed to maximize LL(B_T | D)

30
TAN Algorithm
- Constructs a complete graph between all the attributes (excluding class variable)
- Edge weight is the conditional mutual information between the two vertices, given the class variable
- Find maximum weight spanning tree over the graph
- Pick root in tree and make edges directed
- Add edges from directed tree to network
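The steps above can be sketched as follows. This is an illustrative reading, not the authors' implementation: conditional mutual information is estimated from raw counts, and the spanning tree is grown with Prim's algorithm starting at the chosen root, so each new edge already points away from the root. The function names and the toy columns are assumptions.

```python
import math
from collections import Counter

def conditional_mutual_information(xs, ys, cs):
    """Empirical I(X; Y | C) from three aligned columns of discrete values."""
    n = len(cs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    total = 0.0
    for (x, y, c), k in n_xyc.items():
        # p(x,y|c) / (p(x|c) p(y|c)) written directly in counts.
        total += (k / n) * math.log((k * n_c[c]) / (n_xc[(x, c)] * n_yc[(y, c)]))
    return total

def tan_tree(columns, labels, root=0):
    """Return directed (parent, child) edges among attributes for a TAN network."""
    m = len(columns)
    weight = {
        (i, j): conditional_mutual_information(columns[i], columns[j], labels)
        for i in range(m) for j in range(i + 1, m)
    }
    in_tree = {root}
    edges = []
    while len(in_tree) < m:
        # Prim's step: heaviest edge from the current tree to a new attribute.
        parent, child = max(
            ((i, j) for i in in_tree for j in range(m) if j not in in_tree),
            key=lambda e: weight[(min(e), max(e))],
        )
        edges.append((parent, child))
        in_tree.add(child)
    return edges

# Toy data: attributes 0 and 1 are identical, attribute 2 is unrelated.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
a0 = [0, 1, 0, 1, 0, 1, 0, 1]
a1 = [0, 1, 0, 1, 0, 1, 0, 1]
a2 = [0, 0, 1, 1, 0, 0, 1, 1]
edges = tan_tree([a0, a1, a2], labels)
print(edges)
```

The full TAN network is these attribute edges plus an arc from the class variable to every attribute.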

31
General Bayes Net
[Figure: network over Class Value and Attr 1, Attr 2, Attr 3, Attr N-3, Attr N-2, Attr N-1, Attr N]

32
Sparse Candidate
- Friedman et al. ’97
- No restrictions on directionality of arcs for class attribute
- Limits possible parents for each node to a small “candidate” set

33
Sparse Candidate Algorithm
- Greedy hill climbing search with restarts
- Initial structure is empty graph
- Score graph using BDe metric (Cooper & Herskovits ’92, Heckerman ’96)
- Selects candidate set using an information metric
- Re-estimate candidate set after each restart
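A sketch of the candidate-selection step only, assuming plain mutual information as the information metric (the slides do not name the metric, and the hill-climbing search and BDe scoring are omitted). The names `mutual_information` and `candidate_parents` and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical I(X; Y) from two aligned columns of discrete values."""
    n = len(xs)
    n_xy = Counter(zip(xs, ys))
    n_x = Counter(xs)
    n_y = Counter(ys)
    return sum(
        (k / n) * math.log(k * n / (n_x[x] * n_y[y]))
        for (x, y), k in n_xy.items()
    )

def candidate_parents(data, k):
    """Map each variable to the k other variables that share the most information with it."""
    result = {}
    for node in data:
        others = [v for v in data if v != node]
        others.sort(key=lambda v: mutual_information(data[node], data[v]), reverse=True)
        result[node] = others[:k]
    return result

# Toy columns: "a" and "b" are identical, "c" is independent of both.
data = {"a": [0, 1, 0, 1], "b": [0, 1, 0, 1], "c": [0, 0, 1, 1]}
candidates = candidate_parents(data, k=1)
print(candidates)
```

The structure search would then consider only arcs whose parent lies in the child's candidate set.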

34
Sparse Candidate Algorithm
- We looked at several initial structures
  - Expert structure
  - Naïve Bayes
  - TAN
- Scored network on tune set accuracy

35
Our Initial Approach for Level 4
- Use ILP to learn rules predictive of “malignant”
- Treat the rules as intensional definitions of new fields
- The new view consists of the original table extended with the new fields

36
Using Views
malignant(A) :-
    massesStability(A,increasing),
    prior_mammogram(A,B,_),
    H0_BreastCA(B,hxDCorLC).
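A sketch of how a learned rule like the one above could become a new field in the view: evaluate the rule body per row and store the result as a boolean column. The record layout, the field names and the `history` lookup are hypothetical; only the constants `increasing` and `hxDCorLC` come from the rule.

```python
# Hypothetical records; field names and ids are illustrative only.
abnormalities = [
    {"id": "a1", "masses_stability": "increasing", "prior_mammogram": "m0"},
    {"id": "a2", "masses_stability": "stable", "prior_mammogram": "m0"},
    {"id": "a3", "masses_stability": "increasing", "prior_mammogram": None},
]
history = {"m0": "hxDCorLC"}  # prior mammogram id -> breast cancer history code

def rule_fires(row):
    """True when every literal in the rule body holds for this row."""
    prior = row["prior_mammogram"]
    return (
        row["masses_stability"] == "increasing"
        and prior is not None
        and history.get(prior) == "hxDCorLC"
    )

def extend_view(rows):
    """Return the original rows extended with one new boolean field."""
    return [{**row, "rule_1": rule_fires(row)} for row in rows]

view = extend_view(abnormalities)
for row in view:
    print(row["id"], row["rule_1"])
```

The Bayes net learner then treats `rule_1` like any other feature of the table.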

37
Sample Rule
malignant(A) :-
    BIRADS_category(A,b5),
    MassPAO(A,present),
    MassesDensity(A,high),
    HO_BreastCA(A,hxDCorLC),
    in_same_mammogram(A,B),
    Calc_Pleomorphic(B,notPresent),
    Calc_Punctate(B,notPresent).

38
Methodology
- 10-fold cross-validation
- Split at the patient level
- Roughly 40 malignant cases and 6000 benign cases in each fold
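A minimal sketch of a patient-level split, assuming a simple round-robin assignment of patients to folds (the slides do not describe how patients were assigned, or whether folds were stratified). The key point is that all rows of one patient land in the same fold. `patient_level_folds` and the synthetic rows are illustrative.

```python
def patient_level_folds(rows, n_folds=10):
    """Group rows into folds so that no patient is split across folds."""
    patients = sorted({row["patient"] for row in rows})
    fold_of = {p: i % n_folds for i, p in enumerate(patients)}
    folds = [[] for _ in range(n_folds)]
    for row in rows:
        folds[fold_of[row["patient"]]].append(row)
    return folds

# Synthetic example: 25 patients with 4 abnormalities each.
rows = [{"patient": f"P{i % 25}", "abnormality": i} for i in range(100)]
folds = patient_level_folds(rows, n_folds=10)
print([len(f) for f in folds])
```

Splitting by row instead would let abnormalities from one patient appear in both training and test data.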

39
Methodology
- Without the ILP rules
  - 6 folds for training set
  - 3 folds for tuning set
- With ILP
  - 4 folds to learn ILP rules
  - 3 folds for training set
  - 2 folds for tuning set
- TAN/Naïve Bayes don’t require tune set

40
Evaluation
- Precision and recall curves
- Why not ROC curves?
  - With many negatives ROC curves look overly optimistic
  - Large change in number of false positives yields small change in ROC curve
- Pooled results over all 10 folds
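A short worked example of the ROC argument, using the per-fold class sizes from the Methodology slide (roughly 40 malignant, 6000 benign). The true positive and false positive counts are hypothetical, chosen only to show the effect.

```python
def rates(tp, fp, positives, negatives):
    """Return (recall, false positive rate, precision) for one operating point."""
    recall = tp / positives      # y-axis of the ROC curve, x-axis of the PR curve
    fpr = fp / negatives         # x-axis of the ROC curve
    precision = tp / (tp + fp)   # y-axis of the PR curve
    return recall, fpr, precision

# Same recall, ten times as many false positives in the second case.
few_fp = rates(tp=30, fp=30, positives=40, negatives=6000)
many_fp = rates(tp=30, fp=300, positives=40, negatives=6000)
print(few_fp)
print(many_fp)
```

Going from 30 to 300 false positives moves the ROC point from a false positive rate of 0.005 to 0.05, while precision falls from 0.5 to about 0.09.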

41
ROC: Level 2 (TAN) vs. Level 1

42
Precision-Recall Curves

45
Related Work: ILP for Feature Construction
- Pompe & Kononenko, ILP’95
- Srinivasan & King, ILP’97
- Perlich & Provost, KDD’03
- Neville, Jensen, Friedland and Hay, KDD’03

46
Ways to Improve Performance
- Learn rules to predict “benign” as well as “malignant.”
- Use Gleaner (Goadrich, Oliphant & Shavlik, ILP’04) to get better spread of Precision vs. Recall in the learned rules.
- Incorporate aggregation into the ILP runs themselves.

47
Richer View Learning Approaches
- Learn rules predictive of other fields.
- Use WARMR or other first-order clustering approaches.
- Integrate Structure Learning and View Learning: score a rule by how much it helps the current model when added.

48
Level 4: View Learning
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date, Shape change in abnormality at this location, Increase in average size of abnormalities
- Given: Features, Data, Background knowledge: aggregation functions and intensionally-defined relations such as “increase” or “same location”
- Learn: Useful new features defined by views (equivalent to rules or SQL queries), Bayes net structure, and probabilities

49
Integrated View/Structure Learning
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date, Increase in average size of abnormalities
sc(X) :-
    id(X,P), id(Y,P),
    loc(X,L), loc(Y,L),
    date(Y,D1), date(X,D2), before(D1,D2),
    shape(X,Sh1), shape(Y,Sh2), Sh1 \= Sh2.

50
Integrated View/Structure Learning
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date, Increase in average size of abnormalities
sc(X) :-
    id(X,P), id(Y,P),
    loc(X,L), loc(Y,L),
    date(Y,D1), date(X,D2), before(D1,D2),
    shape(X,Sh1), shape(Y,Sh2), Sh1 \= Sh2,
    size(X,S1), size(Y,S2), S1 > S2.

52
Integrated View/Structure Learning
Bayes net nodes: Be/Mal, Shape, Size, Avg size this date
sc(X) :-
    id(X,P), id(Y,P),
    loc(X,L), loc(Y,L),
    date(Y,D1), date(X,D2), before(D1,D2),
    shape(X,Sh1), shape(Y,Sh2), Sh1 \= Sh2,
    size(X,S1), size(Y,S2), S1 > S2.
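A Python reading of the final `sc(X)` rule, as a sketch: X is flagged when the same patient has an earlier abnormality Y at the same location with a different shape and a smaller size. The rows reuse the values from the database slide; the dictionary layout is an assumption, and the slide dates "5/02" and "5/04" are taken here to mean May 2002 and May 2004 with the day set to 1.

```python
from datetime import date

rows = [
    {"id": 1, "patient": "P1", "loc": "RU4", "date": date(2002, 5, 1), "shape": "Spic", "size": 0.03},
    {"id": 2, "patient": "P1", "loc": "RU4", "date": date(2004, 5, 1), "shape": "Var", "size": 0.04},
    {"id": 3, "patient": "P1", "loc": "LL3", "date": date(2004, 5, 1), "shape": "Spic", "size": 0.04},
]

def sc(x, rows):
    """True if some earlier abnormality y satisfies every literal of the rule body."""
    return any(
        y["patient"] == x["patient"]   # id(X,P), id(Y,P)
        and y["loc"] == x["loc"]       # loc(X,L), loc(Y,L)
        and y["date"] < x["date"]      # before(D1,D2)
        and y["shape"] != x["shape"]   # Sh1 \= Sh2
        and x["size"] > y["size"]      # S1 > S2
        for y in rows
    )

flags = {x["id"]: sc(x, rows) for x in rows}
print(flags)
```

In the integrated setting described on these slides, a candidate feature like this would be scored by how much it helps the current model when added.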

53
Richer View Learning (Cont.)
- Learning new tables
  - Just rules for non-unary predicates
  - Train on pairs of malignancies for the same mammogram or patient
  - Train on pairs (triples, etc.) of fields, where pairs of values that appear in rows for malignant abnormalities are positive examples, while those that appear only in rows for benign are negative examples

54
Conclusions
- Graphical models over databases were originally limited to the schema provided
- Humans find it useful to define new views of a database (new fields or tables intensionally defined from existing data)
- View learning appears to have promise for increasing the capabilities of graphical models over relational databases, perhaps other SRL approaches

55
WILD Group
- Jesse Davis
- Beth Burnside
- Inês Dutra
- Vítor Santos Costa
- Raghu Ramakrishnan
- Jude Shavlik
- David Page
- Others: Hector Corrada-Bravo, Irene Ong, Mark Goadrich, Louis Oliphant, Bee-Chung Chen
