

Presentation transcript:

1 THE LEARNING AND USE OF GRAPHICAL MODELS FOR IMAGE INTERPRETATION. Thesis for the degree of Master of Science by Leonid Karlinsky, under the supervision of Professor Shimon Ullman.

2 Introduction

3

4 Part I: MaxMI Training

5 Classification. Goal: classify C, using a subset F of "trained" features, on new examples with minimum error. Training tasks: best F, best parameters, an efficient model. Best = maximal MI. More …

6 MaxMI Training - The Past. Model: simple "flat" structure, NCC thresholds. Training: features and thresholds selected one by one. Conditional independence given C increased the MI upper bound. More …
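As a toy illustration of this one-by-one selection scheme, here is a minimal sketch that greedily picks features by their individual empirical MI with the class (the function names, the brute-force MI estimate, and the binary toy data are assumptions, not the thesis's code):

```python
import numpy as np

def empirical_mi(c, f):
    """Empirical mutual information I(C; F) in bits for two discrete 1-D arrays."""
    mi = 0.0
    for cv in np.unique(c):
        for fv in np.unique(f):
            p_cf = np.mean((c == cv) & (f == fv))
            if p_cf > 0:
                p_c, p_f = np.mean(c == cv), np.mean(f == fv)
                mi += p_cf * np.log2(p_cf / (p_c * p_f))
    return mi

def select_features_one_by_one(C, F, k):
    """Greedily pick k columns of F, each chosen to maximize MI with the class C."""
    selected, remaining = [], list(range(F.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: empirical_mi(C, F[:, j]))
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: 200 examples, 10 binary features, pick the 3 most informative.
rng = np.random.default_rng(0)
C = rng.integers(0, 2, 200)
F = rng.integers(0, 2, (200, 10))
F[:, 4] = C  # make feature 4 perfectly informative
print(select_features_one_by_one(C, F, 3))
```

Note that each feature is scored by its individual MI with C, which matches the "one by one" scheme and implicitly relies on the conditional-independence assumption criticized later in the talk.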

7 MaxMI Training – Our Approach. Learn the model structure and all parameters together, maximizing the whole MI between the class C and the features F.

8 MaxMI Training – Learning. MaxMI: decompose the MI, efficiently learn the parameters using GDL, and maximize over all parameters together. More …
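One general identity behind such a decomposition is the chain rule for mutual information (a generic fact; the thesis's specific decomposition over the TAN structure is not reproduced here):

```latex
I(C; F_1, \dots, F_n) \;=\; \sum_{i=1}^{n} I\!\left(C; F_i \mid F_1, \dots, F_{i-1}\right)
```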

9 MaxMI Training – Assumptions. 1. TAN model structure – Tree Augmented Naïve Bayes [Friedman, 97]. 2. Feature Tree (FT) – the class C can be removed while preserving the feature tree.

10 MaxMI Training – TAN and parameters. 1. The TAN structure is unknown. 2. Learn the parameters and the TAN structure so that the MI is maximized, with asymptotic correctness, the FT assumption preserved, and efficiency.

11 MaxMI Training – MaxMI hybrid

12 More … [Chow & Liu, 68] MaxMI: [Friedman, 97]
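For background on the [Chow & Liu, 68] ingredient cited on this slide, here is a minimal sketch of the classical step: build a feature tree as the maximum spanning tree under pairwise empirical MI (the use of sklearn's mutual_info_score and the Prim-style loop are my choices, not the thesis's implementation):

```python
import numpy as np
from sklearn.metrics import mutual_info_score

def chow_liu_edges(F):
    """Edges of a Chow-Liu tree over the columns of a discrete data matrix
    F (n_samples, n_features): the maximum spanning tree under pairwise MI."""
    n = F.shape[1]
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            mi[i, j] = mi[j, i] = mutual_info_score(F[:, i], F[:, j])
    # Prim's algorithm for a maximum-weight spanning tree over the MI graph.
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        i, j = max(((a, b) for a in in_tree for b in range(n) if b not in in_tree),
                   key=lambda e: mi[e[0], e[1]])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```

Friedman et al.'s TAN construction applies the same idea with the class-conditional MI I(F_i; F_j | C) as the edge weight.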

13 MaxMI Training – MaxMI hybrid. A convergent algorithm alternating parameter updates with TAN restructuring. More …

14 MaxMI Training – empirical results More …

15 MaxMI Training – empirical results More …

16 MaxMI Training – Generalizations. Train any parameters; any low-treewidth structure; works even without the assumptions.

17 Back to the Goals

18 Part II: Loopy MAP approximation

19 Loopy network example. We want to solve the MAP problem, which is NP-hard in general! [Cooper 90, Shimony 94]
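Written out in standard notation (the slide's own formula did not survive extraction, so this is the generic form), the MAP assignment over a graphical model with clique potentials ψ_c is:

```latex
\mathbf{x}^{\mathrm{MAP}} \;=\; \arg\max_{\mathbf{x}} P(\mathbf{x})
\;=\; \arg\max_{\mathbf{x}} \prod_{c} \psi_c(\mathbf{x}_c)
```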

20 Our approach – opening loops. Open the loops by duplicating variables; now we can maximize the loop-free problem. The assignment is legal for the original loopy problem if the duplicated copies agree with the original variables.

21 Our approach – opening loops. Legally maximize: the constrained (legal) problem. Can maximize unrestricted: the opened, loop-free problem. Usually the unrestricted maximum is not legal. Our solution – slow connections.

22 Our approach – slow connections. Fix z = Z. Maximize (loop-free, use GDL). Now legalize and return to step one. Iterate until convergence. This is the Maximize-and-Legalize algorithm.
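A minimal sketch of the Maximize-and-Legalize loop on a toy opened problem (the objective, the brute-force stand-in for the loop-free GDL maximization, and all variable names are illustrative assumptions, not the thesis's code):

```python
import itertools

# Toy "opened" objective over x = (x0, x1, x2) plus a copy z of x2 that breaks
# a loop: the assignment is legal for the original loopy problem when z == x2.
def g(x, z):
    x0, x1, x2 = x
    return (2.0 * (x0 == x1)      # loop-free (tree) part
            + 1.5 * (x1 == x2)
            + 1.0 * (x0 == z))    # loop-closing edge, routed through the slow copy z

def maximize_given_z(z, domain=(0, 1)):
    """Brute-force stand-in for the loop-free GDL maximization with z fixed."""
    return max(itertools.product(domain, repeat=3), key=lambda x: g(x, z))

def maximize_and_legalize(z_init=0, max_iters=20):
    z = z_init
    for _ in range(max_iters):
        x = maximize_given_z(z)   # Maximize: solve the loop-free problem with z = Z fixed
        new_z = x[2]              # Legalize: force the copy to agree with the original x2
        if new_z == z:            # Converged: the assignment is legal and stable
            return x
        z = new_z
    return x

print(maximize_and_legalize())    # -> (0, 0, 0) on this toy objective
```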

23 Our approach – slow connections. When will this work? The intuition: the z-minor. Strong z-minor: global maximum in a single step. Weak z-minor: local optimum in several steps.

24 Making the assumptions true – selecting z-variables. The intuition: recursive z-selection. Recursive strong z-minor: single step, global maximum! Recursive weak z-minor: iterations, local maximum. Different / same speed. The Remove – Contract – Split algorithm. More …

25 Making the assumptions true – approximating the function. The intuition: recursively "chip away" small parts of the function. More …

26 Existing approximation algorithms. Clustering: triangulation [Pearl, 88]. Loopy Belief Revision [McEliece, 98]. Bethe-Kikuchi Free-Energy: CCCP [Yuille, 02]. Tree Re-Parametrization (TRP) [Wainwright, 03].

27 Experimental Results More …

28 Experimental Results More …

29 Maximum MI vs. Minimum P_E. More …

30

31

32 Classification Specifics. How do we classify a new example? What are "the best" features and parameters? Why maximize MI? MAP classification; maximize MI. More reasons – if time permits. Tightly related to P_E. Back …
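In generic notation (assumed here, since the slide's formulas did not survive extraction): a new example with feature values f is classified by MAP, and training selects the features and parameters that maximize the class-feature mutual information:

```latex
\begin{align*}
\text{MAP:}\quad & \hat{c} \;=\; \arg\max_{c} \; P(C = c \mid F = f) \\
\text{MaxMI:}\quad & F^{*}, \theta^{*} \;=\; \arg\max_{F,\,\theta} \; I_{\theta}(C; F)
\;=\; \arg\max_{F,\,\theta} \; \bigl[ H(C) - H_{\theta}(C \mid F) \bigr]
\end{align*}
```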

33 MaxMI Training - The Past - Reasons. Why did it work? Conditional independence given C increased the MI upper bound. What was missing? Conditional independence in C was assumed! Maximizing the "whole" MI. Learning the model structure. Back …

34 MaxMI Training – JT. JT structure = TAN structure. GDL is exponential in the treewidth; here treewidth = 2. Back …

35 MaxMI Training – EM. Why not EM? EM assumes static training data! Not true in our scenario! [Redner, Walker, 84]. EM algorithm: training CPTs with EM. Back …

36 MaxMI Training – MaxMI hybrid solution. [Chow, Liu 68]: "best" Feature Tree. [Friedman, et al. 97]: "best" TAN. [We, 2004]: maximal MI. Back …

37 MaxMI Training – MaxMI hybrid solution. Increase: ICR step. Non-decrease: TAN restructure step. Asymptotic correctness. Back …

38 MaxMI Training – MaxMI hybrid Back …

39 MaxMI Training – empirical results Before training: After training: Back …

40 MaxMI Training – empirical results Back …

41 MaxMI Training – empirical results (Face Parts Model; training DB size 767, test DB size 2257, class entropy on training DB 0.792690834):
MaxMI Training: error on training DB 25, error on test DB 135, MI of model to class on training DB 0.758242464.
Original Training: error on training DB 35, error on test DB 136, MI 0.722429352.
MaxMI Training with constrained TAN restructure: training Miss=15, FA=3; test Miss=62, FA=36; MI 0.756855168.
MaxMI Training with greedy TAN restructure: training Miss=16, FA=3; test Miss=30, FA=44; MI 0.746516913.
Alternative MaxMI Training with TAN restructure: training N/A; test Miss=33, FA=109; MI 0.74711484.
Threshold only training (without restructure): training Miss=30, FA=5; test Miss=84, FA=46; MI 0.738676981.
Observed & Un-observed model training, constructed from the all-observed model and soft EM: training N/A; test error 67; MI N/A.
Back …

42 MaxMI Training – empirical results (Cow Parts Model; training DB size 961, test DB size 2256, class entropy on training DB 0.46535663, MI of model to class not available):
Original Training: training Miss=36, FA=16; test Miss=84, FA=64.
MaxMI Training: training Miss=25, FA=17; test Miss=53, FA=42.
MaxMI Training with constrained TAN restructure: training Miss=17, FA=12; test Miss=32, FA=48.
MaxMI Training with greedy TAN restructure: training Miss=23, FA=16; test Miss=59, FA=30.
Observed & Un-observed model training, constructed from the all-observed model and trained using soft EM: training N/A; test error 89.
Back …

43 Remove – Contract – Split Back …

44 Making the assumptions true – approximating the function. Strong z-minor. Challenge: selecting proper Z constants. Benefit: single-step convergence. Weak z-minor. Drawback: exponential in the number of "chips". Benefit: less restrictive. Back …

45 The clique tree Back …

46 Experimental Results
A2 (same "slow" speed):
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: average match 50.31%, average mismatch 15-16 nodes, average approximation 94.11%.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 63.70%, average mismatch 11-12 nodes, average approximation 94.55%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 84.60%, average mismatch 4-5 nodes, average approximation 97.16%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: average match 93.62%, average mismatch 1-2 nodes, average approximation 98.34%.
A2 (different "slow" speed):
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: average match 65.22%, average mismatch 10-11 nodes, average approximation 98.26%.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 74.51%, average mismatch 7-8 nodes, average approximation 98.08%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 88.62%, average mismatch 3-4 nodes, average approximation 98.55%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: average match 86.14%, average mismatch 3-4 nodes, average approximation 97.85%.

47 Experimental Results
Random Slow Connections:
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: average match 34.58%, average mismatch 20-21 nodes, average approximation 82.70%.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 45.48%, average mismatch 16-17 nodes, average approximation 81.52%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 62.23%, average mismatch 11-12 nodes, average approximation 79.37%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: N/A.
Loopy Belief Revision (50 messages per node):
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: N/A.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 55.31%, average mismatch 13-14 nodes, average approximation 89.17%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 72.80%, average mismatch 8-9 nodes, average approximation 88.73%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: average match 87.73%, average mismatch 3-4 nodes, average approximation 93.34%.

48 Experimental Results
Loopy Belief Revision (10 messages per node):
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: average match 41.95%, average mismatch 17-18 nodes, average approximation 87.65%.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 54.02%, average mismatch 14-15 nodes, average approximation 86.74%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 71.80%, average mismatch 8-9 nodes, average approximation 85.78%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: N/A.
Ignore Sibling Loopy Links:
Depth=3, Branching=5 model; node count 31, value count 4, sample count 1000: average match 29.25%, average mismatch 21-22 nodes, average approximation 74.04%.
Depth=3, Branching=5 model; node count 31, value count 3, sample count 1000: average match 38.56%, average mismatch 19-20 nodes, average approximation 71.89%.
Depth=3, Branching=5 model; node count 31, value count 2, sample count 1000: average match 56.09%, average mismatch 13-14 nodes, average approximation 69.38%.
Model based on natural feature trees, 4 cliques of size 7; node count 25, value count 2, sample count ~2000: average match 63.88%, average mismatch 9-10 nodes, average approximation 73.45%.
Back …

49

50 MaxMI Training – extensions. Observed and unobserved model: MaxMI augmented to support O&U. Training observed only + EM heuristic. Complete training. Constrained and greedy TAN restructure. MaxMI vs. MinP_E in the ideal scenario – characterization and comparison. Future research directions.

51 MaxMI vs. MinP_E. The MinP_E criterion vs. the MaxMI criterion; Fano & inverse Fano (binary C). Back …
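For binary C, the Fano direction and a standard converse (Hellman-Raviv type) bound can be written as follows; the slide's exact "inverse Fano" statement may differ slightly from this generic form:

```latex
\begin{align*}
\text{Fano, binary } C:\quad & H(C \mid F) \;\le\; H_b(P_E)
  \quad\Longrightarrow\quad P_E \;\ge\; H_b^{-1}\!\bigl(H(C \mid F)\bigr),\\
\text{converse bound:}\quad & P_E \;\le\; \tfrac{1}{2}\, H(C \mid F).
\end{align*}
```

Since I(C; F) = H(C) - H(C|F) and H(C) is fixed by the data, maximizing the MI drives H(C|F) down and tightens both bounds on P_E.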

52 MaxMI vs. MinP_E – ideal scenario. MinP_E vs. MaxMI. Setting: n-valued C, k-valued F. Arrange; select F; divide; select F. Back …

53 MaxMI vs. MinP_E – ideal scenario. In general MaxMI and MinP_E differ; in special cases MaxMI coincides with MinP_E. With an increase in the number of guesses: implications. Back …

