
Online Max-Margin Weight Learning with Markov Logic Networks
Tuyen N. Huynh and Raymond J. Mooney
Machine Learning Group, Department of Computer Science, The University of Texas at Austin
StarAI 2010, July 12, 2010

Outline
- Motivation
- Background
  - Markov Logic Networks
  - Primal-dual framework
- New online learning algorithm for structured prediction
- Experiments
  - Citation segmentation
  - Search query disambiguation
- Conclusion

Motivation
- Most existing weight learning methods for MLNs work in the batch setting:
  - They need to run inference over all the training examples in each iteration
  - They usually take a few hundred iterations to converge
  - All the training examples may not fit in memory
- Conventional solution: online learning

Background

Markov Logic Networks (MLNs) [Richardson & Domingos, 2006]
An MLN is a set of weighted first-order formulas; a larger weight indicates a stronger belief that the clause should hold. For example:

2.5  Center(i,c) => InField(Ftitle,i,c)
1.2  InField(f,i,c) ^ Next(j,i) ^ ¬HasPunc(c,i) => InField(f,j,c)

The probability of a possible world x (a truth assignment to all ground atoms) is

P(x) = (1/Z) exp( Σ_i w_i n_i(x) )

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
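The definition above can be sketched directly in code. This is a minimal illustration, not the Alchemy implementation: it assumes the set of possible worlds is small enough to enumerate, and the grounding counts are given rather than computed from formulas.

```python
import math

def world_log_potential(weights, true_grounding_counts):
    """Unnormalized log-probability of a world: sum_i w_i * n_i(x)."""
    return sum(w * n for w, n in zip(weights, true_grounding_counts))

def world_probability(weights, counts_per_world, world_index):
    """P(x) = exp(sum_i w_i n_i(x)) / Z, normalizing over an
    enumerable set of possible worlds."""
    potentials = [math.exp(world_log_potential(weights, c))
                  for c in counts_per_world]
    z = sum(potentials)  # partition function Z
    return potentials[world_index] / z

# Two formulas with weights 2.5 and 1.2; three toy worlds with
# hypothetical true-grounding counts n_i(x) for each formula.
weights = [2.5, 1.2]
counts = [[2, 1], [1, 1], [0, 0]]
p = world_probability(weights, counts, 0)
```

Note how a single weight change shifts probability mass across all worlds at once, since Z depends on every formula's weight.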

Existing discriminative weight learning methods for MLNs either:
- maximize the conditional log-likelihood (CLL) [Singla & Domingos, 2005; Lowd & Domingos, 2007; Huynh & Mooney, 2008], or
- maximize the margin, the log ratio between the probability of the correct label and that of the closest incorrect one [Huynh & Mooney, 2009]
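Since an MLN defines a log-linear model, the margin mentioned above has a simple linear form; the partition function cancels in the ratio. A sketch with assumed notation (n(x, y) is the vector of true grounding counts under label y, and ŷ the highest-scoring incorrect label):

```latex
\gamma(x, y; \mathbf{w})
  = \log \frac{P(y \mid x; \mathbf{w})}{P(\hat{y} \mid x; \mathbf{w})}
  = \mathbf{w}^\top \bigl( \mathbf{n}(x, y) - \mathbf{n}(x, \hat{y}) \bigr)
```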

Online learning

Primal-dual framework [Shalev-Shwartz et al., 2006]
A general, recently proposed framework for deriving low-regret online algorithms:
- Rewrite the regret bound as an optimization problem (the primal problem), then consider its dual problem
- A condition that guarantees an increase in the dual objective in each step yields Incremental-Dual-Ascent (IDA) algorithms, e.g. subgradient methods

Primal-dual framework (cont.)
We propose a new class of IDA algorithms called Coordinate-Dual-Ascent (CDA) algorithms:
- The CDA update rule optimizes the dual only w.r.t. the last dual variable
- The CDA update rule has a closed-form solution
- CDA algorithms have the same per-step cost as subgradient methods but increase the dual objective more in each step, so they converge to the optimal value faster

Primal-dual framework (cont.)

CDA algorithms for max-margin structured prediction

Max-margin structured prediction
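The formulas on this slide were images and did not survive transcription. As background, max-margin structured prediction is commonly formulated with the margin-rescaled structured hinge loss; a minimal sketch under that assumption (the scoring function and labels here are illustrative, not the paper's):

```python
def structured_hinge_loss(score, x, y_true, candidates, label_loss):
    """Margin-rescaled structured hinge loss:
    max_y [score(x, y) + loss(y_true, y)] - score(x, y_true), clipped at 0."""
    augmented = max(score(x, y) + label_loss(y_true, y) for y in candidates)
    return max(0.0, augmented - score(x, y_true))

# Toy example: labels are sequences of field tags, scored against a
# hypothetical per-position preference table.
prefs = {0: "A", 1: "T", 2: "T"}
def score(x, y):
    return sum(1.0 for i, tag in enumerate(y) if prefs[i] == tag)

def hamming(y1, y2):
    return sum(a != b for a, b in zip(y1, y2))

labels = [("A", "T", "T"), ("A", "A", "T"), ("T", "T", "T")]
loss = structured_hinge_loss(score, None, ("A", "T", "T"), labels, hamming)
```

The loss is zero only when the true label beats every competitor by at least its label loss, which is what "margin" means in this setting.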

Steps for deriving new CDA algorithms
1. Define the regularization and loss functions
2. Find the conjugate functions
3. Derive a closed-form solution for the CDA update rule

1. Define the regularization and loss functions
Label loss function

1. Define the regularization and loss functions (cont.)

2. Find the conjugate functions

2. Find the conjugate functions (cont.)

3. Closed-form solution for the CDA update rule
Optimization problem:
Solution:
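The closed-form solution itself was an image and is not in the transcript. Updates of this kind typically resemble the Passive-Aggressive (PA-I) step, consistent with the "CDA1/PA1" label used in the experiments; a sketch of that style of update (illustrative, not necessarily the exact CDA rule):

```python
def pa_style_update(w, delta_phi, hinge_loss, C):
    """PA-I-style closed-form step: w <- w + tau * delta_phi, where
    delta_phi = phi(x, y_true) - phi(x, y_hat) and
    tau = min(C, hinge_loss / ||delta_phi||^2)."""
    sq_norm = sum(d * d for d in delta_phi)
    if sq_norm == 0.0 or hinge_loss <= 0.0:
        return list(w)  # passive step: no margin violation (or no direction)
    tau = min(C, hinge_loss / sq_norm)
    return [wi + tau * di for wi, di in zip(w, delta_phi)]

w = [0.0, 0.0]
# Suppose loss-augmented inference returned a wrong label with hinge
# loss 1.0 and feature-count difference (1, -1).
w2 = pa_style_update(w, [1.0, -1.0], 1.0, C=0.1)
```

The cap C controls aggressiveness: small C keeps each example from moving the weights too far, which matters for noisy relational data.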

CDA algorithms for max-margin structured prediction

Experiments

Citation segmentation
- CiteSeer dataset [Lawrence et al., 1999; Poon & Domingos, 2007]: 1,563 citations, divided into 4 research topics
- Each citation is segmented into 3 fields: Author, Title, Venue
- Used the simplest MLN in [Poon & Domingos, 2007], similar to a linear-chain CRF:
  Next(j,i) ^ !HasPunc(c,i) ^ InField(c,+f,i) => InField(c,+f,j)

Experimental setup
Systems compared:
- MM: the max-margin weight learner for MLNs in the batch setting [Huynh & Mooney, 2009]
- 1-best MIRA [Crammer et al., 2005]
- Subgradient [Ratliff et al., 2007]
- CDA1/PA1
- CDA2

Experimental setup (cont.)
- 4-fold cross-validation
- Metric: micro-averaged F1 at the token level
- Used exact MPE inference (integer linear programming) for all online algorithms and approximate MPE inference (LP relaxation) for the batch one
- Used Hamming loss as the label loss function
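The evaluation metric above can be sketched concretely. A minimal illustration of token-level micro-averaged F1, assuming one gold and one predicted field per token (field names follow the citation task; the tag sequences are made up):

```python
def token_micro_f1(gold_tags, pred_tags, fields=("Author", "Title", "Venue")):
    """Micro-averaged F1 over tokens: pool true positives, false
    positives, and false negatives across all fields, then compute
    a single precision/recall/F1."""
    tp = fp = fn = 0
    for g, p in zip(gold_tags, pred_tags):
        for field in fields:
            if p == field and g == field:
                tp += 1
            elif p == field and g != field:
                fp += 1
            elif g == field and p != field:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

gold = ["Author", "Author", "Title", "Title", "Venue"]
pred = ["Author", "Title", "Title", "Title", "Venue"]
f1 = token_micro_f1(gold, pred)
```

When every token receives exactly one label, micro-F1 coincides with token accuracy; the pooled form still generalizes to tokens left unlabeled.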

Average F1

Average training time in minutes

Microsoft web search query dataset
- Used the cleaned-up dataset created by Mihalkova & Mooney [2009]
- Contains thousands of search sessions in which an ambiguous query was asked
- Goal: disambiguate a search query based on previous related search sessions
- Used the 3 MLNs proposed in [Mihalkova & Mooney, 2009]

Experimental setup
Systems compared:
- Contrastive Divergence (CD) [Hinton, 2002]: used in [Mihalkova & Mooney, 2009]
- 1-best MIRA
- Subgradient
- CDA1/PA1
- CDA2
Metric: Mean Average Precision (MAP), which measures how close the relevant results are to the top of the rankings
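The MAP metric above is easy to state in code. A minimal sketch (document IDs and rankings are made up for illustration):

```python
def average_precision(ranked, relevant):
    """Average precision for one ranked list: mean of precision@k
    over the ranks k at which relevant items appear."""
    hits = 0
    precisions = []
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(queries):
    """MAP: mean of average precision over (ranked, relevant) pairs."""
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)

# Toy example: one relevant result per ambiguous query.
queries = [(["d1", "d2", "d3"], {"d1"}),   # relevant result ranked 1st
           (["d4", "d5", "d6"], {"d5"})]   # relevant result ranked 2nd
m = mean_average_precision(queries)
```

With a single relevant result per query, AP reduces to the reciprocal rank of that result, so MAP rewards systems that push the correct disambiguation toward the top.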

MAP scores

Conclusion
- Derived CDA algorithms for max-margin structured prediction
- They have the same computational cost as existing online algorithms but increase the dual objective more in each step
- Experimental results on two real-world problems show that the new algorithms generally achieve better accuracy and more consistent performance

Thank you! Questions?