Download presentation

Presentation is loading. Please wait.

Published byJagger Smee Modified over 2 years ago

1
Learning structured ouputs A reinforcement learning approach P. Gallinari F. Maes, L. Denoyer www-connex.lip6.fr University Pierre et Marie Curie – Paris – Fr Computer Science Lab.

2
ANNPR P. Gallinari2 Outline Motivation and examples Approaches for structured learning Generative models Discriminant models Reinforcement learning for structured prediction Experiments

3
ANNPR P. Gallinari3 Machine learning and structured data Different types of problems Model, classify, cluster structured data Predict structured outputs Learn to associate structured representations Structured data and applications in many domains chemistry, biology, natural language, web, social networks, data bases, etc

4
ANNPR P. Gallinari4 Sequence labeling: POS ThisWorkshopbringstogetherscientistsandengineers DTNNVBZRBNNSCCNNS interestedinrecentdevelopmentsinexploitingMassive VBNINJJNNSINVBGJJ datasets NP determiner nounVerb 3rd persadverbNoun pluralCoord. Conj. adjectiveVerb gerundVerb plural

5
ANNPR P. Gallinari5 Segmentation + labeling: syntactic chunking (Washington Univ. tagger) ThisWorkshopbringstogetherscientistsandengineers NPVPADVPNP interestedinrecentdevelopmentsin VPINNPPNP exploitingMassivedatasets NP Noun PhraseVerb PhraseNoun Phraseadverbial Phrase Noun Phrase

6
ANNPR P. Gallinari6 Syntaxic Parsing (Stanford Parser)

7
ANNPR P. Gallinari7 Tree mapping: Problem query heterogeneous XML collections Web information extraction Need to know the correspondence between the structured representations usually made by hand Learn the correspondence between the different sources Labeled tree mapping problem La cantine 65 rue des pyrénées, Paris, 19 ème, FRANCE Canard à lorange, Lapin au miel La cantine Paris pyrénées 65 Canard à lorange Lapin au miel

8
ANNPR P. Gallinari8 Others Taxonomies Social networks Adversial computing: Webspam, Blogspam, … Translation Biology …..

9
ANNPR P. Gallinari9 Is structure really useful ? Can we make use of structure ? Yes Evidence from many domains or applications Mandatory for many problems e.g. 10 K classes classification problem Yes but Complex or long term dependencies often correspond to rare events Practical evidence for large size problems Simple models sometimes offer competitive results Information retrieval Speech recognition, etc

10
ANNPR P. Gallinari10 Why is structure prediction difficult Size of the output space # of potential labeling for a sequence # of potential joint labeling and segmentation # of potential labeled trees for an input sequence … Exponential output space Inference often amounts at a combinatorial search problem Scaling issues in most models

11
ANNPR P. Gallinari11 Structured learning X, Y : input and output spaces Structured output y Y decomposes into parts of variable size y = (y 1, y 2,…, y T ) Dependencies Relations between y parts Local, long term, global Cost function O/ 1 loss: Hamming loss: F-score BLEU etc

12
ANNPR P. Gallinari12 General approach Predictive approach: where F : X x Y R is a score function used to rank potential outputs F trained to optimize some loss function Inference problem |Y| sometimes exponential Argmax is often intractable: hypothesis decomposability of the score function over the parts of y Restricted set of outputs

13
ANNPR P. Gallinari13 Structured algorithms differ by: Feature encoding Hypothesis on the output structure Hypothesis on the cost function Type of structure prediction problem

14
Three main families of SP models Generative models Discriminant models Search models

15
ANNPR P. Gallinari15 Generative models 30 years of generative models Hidden Markov Models Many extensions Hierarchical HMMs, Factorial HMMs, etc Probabilistic Context Free grammars Tree models, Graphs models

16
ANNPR P. Gallinari16 Usual hypothesis Features : natural encoding of the input Local dependency hypothesis On the outputs (Markov) On the inputs Cost function Usually joint likelihood Decomposes, e.g. sum of local cost on each subpart Inference and learning usually use dynamic programming HMMs Decoding complexity O(n|Q| 2 ) for a sequence of length n and |Q| states PCFGs Decoding Complexity: O(m 3 n 3 ), n = length of the sentence, m = # non terminals in the grammar Prohibitive for many problems

17
ANNPR P. Gallinari17 Summary generative SP models Well known approaches Large number of existing models Local hypothesis Often imply using dynamic programming Inference learning

18
ANNPR P. Gallinari18 Discriminant models Structured Perceptron (Collins 2002) Breakthrough Large margin methods (Tsochantaridis et al. 2004, Taskar 2004) Many others now Considered as an extension of multi-class clasification

19
ANNPR P. Gallinari19 Usual hypothesis Joint representation of input – output Φ(x, y) Encode potential dependencies among and between input and output e.g. histogram of state transitions observed in training set, frequency of (x i,y j ), POS tags, etc Large feature sets (10 2 -> 10 4 ) Linear score function: Decomposability of features set (outputs) and of the loss function

20
ANNPR P. Gallinari20 Extension of large margin methods 2 problems Classical 0/1 loss is meaningless with structured outputs Structured outputs is not a simple extension of multiclass classification Generalize max margin principle to other loss functions than 0/1 loss Number of constraints for the QP problem is proportional to |Y|, i.e. potentially exponential Different solutions Consider only a polynomial number of active constraints - > bounds on the optimality of the solution Learning require solving an Argmax problem at each iteration i.e. Dynamic programming

21
ANNPR P. Gallinari21 Summary: discriminant approaches Hypothesis Local dependencies for the output Decomposability of the loss function Long term dependencies in the input Nice convergence properties + bounds Complexity Learning often does not scale

22
Reinforcement learning for structured outputs

23
ANNPR P. Gallinari23 Precursors Incremental Parsing, Collins 2004 SEARN, Daume et al Reinforcement Learning, Maes et al. 2007

24
ANNPR P. Gallinari24 General idea The structured output will be built incrementally ŷ = (ŷ 1, ŷ 2,…, ŷ T ) Different from solving directly the prediction preblem It will by build via Machine learning Loss : Goal Construct incrementally ŷ so as to minimize the loss Training: learn how to search the solution space Inference: build ŷ as a sequence of successive decisions

25
ANNPR P. Gallinari25 Example : sequence labelling Example 2 labels R et B Search space : (input sequence, {sequence of labels}) For a size 3 sequence x = x 1 x 2 x 3 : A node represents a state in the search space

26
ANNPR P. Gallinari26 Example:actions and decisions Actions Π(s, a) Sequence of size 3 target : Π(s, a) : decision function

27
ANNPR P. Gallinari27 Example : expected loss C=1 C=0 C T =1 C T =2 C T =3 C T =0 C T =1 C T =2 C=0 Sequence of size 3 target : Loss does not always separate !!

28
ANNPR P. Gallinari28 Inference Suppose we have a policy function F which decides at each step which action to take Inference could be performed by computing ŷ 1 = F(x,. ), ŷ t = F(ŷ 1,… ŷ t-1 ),…, ŷ T = F(ŷ 1,…, ŷ T-1 ) ŷ = F(ŷ 1,…, ŷ T ) No Dynamic programming needed

29
ANNPR P. Gallinari29 Example: state space exploration guided by local costs C=1 C=0 C T =1 C T =2 C T =3 C T =0 C T =1 C T =2 C=0 Sequence of size 3 target : Goal: generalize to unseen situation

30
ANNPR P. Gallinari30 Training Learn a policy F From a set of (input, output) pairs Which takes the optimal action at each state Which is able to generalize to unknown situations

31
ANNPR P. Gallinari31 Reinforcement learning for SP Formalizes search based ideas using Markov Decision Processes as model and Reinforcement Learning for training Provides a general framework for this approach Many RL algorithm could be used for training Differences with classical use of RL Specific hypothesis Deterministic environment Problem size State : up to 10 6 variables Generalization properties

32
ANNPR P. Gallinari32 Markov Decision Process A MDP is a tuple (S, A, P, R) S is the State space, A is the action space, P is a transition function describing the dynamic of the environment P(s, a, s) = P(s t+1 = s| s t = s, a t = a) R is a reward function R(s, a, s) = E[r t |s t +1 = s, s t = s, a t = a)

33
ANNPR P. Gallinari33 Example: Prediction problem

34
ANNPR P. Gallinari34 Example: State space Deterministic MDP States: input + partial output Actions: elementary modifications of the partial output Rewards: quality of prediction

35
ANNPR P. Gallinari35 Example: partially observable reward The reward function is only partially observable : limited set of training (input, output)

36
ANNPR P. Gallinari36 Learning the MPDP The reward function cannot be computed on the whole MDP Approximated RL Learn the policy on a subspace Generalize to the whole space The policy is learned as a linear function Φ(s, a) is a joint description of (s, a) θ is a,parameter vector Learning algorithm Sarsa, Q-learning, etc

37
ANNPR P. Gallinari37 Inference State at step t Initial input Partial output Once the MDP is learned Using the state at step t, compute state t+1 Until a solution is reached Greedy approach Only the best action is chosen at each time

38
ANNPR P. Gallinari38 Structured outputs and MDP State s t Input x + partial output ŷ t Initial state : (x,) Actions Task dependent POS: new tag for the current word Tree mapping: insert a new path in a partial tree Inference Greedy approach Only the best action is chosen at each time Reward Final: Heuristic:

39
ANNPR P. Gallinari39 Exemple: sequence labeling Left Right model Actions: label Order free model Actions: label + position Loss : Hamming cost or F score Tasks Named entity recognition (Shared task at CoNNL train, 1500 test) Chunking – Noun Phrases (CoNNL 2002) Handwriten word recognition (5000 train, 1000 test) Complexity of inference O(sequence size * number of labels)

40
ANNPR P. Gallinari40

41
ANNPR P. Gallinari41 Tree mapping: Document mapping Problem Labeled tree mapping problem Different instances Flat text to XML HTML to XML XML to XML…. La cantine 65 rue des pyrénées, Paris, 19 ème, FRANCE Canard à lorange, Lapin au miel La cantine Paris 19 pyrénées 65 Canard à lorange Lapin au miel

42
ANNPR P. Gallinari42 Document mapping problem Central issue: Complexity Large collections Large feature space: 10 3 to 10 6 Large search space (exponential) Other SP methods fail

43
ANNPR P. Gallinari43 XML structuration Action Attach path ending with the current leaf to a position in the current partial tree Φ(a,s) encodes a series of potential (state, action) pairs Loss: F-Score for trees

44
ANNPR P. Gallinari44 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT INPUT DOCUMENT TARGET DOCUMENT

45
ANNPR P. Gallinari45 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote

46
ANNPR P. Gallinari46 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote

47
ANNPR P. Gallinari47 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT

48
ANNPR P. Gallinari48 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT

49
ANNPR P. Gallinari49 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT

50
ANNPR P. Gallinari50 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT Francis MAES AUTHOR

51
ANNPR P. Gallinari51 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT Francis MAES AUTHOR

52
ANNPR P. Gallinari52 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT Francis MAES AUTHOR Title of the section SECTION TITLE

53
ANNPR P. Gallinari53 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE

54
ANNPR P. Gallinari54 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT Title of the section SECTION TITLE AUTHOR Francis MAES

55
ANNPR P. Gallinari55 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE

56
ANNPR P. Gallinari56 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT

57
ANNPR P. Gallinari57 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT

58
ANNPR P. Gallinari58 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT This is a footnote

59
ANNPR P. Gallinari59 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT This is a footnote

60
ANNPR P. Gallinari60 HTML HEAD BODY TITLEITH1PFONT ExampleFrancis MAES Title of the section Welcome to INEX This is a footnote Example TITLE DOCUMENT AUTHOR Francis MAES Title of the section SECTION TITLE Welcome to INEX TEXT INPUT DOCUMENT TARGET DOCUMENT

61
ANNPR P. Gallinari61 Results

62
ANNPR P. Gallinari62 Conclusion Learn to explore the state space of the problem Alternative to DP or classical search algorithms Could be used with any decomposable cost function Fewer assumptions than other methods (e.g. no Markov hypothesis) Leads to solutions that may scale with complex and large SP problems

63
ANNPR P. Gallinari63 More on this Paper at European Workshop on Reinforcement Learning 2008 Preliminary paper ECML 2007 Library available

Similar presentations

© 2016 SlidePlayer.com Inc.

All rights reserved.

Ads by Google