
1 Learning structured outputs: a reinforcement learning approach. P. Gallinari, F. Maes, L. Denoyer. Patrick.gallinari@lip6.fr, www-connex.lip6.fr. Université Pierre et Marie Curie, Paris, France, Computer Science Lab.

2 Outline (ANNPR 2008, 2008-07-02, P. Gallinari). Motivation and examples; approaches for structured learning: generative models, discriminant models; reinforcement learning for structured prediction; experiments.

3 Machine learning and structured data. Different types of problems: model, classify, or cluster structured data; predict structured outputs; learn to associate structured representations. Structured data and applications appear in many domains: chemistry, biology, natural language, the web, social networks, databases, etc.

4 Sequence labeling: POS tagging.
This/DT Workshop/NN brings/VBZ together/RB scientists/NNS and/CC engineers/NNS interested/VBN in/IN recent/JJ developments/NNS in/IN exploiting/VBG Massive/JJ datasets/NP
(DT: determiner; NN: noun; VBZ: verb, 3rd person; RB: adverb; NNS: plural noun; CC: coordinating conjunction; VBN: past participle; IN: preposition; JJ: adjective; VBG: verb gerund; NP: proper noun.)

5 Segmentation + labeling: syntactic chunking (Washington Univ. tagger).
[NP This Workshop] [VP brings] [ADVP together] [NP scientists and engineers] [VP interested] [PP in] [NP recent developments] [PP in] [NP exploiting Massive datasets]
(NP: noun phrase; VP: verb phrase; ADVP: adverbial phrase; PP: prepositional phrase.)

6 Syntactic parsing (Stanford Parser).

7 Tree mapping. Problem: querying heterogeneous XML collections; Web information extraction. One needs to know the correspondence between the structured representations, which is usually established by hand; the goal is to learn the correspondence between the different sources: the labeled tree mapping problem. (Example figure: a restaurant record, "La cantine, 65 rue des pyrénées, Paris, 19ème, FRANCE; Canard à l'orange, Lapin au miel", shown under two different tree structures.)

8 Others: taxonomies; social networks; adversarial computing (Web spam, blog spam, ...); translation; biology; ...

9 Is structure really useful? Can we make use of structure? Yes: there is evidence from many domains and applications, and structure is mandatory for many problems, e.g. a 10,000-class classification problem. Yes, but: complex or long-term dependencies often correspond to rare events, and there is practical evidence from large problems that simple models sometimes offer competitive results (information retrieval, speech recognition, etc.).

10 Why is structured prediction difficult? Size of the output space: the number of potential labelings for a sequence, of potential joint labelings and segmentations, or of potential labeled trees for an input sequence. The output space is exponential; inference often amounts to a combinatorial search problem; most models face scaling issues.
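The combinatorics behind this slide can be made concrete with a short computation (a toy illustration, not from the talk; function names are mine): with L labels and a sequence of length T there are L**T labelings, and jointly choosing a segmentation plus one label per segment is larger still.

```python
from math import comb

def num_labelings(num_labels: int, length: int) -> int:
    """Number of possible label sequences: |Y| = L**T."""
    return num_labels ** length

def num_seg_labelings(num_labels: int, length: int) -> int:
    """Joint segmentation + labeling: k segments means choosing k-1 cut
    points among length-1 positions, then one label per segment."""
    return sum(comb(length - 1, k - 1) * num_labels ** k
               for k in range(1, length + 1))
```

With 45 POS tags and a 20-word sentence, num_labelings(45, 20) already exceeds 10^33, which is why exhaustive scoring of outputs is hopeless.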

11 Structured learning. X, Y: input and output spaces. A structured output y ∈ Y decomposes into parts of variable size: y = (y_1, y_2, ..., y_T). Dependencies: relations between the parts of y, which may be local, long-term, or global. Cost functions: 0/1 loss, Hamming loss, F-score, BLEU, etc.
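The two simplest cost functions mentioned on this slide can be written down directly (a minimal sketch; the F-score and BLEU variants need more machinery):

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss: counts the whole structured output as wrong
    if any single part differs."""
    return 0 if list(y_true) == list(y_pred) else 1

def hamming_loss(y_true, y_pred):
    """Hamming loss: fraction of parts y_t predicted incorrectly."""
    assert len(y_true) == len(y_pred)
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)
```

A single wrong part makes the 0/1 loss maximal, while the Hamming loss degrades gracefully; this is one reason the 0/1 loss is a poor fit for structured outputs.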

12 General approach. Predictive approach: ŷ = argmax_{y ∈ Y} F(x, y), where F: X × Y → R is a score function used to rank potential outputs; F is trained to optimize some loss function. Inference problem: |Y| is sometimes exponential and the argmax is often intractable. Usual hypotheses: decomposability of the score function over the parts of y, or a restricted set of outputs.
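As a reference point, the argmax inference problem can be written naively by enumerating Y (a toy sketch; the score function below is a made-up stand-in for a trained F):

```python
from itertools import product

def argmax_inference(x, labels, score):
    """y_hat = argmax over y in Y of F(x, y), by exhaustive enumeration.
    This visits all len(labels) ** len(x) outputs, so it is only feasible
    for tiny problems; real SP models avoid it by assuming the score
    decomposes over the parts of y."""
    return max(product(labels, repeat=len(x)), key=lambda y: score(x, y))

# Hypothetical decomposable score: prefers label 1 on positive inputs.
def toy_score(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))
```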

13 Structured algorithms differ by: feature encoding; hypotheses on the output structure; hypotheses on the cost function; the type of structure prediction problem.

14 Three main families of SP models: generative models, discriminant models, search models.

15 Generative models. 30 years of generative models: Hidden Markov Models, with many extensions (hierarchical HMMs, factorial HMMs, etc.); probabilistic context-free grammars; tree models and graph models.

16 Usual hypotheses. Features: natural encoding of the input. Local dependency hypotheses, on the outputs (Markov) and on the inputs. Cost function: usually the joint likelihood, which decomposes, e.g., as a sum of local costs on each subpart. Inference and learning usually rely on dynamic programming. HMMs: decoding complexity O(n|Q|^2) for a sequence of length n and |Q| states. PCFGs: decoding complexity O(m^3 n^3), where n is the length of the sentence and m the number of non-terminals in the grammar. Prohibitive for many problems.
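The O(n|Q|^2) HMM decoding mentioned here is the Viterbi algorithm; a compact version over a hypothetical two-state tagger (states and probabilities invented for the example):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """HMM decoding by dynamic programming: O(n * |Q|**2) for n
    observations and |Q| states, instead of |Q|**n for enumeration."""
    # V[t][q]: probability of the best state path ending in state q at step t
    V = [{q: start_p[q] * emit_p[q][obs[0]] for q in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for q in states:
            prev = max(states, key=lambda p: V[t - 1][p] * trans_p[p][q])
            V[t][q] = V[t - 1][prev] * trans_p[prev][q] * emit_p[q][obs[t]]
            back[t][q] = prev
    # follow back-pointers from the best final state
    best = max(states, key=lambda q: V[-1][q])
    path = [best]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Hypothetical toy parameters (numbers invented for illustration)
states = ["N", "V"]
start_p = {"N": 0.8, "V": 0.2}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"dog": 0.9, "runs": 0.1}, "V": {"dog": 0.2, "runs": 0.8}}
```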

17 Summary: generative SP models. Well-known approaches with a large number of existing models; local hypotheses; inference and learning often imply dynamic programming.

18 Discriminant models. The structured perceptron (Collins 2002) was a breakthrough; large-margin methods followed (Tsochantaridis et al. 2004, Taskar 2004), and many others now exist. These can be seen as extensions of multi-class classification.

19 Usual hypotheses. A joint representation of input and output, Φ(x, y), encodes potential dependencies among and between inputs and outputs, e.g. the histogram of state transitions observed in the training set, frequencies of (x_i, y_j) pairs, POS tags, etc. Feature sets are large (10^2 to 10^4). Linear score function: F(x, y) = <θ, Φ(x, y)>. Decomposability of the feature set (over outputs) and of the loss function is assumed.
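A minimal sketch of such a joint feature map and linear score (the transition/emission histogram is the slide's own example; the function names are mine):

```python
from collections import Counter

def phi(x, y):
    """Joint feature map Phi(x, y): histogram of label transitions plus
    (input token, label) co-occurrence counts."""
    feats = Counter()
    for t, (xi, yi) in enumerate(zip(x, y)):
        feats[("emit", xi, yi)] += 1
        if t > 0:
            feats[("trans", y[t - 1], yi)] += 1
    return feats

def score(theta, x, y):
    """Linear score F(x, y) = <theta, Phi(x, y)> over sparse features."""
    return sum(theta.get(f, 0.0) * v for f, v in phi(x, y).items())
```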

20 Extension of large-margin methods. Two problems. First, the classical 0/1 loss is meaningless with structured outputs: structured prediction is not a simple extension of multiclass classification, so the max-margin principle must be generalized to loss functions other than the 0/1 loss. Second, the number of constraints of the QP problem is proportional to |Y|, i.e. potentially exponential. Different solutions exist: consider only a polynomial number of active constraints, with bounds on the optimality of the solution. Learning then requires solving an argmax problem, i.e. dynamic programming, at each iteration.

21 Summary: discriminant approaches. Hypotheses: local dependencies for the output; decomposability of the loss function; long-term dependencies in the input. Nice convergence properties and bounds. Complexity: learning often does not scale.

22 Reinforcement learning for structured outputs

23 Precursors: incremental parsing (Collins 2004); SEARN (Daumé et al. 2006); reinforcement learning (Maes et al. 2007).

24 General idea. The structured output is built incrementally: ŷ = (ŷ_1, ŷ_2, ..., ŷ_T). This differs from solving the prediction problem directly; the output is built via machine learning. Goal: construct ŷ incrementally so as to minimize the loss. Training: learn how to search the solution space. Inference: build ŷ as a sequence of successive decisions.

25 Example: sequence labeling, with 2 labels, R and B. Search space: (input sequence, {sequences of labels}), for a sequence of size 3, x = x_1 x_2 x_3. A node represents a state in the search space.

26 Example: actions and decisions. Π(s, a) is the decision function over actions; the target is a sequence of size 3.

27 Example: expected loss. (Figure: the search tree annotated with intermediate costs C and terminal costs C_T.) Target: a sequence of size 3. The loss does not always separate!

28 Inference. Suppose we have a policy function F which decides at each step which action to take. Inference can then be performed by computing ŷ_1 = F(x, ·), ..., ŷ_t = F(ŷ_1, ..., ŷ_{t-1}), ..., ŷ_T = F(ŷ_1, ..., ŷ_{T-1}), giving ŷ = (ŷ_1, ..., ŷ_T). No dynamic programming is needed.
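This inference scheme is just a greedy roll-out of the policy; a sketch (the policy_score argument stands in for the learned decision function):

```python
def greedy_inference(x, labels, policy_score):
    """Build y_hat one decision at a time: at each step, take the action
    the policy scores highest given the input and the partial output.
    No dynamic programming; cost is O(len(x) * len(labels))."""
    y_hat = []
    for _ in range(len(x)):
        best = max(labels, key=lambda a: policy_score(x, tuple(y_hat), a))
        y_hat.append(best)
    return y_hat
```

A usage example: a (hypothetical) policy that simply copies the input symbol reproduces the input, e.g. greedy_inference("ab", ["a", "b"], lambda x, partial, a: 1.0 if a == x[len(partial)] else 0.0) returns ["a", "b"].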

29 Example: state-space exploration guided by local costs. (Figure: the same search tree with intermediate costs C and terminal costs C_T.) Target: a sequence of size 3. Goal: generalize to unseen situations.

30 Training. Learn a policy F, from a set of (input, output) pairs, which takes the optimal action at each state and is able to generalize to unknown situations.

31 Reinforcement learning for SP. Formalizes the search-based ideas using Markov Decision Processes as the model and reinforcement learning for training; this provides a general framework for the approach, and many RL algorithms can be used for training. Differences with the classical use of RL: specific hypotheses (a deterministic environment); problem size (states with up to 10^6 variables); generalization properties.

32 Markov Decision Process. An MDP is a tuple (S, A, P, R): S is the state space; A is the action space; P is a transition function describing the dynamics of the environment, P(s, a, s') = P(s_{t+1} = s' | s_t = s, a_t = a); R is a reward function, R(s, a, s') = E[r_t | s_{t+1} = s', s_t = s, a_t = a].

33 Example: prediction problem.

34 Example: state space. A deterministic MDP. States: the input plus a partial output. Actions: elementary modifications of the partial output. Rewards: the quality of the prediction.
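This deterministic MDP can be sketched directly for sequence labeling (the class and the per-position reward are my illustration, following the slide's state/action/reward choices):

```python
class LabelingMDP:
    """Deterministic MDP for sequence labeling: a state is
    (input, partial output), an action appends one label, and the
    reward scores the quality of the prediction at that position."""

    def __init__(self, x, y_true, labels):
        self.x, self.y_true, self.labels = x, y_true, labels

    def initial_state(self):
        return (self.x, ())

    def step(self, state, action):
        """Deterministic transition P(s, a) -> s', with reward R(s, a, s')."""
        x, partial = state
        t = len(partial)
        next_state = (x, partial + (action,))
        reward = 1.0 if action == self.y_true[t] else 0.0
        return next_state, reward

    def is_terminal(self, state):
        x, partial = state
        return len(partial) == len(x)
```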

35 Example: partially observable reward. The reward function is only partially observable: we only have a limited set of training (input, output) pairs.

36 Learning the MDP. The reward function cannot be computed on the whole MDP. Approximated RL: learn the policy on a subspace and generalize to the whole space. The policy is learned as a linear function of Φ(s, a), a joint description of (s, a), with θ a parameter vector. Learning algorithms: SARSA, Q-learning, etc.
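One step of such approximated learning can be sketched as a Q-learning update on a linear Q(s, a) = <θ, Φ(s, a)> (γ = 1, episodic; a toy sketch under my own naming, not the exact algorithm used in the talk):

```python
def q_value(theta, phi, s, a):
    """Q(s, a) = <theta, Phi(s, a)> with sparse feature dicts."""
    return sum(theta.get(f, 0.0) * v for f, v in phi(s, a).items())

def q_learning_update(theta, phi, actions, s, a, r, s_next, done, alpha=0.1):
    """In-place Q-learning step: move Q(s, a) toward
    r + max_b Q(s', b), or toward r alone at episode end."""
    target = r if done else r + max(q_value(theta, phi, s_next, b)
                                    for b in actions)
    delta = target - q_value(theta, phi, s, a)
    for f, v in phi(s, a).items():
        theta[f] = theta.get(f, 0.0) + alpha * delta * v
    return theta
```

Wrapping this update in an epsilon-greedy loop over training (input, output) pairs gives the SARSA/Q-learning training the slide refers to; swapping the bootstrap term for the value of the action actually taken turns it into SARSA.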

37 Inference. The state at step t consists of the initial input and the partial output. Once the MDP is learned, use the state at step t to compute the state at step t+1, until a solution is reached. Greedy approach: only the best action is chosen at each step.

38 Structured outputs and MDP. State s_t: the input x plus the partial output ŷ_t; initial state: (x, ∅). Actions are task dependent: for POS tagging, a new tag for the current word; for tree mapping, inserting a new path in the partial tree. Inference: greedy, only the best action is chosen at each step. Reward: final, or heuristic.

39 Example: sequence labeling. Left-to-right model: actions are labels. Order-free model: actions are a label plus a position. Loss: Hamming cost or F-score. Tasks: named entity recognition (shared task at CoNLL 2002: 8,000 train, 1,500 test); noun-phrase chunking (CoNLL 2002); handwritten word recognition (5,000 train, 1,000 test). Complexity of inference: O(sequence size × number of labels).

40

41 Tree mapping: document mapping. Problem: the labeled tree mapping problem, in different instances: flat text to XML, HTML to XML, XML to XML, ... (Example figure: a restaurant record, "La cantine, 65 rue des pyrénées, Paris, 19ème, FRANCE; Canard à l'orange, Lapin au miel", mapped between two tree structures.)

42 Document mapping problem. The central issue is complexity: large collections; large feature spaces (10^3 to 10^6); a large, exponential search space. Other SP methods fail here.

43 XML structuration. Action: attach the path ending with the current leaf to a position in the current partial tree. Φ(s, a) encodes a series of potential (state, action) pairs. Loss: F-score for trees.

44-60 (Animation: the target tree is constructed incrementally, one attached path at a time.)
Input document (HTML): HEAD with TITLE "Example"; BODY with I "Francis MAES", H1 "Title of the section", P "Welcome to INEX", FONT "This is a footnote".
Target document: DOCUMENT with TITLE "Example", AUTHOR "Francis MAES", and SECTION containing TITLE "Title of the section" and TEXT "Welcome to INEX".
Successive frames attach TITLE, then AUTHOR, then SECTION/TITLE, then TEXT to the partial target tree.

61 Results.

62 Conclusion. Learning to explore the state space of the problem is an alternative to dynamic programming and classical search algorithms; it can be used with any decomposable cost function, makes fewer assumptions than other methods (e.g. no Markov hypothesis), and leads to solutions that may scale to complex and large SP problems.

63 More on this: a paper at the European Workshop on Reinforcement Learning 2008; a preliminary paper at ECML 2007; library available at http://nieme.lip6.fr.

