
Learning with lookahead: Can history-based models rival globally optimized models? Yoshimasa Tsuruoka Japan Advanced Institute of Science and Technology.


1 Learning with lookahead: Can history-based models rival globally optimized models? Yoshimasa Tsuruoka Japan Advanced Institute of Science and Technology (JAIST) Yusuke Miyao National Institute of Informatics (NII) Junichi Kazama National Institute of Information and Communications Technology (NICT)

2 History-based models
- Structured prediction problems in NLP: POS tagging, named entity recognition, parsing, ...
- History-based models decompose the structured prediction problem into a series of classification problems
- Widely used in many NLP tasks: MEMMs (Ratnaparkhi, 1996; McCallum et al., 2000), transition-based parsers (Yamada & Matsumoto, 2003; Nivre et al., 2006)
- Becoming less popular

3 Part-of-speech (POS) tagging
- Perform multi-class classification at each word
- Features are defined on observations (i.e. words) and the POS tags on the left
[Figure: per-word tag candidates (N/V/D/P) for "I saw a dog with eyebrows"]
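The left-to-right classification scheme on this slide can be sketched as a greedy history-based tagger. The weights and feature templates below are hypothetical toy values chosen to illustrate the decoding loop, not the model trained in the talk:

```python
# Minimal sketch of greedy history-based POS tagging.
# Weights and feature templates are made-up illustrations.
TAGS = ["N", "V", "D", "P"]

def features(words, i, prev_tag):
    # Features over the current observation and the tag on the left,
    # as described on the slide.
    return [f"word={words[i]}",
            f"prev_tag={prev_tag}",
            f"word={words[i]}+prev_tag={prev_tag}"]

def score(weights, feats, tag):
    return sum(weights.get((f, tag), 0.0) for f in feats)

def greedy_tag(words, weights):
    tags, prev = [], "<s>"
    for i in range(len(words)):
        feats = features(words, i, prev)
        best = max(TAGS, key=lambda t: score(weights, feats, t))
        tags.append(best)
        prev = best          # the decision becomes part of the history
    return tags

# Toy weights that prefer the obvious reading of the running example.
w = {("word=I", "N"): 1.0, ("word=saw", "V"): 1.0, ("word=a", "D"): 1.0,
     ("word=dog", "N"): 1.0, ("word=with", "P"): 1.0,
     ("word=eyebrows", "N"): 1.0}
print(greedy_tag("I saw a dog with eyebrows".split(), w))
```

Each classification commits immediately, so an early mistake propagates into the history features of later decisions; that weakness is what the lookahead slides address.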

4-14 Dependency parsing (animation)
Shift-reduce parse of "I saw a dog with eyebrows", built step by step: each slide applies one of the operations Shift, ReduceL, or ReduceR to the current stack and queue until the full dependency tree is produced.
[Figures: successive OPERATION / STACK / QUEUE states]
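The derivation in these slides can be sketched as an arc-standard shift-reduce loop. The action sequence below is a hand-written oracle for the example sentence (in the actual system each action is predicted by a classifier), and the (head, dependent) arc convention is one common reading of ReduceL/ReduceR:

```python
# Sketch of an arc-standard shift-reduce derivation.
# The action list is a hard-coded oracle for the running example,
# not the output of a trained model.
def parse(words, operations):
    stack, queue, arcs = [], list(words), []
    for op in operations:
        if op == "Shift":
            stack.append(queue.pop(0))
        elif op == "ReduceL":            # second item becomes a left dependent
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif op == "ReduceR":            # top item becomes a right dependent
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs                          # list of (head, dependent) pairs

ops = ["Shift", "Shift", "ReduceL",      # saw <- I
       "Shift", "Shift", "ReduceL",     # dog <- a
       "Shift", "Shift", "ReduceR",     # with -> eyebrows
       "ReduceR", "ReduceR"]            # dog -> with, saw -> dog
print(parse("I saw a dog with eyebrows".split(), ops))
```

After the last ReduceR only the root ("saw") remains on the stack, which is the terminating configuration shown on the final slide of the animation.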

15 Lookahead: playing chess. "If I move this pawn, then the knight will be captured by that bishop, but then I can ..."

16-20 POS tagging with lookahead (animation)
Consider all possible sequences of future tagging actions up to a certain depth before committing to the tag of the current word.
[Figures: lookahead over tag candidates (N/V/D/P) for "I saw a dog with eyebrows"]

21-22 Dependency parsing with lookahead (animation)
The same idea applied to shift-reduce parsing: before committing to an operation, explore alternative Shift / ReduceL / ReduceR continuations of the current stack and queue.
[Figures: candidate parser states for "I saw a dog with eyebrows"]

23 Choosing the best action by search
From the current state S, each candidate action a1 ... am leads to a state S1 ... Sm; the search explores continuations to a fixed depth and compares the best reachable states S1* ... Sm*.
[Figure: search tree over actions, annotated with the search depth]

24 Search

25 Decoding cost
- Time complexity: O(n m^(D+1)), where n is the number of actions to complete the structure, m is the average number of possible actions at each state, and D is the search depth
- Time complexity of k-th order CRFs: O(n m^(k+1))
- History-based models with depth-k lookahead are therefore comparable to k-th order CRFs in training/testing time
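The action selection described above can be sketched as a depth-limited search. This is a simplified toy (the state space, scoring table, and helper names are all invented for illustration); it shows the key behavior that a greedy decision and a lookahead decision can disagree:

```python
# Depth-limited lookahead (a sketch): score an immediate action by the
# best total score reachable within D further actions.
def lookahead_score(state, action, depth, actions, apply_fn, score):
    nxt = apply_fn(state, action)
    if depth == 0 or not actions(nxt):
        return score(state, action)
    return score(state, action) + max(
        lookahead_score(nxt, a, depth - 1, actions, apply_fn, score)
        for a in actions(nxt))

def choose(state, depth, actions, apply_fn, score):
    return max(actions(state),
               key=lambda a: lookahead_score(state, a, depth,
                                             actions, apply_fn, score))

# Toy 3-word tagging problem with hypothetical transition weights,
# constructed so that greedy (depth 0) and lookahead (depth 2) differ.
WORDS = 3
W = {("<s>", "N"): 0.6, ("<s>", "V"): 0.4,
     ("N", "N"): 0.0, ("N", "V"): 0.0,
     ("V", "N"): 1.0, ("V", "V"): 1.0}

def actions(state):
    return ["N", "V"] if len(state) < WORDS else []

def apply_fn(state, a):
    return state + (a,)

def score(state, a):
    return W[(state[-1] if state else "<s>", a)]

print(choose((), 0, actions, apply_fn, score))  # greedy pick
print(choose((), 2, actions, apply_fn, score))  # lookahead pick
```

With depth 0 the tagger grabs the locally best first tag; with depth 2 it sees that the alternative first tag opens up much higher-scoring continuations, matching the O(n m^(D+1)) cost analysis above (each decision explores m^(D+1) action sequences).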

26 Perceptron learning with lookahead
- Linear scoring model; the perceptron compares the action chosen by lookahead search with the correct action and updates the weights
- Guaranteed to converge
[Figure: candidate states S1 ... Sm reached by actions a1 ... am, with and without lookahead]
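The learning rule can be sketched as a per-decision perceptron update. This is deliberately simplified relative to the talk: the feature template and training loop below are invented for illustration, and the update here uses only the immediate action, whereas the paper's version updates along the best action sequences found by lookahead search:

```python
# Sketch of perceptron training on per-action decisions.
# Feature template and examples are hypothetical.
def features(state, action):
    return [(state, action)]

def choose_action(state, w):
    # Linear scoring model over two candidate actions.
    return max(["x", "y"],
               key=lambda a: sum(w.get(f, 0.0) for f in features(state, a)))

def train(examples, epochs=5):
    # examples: list of (state, gold_action) pairs
    w = {}
    for _ in range(epochs):
        for state, gold in examples:
            pred = choose_action(state, w)
            if pred != gold:
                # Promote the correct action's features,
                # demote the predicted one's.
                for f in features(state, gold):
                    w[f] = w.get(f, 0.0) + 1.0
                for f in features(state, pred):
                    w[f] = w.get(f, 0.0) - 1.0
    return w

w = train([("s1", "x"), ("s2", "y")])
print(choose_action("s2", w))
```

Because the update only fires when the search-selected action disagrees with the gold action, the weights are trained against exactly the decisions the decoder will face at test time.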

27 Experiments
- Sequence prediction tasks: POS tagging, text chunking (a.k.a. shallow parsing), named entity recognition
- Syntactic parsing: dependency parsing
- Compared to first-order CRFs in terms of speed and accuracy

28 POS tagging: accuracy on the WSJ corpus [results chart omitted]

29 Training time: seconds, WSJ corpus [results chart omitted]

30 POS tagging (+ tag trigram features): accuracy on the WSJ corpus [results chart omitted]

31 Chunking (shallow parsing): F-score on the CoNLL 2000 data set [results chart omitted]

32 Named entity recognition: F-score on the BioNLP/NLPBA 2004 data set [results chart omitted]

33 Dependency parsing: F-score on the WSJ corpus, compared with Zhang and Clark (2008) [results chart omitted]

34 Related work
- MEMMs + Viterbi: the label bias problem (Lafferty et al., 2001)
- Learning as search optimization (LaSO) (Daumé III and Marcu, 2005): no lookahead
- Structured perceptron with beam search (Zhang and Clark, 2008)

35 Conclusion
- Can history-based models rival globally optimized models? Yes: they can be more accurate than CRFs
- At the same computational cost as CRFs

36 Future work
- Feature engineering
- Flexible search extension/reduction
- Easy-first tagging/parsing (Goldberg & Elhadad, 2010)
- Max-margin learning

37 THANK YOU

