LING 581: Advanced Computational Linguistics Lecture Notes January 26th.


1 LING 581: Advanced Computational Linguistics Lecture Notes January 26th

2 Penn Treebank Bracketing guidelines

3 Ungraded Homework Exercise Search for NP trace relative clauses as defined below. Be ready to compare your search pattern and the number of matches found next time in class.

4 Ungraded Homework Exercise @NP < @NP < @SBAR: 12038
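To make concrete what the pattern @NP < @NP < @SBAR asks for, here is a minimal Python sketch, not Tregex itself. It assumes trees in Penn Treebank bracketed form, reads "@" as "basic category, ignoring function tags and indices", and counts each NP node that has both an NP child and an SBAR child once. The function names are hypothetical, and Tregex's own match counting may differ from this node count.

```python
import re

def parse_trees(text):
    """Parse bracketed trees into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        trees.append(node())
    return trees

def basic_category(label):
    """NP-SBJ-1 -> NP; labels such as -NONE- are returned unchanged."""
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label, maxsplit=1)[0]

def count_np_np_sbar(tree):
    """Count NP nodes that immediately dominate both an NP and an SBAR."""
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    cats = {basic_category(k[0]) for k in kids}
    hit = int(basic_category(label) == "NP" and {"NP", "SBAR"} <= cats)
    return hit + sum(count_np_np_sbar(k) for k in kids)
```

To use it on the corpus, read each .mrg file, pass its text to parse_trees, and sum count_np_np_sbar over the returned trees.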

5 Ungraded Homework Exercise @NP < @NP < @SBAR plus WH indices: 10956, down from 12038

6 Ungraded Homework Exercise @NP < @NP < (@SBAR < /^-NONE-/): 529. Note -NONE- < *ICH*

7 Ungraded Homework Exercise

8 Not all @NP < @NP < (@SBAR < /^-NONE-/) are relative clauses

9 Ungraded Homework Exercise @NP < @NP < (@SBAR < /^-NONE-/) plus *ICH*: count drops from 529 to 166

10 Ungraded Homework Exercise @NP < @NP < (@SBAR < /^-NONE-/) plus *ICH*: Is 166 too low? How about other -NONE- nodes?
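One way to approach the question about other -NONE- nodes is to tally which empty elements actually occur directly under the SBAR in these matches. The following Python sketch does that under stated assumptions: trees are in Penn Treebank bracketed form, co-indexation suffixes such as -1 are stripped from the empty element, and all function names are hypothetical. It is an illustration of the idea, not the search used for the counts above.

```python
import re
from collections import Counter

def read_trees(text):
    """Parse bracketed trees into (label, children) tuples; leaves are strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    trees = []
    while pos < len(tokens):
        trees.append(node())
    return trees

def cat(label):
    """Basic category: NP-SBJ-1 -> NP; -NONE- stays as it is."""
    return label if label.startswith("-") else re.split(r"[-=]", label, maxsplit=1)[0]

def empty_kinds_under_sbar(tree, tally=None):
    """For every NP with an NP child and an SBAR child, record each empty
    element that the SBAR immediately dominates (index suffix removed)."""
    if tally is None:
        tally = Counter()
    label, children = tree
    kids = [c for c in children if isinstance(c, tuple)]
    if cat(label) == "NP" and any(cat(k[0]) == "NP" for k in kids):
        for sbar in (k for k in kids if cat(k[0]) == "SBAR"):
            for g in sbar[1]:
                if isinstance(g, tuple) and g[0].startswith("-NONE-") and g[1]:
                    tally[re.sub(r"-\d+$", "", g[1][0])] += 1
    for k in kids:
        empty_kinds_under_sbar(k, tally)
    return tally
```

Passing the same Counter through every tree in the corpus gives a breakdown by empty-element type, which shows how much of the 529 is *ICH* and how much is something else.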

11 Ungraded Homework Exercise

12 Final tally

13 Homework Exercise Use the bracketing guides and choose three “interesting” constructions. Find all occurrences of each in the WSJ PTB.

14 Homework Exercise 581 Homework rules – Due next lecture – Present your findings in class (slides)

15 Parsing … from Treebank search to stochastic parsers trained on the WSJ Penn Treebank

16 Bikel Collins Java re-implementation of Collins’ parser Paper – Daniel M. Bikel. 2004. Intricacies of Collins’ Parsing Model. Computational Linguistics, 30(4), pp. 479-511. – http://www.cis.upenn.edu/~dbikel/papers/collins-intricacies.pdf Software – http://www.cis.upenn.edu/~dbikel/

17 Bikel Collins Download and install Dan Bikel’s parser. File: install.sh – Java code – but at this point I think Windows won’t work because of the shell script (.sh) – maybe it will once the files are extracted?

18 Bikel Collins Download and install the POS tagger MXPOST. (The parser doesn’t actually need a separate tagger…)

19 Bikel Collins Training the parser with the WSJ PTB See guide – http://www.cis.upenn.edu/~dbikel/download/dbparser/guide.pdf directory: TREEBANK_3/parsed/mrg/wsj chapters 02-21: create one single .mrg file events: wsj-02-21.obj.gz
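The step "create one single .mrg file" from chapters 02-21 can be sketched in Python as a plain concatenation. This assumes each chapter is a two-digit subdirectory (02, 03, ..., 21) of the wsj directory and that the tree files end in lowercase .mrg; some Treebank distributions use uppercase names, in which case the glob pattern needs adjusting. The function name and arguments are hypothetical.

```python
from pathlib import Path

def merge_sections(wsj_dir, out_path, first=2, last=21):
    """Concatenate every .mrg file in sections first..last into out_path.

    Returns the number of files merged. Files are taken in sorted order
    within each section so the result is reproducible.
    """
    wsj_dir = Path(wsj_dir)
    merged = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for section in range(first, last + 1):
            section_dir = wsj_dir / f"{section:02d}"
            if not section_dir.is_dir():
                continue  # skip sections that are not present
            for mrg in sorted(section_dir.glob("*.mrg")):
                text = mrg.read_text(encoding="utf-8")
                out.write(text)
                if not text.endswith("\n"):
                    out.write("\n")  # keep trees from running together
                merged += 1
    return merged
```

For example, merge_sections("TREEBANK_3/parsed/mrg/wsj", "wsj-02-21.mrg") would produce the training file named on the later slides, given that directory layout.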

20 Bikel Collins Settings:

21 Bikel Collins Parsing – Command – Input file format (sentences)
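The slide does not reproduce the input format, so the following is only a sketch under assumptions that should be checked against the parser guide and your installation: that MXPOST writes one sentence per line as word_TAG tokens, and that the parser accepts pre-tagged sentences in the Lisp-style form ((word (TAG)) (word (TAG)) ...). The function name is hypothetical.

```python
def tagged_to_parser_input(tagged_line):
    """Convert 'The_DT cat_NN' into '((The (DT)) (cat (NN)))'.

    Assumes word_TAG tokens separated by whitespace; the tag is taken to be
    everything after the last underscore, so words containing underscores
    are handled. Literal parentheses in words are not escaped here.
    """
    items = []
    for token in tagged_line.split():
        word, sep, tag = token.rpartition("_")
        if not sep or not word or not tag:
            raise ValueError(f"token is not of the form word_TAG: {token!r}")
        items.append(f"({word} ({tag}))")
    return "(" + " ".join(items) + ")"
```

Applying this to each line of the tagger output would give one parser input sentence per line.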

22 Bikel Collins Verify the trainer and parser work on your machine

23 Bikel Collins File: bin/parse is a shell script that sets up program parameters and calls java

24 Bikel Collins

25 File: bin/train is another shell script

26 Bikel Collins Relevant WSJ PTB files

27 Bikel Collins I use a tcl/tk wrapper to call Dan Bikel’s code. If you have tcl/tk installed, it makes it easy to work the parser without memorizing the command line options.

28 Bikel Collins For tree viewing, you can use tregex. For demos, I use my own viewer.

29 Bikel Collins
POS tagging (MXPOST, in directory jmx)
– tagger_input
– $prefix/jmx/mxpost $prefix/jmx/tagger.project /tmp/err.txt
Parsing
– set ddf "wsj-02-21.obj.gz"
– set properties "collins.properties"
– parser_input
– $dbprefix/bin/parse 400 $dbprefix/settings/$properties $dbprefix/bin/$ddf /tmp/test2.txt 2>@ stdout
Training
– set mrg "wsj-02-21.mrg"
– set properties "collins.properties"
– $dbprefix/bin/train 800 $dbprefix/settings/$properties $dbprefix/bin/$mrg 2>@ stdout
Unix file descriptors
– 0 Standard input (stdin)
– 1 Standard output (stdout)
– 2 Standard error (stderr)
GUI components
    frame .input
    text .input.t -height 4 -yscrollcommand {.input.s set}
    scrollbar .input.s -command {.input.t yview}
    frame .tagged
    text .tagged.t -height 9 -yscrollcommand {.tagged.s set}
    scrollbar .tagged.s -command {.tagged.t yview}
Code
    # Write the contents of the input text widget to /tmp/test.txt (tagger input)
    proc tagger_input {} {
        set lines [.input.t get 1.0 end]
        set infile [open "/tmp/test.txt" w]
        puts -nonewline $infile [string trimright $lines]
        close $infile
    }
    # Write the contents of the tagged text widget to /tmp/test2.txt (parser input)
    proc parser_input {} {
        set lines [.tagged.t get 1.0 end]
        set infile [open "/tmp/test2.txt" w]
        puts -nonewline $infile [string trimright $lines]
        close $infile
    }
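For readers without tcl/tk, the parse call and its 2>@ stdout redirection can be approximated in Python. In Tcl's exec, 2>@ stdout sends the child's standard error (descriptor 2) to the Tcl process's stdout channel; the nearest simple analogue with subprocess is stderr=subprocess.STDOUT, which merges descriptor 2 into the captured descriptor 1. This is a sketch only: the function names are hypothetical, the argument order is copied from the slide, and the meaning of the first argument (400) is not stated there.

```python
import subprocess
import sys

def parse_command(dbprefix, first_arg="400",
                  properties="collins.properties",
                  ddf="wsj-02-21.obj.gz",
                  input_file="/tmp/test2.txt"):
    """Build the argument list for bin/parse in the order shown on the slide.

    first_arg is passed through unchanged; the slide uses 400 for parsing
    and 800 for training without saying what the number means.
    """
    return [
        f"{dbprefix}/bin/parse",
        first_arg,
        f"{dbprefix}/settings/{properties}",
        f"{dbprefix}/bin/{ddf}",
        input_file,
    ]

def run_merged(argv):
    """Run argv, capturing stdout with stderr merged into the same stream."""
    done = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # descriptor 2 follows descriptor 1
        text=True,
    )
    return done.returncode, done.stdout
```

With the parser installed, run_merged(parse_command(dbprefix)) would return the exit status together with everything the script printed, including progress messages written to standard error.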

