LING 581: Advanced Computational Linguistics, Lecture Notes, January 19th
Administrivia: New room – Shantz 338 (I have asked Jennifer Columbus to investigate a refund; however, I'm told it may not happen.) [Figure: Marshall 480, Shantz 338]
Penn Treebank Availability: Source: Linguistic Data Consortium (LDC). The U. of Arizona is a (fee-paying) member of this consortium. Resources are made available to the community through the main library URL –
Penn Treebank (V3) Call Record
Penn Treebank: 1. Tagging Guide 2. Arpa94 paper 3. Parse Guide
Penn Treebank: Wall Street Journal (WSJ) sections 00-24
tregex: Tregex is a Tgrep2-style utility for matching patterns in trees, written in Java. It is started with the run-tregex-gui.command shell script; depending on the platform, the 300m default memory size set with the -mx flag may need to be increased.
tregex: Select the PTB directory TREEBANK_3/parsed/mrg/wsj/ with Browse, then deselect any unwanted files.
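As a rough programmatic analogue of the Browse/deselect step, the sketch below collects the .mrg files under TREEBANK_3/parsed/mrg/wsj/, optionally restricted to a subset of the two-digit section directories 00 to 24. It assumes each section is a subdirectory holding .mrg files; the function select_mrg_files and the constant ALL_SECTIONS are hypothetical names, not part of tregex, and the sketch does not read any trees.

```python
from pathlib import Path

# Assumed layout: <root>/<section>/*.mrg with section directories "00".."24".
ALL_SECTIONS = [f"{n:02d}" for n in range(25)]


def select_mrg_files(root, sections=None):
    """Return the .mrg files of the chosen sections, in section order.

    `sections` plays the role of deselecting unwanted files in the GUI:
    pass a subset such as ["02", "03"]; None means every section 00-24.
    Missing section directories are silently skipped.
    """
    root = Path(root)
    wanted = ALL_SECTIONS if sections is None else sections
    files = []
    for sec in wanted:
        sec_dir = root / sec
        if sec_dir.is_dir():
            # Sort within each section so the order is reproducible.
            files.extend(sorted(sec_dir.glob("*.mrg")))
    return files
```

For example, select_mrg_files("TREEBANK_3/parsed/mrg/wsj", sections=["00", "01"]) would return only the files from sections 00 and 01.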
tregex Search
tregex Help
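tregex Pattern: (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)
This matches an NP whose first child is an NP, immediately followed by a comma, then an NP, then a comma that is also the NP's last child.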
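To make the relations in this pattern concrete, here is a hand-coded Python sketch (not tregex itself) that checks the same configuration: `<,` means "first child", `$+` means "is the immediate left sister of", `<-` means "last child", `@NP` accepts any label whose basic category is NP (e.g. NP-SBJ), and `/,/` accepts labels containing a comma. The small bracketed-tree reader, the function names, and the example sentence are hypothetical illustrations, and basic_category only approximates how tregex handles `@`.

```python
import re


def parse_ptb(text):
    """Parse one bracketed PTB tree into nested (label, children) tuples.

    Words are kept as plain strings; an unlabeled outer bracket, as in
    .mrg files, gets the empty label "".
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse_node()


def basic_category(label):
    """Rough stand-in for tregex's @ prefix: "NP-SBJ-1" -> "NP".

    Labels that begin with "-" (such as -NONE-) are returned unchanged.
    """
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label)[0]


def is_phrase(node):
    return isinstance(node, tuple)


def matches_pattern(node):
    """Hand-coded check for (@NP <, (@NP $+ (/,/ $+ (@NP $+ /,/=comma))) <- =comma)."""
    if not is_phrase(node) or basic_category(node[0]) != "NP":
        return False
    kids = node[1]
    # First child, three immediate-sister steps, and the same comma as
    # last child: the outer NP must have exactly four children.
    if len(kids) != 4 or not all(is_phrase(k) for k in kids):
        return False
    first, comma1, second, comma2 = (k[0] for k in kids)
    return (basic_category(first) == "NP" and "," in comma1
            and basic_category(second) == "NP" and "," in comma2)


def find_matches(tree):
    """Yield every subtree (in preorder) that satisfies matches_pattern."""
    if not is_phrase(tree):
        return
    if matches_pattern(tree):
        yield tree
    for child in tree[1]:
        yield from find_matches(child)


tree = parse_ptb("( (S (NP-SBJ (NP (NNP Mary)) (, ,) (NP (DT the) (NN editor)) (, ,))"
                 " (VP (VBD left)) (. .)) )")
for match in find_matches(tree):
    print(match[0])  # prints NP-SBJ
```

Because `=comma` names the same node in the sister chain and in the `<-` relation, the sketch can reduce the whole pattern to an exact four-child check. tregex itself evaluates arbitrary combinations of relations; hard-coding a single pattern like this is only meant to show what each operator contributes.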
tregex
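Different results from: @SBAR < /^WH.*-([0-9]+)$/#1%index << (__=empty < (/^-NONE-/ < /^\*T\*-([0-9]+)$/#1%index))
Here #1%index stores the number captured by group 1 of each regex in the variable index, so the WH phrase and the *T* trace must carry the same index.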
tregex Example: the indexed WH phrase can also be a WHADVP (not just a WHNP).
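The `#1%index` suffix is a tregex variable group: roughly, the text captured by regex group 1 is bound to the variable index, and a later use of the same variable only matches if it captures the same string. The sketch below mimics that coindexation check in plain Python. It assumes, hypothetically, that the pattern is rooted at an SBAR whose immediate child is the indexed WH phrase, and that the trace is a -NONE- node (whose parent lies strictly below the SBAR) dominating the word *T*-N. The two regular expressions are copied from the pattern; the tree reader, the function names, and the test sentence are illustrative assumptions.

```python
import re

# Both regular expressions are copied from the tregex pattern.
WH_RE = re.compile(r"^WH.*-([0-9]+)$")
TRACE_RE = re.compile(r"^\*T\*-([0-9]+)$")
NONE_RE = re.compile(r"^-NONE-")


def parse_ptb(text):
    """Parse one bracketed PTB tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse_node():
        nonlocal pos
        pos += 1  # skip "("
        label = ""
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse_node())
            else:
                children.append(tokens[pos])
                pos += 1
        pos += 1  # skip ")"
        return (label, children)

    return parse_node()


def basic_category(label):
    """Rough stand-in for tregex's @ prefix: "SBAR-ADV" -> "SBAR"."""
    if label.startswith("-"):
        return label
    return re.split(r"[-=]", label)[0]


def phrases(node):
    """Preorder traversal over phrase nodes; words are skipped."""
    if isinstance(node, tuple):
        yield node
        for child in node[1]:
            yield from phrases(child)


def trace_indices(sbar):
    """Indices N such that a node strictly below `sbar` immediately
    dominates a -NONE- node whose word is *T*-N."""
    found = set()

    def walk(node, depth):
        for child in node[1]:
            if not isinstance(child, tuple):
                continue
            # depth >= 1: the -NONE- node's parent is a proper descendant of sbar.
            if depth >= 1 and NONE_RE.match(child[0]):
                for word in child[1]:
                    m = TRACE_RE.match(word) if isinstance(word, str) else None
                    if m:
                        found.add(m.group(1))
            walk(child, depth + 1)

    walk(sbar, 0)
    return found


def coindexed_wh(tree):
    """Yield (sbar, wh, index) when an SBAR's WH child shares its index
    with a *T* trace somewhere below that SBAR."""
    for node in phrases(tree):
        if basic_category(node[0]) != "SBAR":
            continue
        traces = trace_indices(node)
        for child in node[1]:
            if isinstance(child, tuple):
                m = WH_RE.match(child[0])
                if m and m.group(1) in traces:
                    yield (node[0], child[0], m.group(1))


rel = parse_ptb("( (NP (NP (DT the) (NN book)) (SBAR (WHNP-1 (WDT that))"
                " (S (NP-SBJ (PRP I)) (VP (VBD read) (NP (-NONE- *T*-1)))))) )")
print(list(coindexed_wh(rel)))  # prints [('SBAR', 'WHNP-1', '1')]
```

Because /^WH.*-([0-9]+)$/ accepts any label that starts with WH and ends in an index, the same check also finds a WHADVP-2 coindexed with a *T*-2 trace; a WH phrase whose index differs from every trace below the SBAR produces no match.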
Ungraded Homework Exercise: Search for NP trace relative clauses, as defined below. Be ready to compare your search pattern and the number of matches found next time in class.