LING 581: Advanced Computational Linguistics Lecture Notes January 19th.


1 LING 581: Advanced Computational Linguistics Lecture Notes January 19th

2 Course Webpage –http://dingo.sbs.arizona.edu/~sandiway/ling581-11/ Enrollment

3 Course Objectives Gain meaningful project experience –dealing with natural language software packages –installation –input data formatting –operation –project exercises –useful “real-world” computational experience –write small programs –abilities gained will be of value to employers

4 Computational Facilities Advise using your own laptop/desktop –we can also make use of this computer lab but you don’t have installation rights on these computers Platforms –You need to run some variant of Unix… (your task #1 for this week) e.g. –Linux de facto standard for advanced/research software –Cygwin on Windows http://www.cygwin.com/ Linux-like environment for Windows making it possible to port software running on POSIX systems (such as Linux, BSD, and Unix systems) to Windows. –MacOS X Not quite Linux, some porting issues, especially with C programs

5 Theme Language Understanding

6 Project Topics 1. PTB (Penn Treebank) search/lookup software (tgrep2) 2. Part-of-speech taggers 3. The use and modification of statistical parsers trained on Treebanks (Bikel-Collins, and others) 4. Ontologies and Semantic Networks: WordNet etc. 5. Question-Answering (QA) 6. Sentence Parsing using contemporary linguistic theory: Minimalist Program

7 Grading Completion of all homework tasks will result in a satisfactory grade (A)

8 In the News recently… www.ibmwatson.com

9 Project 1: PTB You will be exposed to –Perl –Java –Lisp s-exps –Bikel-Collins Parser You will need to review concepts from LING 538 –regexp use –Penn POS tags

10 PTB Availability –Linguistic Data Consortium (LDC) U. of Arizona is a (fee-paying) member of this consortium Resources are made available to the community through the main library –URL http://sabio.library.arizona.edu/search/X

11 PTB (V3) Call Record

12 Task 1 Install Cygwin or Ubuntu Install the PTB –Borrow it from the library –Or use the CD I’ve brought with me Familiarize yourself with the organization and layout of the files –e.g. the difference between mrg and prd formats –As is standard in the literature, we’ll be using the WSJ (Wall Street Journal) section of the PTB

13 TreeBank Browsing 00/wsj_0001.mrg ( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NP (NN chairman) ) (PP (IN of) (NP (NP (NNP Elsevier) (NNP N.V.) ) (, ,) (NP (DT the) (NNP Dutch) (VBG publishing) (NN group) ))))) (. .) )) 00/wsj_0001.mrg ( (S (NP-SBJ (NP (NNP Pierre) (NNP Vinken) ) (, ,) (ADJP (NP (CD 61) (NNS years) ) (JJ old) ) (, ,) ) (VP (MD will) (VP (VB join) (NP (DT the) (NN board) ) (PP-CLR (IN as) (NP (DT a) (JJ nonexecutive) (NN director) )) (NP-TMP (NNP Nov.) (CD 29) ))) (. .) ))
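The bracketed .mrg notation is a Lisp-style s-expression, so a few lines of code are enough to read it. The following is a minimal Python sketch, not part of the original slides: the function names (tokenize, parse, leaves) are made up for illustration, and SAMPLE is a shortened version of the "Mr. Vinken" tree with punctuation written as (tag word) pairs. It assumes well-formed input with balanced brackets.

```python
import re

def tokenize(text):
    # Each parenthesis is its own token; everything else splits on whitespace.
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(tokens):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok              # a label or a word
        node = []
        while tokens[pos] != ")":
            node.append(read())
        pos += 1                    # consume the closing ")"
        return node

    return read()

def leaves(tree):
    """Yield (tag, word) pairs from preterminal nodes such as (NNP Vinken)."""
    if len(tree) == 2 and isinstance(tree[0], str) and isinstance(tree[1], str):
        yield (tree[0], tree[1])
    else:
        for child in tree:
            if isinstance(child, list):
                yield from leaves(child)

# Shortened, illustrative version of the first tree in 00/wsj_0001.mrg.
SAMPLE = "( (S (NP-SBJ (NNP Mr.) (NNP Vinken) ) (VP (VBZ is) (NP-PRD (NN chairman) ) ) (. .) ))"

tree = parse(tokenize(SAMPLE))
tagged = list(leaves(tree))
print(tagged)
```

Nested lists keep the sketch short; a real project would more likely use an existing treebank reader than hand-rolled parsing.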

14 TreeBank Browsing My out-dated tool (treebank viewer) URL –http://dingo.sbs.arizona.edu/~sandiway/treebankviewer/

15 PTB Search Tools Looking ahead Google and Install –tgrep2 http://tedlab.mit.edu/~dr/Tgrep2/ a fast command line search tool for parse trees C program (source, Makefile) –Tregex http://nlp.stanford.edu/software/tregex.shtml Graphical Java version Penn Treebank Online (tgrep interface) –http://www.ldc.upenn.edu/ldc/online/treebank/ –doesn’t seem to be working: tgrep search currently unavailable tgrep –VP << /^believe/ < (S < (/^NP/ !<< /[*]/ !< (-NONE- < T)) < (VP|AUX << to)) –approximation to finding Verb Phrases headed by "believe" that have an infinitival complement with a non-null subject
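To make the dominance operator concrete, here is a rough Python sketch of only the first clause of the pattern above, VP << /^believe/ (a VP that dominates, at any depth, a word beginning with "believe"). It is an illustration under stated assumptions, not how tgrep or tgrep2 is implemented: the helper names are hypothetical, the remaining clauses (the S complement with a non-null NP subject and a "to" VP) are not checked, and TOY is a hand-made tree, not a sentence from the PTB.

```python
import re

def parse(text):
    """Parse one bracketed tree into nested lists: [label, child, ...]."""
    stack = [[]]
    for tok in re.findall(r"\(|\)|[^\s()]+", text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            node = stack.pop()
            stack[-1].append(node)
        else:
            stack[-1].append(tok)
    return stack[0][0]

def kids(node):
    """Children of a node, skipping its label when it has one."""
    return node[1:] if node and isinstance(node[0], str) else node

def subtrees(node):
    """Yield every node (nested list) in the tree, top-down."""
    yield node
    for child in kids(node):
        if isinstance(child, list):
            yield from subtrees(child)

def words(node):
    """Yield the terminal words dominated by a node."""
    for child in kids(node):
        if isinstance(child, list):
            yield from words(child)
        else:
            yield child

def vps_dominating(tree, pattern):
    """Rough analogue of `VP << /pattern/`: VP nodes dominating a matching word."""
    rx = re.compile(pattern)
    return [n for n in subtrees(tree)
            if n and n[0] == "VP" and any(rx.search(w) for w in words(n))]

# Hand-made toy tree for illustration only (not taken from the treebank).
TOY = ("(S (NP-SBJ (PRP They)) (VP (VBP believe) (S (NP-SBJ (PRP him)) "
       "(VP (TO to) (VP (VB be) (ADJP-PRD (JJ right)))))))")

hits = vps_dominating(parse(TOY), r"^believe")
for vp in hits:
    print(" ".join(words(vp)))
```

Only the outermost VP is reported, because the two embedded VPs ("to be right" and "be right") do not dominate a word matching the pattern.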

16 PTB Search Tools

