1 More Xkwic and Tgrep LING 5200 Computational Corpus Linguistics Martha Palmer March 2, 2006.


1 More Xkwic and Tgrep
LING 5200 Computational Corpus Linguistics
Martha Palmer
March 2, 2006

2 Resources
LING 5200, 2006. Based on Kevin Cohen’s LING 5200.
Laura is bugging me to make a CU Corpora page… Like this:
http://www.stanford.edu/dept/linguistics/corpora/cas-home.html
TGREP:
http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

3 Searching with pos tags and !
[word = "[tT]he" & !( pos = "DT" ) ];
wsj
[ !(word = "water" | pos = "NN")];
[ !(word = "water") & !( pos = "NN")];
[ word != "water" & pos != "NN" ];
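The last three queries on this slide select the same tokens, which follows from De Morgan's laws. The sketch below is plain Python rather than CQP, and the (word, pos) sample is invented for illustration; it only checks that the three negated conditions agree token by token.

```python
# Tokens as (word, pos) pairs. This tiny sample is made up for illustration.
tokens = [
    ("water", "NN"),
    ("water", "VB"),
    ("rain", "NN"),
    ("the", "DT"),
    ("The", "NNP"),
]

def q_not_or(word, pos):
    # Mirrors [ !(word = "water" | pos = "NN") ]
    return not (word == "water" or pos == "NN")

def q_not_and_not(word, pos):
    # Mirrors [ !(word = "water") & !(pos = "NN") ]
    return not (word == "water") and not (pos == "NN")

def q_neq(word, pos):
    # Mirrors [ word != "water" & pos != "NN" ]
    return word != "water" and pos != "NN"

# Tokens that are neither the word "water" nor tagged NN.
matches = [t for t in tokens if q_not_or(*t)]

# The three formulations agree on every token in the sample.
agree = all(
    q_not_or(*t) == q_not_and_not(*t) == q_neq(*t) for t in tokens
)
print(matches, agree)
```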

4 Operator precedence
The precedence of the logical operators is given by the following list: if operator x is listed before operator y, x takes precedence over y. Operators are evaluated left to right.
=, !=, !, &, |
[ ! word = "water" & ! pos = "NN" ];
is read as
[ !(word = "water") & !( pos = "NN")];

5 Searching sequences with | and ?
"Bill" [pos = "NP"];
[pos = "NP"] [pos = "NP"] [pos = "NP"];
([pos = "NP"] [pos = "NP"]) | ([pos = "NP"] "of" [pos = "NP"]);
([pos = "NP"] "of"? [pos = "NP"]);
Note: First match applies

6 Corpus Position: wild cards and contexts
"give" []* "up";
"give" []{0,5} "up";
"give" []* "up" within 7;
"Clinton" expand to 5;
"Clinton" expand left to 5;
"Clinton" expand right to 5;
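The bounded wildcard in "give" []{0,5} "up" can be pictured as a scan that allows a limited number of arbitrary tokens between two words. The following is a minimal Python sketch of that idea, not CQP's implementation; the function name `find_with_gap` and the sample sentence are hypothetical, and keeping only the nearest closing word is a simplification.

```python
def find_with_gap(tokens, first, last, max_gap):
    """Return (start, end) index pairs where `first` is followed by `last`
    with at most `max_gap` tokens in between."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok != first:
            continue
        # Candidate positions for `last`: i+1 up to i+1+max_gap inclusive.
        stop = min(i + 2 + max_gap, len(tokens))
        for j in range(i + 1, stop):
            if tokens[j] == last:
                hits.append((i, j))
                break  # keep only the nearest closing word
    return hits

# Invented example sentence: three tokens separate "give" and "up".
sentence = "they will never give the whole plan up without a fight".split()

wide = find_with_gap(sentence, "give", "up", 5)    # gap of 3 fits in 0..5
narrow = find_with_gap(sentence, "give", "up", 2)  # gap of 3 exceeds 0..2
print(wide, narrow)
```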

7 Assignments and Intersect
Q1 = "rain";
Q2 = [pos="NN"];
intersect Q1 Q2;
Q1 = [pos = "JJ"] [pos = "NN"];
Q2 = "acid" "rain";
intersect Q1 Q2;
[word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
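The second intersect on this slide and the final combined query should pick out the same two-token spans. A rough Python sketch of why, assuming a named query result behaves like a set of (start, end) positions and that intersect keeps spans present in both sets; the token sample and the helper `match_pairs` are invented for illustration.

```python
# Tokens as (word, pos) pairs; invented sample.
tokens = [
    ("acid", "JJ"), ("rain", "NN"),
    ("heavy", "JJ"), ("rain", "NN"),
    ("acid", "NN"), ("rain", "NN"),
]

def match_pairs(tokens, pred1, pred2):
    """Spans (i, i+1) where pred1 holds at i and pred2 holds at i+1."""
    return {
        (i, i + 1)
        for i in range(len(tokens) - 1)
        if pred1(tokens[i]) and pred2(tokens[i + 1])
    }

# Q1 = [pos = "JJ"] [pos = "NN"];
q1 = match_pairs(tokens, lambda t: t[1] == "JJ", lambda t: t[1] == "NN")
# Q2 = "acid" "rain";
q2 = match_pairs(tokens, lambda t: t[0] == "acid", lambda t: t[0] == "rain")

# intersect Q1 Q2;
both = q1 & q2

# [word = "acid" & pos = "JJ"] [word = "rain" & pos = "NN"]
combined = match_pairs(
    tokens,
    lambda t: t == ("acid", "JJ"),
    lambda t: t == ("rain", "NN"),
)
print(sorted(both), both == combined)
```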

8 Structural restrictions
"give" []* "up" within s;
("gain" []* "profit") | ("profit" []* "gain") within 3 s;
("gain" []* "profit") | ("profit" []* "gain") within article;
"Clinton" expand left to 2 s;

9 Defining structural restrictions
Nounphrase = [pos = "DT"] [pos = "JJ"] [pos = "NN"];
Nounphrase;
[pos = "JJ"]
Go back to select

10 For fun
[pos = "V.*"][pos = "PN.*"] []* [pos = "V.*"][pos = "PN.*"] ( [pos = "V.*"] [pos = "PN.*"]) within s
Not a question, not beginning of sentence…

11 less is more
less
cat ??/* | less
Switches
SPACE - next screenful
b - previous screenful
/ - search for pattern (e.g. /RNR)
? - search backwards for pattern
q - quit

12 Searching for a word
tgrep Halloween
What happens? Why don’t you have to specify a file?
babel> grep tgrep .cshrc
# tgrep stuff
#setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp
setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp
Count results:
tgrep research | wc -l
cat ??/* | grep Halloween | wc -l

13 Tgrep Switches
-a Match on all patterns in a sentence
-w Return the whole sentence
-n Put the entire string on one line
-t Print only the terminals

14 Viewing it in sentential context
tgrep -wn Halloween | more
tgrep -wn research | more (20,865 hits)
Can also use less

15 Viewing it in sentential context
tgrep -wn research | more

16 Searching by POS
tgrep NNS | more
Another way to do your sanity check

17 See more data?
tgrep NNS | grep . | more

18 Sentential context (again)
tgrep -wn NNS | more

19 Searching by syntactic constituent
tgrep NP | more

20 Single-line outputs
tgrep -n NP | more

21 Viewing tree-like output
tgrep -w NP | head -20

22 Searching for relations between nodes
tgrep 'NP < CC' | head -16

23 tgrep -g (whole language)
A < B: A immediately dominates B
A > B: A is immediately dominated by B
A << B: A dominates B
A >> B: A is dominated by B
A . B: A immediately precedes B
A .. B: A precedes B
A <<, B: B is the leftmost descendant of A
A <<` B: B is the rightmost descendant of A
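The difference between immediate dominance (A < B) and dominance (A << B) is easiest to see on a small tree. The sketch below is plain Python, not tgrep; the bracketed sentence is invented, and the helpers `parse`, `immediately_dominates` and `dominates` are hypothetical names. It matches node labels exactly and ignores everything else tgrep supports.

```python
def parse(s):
    """Parse a bracketed tree into nested (label, children) tuples.
    Leaves (words) are kept as plain strings."""
    toks = s.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = toks[pos]
        pos += 1
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(node())
            else:
                children.append(toks[pos])
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()

def label(n):
    return n if isinstance(n, str) else n[0]

def children(n):
    return [] if isinstance(n, str) else n[1]

def descendants(n):
    for c in children(n):
        yield c
        yield from descendants(c)

def nodes(tree):
    yield tree
    yield from descendants(tree)

def immediately_dominates(tree, a, b):
    """Nodes labelled a with a child labelled b (like 'a < b')."""
    return [n for n in nodes(tree)
            if label(n) == a and any(label(c) == b for c in children(n))]

def dominates(tree, a, b):
    """Nodes labelled a with any descendant labelled b (like 'a << b')."""
    return [n for n in nodes(tree)
            if label(n) == a and any(label(d) == b for d in descendants(n))]

tree = parse("(S (NP (NP (NNS cats)) (CC and) (NP (NNS dogs))) (VP (VBP sleep)))")
print(len(immediately_dominates(tree, "NP", "CC")),  # only the coordinated NP
      len(dominates(tree, "NP", "NNS")))             # all three NPs
```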

24 Alternation
Node names can be ORed, e.g.
tgrep 'Clinton|Gore' | head

25 Character classes
Regular expressions
tgrep '/[Cc]hild/' | egrep . | head

26 Working towards that weird example…
tgrep '/[Pp]resident/' | head

27 Combining alternation and a regular expression
tgrep 'Clinton|Gore|/[Pp]resident/' | head
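A node-name pattern that mixes literal alternatives with a regular expression can be read as "match if any alternative matches". The Python sketch below illustrates that reading only; it assumes literal names must match exactly and that a /.../ alternative matches anywhere inside the node name, and the helper `name_matches` is a hypothetical name, not part of tgrep.

```python
import re

def name_matches(alternatives, node_name):
    """True if node_name matches any alternative.
    Alternatives wrapped in slashes are treated as unanchored regular
    expressions; all others must equal the node name exactly."""
    for alt in alternatives:
        if len(alt) >= 2 and alt.startswith("/") and alt.endswith("/"):
            if re.search(alt[1:-1], node_name):
                return True
        elif alt == node_name:
            return True
    return False

# The alternatives from 'Clinton|Gore|/[Pp]resident/', already split up.
pattern = ["Clinton", "Gore", "/[Pp]resident/"]

for name in ["Gore", "Gores", "President", "presidents", "Bush"]:
    print(name, name_matches(pattern, name))
```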

28 Searching for a transitive verb
tgrep -w 'VP << like < NP << DT' | more

29 Verbs + Particles
tgrep -w 'VP kick
tgrep 'VP << /kick.*/ <2 PRT' kick
tgrep 'VP <1 VB <2 PRT' kick
tgrep -nw 'VP <1 /VB.*/ <2 PRT' kick
tgrep 'VP <1 (VB < kick) <2 PRT' kick
tgrep 'VP <1 (/VB.*/ < kick) <2 PRT' kick
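The most specific pattern on this slide, 'VP <1 (/VB.*/ < kick) <2 PRT', asks for a VP whose first child is a verb tag directly over the word kick and whose second child is a PRT. A small Python sketch of that check, under the assumption that <1 and <2 pick out the first and second child; the trees and the helper `verb_particle` are invented for illustration and this is not how tgrep is implemented.

```python
import re

# Trees as (label, [children]); leaves are plain strings. Invented examples.
vp_match = ("VP", [("VB", ["kick"]),
                   ("PRT", [("RP", ["off"])]),
                   ("NP", [("DT", ["the"]), ("NN", ["season"])])])
vp_inflected = ("VP", [("VBD", ["kicked"]),
                       ("PRT", [("RP", ["off"])])])
vp_no_particle = ("VP", [("VB", ["kick"]),
                         ("NP", [("DT", ["the"]), ("NN", ["ball"])])])

def label(n):
    return n if isinstance(n, str) else n[0]

def kids(n):
    return [] if isinstance(n, str) else n[1]

def verb_particle(node, verb_tag_re, word):
    """Check a VP node: first child has a tag matching verb_tag_re and
    directly dominates `word`; second child is labelled PRT."""
    k = kids(node)
    if label(node) != "VP" or len(k) < 2:
        return False
    first, second = k[0], k[1]
    return (re.search(verb_tag_re, label(first)) is not None
            and word in [label(c) for c in kids(first)]
            and label(second) == "PRT")

for vp in (vp_match, vp_inflected, vp_no_particle):
    print(verb_particle(vp, r"VB.*", "kick"))
```

Because the word is compared literally, the inflected form kicked is not found here; on the slide, the variant with /kick.*/ is the one that relaxes the word match.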

