1 Czech Verbs of Communication and the Extraction of their Frames
Václava Benešová and Ondřej Bojar
Institute of Formal and Applied Linguistics, {benesova,bojar}@ufal.mff.cuni.cz
TSD, Brno, 13.9.2006

2 Introduction
1. VALLEX, Valency Lexicon of Czech Verbs
2. Automatic Identification of Verbs of Communication
3. Frame Suggestion
4. Conclusion

3 1. Valency Lexicon of Czech Verbs, VALLEX 1.x, and its Verb Classes
● Verb Classes in VALLEX
● Verbs of Communication

4 VALLEX
● Theoretical background: Functional Generative Description (FGD)
● Valency: “ability of lexical units to bind other lexical units”
● Versions: 1.0, internal 1.5, 2.0 (autumn 2006) (almost 4300 entries)
● Corpus coverage (Czech National Corpus): about 10% of verb occurrences, those of verbs with low corpus frequency, are not covered (approx. 28000 lemmas)

5 Verb Entry in VALLEX
● Verb entry: set of valency frame(s)
● Valency frame: sequence of slots (functor, morphemic realization, type of complement)
● Attributes of valency frames: gloss, example, …, class
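As a rough illustration of the entry structure described on this slide, the following Python sketch models a verb entry as a list of valency frames, each holding slots and attributes. The class names, field names, complement-type labels, and the sample entry for říci are assumptions made for illustration; they are not the actual VALLEX data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Slot:
    functor: str            # e.g. "ACT", "ADDR", "PAT"
    forms: Tuple[str, ...]  # morphemic realizations, e.g. ("1",) or ("4", "že")
    complement_type: str    # illustrative label, e.g. "obligatory"


@dataclass
class ValencyFrame:
    slots: List[Slot]
    gloss: str = ""
    example: str = ""
    verb_class: Optional[str] = None  # not every frame carries a class


@dataclass
class VerbEntry:
    lemma: str
    frames: List[ValencyFrame] = field(default_factory=list)


# Hypothetical entry, loosely modelled on a communication frame.
rici = VerbEntry(
    lemma="říci",
    frames=[
        ValencyFrame(
            slots=[
                Slot("ACT", ("1",), "obligatory"),
                Slot("ADDR", ("3",), "obligatory"),
                Slot("PAT", ("4", "že"), "obligatory"),
            ],
            gloss="say",
            verb_class="communication",
        )
    ],
)
```

Keeping `verb_class` optional mirrors the fact that only part of the frames in the lexicon have a class assigned.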

6 Verb Classes in VALLEX
● Classification:
  ● in progress
  ● built from below
  ● emphasis on syntactic criteria
● Classes: communication, mental action, perception, psych verb, exchange, change, phase verbs, phase of action, modal verbs, motion, transport, location, …

                                     VALLEX 1.0       VALLEX 1.5
Total Verb Entries                   1,437            2,476
Total Verb Lemmas                    1,081            1,844
Total Valency Frames                 4,239            7,080
Valency Frames with Class            1,591 [37.5%]    3,156 [44.6%]
Total Classes                        16               23
Frame Types in Class on Average      6.1              6.1

7 Communication Verbs in VALLEX
‘a speaker conveys information to a recipient’

ACT      ADDR             PAT/EFF
{nom}    {gen/dat/acc}    {dc,...}

● simple information: {říci: say, informovat: inform, …} + THAT: že → verbs of announcement
● question: {ptát se: ask, …} + WHETHER, IF: zda, jestli → interrogative verbs
● commands, bans, warnings, …: {nakázat: order, zakázat: prohibit, …} + IN ORDER TO, LET: aby, ať → imperative verbs

                                VALLEX 1.0    VALLEX 1.5
verbs of announcement: že       191           276
interrogative verbs: zda        87            135
imperative verbs: aby           74            105

8 2. Automatic Identification of Verbs of Communication
● Evaluation: VALLEX vs. FrameNet

9 Automatic Identification of Verbs of Communication
Search the corpus for the pattern V+N234+subord{aby,zda,že} and mark a verb as a communication verb if enough occurrences are found.
Weak points:
1. eliminates nominal structures: ‘He said the truth about the killer.’ ‘He gave her many presents.’ (verb of exchange)
2. ignores examples where a complement was not expressed on the surface layer: ‘He said that …’
3. homonymy of conjunctions: že (that) and aby (in order to): ‘He has done it in order to make money…’
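The pattern search can be pictured with a small Python sketch. It assumes that the corpus has already been reduced to one tuple per verb occurrence (lemma, case of an accompanying noun, subordinating conjunction), that N234 stands for a noun in case 2, 3 or 4 (genitive, dative, accusative), and that "enough occurrences" means an absolute count threshold. The function name, the `min_hits` parameter and the toy data are hypothetical; the slides do not specify the actual thresholding.

```python
from collections import Counter

SUBORD = {"aby", "zda", "že"}   # conjunctions named in the pattern
OBJECT_CASES = {2, 3, 4}        # assumed reading of "N234"


def communication_candidates(observations, min_hits=2):
    """Return verbs seen at least `min_hits` times in the V+N234+subord pattern.

    `observations` holds (verb_lemma, noun_case_or_None, conjunction_or_None).
    """
    hits = Counter()
    for verb, noun_case, conjunction in observations:
        if noun_case in OBJECT_CASES and conjunction in SUBORD:
            hits[verb] += 1
    return {verb for verb, count in hits.items() if count >= min_hits}


# Toy data, invented for illustration only.
observations = [
    ("říci", 3, "že"),
    ("říci", 3, "že"),
    ("dát", 3, None),        # noun but no subordinate clause
    ("dát", 4, None),
    ("ptát se", 2, "zda"),   # matches, but only once
    ("udělat", None, "aby"), # conjunction without a noun complement
]

candidates = communication_candidates(observations, min_hits=2)
```

The sketch also makes weak point 2 visible: an occurrence with no surface noun complement, such as the last tuple, never counts towards the threshold.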

10 Evaluation against VALLEX and FrameNet
● golden standards: VALLEX 1.0, VALLEX 1.5, FrameNet 1.2
● ROC curves
TP … true positives (communication verbs according to a golden standard and above the threshold)
FP … false positives (non-communication verbs above the given threshold)
TPR = TP / P (P is the total number of communication verbs) … true positive rate
TNR = TN / N (N is the total number of verbs with no sense of communication) … true negative rate
● 40–50% of communication verbs identified correctly (for both VALLEX and FrameNet)
● 20% falsely marked
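A single point of such a ROC curve can be computed as in the following Python sketch. It assumes each verb has a numeric score (for example its pattern count) and that the golden standard is a set of communication verbs; the function name and the toy scores are hypothetical. The sketch reports the false positive rate FP / N, which equals 1 - TNR.

```python
def roc_point(scores, gold, threshold):
    """Return (TPR, FPR) for verbs whose score is above `threshold`.

    `scores` maps verb -> score; `gold` is the set of communication verbs.
    """
    predicted = {verb for verb, score in scores.items() if score > threshold}
    positives = {verb for verb in scores if verb in gold}      # P
    negatives = {verb for verb in scores if verb not in gold}  # N
    tp = len(predicted & positives)
    fp = len(predicted & negatives)
    tpr = tp / len(positives) if positives else 0.0
    fpr = fp / len(negatives) if negatives else 0.0
    return tpr, fpr


# Toy scores, invented for illustration only.
scores = {"říci": 5, "ptát se": 1, "dát": 3, "jít": 0, "spát": 0}
gold = {"říci", "ptát se"}

tpr, fpr = roc_point(scores, gold, threshold=2)
```

Sweeping `threshold` over all observed score values and plotting the resulting (FPR, TPR) pairs gives the full curve.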

11 3. Frame Suggestion
● Frame Edit Distance and Verb Entry Similarity
● Experimental Results

12 Frame Edit Distance and Verb Entry Similarity
● FED: the number of edit operations (insert, delete, replace) necessary to convert a hypothesized frame to a correct frame
● ES (entry similarity or expected saving):

$$\mathrm{ES} = 1 - \frac{\min \mathrm{FED}(G, H)}{\mathrm{FED}(G, \emptyset) + \mathrm{FED}(H, \emptyset)}$$

G … golden verb entries of this base lemma
H … hypothesized entries
Ø … blank verb entry
ES 0% (suggesting nothing), ES 100% (golden frames)
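The ES definition can be tried out with a deliberately simplified Python sketch. It assumes that each entry consists of exactly one frame, so the minimum over frame pairings is trivial, and that a frame is a set of atomic slot labels such as "ACT(1)", so one replace operation repairs one superfluous and one missing slot at once. The function names are hypothetical, and the actual metric may count functor and morphemic-form edits separately.

```python
def fed(hyp, gold):
    """Edit operations (insert, delete, replace) turning slot set `hyp` into `gold`."""
    extra = len(hyp - gold)     # slots that would have to be deleted
    missing = len(gold - hyp)   # slots that would have to be inserted
    # A replace fixes one extra and one missing slot in a single operation.
    return max(extra, missing)


def entry_similarity(gold, hyp):
    """ES = 1 - FED(G, H) / (FED(G, Ø) + FED(H, Ø)) for single-frame entries."""
    blank = frozenset()
    denominator = fed(blank, gold) + fed(blank, hyp)
    if denominator == 0:        # both entries blank: nothing left to edit
        return 1.0
    return 1 - fed(hyp, gold) / denominator


gold = frozenset({"ACT(1)", "ADDR(3)", "PAT(4)"})
hyp = frozenset({"ACT(1)", "PAT(4)"})

es = entry_similarity(gold, hyp)   # one insert needed; denominator 3 + 2
```

Under these assumptions a blank suggestion yields ES = 0 and the golden frame itself yields ES = 1, matching the two boundary cases stated above.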

13 Experimental Results with ES

Suggested frames                                                    ES [%]
Specific frame for verbs of communication, default for others      38.00
Baseline 1: ACT(1)                                                  26.69
Baseline 2: ACT(1) PAT(4)                                           37.55
Baseline 3: ACT(1) ADDR(3,4) PAT(4)                                 35.70
Baseline 4: Two typical frames: ACT(1) PAT(4)                       39.11

14 Conclusion
● Automatic identification of communication verbs according to the proposed pattern V+N234+subord{aby,zda,že} performs satisfactorily (40–50% true positives against VALLEX and FrameNet, 20% false positives)
● FED reveals that more lexicographic labour could be saved by suggesting more than one frame per verb → need to focus on other classes, too

