
1 Stefan Evert (IMS, Uni Stuttgart), Brigitte Krenn (ÖFAI Wien)
Conclusion: There is no single best strategy to extract an optimal set of candidate data from a corpus. You need to know at least some structural and distributional properties of the phenomena you are searching for. Preparation of candidate data influences distributions. Distributional properties determine the outcome of AMs. Know the distributional assumptions underlying the AMs you use.

2 Adj-N
- As for Java being slow as molasses, that's often a red herring.
- In the first place, the less genes, more behavioural flexibility argument is a total red herring.
- Smadja gives the example that the probability that any two adjacent words in a text will be "red herring" is greater than the probability of "red" times the probability of "herring". We can, therefore, use statistical methods to find collocations [Smadja, 1993].
- Okay, I think I misled you by introducing a red herring, for which I am very sorry.
- Home page for the British Red Cross, includes information on activities, donations, branch addresses, ...
- Welcome to Red Pepper, independent magazine of the green and radical left.
- Hotel Accommodations by Red Roof Inn where you will find affordable Hotel Rates.
- Web site of the Royal Air Force aerobatic team, the Red Arrows.
- Hello, I am called Richard Herring and this is my web-site.

3 Extraction of Cooccurrence Data

4 Basic Assumptions
- The collocates of a collocation cooccur more frequently within text than arbitrary word combinations. (Recurrence)
- Stricter control of cooccurrence data leads to more meaningful results in collocation extraction.

5 Word (Co)occurrence
- The distribution of words and word combinations in text is approximately described by Zipf's law.
- The distribution of combinations is more extreme than that of individual words.

6 Word (Co)occurrence
Zipf's law: n_m is the number of different words occurring exactly m times, i.e. there is a large number of low-frequency words and few high-frequency ones.
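The slide's formula did not survive extraction; a common statement of Zipf's law for the frequency spectrum (sometimes called Zipf's second law) is the following, where C is a corpus-dependent constant and the exponent 2 is an idealisation:

```latex
n_m \approx \frac{C}{m^{2}}
```

so the number of types n_m decays rapidly as the occurrence count m grows.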

7 An Example
- corpus size: 8 million words from the Frankfurter Rundschau corpus
- 569,310 PNV combinations (types) were selected from the extraction corpus, including main verbs, modals and auxiliaries
- considering only combinations with main verbs, the number of PNV types reduces to 372,212 (full forms), corresponding to 454,088 instances

8 [Chart: distribution of PNV types according to frequency, classes c=1, c=2, c>=3 (372,212 types)]

9 [Chart: distribution of PNV types according to frequency, classes c=3, c=4, 5<=c<=10, c>10 (10,430 types with f>2)]

10 Word (Co)occurrence
- Collocations are preferably found among highly recurrent word combinations extracted from text.
- Large amounts of text need to be processed to obtain a sufficient number of high-frequency combinations.

11 Control of Candidate Data
- Extract collocations from relational bigrams:
  - syntactic homogeneity of candidate data
  - (grammatical) cleanness of candidates, e.g. N+V pairs: Subject+V vs. Object+V
- Text type, domain, and size of the source corpus influence the outcome of collocation extraction.

12 Terminology
- Extraction corpus: tokenized, PoS-tagged or syntactically analysed text
- Base data: list of bigrams found in the corpus
- Cooccurrence data: bigrams with contingency tables
- Collocation candidates: ranked bigrams

13 Types and Tokens
Frequency counts (from corpora):
- identify labelled units (tokens), e.g. words, NPs, Adj-N pairs
- set of different labels (types)
- type frequency = number of tokens labelled with this type
- example: "... what the black box does ..."
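A minimal Perl sketch of the token/type distinction; the token list is a hypothetical extension of the slide's example fragment:

```perl
# Token/type counting: each running word is a token,
# each distinct word form is a type.
my @tokens = qw(what the black box does the box);  # hypothetical token list
my %freq;                          # type => frequency
$freq{$_}++ foreach @tokens;       # every token increments its type's count

my $n_tokens = scalar @tokens;     # number of tokens
my $n_types  = scalar keys %freq;  # number of types
```

Here the type "box" has type frequency 2, because two tokens carry that label.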


15 Types and Tokens
Counting cooccurrences:
- bigram tokens = pairs of word tokens
- bigram types = pairs of word types
- contingency table = four-way classification of bigram tokens according to their components

16 Contingency Tables
contingency table for pair type (u, v)
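The table graphic is lost; the standard contingency table for a pair type (u, v), in terms of the joint frequency f(u,v), the marginal frequencies f_1(u) and f_2(v), and the sample size N, is:

```latex
\begin{array}{c|cc|c}
        & v                        & \neg v                                & \\ \hline
u       & O_{11} = f(u,v)          & O_{12} = f_1(u) - f(u,v)              & f_1(u) \\
\neg u  & O_{21} = f_2(v) - f(u,v) & O_{22} = N - f_1(u) - f_2(v) + f(u,v) & N - f_1(u) \\ \hline
        & f_2(v)                   & N - f_2(v)                            & N
\end{array}
```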

17 Collocation Extraction: Processing Steps
Corpus preprocessing:
- tokenization (orthographic words)
- PoS-tagging
- morphological analysis / lemmatization
- partial parsing (full parsing)

18 Collocation Extraction: Processing Steps
Extraction of base data from the corpus:
- adjacent word pairs
- Adj-N pairs from NP chunks
- Object-V & Subject-V from parse trees
Calculation of cooccurrence data:
- compute a contingency table for each pair type (u, v)

19 Collocation Extraction: Processing Steps
Ranking of cooccurrence data by "association scores":
- association measures (AMs) measure the statistical association between types u and v
- true collocations should obtain high scores
- N-best list = listing of the N highest-ranked collocation candidates

20 Base Data: How to Get It?
Adj-N:
- adjacency data
- numerical span
- NP chunking
- (lemmatized)

21 Base Data: How to Get It?
V-N:
- adjacency data
- sentence window
- (partial) parsing
- identification of grammatical relations
- (lemmatized)

22 Base Data: How to Get It?
PP-V:
- adjacency data
- PP chunking
- separable verb particles (in German)
- (full syntactic analysis)
- (lemmatization?)

23 Adj-N
In the first place, the less genes, more behavioural flexibility argument is a total red herring.
In/PRP the/ART first/ORD place/N ,/$, the/ART /$ less/ADJ genes/N ,/$, more/ADJ behavioural/ADJ flexibility/N /$ argument/N is/V a/ART total/ADJ red/ADJ herring/N ./$.

24 Adj-N: pos(w_i) = N
span size 1 (adjacency): w_j, j = -1
- first/ORD place/N
- less/ADJ genes/N
- behavioural/ADJ flexibility/N
- /$ argument/N
- red/ADJ herring/N

25 Adj-N: pos(w_i) = N
span size 2: w_j, j = -2, -1
- first/ORD place/N
- the/ART place/N
- less/ADJ genes/N
- /$ genes/N
- behavioural/ADJ flexibility/N
- more/ADJ flexibility/N
- /$ argument/N
- flexibility/N argument/N
- red/ADJ herring/N
- total/ADJ herring/N

26 Adj-N: pos(w_j) = ADJ, pos(w_i) = N
span size 2: w_j, j = -2, -1 (pairs with a non-ADJ left neighbour filtered out)
- less/ADJ genes/N
- behavioural/ADJ flexibility/N
- more/ADJ flexibility/N
- red/ADJ herring/N
- total/ADJ herring/N

27 Adj-N
(S (PP In/PRP (NP the/ART first/ORD place/N)) ,/$,
   (NP the/ART /$ less/ADJ genes/N ,/$, more/ADJ behavioural/ADJ flexibility/N /$ argument/N)
   (VP is/V (NP a/ART total/ADJ red/ADJ herring/N))
   ./$.)

28 Adj-N
(S (PP-mod In/PRP (NP the/ART first/ORD place/N)) ,/$,
   (NP-subj the/ART /$ less/ADJ genes/N ,/$, more/ADJ behavioural/ADJ flexibility/N /$ argument/N)
   (VP-copula is/V (NP a/ART total/ADJ red/ADJ herring/N))
   ./$.)

29 Adj-N: NP chunks
NP chunks:
- (NP the/ART first/ORD place/N)
- (NP the/ART /$ less/ADJ genes/N ,/$, more/ADJ behavioural/ADJ flexibility/N /$ argument/N)
- (NP a/ART total/ADJ red/ADJ herring/N)
Adj-N pairs:
- less/ADJ genes/N
- more/ADJ flexibility/N
- behavioural/ADJ flexibility/N
- more/ADJ argument/N
- behavioural/ADJ argument/N
- total/ADJ herring/N
- red/ADJ herring/N

30 N-V: Object-VERB
spill the beans: "Good for you for guessing the puzzle but from the beans Mike spilled to me, I think those kind of twists are more maddening than fun."
bury the hatchet: "Paul McCartney has buried the hatchet with Yoko Ono after a dispute over the songwriting credits of some of the best-known Beatles songs."

31 N-V: Object-Mod-VERB
keep nose to the grindstone:
- "I'm very impressed with you for having kept your nose to the grindstone, I'd like to offer you a managerial position."
- "We've learned from experience and kept our nose to the grindstone to make sure our future remains a bright one."
- "She keeps her nose to the grindstone."

32 N-V: Object-Mod-VERB
keep nose to the grindstone:
(VP {kept, keeps, ...}
    {(NP-obj your nose), (NP-obj our nose), (NP-obj her nose), ...}
    (PP-mod to the grindstone))

33 PN-V: P-Object-VERB
- zur Verfügung stellen (make available): Peter stellt sein Auto Maria zur Verfügung (Peter makes his car available to Maria)
- in Frage stellen (question): Peter stellt Marias Loyalität in Frage (Peter questions Maria's loyalty)
- in Verbindung setzen (contact): Peter setzt sich mit Maria in Verbindung (Peter contacts Maria)

34 Contingency Tables for Relational Cooccurrences
pair tokens: (big, dog) (black, box) (black, dog) (small, cat) (small, box) (black, box) (old, box) (tabby, cat)
pair type: (u, v) = (black, box)


40 Contingency Tables for Relational Cooccurrences
pair tokens: (big, dog) (black, box) (black, dog) (small, cat) (small, box) (black, box) (old, box) (tabby, cat)
f(u,v) = 2, f_1(u) = 3, f_2(v) = 4, N = 8


43 Contingency Tables for Relational Cooccurrences
real data from the BNC (adjacent adj-noun pairs, lemmatised)

44 Contingency Tables in Perl

# collect joint, marginal, and sample-size counts
%F = (); %F1 = (); %F2 = ();
$N = 0;
while (($u, $v) = get_pair()) {
    $F{"$u,$v"}++;   # joint frequency of the pair type
    $F1{$u}++;       # marginal frequency of u
    $F2{$v}++;       # marginal frequency of v
    $N++;            # total number of pair tokens
}

45 Contingency Tables in Perl

foreach $pair (keys %F) {
    ($u, $v) = split /,/, $pair;
    $f  = $F{$pair};
    $f1 = $F1{$u};
    $f2 = $F2{$v};
    $O11 = $f;
    $O12 = $f1 - $f;
    $O21 = $f2 - $f;
    $O22 = $N - $f1 - $f2 + $f;  # note "+ $f": tokens of (u,v) appear in both marginals
    # ...
}
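Continuing from the counts above, a toy sketch of the ranking step: pointwise mutual information is used here merely as one example of an association measure, and the counts are the toy (black, box) data from slide 40; variable names are illustrative.

```perl
# Rank pair types by an association score; pointwise MI is used
# here only as an example AM: MI = log2( f * N / (f1 * f2) ).
my %F  = ('black,box' => 2, 'black,dog' => 1, 'old,box' => 1);
my %F1 = (black => 3, old => 1);   # marginal frequencies of u
my %F2 = (box => 4, dog => 1);     # marginal frequencies of v
my $N  = 8;                        # total number of pair tokens

my %score;
foreach my $pair (keys %F) {
    my ($u, $v) = split /,/, $pair;
    $score{$pair} = log($F{$pair} * $N / ($F1{$u} * $F2{$v})) / log(2);
}
# N-best list: candidates sorted by descending association score
my @nbest = sort { $score{$b} <=> $score{$a} } keys %score;
```

Note that MI is known to promote low-frequency combinations; this sensitivity to distributional properties is exactly why the choice of AM matters.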

46 Reminder: Contingency Table with Row and Column Sums

47 Why are Positional Cooccurrences Different?
adjectives and nouns cooccurring within sentences:
- "I saw a black dog": (black, dog); f(black, dog) = 1, f_1(black) = 1, f_2(dog) = 1
- "The old man with the silly brown hat saw a black dog": (old, dog), (silly, dog), (brown, dog), (black, dog), ..., (black, man), (black, hat); f(black, dog) = 1, f_1(black) = 3, f_2(dog) = 4

48 Why are Positional Cooccurrences Different?
- "wrong" combinations could be considered as extraction noise (association measures distinguish noise from recurrent combinations)
- but: very large amount of noise
- statistical models assume that noise is completely random
- but: marginal frequencies often increase in large steps

49 Contingency Tables for Segment-Based Cooccurrences
- cooccurrence within pre-determined segments (e.g. sentences)
- components of cooccurring pairs may be syntactically restricted (e.g. adj-noun, noun_Sg-verb_3Sg)
- for a given pair type (u, v), the set of all sentences is classified into four categories

50 Contingency Tables for Segment-Based Cooccurrences
- u in S: at least one occurrence of u in sentence S
- u not in S: no occurrence of u in sentence S
- v in S: at least one occurrence of v in sentence S
- v not in S: no occurrence of v in sentence S

51 Contingency Tables for Segment-Based Cooccurrences
- f_S(u,v) = number of sentences containing both u and v
- f_S(u) = number of sentences containing u
- f_S(v) = number of sentences containing v
- N_S = total number of sentences

52 Frequency Counts for Segment-Based Cooccurrences
adjectives and nouns cooccurring within sentences:
- "I saw a black dog": (black, dog); f_S(black, dog) = 1, f_S(black) = 1, f_S(dog) = 1
- "The old man with the silly brown hat saw a black dog": (old, dog), (silly, dog), (brown, dog), (black, dog), ..., (black, man), (black, hat); f_S(black, dog) = 1, f_S(black) = 1, f_S(dog) = 1

53 Segment-Based Cooccurrences in Perl

foreach $S (@sentences) {                      # iterate over all segments
    %words = map { $_ => 1 } words($S);        # word types occurring in S
    %pairs = map { $_ => 1 } pairs($S);        # pair types occurring in S
    foreach $w (keys %words) { $FS_w{$w}++; }  # f_S(u): count each type once per sentence
    foreach $p (keys %pairs) { $FS_p{$p}++; }  # f_S(u,v)
    $NS++;                                     # N_S
}
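Given the segment-based counts collected above, the contingency table for one pair type follows directly; the sketch below uses hypothetical toy counts and illustrative variable names.

```perl
# Contingency table for a pair type (u,v) from segment-based counts:
# O11 = sentences containing both, O12/O21 = one without the other,
# O22 = sentences containing neither. Toy counts are hypothetical.
my %FS_w = (black => 5, dog => 4);   # f_S(u), f_S(v)
my %FS_p = ('black,dog' => 3);       # f_S(u,v)
my $NS   = 10;                       # N_S

my ($u, $v) = ('black', 'dog');
my $O11 = $FS_p{"$u,$v"};            # both u and v in S
my $O12 = $FS_w{$u} - $O11;          # u but not v
my $O21 = $FS_w{$v} - $O11;          # v but not u
my $O22 = $NS - $O11 - $O12 - $O21;  # neither u nor v
```

The four cells always sum to N_S, since every sentence falls into exactly one category.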

54 Contingency Tables for Distance-Based Cooccurrences
- problems are similar to segment-based cooccurrence data
- but: no pre-defined segments; accurate counting is difficult
- here: sketched for a special case
  - all orthographic words
  - numerical span: n_L words to the left, n_R to the right
  - no stop word lists


59 Contingency Tables for Distance-Based Cooccurrences
n_L = 3, n_R = 2
- mark the occurrences of v
- form the window W(v) around them
- mark the occurrences of u
- cross-classify the occurrences of u against the window W(v)
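The steps above can be sketched in Perl; the token sequence is a hypothetical toy example, with n_L = 3 and n_R = 2 as on the slide.

```perl
# Cross-classify occurrences of u against the window W(v):
# W(v) covers nL tokens to the left and nR to the right of each v.
my @tok = qw(u a b c v u c u a b);   # hypothetical token sequence
my ($u, $v) = ('u', 'v');
my ($nL, $nR) = (3, 2);

my %in_window;                       # token positions covered by W(v)
for my $i (0 .. $#tok) {
    next unless $tok[$i] eq $v;
    for my $j ($i - $nL .. $i + $nR) {
        $in_window{$j} = 1 if $j >= 0 && $j <= $#tok && $j != $i;
    }
}
my ($inside, $outside) = (0, 0);     # occurrences of u inside / outside W(v)
for my $i (0 .. $#tok) {
    next unless $tok[$i] eq $u;
    $in_window{$i} ? $inside++ : $outside++;
}
```

Marking window positions in a hash before counting ensures that overlapping windows around nearby occurrences of v are not counted twice.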

60 Contingency Tables for Distance-Based Cooccurrences


62 N-V: Subject-Verb Examples

63 Cooccurrence Data: Token-Type Distinction

64 Extraction Strategies
required:
- PoS-tagging
- basic phrase chunking
- infinitives with "zu" (to) are treated like single words; separated verb prefixes are reattached to the verb

65 Extraction Strategies
Full forms or base forms?
- depends on language and collocation type
- required: morphological analysis

66 3 Extraction Strategies
- Strategy 1: retrieval of n-grams from word forms only (w_i)
- Strategy 2: retrieval of n-grams from part-of-speech annotated word forms (wt_i)
- Strategy 3: retrieval of n-grams from word forms with particular parts-of-speech, at particular positions in syntactic structure (wt_i c_j)

67 Spans Tested
- w_i w_i+1
- w_i w_i+1 w_i+2
- w_i w_i+2 w_i+3
- w_i w_i+3 w_i+4

68 Results of Strategy 1
Retrieval of PP-verb collocations from word forms only is clearly inappropriate, as function words like articles, prepositions, conjunctions, pronouns, etc. outnumber content words such as nouns, adjectives and verbs. Blunt use of stop word lists leads to the loss of collocation-relevant information, as access to prepositions and determiners may be crucial for distinguishing collocational from non-collocational word combinations.

69 Results of Strategy 1
most useful/informative span: w_i w_i+1 w_i+2
examples (trigram: frequency):
- bis & 17 & Uhr: 2222
- FRANKFURT & A. & M.: 949
- in & diesem & Jahr: 915
- um & 20 & Uhr: 855
- Di. & bis & Fr & bis &
- Tips & und & Termine: 597
- in & der & Nacht: 582

70 What we have learned
- the useful/informative span size is language specific
- we find a number of different constructions, e.g. NP, PP, ...
- names, time phrases, conventionalized constructions, ...

71 Results of Strategy 2
wt_i wt_i+1 with preposition t_i and noun t_i+1:
- PPs with arbitrary preposition-noun cooccurrences such as am Samstag (on Saturday), am Wochenende (at the weekend), für Kinder (for children)
- fixed/conventionalized(?) PPs such as zum Beispiel (for example)

72 Results of Strategy 2
wt_i wt_i+1 with preposition t_i and noun t_i+1:
- PPs with a strong tendency towards a particular continuation, such as nach Angaben + NP_gen ("according to"), im Jahr + Card (in the year)
- potential PP-collocates of verb-object collocations such as zur Verfügung (at the disposal)

73 Results of Strategy 2
wt_i wt_i+2 with preposition t_i and noun t_i+2:
- typically covers PPs with pre-nominal modification
- Cardinal, for instance, is the most probable modifier category cooccurring with bis ... Uhr (until ... o'clock)
- Adjective is the predominant modifier category related to im ... Jahr (1272 of 1276 cases total), e.g. vergangenen (Adj, "last", 466 instances)

74 Results of Strategy 2
wt_i wt_i+3 with preposition t_i and noun t_i+3:
- typically exceeds phrase boundaries
- im ... Jahres (in_dat ... year_gen), for instance, originates from PP NP_gen, e.g. im September dieses Jahres (in September of this year)

75 Results of Strategy 2
wt_i wt_i+1 wt_i+2 with preposition t_i, noun t_i+1 and verb t_i+2:
Frequent preposition-noun-participle or -infinitive sequences are good indicators for PP-verb collocations, especially for collocations that function as predicates, such as support-verb constructions and a number of figurative expressions.
- zur Verfügung gestellt (made available)
- in Frage gestellt (questioned)
- in Verbindung setzen (to contact)

76 Results of Strategy 2
wt_i wt_i+2 wt_i+3 with preposition t_i, noun t_i+2 and verb t_i+3;
wt_i wt_i+3 wt_i+4 with preposition t_i, noun t_i+3 and verb t_i+4:
- a variety of PPs with prenominal modification are covered, but phrase boundaries are also more likely to be exceeded
- durch Frauen helfen vs. durch X (Y) Frauen helfen

77 Results of Strategy 3
wt_i c_k, wt_j c_k, wt_l c_m

PP-Collocate | V-Collocate | as Right Neighbour | as Cooccurring Main Verb
zur Verfügung | stehen | |
zur Verfügung | stellen | |
in Kraft | treten | 991 | 26
in Kraft | setzen | 12 | 23
in Kraft | bleiben | 0 | 5

78 Conclusion
There is no single best strategy to extract an optimal set of candidate data from a corpus. You need to know at least some structural and distributional properties of the phenomena you are searching for. Preparation of candidate data influences distributions. Distributional properties determine the outcome of AMs. Know the distributional assumptions underlying the AMs you use.

