1 Without data, nothing. Adam Kilgarriff, Lexical Computing Ltd, University of Leeds

2 Generative Lexicon: Account of non-standard uses of words. So: we need a dataset. (Gasteiz-Vitoria, 2012. Kilgarriff: Without Data, Nothing)

3 Method: Sample of words. Sample of corpus instances for each. Choose a dictionary. Sense-tag. Identify mismatches to dictionary senses. For each: does it fit the GL model?

4 Resources: Words (random sample): modest, disability, steering, seize, sack (v), sack (n), onion, rabbit, handbag. Corpus instances: between 82 and 718 for each word; total 2276. Dictionary: HECTOR (OUP/Xerox project in corpus lexicography).

5 Tagging: Three professional lexicographers assign a sense to each corpus instance. For this exercise: if anything other than 3-way agreement, re-examine. 390 of 2276 cases (17%).
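
The re-examination rule on this slide can be sketched in Python. This is only an illustration, not the project's actual tooling: the function name `needs_reexamination`, the sense labels and the three toy instances are hypothetical. Only the totals 390 and 2276 come from the slide.

```python
def needs_reexamination(tags):
    """Return True unless all three lexicographers chose the same sense."""
    return len(set(tags)) != 1

# Toy instances (hypothetical labels): one tuple of three tags per corpus instance.
toy_instances = [
    ("activity", "activity", "activity"),   # 3-way agreement: accepted as is
    ("activity", "mechanism", "activity"),  # 2-vs-1 split: re-examine
    ("mechanism", "activity", "unclear"),   # no agreement: re-examine
]
flagged = [tags for tags in toy_instances if needs_reexamination(tags)]

# Figures reported on the slide.
reexamined, total = 390, 2276
share = reexamined / total

print(f"{len(flagged)} of {len(toy_instances)} toy instances flagged")
print(f"{reexamined} of {total} re-examined = {share:.0%}")
```

Anything short of unanimity is flagged, so a 2-vs-1 split is treated the same as complete disagreement.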

6 modest: Any two dictionaries divide up the space differently. HECTOR: 9; CIDE: 3; LDOCE: 4; COBUILD: 5. Tagger agreement: less than half. Messy, but no GL-like cases.

7 What is language? (Szeged, Jan 2008. Kilgarriff, Global WordNet)

8 steering: 2 senses. Activity: “his steering was careless”. Mechanism: “they overhauled the steering”. 16 re-examined, most underspecified: “it has the Peugeot’s steering feel”. One more complex case: “After nearly fifty years [as a bus driver] Mr. Hannis stepped down from behind the steering wheel”.

9 onion: Two senses: plant and food. 34 cases re-examined; 10 bridged the divide: “Plant the sets two inches apart to produce a good yield of medium-sized onions”. Others: medicine, decorative feature, dye, cliché of Frenchness: “It’s not all frogs legs and strings of onions in the South of France”.

10 sack (n): 2 x “sack race”. One metaphor. Santa Claus: “Ridley pulled another doubtful gift from his sack” (Ridley: British politician).

11 sack (v): “And Labour MP, Mr. Bruce George, has called for the firm to be sacked from duty at Prince Andrew’s £5 million home at Sunningwell Park near Windsor”. Non-standard because end-employment needs PERSON as direct object. Candidate for GL treatment.

12 handbag: “She moved from handbags through gifts to the flower shop” [handbag department in department store]. Candidate for GL treatment.

13 Results: 2276 corpus instances. 390 re-examined. 41 non-standard uses. 2 potentially accounted for by GL. Conclusion: GL will never account for a large share of non-standard word use.
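
The proportions behind this conclusion can be worked out from the four counts on the slide. The short Python sketch below does that arithmetic. The variable names and the helper `pct` are illustrative, and the percentages are derived here rather than quoted from the talk.

```python
# Counts reported on the Results slide.
total_instances = 2276  # corpus instances tagged
reexamined = 390        # lacked 3-way tagger agreement
non_standard = 41       # judged non-standard uses
gl_candidates = 2       # potentially accounted for by GL

def pct(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100.0 * part / whole

nonstd_share = pct(non_standard, total_instances)
gl_share_of_nonstd = pct(gl_candidates, non_standard)
gl_share_of_all = pct(gl_candidates, total_instances)

print(f"non-standard uses: {nonstd_share:.1f}% of all instances")
print(f"GL candidates: {gl_share_of_nonstd:.1f}% of non-standard uses")
print(f"GL candidates: {gl_share_of_all:.2f}% of all instances")
```

On these counts, non-standard uses are just under 2% of the sample, and the GL candidates are roughly one in twenty of those.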

14 What is language?

15 What is language? In our heads.

16 What is language? In our heads. In texts and sound signals.

17 What is language? In our heads. In texts and sound signals. Both.

18 Methodology: Study language in our heads. Introspection. Semantic analysis. Experiments with human subjects. “Rationalist” (Leibniz, Chomsky). Problems: coverage, arbitrariness.

19 Methodology: Study text. “Empiricist” (Locke, Hume). Physics: forces, matter. Chemistry: chemicals, bonds. Language: text, speech signals.

20 Empiricist linguistics: A new way to find out about language. 20 years of rapid ascent. Computers. Corpora: bigger and bigger data sets available. Language technology tools: lemmatizers, POS-taggers, parsers, machine learning for pattern finding.

21 Preliminaries over. What is a word sense?

22 Preliminaries over. What is a word sense? (my PhD in 5 slides)

23 Preliminaries over. What is a word sense? (my PhD in 5 slides) Where do you find them?

24 Preliminaries over. What is a word sense? (my PhD in 5 slides) Where do you find them? Dictionaries!

25 The lexicographers: They create them. Methods: introspection, other dictionaries, corpus. Atkins, Hanks.

26 What is a word sense (1): SFIP: sufficiently frequent, insufficiently predictable. (a glass of) whisky x (a glass of) tequila

27 What is a word sense (2): homonymy, analogy, polysemy, rules, phraseology.

28 What is a word sense (3): A cluster of instances of use. Operationalised as: corpus lines, clustered by lexicographers.

29 What is a word sense (3)

30 What is a word sense (3)

31 What is a word sense (3)

32 What is a word sense (3)

33 What is a word sense (3): A cluster of instances of use. Operationalised as: corpus lines, clustered by lexicographers. Makes sense of: overlapping senses; different dictionaries, different senses; lumping and splitting.
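
The idea of a sense as a cluster of corpus lines can be pictured with plain Python sets. This is a toy sketch only: the line IDs, the two "dictionaries" and the helper names `overlaps` and `lump` are all hypothetical, chosen to show how different clusterings of the same lines give different senses and what lumping means.

```python
# Two hypothetical dictionaries clustering the same six corpus lines (IDs 1-6).
dict_a = {"sense1": {1, 2, 3}, "sense2": {4, 5, 6}}
dict_b = {"sense1": {1, 2}, "sense2": {3, 4}, "sense3": {5, 6}}

def overlaps(senses_x, senses_y):
    """Pairs of senses from two dictionaries that share at least one corpus line."""
    return sorted(
        (x, y)
        for x, lines_x in senses_x.items()
        for y, lines_y in senses_y.items()
        if lines_x & lines_y
    )

def lump(senses, a, b, new_name):
    """Merge senses a and b into one cluster (lumping); other senses are kept."""
    merged = {name: set(lines) for name, lines in senses.items() if name not in (a, b)}
    merged[new_name] = senses[a] | senses[b]
    return merged

cross = overlaps(dict_a, dict_b)
lumped_b = lump(dict_b, "sense2", "sense3", "sense2+3")

print(cross)     # which senses of A and B cover some of the same lines
print(lumped_b)  # dictionary B after lumping two of its senses
```

Splitting is the reverse operation: one cluster of lines divided into two or more.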

34 Theory: Hanks: norms and exploitations. Task of lexicographer: record the norms. Speakers may always exploit norms to say something new.

35 Boring question: homonymy or polysemy? We all know it’s a cline. Interesting question: norm or exploitation?

36 metaphor: “see” meaning “understand”: norm. “I travelled the path / From life towards art / Desire the horse / Depression the cart” (Leonard Cohen): exploitation.

37 How do they do it? honeymoon

43 The Sketch Engine: Corpus query tool. Used for making dictionaries at OUP, CUP, Collins, Macmillan, Le Robert, Cornelsen, Elhuyar Foundation. Also universities: linguistic research; teaching (linguistics, also languages).

44 60 languages covered.

46 Individual licences (£4.99/month). University site licences. Free trial: self register.

47 Build instant corpora from the web: WebBootCaT. Install your corpora. Compare corpora. http://www.sketchengine.co.uk

48 Thank you. homonymy, analogy, polysemy, rules, phraseology.

