1 Using Corpora in Language Research - also Introduction to the Sketch Engine (WS15), part 1
Adam Kilgarriff, Lexical Computing Ltd, Universities of Leeds

2-5 What is language? (Adam Kilgarriff, May 2011)
In our heads
In texts and sound signals
Both

6-7 Methodology
Study language in our heads
  Competence
  Chomsky
  “rationalist” (Descartes, Leibniz)
  Odd method for objective science
  Practical problems: coverage, arbitrariness

8 Methodology
Study text
  “empiricist” (Locke, Hume)
  Physics: forces, matter
  Chemistry: chemicals, bonds
  Language: text, speech signals

9 It goes against the grain
What is important about a sentence?
  its meaning
Corpus methodology:
  Throw away individual sentence meaning
  Find patterns

10 Computer power
Corpora
  bigger and bigger data sets
Language technology tools
  lemmatizers, POS-taggers, parsers
Machine learning, pattern-finding
20 years of rapid ascent

11 All the linguisticses
Theoretical
Socio
Psycho
Developmental
Law and
Computational
Contrastive
Applied
... linguistics

12 Developmental
CHILDES, TalkBank
How children learn language
Parents record all interactions
Since 1980s
Prof. Brian MacWhinney, Carnegie-Mellon
Many languages
Largest chunk: English, 23m words

20 Language change
Brown family
  Small but perfectly formed
  1m words
  500 x 2000-word samples
  the same 15 text types
Supports comparison
  American and British English
  1931, 1961, 1991, 2006

26 Language and gender
When you see a dentist...
  What is now normal?
Recent study
  “they” now the norm
“themself” now needed
  despite what spellcheck says
  BNC (most text from 1989): 0.2/million
  EnTenTen (mostly 2009): 0.4/million
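
Note on the per-million figures for "themself": a frequency per million is the raw hit count scaled by corpus size, which is what makes corpora of very different sizes comparable. The sketch below shows only that arithmetic. The hit counts and corpus sizes are hypothetical values chosen to reproduce 0.2 and 0.4; they are not the counts behind the slide, and `per_million` is an illustrative helper, not a Sketch Engine function.

```python
def per_million(hits: int, corpus_tokens: int) -> float:
    """Return a frequency normalized to occurrences per million tokens."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return hits * 1_000_000 / corpus_tokens


# Hypothetical counts, used only to illustrate the calculation.
older = per_million(hits=20, corpus_tokens=100_000_000)
newer = per_million(hits=400, corpus_tokens=1_000_000_000)

# Ratio of the two normalized frequencies (how much more frequent the form is).
ratio = newer / older
print(older, newer, ratio)
```

With normalized figures, the comparison is between rates rather than raw counts, so the larger corpus does not win simply by being larger.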

27 Language and law
Trade marks
  Hoover and similar
  trademark or generic
Cases
  sabatier, botox, kettle chips
Key evidence
  Do people tend to capitalize?

28 English nouns: % capitalized
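
Note on the capitalization evidence: the "% capitalized" measure can be sketched as the share of a word's occurrences that begin with a capital letter. The version below is a minimal illustration under stated assumptions: the input is already split into sentences and tokens, sentence-initial tokens are skipped because they are capitalized regardless of trademark status, and the sample sentences are invented. The function name `percent_capitalized` is hypothetical.

```python
from typing import List, Optional


def percent_capitalized(sentences: List[List[str]], word: str) -> Optional[float]:
    """Percentage of non-sentence-initial occurrences of `word` that are capitalized.

    Returns None when the word does not occur in a countable position.
    """
    target = word.lower()
    total = 0
    capitalized = 0
    for tokens in sentences:
        # Skip position 0: sentence-initial words are capitalized anyway.
        for token in tokens[1:]:
            if token.lower() == target:
                total += 1
                if token[0].isupper():
                    capitalized += 1
    if total == 0:
        return None
    return 100.0 * capitalized / total


# Invented sample sentences, for illustration only.
sample = [
    ["She", "bought", "a", "Hoover", "yesterday"],
    ["The", "hoover", "broke", "again"],
    ["Hoover", "is", "a", "brand"],
    ["I", "need", "a", "new", "hoover"],
]
result = percent_capitalized(sample, "hoover")
print(result)
```

A high percentage suggests writers still treat the word as a name; a low one suggests generic use.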

29 Syntax and semantics

32 DANTE
Detailed account of English lexis
Corpus-driven
  From word sketches
  Lexicographers assign to senses
High precision
Available at http://webdante.com
Brochures

33 What data shall I use?

34 Think hard

35 Sometimes...
Just-in-time corpus from the web
Use case:
  Translator, French-to-English
  Translation task: volcanoes
  In French I understand it OK, but I'm no vulcanologist, I don't know the English terminology
BootCaT, Baroni and Bernardini
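
Note on the just-in-time corpus: a BootCaT-style workflow starts from a handful of domain seed terms, combines them into small tuples, sends each tuple to a search engine as a query, and collects the returned pages into a corpus. The sketch below covers only the tuple-building step. The seed terms, the tuple size, the number of tuples and the function name `seed_tuples` are all illustrative assumptions; this is not the BootCaT tool's own interface, and no web search is performed.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def seed_tuples(
    seeds: Sequence[str],
    tuple_size: int = 3,
    n_tuples: int = 5,
    rng_seed: int = 0,
) -> List[Tuple[str, ...]]:
    """Pick distinct random combinations of seed terms to use as search queries."""
    # Enumerate every unordered combination of the de-duplicated seeds.
    combos = list(itertools.combinations(sorted(set(seeds)), tuple_size))
    # A fixed seed keeps the selection reproducible.
    rng = random.Random(rng_seed)
    return rng.sample(combos, min(n_tuples, len(combos)))


# Hypothetical English seed terms for the volcano translation task.
seeds = ["volcano", "lava", "magma", "eruption", "crater", "ash"]
queries = [" ".join(terms) for terms in seed_tuples(seeds)]
for query in queries:
    print(query)
```

Each printed line is one query string; the pages retrieved for those queries would then be cleaned and loaded as a corpus.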

44 Corpora in Sketch Engine
Access-to-all
  42 languages
  All major world languages
  Mostly large, web-crawled
  Various other: CHILDES, Brown, ...
“My corpora”
  BootCaT and other

45 LCL sponsorship of LSA
One year free accounts for participants
http://www.sketchengine.co.uk
  “Register”
  “Site licence member”
  Your details and
    Organisation: select LSA2011
    Site licence key: Boulder
  Password by email
    change it (under Settings)

46 Today
  Motivations, taster
Sunday 9-12
  practical

