Guy Aston SSLMIT, University of Bologna The learner as corpus designer.

1 The learner as corpus designer
Guy Aston, SSLMIT, University of Bologna (guy@sslmit.unibo.it)

2 … or the art of fruit salads

3 Learner uses of corpora
– Form-focussed (data-driven learning)
– Meaning-focussed (learning the culture)
– Skill-focussed (reading practice)
– Browsing environment (serendipity)
– Reference tool for other tasks (reading/writing aid)

4 Why make your own corpus?
– You can devise your own recipe
– You know what’s in it
– You learn how to do it
– It can be fun
– It can provide practice in language use

5 The raw ingredients

6 Devising your own recipe
– Only the text-type(s) you want
– Only the texts you want
– The quantity you want
… small and specialised is beautiful

7 You know what’s in it
– Top-down knowledge of the corpus
– Top-down knowledge of the texts

8 You learn how to do it
– It can be a useful skill for many language workers:
  – technical writers
  – translators
  – teachers
– It can make you a more critical corpus user

9 It can be fun
– It provides a challenge
– It gives a sense of achievement/satisfaction
– It provides practice in language use
– Designing, constructing and evaluating corpora can be communicative activities

10 Why use standard corpora?
– Less effort
– More reliable
– Better packaging
– You don’t want to learn to make your own

11 Less effort

12 More reliable
– if it’s well designed
– if it fits your needs

13 Better packaging
– Metatextual information
– Annotation
– Corpus-specific software

14 You don’t want to learn to make your own?

15 A compromise strategy: make your own subcorpus, assembled from the pre-prepared ingredients of a larger corpus, or in other words… go to a (fruit) salad bar

16 (Pick ’n’ mix with the BNC)

17 You have a choice of
– text-types
– individual texts
– selection by pre-determined criteria
– selection by hand … or both

18
– You know what went in, so top-down processing is easier
– It takes little effort compared with making your own

19 Good packaging
– Metatextual information
– Linguistic annotation
– You can use software designed for the full corpus
– Indexed

20 You get to learn
– what are(n’t) useful subcorpora
– what are(n’t) useful design criteria
– how to do it

21 It can be fun
– challenge / achievement / satisfaction
– You can talk about its design / construction / evaluation

22 Talking about fruit salad BNC Sampler: KC2

23 Talking about fruit salad BNC Sampler: KC2

24 And now to details … the Sampler awaits!

25 You can create subcorpora of
– specific corpus texts
– texts containing solutions to a query
– encoded categories of texts
– your own categories of texts
and compare them with
– other subcorpora
– the full corpus
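The subcorpus options listed on slide 25 can be sketched in a few lines of Python. This is a minimal illustration under several assumptions:
- The corpus is a hypothetical in-memory list of `Text` records, each holding an ID, a dict of encoded categories and the running text.
- The tokenizer is deliberately crude.
- `by_ids`, `by_category`, `by_own_labels` and `per_million` are illustrative names, not the interface of any real corpus tool.
- Selection by query is left out of this sketch.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Text:
    # Hypothetical text record: ID, encoded metadata categories, running text.
    text_id: str
    categories: dict = field(default_factory=dict)
    body: str = ""


def by_ids(corpus, ids):
    """Subcorpus of specific texts, chosen by ID."""
    wanted = set(ids)
    return [t for t in corpus if t.text_id in wanted]


def by_category(corpus, key, value):
    """Subcorpus of texts whose encoded category `key` has value `value`."""
    return [t for t in corpus if t.categories.get(key) == value]


def by_own_labels(corpus, labels, label):
    """Subcorpus from your own categories; `labels` maps text ID -> label."""
    return [t for t in corpus if labels.get(t.text_id) == label]


def tokens(subcorpus):
    # Crude tokenizer (an assumption): runs of word characters/apostrophes.
    return [w.lower() for t in subcorpus for w in re.findall(r"[\w']+", t.body)]


def per_million(subcorpus, word):
    """Frequency of `word` per million tokens, so that subcorpora and the
    full corpus can be compared despite their different sizes."""
    toks = tokens(subcorpus)
    return toks.count(word.lower()) / len(toks) * 1_000_000 if toks else 0.0


# Invented sample texts with hypothetical IDs and categories.
corpus = [
    Text("S01", {"mode": "spoken"}, "Oh we like fruit salad, we do"),
    Text("W01", {"mode": "written"}, "The salad was fresh"),
]
spoken = by_category(corpus, "mode", "spoken")
# 'we' is 2 of 7 tokens in the spoken subcorpus, 2 of 11 in the full corpus.
print(per_million(spoken, "we"), per_million(corpus, "we"))
```

Normalising to a rate per million tokens is what makes a small subcorpus comparable with the full corpus it was extracted from.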

26 Choosing specific texts (Text analysis: selecting)

27 Viewing the index

28 Party policies (will/shall be + VVN)
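A query like the one on slide 28 combines literal words with a part-of-speech tag. The sketch below rests on these assumptions:
- The text is already a list of (word, tag) pairs with CLAWS-style tags, where VVN marks the past participle of a lexical verb.
- The function name and the hand-tagged sample sentence are hypothetical.
- The slide's own query was run in the corpus software, not in Python.

```python
def will_shall_be_vvn(tagged):
    """Find 'will/shall be + VVN' sequences in a list of (word, tag) pairs.

    Only contiguous sequences are matched, so 'will be fully funded'
    (with an intervening adverb) is not found.
    """
    hits = []
    for i in range(len(tagged) - 2):
        (w1, _), (w2, _), (w3, t3) = tagged[i:i + 3]
        # Also accept hyphenated ambiguity tags such as 'VVN-VVD'
        # (an assumption about how uncertain tags are written).
        if (w1.lower() in ("will", "shall") and w2.lower() == "be"
                and "VVN" in t3.split("-")):
            hits.append(f"{w1} {w2} {w3}")
    return hits


# Invented sample, tagged by hand for illustration.
sample = [("Taxes", "NN2"), ("will", "VM0"), ("be", "VBI"), ("cut", "VVN"),
          ("and", "CJC"), ("schools", "NN2"), ("shall", "VM0"), ("be", "VBI"),
          ("funded", "VVN-VVD"), ("we", "PNP"), ("will", "VM0"),
          ("be", "VBI"), ("happy", "AJ0")]
print(will_shall_be_vvn(sample))  # ['will be cut', 'shall be funded']
```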

29 Or, to return to our fruit salad text …

30 Most frequent adjectives (KC2)
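An adjective frequency list like the one on slide 30 amounts to counting words by tag. The sketch below makes these assumptions:
- The input is (word, tag) pairs tagged with the CLAWS C5 tagset used in the BNC, where AJ0, AJC and AJS mark base, comparative and superlative adjectives.
- The function name and sample are illustrative only.

```python
from collections import Counter

# Assumption: CLAWS C5 adjective tags (base, comparative, superlative).
ADJ_TAGS = {"AJ0", "AJC", "AJS"}


def top_adjectives(tagged, n=10):
    """Most frequent adjectives in a list of (word, tag) pairs.

    Only exact adjective tags count; ambiguity tags such as 'AJ0-NN1'
    are ignored in this sketch.
    """
    counts = Counter(word.lower() for word, tag in tagged if tag in ADJ_TAGS)
    return counts.most_common(n)


sample = [("Lovely", "AJ0"), ("fresh", "AJ0"), ("fruit", "NN1"),
          ("fresh", "AJ0"), ("sweeter", "AJC"), ("pears", "NN2")]
print(top_adjectives(sample, 2))  # [('fresh', 2), ('lovely', 1)]
```

Words with equal counts come out in the order first encountered, which is how `Counter.most_common` breaks ties.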

31 Appreciating food (KC2)

32 A bad language subcorpus: texts containing solutions to a query

33 Choosing the bad language texts
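Selecting texts that contain solutions to a query means keeping every text with at least one hit. The sketch below assumes the following:
- The texts are held as a dict from a hypothetical text ID to plain text.
- The slide's f.*k.* pattern is read as a Python regular expression that must match a whole token. Corpus software may interpret such patterns differently.

```python
import re


def texts_matching(texts, pattern):
    """IDs of texts in which some token fully matches `pattern` (case-insensitive)."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [text_id for text_id, body in texts.items()
            if any(rx.fullmatch(tok) for tok in re.findall(r"[\w']+", body))]


texts = {  # invented sample texts with hypothetical IDs
    "T1": "Oh fuck, I forgot",
    "T2": "Lovely fruit salad",
    "T3": "Folk music at the fork in the road",
}
print(texts_matching(texts, r"f.*k.*"))  # ['T1', 'T3']
```

Read as a regex, f.*k.* matches any token that starts with f and contains a later k. Here it also picks up folk and fork, so query-based subcorpora need checking by hand.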

34 Collocates of f.*k.* (collocates of f_ words)

35 oh fuck.* with oh as collocate

36 Collocates of oh
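Collocate lists like those on slides 34 to 36 count the words that occur near each hit for a node word or pattern. The sketch below makes these assumptions:
- The text is already tokenized.
- The span is counted in tokens either side of the node; the span, the function name and the sample are illustrative choices.
- Only raw co-occurrence counts are returned. No association statistic is applied.

```python
import re
from collections import Counter


def collocates(tokens, node_pattern, span=4):
    """Count tokens within `span` positions either side of each node hit."""
    node = re.compile(node_pattern, re.IGNORECASE)
    counts = Counter()
    for i, tok in enumerate(tokens):
        if node.fullmatch(tok):
            start, end = max(0, i - span), min(len(tokens), i + span + 1)
            for j in range(start, end):
                if j != i:  # skip the node itself
                    counts[tokens[j].lower()] += 1
    return counts


sample = "oh fuck it i said".split()
print(collocates(sample, r"f.*k.*", span=2))  # Counter({'oh': 1, 'it': 1, 'i': 1})
print(collocates(sample, r"oh", span=2))      # Counter({'fuck': 1, 'it': 1})
```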

37 Making subcorpora using encoded categories: ‘context-governed’ spoken texts
– monologue: 17 texts
– dialogue: 29 texts

38 Monologue vs Dialogue
More frequent in monologue*: could, had, he, know, their, were, when, who, your
More frequent in dialogue*: 'll, 'm, any, no, pounds, right, yeah, yes
* ranked 20+ positions higher among the first 100 words of the frequency list
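The comparison on slide 38 can be reproduced from two frequency lists. The sketch below reads "ranked 20+ positions higher in the first 100 words" as follows:
- The word is in the first subcorpus's top 100.
- Its rank there is at least 20 places better than in the second subcorpus.
- A word absent from the second top-100 list is given rank 101. This is an assumption of the sketch; the slides do not specify it.

The function names and toy data are illustrative.

```python
from collections import Counter


def rank_table(tokens, n=100):
    """Map each of the n most frequent words to its rank (1 = most frequent).

    Ties are ranked in order of first occurrence, as Counter.most_common does.
    """
    top = Counter(tokens).most_common(n)
    return {word: rank for rank, (word, _) in enumerate(top, start=1)}


def ranked_higher(tokens_a, tokens_b, n=100, margin=20):
    """Words in A's top n ranked at least `margin` places higher in A than in B.

    Assumption: a word missing from B's top n is given rank n + 1 in B.
    """
    ranks_a, ranks_b = rank_table(tokens_a, n), rank_table(tokens_b, n)
    return sorted(word for word, rank in ranks_a.items()
                  if ranks_b.get(word, n + 1) - rank >= margin)


# Toy data: a top-3 list and a margin of 2 stand in for 100 and 20.
dialogue = ["yeah"] * 5 + ["the"] * 4 + ["he"] * 3 + ["oh"]
monologue = ["the"] * 5 + ["he"] * 4 + ["said"] * 3 + ["yeah"]
print(ranked_higher(dialogue, monologue, n=3, margin=2))  # ['yeah']
```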

39 Investigating the differences
– no occurrences of all right in monologue
– when you’re / you’ll / you’d / you’ve is more common in monologue than when we’re / we’ll / we’d / we’ve; vice versa in dialogue

40 Pronoun (+ contraction)
              you     we   you’*   we’*
Monologue    4253   2014     685    535
Dialogue     6635   4949     939   1253
we / we’* much more frequent in dialogue

41 you and we
              you     we
Monologue    4253   2014
Dialogue     6635   4949
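The monologue and dialogue subcorpora contain different numbers of texts (17 vs 29), so the raw counts on slides 40 and 41 are easier to compare as a ratio within each subcorpus, where size cancels out. The sketch below uses those counts for you, we, you’* and we’*. The ratio is an illustrative way of reading the table, not a calculation shown in the presentation.

```python
# Counts from slides 40-41: you, we, you'* and we'* in each subcorpus.
COUNTS = {
    "Monologue": {"you": 4253, "we": 2014, "you'*": 685, "we'*": 535},
    "Dialogue": {"you": 6635, "we": 4949, "you'*": 939, "we'*": 1253},
}


def we_per_you(row, contracted=False):
    """Occurrences of we (or we'*) per occurrence of you (or you'*)."""
    suffix = "'*" if contracted else ""
    return row["we" + suffix] / row["you" + suffix]


for mode, row in COUNTS.items():
    print(f"{mode}: we/you = {we_per_you(row):.2f}, "
          f"we'*/you'* = {we_per_you(row, contracted=True):.2f}")
# Monologue: we/you = 0.47, we'*/you'* = 0.78
# Dialogue: we/you = 0.75, we'*/you'* = 1.33
```

On these figures, we occurs about 0.47 times per you in monologue but about 0.75 times in dialogue. Contracted we’* also outnumbers you’* in dialogue, which fits the slide's observation.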

42 Subcorpora using your own categories: David Lee’s book genres
– academic non-fiction (13 texts)
– non-academic non-fiction (15 texts)
– prose fiction (13 texts)

43 Distinctive -ly adverbs of
– academic non-fiction: accordingly, essentially, eventually, largely, namely, notably, respectively, surprisingly
– non-academic non-fiction: effectively, merely, normally, obviously, possibly, specially
– prose fiction: carefully, quietly, slightly, slowly, softly, surely, truly
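The slides do not say how the distinctive adverbs were identified. One common keyness measure is Dunning's log-likelihood (G2), which compares a word's frequency in a target subcorpus with its frequency in a reference corpus. The sketch below swaps that measure in as an assumption:
- The cut-off is 3.84, the chi-square critical value for p < 0.05 at one degree of freedom, used here as an illustrative threshold.
- The function names and sample counts are hypothetical.

```python
import math


def log_likelihood(a, b, c, d):
    """Log-likelihood (G2) for a word seen a times in a target of c tokens
    and b times in a reference of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the target
    e2 = d * (a + b) / (c + d)  # expected count in the reference
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def distinctive_ly(target, reference, target_size, ref_size, threshold=3.84):
    """-ly words relatively more frequent in the target, with G2 >= threshold."""
    keep = []
    for word, a in target.items():
        b = reference.get(word, 0)
        if (word.endswith("ly") and a / target_size > b / ref_size
                and log_likelihood(a, b, target_size, ref_size) >= threshold):
            keep.append(word)
    return sorted(keep)


# Invented counts for illustration only.
academic = {"namely": 20, "slowly": 5}
fiction = {"namely": 5, "slowly": 20}
print(distinctive_ly(academic, fiction, 1000, 1000))  # ['namely']
```

The relative-frequency check keeps only words that are overused in the target. Without it, words that are markedly rarer in the target would also score highly.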

44 largely (academic non-fiction)

45 it (academic non-fiction)

46 To conclude …

47 Working with subcorpora can allow
– study/comparison of forms/meanings in particular texts/text-types
– better-focussed reading practice
– more appropriate reference tools for particular tasks
– more focussed browsing

48 Subcorpora
– may not be representative (but nor is most language-learning data)
– are good for forming hypotheses to be tested more widely
– will allow more interesting uses when extracted from a larger corpus

49 Making your own provides
– better preparation and motivation for corpus use
– more critical awareness
– lots to talk about

50 Enjoy!
