Pedagogic uses of a corpus of student writing


1 Pedagogic uses of a corpus of student writing
and their implications for sampling and annotation
Alois Heuboeck
University of Reading, UK

2 The British Academic Written English (BAWE) corpus of student writing
Project in progress at the universities of Reading, Warwick and Oxford Brookes
Funded by the Economic and Social Research Council (project nr. RES )

The BAWE corpus:
- corpus of student writing, compiled at Reading, Warwick and Oxford Brookes (R, W & OBU)
- a specific language corpus: student writing is neither a learner corpus nor an expert corpus
- “experts” of student writing: not the students, but the tutors marking
- “good quality”
- possible progression through levels of studies
- under construction; no systematic data available yet
- possible later uses: EAP teaching

Discussed in the following:
- the use of corpora in language teaching and its implications for the form of the corpus
- what this means for a corpus of student writing, and how decisions on the corpus design try to reflect these requirements

3 Outline
- Corpora in LT: uses and purposes
- Accessing corpus information: interfaces
- Building corpora: requirements and decisions (the BAWE corpus)

4 Using corpora in language pedagogy
Pedagogic uses: classroom, materials, description
Purposes: “motivational”, “linguistic”

Pedagogic uses:
- classroom activities: get the student to work with corpus data (data-driven learning, discovery learning). The student fulfils tasks, interprets concordances, finds rules independently, formulates new questions and queries, discovers (browses) the corpus, and discovers language.
- teaching materials: for the teacher or the student; for use in the classroom, in self-study, or in other learner interfaces (computer-assisted learning, distance learning etc.)
- language description: corpus-based grammars and dictionaries (e.g. Biber et al. 1999, Longman grammar of spoken and written English; Sinclair 1990/1995, Collins-COBUILD grammar and dictionary), or studies

Purposes: why are corpora used?
- “motivational” reasons, relating to the student’s (psychological) condition and the process of learning: motivation, autonomy, responsibility
- “linguistic” reasons, relating to the object of learning (the “language” as what is being acquired):
  - corpus data as “good data”: authentic data; schemata, patterns (lexicogrammar); repetition and variation in text; new phenomena (not covered by “traditional” grammars)
  - advancement and outcome of learning as “good/relevant learning”: task-oriented, user-focused, specific needs
  - role of the corpus: “reference tool” or “source of communicative tasks” in communicative language teaching (“participatory experience”) (Aston 1995, “Corpora in language pedagogy” in Cook/Seidlhofer eds.)

5 Interfaces (1): the concordance
Typical query options:
- word form
- lemma
- wildcards (e.g. “investigat*”)
- grammatical (e.g. POS) patterns

Tight relation between information and interface: information needs an interface to be accessed.
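As an illustration of the wildcard query option, here is a minimal keyword-in-context (KWIC) sketch. It is not the interface used by the BAWE project; the function name `concordance`, the `width` parameter and the sample sentence are assumptions made for this example, and tokenisation is reduced to whitespace splitting.

```python
import re
from fnmatch import translate


def concordance(tokens, pattern, width=3):
    """Return (left context, keyword, right context) for each matching token.

    `pattern` is a shell-style wildcard such as "investigat*".
    """
    # fnmatch.translate converts the wildcard into an anchored regular expression
    regex = re.compile(translate(pattern), re.IGNORECASE)
    lines = []
    for i, tok in enumerate(tokens):
        if regex.match(tok):
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, tokenised naively on whitespace
text = "We investigate the claim and the report investigated earlier findings"
hits = concordance(text.split(), "investigat*", width=2)
for left, keyword, right in hits:
    print(f"{left:>20} [{keyword}] {right}")
```

A lemma or POS query would need annotated tokens rather than raw word forms, which is one way to read the point that information needs an interface to be accessed.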

6 Information & interfaces (2)
- frequencies, ratios
- statistics, e.g. word list, key words
- ad hoc statistics
- macrostructural properties and choices
- corpus items
- generic types, e.g. CARS model (Swales 1990)
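A word list and a key-word list can be sketched in a few lines. The slide does not say which keyness statistic is meant, so the smoothed relative-frequency ratio below is an assumption chosen for simplicity (log-likelihood is a common alternative); the function names and the two toy texts are likewise hypothetical.

```python
from collections import Counter


def word_list(tokens):
    """Frequency list of lower-cased word forms."""
    return Counter(t.lower() for t in tokens)


def keywords(study_tokens, reference_tokens, top=3):
    """Rank words of the study corpus by how much more frequent they are
    (relatively) than in the reference corpus.

    Assumption: add-one smoothed relative-frequency ratio as keyness measure.
    """
    study = word_list(study_tokens)
    ref = word_list(reference_tokens)
    n_study, n_ref = sum(study.values()), sum(ref.values())

    def ratio(word):
        return ((study[word] + 1) / (n_study + 1)) / ((ref[word] + 1) / (n_ref + 1))

    return sorted(study, key=ratio, reverse=True)[:top]


# Hypothetical toy data
study = "the corpus shows the corpus data".split()
reference = "the cat saw the dog and the bird".split()
print(word_list(study).most_common(2))
print(keywords(study, reference, top=1))
```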

7 Requirements: a “good corpus” for language pedagogy
- Representative: target variety
- Relevant: information, annotation
- Usable: e.g. interface, size

Requirements on corpus design:
- Validity (representation of a target variety): sampling
- Relevance of information accessed: annotation
- Practicality

8 Representativeness
The corpus as a representative sample should reflect:
- distribution and quantitative relations: quantitative representativeness
- range of features: qualitative representativeness
Conflicting principles

The notion of “representation” is crucial for validity.
Representation: of a target variety (or a set of target varieties)
2 aspects of representation => 2 complementary principles for sampling

9 Representativeness (2): the BAWE corpus
A trade-off: stratified sampling

Frame 1: the university: corpus Σ = 3,072 assignments
Frame 2: 4 disciplinary groups (AH, LS, PS, SS) of 768 assignments each
Frame 3: 4 x 6 disciplines of 128 assignments each
Frame 4: 4 levels per discipline of 32 assignments each

Disciplines:
- AH: English, History, Linguistics, Classics, Archaeology, History of Art
- PS: Physics, Chemistry, Meteorology, Mathematics, Computer Science, Engineering
- LS and SS (listed together): Biological Sciences, Sociology, Law, Business, Health & Social Care, Politics, Anthropology, Publishing, Medicine, Biochemistry, Agriculture, Food Sciences

Notes on the frames:
- Frame 1, the university, i.e. where assignments are written. Problem: imbalance across the university.
- Frame 2, disciplinary groupings: the level that most closely reflects the hard vs. soft distinction. Problem: inhomogeneous, and therefore impractical for sampling, since the disciplinary group/faculty is not the context of production but the...
- Frame 3, discipline: the “traditional” approach. For reasons of sampling, “disciplines” are defined in institutional terms as academic Schools/Depts.; disciplines are chosen to represent the disciplinary grouping. Problem: what is a discipline? There are subjects/modules “on the edge” (e.g. “History of science for physicists”, Economics in agriculture), but also inherently interdisciplinary disciplines combining courses from several disciplines (e.g. LS and SS [economy, politics, business administration] in agriculture; Archaeology as AH discipline and “Archaeological science”).
- Frame 4, levels of studies: within the student’s career, types appearing and disappearing; development in writing.
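The figures of the four sampling frames follow from one another, which can be checked with a short sketch. Only the numbers (4 groups, 6 disciplines per group, 4 levels, 32 assignments per cell) come from the slide; the variable names, the numeric level labels and the placeholder discipline indices are assumptions for illustration.

```python
from itertools import product

GROUPS = ["AH", "LS", "PS", "SS"]   # Frame 2: disciplinary groups
DISCIPLINES_PER_GROUP = 6            # Frame 3: 4 x 6 disciplines
LEVELS = [1, 2, 3, 4]                # Frame 4: hypothetical level labels
PER_CELL = 32                        # assignments per discipline and level

# One sampling cell per (group, discipline index, level) combination
cells = {
    (group, discipline, level): PER_CELL
    for group, discipline, level in product(
        GROUPS, range(1, DISCIPLINES_PER_GROUP + 1), LEVELS
    )
}

per_discipline = PER_CELL * len(LEVELS)               # Frame 3 quota
per_group = per_discipline * DISCIPLINES_PER_GROUP    # Frame 2 quota
total = sum(cells.values())                           # Frame 1 total

print(per_discipline, per_group, total)
```

Fixing an equal quota per cell is what makes the design stratified: every discipline and level is covered to the same extent, regardless of how much writing each actually produces across the university.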

10 Relevance
- Relevant information in corpus
- Significant query
- Corpus annotation
- Features: lexicogrammatical, structural etc.

11 Relevance (2): features annotated in the BAWE corpus
- grammatical: POS, lemmatisation, “sentences”
- textual (structure of “running text”):
  - “running text” vs. front/back matter
  - non-textual elements “interrupting” running text: tables, figures, formulae
  - chunks of text carrying a particular function which sets them apart from running text: lists
- typographical (lay-out): paragraphs, highlighting, enumerated paragraphs (list-like)
- metatextual: numbering of paragraphs and “sentences”
- other “interesting” features: e.g. semantic, discursive features

12 Modularity: subcorpora
Corpus size
“For the pedagogical analysis of many common grammatical phenomena a full-size research corpus is much too large.” (Osborne 2000)
- Modularity: subcorpora
- Specialised corpora

“Practical” requirements (mentioned before):
- First: interface (related to corpus information); not dealt with here
- Second: corpus size

cf. also: "The use of a few structurally similar texts also enables the identification of some higher-level regularities through concordancing." (Aston 1995: 266)

Reference: Osborne, John, 2000 (p. 169): “What can students learn from a corpus?: building bridges between data and explanation” in Burnard/McEnery eds., Rethinking language pedagogy from a corpus perspective. Papers from the third international conference on teaching and language corpora. Frankfurt/M.: Peter Lang (Łódź studies in language 2)
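Modularity can be pictured as selecting subcorpora by metadata. The sketch below assumes each assignment carries the sampling categories as metadata; the `Assignment` record, its field names and the three sample entries are hypothetical and do not describe the actual BAWE file format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    # Hypothetical metadata fields mirroring the sampling frames
    doc_id: str
    group: str
    discipline: str
    level: int
    text: str


def subcorpus(corpus, **criteria):
    """Return the assignments whose metadata match all given criteria."""
    return [
        a for a in corpus
        if all(getattr(a, field) == value for field, value in criteria.items())
    ]


# Hypothetical toy corpus with placeholder texts
corpus = [
    Assignment("a1", "AH", "History", 1, "placeholder text"),
    Assignment("a2", "AH", "History", 3, "placeholder text"),
    Assignment("a3", "PS", "Physics", 1, "placeholder text"),
]

history = subcorpus(corpus, discipline="History")
first_level = subcorpus(corpus, level=1)
print([a.doc_id for a in history], [a.doc_id for a in first_level])
```

A small, specialised selection of this kind is closer to what Osborne's remark suggests for pedagogical work than the full research corpus.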

13 Conclusion: 3 views
- Corpus as a whole, representing the target variety: corpus as representation of a (set of) target variety/varieties; qualitative vs. quantitative representation
- Inside the corpus, a “bag of occurrences (of features)”: corpus annotation and interfaces; query instances of lexicogrammatical (etc.) features and phenomena
- Corpus as a collection of (potential) subcorpora: corpus size and modularity; balanced samples of target variety/varieties

14 Pedagogic uses of a corpus of student writing
and their implications for sampling and annotation
Alois Heuboeck
University of Reading, UK
The British Academic Written English corpus

