An investigation into Corpus-based Learning about Language In the Primary-school: CLLIP. Corpus evidence of the features of children's literature
The CLLIP Project: Background. CLLIP: Corpus-based Learning about Language In the Primary-school. ESRC-funded project. Exploring the potential for using corpus evidence with primary school children (9-11 year olds) for learning about language (L1).
Linguistic analysis of the CLLIP corpus. The CLLIP corpus is a collection of the texts in the British National Corpus that were written for a child audience. The corpus contains imaginative fiction, factual prose and other texts. Linguistic analysis was conducted on the imaginative fiction texts only.
Project research question 1: Does linguistic analysis of the corpus data confirm, extend or challenge the descriptions of English lexis and syntax which are identified as teaching targets in the National Curriculum and the National Literacy Strategy? 1a. Does any such analysis suggest a need for further research on the basis of a larger dedicated corpus of writing for children?
Corpora: CLLIP and comparison. CLLIP corpus: imaginative fiction written for a child audience, from the BNC (31 texts). Comparison corpus (hereafter Comp): imaginative fiction written for an adult audience, from the BNC (315 texts). Newspaper corpus: newspaper texts from the BNC (114 texts).
Purpose of the linguistic analysis. To determine the characteristic features of the language of imaginative fiction written for children. To compare and contrast the language of these texts with the language of imaginative fiction written for adults, and also with the language of newspapers.
Questions. What is distinctive about the discourse of the CLLIP corpus? What similarities and differences are there in overall word frequencies and in POSgrams across the three corpora? Is there a difference in the uses of certain lexical items between the child and adult fiction corpora? A POSgram is a sequence of parts of speech, such as an article followed by an adjective, followed by another adjective, then a noun (e.g. "a bright red car"; "the last chocolate biscuit"). In this study, we look at 6-grams (sequences of six parts of speech).
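To make the POSgram definition concrete, here is a minimal sketch of how 6-POSgrams could be counted from text that has already been tagged. The tag names (PREP, ART, ADJ, NOUN, OF, VERB), the function name `posgrams` and the toy sentence are illustrative assumptions, not the tagset or tooling used in the project.

```python
from collections import Counter

def posgrams(tags, n=6):
    """Count every run of n consecutive part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

# Hand-tagged toy sentence (hypothetical tags, not BNC annotation).
tagged = [
    ("in", "PREP"), ("the", "ART"), ("middle", "NOUN"), ("of", "OF"),
    ("the", "ART"), ("night", "NOUN"), ("a", "ART"), ("bright", "ADJ"),
    ("red", "ADJ"), ("car", "NOUN"), ("stopped", "VERB"),
]
tags = [tag for _, tag in tagged]

six_grams = posgrams(tags, n=6)   # sequences of six parts of speech
four_grams = posgrams(tags, n=4)  # e.g. article + adjective + adjective + noun
```

Here `six_grams` holds one entry per six-tag window, including the preposition + article + noun + of + article + noun window at the start of the sentence, and `four_grams` contains the article + adjective + adjective + noun pattern from "a bright red car".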
Frequency of Parts of Speech. For each part of speech you can see three columns. The first two columns (left and middle) are for the CLLIP and Comp corpora respectively; the right-hand column is for the Newspaper corpus. What is remarkable is the similarity between the two fiction corpora for most parts of speech. Proportionally, there are many more nouns in the Newspaper corpus, while there are more lexical verbs in the fiction corpora.
Frequency data. CLLIP – 22.0%; Comp – 22.4%; News – 23.5%. The top ten most frequent tokens for the CLLIP and Comp corpora are remarkably similar, particularly the top four. Note the greater frequency of "of" in the News corpus, which is related to the higher number of nouns, in expressions such as "the resignation of". The figures at the top show the percentage of all token occurrences that the top ten tokens account for in each corpus.
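The percentages on this slide are coverage figures: the occurrences of the ten most frequent tokens divided by all token occurrences in the corpus. A minimal sketch of that calculation follows; the function name `top_k_share` and the toy token list are assumptions for illustration, and the slide's actual figures come from the BNC texts.

```python
from collections import Counter

def top_k_share(tokens, k=10):
    """Fraction of all token occurrences covered by the k most frequent tokens."""
    counts = Counter(token.lower() for token in tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(k)) / total

# Toy data: 10 tokens, of which the two most frequent types cover 8.
toy_tokens = ["the"] * 5 + ["a"] * 3 + ["cat", "dog"]
share = top_k_share(toy_tokens, k=2)
```

Applied with `k=10` to each corpus's full token list, the same function would give figures comparable to the 22.0%, 22.4% and 23.5% reported above.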
Frequency - adjectives. CLLIP – 14.6%; Comp – 11.3%; News – 11.9%. Once again, there is a remarkable similarity between the top 11 adjectives for the two fiction corpora, while the Newspaper corpus contains many adjectives that refer to social attributes. The figures at the top indicate that the top 11 adjectives in the CLLIP corpus do a larger share of the work than those in the other two corpora.
POSgram information. This table shows the most frequent 6-POSgrams for each corpus. In every corpus, the sequence preposition + article + noun + of + article + noun is the most common. In the two fiction corpora, the next most common is preposition + article + noun + preposition [not of] + article + noun.
Prep+art+[ ]+of+art+noun: 51%. This slide shows the nouns that most frequently fill the third slot in the preposition + article + noun + of + article + noun sequence. The fillers show that the sequence most commonly indicates spatial or temporal relations in the fiction corpora, while in the Newspaper corpus it can also express causal relations. The top six nouns in the CLLIP corpus account for 51% of the 6-POSgrams of this sequence.
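The slot analysis can be sketched as follows: find every six-token window whose tags match the pattern, record the word in the third slot, and then work out what share of matches the most frequent fillers cover. The tag names, function names and toy data here are assumptions for illustration and do not reproduce the project's data or tools.

```python
from collections import Counter

# Hypothetical tags; "of" gets its own tag so it can be told apart from other prepositions.
PATTERN = ("PREP", "ART", "NOUN", "OF", "ART", "NOUN")

def slot_fillers(tagged, pattern=PATTERN, slot=2):
    """Count the words filling position `slot` in windows whose tags match `pattern`."""
    n = len(pattern)
    fillers = Counter()
    for i in range(len(tagged) - n + 1):
        window = tagged[i:i + n]
        if tuple(tag for _, tag in window) == pattern:
            fillers[window[slot][0].lower()] += 1
    return fillers

def top_share(fillers, k):
    """Fraction of all matches accounted for by the k most frequent fillers."""
    total = sum(fillers.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in fillers.most_common(k)) / total

# Toy tagged text: three matching phrases separated by a verb.
toy = [
    ("at", "PREP"), ("the", "ART"), ("end", "NOUN"), ("of", "OF"), ("the", "ART"), ("day", "NOUN"),
    ("came", "VERB"),
    ("in", "PREP"), ("the", "ART"), ("middle", "NOUN"), ("of", "OF"), ("the", "ART"), ("night", "NOUN"),
    ("came", "VERB"),
    ("at", "PREP"), ("the", "ART"), ("end", "NOUN"), ("of", "OF"), ("the", "ART"), ("road", "NOUN"),
]
fillers = slot_fillers(toy)
share_of_top_one = top_share(fillers, k=1)
```

With `k=6` on the CLLIP fiction texts, `top_share` corresponds to the kind of figure reported on this slide (51%).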
Body parts: NECK. Do nouns in the CLLIP corpus more typically refer to physical entities in the world than the equivalent nouns in the Comp corpus? The two right-hand columns show the percentage of occurrences of the word "neck" that refer to part of a piece of clothing or that are used in an idiomatic sense. The adult corpus contains only a marginally higher percentage of idiomatic uses.
Neck. CLLIP: "stick your neck out"; little physical contact; intimacy with animals; neck as site of pain. Comp: "breathing down your neck"; lots of physical contact; intimacy between humans; neck as site of desire, tenderness, place for ornamentation.
Finger. CLLIP: figurative – 13%; jab, prod, lay, run, put; accusing, admonishing; used for drawing, for indicating the need for silence and for pulling triggers. Comp: figurative – 19%; put, raise, point, run, jab, wag; furtive, tentative, negligent; used for communicating, for feeling [contours & textures], for wearing rings.
"in time" – CLLIP. We looked at uses of "in time" in the CLLIP corpus. The dominant meaning is immediate: characters are concerned to accomplish something before the expiry of an implied, externally imposed deadline. A childly perspective often seems to imply staying on the right side of trouble or sanction.
"in time" – Comp. In the Comp corpus, "in time" is used in several senses: (i) in the fullness of time, that is, time on a large scale, which the speaker can perceive from a distance; (ii) within an appropriate period of time; (iii) others, as in the last line, where "in" and "time" have more separate meanings than is usual in the phrase.