FYC corpus: an introduction and overview, with preliminary findings. Exploring the question of ‘orality’ empirically with a controlled data set. CCCC 2014.

1 FYC corpus: an introduction and overview, with preliminary findings
Exploring the question of ‘orality’ empirically with a controlled data set
CCCC 2014, Indianapolis, 20 March 2014
Daniel Kies, Department of English, College of DuPage

2 What is the evidence for orality in first-year composition?
Exploring the question empirically with a controlled data set
Daniel Kies, Dept. of English, College of DuPage
Olga Lambert, Dept. of Languages & Literature, Benedictine University
Sandra Gollin Kies, Dept. of Languages & Literature, Benedictine University

3 The Genesis of the Project
We noted several items related to the question of orality:
• Growing concern about a “shift to orality” and, consequently, a degeneration, degradation, and overall diminishment of the English language. For example, consider the next slide.

4 Comments in the popular media
Students use texting language in papers at university. Help!
I am not an English teacher, but I just started teaching at an American college and I have found that several students sometimes substitute a single number or letter for a word. One student used "4" instead of "for" throughout his entire paper. Another wrote "U" instead of "you." It was the kind of writing that you would expect to see in a text message. These students are still required to take English no matter what subjects they choose to major in, so it is hard for me to understand why they make mistakes like these. I have to assume that it is intentional laziness rather than a real error, but this makes it harder to correct.
Source: Students-use-texting-language-in-papers-at-university-Help

5 The Genesis of the Project
We noted a second trend in this line of thought:
• The blame is usually attributed to the widespread adoption of communications technology by the millennial generation. For example, see the next slide.

6 Comments in the popular media (2)
• Teenagers who frequently use ‘techspeak’ when they text performed poorly on a grammar test, said Drew Cingel, a former undergraduate student in communications at Penn State.
• When tweens write in techspeak, they often use shortcuts, such as homophones, acronyms and omissions of non-essential letters such as ‘wud’ for ‘would.’
Source: fostering-bad-grammar-and-spelling-researchers-claim.html

7 The Genesis of the Project
Finally, we observed:
• Parallels between Birkerts’ and Ong’s explorations of linguistic change triggered by technological innovation: the shift from pre-literate cultures (what Ong called “primary orality”) to literate ones is reflected in the “secondary orality” (Ong, 1982) that some researchers believe exists in the contemporary technological, cultural, and linguistic environment.
• Similar remarks are found in the professional literature.

8 The Genesis of the Project
For example, Clive Thompson, summarizing the L&L Stanford Writing Project study: “Technology isn’t killing our ability to write. It’s reviving it—and pushing our literacy in bold new directions.”
Bauerlein, in the CHE, concludes of the same study: “I think we can say that instead of dispelling fears about the impact of technology on student writing, the Lunsford study raises them to a new level.”
Source: Mark Bauerlein (2008), “The Lunsfords on Student Writing,” Chronicle of Higher Education. writing/6148

9 The Genesis of the Project: Fears for the future of writing (1)
• Experts say that children write more these days than they did 20 years ago, because of texting and social media. Most of that writing, however, is in text-speak, and that form of language becomes a bad habit. Students are now so used to writing in text-speak that they can’t easily remember (or apply) proper language rules.
• Communication is becoming more global in scope and more electronic in form. By the time these children finish school and enter the workforce, this decline in the spoken (sic) word will become greater. Written communication, in a formal report, an , or even a text, isn’t just happening on the colloquial level anymore, and children need to be educated on how to use technology in formal, professional contexts.
Source: language-evolution-or-just-laziness.html

10 The Genesis of the Project: Fears for the future of writing (2)
• Rebecca Gemkow, a Lyons Township High School English teacher, said she believes it is crucial for teenagers to recognize the difference between social and academic writing in order to be successful in the real world.
• “I feel that all of the online opportunities and the time spent with such opportunities puts students at a deficit when it comes to producing sophisticated writing,” she said. “In result, there is a much greater responsibility put on teachers to help rectify the situation so that students will be prepared for the rest of high school, as well as post-high school writing.”
Source: texting-lingo-popping-up-in-school-writing.html

11 The Genesis of the Project
Background of our research corpus (1):
• First-year composition (FYC) corpus: over 7 million words drawn from the academic writing of the general population of students in first-year writing classes at a community college in America’s Midwest.
• The corpus spans the period , and thus allows student writing to be compared between the time when the general population was adopting the World Wide Web and search engines and the present, when electronic texts are pervasive.

12 The Genesis of the Project
Background of our research corpus (2):
• The FYC corpus comes from the same composition courses taught by the same instructor throughout the period. This stability produces highly comparable data in terms of writing topics and reduces variability that might otherwise arise from different instructors’ pedagogical styles or abilities.
• The writing prompts were intended to elicit essays in different academic genres, such as summary, review of an article, argumentative/persuasive essay, descriptive/comparative response, analysis of persuasive writing, and definition. Major topics were the future of books, The Gutenberg Elegies, and literacy; in the second semester, students typically wrote academic research essays on topics related to Orwell’s

13 The Genesis of the Project
Background of our student writers:
• All students have similar cultural, linguistic, and socio-economic backgrounds.
• Most students come from the western suburbs of Chicago that surround the college.
• All students have similar educational achievements.
WRAB III, Paris, France, 19 February 2014

14 Research questions
• What are the general features of first-year composition students’ writing?
• What are the principal markers of orality?
• Is there any evidence of a shift to orality in first-year students’ writing over time?

15 Methodology (1)
1. Review of previous research on differences between oral and written text (e.g., Ong; Halliday & Matthiessen, 2004; O’Donnell, 1974; Chafe; Tannen).
2. Selection of comparable written texts (Orwell’s 1984 and Birkerts’ Gutenberg Elegies essays).
3. Conversion of Word files to machine-readable Unicode text files.
4. Parsing of the 1984 texts using UAMCorpusTool (O’Donnell).
5. Analysis of general linguistic features using UAMCorpusTool and generation of descriptive statistics.
6. General comparison with Biber’s (1988) mean frequencies for academic prose and face-to-face conversation. (Not all categories are easily comparable.)
7. Finer analysis of wordlists using WordSmith Tools 6 (Scott).
8. Concordancing of specific features using WordSmith Tools 6.
Future research: more fine-grained analyses; factor analysis (Biber, 1988, 2006).
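Step 8 relies on the WordSmith Tools concordancer. For readers unfamiliar with its output, a key-word-in-context (KWIC) concordance lists every hit of a search word with a few words of context on each side. The Python sketch below only illustrates that idea; the regular-expression tokenizer, the four-word window, and the sample sentence are assumptions, not the study’s settings.

```python
import re

# Assumption: a "word" is a run of letters, digits, or apostrophes.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+")


def kwic(text, node, window=4):
    """Return (left context, hit, right context) for each case-insensitive hit of `node`."""
    tokens = TOKEN_RE.findall(text)
    target = node.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


def format_kwic(hits, width=30):
    """Right-align the left context so every hit lines up in one column."""
    return [f"{left[-width:]:>{width}}  {hit}  {right[:width]}" for left, hit, right in hits]


sample = "Big Brother is watching you. Winston knew that Big Brother never slept."
for line in format_kwic(kwic(sample, "brother")):
    print(line)
```

Aligning hits in one column is what makes a concordance readable: repeated left or right neighbours (such as the “Big” before “Brother” here) stand out when scanning down the page.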

16 Methodology (2)
Tools:
• WordSmith Tools (Mike Scott)
• UAMCorpusTool (Mick O’Donnell)
• AntConc (Laurence Anthony)
Materials:
• The pronoun study corpus: 100,000 words on Birkerts’ Gutenberg Elegies.
• The verb study corpus: student research essays on George Orwell’s
  – Sub-corpus 1: 449,706 words
  – Sub-corpus 2: 363,157 words

17 Methodology (3)
Techniques: establishing sets of metrics from earlier research to provide a means of measuring the orality of the students’ texts:
• Biber et al. (2006) examined a range of university registers, both spoken and written (the T2K-SWAL corpus).
• That corpus includes a wide range of spoken registers, such as classroom instruction, office hours, and service encounters, and written academic registers, such as textbooks and administrative texts, but no student writing.
• The T2K-SWAL corpus therefore provides a useful backdrop against which to compare student writing, but it does not examine the texts of novice writers.

18 Methodology (4)
Techniques:
• Compare the student corpora against the academic registers in corpora.byu.edu (Mark Davies).
• That corpus focuses largely on cross-disciplinary academic journal articles.

19 Claims for writing (1)
Writing has been claimed to be:
• More structurally complex and elaborate
• More explicit
• More decontextualized/autonomous
• Less personally involved; more detached or abstract
• Higher in its concentration of new information
• More deliberately organized
(Biber, 1988, p. 47)

20 Claims for writing (2)
The theoretical notion of register (field, mode, and tenor) from systemic functional linguistics postulates a number of features that distinguish orality, i.e., “very spoken” or conversational English, from “very written” genres such as academic texts (Halliday & Matthiessen, 2004).

21 Claims for writing (3)
Some markers of orality are:
• in terms of field, a tendency to focus on subjective experience;
• in terms of tenor, reduced social distance between interlocutors;
• in terms of mode, lower lexical density, higher grammatical intricacy, and the predominance of general (“hypernymic”) lexical items over more specific or less common ones (e.g., went rather than walked or staggered).
(Halliday & Matthiessen, 2004)
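Lexical density can be operationalized in more than one way. Halliday counts lexical items per ranking clause; a simpler measure, and the kind of figure reported as “Lexemes % of text” in the UAMCorpusTool output later in this talk, is the share of running words that are content words. The sketch below computes that simpler measure, assuming a deliberately tiny, hypothetical function-word list; a real analysis would use a part-of-speech tagger or a full closed-class list.

```python
import re

# Hypothetical, deliberately tiny function-word list for illustration only.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "at", "for",
    "with", "by", "from", "is", "are", "was", "were", "be", "been", "am",
    "i", "you", "he", "she", "it", "we", "they", "me", "him", "her", "us", "them",
    "this", "that", "these", "those", "not", "do", "does", "did", "have", "has",
    "had", "will", "would", "can", "could", "so", "just", "some", "very",
}

TOKEN_RE = re.compile(r"[a-z']+")


def lexical_density(text):
    """Share of running words NOT in the function-word list (a Ure-style measure)."""
    tokens = TOKEN_RE.findall(text.lower())
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)


spoken = "I went to the store and I just got some stuff, you know."
written = "Technological innovation transforms literate culture."
print(round(lexical_density(spoken), 2), round(lexical_density(written), 2))
```

The contrast between the two invented sentences mirrors the claim on this slide: the conversational one is dominated by pronouns, articles, and other grammatical words, while the written-style one is almost entirely content words.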

22 Claims for writing (4)
• Corpus-based research by Biber, Johansson, Leech, Conrad, & Finegan (1999) showed significant differences between academic and spoken text.
• For example, 45% of the lexical verbs in spoken texts were represented by just 12 key words (words like say, make, think, and get).
• First- and second-person pronouns were much more common in spoken than in academic texts.

23 More recent research since Biber (1988)
• The spoken/written dichotomy is inadequate.
• Biber et al. (2006) proposed seven dimensions that cut across academic discourse in the university context.
• Even so, “a fundamental oral/literate opposition” … holds between spoken and written modes “regardless of purpose, interactiveness, or other pre-planning considerations” (Biber, 2006, p. 186).

24 Characteristics of spoken vs written text in academic contexts
Some key findings of Biber (2006):
• Present tense is the most common tense in academic texts, both spoken and written. The humanities have the greatest proportion of past tense, at 40%; however, these past-tense forms tend to occur in connection with historical events rather than personal narratives.
• Simple aspect accounts for 95% of verbs in written and 90% in spoken academic registers.
• Active voice is much more common than passive (80% active in written academic registers and 90% in spoken registers).

25 Spoken vs written registers
Biber et al. (2002) found “strong polarization between spoken and written registers,” demarcated as “dimensions.” Written registers (regardless of purpose) are:
• informationally dense (Dimension 1),
• non-narrative in focus (Dimension 2),
• marked by elaborated reference (Dimension 3),
• low in overt persuasion (Dimension 4), and
• impersonal (Dimension 5).

26 University registers (Biber, Conrad, Reppen, Byrd, and Helt, 2002)
Written (e.g., textbooks, syllabi, administrative info.) | Spoken (e.g., lectures, labs, study groups, office hours)
Information-dense (D1) | Involvement and interaction
Non-narrative focus (D2) |
Elaborated reference (D3) | Situated reference
Little overt persuasion (D4) | More overt persuasion
Impersonal style (D5) | Less impersonal in style
To study ‘orality,’ I concentrated on the syntactic patterns that mark Dimension 1 in the table above.

27 Oral and literate discourse compared on Dimension 1
Positive features (orality): “interactiveness and personal involvement (1st and 2nd person pronouns, WH questions), personal stance (e.g., mental verbs, that- clauses with likelihood verbs and factual verbs, factual adverbials, hedges), and structural reduction and formulaic language (e.g., contractions, that- omission, common vocabulary, lexical bundles)” (p. 186).
These features contrast with literate discourse: “informational density and complex noun phrase structures (frequent nouns and nominalizations, prepositional phrases, adjectives, and relative clauses) as well as passive constructions” (p. 186).

28 Dimension 1: Oral vs literate discourse (Biber et al., 2004)
Positive loading (contractions, pronouns, verbs, adverbials):
• Contractions
• Pronouns: demonstrative
• Pronouns: it
• Pronouns: 1st person
• Verbs: present tense
• Adverbials: time
• Adverbs: common
• Pronouns: indefinite
• That-omission
Negative loading (nouns, adjectives, passives):
• Nouns: nominalizations
• Word length
• Prepositional phrases
• Adjectives: attributive
• Passives: agentless
• Passives: postnominal
• Type/token ratio
• Common adjectives: relational
• Relative clauses
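Biber’s multidimensional method turns loadings like these into one dimension score per text: each feature’s normalized frequency is standardized (a z-score against the corpus mean and standard deviation), then the z-scores of positive-loading features are summed and those of negative-loading features subtracted. The sketch below uses two features from each side of the list; the means, standard deviations, and rates are invented for illustration and are not Biber’s norms or FYC data.

```python
# Hypothetical numbers for illustration only; not Biber's norms or FYC data.
POSITIVE = ["contractions", "first_person_pronouns"]
NEGATIVE = ["nominalizations", "prepositional_phrases"]


def dimension_score(rates, means, sds, positive, negative):
    """Sum of z-scores for positive-loading features minus the sum for negative-loading ones.

    rates: one text's normalized frequencies (e.g. per 1,000 words), keyed by feature.
    means, sds: corpus-wide mean and standard deviation for each feature.
    """
    def z(feature):
        return (rates[feature] - means[feature]) / sds[feature]

    return sum(z(f) for f in positive) - sum(z(f) for f in negative)


rates = {"contractions": 2.0, "first_person_pronouns": 10.0,
         "nominalizations": 20.0, "prepositional_phrases": 100.0}
means = {"contractions": 1.0, "first_person_pronouns": 10.0,
         "nominalizations": 25.0, "prepositional_phrases": 110.0}
sds = {"contractions": 1.0, "first_person_pronouns": 5.0,
       "nominalizations": 5.0, "prepositional_phrases": 10.0}

print(dimension_score(rates, means, sds, POSITIVE, NEGATIVE))
```

Because the positive side holds the oral features, a higher score places a text further toward the oral end of Dimension 1; standardizing first keeps very frequent features (such as prepositions) from swamping rare ones (such as contractions).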

29 Results: Common Assertions 1 (Smileys & Emoji)
• No emoticons appeared in the corpus, except in one paper: a paper about internet-related language changes.
• See, for example, the opening of that student’s paper:
____________________________________________
Textspeak Has the Sustenance Teenagers Want
Textspeak; Netspeak; Chatspeak; these names are given to the “language” of text messaging and instant messaging, but these terms all have the same origin: Newspeak. …
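A claim like “no emoticons appeared” is easy to check mechanically. The sketch below is one possible heuristic, not the procedure the authors used: the ASCII emoticon pattern and the emoji code-point ranges are assumptions that will miss some forms, and the pattern requires each emoticon to stand alone between spaces.

```python
import re

# Assumed heuristic for common ASCII emoticons such as :) ;-) :D =( :P
# The emoticon must stand alone (whitespace or text boundary on both sides).
EMOTICON_RE = re.compile(r"(?<!\S)[:;=][-o^']?[)(\]\[DPpO/\\|]+(?!\S)")

# Assumed emoji ranges: Misc Symbols/Dingbats and the main pictograph blocks.
EMOJI_RE = re.compile(r"[\u2600-\u27BF\U0001F300-\U0001FAFF]")


def find_emoticons(text):
    """Return ASCII emoticons, then emoji characters, found in `text`."""
    return EMOTICON_RE.findall(text) + EMOJI_RE.findall(text)


opening = ("Textspeak; Netspeak; Chatspeak; these names are given to the "
           "language of text messaging and instant messaging")
print(find_emoticons(opening))
print(find_emoticons("great game :) see you ;-) soon"))
```

The stand-alone requirement is a deliberate trade-off: semicolons inside ordinary prose, as in the student’s opening sentence, do not trigger false hits, at the cost of missing an emoticon glued to a following period.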

30 Results: Common Assertions 2 (txtng abbr)
No abbreviations related to text (SMS) messages appeared in the corpus, e.g.:
AAMOF ADN AFAIA AFAIC AFAIK BTW CU CUL DEB gf GMTA HTH IC IIRC ITSFWI IMO IMCO IMHO LOL NBD NOYL NTYMI OIC OOTQ PITA PTB POV RO(T)FL ROFLMAO RTFM SEP SNAFU STFU TIA TOBG TPTB TTFN TTUL TYVM WB WRT WYSIWYG WTG YGLT YMMV
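A search for the abbreviations above has to avoid matching them inside ordinary words (e.g., “IC” inside a longer word). A minimal sketch, assuming whole-token, case-sensitive matching against a subset of the slide’s list; whether lowercase forms such as “lol” should also count is a design decision the slide does not settle.

```python
import re
from collections import Counter

# A subset of the abbreviations listed on the slide.
TEXTING_ABBREVIATIONS = {
    "AFAIK", "BTW", "CUL", "IIRC", "IMO", "IMHO", "LOL",
    "ROFLMAO", "TTFN", "TYVM", "gf",
}

WORD_RE = re.compile(r"[A-Za-z]+")


def count_abbreviations(text, abbreviations=TEXTING_ABBREVIATIONS):
    """Count whole-token, case-sensitive hits.

    Tokenizing first means abbreviations inside longer words never match;
    case sensitivity means lowercase 'lol' is NOT counted as 'LOL'.
    """
    return Counter(tok for tok in WORD_RE.findall(text) if tok in abbreviations)


print(count_abbreviations("BTW the essay was LOL funny, lol. BTW again."))
```

Switching to case-insensitive matching would catch lowercase texting forms but also risks false hits from ordinary words that happen to spell an abbreviation.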

31 Marker of Orality 1: Contractions
The number of contractions decreased by a factor of 10 between 1998 and 2013:
• Total contractions in sub-corpus: 1183
• Total contractions in sub-corpus: 193
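Because the two Orwell sub-corpora differ in size (449,706 vs 363,157 words), raw counts are usually normalized, for example per 1,000 words as in the frequency tables later in this talk, before they are compared. The sketch below pairs a rough contraction counter with that normalization. The regular expression is an assumption: it deliberately skips ’s (usually a possessive), so it undercounts forms like it’s. The assignment of 1183 to the larger sub-corpus and 193 to the smaller one is also an assumption, since the transcript does not label the counts.

```python
import re

# Heuristic: n't, 're, 've, 'll, 'd, 'm clitics (straight or curly apostrophe).
# 's is excluded on purpose because it is usually possessive, not "is"/"has".
CONTRACTION_RE = re.compile(r"\b\w+(?:n[’']t|[’'](?:re|ve|ll|d|m))\b", re.IGNORECASE)


def count_contractions(text):
    """Number of heuristic contraction matches in `text`."""
    return len(CONTRACTION_RE.findall(text))


def per_thousand(count, corpus_words):
    """Normalize a raw count to occurrences per 1,000 words."""
    return count / corpus_words * 1000


# Assumed pairing of the slide's counts with the sub-corpus sizes from slide 16.
early = per_thousand(1183, 449_706)
late = per_thousand(193, 363_157)
print(f"{early:.2f} vs {late:.2f} per 1,000 words; ratio {early / late:.1f}")
```

Comparing normalized rates rather than raw totals keeps a difference in sub-corpus size from masquerading as a change in student writing.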

32 Marker of Orality 2: Pronoun it
The number of instances decreased between 1998 and 2013:
• Total in sub-corpus: 4240
• Total in sub-corpus: 3262

33 Marker of Orality 3: Demonstrative pronouns
The number of instances decreased between 1998 and 2013:
• Total in sub-corpus:
• Total in sub-corpus: 9451

34 Marker of Orality 4: Pro-verb do
The number of instances decreased between 1998 and 2013:
• Total in sub-corpus: 2377
• Total in sub-corpus: 1363

35 Marker of Orality 5: First-person pronouns
The number of instances decreased between 1998 and 2013:
• Total in sub-corpus: 6345
• Total in sub-corpus: 3889

36 Marker of “very written” Text 1: Nominalization
The number of instances increased between 1998 and 2013:
• Total in sub-corpus: 3796
• Total in sub-corpus: 5851
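Nominalizations are often identified in corpus studies by suffix, an approach commonly associated with Biber (1988): words ending in -tion, -ment, -ness, or -ity, plus their plurals. The slide does not say how the counts above were obtained, so the following is only a sketch of the suffix heuristic; the exclusion set is a hypothetical, incomplete list of common words that merely end in those letters.

```python
import re

# Suffix heuristic: -tion(s), -ment(s), -ness(es), -ity/-ities.
NOMINAL_RE = re.compile(r"[a-z]+(?:tions?|ments?|ness(?:es)?|ity|ities)")

# Hypothetical exclusion list; a real study would need a much longer one.
EXCLUDE = {"nation", "station", "moment", "city"}


def count_nominalizations(text):
    """Count tokens that end in a nominalizing suffix and are not excluded."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(1 for t in tokens if NOMINAL_RE.fullmatch(t) and t not in EXCLUDE)


sentence = ("The government's regulation of information creates awareness "
            "and anxiety, not a nation.")
print(count_nominalizations(sentence))
```

Suffix matching is fast and transparent but noisy in both directions: it admits words that are not derived from verbs or adjectives and misses nominalizations with other endings (e.g., “analysis”), so hand-checking a sample of hits is advisable.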

37 Marker of “very written” Text 2: Word length
The number of instances 1.3 million words between 1998 and 2013:
• Total in sub-corpus: characters/word; total word count
• Total in sub-corpus: characters/word; total word count
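Average word length is total characters divided by total words; the type/token ratio from the Dimension 1 list is distinct word forms divided by total words. A minimal sketch of both, assuming a simple letters-and-apostrophes tokenizer and case-folded types. Raw TTR falls as texts get longer, which is why WordSmith Tools also reports a standardized TTR computed over successive fixed-length chunks; the sketch computes only the raw ratio.

```python
import re

# Assumption: words are letter runs, optionally joined by an apostrophe (don't).
TOKEN_RE = re.compile(r"[A-Za-z]+(?:['’][A-Za-z]+)*")


def word_stats(text):
    """Return (average characters per word, raw type/token ratio)."""
    tokens = TOKEN_RE.findall(text)
    if not tokens:
        return 0.0, 0.0
    avg_len = sum(len(t) for t in tokens) / len(tokens)
    ttr = len({t.lower() for t in tokens}) / len(tokens)
    return avg_len, ttr


avg_len, ttr = word_stats("The cat saw the dog")
print(f"{avg_len:.2f} characters/word, TTR {ttr:.2f}")
```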

38 Marker of “very written” Text 3: Prepositional phrases
The change between 1998 and 2013 is insignificant:
• Total in sub-corpus: 0.172 prepositions per word (prepositions/total word count)
• Total in sub-corpus: 0.141 prepositions per word (prepositions/total word count)
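The slide does not name the test behind “insignificant.” One common choice in corpus linguistics for comparing a feature’s frequency across two corpora of different sizes is the log-likelihood (G²) statistic in the style of Rayson and Garside; with one degree of freedom, values above 3.84 correspond to p < 0.05 and above 6.63 to p < 0.01. The sketch below is that technique, not necessarily the authors’ method, and the example reuses the contraction counts with the same unconfirmed assumption about which count belongs to which sub-corpus.

```python
import math


def log_likelihood(a, b, c, d):
    """G2 for a feature seen a times in a corpus of c words and b times in one of d words."""
    expected_1 = c * (a + b) / (c + d)
    expected_2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:  # 0 * log(0) is taken as 0
        ll += a * math.log(a / expected_1)
    if b > 0:
        ll += b * math.log(b / expected_2)
    return 2 * ll


# Assumed pairing: 1183 contractions in 449,706 words vs 193 in 363,157 words.
g2 = log_likelihood(1183, 193, 449_706, 363_157)
print(f"G2 = {g2:.1f}; significant at p < 0.01: {g2 > 6.63}")
```

Applying the same function to the preposition counts would require the raw totals, which are not given in this transcript.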

39 Marker of “very written” Text 4: Passives
The number of instances increased between 1998 and 2013:
• Total in sub-corpus: 7740
• Total in sub-corpus:

40 Marker of “very written” Text 5: Attributive adjectives
The number of instances increased between 1998 and 2013:
• Total in sub-corpus: 1537
• Total in sub-corpus: 3847

41 Comparison of lexemes across word classes
Feature | Research papers, 449,706 words | Research papers, 363,157 words
(each cell: % of total, with frequency per 1,000 words in parentheses)
Noun | 32.72% (327) | 32.60% (326)
Verb | 16.23% (162) | 16.53% (165)
Adjective | 0.37% (4)
Pronoun | 4.75% (47) | 4.52% (45)
Adverb | 3.99% (40) | 4.06% (41)
Preposition | 10.71% (107) | 11.09% (111)
Conjunction | 3.19% (32) | 3.29% (33)

42 Comparison of lexemes with “Orwell” corpora
Columns: Feature | Biber 1988, academic prose | Biber 1988, face-to-face conversation | FYC ( ); frequency per 1,000 words
Noun (205)
Adjective, attributive (4) (comp. and super.)
Preposition (111)
Conjunction (33)
Verb, past (25)
Verb, present (49.7)
Pronoun, personal (45)
Adverb (41)

43 Comparison of linguistic features
• Grammatical verbs: be, have, do, modals.
• Lexical verbs.
• Difficulty of separating lexical verbs from nouns that look the same, e.g., command (verb) and command (noun).
• Find all verbs.
• Past tense
• Present tense
• Passive
• Passive without agent
• Present participle
• Past participle
• Most common verbs in both speech and writing
• Most common verbs in the COCA spoken corpus
• Most common verbs in the COCA academic corpus (look for the humanities subset)
• Latin-based verbs
• Phrasal verbs substituting for Latin-based or single-word verbs

44 Results: Interjections
corpus.byu.edu (academic prose [journal articles]):
• Tokens: 6031
• Size: 102,046,528
• Per million:
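The per-million figure follows from the two numbers on the slide: occurrences multiplied by one million, divided by corpus size. A minimal sketch of that normalization; the function name is illustrative.

```python
def normalized_rate(count, corpus_size, per=1_000_000):
    """Occurrences per `per` words (per million by default)."""
    return count * per / corpus_size


rate = normalized_rate(6031, 102_046_528)
print(f"{rate:.2f} interjections per million words")
```

The same function with `per=1000` gives the per-1,000-word frequencies used in the FYC tables, so academic-journal and student figures can be put on a common scale before comparison.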

45 Results: Interjections, Orwell research papers
Length:
• Number of segments: 109
• Words in segments: 128
Text complexity:
• Av. word length: 4.04
• Av. segment length: 1.17
Lexical density:
• Lexemes per segment: 0.72
• Lexemes % of text: 61.72%

46 Results: Interjections, Orwell research papers
Length:
• Number of segments: 60
• Words in segments: 56
Text complexity:
• Av. word length: 3.93
• Av. segment length: 0.93
Lexical density:
• Lexemes per segment: 0.67
• Lexemes % of text: 71.43%
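The derived figures on these two slides are simple ratios of a few raw counts. The lexeme totals are not reported; working backwards, 79 lexemes (slide 45) and 40 lexemes (slide 46) reproduce the published ratios, so those two inputs are inferred here, not taken from the slides.

```python
def segment_stats(segments, words, lexemes):
    """Recompute the UAMCorpusTool-style ratios from raw counts, rounded to 2 places."""
    return {
        "av_segment_length": round(words / segments, 2),         # words per segment
        "lexemes_per_segment": round(lexemes / segments, 2),
        "lexemes_pct_of_text": round(100 * lexemes / words, 2),  # lexical density, %
    }


# Segment and word counts from the slides; lexeme counts inferred (see above).
print(segment_stats(109, 128, 79))
print(segment_stats(60, 56, 40))
```

Recomputing the ratios this way is a quick consistency check on tool output before the figures are compared across corpora.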

47 Conclusions (1)
• Texting and emoji as the return of the rebus principle (representing language by means of a symbol).
• The rebus marks an intellectual leap that every literate culture and individual makes when moving from pre-literate to literate states (Ong’s “primary literacy”).
Image: a message from a child, age 4, incorporating rebuses as she is just beginning to learn that symbols can represent words, letters of the alphabet, and the sounds of speech.

48 Conclusions (2)
• The return of the rebus principle as a commonly used shortcut in communication systems, which we use every day whenever we use technology-mediated communication (smartphones and browsers).
• Here, we see modern examples of Ong’s “secondary literacy.”

49 Selected References
Biber, D. (1988). Variation across speech and writing. Cambridge; New York: Cambridge University Press.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). Longman grammar of spoken and written English. Harlow, England: Pearson Education.
Biber, D. (2006). University language: A corpus-based study of spoken and written registers. John Benjamins.
Fellbaum, C., & Miller, G. A. (1990). Folk psychology or semantic entailment? Comment on Rips and Conrad (1989). Psychological Review, 97(4).
Freeman, Y. S., & Freeman, D. (2009). Academic language for English language learners and struggling readers: How to help students succeed across content areas. Portsmouth, NH: Heinemann.
Halliday, M. A. K., & Matthiessen, C. (2004). An introduction to functional grammar (3rd ed.). London: Arnold.
Ong, W. J. (1982). Orality and literacy: The technologizing of the word. London: Routledge.
Partridge, M. (2011). A comparison of lexical specificity in the communication verbs of L1 English and TE student writing. Southern African Linguistics and Applied Language Studies, 29(2).
Scott, M. (2012). WordSmith Tools version 6. Liverpool: Lexical Analysis Software.

50 Contact Information
Daniel Kies
Department of English
College of DuPage
425 Fawell Boulevard
Glen Ellyn, Illinois 60137, USA

