Stylistics – case study (see last slide for websites used to get numerical information from texts)
2/20 Stylistic analysis
Literary vs linguistic stylistics
– Lit crit focuses on the effect on the reader, intended or otherwise, so it is largely intuitive and subjective
– Linguistic stylistics looks for characterisations of style (including literary style) in terms of linguistic phenomena at the various levels of linguistic description
3/20 Stylistic analysis
Inventory of linguistic devices and their effect
– usually in a contrastive way:
  – in contrast with other texts of a similar genre
  – in contrast with other genres
Linguistic devices are described in terms of the usual linguistic levels of description: phonology, morphology, lexis, grammar, etc.
4/20 Example
Newspaper reporting of a similar story: Sun vs Independent
– readership by social class
– Sun: widely read (c. 5m), mostly by lower class and lower middle class
– Independent: circulation 0.25m, educated middle class
How would you expect this different readership to be reflected in the styles?
5/20 Sun vs Independent
Targeted readership largely dictates subject matter and the angle of coverage
From a purely linguistic point of view we might expect differences in …
– vocabulary
– complexity of sentence structure
Other differences might include literary
But (compared to other texts) features of the genre (newspaper story) may be shared
8/20 Sun vs Independent
Sun: Family bid for Lindsay's killer
Independent: Hawker family make new plea

Sun: THE family of an English teacher murdered in Japan today appealed for her killer to be found – a year after her death.
Independent: The family of a young British teacher murdered in Japan were yesterday flying to Tokyo to launch a fresh appeal on the first anniversary of her death.

Sun: Lindsay Ann Hawker, 22, was found dead in a sand-filled bath on a balcony of a flat belonging to one of her students.
Independent: Lindsay Ann Hawker, 22, was found dead in a bath filled with sand on the balcony of a flat in Ichikawa, east of Tokyo, on March 27 last year.

Sun: Parents Bill and Julia and their daughters Lisa, 26, and Louise, 23 have flown to capital Tokyo to “get justice for Lindsay”.
Independent: Miss Hawker's parents, Bill and Julia, and her two sisters, Lisa and Louise, will leave London's Heathrow Airport this morning to travel to the Japanese capital to renew their appeal to find her killer.

Sun: A poster campaign aims to help catch suspect Tatsuya Ichihashi, 29, who fled from cops.
Independent: Detectives are still hunting 29-year-old suspect Tatsuya Ichihashi, who lived at the flat and fled when approached by officers for questioning.

Sun: More than 20,000 people joined a tribute page on website Facebook
Independent: A webpage set up on social networking site Facebook, called "Don't forget Lindsay Hawker, Please remember this Face", now has more than 20,000 members.
9/20 Some differences
Differences of detail
– [Some are due to slightly different publication times, before or after the press conference]
– What elements are of interest?
Differences of vocabulary
– cops vs officers, dad vs father, year after vs anniversary
Differences of explication
– capital of Japan, Facebook
Differences of syntax
– surprisingly few
– but a possible stylistic trademark of the redtop is the internal structure of noun phrases …
10/20 Appositive noun phrases
Sun:
– a sand-filled bath
– Parents Bill and Julia
– capital Tokyo
– suspect Tatsuya Ichihashi, 29
– website Facebook
Independent:
– a bath filled with sand
– Miss Hawker's parents, Bill and Julia
– Tokyo; the Japanese capital
– 29-year-old suspect Tatsuya Ichihashi
– [the] social networking site Facebook
11/20 Numerical comparison
Thanks to computers it is now (relatively) easy to count things
What should we count?
– easy to count number of paragraphs, sentences, words, letters
– may give a measure of complexity
  – average sentence length (words/sentence)
  – average word length
  – percentage of long words
– type:token ratio (vocabulary richness)
  – number of types = number of different words
  – number of tokens = total number of words
– Hapax legomena = number of words that occur only once in the text
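The counts listed above can be sketched in a few lines of Python. This is a minimal illustration, not the method used by the websites on the last slide: the `tokenize` rule (lowercase everything, keep hyphenated words and contractions as single tokens) and the function names are assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return word tokens.

    Assumption for this sketch: hyphenated words and contractions
    (sand-filled, don't) count as one token each.
    """
    return re.findall(r"[a-z0-9]+(?:['’-][a-z0-9]+)*", text.lower())

def vocabulary_stats(text):
    """Return token count, type count, type:token ratio and hapax count."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    hapaxes = [word for word, n in counts.items() if n == 1]
    return {
        "tokens": len(tokens),            # total number of words
        "types": len(counts),             # number of different words
        "type_token_ratio": len(counts) / len(tokens) if tokens else 0.0,
        "hapax_legomena": len(hapaxes),   # words occurring exactly once
    }

# One sentence from the Sun text used in this case study.
sun_sentence = ("Lindsay Ann Hawker, 22, was found dead in a sand-filled bath "
                "on a balcony of a flat belonging to one of her students.")
stats = vocabulary_stats(sun_sentence)
```

For this single sentence the sketch finds 23 tokens and 20 types, because "a" occurs three times and "of" twice; the remaining 18 types are hapax legomena. On such a short sample almost every word is a hapax, which is one reason the ratio only becomes informative on longer texts of comparable length.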
12/20 Normalization and significance
Always important to compare like with like
– It is usual when counting things to “normalize” over the length of the text
– If one text is longer than the other, of course you would expect higher frequencies of everything
Issue of statistical significance
– Small differences may not really tell you anything
– Various measures can confirm whether a difference is statistically significant or due to random fluctuation
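As a rough illustration of both points, the sketch below normalizes a raw count to a rate per 1,000 words and computes a Pearson chi-square statistic for a 2×2 table. The chi-square test is only one of the possible measures (the slide does not say which was intended), and the counts in the example are hypothetical figures chosen for the arithmetic, not the figures for the Sun and Independent texts.

```python
def per_thousand(count, total_words):
    """Normalize a raw count to a rate per 1,000 words."""
    return 1000 * count / total_words

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0

# Critical value of chi-square with 1 degree of freedom at p = 0.05.
CRITICAL_05_DF1 = 3.841

# HYPOTHETICAL counts: text A has 20 complex words out of 300,
# text B has 40 complex words out of 300.
complex_a, other_a = 20, 280
complex_b, other_b = 40, 260

rate_a = per_thousand(complex_a, complex_a + other_a)
rate_b = per_thousand(complex_b, complex_b + other_b)
statistic = chi_square_2x2(complex_a, other_a, complex_b, other_b)
significant = statistic > CRITICAL_05_DF1
```

With these made-up counts the statistic is about 7.41, above the 3.841 threshold, so the difference would count as significant at the 5% level. With very small texts the expected cell counts become too low for the chi-square approximation to be trusted, which is the practical form of the warning above.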
13/20 How to count
How to recognize paragraph breaks?
How to recognize sentence breaks?
– Headlines don’t end in a full stop
– Not all sentences end in a full stop
– Not all full stops are sentence-ending (abbreviations)
How to count words
– Hyphenated words, contractions e.g. don’t
How to measure word length/complexity
– length only roughly corresponds to complexity
– number of characters vs number of syllables
– cf. through vs idea
– counting syllables implies either a dictionary or an algorithm
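The difficulties above show up as soon as one tries to write the counting rules down. The following is a deliberately naive sketch: the abbreviation list, the decision to treat hyphenated words and contractions as single words, and the vowel-group syllable heuristic are all assumptions made for illustration, not the rules used by any particular tool.

```python
import re

# Illustrative only; a real list would be much longer.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "e.g.", "i.e.", "etc."}

def split_sentences(text):
    """Split after a token ending in . ! or ? unless it is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no end punctuation, e.g. a headline
        sentences.append(" ".join(current))
    return sentences

def count_words(text):
    """Count words, treating don't and sand-filled as one word each (an assumption)."""
    return len(re.findall(r"[A-Za-z0-9]+(?:['’-][A-Za-z0-9]+)*", text))

def count_syllables(word):
    """Estimate syllables as the number of vowel groups, dropping a final silent e."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

lengths = {w: (len(w), count_syllables(w)) for w in ("through", "idea")}
```

Here `lengths` pairs each word with its character count and estimated syllable count: "through" has 7 characters and 1 syllable, while "idea" has 4 characters and is estimated at 2 syllables. The true figure for "idea" is 3, so the heuristic undercounts it, which illustrates why accurate syllable counts need a dictionary or a more careful algorithm.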
14/20 Numerical comparison
                                  Sun        Indy
sentences                         13         10
words
letters/numbers
complex words                     19 (7%)    36 (14%)
syllables
av’ge word length (characters)
av’ge word length (syllables)
av’ge sentence length (words)
short sentences                   6 (42%)    4 (40%)
long sentences                    2 (14%)    1 (10%)
types
type-token ratio
Hapax legomena
– texts are roughly the same length
– Hard to know if any differences are statistically significant with such a small amount of data, but …
  – Indy does have more complex words
  – … and higher average word length (AWL) and average sentence length (ASL)
  – … and higher ratio of short:long sentences
  – … and richer vocabulary
15/20 Word length
Comparison of the distribution of words by length only tells us that the two texts are very similar: correlation ρ = 0.977
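A comparison of this kind can be sketched as follows. The slide does not say how ρ was computed, so this example assumes a Pearson correlation between the two texts' proportions of words at each length; the bin limit `max_len` and the function names are likewise assumptions.

```python
from collections import Counter
from math import sqrt

def length_distribution(words, max_len=12):
    """Proportion of words at each length 1..max_len.

    Words longer than max_len are pooled into the last bin.
    """
    counts = Counter(min(len(w), max_len) for w in words)
    total = len(words)
    return [counts.get(n, 0) / total for n in range(1, max_len + 1)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# The two headlines from the case study, already split into words.
sun_words = "Family bid for Lindsay's killer".split()
indy_words = "Hawker family make new plea".split()
rho = pearson(length_distribution(sun_words), length_distribution(indy_words))
```

Because most English running text is dominated by short function words, word-length distributions of any two texts tend to correlate highly, so a value close to 1 says little about style on its own.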
16/20 Syntactic information
                        Sun         Indy
questions               0           0
passives                8 (57%)     6 (60%)
longest sentence        33 words    43 words
shortest sentence       4 words     7 words
use of verb to be       8           8
use of auxiliary        1           3
conjunctions            3 (8%)      4 (10%)
pronouns                7 (19%)     4 (11%)
prepositions            13 (34%)    14 (38%)
nominalizations         0           1 (2%)
Sentence beginnings:
  pronouns              6           1
  article               1           4
  conjunction           0           0
  preposition           0           0
– Again, hard to know if differences are significant
– This kind of measure is more useful for distinguishing different genres
17/20 Readability
Big interest from teachers, publishers and researchers in quantifying the appropriate reading age for a text
– i.e. what level of education do you need to understand this text? (reader-oriented view)
– or: for what age of readership is this text appropriate? (text-oriented view)
Most measures are based on a combination of average word length (measured in characters or syllables) and average sentence length
– some additionally take into account the proportion of long/short words
18/20 Readability indexes
19/20 Readability indexes
Most give a (US) school grade:
– Kincaid – best for technical material; short sentences, e.g. in dialogues, will lower the score; gives a grade level
– ARI (Automated Readability Index)
– Coleman-Liau – counts characters rather than syllables, so easier to implement
– SMOG (simple measure of gobbledygook) (McLaughlin 1969) – can be estimated by sampling, e.g. sentence segments; said to give the best correlation with its criterion. See
– FOG (Gunning 1952) – gives a school grade. Score >12 means “too hard to read”!
A few give a raw score:
– Flesch-Kincaid (the Flesch reading-ease score) – widely used, simple calculation; the higher the score, the easier the text is to read. Highest possible score is 121 (text made up of one-word, one-syllable sentences). Score around 100 means OK for an 11-year-old. Time magazine ~52, Harvard Law Review low 30s.
– Lix (Björnsson) – originally developed for Swedish; a raw score of 55 counts as very hard.
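For reference, the indexes named on this slide can be sketched from their commonly published formulas. The functions below take counts that have already been made (words, sentences, syllables and so on), so the result still depends on the counting decisions discussed earlier; the function and parameter names are chosen for this sketch, and the websites used for the case study may count things differently.

```python
from math import sqrt

def flesch_reading_ease(words, sentences, syllables):
    """Raw score; higher means easier to read."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

def kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59

def automated_readability_index(characters, words, sentences):
    """ARI: uses characters per word instead of syllables."""
    return 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43

def coleman_liau(letters, words, sentences):
    """Coleman-Liau: letters and sentences per 100 words."""
    letters_per_100 = 100 * letters / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

def gunning_fog(words, sentences, complex_words):
    """FOG: complex words are usually those with three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * complex_words / words)

def smog(polysyllables, sentences):
    """SMOG grade from the number of words with three or more syllables."""
    return 1.0430 * sqrt(polysyllables * 30 / sentences) + 3.1291

def lix(words, sentences, long_words):
    """Lix: long words are those with more than six letters."""
    return words / sentences + 100 * long_words / words

# A text made up of one-word, one-syllable sentences.
ceiling = flesch_reading_ease(words=1, sentences=1, syllables=1)
```

The value of `ceiling` is about 121.22, which matches the "highest possible score is 121" remark for the raw reading-ease score.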
20/20 Readability
                  Sun    Indy
Kincaid           9      11.1
ARI
Coleman-Liau      9      11.1
Flesch-Kincaid
Gunning FOG
Lix
SMOG
also suggests where improvements can be made!
also used (give slightly different figures, probably depending on how they count things)
Conversion: add 1 to the US grade to give the British school year, e.g. 11th grade = Year 12
Note: with Flesch-Kincaid, a lower score means harder to read