Some thoughts on modelling phonetic effects in corpora.


1 Some thoughts on modelling phonetic effects in corpora

2

3

4 Paul Carter, University of Sheffield and University of Leeds (and once of the University of York). p.g.carter@{sheffield, leeds}.ac.uk

5 We’ve seen how individual participants may have individual (random) effects on reaction times etc.: someone might be quick; someone else might not have got much sleep; someone else might be tested on a day like today.

6 Some thoughts on modelling phonetic effects in corpora

7 different-sized vocal tracts

8 Brief sketch of 3 papers I’ve been involved in: with John Local at York; with Leendert Plug at Leeds; with Emma Moore at Sheffield.

9 Paul Carter & John Local (2007) F2 variation in Newcastle and Leeds English liquid systems. Journal of the International Phonetic Association 37(2): 183-199. Laboratory experimental work: control, balance, independent predictors.

10 [l] and [ɹ] in 2 non-rhotic varieties of British English. ‘Clear’ and ‘dark’ [l]; also ‘clear’ and ‘dark’ [ɹ]? F2 as the acoustic correlate of clear/dark.

11 Monosyllabic and disyllabic words in word lists: initial [l] versus initial [ɹ] (lead vs reed); initial [l] versus final [l] (lead vs deal); medial [l] versus medial [ɹ] (believe vs bereave, belly vs berry).

12 Original paper: repeated-measures ANOVA. Normalisation: separate ANOVAs for each gender; Hz transformed into ERB-rate. Between-subjects factor: variety. Within-subjects factors: position, liquid, etc.
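
As a rough illustration of the Hz to ERB-rate step, here is a minimal R sketch. It assumes the Glasberg & Moore (1990) formula, ERB-rate = 21.4 * log10(0.00437 * f + 1); the slides do not say which variant of the conversion was used, so treat the formula as an assumption. The function name `hz_to_erb_rate` and the F2 values are invented for the example.

```r
# Convert frequency in Hz to ERB-rate.
# ASSUMPTION: Glasberg & Moore (1990) formula; the original analysis may
# have used a different variant of the ERB-rate conversion.
hz_to_erb_rate <- function(f_hz) {
  21.4 * log10(0.00437 * f_hz + 1)
}

# Illustrative F2 values in Hz (invented, not data from the paper)
f2_hz  <- c(900, 1200, 1500, 1800)
f2_erb <- hz_to_erb_rate(f2_hz)

# Equal steps in Hz become progressively smaller steps in ERB-rate
print(round(f2_erb, 2))
print(round(diff(f2_erb), 2))
```

The transform compresses higher frequencies, so a fixed difference in Hz counts for less at the top of the F2 range than at the bottom.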

13 Replication with linear mixed-effects models, allowing random intercepts and slopes. This avoids over-generalising by gender and gives better normalisation.
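
A model of this general shape could be sketched as follows. This is not the model from the replication: the data are simulated, the column names (`f2_erb`, `liquid`, `variety`, `gender`, `speaker`) and effect sizes are invented, and the `lme4` package is assumed to be installed. It only shows what by-speaker random intercepts and random slopes for liquid look like in `lmer` syntax.

```r
library(lme4)  # assumed to be installed

set.seed(1)
# Simulated stand-in data: 16 hypothetical speakers, 2 varieties, 2 genders
speakers <- data.frame(
  speaker = factor(paste0("s", 1:16)),
  variety = rep(c("Leeds", "Newcastle"), each = 8),
  gender  = rep(c("f", "m"), times = 8),
  spk_int = rnorm(16, sd = 1.0),  # speaker-specific intercept shift
  spk_slp = rnorm(16, sd = 0.5)   # speaker-specific shift in the liquid effect
)
tokens <- expand.grid(speaker = speakers$speaker,
                      liquid = c("l", "r"), rep = 1:10)
d <- merge(speakers, tokens)  # 16 speakers x 2 liquids x 10 repetitions

is_r    <- as.numeric(d$liquid == "r")
is_newc <- as.numeric(d$variety == "Newcastle")
d$f2_erb <- 17 + d$spk_int + (1 + d$spk_slp) * is_r -
  2 * is_r * is_newc + ifelse(d$gender == "f", 1, 0) +
  rnorm(nrow(d), sd = 0.5)

# Random intercept and random slope for liquid, both by speaker
m <- lmer(f2_erb ~ liquid * variety + gender + (1 + liquid | speaker),
          data = d)
print(summary(m))
```

Because each speaker gets their own baseline and their own liquid effect, gender can stay in a single model as a fixed effect instead of splitting the analysis into one ANOVA per gender.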

14 Initial liquids

15 Initial liquids. Carter & Local found a large liquid x variety interaction for each gender, and also a small main effect of variety for female speakers only. Now I can be even more sure of the liquid x variety interaction; it also seems that there are main effects of variety, gender and liquid (no variety x gender interaction).

16 Laterals

17 Laterals. Carter & Local found main effects of variety and position, and also a variety x position interaction. Now I can be even more sure of the variety x position interaction, but the main effects turn out not to be significant (for Leeds female speakers the post hoc tests in Carter & Local hinted at this). This eases a little theoretical puzzle: why do dark laterals get even darker in syllable rimes?

18 Medial liquids (ws)

19 Medial liquids (sw)

20 Medial liquids. Carter & Local found main effects of liquid and prosodic structure, plus liquid x variety and liquid x prosodic structure interactions. Now I can be sure of the liquid main effect, but there is a main effect of variety (and possibly also gender) rather than prosodic structure. The interactions are: liquid x variety, liquid x prosodic structure and (additionally) liquid x variety x prosodic structure. (This doesn’t solve a theoretical problem of syllabification.)

21 That’s laboratory work; corpus work is different…

22 Leendert Plug & Paul Carter (2014) Timing and tempo in spontaneous phonological error repair. Journal of Phonetics 45: 52-63. When speakers make errors and then self-repair, what predicts how quickly they will start the repair and how fast they produce it?

23 We started with control variables, then tested expanded models with likelihood ratio tests. Perhaps not so well supported theoretically (fishing), but we were fishing, in the sense that we wanted to discover which of several similar predictors worked best.
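
The control-then-expand procedure can be sketched in R as below. The data are simulated and the variable names (`repair_tempo`, `speech_rate`, `error_len`) are hypothetical stand-ins, not the predictors from the paper; `lme4` is assumed to be installed. Both models are fitted with maximum likelihood (`REML = FALSE`) so that the likelihood ratio test on the fixed effects is valid.

```r
library(lme4)  # assumed to be installed

set.seed(2)
n_speakers <- 20
n_per_spk  <- 15
d <- data.frame(
  speaker     = factor(rep(seq_len(n_speakers), each = n_per_spk)),
  speech_rate = rnorm(n_speakers * n_per_spk),  # hypothetical control variable
  error_len   = rnorm(n_speakers * n_per_spk)   # hypothetical candidate predictor
)
spk_int <- rnorm(n_speakers, sd = 0.5)          # by-speaker intercept shifts
d$repair_tempo <- 0.6 * d$speech_rate + 0.3 * d$error_len +
  spk_int[as.integer(d$speaker)] + rnorm(nrow(d))

# Baseline: control variable only
m_control <- lmer(repair_tempo ~ speech_rate + (1 | speaker),
                  data = d, REML = FALSE)
# Expanded: control variable plus one candidate predictor
m_expanded <- lmer(repair_tempo ~ speech_rate + error_len + (1 | speaker),
                   data = d, REML = FALSE)

# Likelihood ratio test: does the candidate predictor improve the fit?
lrt <- anova(m_control, m_expanded)
print(lrt)
```

Repeating the comparison once per candidate predictor is the "fishing" step: each expanded model is tested against the same control-only baseline.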

24 In similar work, collinearity meant similar predictor variables having significant main effects but in opposite directions: clearly a spurious result.
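
A small simulation shows why near-duplicate predictors make coefficient estimates unstable. It does not reproduce the result mentioned on the slide; the variables `x1`, `x2` and `y` are invented, and ordinary `lm` from base R stands in for the mixed-effects models to keep the sketch short.

```r
set.seed(3)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)  # near-duplicate of x1
y  <- x1 + rnorm(n)            # only x1 drives y in this simulation

cat("cor(x1, x2) =", round(cor(x1, x2), 3), "\n")

m_single <- lm(y ~ x1)       # one predictor on its own
m_joint  <- lm(y ~ x1 + x2)  # both near-duplicates together

se_single <- coef(summary(m_single))["x1", "Std. Error"]
se_joint  <- coef(summary(m_joint))["x1", "Std. Error"]
cat("SE of x1 alone:   ", round(se_single, 3), "\n")
cat("SE of x1 with x2: ", round(se_joint, 3), "\n")

print(coef(summary(m_joint)))
```

With the standard errors inflated like this, the two estimates can land far apart and even take opposite signs, which is the pattern the slide calls spurious.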

25 Our solution was to use conditional inference regression trees to show the structure in the data, and conditional variable importance based on random forests to see which predictors mattered most:

    library(party)
    myformula <- DV ~ IV1 + IV2 + … + IVn
    mytree <- ctree(myformula, data = mydata)
    myforest <- cforest(myformula, data = mydata)
    myvarimp <- varimp(myforest, conditional = TRUE)

See references to Strobl et al. in Plug & Carter (2014) and Moore & Carter (2015).
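
For a version that runs as written, the same three calls can be tried on R's built-in `airquality` data. That data set has nothing to do with phonetics and is used here only because it ships with R; the choice of `ntree = 200` and `mtry = 2` is arbitrary, and the `party` package is assumed to be installed.

```r
library(party)  # assumed to be installed

set.seed(4)
airq <- na.omit(airquality)  # built-in R data set, complete cases only
f <- Ozone ~ Solar.R + Wind + Temp + Month

# Single conditional inference tree: shows the structure in the data
tree <- ctree(f, data = airq)
print(tree)

# Forest of conditional inference trees (settings chosen arbitrarily)
forest <- cforest(f, data = airq,
                  controls = cforest_unbiased(ntree = 200, mtry = 2))

# Conditional variable importance: adjusts for correlated predictors
vi <- varimp(forest, conditional = TRUE)
print(sort(vi, decreasing = TRUE))
```

The importance scores are only meaningful relative to one another, and they vary somewhat with the random seed, so it is worth rerunning with a different seed to check that the ranking is stable.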

26 We lost the random effect structure (perhaps it didn’t matter), but the approach is more robust to missing data and able to cope with correlated variables. It provided some support for our LME models and helps to decide which of several similar variables should be a predictor in a model. (We could have used data reduction methods, e.g. principal components analysis, but part of the point was to identify the best predictors.)

27 Emma Moore & Paul Carter (2015) Dialect contact and distinctiveness: the social meaning of language variation in an island community. Journal of Sociolinguistics 19(1): 3-36. TRAP and BATH vowels in Isles of Scilly archive data (can’t get any more!). Scilly speakers, some educated only on the islands and some educated partly on the mainland, compared to mainland Cornwall speakers and (sort-of) RP speakers.

28 Hides some issues, e.g. imbalance: N(TRAP) = 2469, N(BATH) = 345.

29 Again, we wanted to allow for other influences on the formants, e.g.: duration of the vowel (achieving target?); manner of articulation of the following consonant (nasals can muck things up); voicing of the following consonant (also has a massive effect on duration).

30 Problem: despite 2814 observations, not enough data. This is the nature of corpora: we can’t predict what will appear.

31 Problem: despite 2814 observations, not enough data. This is the nature of corpora: many things will be vanishingly rare.

32 Problem: despite 2814 observations, not enough data. This is the nature of corpora: many things will be absent.

33 Problem: despite 2814 observations, not enough data. This is the nature of corpora: the more potential predictors, the greater the chance of missing cells.
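
The point about missing cells can be illustrated with a small simulation in base R. Only the total of 2814 observations and the TRAP/BATH split are taken from the slides; the other predictors (`group`, `manner`, `place`, `word_class`), their numbers of levels and their skewed probabilities are invented.

```r
set.seed(5)
n <- 2814  # same total as on the slide; the data themselves are simulated

# Zipf-like probabilities: a few levels are common, the rest are rare
skewed <- function(k) {
  p <- 1 / seq_len(k)
  p / sum(p)
}

d <- data.frame(
  lexical_set = factor(sample(c("TRAP", "BATH"), n, TRUE, prob = c(2469, 345))),
  group       = factor(sample(paste0("g", 1:4), n, TRUE, prob = skewed(4))),
  manner      = factor(sample(paste0("m", 1:5), n, TRUE, prob = skewed(5))),
  place       = factor(sample(paste0("p", 1:6), n, TRUE, prob = skewed(6))),
  word_class  = factor(sample(paste0("w", 1:6), n, TRUE, prob = skewed(6)))
)

# Proportion of empty cells when crossing the first k predictors
prop_empty <- sapply(seq_along(d), function(k) {
  tab <- table(d[, seq_len(k), drop = FALSE])
  mean(tab == 0)
})
names(prop_empty) <- paste(seq_along(d), "predictors")
print(round(prop_empty, 3))
```

Each added predictor multiplies the number of cells while the number of observations stays fixed, so the share of empty cells can only stay the same or grow.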

34 Problem: despite 2814 observations, not enough data. This is the nature of corpora: frequency effects are not typically incorporated in laboratory experimental design.

35 So we used conditional inference regression trees and conditional variable importance from random forests of trees in tandem with mixed-effects modelling: mixed-effects models where we could make models which met the assumptions of the technique; variable importance in random forests where we knew there was collinearity, etc. (e.g. manner of articulation of following consonant with lexical set).

36 So, mixed-effects models with full random effects structure won’t always work in corpora. When they seem to work they may be hard to interpret. There are potential statistical solutions involving data reduction techniques. There are also possible alternatives to attenuate precisely the problems corpora pose, e.g. conditional variable importance can cope with missing cells, imbalance and highly correlated predictors.

37 Here’s the snow again

