Analyses on IFA corpus Louis C.W. Pols Institute of Phonetic Sciences (IFA) Amsterdam Center for Language and Communication (ACLC) Project meeting INTAS.


1. Analyses on IFA corpus
Louis C.W. Pols, Institute of Phonetic Sciences (IFA), Amsterdam Center for Language and Communication (ACLC)
Project meeting INTAS 915, May 23-25, 2003, Jyvaskyla

2. Overview
May 24, 2003, INTAS 915, Jyvaskyla
- structure of the IFA corpus (see the three reports by R. van Son and papers in the open literature)
- why corrected means?
- how are corrected means computed?
- some results
- conclusions

3. Structure of the IFA corpus
- 4 male and 4 female speakers; 5 hours of speech
- 8 speaking styles, among others informal story telling (I), retelling (R), reading a story (T) and reading sentences (S)
- ~50k words (AIFC, 44 kHz, 16 bit)
- label files with annotation tiers
- phonemic segmentation and labeling (automatically generated, hand corrected; ~200k boundaries; 0.84 word labels/min; 3.3 boundaries/min)
- description levels: phoneme, demi-syllable, syllable, word, sentence, paragraph
- tiers: POS, lemma, lexical frequency, etc.

4. Access to the IFA corpus
- use of CGN (Spoken Dutch Corpus) protocols
- non-speech data in a database structure: relational DB with SQL as query language
- basic structure = table of items (individual phoneme occurrences) × attributes (parent word of the phoneme, duration, position, speaker, etc.)
- WWW front end to simplify access (automatically generated SQL queries; direct links to the relevant files)
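As a rough illustration of the "items × attributes" layout and of the kind of SQL a web front end might generate, here is a minimal sketch using Python's built-in sqlite3 module. The table name `phoneme_item`, its columns, the speaker codes and the toy rows are all hypothetical; the actual IFA corpus schema is not described on this slide.

```python
import sqlite3

# Hypothetical table: one row per phoneme occurrence ("item"), one column per attribute.
# Table and column names are illustrative, not the actual IFA corpus schema.
SCHEMA = """
CREATE TABLE phoneme_item (
    id          INTEGER PRIMARY KEY,
    phoneme     TEXT NOT NULL,     -- SAMPA-style symbol
    parent_word TEXT NOT NULL,     -- word the phoneme belongs to
    duration_ms REAL NOT NULL,
    position    TEXT,              -- position of the phoneme within its word
    speaker     TEXT NOT NULL,
    style       TEXT NOT NULL,     -- speaking style: I, R, S or T
    stressed    INTEGER NOT NULL   -- 1 = lexically stressed, 0 = unstressed
)
"""

# Invented toy rows, only there to make the query runnable.
ROWS = [
    ("a", "kat", 90.0, "medial", "spk1", "S", 1),
    ("a", "kat", 110.0, "medial", "spk2", "S", 1),
    ("@", "de", 60.0, "final", "spk1", "S", 0),
    ("a:", "maan", 120.0, "medial", "spk1", "I", 1),
    ("@", "de", 80.0, "final", "spk2", "I", 0),
]


def build_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO phoneme_item "
        "(phoneme, parent_word, duration_ms, position, speaker, style, stressed) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        ROWS,
    )
    return conn


def cell_means(conn: sqlite3.Connection, phonemes: tuple[str, ...]) -> list[tuple]:
    """The kind of query a web front end could generate: mean duration and
    token count per (style, stress) cell for the selected phonemes."""
    placeholders = ", ".join("?" for _ in phonemes)
    sql = (
        "SELECT style, stressed, AVG(duration_ms), COUNT(*) "
        "FROM phoneme_item "
        f"WHERE phoneme IN ({placeholders}) "
        "GROUP BY style, stressed "
        "ORDER BY style, stressed DESC"
    )
    return conn.execute(sql, phonemes).fetchall()


if __name__ == "__main__":
    for row in cell_means(build_db(), ("a", "a:", "@")):
        print(row)
```

Only the phoneme list is spliced into the SQL text as `?` placeholders, so user selections are passed as bound parameters rather than pasted into the query string.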

5. Why corrected means?
- non-ideal design: there are no fixed numbers of observations for all relevant factors, which precludes the use of e.g. ANOVA
- confounding: the occurrence of factor values is correlated, so many combinations of values are rare
- interaction: one factor is modulated by other factors; the interaction can be additive, multiplicative or ordinal
- factors of interest vs. nuisance factors
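The confounding problem can be made concrete with invented numbers. In this hypothetical Python sketch, speaking style has no effect at all and stress adds a fixed 25 ms, but style A happens to contain mostly stressed vowels and style B mostly unstressed ones. Simple averaging then reports a style difference that does not exist (102.5 vs 85.0 ms), whereas averaging the per-stress cell means with equal weight gives 92.5 ms for both styles.

```python
from statistics import mean

# Hypothetical additive "truth": stress lengthens a vowel by 25 ms and
# speaking style has no effect at all.
BASE_MS = 80.0
STRESS_MS = 25.0


def true_duration(stressed: bool) -> float:
    return BASE_MS + (STRESS_MS if stressed else 0.0)


# Confounded, unbalanced token counts: style A is mostly stressed, style B mostly unstressed.
COUNTS = {("A", True): 90, ("A", False): 10, ("B", True): 20, ("B", False): 80}

TOKENS = [
    (style, stressed, true_duration(stressed))
    for (style, stressed), n in COUNTS.items()
    for _ in range(n)
]


def simple_mean(style: str) -> float:
    """Plain average over all tokens of a style (what 'simple averaging' does)."""
    return mean(d for s, _, d in TOKENS if s == style)


def balanced_mean(style: str) -> float:
    """Average of the per-stress cell means, each stress level weighted equally."""
    return mean(
        mean(d for s, st, d in TOKENS if s == style and st == level)
        for level in (True, False)
    )


for style in ("A", "B"):
    print(style, simple_mean(style), balanced_mean(style))
```

The equal-weight average only works here because every style × stress cell is filled; with many empty or rare cells, as in free speech data, it cannot be computed this directly.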

6. How are corrected means computed?
- build an incidence matrix from the basic data:
  - rows = combinations of levels on the factors of interest
  - columns = combinations of levels on the nuisance factors
- quasi-minimal pairs method: the mean difference per pair of rows is obtained by comparing the two rows' (non-empty) cells column by column
- fit the resulting matrix of differences with an additive model
- variable sample sizes: use weighting factors
- result: corrected means
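A minimal Python sketch of one possible reading of these steps, under stated assumptions. Each row pair's difference d_ij is taken as a weighted mean of the within-column differences over the columns where both cells are non-empty, with a per-column weight n_i·n_j/(n_i + n_j); this weighting is our choice, since the slide only says "use weighting factors". The corrected means m are then the weighted least-squares fit of the additive model m_i - m_j ≈ d_ij, i.e. they minimise Σ W_ij (m_i - m_j - d_ij)². Because the fit fixes only differences, the level is set so that the unweighted average of the corrected means equals the overall token mean. This normalisation is consistent with slide 7, where the eight corrected cells average to the same 95.4 ms as the common grand mean. All function names are hypothetical.

```python
from itertools import combinations


def row_differences(cells):
    """Quasi-minimal pairs: compare two rows only in columns where both cells are filled.

    cells maps (row, col) -> (cell mean, token count); empty cells are simply absent.
    Returns the sorted row labels and {(row_a, row_b): (mean difference a - b, weight)}.
    """
    rows = sorted({r for r, _ in cells})
    cols = sorted({c for _, c in cells})
    diffs = {}
    for a, b in combinations(rows, 2):
        num = den = 0.0
        for c in cols:
            if (a, c) in cells and (b, c) in cells:
                (mean_a, n_a), (mean_b, n_b) = cells[(a, c)], cells[(b, c)]
                w = n_a * n_b / (n_a + n_b)  # assumed weight: grows with both cell sizes
                num += w * (mean_a - mean_b)
                den += w
        if den > 0:
            diffs[(a, b)] = (num / den, den)
    return rows, diffs


def solve(A, y):
    """Gaussian elimination with partial pivoting for a small non-singular system."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for k in reversed(range(n)):
        x[k] = (M[k][n] - sum(M[k][j] * x[j] for j in range(k + 1, n))) / M[k][k]
    return x


def corrected_means(cells):
    """Fit m_i - m_j ~ d_ij by weighted least squares, then shift to the token grand mean."""
    rows, diffs = row_differences(cells)
    idx = {r: k for k, r in enumerate(rows)}
    n = len(rows)
    # Normal equations of sum W_ij (m_i - m_j - d_ij)^2: a weighted graph Laplacian.
    L = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for (ra, rb), (d, w) in diffs.items():
        i, j = idx[ra], idx[rb]
        L[i][i] += w
        L[j][j] += w
        L[i][j] -= w
        L[j][i] -= w
        b[i] += w * d
        b[j] -= w * d
    # Only differences are identified, so replace one redundant equation by
    # sum(m) == 0 (requires every row to be linked to the others via shared columns).
    L[-1] = [1.0] * n
    b[-1] = 0.0
    m = solve(L, b)
    total = sum(count for _, count in cells.values())
    grand = sum(mu * count for mu, count in cells.values()) / total
    return {r: grand + m[idx[r]] for r in rows}


if __name__ == "__main__":
    # Toy additive data (row effect + column effect) with empty cells and unequal counts.
    row_eff = {"A": 10.0, "B": 20.0, "C": 40.0}
    col_eff = {"x": 0.0, "y": 5.0, "z": 30.0}
    counts = {("A", "x"): 50, ("A", "y"): 5, ("B", "y"): 40,
              ("B", "z"): 3, ("C", "x"): 2, ("C", "z"): 60}
    cells = {(r, c): (row_eff[r] + col_eff[c], k) for (r, c), k in counts.items()}
    print(corrected_means(cells))
```

With these exactly additive toy data the fit reproduces the true row differences (B - A = 10, C - B = 20) even though each pair of rows shares only one column, whereas the plain token-weighted row means would put C about 58.6 ms above A instead of 30 ms.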

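7. Example: vowel duration (ms) by speaking style (I, R, S, T) and lexical stress (+, -)

Common means (number of tokens in parentheses):

| style | stressed (+) | unstressed (-) | total |
|---|---|---|---|
| I | 97.0 (4209) | 84.6 (762) | 95.1 (4971) |
| R | 101.8 (4861) | 94.1 (712) | 100.8 (5573) |
| S | 95.5 (12179) | 84.0 (1850) | 94.0 (14029) |
| T | 96.2 (11664) | 85.7 (1824) | 94.8 (13488) |
| total | 96.9 (32913) | 86.1 (5148) | 95.4 (38061) |

Corrected means:

| style | stressed (+) | unstressed (-) | total |
|---|---|---|---|
| I | 107.4 | 81.4 | 94.4 |
| R | 110.4 | 84.5 | 97.5 |
| S | 107.8 | 80.9 | 94.3 |
| T | 109.3 | 81.6 | 95.5 |
| total | 108.7 | 82.1 | 95.4 |

Total count: 38061; row differences: 13323.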
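The slide 7 totals can be recomputed from the cells, which makes the difference between the two tables explicit. In this short Python sketch (values copied from the tables; function names are ours), the common-means margins are token-weighted averages, while the corrected-means margins are plain averages of the cells. The resulting stress effect is about 10.8 ms with common means (96.9 - 86.1) but about 26.6 ms with corrected means (108.7 - 82.1), and the eight corrected cells average to the same 95.4 ms as the overall common mean.

```python
# Values copied from the slide 7 tables: style -> stress -> (common mean in ms, token count).
COMMON = {
    "I": {"+": (97.0, 4209), "-": (84.6, 762)},
    "R": {"+": (101.8, 4861), "-": (94.1, 712)},
    "S": {"+": (95.5, 12179), "-": (84.0, 1850)},
    "T": {"+": (96.2, 11664), "-": (85.7, 1824)},
}
# style -> stress -> corrected mean in ms
CORRECTED = {
    "I": {"+": 107.4, "-": 81.4},
    "R": {"+": 110.4, "-": 84.5},
    "S": {"+": 107.8, "-": 80.9},
    "T": {"+": 109.3, "-": 81.6},
}


def common_stress_mean(level: str) -> float:
    """Token-weighted marginal: large cells (the reading styles S and T) dominate."""
    num = sum(COMMON[s][level][0] * COMMON[s][level][1] for s in COMMON)
    den = sum(COMMON[s][level][1] for s in COMMON)
    return num / den


def corrected_stress_mean(level: str) -> float:
    """Marginal of the corrected table: unweighted average over the four styles."""
    return sum(CORRECTED[s][level] for s in CORRECTED) / len(CORRECTED)


def corrected_grand_mean() -> float:
    """Unweighted average of the eight corrected cell means."""
    cells = [v for row in CORRECTED.values() for v in row.values()]
    return sum(cells) / len(cells)


common_effect = common_stress_mean("+") - common_stress_mean("-")
corrected_effect = corrected_stress_mean("+") - corrected_stress_mean("-")
print(f"stress effect with common means:    {common_effect:.1f} ms")
print(f"stress effect with corrected means: {corrected_effect:.1f} ms")
```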

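8. Row difference counts and significance
Number of row differences per pair of rows (style × stress combinations); * = significant at the 0.001 level.

| | I+ | I- | R+ | R- | S+ | S- | T+ | total |
|---|---|---|---|---|---|---|---|---|
| I- | 199* | | | | | | | 199 |
| R+ | 485 | 118* | | | | | | 603 |
| R- | 106* | 120 | 152* | | | | | 378 |
| S+ | 975* | 218* | 1142 | 184* | | | | 2519 |
| S- | 234* | 309 | 232* | 221 | 452* | | | 1448 |
| T+ | 980* | 216* | 1144 | 183* | 2590* | 444* | | 5557 |
| T- | 232* | 324 | 227* | 223 | 439* | 742 | 432* | 2619 |
| total | 3211 | 1305 | 2897 | 811 | 3481 | 1186 | 432 | 13323 |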
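The slide does not state which statistical test produced the asterisks. Purely as an assumption, the Python sketch below treats each counted row difference as one paired difference value and applies a two-sided one-sample test of a zero mean difference at α = 0.001, using a normal approximation to the t distribution, which is reasonable for counts in the hundreds as in the table. The function name `pair_is_significant` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

ALPHA = 0.001  # significance level named on the slide


def pair_is_significant(differences, alpha=ALPHA):
    """Two-sided test of H0: mean difference == 0, normal approximation.

    Assumption: the actual test behind the slide's asterisks is not stated;
    for large numbers of paired differences a z approximation to the t-test
    is a reasonable stand-in.
    """
    n = len(differences)
    if n < 2:
        return False  # not enough pairs to estimate the spread
    se = stdev(differences) / sqrt(n)
    if se == 0:
        return mean(differences) != 0
    z = mean(differences) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return p < alpha
```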

9. Conclusions
- simple averaging of unbalanced data is dangerous
- free conversational speech data are always unbalanced
- the corrected-means method is then a good alternative
- it can be interpreted as a least-RMS-error approximation of "balanced" means from an unbalanced data set

