
Analyses on IFA corpus
Louis C.W. Pols
Institute of Phonetic Sciences (IFA)
Amsterdam Center for Language and Communication (ACLC)
Project meeting INTAS 915, May 23-25, 2003, Jyvaskyla

Slide 2 (May 24, 2003; INTAS 915, Jyvaskyla): overview
- structure of IFA corpus (see three reports by R. v. Son and papers in the open literature)
- why corrected means?
- how corrected means?
- some results
- conclusions

Slide 3: structure of IFA corpus
- 4 male & 4 female speakers; 5 hrs. of speech; 8 styles, among others informal story telling (I), retelling (R), reading a story (T), reading sentences (S)
- ~50 K words (AIFC, 44 kHz, 16 bit)
- label files with annotation tiers
- phonemic segmentation and labeling (automatically generated, hand corrected; ~200k boundaries; 0.84 word labels/min; 3.3 boundaries/min)
- description levels: phoneme, demi-syllable, syllable, word, sentence, paragraph
- tiers: POS, lemma, lexical freq., etc.

Slide 4: access of IFA corpus
- use of CGN protocols
- non-speech data in database structure
- relational DB, SQL query language
- basic structure = table: items (indiv. phoneme occurrences) x attributes (phoneme parent word, duration, position, speaker, etc.)
- WWW front end to simplify access (automatically generating SQL queries; direct links to relevant files)
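The table layout described on this slide (one row per phoneme occurrence, one column per attribute) and the idea of a front end that writes the SQL for the user can be illustrated with a minimal sketch. Everything below is hypothetical: the table name, the column names, the speaker labels and the four data rows are invented for illustration and are not the actual IFA corpus schema or data.

```python
import sqlite3

# Hypothetical schema: one row per phoneme occurrence, one column per attribute.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE phoneme_occurrence (
           phoneme TEXT, parent_word TEXT, duration_ms REAL,
           position INTEGER, speaker TEXT, style TEXT, stress TEXT)"""
)

# Invented example rows (not corpus data).
rows = [
    ("a", "kat", 90.0, 2, "F1", "I", "+"),
    ("a", "kat", 110.0, 2, "M1", "R", "+"),
    ("e", "bezoek", 60.0, 2, "F1", "I", "-"),
    ("a", "kat", 100.0, 2, "F1", "I", "+"),
]
conn.executemany("INSERT INTO phoneme_occurrence VALUES (?,?,?,?,?,?,?)", rows)

ALLOWED_FACTORS = {"phoneme", "parent_word", "position", "speaker", "style", "stress"}


def build_query(factors):
    """Generate a GROUP BY query for the chosen factors, as a front end might."""
    for name in factors:
        if name not in ALLOWED_FACTORS:
            raise ValueError(f"unknown factor: {name}")
    cols = ", ".join(factors)
    return (
        f"SELECT {cols}, COUNT(*) AS n, AVG(duration_ms) AS mean_ms "
        f"FROM phoneme_occurrence GROUP BY {cols} ORDER BY {cols}"
    )


# Count and mean duration per (style, stress) combination.
result = conn.execute(build_query(["style", "stress"])).fetchall()
for row in result:
    print(row)
```

Checking factor names against a fixed set keeps the generated SQL safe even though column names cannot be passed as bound parameters.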

Slide 5: why corrected means?
- non-ideal design (no fixed numbers of observations of all relevant factors; this precludes the use of e.g. ANOVA)
- confounding (occurrence of factor values is correlated, thus many combinations of values are rare)
- interaction (one factor being modulated by other factors): additive, multiplicative, or ordinal interaction
- factors of interest vs. nuisance factors

Slide 6: how corrected means?
- incidence matrix from basic data
  - rows = combinations of levels on factors of interest
  - columns = combinations of levels on nuisance factors
- quasi-minimal pairs method
  - mean difference per row pair: by comparing (non-empty) pairs of columns
  - matrix of differences (fitted with additive model)
  - variable sample sizes: use weighting factors
- corrected means
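The steps on this slide can be sketched in a few lines of NumPy. This is a rough illustration under stated assumptions, not the implementation used for the IFA analyses: the per-column weight n_i * n_j / (n_i + n_j), the plain weighted least-squares fit, and the choice to anchor the corrected means so that their unweighted average equals the overall mean are all assumptions made here. The function name and the demo data are hypothetical.

```python
import numpy as np


def corrected_means(cell_mean, cell_n):
    """Sketch of corrected means from an incidence matrix.

    cell_mean[r, c], cell_n[r, c]: mean and count for row r (levels of the
    factors of interest) and column c (levels of the nuisance factors).
    Empty cells have count 0; their mean value is ignored.
    """
    cell_mean = np.asarray(cell_mean, dtype=float)
    cell_n = np.asarray(cell_n, dtype=float)
    n_rows = cell_mean.shape[0]
    design, target, weight = [], [], []
    for i in range(n_rows):
        for j in range(i + 1, n_rows):
            # quasi-minimal pairs: columns where both rows have data
            both = (cell_n[i] > 0) & (cell_n[j] > 0)
            if not both.any():
                continue
            # assumed weighting for unequal sample sizes
            w = cell_n[i, both] * cell_n[j, both] / (cell_n[i, both] + cell_n[j, both])
            diff = cell_mean[i, both] - cell_mean[j, both]
            row = np.zeros(n_rows)
            row[i], row[j] = 1.0, -1.0
            design.append(row)
            target.append(np.sum(w * diff) / np.sum(w))  # mean difference for this row pair
            weight.append(np.sum(w))
    A = np.array(design)
    d = np.array(target)
    sw = np.sqrt(np.array(weight))
    # additive model: d_ij ~ m_i - m_j, fitted by weighted least squares
    effects, *_ = np.linalg.lstsq(A * sw[:, None], d * sw, rcond=None)
    # differences fix the means only up to a constant; anchor on the overall mean
    grand = np.sum(np.where(cell_n > 0, cell_mean, 0.0) * cell_n) / cell_n.sum()
    return effects + (grand - effects.mean())


# Hypothetical demo: purely additive cells with very unbalanced counts.
row_effect = np.array([100.0, 90.0, 80.0])
col_effect = np.array([0.0, 10.0, -5.0])
demo_n = np.array([[50, 5, 0], [5, 50, 10], [0, 10, 40]], dtype=float)
demo_mean = row_effect[:, None] + col_effect[None, :]
demo_mean[demo_n == 0] = np.nan  # empty cells carry no mean

cm = corrected_means(demo_mean, demo_n)
raw = np.nansum(demo_mean * demo_n, axis=1) / demo_n.sum(axis=1)
print("raw row means:  ", raw)
print("corrected means:", cm)
```

In the demo the raw row means understate the differences between rows because each row draws mostly on a different column, while the corrected means recover the spacing of the underlying row effects. If some rows share no non-empty column with any other row, the differences do not tie them together and the fit for those rows is not meaningful.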

Slide 7: example: vowel duration (ms), speaking style (I, R, S, T) vs. lexical stress (+, -)

common means (number of observations in parentheses)

style    stress +         stress -        total
I         97.0 (4209)     84.6 (762)       95.1 (4971)
R        101.8 (4861)     94.1 (712)      100.8 (5573)
S         95.5 (12179)    84.0 (1850)      94.0 (14029)
T         96.2 (11664)    85.7 (1824)      94.8 (13488)
total     96.9 (32913)    86.1 (5148)      95.4 (38061)

corrected means

style    stress +    stress -    total
I        107.4       81.4        94.4
R        110.4       84.5        97.5
S        107.8       80.9        94.3
T        109.3       81.6        95.5
total    108.7       82.1        95.4

38061 total counts; 13323 row differences
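The "total" entries of the common means in this example are consistent with plain count-weighted averages of the cells, which is why the much larger stressed category dominates them. The short check below recomputes the totals from the cell values on the slide; the variable and function names are made up for illustration.

```python
# Cell values copied from the common-means table: (mean duration in ms, count).
common = {
    "I": {"+": (97.0, 4209), "-": (84.6, 762)},
    "R": {"+": (101.8, 4861), "-": (94.1, 712)},
    "S": {"+": (95.5, 12179), "-": (84.0, 1850)},
    "T": {"+": (96.2, 11664), "-": (85.7, 1824)},
}
reported_total = {"I": 95.1, "R": 100.8, "S": 94.0, "T": 94.8}


def pooled_mean(cells):
    """Count-weighted mean of a list of (mean, count) cells."""
    total_n = sum(n for _, n in cells)
    return sum(m * n for m, n in cells) / total_n


# Per-style totals and the grand mean, recomputed from the cells.
row_total = {style: pooled_mean(list(cells.values())) for style, cells in common.items()}
all_cells = [cell for cells in common.values() for cell in cells.values()]
grand_mean = pooled_mean(all_cells)

# Share of stressed vowels among all observations.
n_stressed = sum(cells["+"][1] for cells in common.values())
n_all = sum(n for _, n in all_cells)
stressed_share = n_stressed / n_all

for style, value in row_total.items():
    print(style, round(value, 1), "reported:", reported_total[style])
print("grand mean:", round(grand_mean, 1), "stressed share:", round(stressed_share, 3))
```

With common means the stress difference in the totals is 96.9 - 86.1 = 10.8 ms, against 108.7 - 82.1 = 26.6 ms for the corrected means.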

Slide 8: row difference counts + signif.

         I +     I -     R +     R -     S +     S -     T +     Total
I -      199*                                                      199
R +      485     118*                                              603
R -      106*    120     152*                                      378
S +      975*    218*    1142    184*                             2519
S -      234*    309     232*    221     452*                     1448
T +      980*    216*    1144    183*    2590*   444*             5557
T -      232*    324     227*    223     439*    742     432*     2619
Total    3211    1305    2897    811     3481    1186    432     13323

* significant at the 0.001 level

Slide 9: conclusions
- simple averaging of unbalanced data is dangerous
- free conversational speech data are always unbalanced
- the corrected means method then is a good alternative
- it can be interpreted as a least RMS-error approximation of 'balanced' means with an unbalanced data set
