1 ACORNS Acquisition of COmmunication and RecogNition Skills
The CareGiver corpus
Toomas Altosaar, L. ten Bosch, G. Aimetti, C. Koniaris, K. Demuynck, H. van den Heuvel

2 ACORNS Acquisition of COmmunication and RecogNition Skills. LREC 2010, 19 May 2010. Slide no. 2
Overview
- Background of the ACORNS project
- A speech corpus
  - Rationale
  - Design
- A few details
- Public availability

3 Background of the ACORNS project
- Acquisition of COmmunication and RecogNition Skills
  - FP6 FET Project 2006-2009
  - www.acorns-project.org
- Aim: to investigate language acquisition by young infants
  - By simulating this learning process through the design and testing of a computational model
  - Focus on word discovery
  - Improve ASR
- To that end, a speech corpus was created

4 The ACORNS corpus - rationale
- The ACORNS model takes part in a caregiver-learner interaction loop
- A corpus is required for testing various computational approaches to language learning
- Utterances in the corpus 'simulate' the caregiver
- The corpus balances in complexity between
  - Real-life recordings of caretaker utterances in noisy child-caretaker interactions (CHILDES)
  - Lab-fabricated speech-like stimuli (NEWPORT)

5 ACORNS corpus - design (1)
- Four languages (FIN, SWE, UK, NL)
- In total 10 speakers for FIN, UK, NL
  - 4 speakers for SWE
- Speech from primary and secondary caregivers
- Speakers read aloud sentences
  - Simple grammatical structure
  - Limited number of keywords
- Two speaking styles
  - Infant-directed style (IDS), adult-directed style (ADS)

6 Design (2)
- Utterances across languages are highly comparable with respect to utterance length, syntactic structure, and choice of keywords
- This allows a cross-linguistic comparison of computational approaches to word discovery
- Keyword selection was inspired by communicative development inventories (CDI)
  - E.g. the MacArthur-Bates CDI http://www.sci.sdsu.edu/cdi/

7 Examples of Y1 utterances (UK)
- Where is Miriam now?
- Do you see the shoe?
- Show me the book!
- That is the bottle
- The telephone is here
- Look, Daddy
- Here is the diaper
- That is a telephone
- Show me a shoe

8 Examples of Y2 utterances (UK)
- I see a green turtle
- Can you hear the red square and the airplane?
- 50 keywords
- Up to 4 keywords per sentence
- Semantically free
- But inconsistencies were avoided:
  - *Look at the big small car
  - *red green ball
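The two constraints on Y2 sentences (at most 4 keywords, no contradictory modifiers on the same noun) can be sketched as a small check. This is only an illustration under stated assumptions: the category map, the function names, and the idea of grouping modifiers by category are hypothetical and are not taken from the corpus design. The only facts from the slide are the limit of four keywords and the two rejected examples.

```python
# Hypothetical modifier categories; the real ACORNS keyword list is not shown on the slide.
MODIFIER_CATEGORY = {
    "big": "size", "small": "size",
    "red": "colour", "green": "colour",
}
MAX_KEYWORDS = 4  # from the slide: up to 4 keywords per sentence


def within_keyword_limit(keywords):
    """Return True if the sentence uses no more keywords than allowed."""
    return len(keywords) <= MAX_KEYWORDS


def phrase_is_consistent(modifiers):
    """Return True if no two modifiers of one noun share a category.

    Assumption: two modifiers from the same category (two sizes, two
    colours) count as an inconsistency, as in '*big small car'.
    """
    seen = set()
    for word in modifiers:
        category = MODIFIER_CATEGORY.get(word)
        if category is None:
            continue  # words outside the map are unconstrained in this sketch
        if category in seen:
            return False
        seen.add(category)
    return True


print(phrase_is_consistent(["big", "small"]))  # modifiers of '*big small car'
print(phrase_is_consistent(["green"]))         # modifiers of 'green turtle'
```

The check is applied per noun phrase rather than per sentence, so that a sentence naming two differently coloured objects would still pass.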

9 Number of utterances

Language   'Y1' (1 keyword/utt)   'Y2' (multiple keywords/utt)
SWE        8000                   --
FIN        8000                   11600 (+1588)
UK         4000 (IDS only)        11600 (+1588)
NL         8000                   11600
Total      28000                  34800

Totals refer to cross-linguistically comparable utterances.
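The totals on this slide can be checked against the per-language counts with a few lines of Python. The sketch assumes that "--" means no Y2 recordings for SWE and that the "(+1588)" additions for FIN and UK are not part of the cross-linguistically comparable totals; both are readings of the slide, not statements from the authors.

```python
# Per-language utterance counts as given on the slide (None = no count listed).
Y1 = {"SWE": 8000, "FIN": 8000, "UK": 4000, "NL": 8000}
Y2 = {"SWE": None, "FIN": 11600, "UK": 11600, "NL": 11600}


def total(counts):
    """Sum the counts, skipping languages with no entry."""
    return sum(n for n in counts.values() if n is not None)


print(total(Y1))  # 28000
print(total(Y2))  # 34800
```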

10 Format
- Each utterance is available as a single wav file
  - 44.1 kHz, mono
- ... and is accompanied by an xml file, with
  - Speaker information (gender)
  - Speech style (IDS, ADS)
  - Orthographic annotation (checked)
  - Keyword(s)
  - Duration
  - And for FIN some more information about syntax (see paper)
- Total 12 GB
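Reading the per-utterance metadata might look like the sketch below. The slide lists which fields the xml file holds but not its schema, so every tag and attribute name here (`utterance`, `speaker`, `style`, `orthography`, `keyword`, `duration`) is a hypothetical stand-in, and the sample values are invented for illustration. Only the set of fields follows the slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; the real ACORNS schema and tag names may differ.
SAMPLE_XML = """<utterance>
  <speaker gender="female"/>
  <style>IDS</style>
  <orthography>Do you see the shoe?</orthography>
  <keywords><keyword>shoe</keyword></keywords>
  <duration unit="s">1.42</duration>
</utterance>"""


def parse_metadata(xml_text):
    """Extract the fields named on the slide from one metadata document."""
    root = ET.fromstring(xml_text)
    return {
        "gender": root.find("speaker").get("gender"),
        "style": root.findtext("style"),
        "text": root.findtext("orthography"),
        "keywords": [k.text for k in root.iter("keyword")],
        "duration_s": float(root.findtext("duration")),
    }


meta = parse_metadata(SAMPLE_XML)
print(meta["style"], meta["keywords"])
```

With the real files, the tag names would need to be adjusted to the schema distributed with the corpus, and `ET.parse(path).getroot()` would replace `ET.fromstring`.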

11 Research purposes
- Simulation of word detection/word spotting
- Acquisition of word-like units
- Acquisition of (simple) syntax
- Across morphologically and syntactically different European languages

12 Public availability
- Corpus made available via ELRA
- Interested parties must contact ELRA

13 Conclusion
- Corpus available with cross-language compatible utterances
- Speech based IDS & ADS modes
- Utterances have lexical and syntactic structure inspired by infant-directed speech
- Primary & secondary caregivers
- Ideal for testing models of language acquisition and word detection
- Made available through ELRA
- More information at www.acorns-project.org
- Software also available, see website

