Introduction to CHILDES and TalkBank Brian MacWhinney CMU - Psychology, Modern Languages, Language Technologies Institute.

1 Introduction to CHILDES and TalkBank Brian MacWhinney CMU - Psychology, Modern Languages, Language Technologies Institute

2 The goal of TalkBank

3 The core idea: Human communication is a single unified process. However, patterns in communication are analyzed by 20 different fields. The time scales of the processes vary from milliseconds to centuries. But all of these processes must have their ultimate effect in the Moment. We can capture the Moment on video.

4 Principles: data-sharing, informed consent; multimedia; open access, web access, commentary; specified format; interoperability; community integration

5 Availability: programs, manuals, fonts, morphologies, CA conventions, video production guides, XML Schema, and links to other programs. Data can be either downloaded or played back over the web.

6 Current target areas 1. CHILDES 2. PhonBank 3. BilingualBank 4. AphasiaBank 5. CABank 6. ClassBank

7 CHILDES: Child Language Data Exchange System. Founded in 1984 in Concord, MA. Director: Brian MacWhinney. Programmers: Leonid Spektor, Franklin Chen. 3000 members, 130 corpora, over 3200 published articles.

8 CHILDES and TalkBank
               CHILDES      TalkBank
Age            23 years     7 years
Words          44 million   8 + 55 million
Media          750 GB       450 GB
Languages      32           18
Publications   3200+        89
Users          3000+        500

9 Practical Considerations: Learning CLAN takes about a week. Transcription is slow, perhaps a 15:1 ratio of transcription time to recording time. Blitzscribe, LENA, etc. probably will not work. Currently available data may not be perfect for a given issue. Corpora may need enhancement through MOR or the Coder's Editor.

10 Tools from the Web Data: CLAN: Manuals: Morphosyntax: Phon Tutorial videos Digital video: CA Methods:

11 Why no handout? The “Overviews” link has this PPT presentation. CHILDES is now fully electronic. No more paper.

12 Available Methods
  Microanalysis - CA, phonetics, ethology
  Microgenetic analysis - CA, code-switching (NEXT)
  Group and treatment comparisons - Genesee
  Error analysis - Yip and Matthews
  Diffusion analysis - in preschools
  Longitudinal studies - growth curves
  Modeling - neural nets, dynamic systems, evolutionary models

13 CLAN Tools
  Transcribing
  Editing
  Counts - FREQ, KWAL
  Analyses - MOR, GRASP, PHON
  Interoperability - ELAN, Praat, SFS, EXMARaLDA, CLAPI, PHON

14 CA marks in Unicode

15 Transcripts linked to media

16 Ground Rules
  Ethical use, informed consent
  Levels of permission
  Respect for dignity of participants
  Respect for contributors
  Requirement to cite sources
  Requirement to contribute data

17 Info-CHILDES and Membership
  Archived at LinguistList
  Info-CHIBolts for nuts and bolts
  Membership list
  IASCL Membership

18 Getting Set Up: Download CLAN from the Programs link.

19 Windows issues: You can work in c:\childes, but your administrator may have this locked, so you may need shortcuts. Windows IPA is difficult. Windows compression may produce .wmf

20 Downloading Manuals: CHAT, CLAN

21 Getting Started
  Open CLAN Manual to Chapter 2
  Double-click application
  Control-D to open Commands Window
  Set Working Directory to c:\childes\clan\lib\samples

22 Should look like this: on Windows, the working directory will be c:\childes\clan\lib\samples

23 Run FREQ
  freq sample.cha
  Hit RUN or carriage return
  In the output, does “want” occur 3 times?
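
For readers who want to see what a frequency count over a transcript amounts to, here is a minimal Python sketch. It is not CLAN's FREQ implementation and ignores most CHAT conventions; the function name `freq` and the sample lines are made up for illustration and are not the contents of sample.cha.

```python
from collections import Counter


def freq(chat_lines, speaker=None):
    """Count word tokens on main speaker tiers (lines starting with '*').

    Hypothetical helper, a rough analogue of a frequency count,
    not a reimplementation of CLAN's FREQ.
    """
    counts = Counter()
    for line in chat_lines:
        if not line.startswith("*"):
            continue  # skip @ headers and % dependent tiers
        code, _, text = line.partition(":")
        if speaker is not None and code != "*" + speaker:
            continue  # optionally restrict to one speaker code
        for token in text.split():
            word = token.strip(".,?!").lower()
            if word and word[0].isalpha():
                counts[word] += 1
    return counts


# Invented CHAT-like lines, for illustration only.
sample = [
    "@Begin",
    "*CHI:\tI want cookie .",
    "*MOT:\tyou want a cookie ?",
    "%com:\tmother offers a cookie",
    "*CHI:\twant more .",
    "@End",
]

counts = freq(sample)
for word, n in sorted(counts.items()):
    print(n, word)
```

Passing `speaker="CHI"` restricts the count to the child's tier, loosely in the spirit of the +t*CHI option used on later slides.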

24 Interface Features Help CLAN Files In Recall Set MOR, Lib, Output directories

25 Files In

26 Building Commands
  mlu +t*CHI +f sample.cha
  mlu *.cha
  Wildcards
  File output
  *.cha

27 Changing Directories
  Set Working to: ne32
  combo +t*MOT +s"is^*ing" *.cha
  Set Working to: samples
  kwal +sbunny +w2 -w2 0042.cha
  Triple-click on an output line to go back to the source file
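
The keyword-and-line idea behind the kwal command above can be sketched in Python. This is an illustration under assumptions, not CLAN's KWAL: the options +w2 and -w2 are read here as two utterances of context after and before each match (check the CLAN manual for the exact semantics), and the function name `kwal` and the utterances are invented.

```python
def kwal(utterances, keyword, before=2, after=2):
    """Return (index, context) pairs for utterances containing keyword.

    Hypothetical helper: context is the matching utterance plus up to
    `before` preceding and `after` following utterances.
    """
    hits = []
    for i, utt in enumerate(utterances):
        words = [w.strip(".,?!").lower() for w in utt.split()]
        if keyword.lower() in words:
            start = max(0, i - before)
            end = min(len(utterances), i + after + 1)
            hits.append((i, utterances[start:end]))
    return hits


# Invented utterances, for illustration only.
utterances = [
    "look at this .",
    "what is it ?",
    "it is soft .",
    "a bunny !",
    "yes a rabbit .",
    "he hops .",
    "all gone .",
]

for index, context in kwal(utterances, "bunny"):
    print("match at utterance", index)
    for line in context:
        print("   ", line)
```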

28 GEM
  Set Working to: Workshop
  GEM +s* pau001.cha
  Open output, play audio

29 Exercises - Chapter 8
  MLU50: mlu +t*CHI +z50u +f *.cha
  MLU5: maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex
  TTR: freq +t*CHI +s"*-%" +f *.cha
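
The three measures in these exercises can be approximated in a few lines of Python, which may help when checking CLAN output by hand. This is a rough sketch under stated assumptions: it counts whitespace-separated words, whereas CLAN's MLU is normally a morpheme count, so the numbers will not match CLAN exactly. MLU50 is read here as the mean over the first 50 utterances and MLU5 as the mean over the 5 longest; all function names and the sample utterances are hypothetical.

```python
def word_count(utterance):
    """Number of word tokens, ignoring stand-alone punctuation."""
    return sum(1 for tok in utterance.split() if tok[0].isalnum())


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


def mlu_first(utterances, n=50):
    """Mean utterance length (in words) over the first n utterances."""
    return mean(word_count(u) for u in utterances[:n])


def mlu_longest(utterances, n=5):
    """Mean utterance length (in words) over the n longest utterances."""
    lengths = sorted((word_count(u) for u in utterances), reverse=True)
    return mean(lengths[:n])


def ttr(utterances):
    """Type-token ratio: distinct word forms divided by total words."""
    tokens = [t.lower() for u in utterances for t in u.split() if t[0].isalnum()]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


# Invented child utterances, for illustration only.
child = [
    "I want cookie .",
    "want more .",
    "no .",
    "I want the big red ball .",
    "ball .",
]

print("MLU (first 50):", mlu_first(child))
print("MLU (2 longest):", mlu_longest(child, n=2))
print("TTR:", round(ttr(child), 3))
```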

30 BatchFile
  maxwd +t*CHI +g1 +c5 +dl 14.cha | mlu > 14.ml5.cex
  maxwd +t*CHI +g1 +c5 +dl 55.cha | mlu > 55.ml5.cex
  maxwd +t*CHI +g1 +c5 +dl 66.cha | mlu > 66.ml5.cex
  maxwd +t*CHI +g1 +c5 +dl 68.cha | mlu > 68.ml5.cex
  maxwd +t*CHI +g1 +c5 +dl 98.cha | mlu > 98.ml5.cex
  Batch batch.cex
  Or just run by highlighting in Commands (Windows)
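
Because the batch lines differ only in the file name, they can be generated rather than typed. The Python sketch below reproduces the five command lines from this slide from a list of file stems; the helper name `batch_lines` is hypothetical, and the command text is copied from the slide rather than checked against CLAN.

```python
def batch_lines(stems, speaker="CHI", longest=5):
    """Build one maxwd | mlu command line per transcript stem."""
    template = (
        "maxwd +t*{spk} +g1 +c{n} +dl {stem}.cha | mlu > {stem}.ml{n}.cex"
    )
    return [template.format(spk=speaker, n=longest, stem=s) for s in stems]


lines = batch_lines(["14", "55", "66", "68", "98"])
print("\n".join(lines))
```

Saving the printed text as batch.cex gives a file that can be run as on the slide.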

31 Tables
Child   MLU50   MLU5    TTR     MLT Ratio
55      -0.70   -0.65   -0.15   -0.94
66      -0.25   -0.19   -0.68   -1.14
68       3.10    2.56   -0.67    1.60
98      -0.95   -1.11   -0.55    0.31

32 The Editor

33 Playing a linked file: Esc-8, Esc-A, Control-click, F5

34 Linking a File - F5
  Cursor on *FAT
  Find file
  F5
  Press space for each utterance
  Save

35 F5 Tricks
  Go back to last good link
  Space quickly through contained overlap
  If a bullet is missing, cut and paste an old one
  For precision, try Sonic Mode

36 Sonic Mode
  Esc-0 to start
  Highlight area
  Shift-click to move edge
  Have cursor on line in file
  S to insert time marks
  Triple-click a linked sentence

37 Transcribing
  Open new window (Command-N)
  Insert headers:
    @Begin
    @Languages: en
    @Participants: CHI Target_Child, MOT Mother, FAT Father, ROS Brother
    @Date
  F5 with space at each utterance
  Go back and transcribe each bullet (c-click)
  Adjust time marks using Esc-A
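
The @Participants header is regular enough to read with a short script, which can be handy for checking speaker codes before transcribing. The Python sketch below assumes each comma-separated entry starts with a speaker code and ends with a role, as in the header shown on this slide; `parse_participants` is a hypothetical helper and covers only this simple case, not the full CHAT header syntax.

```python
def parse_participants(header_line):
    """Map speaker codes to roles from an @Participants header line."""
    tag, _, body = header_line.partition(":")
    if tag.strip() != "@Participants":
        raise ValueError("not an @Participants header")
    participants = {}
    for entry in body.split(","):
        fields = entry.split()
        if not fields:
            continue  # tolerate a trailing comma
        code = fields[0]
        role = fields[-1] if len(fields) > 1 else None
        participants[code] = role
    return participants


header = "@Participants:\tCHI Target_Child, MOT Mother, FAT Father, ROS Brother"
for code, role in parse_participants(header).items():
    print(code, "->", role)
```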

38 F5, locate sound, enter bullets; click on bullets, transcribe

39 Or use SoundWalker

40 Or use the Video Editor

41 CHECK: CHECK is CRUCIAL. Internal: Esc-L. External: check *.cha. External CHECK provides fuller control.

42 Options: Backup, Wrapping, Line Numbers, CHECK

43 More Options: Line numbers, F5 bullets, SoundAnalyzer

44 Coder's Editor
  Open barry.cha
  Esc-0
  Cursor on first line
  Open codeshar.cut
  %spa
  Insert $NIA:AC:IN

45 Coder's Editor Commands
  F1: finish current tier and go to the next
  Esc-c: finish coding current tier
  Esc-t: restrict coding to a particular speaker
  Esc-Esc: go on to the next speaker
  Esc-s: rotate subcodes
  Control-g: cancel illegal command

46 Send to Praat: Open Praat, click before link, Send to Praat, run analysis

47 Learning to Digitize

48 Searching, Replacing: Control-R, Control-F; Space, No, !, Control-G


50 Tour of English MOR Files
  Download a copy
  A-rules
  C-rules
  Sf.cut
  Lexicon

51 Running MOR
  Set MOR directory
  mor +xi (dogs)
  mor +xl barry.cha
  Open barry.ulx.cex
  Fix problems using KWAL
  mor *.cha

52 POST
  mor barry.cha +1
  or else
  mor barry.cha
  and then
  ren *.mor.cex *.cha +f
  post *.cha +1

53 Fixing POST: POST is 95% accurate, but some projects need 100% accuracy. The Eve training set may need error checking. More data will train a better POST. POST training is mostly about bootstrapping, using regexp to find and correct subcases leading to error. Some POS possibilities need to be removed and added back through post-POST rules (spell as N).

54 CHAT: What is an utterance? What is a word? Tour of the CHAT manual.

55 Web Browsing of Video

56 Some examples Forrester Rollins Yasmin Paulo Brent, MacWhinney Classroom - JLS

57 Rollins Coding

58 Conclusions: CHILDES and TalkBank provide solid tools for studying language learning and functioning. Data-sharing has led to major advances in the field. New approaches emphasize the use of multimedia analysis, computational linguistics, and speech technology.
