CHILDES I (08-11-2006): An Introduction to CHILDES (Child Language Data Exchange System), Jacqueline van Kampen

1 CHILDES I An Introduction to CHILDES (Child Language Data Exchange System) Jacqueline van Kampen

2 CHILDES (Child Language Data Exchange System)
- Brian MacWhinney and Catherine Snow, Carnegie Mellon University (Pittsburgh).
- CHILDES provides tools for studying conversational interaction, including:
  - a database of transcripts
  - programs for computer analysis of the data
  - methods for linguistic coding
  - systems for linking transcripts to digitized audio and video
- The database includes many megabytes of naturalistic data from children acquiring more than 26 different languages.
- The CHILDES search programs are called CLAN (Computerized Language Analysis).
- Information about CHILDES and the CLAN programs is available on the CHILDES homepage and in MacWhinney's handbook:
  MacWhinney, B. (2006). The CHILDES Project: Tools for Analyzing Talk, 3rd edition. Mahwah, NJ: Lawrence Erlbaum Associates.
Open website:

3 Before you use CHILDES data
- Read the ground rules for data usage on the CHILDES website.
- In a publication based on CHILDES data you should cite:
  - the references for the corpora you use (given in the documentation)
  - MacWhinney's handbook (latest edition)
- Download the CLAN program at:
- Download the CHILDES files (zip files) at:
- Get the information you need from the manuals: the database manuals, the CHAT manual, and the CLAN manual. The manuals are available in PDF at:

4 Database manuals, CHAT manual, and CLAN manual (in PDF)
The database manuals are available at:
In a database manual you will find:
- all the necessary information about a corpus
- the reference(s) you have to cite when using a corpus
The CHAT manual is available at:
In the CHAT manual you will find:
- information about the codes used in the transcriptions
The CLAN manual is available at:
In the CLAN manual you will find:
- information about the tools for analyzing the data
Go to Database Manuals: American English (http://childes.psy.cmu.edu/manuals/02englishusa.doc)

5 The CHAT codes (Codes for the Human Analysis of Transcripts)
The files are transcribed in CHAT format. A CHAT file has the extension ".cha". The CHAT codes make it possible to search the files with the various CLAN programs.
The @ headers
- A header is a line of text that gives information about the participants and the setting.
- All headers begin with the @ sign.
- A CHAT file begins with @Begin and ends with @End.
- The opening header is followed by a series of headers that state information about the child, the other participants, and the date of recording/transcription.
The tiers: * and %
- The data that follow the headers are divided into lines.
- Each line begins with a 'tier'.
- The tiers are an important tool for the CLAN programs when searching the data.
- The most important tiers are the *-tiers and the %-tiers.
Put the cursor in the CLAN window. Press Ctrl-o, go to Soft Grid Q:\VFS\CLAN\CHILDES\CLAN\LIB and open sample.cha.

6 The CHAT codes (Codes for the Human Analysis of Transcripts)
The * tiers
- are followed by three capital letters that identify the child or the child's conversation partner, for instance:
  *CHI: (followed by an utterance of the child, as listed in the @Participants header)
  *MOT: (followed by an utterance of the mother)
The % tiers
- are 'dependent tiers' that refer to the preceding utterance of the child or conversation partner
- are followed by three lowercase letters that represent a code, for instance:
  %act: (= action; describes the actions of the speaker or the listener)
  %alt: (= alternative; provides an alternative possible transcription)
  %com: (= comment; the general-purpose comment tier)
  %par: (= paralinguistic; codes paralinguistic behaviors such as coughing and laughing)
  %spa: (= speech act; used for speech act coding)
- contain additional information (added optionally)
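To make the split between headers, main (*) tiers and dependent (%) tiers concrete, here is a minimal Python sketch that groups the lines of a CHAT file. It is not how CLAN parses files; it assumes the simplified layout described above (headers start with @, main tiers with *, dependent tiers with %, and a tab-initial line continues the previous tier), and the names `Utterance` and `parse_chat` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    speaker: str                                   # three-letter code, e.g. "CHI"
    text: str                                      # the main-tier utterance
    dependent: dict = field(default_factory=dict)  # e.g. {"mor": "...", "act": "..."}


def parse_chat(lines):
    """Split CHAT lines into headers and utterances (simplified sketch)."""
    headers, utterances = [], []
    last = None  # "@" = header, "*" = main tier, otherwise a dependent-tier code
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("@"):
            headers.append(line)
            last = "@"
        elif line.startswith("*"):
            code, _, text = line[1:].partition(":")
            utterances.append(Utterance(code, text.strip()))
            last = "*"
        elif line.startswith("%") and utterances:
            # A dependent tier belongs to the most recent main tier.
            code, _, text = line[1:].partition(":")
            utterances[-1].dependent[code] = text.strip()
            last = code
        elif line.startswith("\t"):
            # A tab-initial line continues whatever tier came before it.
            extra = " " + line.strip()
            if last == "@" and headers:
                headers[-1] += extra
            elif last == "*" and utterances:
                utterances[-1].text += extra
            elif utterances and last in utterances[-1].dependent:
                utterances[-1].dependent[last] += extra
    return headers, utterances
```

Attaching each % line to the most recent * line mirrors the idea that dependent tiers refer to the preceding utterance.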

7 Three % tiers coding linguistic information
Some % tiers are very useful for linguistic analysis. Open: LIB\sample2mor
%mor (= morphology) This tier codes morphemic segments by type and part of speech. Example:
  *CHI: I wanted a toy
  %mor: PRO|I&1S V|want-PAST DET|a&INDEF N|toy
%pho (= phonology) This tier describes phonological phenomena (in IPA or SAMPA format).
%syn (= syntax) This tier codes syntactic structure.
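The %mor example above combines a part of speech and a stem (separated by |) with suffixes (after -) and fusional codes (after &). The Python sketch below splits such items for the simple cases on this slide only; it assumes no prefixes, clitics or compounds, and the function name `parse_mor` is hypothetical, not part of CLAN.

```python
def parse_mor(tier):
    """Split a simple %mor tier into part of speech, stem, suffixes and fusional codes."""
    analyses = []
    for item in tier.split():
        if "|" not in item:
            continue  # skip punctuation such as "." or "?"
        pos, _, rest = item.partition("|")
        stem, *suffixes = rest.split("-")    # "want-PAST" -> "want", ["PAST"]
        stem, *fusional = stem.split("&")    # "I&1S" -> "I", ["1S"]
        analyses.append({"pos": pos, "stem": stem,
                         "suffixes": suffixes, "fusional": fusional})
    return analyses
```

For the slide's example, `parse_mor("PRO|I&1S V|want-PAST DET|a&INDEF N|toy")` yields the parts of speech PRO, V, DET and N, with "PAST" recorded as a suffix of "want".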

8 The CHAT codes (Codes for the Human Analysis of Transcripts)
Some (!) frequently used notations. See the CHAT manual pages for a full list of symbols.
  #          unfilled pause (between words)
  @          form markers
  6          schwa
  &          phonological fragment
  xxx        unintelligible speech (not treated as a word by the CLAN programs)
  www        untranscribed material
  [/]        retracing without correction, e.g.: then [/] then
  [//]       retracing with correction, e.g.: then [//] but
  ["]        quotation mark, used when the child literally repeats something, e.g. bear ["]
  < > [*]    scope: the code in square brackets applies to all words between the angle brackets (e.g. with ["])
  +...       trailing off: the sentence is incomplete, but not interrupted by another speaker
  +/.        interruption: the sentence is incomplete and interrupted by another speaker
  [=! text]  paralinguistic material, like crying, yelling, laughing, e.g. [=! cries]
  [= text]   short explanation, e.g. look there [= in the closet]
  [: text]   standard form (in the adult language), e.g. he have [: has]

9 The CLAN (Computerized Language Analysis) programs
The CLAN programs are tools for analyzing the data. To run CLAN, you have to install it on your PC or Mac. Open the CLAN window again.
- The output of the analyses appears in another window, the "output window".
- CLAN provides a commands window. In this window you can type the commands to run an analysis on one or more files.

10 The CLAN commands
A CLAN command includes several components:
- directory (input/working): specifies the search space (obligatory)
- directory (output): specifies the directory in which the output will be stored (optional)
- the main command
- search file(s)
- output file (optional)
- +/- switch(es)
1. Under Working, select: \lib\ne32 (specifies the search space; obligatory)
2. Under Lib, select: \lib (sets the LIB (library) directory)
Under Output you may specify the storing place (optional).
On the command line: main command, search file(s), +/- switch(es) (in any order)

11 The Search Functions: the main command
Components: the main command, +/- switch(es), search file(s), output file (optional)
Click the CLAN icon and select the KWAL command (specified only once). In the CLAN window you must specify the search functions.
The main command
- comes first, and there is only one
- frequently used options:
  - freq (frequency counts)
  - kwal (word/morpheme search)
  - combo (combined searches of 2 or more words/morphemes)
  - mlu (MLU counts)
  - chip (comparison and analysis of utterances of different speakers)
Clicking the CLAN icon gives a survey of all command options.

12 The Search Functions: the +/- parameter switches
Stay in the commands window, type kwal and press Return. Some parameter switches:
  +t      selects the utterances of a specified speaker (the one following the tier)
  +s      selects a word to be searched for
  +d      used with KWAL, puts the output in CHAT format
  +o      used with FREQ, sorts the output by descending frequency
  +u      specifies that all search results are stored in 1 file
  +r      deals with the treatment of material between parentheses
  +x -x   includes only utterances longer than / shorter than a specified number of words (w), morphemes (m) or characters (c)
  +w -w   gives extra utterances in the context of the searched item (window)
  +f -f   +f: output is stored in the (specified) file(s); -f: output appears on the screen
- Parameter switches may be specified more than once.
- The order of the parameter switches is free.
- Parameter switches generally have a + form (include) and a - form (exclude).
- Not all parameter switches work with all commands.
- The switches available for each command can be seen in the commands window.

13 The Search Functions: the +/- t option
Put the cursor back on the previous line (kwal) and type: +t*CHI
NOTE: You have to insert a space after each option!
The parameter switch +t
- +t selects the utterances of a specified speaker (the one following the tier).
- The +t switch may be specified more than once!
- The +t/-t switch includes (+t) or excludes (-t) particular tier(s).
- In CHAT-formatted files, there are three tier code types:
  - main speaker tiers (denoted by *)
  - speaker-dependent tiers (denoted by %)
  - header tiers (denoted by @)
- The speaker-dependent tiers are attached to speaker tiers. For example, +t*MOT (speaker tier 'mother') together with +t%act (dependent tier 'action') analyzes all of the *MOT main tiers and only the %act dependent tiers associated with that speaker.
- The +t option specifies which main speaker tiers, their dependent tiers, and header tiers should be included in the analysis. All other tiers in the given file are ignored by the program.
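The +t*MOT +t%act example can be read as a filter: keep the mother's main tiers, and of their dependent tiers keep only %act. The Python sketch below models that reading on utterances stored as (speaker, text, dependent-tiers) triples; the data layout and the name `select_tiers` are hypothetical, and CLAN's own tier handling covers more cases (headers, -t exclusion, wildcards).

```python
def select_tiers(utterances, speakers, dependent_tiers=()):
    """Model +t*SPK +t%dep: keep the speakers' main tiers and only the named dependent tiers.

    Each utterance is a (speaker, text, dependent) triple, where `dependent`
    maps tier codes such as "act" to their content.
    """
    selected = []
    for speaker, text, dependent in utterances:
        if speaker not in speakers:
            continue  # main tiers of other speakers are ignored
        kept = {code: value for code, value in dependent.items() if code in dependent_tiers}
        selected.append((speaker, text, kept))
    return selected
```

Calling `select_tiers(data, {"MOT"}, {"act"})` keeps every *MOT line but drops, for example, a %com tier attached to it.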

14 The Search Functions: the +/- s option
Add +s"a" to the line in the commands window: kwal +t*CHI +s"a"
The parameter switch +s
- +s selects a word (or code) to be searched for.
- The +s switch may be specified more than once!
- The +s/-s switch is used to include or exclude certain words.
- The +s option specifies the keyword you want to find. You do this by putting the word in quotes directly after the +s switch, as in +s"dog" to search for the word dog.
- Using the +s option overrides the default (all utterances!).
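The command kwal +t*CHI +s"a" combines a speaker filter with a keyword filter. As a rough Python model (not CLAN's code), the sketch below scans the raw lines of a transcript, keeps only the given speaker's main tiers, and reports the line numbers of tiers that contain the keyword as a whole word, in the spirit of KWAL's "line 447. Keyword: a" output. The function name `kwal` and the whitespace tokenization are assumptions of this sketch.

```python
def kwal(lines, speaker, keyword):
    """Return (line number, line) for main tiers of `speaker` containing `keyword` as a word."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith("*" + speaker + ":"):
            continue  # +t*SPK: only this speaker's main tiers are searched
        words = line.split(":", 1)[1].split()
        if keyword in words:  # +s"word": whole-word match, so "a" does not match "car"
            hits.append((number, line))
    return hits
```

Matching whole words is why a search for "a" finds "a coat" but not "car".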

15 The Search Functions: the search files
Components: the main command, +/- switch(es), search file(s), output file (optional)
Click the FILE IN icon, select file 98.cha and click 'Done': kwal +t*CHI
The search file
- CLAN takes as its working space the directory specified under Working.
- The files (from that directory) have to be specified in the commands window.
- For all files in the directory, either type *.* or use the FILE IN icon.
- Search files may be specified more than once.
FILE IN icon (click): choose
- all files in the directory: click Add All
- a subset of the files: double-click on each file

16 The Search Functions: the output file
Components: the main command, +/- switch(es), search file(s), output file (optional)
Type +fart on the command line: kwal +t*CHI +fart
Important guidelines
- You can put the file name or any switch in any order you wish.
- You must not forget to keep a space between options.
- By default CLAN gives the output on the screen. With the +f option you can change this.
- +f puts the output in the directory specified under Output (by default, the working directory).
- +f can optionally be followed by a 3-letter extension (e.g. +fart). The output file then gets this extension.

17 The output file
Click Run. Press Ctrl-o and check that the output file contains the following:
kwal +t*CHI +fart
kwal +t*CHI +sa +f
Sun Nov 05 16:44: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file to file
*** File "c:\childes\clan\lib\ne32\98.cha": line 447. Keyword: a
*CHI: [>]
*** File "c:\childes\clan\lib\ne32\98.cha": line 481. Keywords: a, a
*CHI: a [/] a
*** File "c:\childes\clan\lib\ne32\98.cha": line 487. Keyword: a
*CHI: a:
*** File "c:\childes\clan\lib\ne32\98.cha": line 495. Keyword: a
*CHI: a coat
*** File "c:\childes\clan\lib\ne32\98.cha": line 551. Keyword: a
*CHI: a clothes.

18 The output window
Now try out the following:
1) let the output appear on the screen
2) select all files in the directory
3) let kwal search for the word "shall" uttered by the mother
Click Run. You will get this on the screen (output window):
kwal +t*MOT +f +u
Sun Nov 05 16:57: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *MOT;
****************************************
From file to file
*** File "c:\childes\clan\lib\ne32\68.cha": line 50. Keyword: shall
*MOT: but shall we find out ?
*** File "c:\childes\clan\lib\ne32\68.cha": line 204. Keyword: shall
*MOT: [<] ?
*** File "c:\childes\clan\lib\ne32\68.cha": line 241. Keyword: shall
*MOT: shall we do these ?
*** File "c:\childes\clan\lib\ne32\68.cha": line 393. Keyword: shall
*MOT: shall we try the next one ?
Note: +f stores the output in the (specified) file(s); without +f, or with -f, the output appears on the screen.
Erase the output file 98.art.

19 The Search Functions: the search files
In order to
- let the output appear on the screen
- select all files in the directory
- let kwal search for the word "shall" uttered by the mother
you should select Add All at the FILE IN icon and run: kwal +t*MOT
FILE IN icon (click): to choose all files in the directory, click Add All.

20 Some more parameter switches: the +u option
+u specifies that all search results are stored in 1 file.
- By default, when the user has specified a series of files on the command line, the analysis is performed on each individual file. The program then provides separate output for each data file.
- If the command line uses the +u option, the program combines the data found in all the specified files into one set and outputs the result for that set as a whole.
We continue with the KWAL command in the commands window.
1. First delete the file 98.art.cex.
2. Go back with the cursor to the command line: kwal +t*CHI +fart
3. Go to FILE IN, clear all files, and select files 55.cha and 66.cha.
4. Run and open the two output files. Then delete the output files 55.art and 66.art (under LIB\ne32).
5. Add +u to the command line. Run and open the output file. Then delete the output file 55.art (under LIB\ne32).

21 The +/- w option with KWAL
+w -w give extra utterances in the context of the searched item (window). This option can be used with either KWAL or COMBO.
- The -w option followed by a positive integer (1, 2, 3, etc.) causes the program to display that number of preceding utterances.
- The +w option followed by a positive integer (1, 2, 3, etc.) causes the program to display that number of following utterances.
From here on
- we work with the output window
- we type the files directly on the command line
Type file 68.cha and add -w2 and +w1 to the command line:
kwal +t*CHI +w1 -w2
Sun Nov 05 17:35: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
*** File "c:\childes\clan\lib\ne32\68.cha": line 543. Keyword: a
*CHI: 0.
*MOT: what are you making ?
*CHI: a mouth.
*MOT: a mouth ?
*** File "c:\childes\clan\lib\ne32\68.cha": line Keyword: a
*CHI: she [= toy] wants to go in the chair.
*MOT: oh.
*CHI: a xxx chair.
*CHI: xxx on the table
*** File "c:\childes\clan\lib\ne32\68.cha": line Keyword: a
*CHI: 0 [=! whines].
*MOT: see if there's a chair in the garage.
*CHI: nope # [>].
*MOT: yeah [<].?
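The window logic of -w2 and +w1 amounts to slicing the utterance list around each hit. The Python sketch below illustrates that with (speaker, text) pairs; the parameter names `before` (for -w) and `after` (for +w) and the function name `kwal_window` are hypothetical, and real KWAL output also prints file names and line numbers.

```python
def kwal_window(utterances, speaker, keyword, before=0, after=0):
    """For each hit, return the hit plus `before` preceding and `after` following utterances.

    `before` plays the role of -wN and `after` the role of +wN.
    """
    windows = []
    for i, (spk, text) in enumerate(utterances):
        if spk == speaker and keyword in text.split():
            start = max(0, i - before)                 # do not run past the first utterance
            end = min(len(utterances), i + after + 1)  # nor past the last one
            windows.append(utterances[start:end])
    return windows
```

With the first hit from slide 21 (*CHI: a mouth.), `before=2, after=1` returns the two utterances before it and the mother's "a mouth ?" after it.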

22 The +/- w option with KWAL
On the command line there should be:
kwal +t*CHI +s"a" 68.cha +w1 -w2
- From here on we work with the output window. This means (by default): no +f.
- We type the files directly on the command line. This means: erase the file list and type the file 68.cha.
-w2 gives the 2 preceding utterances and +w1 gives 1 following utterance. This produces the output shown on slide 21.

23 The +r option: +r1 and +r2
+r deals with material in parentheses.
- By default, CLAN searches for words including the material between parentheses (omitted parts of words), with the parentheses removed.
- With the +r option, you can change this:
  - +r1 removes the parentheses (like the default)
  - +r2 leaves the parentheses in place
Try out the following: 1) +r1 2) with the word "except" 3) for speaker = child. Then run the same with +r2.
kwal +t*CHI +s"except" 68.cha +r1
kwal +t*CHI +s"except" 68.cha +r2
Output with +r1:
kwal +t*CHI +r1
Sun Nov 05 17:54: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
*** File "c:\childes\clan\lib\ne32\68.cha": line 346. Keyword: except
*CHI: yeah # (ex)cept they go up and down.
Output with +r2:
kwal +t*CHI +r2
Sun Nov 05 17:54: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file

24 The +r option: +r1 and +r2
+r1: works like the default.
kwal +t*CHI +r1
Sun Nov 05 17:54: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
*** File "c:\childes\clan\lib\ne32\68.cha": line 346. Keyword: except
*CHI: yeah # (ex)cept they go up and down.
+r2: no matches, because the transcript contains no word "except", only "(ex)cept".
kwal +t*CHI +r2
Sun Nov 05 17:54: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
What would you have to do to get a match with the +r2 option?
Answer: kwal +t*CHI +s"(ex)cept" 68.cha +r2
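The difference between +r1 and +r2 comes down to whether "(ex)cept" is compared as "except" or as written. The Python sketch below models that choice with a boolean flag; the names `strip_parentheses`, `find_keyword` and `keep_parentheses` are hypothetical and only mimic the behavior described on these slides.

```python
def strip_parentheses(word):
    """(ex)cept -> except: restore the omitted part by dropping the brackets."""
    return word.replace("(", "").replace(")", "")


def find_keyword(utterance, keyword, keep_parentheses=False):
    """keep_parentheses=False mimics +r1 (the default); True mimics +r2."""
    words = utterance.split()
    if not keep_parentheses:
        words = [strip_parentheses(w) for w in words]
    return keyword in words
```

Under the +r2 reading, only the literal search term "(ex)cept" matches, which is the answer to the question on slide 24.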

25 The +r option: +r5
When the child says wanna, meaning want to, it is transcribed like this: wanna [: want to]. By default, material in the form [: want to] replaces the material preceding it. If you do not want this replacement, use the +r5 switch.
Try out the following: 1) +r1 (or, by default, no +r) 2) with the word "wanna" 3) for speaker = child. Then run the same with +r5.
kwal +t*CHI +s"wanna" 68.cha (+r1): no matches
kwal +t*CHI +s"wanna" 68.cha +r5: 5 matches
kwal +t*CHI +s"want" 68.cha gives:
Tue Nov 07 10:52: kwal (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
*** File "68.cha": line 707. Keyword: want
*CHI: [>]
*** File "68.cha": line 840. Keywords: want, want, want
*CHI: [/] [//] uh I want to make that lady she was singing on tv [>]
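The default replacement and the +r5 behavior can be modeled with a regular expression over the word [: replacement] notation. The Python sketch below handles only a single word followed by [: ...]; scoped forms such as <...> [: ...] are not covered, and the names `REPLACEMENT`, `apply_replacements` and `keep_original` are hypothetical.

```python
import re

# A single word followed by [: replacement]; scoped <...> [: ...] forms are not handled.
REPLACEMENT = re.compile(r"(\S+) \[: ([^\]]+)\]")


def apply_replacements(text, keep_original=False):
    """Default: use the [: ...] form; keep_original=True mimics +r5."""
    group = 1 if keep_original else 2
    return REPLACEMENT.sub(lambda m: m.group(group), text)
```

`apply_replacements("I wanna [: want to] go .")` gives "I want to go .", which matches a search for "want" but not "wanna"; with `keep_original=True` the result is "I wanna go .".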

26 The +d option
+d used with KWAL puts the output in CHAT format.
- By default, KWAL outputs the location of the tier where the match occurs.
- When the +d switch is turned on, KWAL outputs each matched sentence in simple legal CHAT format, without line number information.
- The +d1 switch outputs legal CHAT format along with file names and line numbers.
- The +d and +d1 switches can be extremely important tools for performing analyses on particular subsets of a text, because you can use the output file for further analysis with CLAN.
- Using +d is also handy when the output is very long and you want a quick overview: leaving out the location specification makes the output shorter.
Try out the following: 1) the word "the" 2) for all speakers. Then run the same with +d.
kwal +s"the" 68.cha
kwal +s"the" 68.cha +d

27 Some other commands: the FREQ command
FREQ counts the frequencies of words.
What you have to specify: the working directory.
What you may specify:
  +t   a specified speaker
  +s   a word to get a frequency count of
  +f   if you want to store the output in a file (instead of in the output window)
  +o   used with FREQ, sorts the output by descending frequency
What you will get:
- a list of the words with their frequencies
- the type/token ratio (= the total number of unique words used by a selected speaker divided by the total number of words used by the same speaker)
Try out the following (on a new line): 1) freq 2) for speaker = child 3) all files. We are still working in ne32!
freq +t*CHI *.cha
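The type/token ratio defined above is the number of different words divided by the total number of words. The Python sketch below computes both counts with `collections.Counter`; which tokens count as words is an assumption here (punctuation, xxx and www are skipped), and CLAN's own FREQ rules may differ. The names `NOT_WORDS` and `freq` are hypothetical.

```python
from collections import Counter

# Assumption for this sketch: punctuation and xxx/www are not counted as words.
NOT_WORDS = {".", "?", "!", "xxx", "www"}


def freq(utterances):
    """Count word tokens and compute the type/token ratio."""
    counts = Counter(w for text in utterances for w in text.split() if w not in NOT_WORDS)
    types = len(counts)            # number of different words
    tokens = sum(counts.values())  # total number of words
    ratio = types / tokens if tokens else 0.0
    return counts, types, tokens, ratio
```

For the three utterances "a coat .", "a clothes ." and "yeah .", there are 4 types and 5 tokens, so the ratio is 0.8.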

28 The FREQ command: the options +u, +o and +s
What you may specify:
  +t   a specified speaker
  +s   a word to get a frequency count of
  +f   stores the output in a file (instead of in the output window)
  +o   used with FREQ, sorts the output by descending frequency
  +u   specifies that all search results are stored in 1 big file
Run the same with the +u option:
freq +t*CHI *.cha +u
Run again, but add the +o option:
freq +t*CHI *.cha +u +o
Run again, but add the +s option for the word "this":
freq +t*CHI *.cha +u +o +s"this"

29 The output of the options +u, +o and +s
With the +u option:
freq +t*CHI *.cha +u
Tue Nov 07 11:35: freq (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
 54 a
  1 abcs
  1 about
  1 after
  1 again
With the +o option added:
freq +t*CHI *.cha +u +o
Tue Nov 07 11:42: freq (25-Oct-2006) is conducting analyses on:
  ONLY speaker main tiers matching: *CHI;
****************************************
From file
114 yeah
 54 a
 40 this
 38 i
 38 the
 36 no
With the +s option for the word "this" (you can leave out the +o):
freq +t*CHI *.cha +u +s"this"
 40 this
 Total number of different word types used
 40 Total number of words (tokens)
 Type/Token ratio
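The effect of +u (one set for all files instead of one table per file) and +o (descending frequency instead of alphabetical order) can be sketched in Python as shown here. The input layout (a dict from file name to utterances), the label "all files", and the function name `freq_report` are hypothetical; ties in frequency are broken alphabetically in this sketch, which may not be what CLAN does.

```python
from collections import Counter


def freq_report(files, combine=False, by_frequency=False):
    """`files` maps a file name to a list of utterances.

    combine=True mimics +u (one set for all files);
    by_frequency=True mimics +o (descending frequency instead of alphabetical).
    """
    tables = {name: Counter(w for u in utts for w in u.split())
              for name, utts in files.items()}
    if combine:
        # Adding Counters sums the counts per word across files.
        tables = {"all files": sum(tables.values(), Counter())}
    report = {}
    for name, counts in tables.items():
        if by_frequency:
            report[name] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        else:
            report[name] = sorted(counts.items())
    return report
```

Without `combine`, each file gets its own alphabetical list, as in the default FREQ output.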

30 The COMBO command
COMBO searches for combinations of words, combined with Boolean operators like "and" or "or".
What you have to specify: the working directory and the +s option.
Some operators used with COMBO:
  ^   immediately followed by
  *   wildcard (in want^*^to it stands for any intervening words)
  +   OR
  !   NOT
What you will get: the list of the utterances that contain the (combination of the) items.
Examples with ^ and *:
  want directly followed by to:     combo +s"want^to" sample.cha
  want eventually followed by to:   combo +s"want^*^to" sample.cha
  both want and to in any order:    combo +s"want^to" +x sample.cha
Run: combo +t*CHI *.cha +s"want^to"
The command matches: file 55.cha 0 times; file 66.cha 1 time; file 68.cha 10 times; file 98.cha 2 times.
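The ^ and * operators describe a sequential pattern: ^ means "immediately followed by", and a * element lets any words intervene. The Python sketch below is a small recursive matcher for that reading. It assumes that * may also match zero words (so "want to" matches want^*^to). The + (OR) and ! (NOT) operators are not implemented, and `combo_match` is a hypothetical name, not CLAN's algorithm.

```python
def combo_match(words, pattern):
    """True if the ^-separated pattern occurs in `words` (sketch of sequential COMBO)."""
    elements = pattern.split("^")

    def match_from(i, j):
        # i: position in words, j: position in the pattern elements
        if j == len(elements):
            return True
        if elements[j] == "*":
            # "*" absorbs zero or more words (an assumption of this sketch)
            return any(match_from(k, j + 1) for k in range(i, len(words) + 1))
        return i < len(words) and words[i] == elements[j] and match_from(i + 1, j + 1)

    return any(match_from(start, 0) for start in range(len(words)))
```

For example, "I want a toy to go" does not match want^to but does match want^*^to.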

31 The COMBO command: the +x option
COMBO searches are sequential. If you want to find clusters of words in any order, you need to use the +x option.
Try out the following: 1) the +s option "we^can" 2) for speaker = mother. Then run the same with +x.
+s"we^can" (files 55.cha, 66.cha, 68.cha, 98.cha): Strings matched 1 times; 0 times; 2 times; 0 times
+s"we^can" +x (same files): Strings matched 1 times; 0 times; 4 times; 0 times
The +x option gives two extra matches for file 68.cha: one subject-auxiliary inversion (can we) and one cluster with other words in between (we .... (he) can).
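One way to read the description of +x ("clusters of words in any order") is: every item in the pattern must occur in the utterance, regardless of order. The Python sketch below implements that reading only; it is an interpretation of the slide, not CLAN's implementation, and the name `combo_any_order` is hypothetical.

```python
def combo_any_order(words, pattern):
    """True if every ^-separated item occurs in `words`, in any order (one reading of +x)."""
    remaining = list(words)
    for item in pattern.split("^"):
        if item not in remaining:
            return False
        remaining.remove(item)  # each occurrence can be used only once
    return True
```

Under this reading, both "can we go ?" (inversion) and "we think he can" (intervening words) match we^can, as in the two extra matches reported for 68.cha.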

32 The MLU command
MLU calculates the mean length of utterance.
Recall:
- dependent % tiers are added to the main tiers
- the % tiers %mor, %syn and %pho code linguistic information
If the corpus has a %mor line, MLU gives you a true MLU in number of "morphemes".
If the corpus has no %mor line, you run MLU on the main line by adding the -t%mor switch. You then get "MLU in words": MLU counts each word as one word and does no morphemic analysis.
Run: mlu +t*CHI *.cha (add -t%mor if the files have no %mor tier)
Note: the folder morsamples under LIB contains some files with %mor tiers.
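The difference between MLU in words and MLU in morphemes can be shown with the %mor example from slide 7. The Python sketch below assumes, for illustration only, that on %mor each stem counts as one morpheme and each suffix after "-" counts as one more, while fusional codes after "&" are not counted, and that punctuation, xxx and www are skipped on the main tier. CLAN's actual counting rules are given in the CLAN manual. The names `mlu_words` and `mlu_morphemes` are hypothetical.

```python
# Assumption for this sketch: these tokens are not counted as words.
NOT_COUNTED = {".", "?", "!", "xxx", "www"}


def mlu_words(main_tiers):
    """MLU in words: every word on the main tier counts as one unit."""
    lengths = [len([w for w in u.split() if w not in NOT_COUNTED]) for u in main_tiers]
    return sum(lengths) / len(lengths)


def mlu_morphemes(mor_tiers):
    """MLU in morphemes from %mor: a stem counts 1, each "-" suffix counts 1 more.

    Fusional codes after "&" (e.g. I&1S) are not counted separately here.
    """
    lengths = [sum(1 + item.count("-") for item in tier.split() if "|" in item)
               for tier in mor_tiers]
    return sum(lengths) / len(lengths)
```

"I wanted a toy" counts as 4 words but 5 morphemes (want-PAST counts twice).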

33 Two useful helps for searching: 1) the wildcard *
CLAN offers two possibilities that facilitate searching:
- the wildcard (asterisk *)
- searching in a list of words
The wildcard (asterisk *)
A wildcard uses the asterisk symbol (*) to take the place of something else. Wildcards can be used to refer to:
- a group of files (*.cha)
- a group of speakers (CH*)
- a group of words with a common form
  eve10.cha   searches in 1 file (eve10.cha)
  eve*.cha    searches in all .cha files of Eve
  +s"go"      searches for go
  +s"go*"     searches for all words that begin with go: go, goes, goed (child language), going, gone, gold, golden, good, etc.
  +s"*go*"    searches for all words that contain go: besides the ones above, also ongoing, outgo, outgoing, etc.
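Python's standard `fnmatch` module uses the same * convention, so it can illustrate which words the patterns above select. This is only an analogy: `fnmatch` also treats ? and [...] as special characters, which CLAN may not, and the helper name `wildcard_search` is hypothetical.

```python
from fnmatch import fnmatchcase


def wildcard_search(words, pattern):
    """Return the words matching a *-pattern such as "go*" or "*go*" (case-sensitive)."""
    return [w for w in words if fnmatchcase(w, pattern)]
```

With the words go, goes, going, gold, ongoing, outgo and dog, "go*" selects the first four, while "*go*" also adds ongoing and outgo. The same function applied to file names selects eve10.cha with "eve*.cha".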

34 1) The wildcard *
Run the commands in LIB\ne32.
1) word search 2) for the word "on" 3) for speaker = child 4) 1 big output:
  kwal +t*CHI *.cha +u +s"on"
Same, but now for all words ending with "on" (also gives: moon, station):
  kwal +t*CHI *.cha +u +s"*on"
Same, but now for all words containing "on" (also gives: monster, crayons, don't, gone, etc.):
  kwal +t*CHI *.cha +u +s"*on*"

35 2) Searching for words in a list
This saves you typing a series of +s switches. In order to search for words in a list, you have to:
- create a file containing the list of words, either in CLAN (as an ordinary text file) or in any other editor (as a "text only" file)
- specify the file on the command line, by putting the file name after the +s, preceded by the @ sign
Use the file "articles" under the LIB directory to 1) search in the ne32 folder in file number 55 2) for the speaker = child:
  kwal +t*CHI +s@articles 55.cha
Question (easy): which words is the list made of?
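Searching with a word list is the same as giving one +s switch per word: an utterance is a hit if it contains any word from the list. The Python sketch below reads a one-word-per-line list and applies it. The example list is hypothetical and is not the contents of the "articles" file, and the function names `read_word_list`, `load_word_list` and `search_list` are illustrative only.

```python
def read_word_list(lines):
    """One word per line; blank lines are ignored."""
    return {line.strip() for line in lines if line.strip()}


def load_word_list(path):
    """Read a plain-text ("text only") word list from disk."""
    with open(path, encoding="utf-8") as f:
        return read_word_list(f)


def search_list(utterances, words):
    """Return the utterances that contain at least one word from the list."""
    return [u for u in utterances if words & set(u.split())]
```

Storing the list as a set makes the membership test independent of how many words the list contains.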

36 Some exercises
Answer the following questions for the files of Adam:
1. With FREQ: Which question words does Adam use?
2. With COMBO: Does Adam use the question word what with the auxiliary is?
3. With COMBO: Does Adam use the question word where with the auxiliary is in a subordinate clause?
4. With KWAL: Does Adam use the words little and small?
Answers:
1. freq +s"how" +s"wh*" +t*CHI +u *.cha
2. combo +s"what^is" +t*CHI *.cha
3. combo +s"where^is" +t*CHI *.cha +x
4. kwal +s"little" +s"small" *.cha +t*CHI

