Towards a solution for the sharing of phonological data. Yvan Rose (Memorial University of Newfoundland) and Brian MacWhinney (Carnegie Mellon University)


1 Towards a solution for the sharing of phonological data
Yvan Rose, Memorial University of Newfoundland
Brian MacWhinney, Carnegie Mellon University

2 Map of presentation
- Context: no specialized tool to facilitate research in phonological development
- A preliminary attempt: ChildPhon
- A more promising solution: Phon
  - Current state of the Phon project
  - Developments in foreseeable future
- Potential
  - Publicly-available cross-linguistic database
- Proposal

3 Context (until recently)
- CHILDES tools (focus on CLAN)
  - A number of tools for multimedia data storage and analysis
  - Mostly deal with morphological and syntactic aspects of development
  - Not easily extensible
- What about phonology?
  - No CHILDES tool adapted for phonology
  - Data sharing and broad-based investigations are challenging

4 A first attempt
- ChildPhon (Rose 2003)
  - Analytical (relational) database for child language data
  - Designed within FileMaker Pro
- Main features
  - Interface for double-blind transcriptions
  - Automatic functions based on phonetic transcriptions:
    - Syllabification of transcribed forms
    - Detection of common processes observed in child language (e.g. onset cluster reduction)

5 Problems with ChildPhon
- No support for Unicode fonts, hence no cross-platform compatibility (Macintosh-only)
- Not compatible with CHILDES / TalkBank, hence no data exchange functions
- Automatic parses limited, not customizable
- Multimedia capabilities are minimal (at best)
- Requires use of proprietary software and font
- Algorithms are ‘destructive’
- Statistical functions are minimal
- No web implementation
- In sum: good idea, bad implementation

6 Phon: a more promising solution
- Interdisciplinary project (first of its kind between Linguistics and Computer Science at Memorial University of Newfoundland)
- Software designers and programmers: Rodrigue Byrne, Gregory Hedlund, Philip O'Brien, Yvan Rose, Harold Wareham
- Financial support:
  - Faculty of Arts, Memorial University
  - Social Sciences and Humanities Research Council of Canada (SSHRC)
  - Canada Foundation for Innovation (CFI)
  - National Science Foundation (NSF)

7 Phon: Overview
- Software underpinnings:
  - Programmed in Java, Unicode font encoding
  - Cross-platform compatible (Mac, Windows, …)
  - XML data storage structure
  - Compatible with TalkBank schema
- User management system
- Extended multimedia capabilities
- More flexible automatic algorithms
- Specialized query language
- Offers a complete solution for data sharing

8 Phon: usability
- Intuitive graphical user interface
  - Helpful wizards (e.g. project creation, queries)
  - Record navigator
  - Custom selection of data fields
    - General / record-by-record
- Intuitive query language
  - Standard terminology
  - Built-in queries (modifiable by user)
  - Query memorization and saving

9 Phon: main functions
- User management
- Media segmentation
- Phonetic transcription
- Transcription merging (selection of ‘final’ transcriptions for analysis)
- Phrase segmentation and alignment (further segmentation according to research needs)
- Syllable alignment (alignment of syllables of target and actual forms)
- Database query

10 User management
- Secure login
- User tasks / privileges management

11 Media segmentation
- Generally similar to CLAN
  - Hit the space bar to define a speech segment
- Default segment length user-defined
  - Useful for working on small speech segments
- Segment editing:
  - Change numerical value
  - ‘Stretch’ the time segment by sliding pointer
- Interface controls shown: Play, Export sound clip
Note (Yvan Rose): Replace yellow line in segment “timebar” by waveform.

12 Transcription: general interface
- Interface areas shown: Transcription window, Session info (drawer), Media controls, Media window

13 Transcription
- Built-in IPA character map
  - Symbol ‘categories’
- Access to sound segment
- Interface for double-blind transcriptions
  - Tied with user management functions
Note (Yvan Rose): Link adult transcription to an electronic IPA dictionary. Need to develop a transcription system for sounds that can’t be transcribed easily:
- Ability to assign a feature set to a dummy character
- Ability to use the forward slash to assign two competing symbols to a given sound (e.g. p/b would imply that voicing cannot be transcribed accurately; the alternants will be considered as one consonant by the syllabifier and query interpreter)

14 Transcription merging
- Comparison of ‘competing’ transcriptions
- Direct access to media segment
- Selection of most accurate transcription
- Further refinement of selected transcription
Note (Yvan Rose): People requested an algorithm that would enable a comparison of transcriptions based on specific parameters (e.g. voicing). This algorithm could build on the feature sets associated with each segment transcribed.
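A feature-based comparison of the kind mentioned in the note could look roughly like the following sketch. It is not Phon's implementation: the `FEATURES` table, the feature names, and both function names are assumptions made up for illustration, and the sketch assumes the two transcriptions have the same number of segments.

```python
# Hypothetical, heavily simplified feature sets for a handful of segments.
FEATURES = {
    "p": {"place": "labial", "manner": "stop", "voiced": False},
    "b": {"place": "labial", "manner": "stop", "voiced": True},
    "t": {"place": "alveolar", "manner": "stop", "voiced": False},
    "d": {"place": "alveolar", "manner": "stop", "voiced": True},
    "a": {"place": "low", "manner": "vowel", "voiced": True},
}


def differing_features(seg_a, seg_b):
    """Return the names of the features on which two segments disagree."""
    feats_a, feats_b = FEATURES[seg_a], FEATURES[seg_b]
    return {name for name in feats_a if feats_a[name] != feats_b[name]}


def compare_transcriptions(trans_a, trans_b, ignore=frozenset()):
    """List the positions where two transcriptions disagree.

    Features named in `ignore` (e.g. "voiced") are not counted as
    disagreements. Both transcriptions must have the same length.
    """
    if len(trans_a) != len(trans_b):
        raise ValueError("this sketch only compares equal-length transcriptions")
    disagreements = []
    for index, (seg_a, seg_b) in enumerate(zip(trans_a, trans_b)):
        diff = differing_features(seg_a, seg_b) - set(ignore)
        if diff:
            disagreements.append((index, seg_a, seg_b, diff))
    return disagreements
```

With this table, comparing "pa" and "ba" reports a voicing disagreement at position 0, and the same comparison with `ignore={"voiced"}` reports none, which is the kind of parameterized comparison the note describes.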

15 Phrase alignment
- Further segmentation of the utterances
- Useful for research on phonological domains
- A simple mouse click sets and resets the domain boundaries
Note (Yvan Rose): Several people requested different levels of segmentation. This includes morpho-syntactic levels of segmentation, as well as various levels of the prosodic hierarchy. Also: add a PLAY button in the interface of this module.

16 Syllabification algorithm
- Refined labeling of each syllabic position
- Each label is a valid object for query
[Figure: syllable structure of ‘constraints’ (k ø n s t r e I n t s), with onset (O), nucleus (N) and rhyme (R) nodes]

17 Syllabification algorithm
- Parameters of syllabification are user-definable
- Interface areas shown: Timing tier, Syllable constituents
Note (Yvan Rose): The parameters will be revised thoroughly. To add (among others): word-final codas, list of exceptional clusters. Also add, to complement stress attraction, an option of ambisyllabic syllabification of intervocalic consonants in Strong-Weak syllable juncture. In addition to this, we also need a way to manually assign a syllabification to each consonant which cannot be accounted for by the automatic algorithm.
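To make the idea of user-definable syllabification parameters concrete, here is a minimal sketch of a parameterized syllabifier based on onset maximization. It is an illustration under stated assumptions, not the algorithm used in Phon: `SyllabifierParams`, `syllabify`, the ASCII vowel set and the list of legal onsets are all hypothetical, and word-initial clusters are simply treated as onsets without checking them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyllabifierParams:
    """User-definable parameters (hypothetical names and values)."""
    vowels: frozenset = frozenset("aeiou")
    # Clusters accepted as branching onsets; language-specific in practice.
    legal_onsets: frozenset = frozenset({"tr", "dr", "pl", "st", "str"})


def syllabify(segments, params):
    """Label each segment as O (onset), N (nucleus) or C (coda)."""
    nuclei = [i for i, seg in enumerate(segments) if seg in params.vowels]
    if not nuclei:
        raise ValueError("no nucleus found")
    labels = [None] * len(segments)
    for i in nuclei:
        labels[i] = "N"
    # Consonants before the first nucleus form the first onset (unchecked).
    for i in range(nuclei[0]):
        labels[i] = "O"
    # Consonants after the last nucleus form the final coda.
    for i in range(nuclei[-1] + 1, len(segments)):
        labels[i] = "C"
    # Between two nuclei, give the following syllable the longest legal onset.
    for left, right in zip(nuclei, nuclei[1:]):
        cluster = segments[left + 1:right]
        split = len(cluster)
        for start in range(len(cluster)):
            is_single = len(cluster) - start == 1
            if is_single or "".join(cluster[start:]) in params.legal_onsets:
                split = start
                break
        for k in range(len(cluster)):
            labels[left + 1 + k] = "C" if k < split else "O"
    return list(zip(segments, labels))
```

Changing the parameters changes the parse: with the default onsets, `syllabify(list("tundra"), SyllabifierParams())` puts `d` and `r` in the onset of the second syllable, whereas passing `legal_onsets=frozenset()` leaves only `r` there and sends `d` to the preceding coda.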

18 Syllable alignment
- Automatic alignment of syllables
- Manual modifications
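Automatic alignment of target and actual syllables can be approached with standard dynamic-programming sequence alignment. The sketch below is a generic Needleman-Wunsch-style aligner, not Phon's algorithm; the `similarity` measure, the `GAP` value and the function names are assumptions chosen for illustration only.

```python
GAP = -1  # Score for leaving a syllable unaligned (hypothetical value).


def similarity(syll_a, syll_b):
    """Crude similarity: reward shared segments, penalize length mismatch."""
    shared = len(set(syll_a) & set(syll_b))
    return 2 * shared - abs(len(syll_a) - len(syll_b))


def align_syllables(target, actual):
    """Align two syllable lists; None marks a syllable with no counterpart."""
    n, m = len(target), len(actual)
    # score[i][j]: best score for aligning target[:i] with actual[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * GAP
    for j in range(1, m + 1):
        score[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1]),
                score[i - 1][j] + GAP,  # target syllable deleted
                score[i][j - 1] + GAP,  # actual syllable inserted
            )
    # Trace back from the bottom-right corner to recover the alignment.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and score[i][j]
                == score[i - 1][j - 1] + similarity(target[i - 1], actual[j - 1])):
            pairs.append((target[i - 1], actual[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + GAP:
            pairs.append((target[i - 1], None))
            i -= 1
        else:
            pairs.append((None, actual[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs
```

For a truncated production such as target `["ba", "na", "na"]` and actual `["na", "na"]`, the sketch leaves the first target syllable unaligned and pairs the remaining two. Cases where the automatic result is wrong are what the manual modifications are for.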

19 Query language
- Quick and accurate queries on large amounts of data
- Language features
  - Uses terms familiar to phonologists to compose queries
    - Syllable constituents: onset, nucleus, …
    - Stressed vs. unstressed syllables
  - Custom predicates
- History of recent queries
- Ability to save queries

20 Query language components
- Selectors (e.g. Onset(Syllable x))
- Predicates (e.g. Branching(Onset(Syllable x)))
- Boolean connectives
- Example:

    let corpusName = "TestCorpus",
    let corpus = Corpus(corpusName),
    let records = Records(corpus)
    foreach r in records
      foreach p in Phrases(r)
        foreach s in Syllables(p)
          Branching(Onset(TargetSyllable(s))) AND NOT Branching(Onset(ActualSyllable(s)))
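For readers more at home in a general-purpose language, the example query can be paraphrased in Python as follows. This is only a sketch of the query's logic (find syllables whose target onset branches while the actual onset does not); the `Syllable` and `AlignedSyllable` classes, the dictionary-of-records layout and the function names are assumptions, not Phon's data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Syllable:
    """A syllable as tuples of segments per constituent (hypothetical model)."""
    onset: tuple = ()
    nucleus: tuple = ()
    coda: tuple = ()


@dataclass(frozen=True)
class AlignedSyllable:
    """A target syllable paired with the child's actual production."""
    target: Syllable
    actual: Syllable


def branching(constituent):
    """Predicate: a constituent branches if it holds more than one segment."""
    return len(constituent) > 1


def onset_cluster_reductions(records):
    """Yield (record id, syllable) where the target onset branches
    and the actual onset does not."""
    for record_id, phrases in records.items():   # foreach r in records
        for phrase in phrases:                   # foreach p in Phrases(r)
            for syll in phrase:                  # foreach s in Syllables(p)
                if branching(syll.target.onset) and not branching(syll.actual.onset):
                    yield record_id, syll


# Toy corpus: record "r1" reduces the onset cluster "tr" to "t"; "r2" keeps it.
tr_e = Syllable(onset=("t", "r"), nucleus=("e",))
t_e = Syllable(onset=("t",), nucleus=("e",))
records = {
    "r1": [[AlignedSyllable(target=tr_e, actual=t_e)]],
    "r2": [[AlignedSyllable(target=tr_e, actual=tr_e)]],
}
matches = [record_id for record_id, _ in onset_cluster_reductions(records)]
```

On this toy corpus only record `r1` matches, since it is the only one in which a branching target onset is produced as a single consonant.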

21 Query tree structure
- Branching onset reduction in 2nd syllable
[Figure: query tree for a record. Node labels: Record, TargetPhrase, ActualPhrase, pos( , 2), onset( ), branching( ), Syllable, Onset, Nucleus, Rhyme, Coda. The target branch evaluates to TRUE and the actual branch to FALSE; combined with AND NOT, the result is MATCH.]

22 Query results
- View in application
- Use to generate textual reports
  - Recording session (e.g. to exemplify a given process)
  - Time slice (e.g. to exemplify a stage of acquisition)
  - Entire database (to exemplify a learning curve)
- Export
  - As Unicode file
  - As ASCII file (modulo font conversion limitations)

23 Enhancements (short term)
- Improvement of syllable alignment algorithm (building on Kondrak’s 2003 algorithm)
- Import function
  - ChildPhon files (including font translator; almost done!)
  - CHAT files
- Incorporation of user-defined fields
- Incorporation of statistical functions
- Chart report generator
  - Ability to select various chart formats
    - Bar graphs (for proportions within and across sessions)
    - Line graphs (for learning curves)

24 Enhancements (longer term)
- Interoperability with Praat
  - Export to Praat (similar to CLAN function)
  - Interface to accommodate acoustic measurement data
- Web-based interface
  - Data sharing at a distance
  - Easy query of corpora on CHILDES database
- Further automation
  - Automatic detection of pre-identified processes
Note (Yvan Rose): Include function to extract phonetic inventories per session/stage/… Get examples of ‘canned’ analyses in literature on clinical phonology.

25 Development timeline
- End of fall of 2004
  - Completion of current development phase
  - Release of testing (Beta) version
- Winter of 2005
  - Bug fixes
  - Improvement of functionality and user interface (including short-term enhancements)
  - Website creation (http://www.phon.ca/)
  - Completion of technical documentation
    - Notes to programmers
    - User guide
- Summer of 2005
  - Release of Phon 1.0 as open-source freeware

26 Potential
- Standard for data sharing
  - Large-scale investigations
  - Cross-linguistic investigations
- Enhancement to CHILDES
  - Elaboration of a database fulfilling the needs of acquisitionists focussing on phonology and related issues
  - Investigation of interface issues (e.g. between morpho-syntax and phonology)

27 How to realize this potential
- Team of researchers specializing in:
  - Early acquisition (including babbling)
  - Segmental development
  - Prosodic development
  - Phonological disorders
  - Second language acquisition
  - …
- Feedback on software development project
- Data contribution
  - Existing corpora in digital format
  - Conversion of printed corpora
    - Identification of corpora (printed, with or without audio files)
    - Setting of conventions for data conversion

28 Our proposal
- Constitution of a research team to develop a phonological component of CHILDES
  - Database
  - Supporting software
- Elaboration, with the research team, of a grant application to support:
  - Database elaboration
  - Software development
  - Periodical meetings
  - Workshops
  - …

29 Concretely
- Feedback on software project
  - Software needs for various types of research
    - Let us know what you need
  - Implementation
    - Let us know how you want it to work
- Contribution to grant application
  - Kinds of research the new database would enable
    - Let us know what you would like to do
  - Impacts of this research (e.g. theoretical, clinical, …)
  - Supporting letters
- Contribution to the public database
  - Sharing of existing / future corpora
  - Establishment of conventions to format older corpora

30 Special thanks
- The ‘Phon’ team at Memorial: Rodrigue Byrne, Harold Wareham, Gregory Hedlund, Philip O’Brien
- For his great help with the TalkBank XML schema: Franklin Chen (Carnegie Mellon University)
- For their useful feedback on an early version of this software: Heather Goad (McGill), Paula Fikkert (Nijmegen), Clara Levelt (Leiden), Katherine Demuth (Brown), Mark Johnson (Brown), Carrie Dyck (Memorial), Phil Branigan (Memorial), Brian MacWhinney (Carnegie Mellon), Bryan Gick (UBC), Sophie Wauquier-Gravelines (Nantes), Sharon Inkelas (UC Berkeley), Conxita Lleó, Sonia Frota (Lisbon), Maria João Freitas (Lisbon), Ronald Sprouse (UC Berkeley), Joe Pater (UMass, Amherst), John Archibald (Calgary), Éliane Lebel (Memorial); hoping that no one was forgotten…

