LREC 2008, May 26 – June 1, Marrakesh: Bridging the Gap between Linguists & Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition

Presentation transcript:

Bridging the Gap between Linguists & Technology Developers: Large-Scale, Sociolinguistic Annotation for Dialect and Speaker Recognition*

Christopher Cieri 1, Stephanie Strassel 1, Meghan Glenn 1, Reva Schwartz 2, Wade Shen 3, Joseph Campbell 3

1. Linguistic Data Consortium, 3600 Market Street, Suite 810, Philadelphia, PA {ccieri, strassel,
2. United States Secret Service, Washington, DC
3. MIT Lincoln Laboratory, 244 Wood Street, Lexington, MA {swade,

* This work is sponsored by the Department of Homeland Security under Air Force Contract FA C. Opinions, interpretations, conclusions and recommendations are those of the authors and are not necessarily endorsed by the United States Government.

Introduction to Phanotics
• Increased interest in the speaker recognition community in high-level features that abstract from the acoustic signal
  - lexical choice, presence of idiomatic expressions, syntactic structures
• Forensic applications require robustness to channel differences
  - channel adaptation and the identification of features inherently robust to channel differences
• The language recognition community increasingly addresses mutually intelligible dialects, not just languages
• Decades of research in dialectology suggest that high-level features can enable systems to cluster speakers according to the dialects they speak
• Phanotics (Phonetic Annotation of Typicality in Conversational Speech) seeks to
  - identify high-level features characteristic of American dialects
  - annotate a corpus for these features
  - use the data to develop dialect recognition systems
  - use the categorization to create better models for speaker recognition
• Sponsored by the United States Secret Service
  - MIT Lincoln Laboratory coordinates the effort and develops the systems
  - linguists from Arizona State and Old Dominion universities consult on dialectal phenomena
  - LDC and Appen Pty Ltd of Australia annotate data provided by LDC and ...

Annotation Approach
• Annotating large corpora for many high-level features is impractical without
  - existing data annotations
  - technologies that simplify the annotator's task
• Phanotics uses orthographically transcribed data to serve as a guide to potential loci for the features sought
  - orthographic transcripts, a pronouncing lexicon and a forced aligner generate a putative, time-aligned phonetic transcription that imagines that the speaker's utterances were standard
  - high-level features of interest are described as deviations from standard pronunciation
  - loci in which the actual pronunciation differs from the putative standard are potential high-level features
• Since
  - complete phonetic transcription is cost-prohibitive,
  - automatic phonetic transcription is not adequately accurate, and
  - we lack dialect studies for every difference one might encounter,
  we do not count deviations directly but allow the technologies to guide human annotators to expected features
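
The putative standard transcription described on this slide can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the toy `LEXICON`, its ARPAbet-style symbols, and the `Phone` record are hypothetical, and word-level timings are taken as already available. Splitting each word's time span evenly among its phones is only a placeholder for what a real forced aligner would produce.

```python
from typing import NamedTuple


class Phone(NamedTuple):
    symbol: str
    start: float
    end: float
    word: str


# Hypothetical toy pronouncing lexicon (ARPAbet-style symbols, standard forms only).
LEXICON = {
    "left": ["L", "EH", "F", "T"],
    "the": ["DH", "AH"],
    "car": ["K", "AA", "R"],
}


def putative_transcription(aligned_words):
    """Expand (word, start, end) triples into standard-pronunciation phones.

    Phone times are an even split of the word span, a stand-in for
    phone-level forced alignment.
    """
    phones = []
    for word, start, end in aligned_words:
        pron = LEXICON.get(word.lower())
        if pron is None:
            continue  # words missing from the lexicon yield no loci
        step = (end - start) / len(pron)
        for i, symbol in enumerate(pron):
            phones.append(Phone(symbol, start + i * step, start + (i + 1) * step, word))
    return phones


phones = putative_transcription([("left", 0.0, 0.4), ("xyzzy", 0.4, 0.6)])
print([p.symbol for p in phones])
```

The output is a sequence of what the speaker would have said if every word were pronounced in its standard form, which is the baseline against which annotators later judge deviations.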

Requirements
• Requires natural speech from speakers of target dialects
  - initial focus on distinguishing African American Vernacular English (AAVE) from all other dialects of American English (non-AAVE)
  - plan to investigate other American dialects later
• Selected data collected to minimize the effect of observation
  - recordings of subjects engaged in conversations
• Project requires subjects categorized according to the dialect spoken
• Since the goal is to establish typicality of features by dialect, categorization is based on something other than the features themselves
  - relied on self-reported metadata
  - AAVE: native speakers of American English, born and raised in the United States, ethnically African American
  - non-AAVE: American English speakers of other ethnicities
• Subjects who later appear mis-categorized are removed from either pool
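
As a rough sketch of how the metadata-based categorization could be expressed, the following Python assigns a subject to a pool from self-reported fields only. The `Subject` fields, the label strings, and the choice to apply the "native, born and raised in the United States" criteria to both pools are assumptions for illustration. The slide states those criteria explicitly only for the AAVE pool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Subject:
    subject_id: str
    native_american_english: bool
    born_and_raised_in_us: bool
    ethnicity: str  # self-reported, never inferred from speech features


def assign_pool(subject: Subject, excluded_ids=frozenset()) -> Optional[str]:
    """Return 'AAVE', 'non-AAVE', or None if the subject is not usable."""
    if subject.subject_id in excluded_ids:
        return None  # removed after later review found a mis-categorization
    if not (subject.native_american_english and subject.born_and_raised_in_us):
        return None  # assumption: the same base criteria gate both pools
    if subject.ethnicity == "African American":
        return "AAVE"
    return "non-AAVE"


print(assign_pool(Subject("s1", True, True, "African American")))
```

Keeping the exclusion list separate from the metadata reflects the slide's two-step logic: categorize without reference to the features under study, then remove subjects who later turn out to be mis-categorized.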

Data Selection
• Mixer Corpora
  - conversational telephone speech (CTS), from LDC; supports robust speaker recognition (SR) development
  - subjects provided age, sex, occupation, cities born/raised, ethnicity
  - subjects completed >=10 six-minute calls, speaking to other subjects, whom they typically did not know, about assigned topics
  - bilinguals in Arabic, Mandarin, Russian, and Spanish used those languages and English
  - 7% of calls in cross-channel recording room (8+ microphones on one side of call)
  - calls audited for topic and audio quality but not generally transcribed
  - although not designed for the current effort, includes self-reported ethnicity
  - pool contains speakers of multiple American English dialects who categorized themselves as African American and other ethnicities
• 126 Mixer calls transcribed by the Phanotics project
  - 35 include conversations between two speakers of AAVE
  - 91 include conversations between one AAVE and one non-AAVE speaker

Data Selection
• Fisher Corpus
  - collected at LDC to support speech-to-text (STT) development within DARPA EARS
  - subjects provided age, sex, native language, and the cities where they were born and raised
  - subjects completed minute calls, speaking to other participants, whom they typically did not know, about assigned topics
  - calls audited for topic and quality
  - verbatim, time-aligned orthographic transcripts were produced
  - lacks crucial information on the ethnicity of the speaker
  - but some subjects were LDC employees, their family, friends, and colleagues; a small number (171) could be assigned to an ethnic category after the fact
• StoryCorps® Griot Initiative
  - funded by the Corporation for Public Broadcasting in the US
  - one-year effort to record one-hour interviews of African Americans
  - nine recording locations open for up to six weeks each
  - subjects interview friends and family on topics of their choice
  - potential users receive instructions on conducting good interviews; trained facilitator present
  - participants receive a free copy of their interview; other copies are archived and distributed
  - StoryCorps provides Phanotics with selected interviews in exchange for transcripts
• Sociolinguistic Interviews
  - recorded and contributed by researchers working in the United States
  - variable quality
  - being reviewed for potential use

Transcription
• Most audio lacked transcripts; LDC designed a specification for this project
  - similar to the Fisher Quick Transcription specification
  - emphasizes speed and accuracy
• Segmentation
  - annotators segment speech at the sentence level
  - sentences are further segmented if longer than 8 seconds or containing more than 0.5 seconds of internal silence
  - segments may overlap; audio containing no speech is left un-segmented
• Orthography
  - standard orthography, case, punctuation (period, question mark, comma)
  - "--" marks incomplete sentences and restarts; "-" marks incomplete words
  - proper names, acronyms, letter strings capitalized
  - uttered numbers written as words, not as strings of digits
  - a limited set of standard contractions is used; non-standard contractions ('cause for because) are written as the full word
• Pronunciation and disfluency
  - obviously mispronounced, idiosyncratic words tagged with '+'
  - no other attempt is made to mark dialectal pronunciation; that is accomplished in the annotation phase
  - limited set of non-lexemes (um, uh) used in filled pauses
  - speech errors transcribed as produced
  - limited time spent transcribing disfluencies since these will be rejected
• Noise and confidence
  - background noises not marked; limited set of markers for speaker noises
  - transcribers indicate low confidence with double parentheses (( ))
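
To show how downstream tools might read these conventions, here is a small Python sketch that tags the tokens of one transcript line. It is not the project's parser. It assumes, without confirmation from the slide, that '+' is a prefix on the mispronounced word, that '-' is a suffix on an incomplete word, and that "--" stands alone as a token; the tag names are likewise hypothetical.

```python
import re

# A double-parenthesis span is kept as one token; everything else splits on whitespace.
TOKEN = re.compile(r"\(\(.*?\)\)|\S+")
FILLED_PAUSES = {"um", "uh"}


def tag_tokens(line):
    """Return (text, tag) pairs for one transcript line."""
    tagged = []
    for tok in TOKEN.findall(line):
        if tok.startswith("((") and tok.endswith("))"):
            tagged.append((tok[2:-2].strip(), "low_confidence"))
        elif tok == "--":
            tagged.append((tok, "restart"))
        elif tok.startswith("+") and len(tok) > 1:
            tagged.append((tok[1:], "mispronounced"))  # assumed prefix position
        elif tok.endswith("-") and len(tok) > 1:
            tagged.append((tok[:-1], "incomplete_word"))  # assumed suffix position
        elif tok.lower().strip(".,?") in FILLED_PAUSES:
            tagged.append((tok, "filled_pause"))
        else:
            tagged.append((tok, "word"))
    return tagged


for text, tag in tag_tokens("um I +brang the ((red car)) to the sto- -- to the shop."):
    print(f"{tag:16} {text}")
```

Tokens tagged `incomplete_word`, `restart` or `low_confidence` could then be skipped when building the putative phonetic transcription, since the standard pronunciation of such material is not well defined.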

Feature Annotation
• Goal: identify features that distinguish dialect from standard
• Features described as rules that change standard into non-standard
• Rules apply variably according to internal and external constraints
  - lexical identity, morphology of the affected word, position within the sentence, phonological environment, functional effect of the change (for example, whether it neutralizes a distinction between two words)
  - the age, sex and socioeconomic class of speakers, and the dialects they speak
• Examples
  - reduction of consonant clusters in final position (left => lef', missed => miss)
  - deletion of r, l, w (car => ca', palm => pa'm, young ones => young 'uns)
  - change of the voiced and voiceless interdental fricatives into stops (bother => boda')
• Data preparation and customized tools simplify the annotation process
• Rules specified as a => b / x_y
  - a becomes b when preceded by x and followed by y
  - input + environment, "xay", constitute the search term
  - input + output, a => b, constitute a question to be answered by a human: did the subject say xay or xby?
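
The rule notation a => b / x_y lends itself to a short worked example. The Python below is a sketch under stated assumptions: rules are written with space-separated phone symbols, an empty output means deletion, and "#" marks a word boundary. None of these encoding choices come from the slides, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    a: tuple  # input
    b: tuple  # output (empty tuple = deletion)
    x: tuple  # left environment
    y: tuple  # right environment


def parse_rule(text):
    """Parse 'a => b / x _ y' with space-separated symbols."""
    change, env = text.split("/")
    a, b = change.split("=>")
    x, y = env.split("_")
    return Rule(tuple(a.split()), tuple(b.split()), tuple(x.split()), tuple(y.split()))


def search_term(rule):
    """Input plus environment ('xay'): what to look for in the putative transcription."""
    return rule.x + rule.a + rule.y


def question(rule):
    """Input plus output: the choice put to the human annotator."""
    standard = " ".join(rule.x + rule.a + rule.y)
    variant = " ".join(rule.x + rule.b + rule.y)
    return f"Did the subject say '{standard}' or '{variant}'?"


def find_rois(phones, rule):
    """Return indices where the rule's input starts in a phone sequence."""
    term = search_term(rule)
    n = len(term)
    hits = []
    for i in range(len(phones) - n + 1):
        if tuple(phones[i:i + n]) == term:
            hits.append(i + len(rule.x))
    return hits


# Final cluster reduction as in left => lef': T deletes after F at a word boundary.
rule = parse_rule("T => / F _ #")
print(find_rois(["#", "L", "EH", "F", "T", "#"], rule))
print(question(rule))
```

The search runs over the putative standard transcription, so it finds every place the rule could have applied; whether it did apply is left to the annotator.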

Feature Annotation
• SPAAT (Super Phonetic Annotation and Analysis Tool)
  - designed for rapid annotation and analysis
  - for each feature, presents a list of regions of interest (ROI) where the rule may have applied
  - since transcript and audio were previously forced-aligned, the annotator can listen to the audio with a small amount of preceding and following context
  - the annotator's job is to decide whether or not the rule has applied
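
The playback behaviour described here, an ROI plus a little surrounding context, amounts to padding a time span and clamping it to the recording. The sketch below assumes a hypothetical default of 0.5 seconds of context on each side. The slides do not say how much context SPAAT actually plays.

```python
def playback_window(roi_start, roi_end, audio_duration, context=0.5):
    """Return (start, end) in seconds: the ROI padded with context and clamped to the file."""
    start = max(0.0, roi_start - context)
    end = min(audio_duration, roi_end + context)
    return start, end


print(playback_window(0.25, 0.5, 10.0))
```

Clamping matters for ROIs at the very beginning or end of a call, where the full context is not available.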

Initial Results
• Average time to annotate an ROI ranges
• Approach to measuring inter-annotator agreement (IAA) distinguishes
  - initial agreement, measured at the beginning of the effort to assess the difficulty of a task
  - from measures repeated after thorough documentation has been created and annotators have undergone rigorous training, testing and selection
• Initial inter-annotator agreement varies by rule, rule type, annotator and annotator training
  - absolute average initial agreement across five annotators and all rules was 74.49% on a three-way decision where a feature is annotated as present, intermediate or absent
  - converted to a two-way decision (feature is present versus intermediate + absent), initial agreement climbs to 85.54%
  - pairwise agreement by chance in three-way and two-way decisions is, respectively, 11.1% and 25%
  - initial two-way agreement rates were 83.81% for rules involving substitutions and 91.95% for rules involving reductions and insertions
• Team now working to increase IAA
  - expanding training program and documentation to include audio examples
  - decision: form is standard, non-standard, intermediate, unrelated to rule, indeterminate, ROI is mistaken
  - creating a small gold standard
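
The move from a three-way to a two-way decision can be illustrated with a small pairwise-agreement calculation. This is a sketch of one common way to compute raw percentage agreement, not necessarily the procedure the project used. The labels below are invented toy data for three annotators and bear no relation to the reported figures.

```python
from itertools import combinations


def pairwise_agreement(labels_by_annotator):
    """Fraction of (annotator pair, item) comparisons on which the two labels match."""
    agree = 0
    total = 0
    for first, second in combinations(labels_by_annotator, 2):
        for label_a, label_b in zip(first, second):
            agree += label_a == label_b
            total += 1
    return agree / total


def collapse(label):
    """Two-way decision: present versus intermediate + absent."""
    return "present" if label == "present" else "not_present"


# Invented toy annotations: one list per annotator, one label per ROI.
toy = [
    ["present", "intermediate", "absent", "present"],
    ["present", "absent", "absent", "present"],
    ["intermediate", "absent", "absent", "present"],
]

three_way = pairwise_agreement(toy)
two_way = pairwise_agreement([[collapse(label) for label in labels] for labels in toy])
print(f"three-way: {three_way:.2%}, two-way: {two_way:.2%}")
```

Collapsing categories can only merge disagreements, never create them, so two-way agreement is always at least as high as three-way agreement on the same data. As a point of arithmetic, the chance figures quoted on the slide equal (1/3)^2 and (1/2)^2, which is the probability that two annotators choosing uniformly at random both pick one particular category.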

Summary
• Project connects sociolinguistics and HLT
• Seeks to determine typicality of high-level features in distinguishing dialect for forensic purposes
• Focuses initially on AAVE; later on other dialects of American English
• Uses existing audio from CTS and interviews
• Creates transcripts, audio-transcript time-alignments
• Combination of these with SPAAT speeds annotation
• Initial inter-annotator agreement encouraging
• Modifications of spec, training, tool expected to increase IAA
• Fisher audio and transcripts already available in LDC's Catalog
  - LDC2005S13 Fisher English Training Part 2, Speech
  - LDC2005T19 Fisher English Training Part 2, Transcripts
  - LDC2004S13 Fisher English Training Speech Part 1 Speech
  - LDC2004T19 Fisher English Training Speech Part 1 Transcripts
• Mixer audio in queue
• StoryCorps Griot and Sociolinguistic Interviews under negotiation
• To be distributed after use in the program
  - Mixer transcripts
  - annotations
  - possibly SPAAT