DocLing2016 Software Tools Peter K. Austin Department of Linguistics SOAS, University of London 2016-02-10.


DocLing2016 Software Tools Peter K. Austin Department of Linguistics SOAS, University of London

© 2016 Peter K. Austin Creative commons licence Attribution-NonCommercial-NoDerivs CC BY-NC-ND

With thanks to … Stuart McGill, Anthony Jukes and Candide Simard who all contributed to the development of these materials for various training courses

After you make a recording
- You probably need to transcribe it.
- You may need to translate it.
- You may want to add other information.
- Some tools will help you transcribe. ELAN and Transcriber are two that linguists are using these days.

ELAN
“ELAN (EUDICO Linguistic Annotator) is an annotation tool that allows you to create, edit, visualize and search annotations for video and audio data.”
- Links text annotations with audio and/or video data.
- One audio stream, up to four video streams.
- ELAN files can be exported in a variety of formats (including to Shoebox/Toolbox for interlinearisation, then reimported).

What can’t ELAN do?
- It can’t do your transcription.
- It can’t do your analysis.
- It can’t keep you organised.
- It can’t (by itself) make a viewer for community members.
- It isn’t (unfortunately) very easy to learn.

What can ELAN do?
- It can help with transcription and translation.
- It can help with your analysis by presenting your data.
- It can help keep you organised by linking the media and data files together.
- It can help you find things in your data.
- It can help when making a product for community members (text, subtitled video).
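As an illustration of the "subtitled video" product mentioned above, here is a minimal sketch of turning time-aligned annotations into SubRip (.srt) subtitle text. It is not ELAN's own export routine; the function names and the (start, end, text) tuple layout, with times in milliseconds, are assumptions made for this example.

```python
def to_srt_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rest = divmod(ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    seconds, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def annotations_to_srt(annotations):
    """Build SRT text from (start_ms, end_ms, text) tuples (hypothetical layout)."""
    blocks = []
    # Sort by start time so subtitles are numbered in playback order.
    for index, (start, end, text) in enumerate(sorted(annotations), start=1):
        timing = f"{to_srt_timestamp(start)} --> {to_srt_timestamp(end)}"
        blocks.append(f"{index}\n{timing}\n{text}")
    # SRT separates subtitle blocks with a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    # Invented annotations, e.g. a free-translation tier.
    print(annotations_to_srt([(0, 1500, "Hello."), (1500, 3000, "How are you?")]))
```

The resulting text can be saved next to a video file and loaded by most media players as a subtitle track.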

Tiers

Tiers are where you put your annotations. Tiers can contain many kinds of annotations; some of the most obvious are:
- IPA transcription
- practical orthographic transcription
- free translations into languages of wider communication
- morphemes and gloss
- gesture annotation
- grammar notes
- socially significant information
- any other information which seems relevant
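To make the idea of tiers concrete, the sketch below reads a small, hand-written XML fragment in the style of an ELAN .eaf file and collects the time-aligned annotations of each tier. The fragment is heavily simplified, the tier names and annotation text are invented, and real files carry many more attributes (and often use reference annotations for dependent tiers), so treat this as an illustration rather than a complete reader.

```python
import xml.etree.ElementTree as ET

# Simplified, invented fragment in the style of an ELAN .eaf file.
EAF_SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
  </TIME_ORDER>
  <TIER TIER_ID="transcription">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example utterance</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="translation">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>example free translation</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, text), ...]} for aligned annotations."""
    root = ET.fromstring(eaf_text)
    # Time slots map an ID to a position in the media, in milliseconds.
    # This sketch assumes every slot has a TIME_VALUE.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


if __name__ == "__main__":
    for tier_id, rows in read_tiers(EAF_SAMPLE).items():
        print(tier_id, rows)
```

Each tier ends up as its own list of annotations over the same timeline, which is why adding a further kind of information usually means adding another tier.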

ELAN – plus and minus
Plus:
- Handles most audio and video formats
- Powerful for annotating and searching
- Good compatibility with Toolbox
- Good exports for web video etc. via CUPED or other tools
- Prospects for development
- Multi-platform, open-source
Minus:
- Difficult to get started – steep learning curve
- No inbuilt tools for interlinearising or lexicon building
- *Too* powerful/flexible – temptation to add zillions of tiers, gets cluttered and confusing

Transcriber
Transcriber is a tool for assisting the manual annotation of speech signals. It provides a user interface for segmenting long duration speech recordings, transcribing them, and labeling speech turns, topic changes and acoustic conditions.

Transcriber plus and minus
Plus:
- Relatively easy to set up and use
- XML format for easy file exchange
- Handles most audio formats
- Multi-platform, open source
Minus:
- Lacks video support
- Overlapping speech tricky to handle when exporting to Toolbox
- Not (really) designed for linguists – unlikely to integrate with linguistic analysis tools in the future

You’ve transcribed. Now what?
- Grammar analysis
- Lexicon building
- Cultural/ethnographic notes
- ???
Tools that help you do some of these things (both from SIL):
- Toolbox
- Fieldworks Language Explorer (FLEx)

Toolbox
Toolbox is a data management and analysis tool for field linguists. It is especially useful for maintaining lexical data, and for parsing and interlinearizing text, but it can be used to manage virtually any kind of data.

Toolbox plus and minus
Plus:
- Tried and tested
- (Relatively) easy to use after some initial study
- Large and helpful user community
- Interoperability with ELAN
- Can produce printed or online dictionaries with MDF or Lexique Pro
Minus:
- Standard Format (backslash codes) not really well-structured
- ‘End of life’? It is very old, not being developed actively
- Limited interaction with media files
- Mac only under emulation
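For readers who have not seen Standard Format, the sketch below parses a tiny invented lexicon that uses backslash codes. The markers \lx (lexeme), \ps (part of speech) and \ge (English gloss) follow common MDF usage; the entries themselves and the function name are made up for illustration, and continuation lines are simply ignored.

```python
def parse_standard_format(text, record_marker="lx"):
    """Split backslash-coded text into records, each a list of (marker, value)."""
    records = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # sketch only: skip blank lines and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            # A new record starts wherever the record marker appears.
            current = []
            records.append(current)
        if current is not None:
            current.append((marker, value.strip()))
    return records


# Invented sample entries, for illustration only.
SAMPLE = r"""
\lx kata
\ps n
\ge word

\lx lari
\ps v
\ge run
"""

if __name__ == "__main__":
    for record in parse_standard_format(SAMPLE):
        print(record)
```

Each record is just a flat sequence of fields. Any hierarchy, such as which gloss belongs to which sense, is implied by field order alone, which is one way to understand the "not really well-structured" point above.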

Fieldworks Language Explorer
“FieldWorks is a set of software tools that help manage cultural and linguistic data from initial collection through submission for publication”
- It can be used to record lexical information and develop dictionaries.
- It can interlinearize text.
- The morphological parser provides the user with a way to check the grammatical rules they have recorded against real language data.
- The grammar information can also be compiled in an automatically generated grammar sketch.

FLEx plus and minus
Plus:
- Better data structure than Toolbox (XML)
- Very powerful parsing and grammatical analysis tools
- Designed to hold all your linguistic and cultural data and notes
Minus:
- Poor handling of media
- Large application, memory hog
- Windows only
- Poor integration with Toolbox

Another dictionary tool – WeSay
- WeSay helps non-linguists build a dictionary in their own language.
- It has various ways to help native speakers to think of words in their language and enter some basic data about them (no backslash codes, just forms to fill in).
- Designed for teamwork – one ‘advanced’ user does the complicated set-up work, very simple interface for other users.

WeSay plus and minus
Plus:
- Very simple to use
- Will run on netbooks and other low-powered machines
- Good data structure
- Easy export via Lexique Pro for print/web
Minus:
- No tools for interlinearising or analysis
- Limited media support
- Windows only

Comparison of programs
Programs compared: Transcriber, ELAN, Toolbox, FLEx, WeSay
Features compared:
- Audio time-alignment
- Video time-alignment
- Multi-tier annotation
- Interlinear support
- Lexicography
- Word collection
- Simple to learn
- Special char. input
- XML data

Managing metadata
There are a few programs that can be used to manage metadata:
- Arbil (from MPI Nijmegen) can be used online or stand-alone for IMDI metadata.
- CMDI Maker can be used offline for CMDI metadata.
- SayMore (from SIL) can be used to harvest metadata from files and then say more about it (also transcription or translation). It is still being developed but is starting to look solid.

SayMore