HLT R&D in South Africa
HLT Collaboration between South Africa and the Low Countries Workshop
24 November 2008, Noordhoek, South Africa

Overview
- Specific R&D challenges
- Areas of active research
  - Text processing
  - Speech processing
  - Applications of HLT
- Main projects: current and recent
- Research institutions active in HLT
- Main R&D sponsors

Specific R&D Challenges
- Incompleteness of basic linguistic knowledge
- Scarcity of resources
  - Linguistic data
  - Technology components
- Uniqueness of user populations and languages

Research areas (1)
Text processing:
- Computational morphological analysis, POS tagging
- Spelling checkers, grammar checkers
- Machine translation, machine-aided translation
- Computational lexicography
- Wordnets
Research focus:
- Development of basic required components and tools
- Data collection and corpus development
- Technology transfer, cross-language learning, bootstrapping, language distances
- Morphological analysis (MA) for agglutinative languages

Research areas (2)
Speech processing:
- ASR, TTS, spoken dialogue systems
- Phonetic investigations for HLT
- Speaker verification, spoken language identification (S-LID)
- Speech tools (diarization, channel normalisation, speech detection)
Research focus:
- Development of basic required components and tools
- Data collection and corpus development
- Technology transfer, cross-language learning, bootstrapping, language distances
- Timing information in speech
- Multi-accent and multilingual acoustic modelling
- Higher-order Markov models and other non-standard acoustic models

Research areas (3)
Applications of HLT:
- Telephone-based information systems
- Computer-assisted language learning
- Document proofing tools
- Accessibility devices
- Mobile devices

Main R&D initiatives
Department of Arts and Culture (DAC)
- Applications that support multilingualism, especially related to government service delivery
- DAC A: Spelling checkers
- DAC B: Machine-aided translation
- DAC C: Lwazi (multilingual telephony-based information delivery)
Department of Science and Technology (DST)
- Directed research in HLT aimed at addressing SA national priorities
- National HLT Network projects
- International collaborative projects
Various individual research projects

Main R&D projects
Text processing:
- Computational morphological analysis: Unisa
- Spellcheckers: DAC A
- Machine translation: EtsaTrans, DAC B
Speech:
- Phonetic investigations: NHN PAST
- ASR/TTS/spoken dialogue systems: AST, Limpopo ASR, OpenPhone, Lwazi (DAC C)
- Mobile E-learning for Africa (MELFA)

UNISA Computational Morphological Analysis
- Development of parsing tools for Bantu languages:
  - computational morphological analysers
  - disambiguators
  - syntactic parsers
- Development of supporting resources for development and testing, including extensive underlying machine-readable lexicons
Status:
- Initiated in 2002 (for the isiZulu morphological analyser)
- Various prototypes under development (isiZulu, isiXhosa, Siswati, isiNdebele, Northern Sotho and Setswana)
- Extended until 2010
Principal researchers: Sonja Bosch (Project Leader), Laurette Pretorius, Ansu Berg, Axel Fleisch, Albert Kotze, Petro Kotze, Memezi Mfusi, Lydia Mojapelo, Rigardt Pretorius, Linda van Huyssteen, Biffy Viljoen
Sponsor: NRF

DAC A: Spelling checkers for the public administration domain
- Development of spelling checkers for 10 official SA languages, specifically for use in government departments
- Spelling checkers for isiNdebele, isiXhosa, isiZulu and Siswati include morphological analysers for effective spellchecking of these agglutinative languages
Status: Final evaluation by client in progress
Principal researchers: MJ Puttkammer (NWU), S Pilon (NWU), DJ Prinsloo (UP), SE Bosch (Unisa)
Sponsor: Department of Arts and Culture, CTexT
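
In an agglutinative language, many valid words are built by stringing morphemes together, so a fixed word list cannot cover them all. A morphology-aware checker instead accepts a word if it can be segmented into allowed morphemes. The sketch below illustrates only that idea; the slides do not describe how the DAC A checkers are built, so this is not their implementation. The morpheme slots are a tiny hypothetical fragment loosely modelled on isiZulu verb forms such as "ngiyahamba". The names `SLOTS`, `analyse`, `is_correct` and `generate` are illustrative, and a real analyser needs far richer morphotactics and lexicons.

```python
from itertools import product

# Hypothetical, illustrative morpheme slots (NOT a real isiZulu analyser):
# subject concord + tense marker + verb root + final vowel.
SUBJECT_CONCORDS = ["ngi", "u", "si", "ba"]
TENSE_MARKERS = ["ya"]
VERB_ROOTS = ["hamb", "bon"]
FINAL_VOWELS = ["a"]

SLOTS = [SUBJECT_CONCORDS, TENSE_MARKERS, VERB_ROOTS, FINAL_VOWELS]


def analyse(word, slots=SLOTS):
    """Return one segmentation of `word` as a list of morphemes, or None.

    Each slot is tried in order: a morpheme that prefixes the remaining
    string is consumed and the rest is analysed against the later slots.
    The word is accepted only if every slot is filled and nothing is left.
    """
    if not slots:
        return [] if word == "" else None
    for morph in slots[0]:
        if word.startswith(morph):
            rest = analyse(word[len(morph):], slots[1:])
            if rest is not None:
                return [morph] + rest
    return None


def is_correct(word):
    """Spellcheck by analysis rather than by word-list lookup."""
    return analyse(word.lower()) is not None


def generate(slots=SLOTS):
    """All surface forms the toy grammar licenses (product of slot sizes)."""
    return ["".join(parts) for parts in product(*slots)]


if __name__ == "__main__":
    print(analyse("ngiyahamba"))    # ['ngi', 'ya', 'hamb', 'a']
    print(is_correct("ngiyahambi"))  # False: final vowel not licensed
    print(len(generate()))           # 8 forms from 4 + 1 + 2 + 1 morphemes
```

Even this toy grammar stores only 8 morphemes yet licenses 8 full word forms. With realistic slot inventories the number of licensed forms grows multiplicatively, which is why listing every word quickly becomes impractical.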

EtsaTrans Machine Translation
- Development of a functional machine translation system
- Focus domain: mainly administrative documents
- Main languages: English to Afrikaans, Afrikaans to English
- Other languages: English to Xhosa, English to Southern Sotho
- Harvesting previously translated information to create parallel corpora
Status: Initiated in 2003, ongoing; prototypes in use
Principal researchers: JA Naudé, L Jordaan
Sponsor: UFS

DAC B: Machine-aided translation tools
Development of translation tools:
- An integrated translation environment (ITE)
- Word translators
- Machine translation systems for three language pairs
- Terminology management system
- Document management system
Status: Under development ( )
All tools, data and research output to be made available publicly
Principal researchers: HJ Groenewald, S Pilon (NWU), DJ Prinsloo (UP)
Sponsor: DAC

NHN PAST: Phonetics for Advanced Speech Technology
- Technology-orientated investigation and description of the vowel system of the Sotho languages and of tone in the Sotho and Nguni languages
Status: Initiated May 2008, due for completion June 2009
Principal researchers: E. Barnard (Meraka), B. Khoali (independent consultant), D. Wissing (NWU), S. Zerbian (Wits)
Sponsor: National HLT Network (DST/Meraka)

African Speech Technologies (AST)
- Development of a multilingual telephone-based hotel reservation system
- Developed corpora and technology components (TTS, ASR, dialogue systems) for South African English (SAE), Afrikaans, isiZulu, isiXhosa and Sesotho
Status: Completed 2004
- Gave rise to a commercial company: Catchword
- Data available for research purposes (release imminent)
Principal researchers: J.C. Roux, E.C. Botha, J. du Preez; various collaborators
Sponsor: DACST (Innovation Fund)

Limpopo ASR
- Development of baseline automatic speech recognition systems for the major languages of the Limpopo Province
- Languages: Sepedi (Sesotho sa Leboa), Setswana, Tshivenda and Xitsonga
- Telephone speech data collection and manual annotation
- Extension to text-to-speech synthesis and domain-specific prototype dialogue systems
Status: Baseline ASR systems completed ( ); extension ongoing
Principal researchers: HJ Oosthuizen and MJD Manamela
Sponsor: Telkom and other industry partners

OpenPhone
- Demonstrated the use of telephone-based information services to provide health information in a rural setting
- Automated health information system that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone, Botswana
- Includes Setswana TTS and ASR development
Status: Completed 2008, currently live
Principal researchers: Etienne Barnard, Marelie Davel, Madelaine Plauché
Sponsor: OSI/OSISA, DST

Lwazi
- Development and piloting of a fully Open Source multilingual telephone-based information system
- ASR and TTS systems in all 11 official languages
- ASR and TTS integrated into a telephony platform
- Open Source resources and tools
- Various pilots: first significant pilot with DPSA Community Development Workers
Status: Initiated September 2006; on track for completion September 2009
Principal researchers: Etienne Barnard, Marelie Davel, Gerhard van Huyssteen
Sponsor: DAC

Mobile E-learning for Africa (MELFA)
- Mobile solutions for on-site literacy training and skills development for workers in the building and construction industry
- Includes text-to-speech and speech-to-speech translation
- Initially, 30 test persons in the Western Cape are involved in testing the modules for interactive m-learning
Status: Initiated in 2007, completing in
Principal researchers: JC Roux (Project leader, SA), A Visagie, H Engelbrecht, A Magnusdottir, P Scholtz
Sponsor: Danida (Danish government organisation)

Research institutions: Text
(Size = senior researchers / post-graduate students)
- UNISA (University of South Africa): morphological analysis, POS disambiguation, syntactic parsing. Size: 8/2. Language focus: Bantu family languages
- CTexT (North-West University): document proofing tools, machine-aided translation, machine translation, computer-assisted language learning, syntactic parsing. Size: 2/8. Language focus: Afrikaans (other official languages, African languages)
- UP (University of Pretoria): morphological analysis, POS disambiguation, syntactic parsing, computational lexicography. Size: 2/0. Language focus: Sepedi
- UWC (University of the Western Cape): POS disambiguation, computational lexicography, localization, machine translation. Size: 2/x. Language focus: isiXhosa
- Wits (1) (University of the Witwatersrand): morphological analysis. Size: 1/0. Language focus: isiZulu
- UFS (University of the Free State): machine-aided translation, machine translation. Size: 1/0. Language focus: English ↔ Afrikaans (Sesotho/E, isiXhosa/E)

Research institutions: Speech
(Size = senior researchers / post-graduate students)
- SU-CLaST (University of Stellenbosch): ASR, TTS, spoken dialogue systems, speaker verification, S-LID, computer-assisted language learning, machine translation, speech-to-speech translation. Size: 6/6. Language focus: SAE, isiXhosa, Afrikaans
- Meraka (CSIR Meraka Institute): ASR, TTS, spoken dialogue systems, tone modelling, pronunciation modelling, speaker verification, language distances, channel normalisation, S-LID. Size: 4/15. Language focus: all SA official languages
- Wits (University of the Witwatersrand): tone modelling, TTS. Size: 2/1. Language focus: Sotho and Nguni languages
- Limpopo (University of Limpopo): ASR, TTS, language modelling. Size: 1/2. Language focus: Sepedi, Xitsonga, Tshivenda, Setswana

Main R&D sponsors
- Department of Arts and Culture (DAC): applications that support multilingualism, especially related to government service delivery
- Department of Science and Technology (DST): directed research in HLT aimed at addressing SA national priorities
- National Research Foundation (NRF): support for individual researchers
- Industry: addressing industry-specific needs
  - ASR/TTS (Telkom, Intelleca, IBM, Google and others)
  - Spelling checkers (Microsoft)
  - Speech processing tools (Grintek, Armscor)
  - Speech-to-speech translation (Armscor)
- International donor funding: addressing developmental needs
  - Open Society Initiative (OSI/OSISA), Danish Danida, UK Department for International Development (DfID), Canadian International Development Research Centre (IDRC), and others
- Host institutions (universities, CSIR, etc.)