HLT R&D in South Africa HLT Collaboration between South Africa and the Low Countries Workshop 24 November 2008 Noordhoek, South Africa.


1 HLT R&D in South Africa
HLT Collaboration between South Africa and the Low Countries Workshop
24 November 2008, Noordhoek, South Africa

2 Overview
Specific R&D challenges
Areas of active research
  Text processing
  Speech processing
  Applications of HLT
Main projects: current and recent
Research institutions active in HLT
Main R&D sponsors

3 Specific R&D Challenges
Incompleteness of basic linguistic knowledge
Scarcity of resources
  Linguistic data
  Technology components
Uniqueness of user populations and languages

4 Research areas (1)
Text processing:
  Computational morphological analysis, POS tagging
  Spelling checkers, grammar checkers
  Machine translation, machine-aided translation
  Computational lexicography
  Wordnets
Research focus:
  Development of basic required components and tools
  Data collection and corpus development
  Technology transfer, cross-language learning, bootstrapping, language distances
  Morphological analysis (MA) for agglutinative languages

5 Research areas (2)
Speech processing:
  ASR, TTS, spoken dialogue systems
  Phonetic investigations for HLT
  Speaker verification, S-LID
  Speech tools (diarization, channel normalisation, speech detection)
Research focus:
  Development of basic required components and tools
  Data collection and corpus development
  Technology transfer, cross-language learning, bootstrapping, language distances
  Timing information in speech
  Multi-accent and multilingual acoustic modelling
  Higher order Markov models and other non-standard acoustic models

6 Research areas (3)
Applications of HLT:
  Telephone-based information systems
  Computer assisted language learning
  Document proofing tools
  Accessibility devices
  Mobile devices

7 Main R&D initiatives
Department of Arts and Culture (DAC)
  Applications that support multilingualism, especially related to government service delivery
  DAC A: Spelling checkers
  DAC B: Machine-aided translation
  DAC C: Lwazi: Multilingual telephony-based information delivery
Department of Science and Technology (DST)
  Directed research in HLT aimed at addressing SA national priorities
  National HLT Network projects
International collaborative projects
Various individual research projects

8 Main R&D projects
Text processing:
  Computational morphological analysis: Unisa
  Spellcheckers: DAC A
  Machine translation: EtsaTrans, DAC B
Speech:
  Phonetic investigations: NHN PAST
  ASR/TTS/spoken dialogue systems: AST, Limpopo ASR, OpenPhone, Lwazi (DAC C)
  Mobile E-learning for Africa (MELFA)

9 UNISA Computational Morphological Analysis
Development of parsing tools for Bantu languages:
  Computational morphological analysers
  Disambiguators
  Syntactic parsers
Development of supporting resources for development and testing
  Includes extensive underlying machine-readable lexicons
Status:
  Initiated in 2002 (for isiZulu morphological analyser)
  Various prototypes under development (isiZulu, isiXhosa, Siswati, isiNdebele, Northern Sotho and Setswana)
  Extended until 2010
Principal researchers: Sonja Bosch (Project Leader), Laurette Pretorius
  Ansu Berg, Axel Fleisch, Albert Kotze, Petro Kotze, Memezi Mfusi, Lydia Mojapelo, Rigardt Pretorius, Linda van Huyssteen, Biffy Viljoen
Sponsor: NRF

10 DAC A: Spelling checkers for public administration domain
Development of spelling checkers for 10 official SA languages, specifically for use in government departments
Spelling checkers for isiNdebele, isiXhosa, isiZulu and Siswati include morphological analysers for effective spellchecking of these agglutinative languages
Status: Final evaluation by client in progress
Principal researchers: MJ Puttkammer (NWU), S Pilon (NWU), DJ Prinsloo (UP), SE Bosch (Unisa)
Sponsor: Department of Arts and Culture, CTexT

11 EtsaTrans Machine Translation
Development of a functional machine translation system
  Focus domain: mainly administrative documents
  Main languages: English to Afrikaans, Afrikaans to English
  Other languages: English to Xhosa, English to Southern Sotho
Harvesting previously translated information to create parallel corpora
Status:
  Initiated in 2003, ongoing
  Prototypes in use
Principal researchers: JA Naudé, L Jordaan
Sponsor: UFS

12 DAC B: Machine-aided translation tools
Development of translation tools:
  An integrated translation environment (ITE)
  Word translators
  Machine translation systems for three language pairs
  Terminology management system
  Document management system
Status:
  Under development (2007-2010)
  All tools, data and research output to be made available publicly
Principal researchers: HJ Groenewald, S Pilon (NWU), DJ Prinsloo (UP)
Sponsor: DAC

13 NHN PAST: Phonetics for Advanced Speech Technology
Technology-orientated investigation and description of the vowel system of the Sotho languages and of tone in the Sotho and Nguni languages
Status: Initiated May 2008, due for completion June 2009
Principal researchers: E. Barnard (Meraka), B. Khoali (independent consultant), D. Wissing (NWU), S. Zerbian (Wits)
Sponsor: National HLT Network (DST/Meraka)

14 African Speech Technologies (AST)
Development of a multilingual telephone-based hotel reservation system
Developed corpora and technology components (TTS, ASR, dialogue systems) for SAE, Afrikaans, isiZulu, isiXhosa and Sesotho
Status:
  Completed 2004
  Gave rise to commercial company: Catchword
  Data available for research purposes (release imminent)
Principal researchers: J.C. Roux, E.C. Botha, J. du Preez
  Various collaborators
Sponsor: DACST (Innovation Fund)

15 Limpopo ASR
Development of baseline automatic speech recognition systems for the major languages of the Limpopo Province
  Languages: Sepedi (Sesotho sa Leboa), Setswana, Tshivenda and Xitsonga
Telephone speech data collection and manual annotation
Extension to text-to-speech synthesis and domain-specific prototype dialogue systems
Status:
  Baseline ASR systems completed (2004-2006)
  Extension ongoing
Principal researchers: HJ Oosthuizen and MJD Manamela
Sponsor: Telkom and other industry partners

16 OpenPhone
Demonstrated use of telephone-based information services in providing health information in a rural setting
Automated health information system that provides information to caregivers looking after HIV-positive children living in the vicinity of Gaborone in Botswana
Includes Setswana TTS and ASR development
Status: Completed 2008, currently live
  http://www.meraka.org.za/hlt_projects_ophone.htm
Principal researchers: Etienne Barnard, Marelie Davel, Madelaine Plauche
Sponsor: OSI/OSISA, DST

17 Lwazi
Development and piloting of a fully Open Source multilingual telephone-based information system
  ASR and TTS systems in 11 official languages
  ASR and TTS integrated into a telephony platform
  Open Source resources and tools
  Various pilots: first significant pilot with DPSA Community Development Workers
Status:
  Initiated September 2006
  On track for completion September 2009
Principal researchers: Etienne Barnard, Marelie Davel, Gerhard van Huyssteen
Sponsor: DAC

18 Mobile E-learning for Africa (MELFA)
Mobile solutions for on-site literacy training and skills development for workers in the Building and Construction Industry
Includes text-to-speech and speech-to-speech translation
Initially, 30 test participants in the Western Cape are testing the modules for interactive M-learning
Status: Initiated in 2007, completing in 2009
Principal researchers: JC Roux (Project leader, SA), A Visagie, H Engelbrecht, A Magnusdottir, P Scholtz
Sponsor: Danida (Danish government organisation)

19 Research institutions: Text
Size is given as senior researchers / post-graduate students.
UNISA (University of South Africa)
  Areas of interest: Morphological analysis, POS disambiguation, syntactic parsing
  Size: 8/2
  Language focus: Bantu family languages
CTexT (North-West University)
  Areas of interest: Document proofing tools, machine aided translation, machine translation, computer assisted language learning, syntactic parsing
  Size: 2/8
  Language focus: Afrikaans (Other official languages, African languages)
UP (University of Pretoria)
  Areas of interest: Morphological analysis, POS disambiguation, syntactic parsing, computational lexicography
  Size: 2/0
  Language focus: Sepedi
UWC (University of Western Cape)
  Areas of interest: POS disambiguation, computational lexicography, localization, machine translation
  Size: 2/x
  Language focus: isiXhosa
Wits (1) (University of Witwatersrand)
  Areas of interest: Morphological analysis
  Size: 1/0
  Language focus: isiZulu
UFS (University of Free State)
  Areas of interest: Machine aided translation, machine translation
  Size: 1/0
  Language focus: English ↔ Afrikaans (Sesotho/E, isiXhosa/E)

20 Research institutions: Speech
SU-CLaST (University of Stellenbosch)
  Areas of interest: ASR, TTS, spoken dialogue systems, speaker verification, S-LID, computer assisted language learning, machine translation, speech-to-speech translation
  Size: 6/6
  Language focus: SAE, isiXhosa, Afrikaans
Meraka (CSIR Meraka Institute)
  Areas of interest: ASR, TTS, spoken dialogue systems, tone modelling, pronunciation modelling, speaker verification, language distances, channel normalisation, S-LID
  Size: 4/15
  Language focus: All SA official languages
Wits (University of Witwatersrand)
  Areas of interest: Tone modelling, TTS
  Size: 2/1
  Language focus: Sotho and Nguni languages
Limpopo (University of Limpopo)
  Areas of interest: ASR, TTS, language modelling
  Size: 1/2
  Language focus: Sepedi, Xitsonga, Tshivenda, Setswana

21 Main R&D sponsors
Department of Arts and Culture (DAC)
  Applications that support multilingualism, especially related to government service delivery
Department of Science and Technology (DST)
  Directed research in HLT aimed at addressing SA national priorities
National Research Foundation (NRF)
  Support for individual researchers
Industry: Addressing industry-specific needs
  ASR/TTS (Telkom, Intelleca, IBM, Google and others)
  Spelling checkers (Microsoft)
  Speech processing tools (Grintek, Armscor)
  Speech-to-speech translation (Armscor)
International donor funding: Addressing developmental needs
  Open Society Initiative (OSI/OSISA), Danish Danida, UK Dept for International Development (DfID), Canadian International Development Research Centre (IDRC), and others
Host institutions (universities, CSIR, etc.)
