NLP in Thailand, by Asanee Kawtrakul, Kasetsart University

Presentation transcript:

1 NLP in Thailand by Asanee Kawtrakul Kasetsart University

2 Outline
- Thai language features
- What we do and the problems
- The main actors
- Research model and infrastructure
- What more do we need?

3 Thai Language Characteristics
- Dialects and tone
- Isolating language
- Uninflected
- Monosyllabic
- No word delimiters
- Same form but several functions
- Same form but several meanings

4
- Grammar coverage
- Word formation/recognition
  - Compound words vs. sentences
  - Proper names vs. common nouns
  - Loan words (transliterated foreign words) without special orthography
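To illustrate why the absence of word delimiters makes word recognition hard, here is a minimal sketch of greedy longest-matching segmentation against a dictionary. This is a common baseline technique, not necessarily the method used by the groups described in these slides. The function name `segment` and the four-word toy lexicon are assumptions made purely for illustration.

```python
def segment(text, lexicon):
    """Greedy left-to-right longest-matching segmentation.

    `lexicon` is a set of known words (a toy, hypothetical dictionary here).
    Characters that start no known word are emitted one at a time.
    """
    max_len = max(map(len, lexicon))
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in lexicon:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens


# Toy lexicon (assumption): "ตา" eye, "กลม" round, "ตาก" to expose, "ลม" wind.
toy_lexicon = {"ตา", "กลม", "ตาก", "ลม"}
print(segment("ตากลม", toy_lexicon))
```

The string "ตากลม" can be read as "ตา กลม" (round eyes) or "ตาก ลม" (exposed to the wind). The greedy matcher commits to the second reading because it always takes the longest word first, which shows why dictionary lookup alone cannot resolve "same form but several meanings" without further context.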

5 Dialects
- North: 18.8%
- North-East: 34.2%
- Central: 33.7%
- South: 13.3%

6 What do we do?
- Text processing
- Speech processing
- Image processing

7 Text Processing

8 Text Processing and Problems
- Statistical-based approaches
- Knowledge-based approaches
- Lack of legal corpus
- Small corpus
- Lack of standards (POS, semantic concepts)
- Redundant work

9 Speech Processing
- Speech recognition
- Speech generation

10 Speech Processing and Problems
- Recognition
- Generation
- Not only dialect but also tone
- Isolated words, not continuous speech
- Word boundary detection

11 Image Processing
- Thai optical character recognition
- Handwriting recognition

12 OCR and Problems
- Isolated characters

13 The Main Actors
- Universities
- NECTEC (National Electronics and Computer Technology Center), Ministry of Science, Technology and Environment
- SIGNLP

14 The Main Actors
- More than 50 experienced researchers (minimum 5 years of research)
- More than 100 young researchers

15 Financial Supporters
- National Electronics and Computer Technology Center (NECTEC)
- National Research Council of Thailand (NRCT)
- Kasetsart University Research and Development Institute (KURDI)
- Thailand Research Fund (TRF)
- etc.

16 Research Model and Infrastructure
- Short term
- Long term
- Simple but workable
- Collaboration between end users, universities and funding agencies (including the private sector)
- Robust and very large scale
- Enlarge the number of researchers

17 What more do we need?
- Share resources (corpus, dictionary, tools, etc.)
- Share experience and knowledge
- Set up a big umbrella and distribute the workload
- Establish a research network
- Partnership

18 Conclusion
- Most Thais use the Thai language
- Thai language processing has a good future in the market IF...

19
- We have more collaborative work
- NLP market for half of 60 million