Infrastructures in Taiwan and for the Chinese Languages Chu-Ren Huang Institute of Linguistics Academia Sinica ACL 2000 WORKSHOP:


1 Infrastructures in Taiwan and for the Chinese Languages Chu-Ren Huang Institute of Linguistics Academia Sinica ACL 2000 WORKSHOP: Infrastructures for Global Collaboration Saturday, October 7, Hong Kong

2 Types of Infrastructures
-Sharable resources (for Chinese computational linguistics)
-Mechanisms for international collaboration
-Mechanisms for scholarly exchange

3 Host Institutes
-The Association for Computational Linguistics and Chinese Language Processing (ACLCLP, a.k.a. ROCLING)
-Academia Sinica
-National Science Council (NSC)

4 Sharable Resources for Chinese Computational Linguistics
-Corpora
-Lexicons
-Procedures

5 Sharable Resources for Chinese Computational Linguistics--Corpora
-Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
-Sinica Treebank
-Standard Segmentation Corpus
-ROCLING Corpus
-Mandarin-Across-Taiwan (MAT) Speech Database

6 Academia Sinica Balanced Corpus of Mandarin Chinese (Sinica Corpus)
5 million words, segmented and tagged
Direct WWW Access
-http://www.sinica.edu.tw/~tibe/2-words/modern-words/index.html OR
-http://www.sinica.edu.tw/ftms-bin/kiwi.sh
License Information

7 Sinica Treebank
-,725 Trees
-239,532 Words
-Direct WWW Access (1000 sample trees)
-License Information

8 Mandarin-Across-Taiwan (MAT) Speech Database
Speech files are collected through telephone networks. The content includes spontaneous speech (short answering statements) and read speech (numbers, Mandarin syllables, words of 2 to 4 syllables, phonetically balanced sentences).
-MAT-160 (160 speakers)
-MAT-2000

9 Sharable Resources for Chinese Computational Linguistics--Procedures
Segmentation Standard for Chinese Language Processing
-Segmentation Standard
-Standard Segmentation Corpus (2 million words, segmented)
-Standard Segmentation Lexicon (42,138 entries, with frequency)
-Segmentation Program (free download)

10 Sharable Resources in Languages Other than Modern Mandarin
-Classical Chinese Corpora
-Corpus of Formosan Austronesian Languages
Under construction, part of the National Digital Archive Initiative
-Lexical Databases of other Sino-Tibetan and Tibeto-Burmese Languages

11 Mechanisms for International Collaboration
Major Sponsors of International Collaboration Involving Taiwan
-The Chiang Ching-kuo Foundation for International Scholarly Exchange
-The National Science Council
-Academia Sinica

12 Synchronic and Diachronic Chinese Corpora
Three Projects Sponsored by the CCK Foundation ( )
-Chu-Ren Huang, Keh-jiann Chen and Pei-chuan Wei, Academia Sinica
-Paul Thompson, SOAS, University of London
-Chaofen Sun, Stanford University

13 Mechanisms for Scholarly Exchange and Collaboration
Department of International Programs, NSC
-Canada: NRC
-France: CNRS
-Japan: EAACST
-Germany: DFG, DAAD, DKFG
-Netherlands: NWO, IIAS
-USA: NSF, NIH
-UK: Royal Society of London, etc.

14 An NSF/NSC International Joint Project
NSF: Asian Language Digital Library Project
-Ching-Chih Chen, Simmons College
NSC International Digital Library Collaborative Projects
-Lexicon-based Knowledge Linking - Approaches Towards a WordNet Infrastructure for Multilingual Digital Library (Chu-Ren Huang, Academia Sinica)
-Linguistic Technology and Resources for English-Chinese Bilingual Information System (Hsin-Hsi Chen, National Taiwan University)

15 Mechanisms for International Collaboration--Bilateral Projects
-Case by Case Negotiation
-Academia Sinica with Hong Kong Chinese University, LDC, Stanford, UCSB, etc.

16 Mechanisms for Scholarly Exchange--Conferences
-ROCLING (annually since 1988)
-PACLIC [Pacific Asia Conference on Language, Information and Computation] (regional conference involving Hong Kong, Japan, Korea, Singapore, and Taiwan)
-COLING 2002

17 Mechanisms for Scholarly Exchange--Exchange Scholars
-Academia Sinica and EHESS: Yearly exchange
-Academia Sinica and University of Pennsylvania (under negotiation)
-NSC and CNRS, NSC and NWO: Cognitive Science

18 Mechanisms for Scholarly Exchange--Post-doctoral Fellows
-Academia Sinica Post-doctoral Fellowships (application through project PIs or directly by applicants)
-NSC Post-doctoral Fellowships

19 Mechanisms for Scholarly Exchange--International Students
-Computational Linguistics and Chinese Language Processing: an international graduate (PhD) program (proposal under review)
-Visiting Students
-Internships

