Language Archives and Linguistic Anchoring of Digital Archives Chu-Ren Huang Institute of Linguistics, Academia Sinica LSA Symposium: The Open Language.

Presentation transcript:

Language Archives and Linguistic Anchoring of Digital Archives Chu-Ren Huang Institute of Linguistics, Academia Sinica LSA Symposium: The Open Language Archives Community 4 January 2002

OLAC Launch, LSA-02
Linguistic Anchoring of Digital Archives
Language archives serve communities beyond linguists
Linguistic description and interpretation underlie every digital archive item
In digital archives, each knowledge item should be temporally, geographically, and linguistically anchored

Language and Digital Archives

Digital Archives are Linguistically Anchored
Archives are anchored with a Lexical KnowledgeBase (LKB)
- because the LKB, as the collection of lexical types instantiated in an archive, uniquely defines each archive
- and each lexical item is the conceptual atom projecting knowledge from archive to archive

From Linguistic Anchor to Knowledge Projection
Synergy among language archives, anchored by lexical forms and supported by the LKB, generates new knowledge
Extending LKB-based linguistic anchoring to all types of digital archives will lead to even more creative synergy

Where & What: Language Atlas

Multi-anchor Knowledge Linking
Geographical anchor based on GIS (geographic information system)
- Ecology (fauna, weather, geology, etc.)
- Socio-anthropological classification
Linguistic anchor based on LKB
- Etymology, language grouping, loan words

Linguistic Anchor and Authorship
Dream of the Red Chamber (DRC): the classical Chinese novel in which the authorship of the last 40 chapters is in dispute
The Use of Particle de in DRC
Columns: ch.1-40, ch.41-80, ch.
Total freq.
de %: 17.88%, 56.61%
de %: 82.12%, 43.39%
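
The percentage rows in a table like this can be produced with a simple tally. The following is a minimal sketch, assuming each chapter block is available as a list of tokens and that the two competing particle forms are labelled `de_1` and `de_2`. Those labels, the block names, and the toy tokens are all invented for illustration; the actual characters and counts are not recoverable from the slide.

```python
from collections import Counter

def particle_shares(tokens, variants):
    """Return each variant's percentage share among all variant occurrences."""
    counts = Counter(t for t in tokens if t in variants)
    total = sum(counts.values())
    if total == 0:
        return {v: 0.0 for v in variants}
    return {v: round(100.0 * counts[v] / total, 2) for v in variants}

# Invented toy tokens, not data from the novel.
blocks = {
    "block_a": ["de_1", "x", "de_2", "de_2", "y", "de_2"],
    "block_b": ["de_1", "de_1", "z", "de_2", "de_1"],
}
variants = ("de_1", "de_2")

# One share table per chapter block; a marked shift between blocks is the
# kind of signal the slide uses as evidence in the authorship question.
shares = {name: particle_shares(toks, variants) for name, toks in blocks.items()}
```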

Linguistic Anchor and Schools of Thought
Classics in Confucianism: the Analects of Confucius, Mencius
Classics in Taoism: Lao-Zi, Zhuang-Zi
- Defining a sub-lexicon for each school of thought (e.g. in C and M but not in L or Z)
- Tracing use in later literature (e.g. -> Tang poetry)
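
The "in C and M but not in L or Z" criterion is a set operation over the vocabularies of the four texts. A minimal sketch, assuming each text has already been reduced to a set of lexical types; the function name and the toy vocabularies (`w1` to `w6`) are hypothetical placeholders.

```python
def school_sublexicon(in_school, out_of_school):
    """Words attested in every in-school text and in none of the other texts."""
    shared = set.intersection(*in_school)    # requires at least one in-school text
    others = set().union(*out_of_school)
    return shared - others

# Hypothetical toy vocabularies standing in for C, M, L and Z.
C = {"w1", "w2", "w3"}
M = {"w1", "w2", "w4"}
L = {"w2", "w5"}
Z = {"w5", "w6"}

confucian = school_sublexicon([C, M], [L, Z])
taoist = school_sublexicon([L, Z], [C, M])
```

Tracing later use would then amount to looking up the members of such a sub-lexicon in another archive, for example a poetry corpus.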

Synergy among Language Archives
How to synergize multiple archives
Each document is marked up with textual description features: topic, style, etc.
Each feature selects a subset of documents
Sub-corpora (or new archives) can be created online according to the user's specification
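
Feature-based selection of this kind can be sketched as a filter over marked-up documents. This is only an illustration under stated assumptions: documents are plain dictionaries, and the feature names and values (`topic`, `style`, `economy`, and so on) are invented examples, not an actual markup scheme.

```python
def select(documents, **wanted):
    """Keep documents whose features match every requested feature value."""
    return [
        d for d in documents
        if all(d["features"].get(k) == v for k, v in wanted.items())
    ]

# Invented toy documents with textual description features.
docs = [
    {"id": "d1", "features": {"topic": "economy", "style": "news"}},
    {"id": "d2", "features": {"topic": "economy", "style": "essay"}},
    {"id": "d3", "features": {"topic": "sports", "style": "news"}},
]

# Each feature selects a subset; combining features narrows the sub-corpus.
sub_corpus = select(docs, topic="economy", style="news")
```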

OLACMS Helps Archive Versatility
Given a shared metadata standard,
new language archives can be created on the fly by harvesting existing archives
Rich information can be inferred by establishing temporal and geographic anchors for each document
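
Once records share a metadata format, building a new archive on the fly reduces to collecting matching records from several sources. The sketch below is an in-memory illustration only: the field names `year` and `place`, the archive names, and the records are assumptions made up for the example, and no real harvesting protocol or OLACMS element is modelled.

```python
def harvest(archives, predicate):
    """Collect matching records from several archives into one virtual archive."""
    return [
        dict(rec, source=name)          # remember which archive a record came from
        for name, records in archives.items()
        for rec in records
        if predicate(rec)
    ]

# Invented records that share the same (hypothetical) metadata fields.
archives = {
    "archive_a": [
        {"id": "a1", "year": 1950, "place": "Taipei"},
        {"id": "a2", "year": 1985, "place": "Tainan"},
    ],
    "archive_b": [
        {"id": "b1", "year": 1952, "place": "Taipei"},
    ],
}

# Temporal and geographic anchors act as the selection criteria.
virtual = harvest(archives, lambda r: r["place"] == "Taipei" and r["year"] < 1960)
```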

OLAC Infrastructure Helps to Solve Language Archive Problems
such as
- Language identification
- Metadata set for multilingual language archives

The Language Identification Problem
The DC language code (e.g. en for English) is not enough to describe all the languages in the world
Ethnologue is comprehensive but not complete
Potential problems of using Ethnologue (or any existing language list)
- over-splitting
- over-chunking
- omission

A Fundamental Solution to Language Identification Problems
Registering language groups with an OLAC registration service
An OLAC language classification server would house a comprehensive list of language family names (defined by users) and their extensional definitions (i.e. sets of Ethnologue codes)
AS:Amis = {ALV, AIS}
ALV = Amis, AIS = Nataoran
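
The registration idea can be pictured as a lookup from user-defined group names to sets of Ethnologue codes. A minimal sketch, assuming an in-memory registry; the class and method names are hypothetical and do not describe an actual OLAC service, while the `AS:Amis` entry is the example from the slide.

```python
class LanguageGroupRegistry:
    """Maps user-defined group names to their extensional definitions."""

    def __init__(self):
        self._groups = {}

    def register(self, name, codes):
        """Define a group as a set of Ethnologue codes."""
        self._groups[name] = frozenset(codes)

    def expand(self, name):
        """Return the codes covered by a registered group."""
        return self._groups[name]

    def matches(self, name, code):
        """True if a resource tagged with `code` belongs to the group."""
        return code in self._groups.get(name, frozenset())

registry = LanguageGroupRegistry()
registry.register("AS:Amis", {"ALV", "AIS"})   # ALV = Amis, AIS = Nataoran
```

A search for `AS:Amis` could then be expanded to both codes, so that a list which splits the group more finely than the archive does still returns every relevant resource.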

Describing Multi-Lingual Resources in OLACMS
Directionality is crucial in multilingual resources
However, OLAC metadata is flat and unordered
Bi-directional MT

Multi-lingual Resources II
Text: language
Bitext (bilingual aligned corpus)
- There is always a directionality
- Original: language
- Translation: Subject.language
Language Description (Field Notes)
- Elicitation, transcription, translation, notes
- Multiple related resources
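
The bitext convention above keeps direction recoverable even in a flat, unordered record, because the original and the translation go into differently named elements. A minimal sketch, assuming a record is a plain dictionary keyed by the two element names from the slide; the `Bitext` class, its field names, and the `zh` code are assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Bitext:
    title: str
    original_language: str      # language of the source text
    translation_language: str   # language of the translation

    def to_flat_metadata(self):
        """Encode direction by element name, so ordering does not matter."""
        return {
            "language": self.original_language,
            "Subject.language": self.translation_language,
        }

    @classmethod
    def from_flat_metadata(cls, title, record):
        """Recover the direction from a flat record."""
        return cls(title, record["language"], record["Subject.language"])

sample = Bitext("sample bitext", "en", "zh")
flat = sample.to_flat_metadata()
restored = Bitext.from_flat_metadata("sample bitext", flat)
```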

OLAC and Asia
Asian Language Resources Committee
- Mail List:
- Affiliated with the proposed AFNLP
- Cataloguing Asian Language Resources
- Will adopt OLACMS and search engine
- Coordinators: Togunana Huang

OLAC and Taiwan
Both Academia Sinica and the Digital Archives National Project will join OLAC
AS corpora will be OLAC compliant soon
Other resources: spoken, Taiwanese, etc.