Wikis, Standards and Everything
Lee Gillam (University of Surrey), Laurent Romary (Max-Planck Digital Library)
Foreword
Wikification and standards: is this the wrong talk?
–Wiki: Open + free interaction on-line
–ISO: Dusty documents imposing ways of thinking and working
Still, reusability and preservation of data
–Requires some minimal principles about data representation
Interoperability
–And there are quite a few practical standards (e.g. ISO 10646)
Background (outline)
–The demonstrators: OmegaWiki
–The police: ISO (International Organization for Standardization)
–The topic at hand: language descriptions
Highly complementary to work done here at MPI-EVA (eWALS)
ISO standards
Wikis for Languages
Some possible motivations:
–50% of languages are endangered (UNESCO);
–a large proportion of languages have no “resources” and no web presence;
–discontinuity and fragmentation of research;
–sustainability and curation issues
And yet…
–Capability for capturing data like never before;
–Expansion of capacity of the Internet and growing pressure for an inclusive multilingual internet;
–OLPC programme;
–Language experts and non-experts are prepared to contribute time and resources
So, how about a Wiki-based infrastructure that allows us to form communities around languages and harmonize results?
Wikis for Languages
OmegaWiki: a collaborative project to produce a free, multilingual resource in every language, with lexicological, terminological and thesaurus information
World Language Documentation Centre (WLDC): currently comprising 22 experts in language technologies, linguistics, terminology standardisation, and localisation
ISO: provision of the ISO 639 series of standards; focus here on and 639-6
Wikis for Languages
[Diagram; recoverable labels: ISO data / ISO 639-X data; ISO standard / ISO 639-X standard; Expert review; Community review & infrastructure; “Auditors”; ISO 639-4 “standards as databases”; ISO 11179; ISO Co-ordination; SIL, LoC, Infoterm; Data categories; Metadata registries]
Wikis for Languages
Language Documentation via ISO 639-4: association of metadata descriptors with a model interoperable with DCIF (ISO 12620); see ISO 639-4, section 9. Attribution information missing here.
Wikis for Languages Eventual inclusion of all “available” metadata
ISO standards
Language Codes Standards are growing in number and complexity
–From 2 to 6 – eventually back to 1?
–From 400 identifiers to upwards of – plus supporting metadata
–From lists to databases – multiple metadata registers
–From tables to metadata registries – registers + policies + “auditors”
–From published text documents to “published” databases – “SAD”
–From IETF RFC to RFCs to RFCs – consume, consume, consume
–From a closed membership committee to an open Community initiative (OmegaWiki) – supporting infrastructure, expert review of community contributions (e-Voting?)
–… with accompanying (web) services and products – Open Source and bespoke, and secured funding as necessary
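The move from lists to databases, with supporting metadata and expert review of community contributions, can be pictured with a small sketch. This is purely illustrative and not how OmegaWiki or any ISO 639 registration authority is implemented: the class names, the status values and the metadata key `iso639_1` are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class LanguageEntry:
    """One register record: an identifier plus supporting metadata (hypothetical shape)."""
    identifier: str                      # e.g. a three-letter ISO 639-3 code
    name: str
    metadata: dict = field(default_factory=dict)
    status: str = "community"            # assumed workflow: "community" -> "expert-reviewed"


class LanguageRegister:
    """A toy register: the community proposes entries, experts approve them."""

    def __init__(self):
        self._entries = {}

    def propose(self, entry):
        # Community contribution: stored, but not yet published.
        if entry.identifier in self._entries:
            raise ValueError(f"identifier already registered: {entry.identifier}")
        self._entries[entry.identifier] = entry

    def approve(self, identifier):
        # Expert review step: marks an existing entry as accepted.
        self._entries[identifier].status = "expert-reviewed"

    def published(self):
        # Only expert-reviewed entries form the "published" database.
        return sorted(
            e.identifier for e in self._entries.values()
            if e.status == "expert-reviewed"
        )


register = LanguageRegister()
register.propose(LanguageEntry("eng", "English", {"iso639_1": "en"}))
register.propose(LanguageEntry("deu", "German", {"iso639_1": "de"}))
register.approve("eng")
print(register.published())  # ['eng']
```

Keeping proposals and published entries in one store, separated only by a status field, is one possible design; a real register would also need policies, audit trails and versioning.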
ISO standards
Wikis for Language Resources?
Next steps
Data and models for wiki
–Structured data is necessary in scientific domains
–Registering descriptors and schemas is an essential component of long-term management of such data
New types of standards
–Stabilisation of knowledge
–Dynamic platforms for describing knowledge
–Complementary to rocket science
Back to WALS
–MPI EVA and MPDL => eWALS
Generic environment for managing and linking compliant data
Connecting the whole thing…
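As a rough illustration of why registering descriptors matters for structured wiki data, the sketch below checks a contributed record against a small registry of descriptors. The descriptor names (`identifier`, `languageName`, `speakerPopulation`) and the registry layout are invented for this example and are not taken from ISO 12620, ISO 11179 or eWALS.

```python
# Hypothetical registry: descriptor name -> expected Python type.
REGISTERED_DESCRIPTORS = {
    "identifier": str,
    "languageName": str,
    "speakerPopulation": int,
}


def validate(record, registry=REGISTERED_DESCRIPTORS):
    """Return a list of problems; an empty list means the record is compliant."""
    problems = []
    for key, value in record.items():
        expected = registry.get(key)
        if expected is None:
            # The contributor used a descriptor nobody has registered.
            problems.append(f"unregistered descriptor: {key}")
        elif type(value) is not expected:
            # The descriptor is known but the value has the wrong type.
            problems.append(
                f"{key}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems


print(validate({"identifier": "eng", "languageName": "English"}))  # []
print(validate({"identifier": "eng", "wordOrder": "SVO"}))
```

In this sketch a record that uses an unregistered descriptor is reported rather than rejected outright, which leaves room for a review step that either registers the new descriptor or maps it to an existing one.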
Further Sources
Gillam, L. (2007). "A metadata infrastructure using ISO standards". We Have to Talk about Metadata Workshop at UK e-Science Programme All Hands Meeting 2007 (AHM 2007), Nottingham, September. Accepted.
Gillam, L., Garside, D., Cox, C. (2007). "Developments in Language Codes standards". In Rehm, Witt and Lemnitzer (eds.): Datenstrukturen für linguistische Ressourcen und ihre Anwendungen / Data Structures for Linguistic Resources and Applications. Proc. of GLDV 2007, April 2007, Tübingen, Germany: Gunter Narr Verlag.
Gillam, L., Garside, D., Cox, C. (2006). "Information volumes and linguistic diversity: meeting the challenges for content management". 3rd International Conference on Terminology, Standardization and Technology Transfer, August, Beijing, PRC.