1 Archiving LingDy 16 Feb 2012 TUFS, Tokyo David Nathan Endangered Languages Archive Hans Rausing Endangered Languages Project SOAS, University of London.

Presentation transcript:

1 Archiving LingDy 16 Feb 2012 TUFS, Tokyo David Nathan Endangered Languages Archive Hans Rausing Endangered Languages Project SOAS, University of London

2 What is an archive?

3

4 What is a digital language archive?
• a trusted repository created and maintained by an institution with a commitment to the long-term preservation of archived material
• has policies and processes for materials acquisition, cataloguing, preservation, dissemination, migration to new digital formats
• a platform for building and conducting relationships between data providers and data users

5 Why is language archiving different?
• what is a language?
• the data is not conventionalised (like $, age, year of publication etc.) – what and how to code?
• varying and competing expectations

6 And endangered languages archiving?
• extremely diverse context – languages, cultures, communities, individuals, projects
• typical source – fieldworkers
• typical materials – documentation
• difficult for archive staff to manage
• sensitivities and restrictions
• extremely high priority

7 What can a language archive offer?
• Security - keep your electronic materials safe
• Preservation - store your materials for the long term
• Discovery - help others to find out about your materials, and you to find out about users
• Protocols - respect and implement sensitivities, restrictions
• Sharing - share results of your work, if appropriate
• Acknowledgement - create citable acknowledgement
• Mobilisation - create usable language materials
• Quality and standards - advice for ensuring your materials are of the highest quality and meet robust standards

8 Different kinds of language archives
• different contexts, systems, methods, collection policies
• you should consider placing your materials in more than one …

9 Why digital?
• preservation: digitisation is the only way that audio and video (non-symbolic material) can be preserved for the future … because it can be copied and transmitted with zero loss
• cataloguing, sharing, dissemination, repurposing

10 Digital disadvantages
• digital data is fragile and ephemeral
• cost (human, equipment, maintenance)
• requires strategy and luck to get right
• preservation depends on file and data formats
• depends on tools and software
• depends on formats (prefer standard, open, explicit, long-lasting)
• materials may have to be converted and migrated
• some formats require particular software (can we archive the software?)

11 What is archiving of language materials?
• preparing materials
  • selecting
  • structuring
  • suitable encodings and formats
  • well-documented
• depositing them in a suitable archive(s)
• curation and accession by the archive
• ongoing management, dissemination
• new focus on form, presentation and user interaction/feedback

12 Users and potential users
• depositors – deposit, access or update materials
• speakers and their descendants (“majority of users of Berkeley Language Center archive are community members”)
• other researchers - comparative/historical linguists, typologists, theoreticians, anthropologists, historians, musicologists etc.
• other “stakeholders”, e.g. educationalists
• journalists and the wider public

13 Archives networks and bodies
• foundation concepts and technologies from
  • library initiatives, e.g. D-LIB
  • OAI (Open Archives Initiative)
  • OAIS Open Archival Information Systems (NASA and space agencies incl. JAXA)
• Open Language Archives Community (OLAC)
• Digital Endangered Languages and Archives Network (DELAMAN)
• ELAR, DOBES, ANLC, Paradisec, EMELD, LACITO, AIATSIS, AMPM (Maori)

14 Citation examples
• from Heidi Johnson of AILLA
Collection: Sherzer, Joel. "Kuna Collection." The Archive of the Indigenous Languages of Latin America: Media: audio, text, image. Access: 0% restricted.
File/resource: Sherzer, Joel (Researcher). (1970). "Report of a curing specialist." Kuna Collection. Archive of the Indigenous Languages of Latin America: Type: transcription&translation. Media: text. Access: public. Resource ID: CUK001R001.

15 Endangered Languages ARchive (ELAR)
• one of 3 programs of the Hans Rausing Endangered Languages Project
• develop policies, preservation infrastructure, cataloguing and dissemination, facilities, training, advice, materials development and publishing

16 ELAR facts and figures
• archived collections: 110
• online (published) collections: 50
• average collection size about 60 GB
• online data bundles: 9523
• total number of files held: around 200,000
• total volume of files held: around 10 TB
• online data bundles unrestricted access: 5298
• registered users: >500
• annual downloads: >1,000
• annual number of website "hits": 230,000

17 ELAR facts and figures – user accounts
• increasing number of community members, including Aleut (Canada), Tai-Ahom, Wadar (India), Burushaski (Pakistan), Serrano, Cahuilla, Arapaho (USA), Iraqi Jewish (Iraq), Saami (Finland), Wabena (Tanzania), Torwali (Pakistan), Hani, Bai (China), Irish
• comments: “I found your site while looking up my grandmother, and i found her on your site speaking our language. and i would love for my children her great grandchildren to hear our language coming from her”.
• many interdisciplinary researchers, particularly archivists and anthropologists

18 Archiving and data management
• most data-related issues are really part of linguistic data/corpus management
• there are now few data-related issues that are archive-specific
  • metadata formats
  • video
  • presentation/exhibition of material

19 What can you archive (at ELAR)?
• media - sound, video
• graphics - images, scans
• texts - fieldnotes, grammars, description, analysis
• structured data - aligned and annotated transcriptions, databases, lexica
• metadata - contextual information about the materials, structured and unstructured

20 Archive objects
• an “object” could be a file, a set of files, a directory, or a set of files with their relationships explicitly defined
• these are often called “sessions” or “bundles”
• they should be made explicit
  • through metadata
• our future catalogue system will provide the ability for depositors to directly create, label and update bundles
See bundles at ELAR

21 Archive material should be selected
• example: Depositor’s question: How much video can I archive?
• answer: ...

22 What is required to make a deposit?
• resource(s) for an endangered language
• it could be just one file
• inventory / metadata
• deposit form
• existing deposits can also be updated, added to, and metadata added/modified

23 How can I deliver data?
• hard disks
• we return them
• we send them out
•
• good for samples for evaluation
• OK for most text materials
• Dropbox etc.
• flash cards and USB sticks
• a web upload facility may be provided one day
• we download from your server

24 What about CDs and DVDs?
• we have found CDs, and especially DVDs, to be very unreliable
• DVD fail rate > 10%
• cause confusion as files are allocated to fit on disks, not according to corpus structure
• create a lot of work for depositors and for ELAR

25 Protocol
• the sensitivities and access restrictions associated with EL resources
• need to be discussed, collected and recorded in the field
• global protocol (the overall, typical value) is entered into the deposit form
• specific protocol (for files, bundles) is entered via metadata (or any other explicit way)

26 Protocol and access control
• principles:
  • granularity – file, bundle or collection
  • access is a relation between object and user
  • protocol values can be changed over time
• ELAR’s URCS system
  • User
  • Researcher
  • Community member
  • Subscriber
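The idea that access is a relation between an object and a user can be sketched in a few lines of Python. This is only an illustration: the slide names the four URCS roles but does not say how ELAR evaluates them, so the `ArchiveObject` class, the `allowed_roles` field and the rule "access is granted if the user holds at least one allowed role" are all assumptions.

```python
from dataclasses import dataclass, field

# The four role names come from the slide; everything else is hypothetical.
ROLES = {"user", "researcher", "community member", "subscriber"}


@dataclass
class ArchiveObject:
    object_id: str
    level: str  # granularity: "file", "bundle" or "collection"
    allowed_roles: set = field(default_factory=set)


def can_access(obj, user_roles):
    """Assumed rule: access is granted if the user holds any allowed role."""
    return bool(obj.allowed_roles & set(user_roles))


bundle = ArchiveObject("bundle-001", "bundle", {"researcher", "community member"})
print(can_access(bundle, ["researcher"]))  # this user holds an allowed role
print(can_access(bundle, ["user"]))        # this user does not

# Protocol values can change over time: widen access later.
bundle.allowed_roles.add("user")
print(can_access(bundle, ["user"]))
```

Keeping the allowed roles on the object, rather than on the user, lets the same person have different access to different files, bundles or collections.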

27 “I have images”
• what kinds of images?
• what are their sources?
• what is their documentation value? what role do they play in the collection?
• … these should be reflected in the data structures/metadata

28 Metadata for images
• at least captions
• what else?
• …
• in what form?
  • narrative
  • tabular fields
  • keywords

29 Integrating images into metadata
• get a list of image files
• command (DOS) window
• in directory
• type "dir > list.txt"
• open text file (in Notepad++ or MS Word)
• change font to Courier
• get a “vertical selection”
• (or use a file listing utility!)
• paste into spreadsheet
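As an alternative to the "dir > list.txt" route, a short script can act as the file listing utility. The sketch below is not part of the original slides; the extension list and the function names are assumptions, so adjust them to the formats in your own collection.

```python
from pathlib import Path

# Hypothetical set of image extensions; extend it as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif", ".bmp"}


def list_image_files(directory):
    """Return the sorted names of image files directly inside `directory`."""
    return sorted(
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def write_listing(directory, out_path="list.txt"):
    """Write one filename per line, ready to paste into a spreadsheet column."""
    names = list_image_files(directory)
    Path(out_path).write_text("".join(n + "\n" for n in names), encoding="utf-8")
    return names
```

Calling `write_listing("images")` would give a clean one-name-per-line file, so the Courier font and vertical selection steps are no longer needed.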

30 Integrating images into metadata
• make a new sheet for images
• paste in image file list (see previous)
• add an ID column
• type “1” in first cell
• select from first to last cell in ID column
• Edit>Fill>Series>OK
• add other columns
• now you can refer to your images anywhere!
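The ID column can also be generated outside the spreadsheet. The following is a minimal sketch, assuming the image sheet is saved as CSV; the column names `id` and `filename` are illustrative, not an ELAR requirement.

```python
import csv


def add_ids(filenames, start=1):
    """Pair each filename with a sequential integer ID, like Edit>Fill>Series."""
    return [(i, name) for i, name in enumerate(filenames, start=start)]


def write_image_sheet(filenames, out_path):
    """Write an images sheet as CSV with an ID column and a filename column."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "filename"])  # assumed column names
        writer.writerows(add_ids(filenames))


print(add_ids(["img001.jpg", "img002.jpg"]))
```

The resulting CSV opens directly in a spreadsheet, where the other columns can then be added by hand.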

31 Using spreadsheet to access data
• you can turn a filename into a link to access files directly from a spreadsheet
• have the filename in cells
• use the formula =HYPERLINK(file, "Message")
• examples
=HYPERLINK("E:\archiving\images\"&A2, "click here")
=HYPERLINK(A1&A2, "click here")
=HYPERLINK(A1&A2, A2)
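If the sheet is generated by a script, the HYPERLINK formulas can be written as text at the same time. This sketch is an illustration only; the function name and the sample folder are assumptions, and the formula text is evaluated only once a spreadsheet program opens the file.

```python
def hyperlink_formula(base_dir, filename, label="click here"):
    """Build the text of a spreadsheet HYPERLINK formula for one file."""

    def quote(s):
        # Spreadsheet string literals escape a double quote by doubling it.
        return '"' + s.replace('"', '""') + '"'

    return f"=HYPERLINK({quote(base_dir + filename)}, {quote(label)})"


# Hypothetical folder and filenames, matching the slide's first example.
for name in ["img001.jpg", "img002.jpg"]:
    print(hyperlink_formula("E:\\archiving\\images\\", name))
```

Writing the full path into the formula corresponds to the first example on the slide; keeping the folder in one cell and the filename in another, as in =HYPERLINK(A1&A2, A2), is easier to maintain if the folder later moves.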

32 My cells have multiple values!
• example: keywords
• this is probably OK, as keywords are atomic
• just consistently use a suitable delimiter
• e.g. use comma - if data values cannot have commas
• ELAR recommends double pipe “||”
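A delimiter is only useful if it is applied consistently, which a small helper can check. The sketch below assumes the double pipe recommended on the slide; the function names are illustrative.

```python
DELIMITER = "||"  # the double pipe recommended on the slide


def split_values(cell):
    """Split a multi-value cell into clean values, dropping empty entries."""
    return [v.strip() for v in cell.split(DELIMITER) if v.strip()]


def join_values(values):
    """Join values into one cell, refusing values that contain the delimiter."""
    for v in values:
        if DELIMITER in v:
            raise ValueError(f"value contains the delimiter: {v!r}")
    return DELIMITER.join(values)


print(split_values("fishing || song||  ritual "))
print(join_values(["fishing", "song", "ritual"]))
```

Rejecting values that contain the delimiter is the scripted version of the slide's advice to use a comma only if data values cannot have commas.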

33 My cells have multiple values!
• example: speakers in a recording
• speakers are probably not atomic – they have other attributes
• create a separate “speakers” sheet
• give each speaker an ID (number or initials)
• use the IDs in the original sheet, with delimiter (implements one to many)
• (advanced) or make another sheet to associate recordings with speakers (implements many to many)
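The two options on this slide can be sketched with plain Python data standing in for the sheets. All names, IDs and attributes below are invented for illustration.

```python
# A "speakers" sheet: one row per speaker, keyed by ID (hypothetical data).
speakers = {
    "S1": {"name": "Speaker One", "year_of_birth": 1950},
    "S2": {"name": "Speaker Two", "year_of_birth": 1982},
}

# The original sheet: speaker IDs in one cell, separated by a delimiter.
recordings = [
    {"file": "rec001.wav", "speakers": "S1||S2"},
    {"file": "rec002.wav", "speakers": "S2"},
]


def speakers_for(recording, speakers_sheet, delimiter="||"):
    """Look up the full speaker rows for one recording."""
    ids = [s for s in recording["speakers"].split(delimiter) if s]
    return [speakers_sheet[i] for i in ids]


def link_rows(recordings, delimiter="||"):
    """Build the separate association sheet: one (file, speaker ID) row each."""
    return [
        (r["file"], sid)
        for r in recordings
        for sid in r["speakers"].split(delimiter)
        if sid
    ]


print(speakers_for(recordings[0], speakers))
print(link_rows(recordings))
```

The delimited cell is quicker to type; the association sheet is easier to sort and filter, and gives each recording-speaker pair its own row where further details could be added.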

34 Expressing “Relation” in spreadsheets
• one column is usually insufficient
• “relationship” has 2 parts
• the target of the relationship
• description of the relationship
• how would this work for images?
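One possible answer to the question about images is a relations sheet with a row per relationship, holding the source, the target and the description. The sketch below is an assumption about how that might look, not ELAR's format; the filenames and descriptions are invented.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    source: str       # the item the relationship starts from
    target: str       # the target of the relationship
    description: str  # what kind of relationship it is


# Hypothetical rows of a "relations" sheet.
relations = [
    Relation("rec001.wav", "img003.jpg", "photo of the speaker during recording"),
    Relation("rec001.wav", "img004.jpg", "scan of the fieldnotes page"),
    Relation("rec002.wav", "img003.jpg", "photo of the same speaker"),
]


def related_to(source, rows):
    """Return (target, description) pairs for one source item."""
    return [(r.target, r.description) for r in rows if r.source == source]


print(related_to("rec001.wav", relations))
```

Because each relationship has its own row, one image can be linked to several recordings without repeating or reformatting any cell.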

35 How can I tell if it’s Unicode?
• use a browser or Notepad++
• paste text in
• examine the encoding (before and after)
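A script can make a similar check on a file's raw bytes. The sketch below is a rough heuristic rather than a full encoding detector: it looks for a byte order mark and then tests whether the bytes decode as UTF-8. The labels it returns are made up for this example, and a UTF-32 file would be misreported by the UTF-16 test.

```python
import codecs


def describe_encoding(data: bytes) -> str:
    """Give a rough label for how `data` appears to be encoded."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8 with BOM"
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16 (BOM found)"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "not utf-8"  # probably a legacy 8-bit encoding
    # Plain ASCII is also valid UTF-8, so report it separately.
    return "ascii" if text.isascii() else "utf-8"


def describe_file(path):
    """Read a file as bytes and describe its apparent encoding."""
    with open(path, "rb") as f:
        return describe_encoding(f.read())


print(describe_encoding("ŋa".encode("utf-8")))
print(describe_encoding("é".encode("latin-1")))
```

A result of "not utf-8" means the file needs converting, and the conversion only works if the original encoding is known.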

36 Can I still use MS Word?
• ELAR no longer accepts MS Word files
• but Word is still useful
  • quicker to type up
  • useful tables, functions, macros etc.
• solutions
  • think “text only”
  • tables as spreadsheets (are they bad too?)
  • (advanced) complex materials formatted as styles, then exported as marked-up text
  • PDF/A – but not a perfect solution

37 End