Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation CELCNA 2007 Naomi Fox, Julia James, University of Utah.


1 Revitalizing Endangered Language Data: Case studies in rescuing legacy documentation CELCNA 2007 Naomi Fox, Julia James, University of Utah

2 Digital formats ● Why do you want your data in digital format? Digital: dictionary database. Non-digital: notecards in a shoebox under the bed ● Examples of increased functionality of digital formats – even in Word format, you can use 'find' instead of flipping through pages

3 What are Best Practices and why should I care? ● Why follow BP – interoperability/data sharing – protect valuable data from loss (obsolescence) – make sure your data outlives you ● Finding out BP: resources – E-MELD (http://www.emeld.org) – OLAC (http://www.language-archives.org/) – DELAMAN (http://www.delaman.org/) – Edata (http://www.endangereddata.org)

4 Quick and Dirty Best Practice Recommendations Audio ● uncompressed, .wav or .aiff, minimum 44.1 kHz/16-bit Text ● XML, tagged, with valid DTD ● Unicode ● indexed to audio Metadata ● have some
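The audio minimum above can be checked mechanically. The following is a minimal sketch, not a tool from the presentation: it assumes PCM .wav input and uses only Python's standard `wave` module. The names `meets_audio_minimum` and `make_silent_wav` are hypothetical, and the second function exists only to produce demonstration input.

```python
import io
import wave

MIN_RATE_HZ = 44100    # 44.1 kHz, the minimum named on the slide
MIN_SAMPLE_BYTES = 2   # 16 bits per sample


def meets_audio_minimum(source):
    """Return True if a PCM WAV file meets the minimum rate and bit depth.

    `source` may be a file path or a binary file-like object.
    """
    with wave.open(source, "rb") as wav:
        return (wav.getframerate() >= MIN_RATE_HZ
                and wav.getsampwidth() >= MIN_SAMPLE_BYTES)


def make_silent_wav(rate_hz, sample_bytes, frames=10):
    """Build a tiny in-memory mono WAV, only so the check can be demonstrated."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(sample_bytes)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00" * sample_bytes * frames)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(meets_audio_minimum(make_silent_wav(44100, 2)))
    print(meets_audio_minimum(make_silent_wav(22050, 2)))
```

A check like this only reads the file header. It cannot tell whether a recording was upsampled from a lower-quality source, so it complements rather than replaces good digitization records.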

5 Getting there from here ● I've accepted BP, now what?

6 Getting there from here ● I've accepted BP, now what? ● My computer won't read my old Wordstar file. What program can I use?

7 Getting there from here ● I've accepted BP, now what? ● My computer won't read my old Wordstar file. What program can I use? ● I have a PC, but all of my data was entered on a Mac

8 Getting there from here ● I've accepted BP, now what? ● My computer won't read my old Wordstar file. What program can I use? ● I have a PC, but all of my data was entered on a Mac ● My data is in [insert database name here] which is not supported outside [insert obsolete OS here]. It's fine for me, but others can't use it.

9 General physical format issues ● Analog recordings – cassette – reel-to-reel – wax cylinder ● Field notes – Field Notes, Notebooks – Notecards – Annotated descriptive materials ● Outdated computer media/drives

10 Digitizing analog recordings ● outsourced – audiotechnical experts – equipment and staff limitations – equipment procurement and maintenance problems ● in-house – equipment and space appropriate as part of CAIL's ongoing mission – valuable training for students

11 Field Notes ● Issues – notes may not be in any logical order, even if they are well-catalogued – need to design a digital data structure that represents all of the written information ● Options – scanning as images – scanning/OCR to text files – manual data entry

12 Outdated computer media ● General problems – QUIRKY QUOTE NEEDED ● Floppy disks – consult an expert – building a system that can read old disks and create modern media such as CD or DVD-RAM

13 Software Issues ● More difficult to diagnose ● Usually go hand-in-hand with hardware issues – If your floppy disk is obsolete, the data on it will likely need some updating too ● Fast-paced world of software development – Even files from older versions of the same program may not transfer properly to current software

14 Case Studies ● Hypercard ● Shoebox 3.0 ● Word processors/spreadsheets – MS Word – Excel – WordPerfect – Plain text (.txt) (Not a total pain in the ass)

15 Hypercard data Floppy > CD > Hypercard data > Hypercard-to-text custom tool (VN) > Word docs > FMPro database > Print reference 1) Read the floppy 2) Analyze the data (what format is it in, what kind of data is it?) 3) Get the data into a transportable format 4) Structure the data 5) Use the data

16 Shoebox Shoebox 3.0 --> Toolbox --> XML --> XSLT --> HTML Online Mocho dictionary 1) Figure out the structure of the database (Shoebox 3.0) ● ascertain data collector's conventions if possible 2) Migrate to the newest form of the software (Toolbox) 3) Export XML or text version 4) Write XSLT document to create HTML (online/book tutorial) 5) Publish a basic online version
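For the text-export path in step 3, the conversion to XML can be sketched in a few lines. This is an illustration under stated assumptions, not the procedure used for the Mocho dictionary: it assumes a backslash-marker text export in which `\lx` starts each record, and it treats `\ps` and `\ge` as part-of-speech and gloss fields. The function name `sfm_to_xml` and the sample entries are invented placeholders, not Mocho data.

```python
import xml.etree.ElementTree as ET


def sfm_to_xml(sfm_text, record_marker="lx"):
    """Turn backslash-marker text into a simple XML tree.

    Each line of the form '\\marker value' becomes <marker>value</marker>.
    A new <entry> starts whenever `record_marker` is seen.
    """
    root = ET.Element("dictionary")
    entry = None
    for line in sfm_text.splitlines():
        line = line.strip()
        if not line.startswith("\\"):
            continue  # this sketch skips blank and continuation lines
        marker, _, value = line[1:].partition(" ")
        if marker == record_marker:
            entry = ET.SubElement(root, "entry")
        if entry is None:
            continue  # ignore header fields before the first record
        field = ET.SubElement(entry, marker)
        field.text = value.strip()
    return root


# Invented placeholder records, for demonstration only.
SAMPLE = r"""
\lx kata
\ps n
\ge example

\lx pila
\ps v
\ge sample
"""

if __name__ == "__main__":
    tree = sfm_to_xml(SAMPLE)
    print(ET.tostring(tree, encoding="unicode"))
```

The sketch uses marker names directly as element names, which only works when every marker is a valid XML name. Real databases often carry collector-specific markers, which is one reason step 1 (working out the structure and conventions) comes first.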

17 (Word) transcriptions Word transcriptions > Excel > autoglossing tool > Excel dictionary > XML or presentation format 1) Interlinear text documents 2) Tool (VN) to import document to Excel data template 3) Visual Basic tool (VN) to autogloss morphemes ● from Excel dictionary (or other database) 4) Corpus tool 5) Export ● XML for archival format ● XSLT presentation formats
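The idea behind step 3 is a dictionary lookup per morpheme. The sketch below shows that idea in Python; it is not the Visual Basic tool mentioned on the slide, whose details are not given here. It assumes words are separated by spaces and morphemes by hyphens. The name `autogloss`, the lexicon contents, and the sample line are all invented for illustration.

```python
def autogloss(morpheme_line, lexicon, unknown="???"):
    """Gloss a hyphen-segmented line by looking up each morpheme.

    Morphemes missing from `lexicon` are marked with `unknown` so that
    a person can review them later.
    """
    glossed_words = []
    for word in morpheme_line.split():
        glosses = [lexicon.get(morpheme, unknown) for morpheme in word.split("-")]
        glossed_words.append("-".join(glosses))
    return " ".join(glossed_words)


# Invented placeholder lexicon; in practice this would be loaded from
# the Excel dictionary or another database.
LEXICON = {"ka": "1SG", "ta": "see", "pi": "dog", "la": "PL"}

if __name__ == "__main__":
    print(autogloss("ka-ta pi-la mo", LEXICON))  # prints: 1SG-see dog-PL ???
```

A one-to-one lookup cannot resolve homophonous morphemes, so output from a tool like this still needs checking by someone who knows the language.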

18 Don't forget Metadata ● What is metadata? – Documenting your documents ● Why metadata? – saves lots of trouble later ● Resources – IMDI – OLAC – Your local archivist

19 Minimal metadata These are enough to be getting on with, but always follow your archivist's recommendations – Language – Speaker – Time and place of recording – Collector – Transcriber – Software version and revisions – Transcription conventions / abbreviations
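One way to make sure none of these fields is forgotten is to keep them in a small structured record stored next to each file. The following is a sketch only: the class name `MinimalMetadata`, the field names, and the JSON layout are assumptions made for illustration, and an archive's own schema (for example IMDI or OLAC) should take precedence.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class MinimalMetadata:
    """One record per recording or document, mirroring the list above."""
    language: str
    speaker: str
    recording_time: str
    recording_place: str
    collector: str
    transcriber: str
    software_version: str
    transcription_conventions: str


def missing_fields(record):
    """Return the names of fields that were left empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name).strip()]


if __name__ == "__main__":
    # Placeholder values, for demonstration only.
    record = MinimalMetadata(
        language="(language name)",
        speaker="(speaker name)",
        recording_time="(date)",
        recording_place="(place)",
        collector="(collector name)",
        transcriber="",
        software_version="(program and version)",
        transcription_conventions="(abbreviations used)",
    )
    print(missing_fields(record))
    print(json.dumps(asdict(record), indent=2))
```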

20 (A bit of) What we have learned Save time. Hire a Genius (e.g. Vivian Ngai) – Initiative goes a long way – Knowing your end goal (desired end data format, best practices) makes the intermediate steps more focused – Ask. There are too many people to mention who have answered questions and suggested solutions

