Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC Nick Thieberger Department.


1 Archives for communities of interest, the Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
Nick Thieberger, Department of Linguistics & Applied Linguistics, The University of Melbourne
PNC Conference, November 2005

2 Collaborative digital research resource set up by the University of Sydney, the University of Melbourne and the Australian National University in 2003 (the University of New England joined in 2004). 75% funding from the Australian Research Council Linkage Infrastructure and Equipment Fund Scheme (3 successful applications).

3 Communities of interest
- A group of linguists and musicologists recognised that large collections of recorded material were not being properly archived.
- The other parts of the community are speakers and their descendants.
- Shared needs in the current group, and need for training of new researchers.
- At least 3000 hours of analog fieldtapes
- New technologies have a steep learning curve
- Need for specialised assistance
- Applied for research funds to establish an archive

4 Communities of interest
- Safeguarding unique data with rights management
- Ability to store all metadata but to expose only parts of it to search engines
- All data subject to password access
- Locating the speakers recorded on legacy tapes is not possible, hence liaison with regional cultural centres and museums to act as clearinghouses for repatriated data.
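
The idea of storing all metadata while exposing only part of it can be sketched as a simple field filter. This is a minimal illustration under stated assumptions, not the catalogue's actual code: the field names, the `PUBLIC_FIELDS` set and the sample record are all hypothetical.

```python
# Hypothetical field names; the slides do not list the catalogue's schema.
PUBLIC_FIELDS = {"identifier", "title", "language", "country"}


def public_view(record: dict) -> dict:
    """Return only the fields considered safe to expose to search engines."""
    return {key: value for key, value in record.items() if key in PUBLIC_FIELDS}


# A made-up record: the full version stays inside the archive.
full_record = {
    "identifier": "EXAMPLE-001",
    "title": "Example field recording",
    "language": "Example language",
    "country": "Vanuatu",
    "speaker_name": "withheld",
    "access_conditions": "depositor approval required",
}

exposed = public_view(full_record)
```

Keeping the allow-list in one place means a new private field is hidden by default until someone decides to publish it.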

5 Communities of interest
- Collaboration across universities and disciplines
- Support from computing specialists (data grid, mass data store, programming) and government agencies (E-research, Australian Partnership for Sustainable Repositories, GrangeNet)
- International links - similar initiatives (OLAC/DELAMAN)
- Regional cultural centres and museums (targets for repatriation of digital recordings)
- International standards - Metadata (OLAC/OAI)
- All of this requires coordination or project management

6 To preserve and make accessible Australian researchers’ field recordings of endangered languages and musics from the Asia-Pacific together with other digital material related to cultures of the region (theses, wordlists, texts, etc)
- Preservation: to adopt world’s best practice standards and formats to maximise sustainability and future usability of the collection
- Access: to take advantage of emerging information and communication technologies to maximise access to our collection by both researchers and cultural heritage communities

7 Over 2000 of the world’s 6000 languages are in the Asia-Pacific region
- Number likely to fall to a few hundred by 2100 (UNESCO)
- Australian researchers active in region since 1950s - making unique recordings of unrepeatable events
- Recordings now themselves endangered (format obsolescence, media deterioration, loss of metadata)

8 2500 records in PARADISEC catalogue with data on 390 languages from 50 countries including: American Samoa, Australia, Bangladesh, Botswana, Cambodia, Chile, China, Cook Islands, Fiji, French Polynesia, Greenland, Hong Kong, Iceland, India, Indonesia, Israel, Italy, Japan, Kiribati, Republic Of Korea, Lao People’s Democratic Republic, Madagascar, Malaysia, Malta, Marshall Islands, Mexico, Federated States Of Micronesia, Myanmar, Nauru, Nepal, New Caledonia, New Zealand, Nigeria, Niue, Palau, Papua New Guinea, Philippines, Reunion, Samoa, Singapore, Solomon Islands, South Africa, Taiwan, Province of China, Thailand, Tonga, Uganda, United States of America, Vanuatu, Viet Nam, Wallis And Futuna (data as of September 2005)

9 Locating data in the collection
- Metadata complying with international standards
- Open Language Archives Community (OLAC)
- Geographic data entered via a map interface for later geographic querying
- Open Archives Initiative (OAI)
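
Geographic querying of the kind mentioned above is often done by storing a bounding box per item and testing for overlap with a query box. The sketch below assumes that approach; the slides do not say how the catalogue stores or queries its map data, and the item names and coordinates are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """A box in decimal degrees (west/east longitude, south/north latitude)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BoundingBox") -> bool:
        # Two boxes overlap unless one lies entirely to one side of the other.
        # Limitation: boxes that cross the 180 degree meridian are not handled.
        return not (
            self.east < other.west
            or other.east < self.west
            or self.north < other.south
            or other.north < self.south
        )


# Invented items and coordinates, for illustration only.
items = {
    "item-a": BoundingBox(166.0, -20.0, 170.0, -13.0),
    "item-b": BoundingBox(155.0, -12.0, 162.0, -5.0),
}
query = BoundingBox(165.0, -18.0, 168.0, -15.0)

hits = sorted(name for name, box in items.items() if box.intersects(query))
```

For a Pacific collection the meridian limitation matters in practice, so a real implementation would need to split or normalise boxes that span 180 degrees.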

10 Metadata Catalogue
- SQL/PHP password access
- OAI/DC compliant via the Open Language Archives Community
- Controlled vocabularies (language name, contributor role, data type, coverage, etc)
- Link to repository data stored at the Australian Partnership for Advanced Computing (APAC) in Canberra
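
As an illustration of what an OAI/DC-style record looks like, the following sketch serialises a few catalogue fields as unqualified Dublin Core inside an `oai_dc:dc` wrapper. The two namespace URIs are the standard ones; the helper `to_oai_dc` and the sample values are assumptions, and OLAC-specific extensions are left out.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"

# Use readable prefixes instead of the auto-generated ns0, ns1.
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(fields):
    """Serialise (element name, value) pairs as a Dublin Core record."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for name, value in fields:
        child = ET.SubElement(root, f"{{{DC}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Made-up values; "Sound" is a term from the DCMI Type vocabulary.
record_xml = to_oai_dc([
    ("identifier", "EXAMPLE-001"),
    ("title", "Example field recording"),
    ("type", "Sound"),
])
```

Taking a list of pairs rather than a dict allows repeated elements, which Dublin Core permits (for example several `dc:language` values for one item).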

11 Typical data
- Stephen Wurm’s several hundred tapes, including 120 Solomon Islands tapes from the 1970s and transcripts/fieldnotes
- Arthur Capell’s 114 tapes, Pacific and PNG 1950s (and 30 archive boxes of fieldnotes)
- Bert Voorhoeve’s 180 tapes - West Papua
- Tom Dutton’s 295 PNG tapes

12 Imaging fieldnotes
- To date over 10,000 pages of fieldnotes have been photographed and are being put online
- Crucial that links between fieldnotes and field recordings be maintained
- Aim to allow trusted users to build links between dynamic media and fieldnotes

13 Wurm collection, Solomon Islands, 1979. Digitised cassette tape with page image of transcript, and Wurm’s language map

14 Archival data
- Linking transcripts to media
- Creation of time-aligned data that acts as fine-grained metadata
- Searchable time-aligned media corpus
- Citation of primary media
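
A small sketch may help show why time-aligned data works as fine-grained metadata: once each stretch of transcript carries start and end times, a text search can return a citable location in the primary media. The `Segment` structure, the file name and the citation format are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    media_file: str  # hypothetical file name
    start: float     # seconds from the start of the recording
    end: float
    text: str


def search(segments, term):
    """Return the segments whose text contains the term, ignoring case."""
    needle = term.casefold()
    return [seg for seg in segments if needle in seg.text.casefold()]


def cite(segment):
    """Format a citation that points at a time span in the media file."""
    return f"{segment.media_file} [{segment.start:.1f}s-{segment.end:.1f}s]"


# Invented transcript content.
segments = [
    Segment("example-tape-A.wav", 0.0, 4.2, "We went to the garden"),
    Segment("example-tape-A.wav", 4.2, 9.8, "and planted yams"),
]

citations = [cite(seg) for seg in search(segments, "yam")]
```

Here `citations` holds one entry, pointing at the span from 4.2 to 9.8 seconds of the example file.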

15 Training, resources and advocacy
- Use of new technological approaches requires training, resources and advocacy
- Training in use of new tools
- Resources such as software, archiving, advice on tools and methods
- Advocacy of the benefits of these new approaches and tools and the reasons for engaging with them

16 Training, resources and advocacy
- Great need for training in particular expressed by postgraduate students
- Training is critical as tools are constantly emerging (recording techniques and equipment, software tools)

17 Training, resources and advocacy
- Methods for development of:
  - Time-aligned transcripts (in XML)
  - Interlinearised text
  - Dictionary production
- Crucial separation of content and form to allow well-formed archival data
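
The separation of content and form can be illustrated with a time-aligned transcript held as XML (content) and a plain-text view generated from it (form). The element and attribute names below are invented for this sketch and do not follow any particular tool's schema.

```python
import xml.etree.ElementTree as ET

# Content: structure only, with made-up element and attribute names.
TRANSCRIPT_XML = """
<transcript media="example-tape-A.wav">
  <segment start="0.0" end="4.2">We went to the garden</segment>
  <segment start="4.2" end="9.8">and planted yams</segment>
</transcript>
"""


def load_segments(xml_text):
    """Parse the XML into (start, end, text) tuples."""
    root = ET.fromstring(xml_text.strip())
    return [
        (float(seg.get("start")), float(seg.get("end")), seg.text or "")
        for seg in root.iter("segment")
    ]


def render_plain(segments):
    """Form: one possible presentation, derived entirely from the content."""
    return "\n".join(f"[{start:06.1f}] {text}" for start, _end, text in segments)


segments = load_segments(TRANSCRIPT_XML)
plain = render_plain(segments)
```

Because the layout lives only in `render_plain`, the same archival XML could later feed a web view or a printed text without being edited.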

18 Training, resources and advocacy
- Training in creation of archival sources by fieldworkers
- Naming conventions and persistent identification of data
- Metadata sets and tools
- Data formats: WAV, Text/XML, etc
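
A naming convention is easiest to enforce when it can be checked mechanically. The sketch below assumes a hypothetical pattern of the form COLLECTION-ITEM-PART.ext; the slides do not spell out the archive's actual convention, so the pattern, the permitted extensions and the sample names are illustrative only.

```python
import re

# Hypothetical convention for this sketch: COLLECTION-ITEM-PART.ext
NAME_PATTERN = re.compile(
    r"^(?P<collection>[A-Za-z0-9]+)"
    r"-(?P<item>[A-Za-z0-9]+)"
    r"-(?P<part>[A-Za-z0-9]+)"
    r"\.(?P<ext>wav|xml|txt)$"
)


def parse_name(filename):
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None


good = parse_name("EX1-001-A.wav")           # conforms to the pattern
bad = parse_name("tape 3 final (copy).wav")  # spaces and brackets are rejected
```

A name that parses cleanly doubles as an identifier: the collection and item parts can be looked up in the catalogue without opening the file.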

19 Community of interest, support and training

20 Training, resources and advocacy
- We have run training workshops in the use of appropriate linguistic tools for archival output (Toolbox, Transcriber etc)
- University campuses in Melbourne, Sydney, Brisbane, University of Hawai’i
- In community language centres in Melbourne, Kalgoorlie, Nambucca Heads and Sydney
- Batchelor Institute (Aboriginal training centre)

21 RNLD mailing list
131 subscribers (August 2005)
Searchable archive at LinguistList (only around 140 messages over 12 months)
Topics covered include:
- Digital audio/video recording equipment and tools
- Scanning images - management of photographs
- Shoebox/Toolbox issues
- Transcriber issues
- Consent forms
- Unicode and orthography issues

22 Global research community
DELAMAN archives (Digital Endangered Languages and Musics Archives Network):
- LACITO (Paris)
- ANLC (Alaska)
- EMELD (Michigan)
- AILLA (Texas)
- PARADISEC
- AMPM (Auckland)
- AIATSIS (Canberra)
- ELAR (London)
- DOBES (Netherlands)

23 We are cited as an exemplar using Digital Mass Storage Systems in the International Association of Sound and Audiovisual Archives (IASA) Guidelines on the Production and Preservation of Digital Audio Objects (IASA-TC04). Aarhus, Denmark: International Association of Sound and Audiovisual Archives (IASA), 2004, p. 51. "The Sub Committee on Technology of the Memory of the World Programme of UNESCO recommends these guidelines as best practice for Audio-Visual Archives."

24 Current size of collection
As at October 28th 2005: 12,160 files in the collection totaling 1.74 TB

File type   Files   Total size
.jpg        7791    10.91 GB
.mp3        2061    55.59 GB
.pdf          34    5.70 MB
.rtf           8    1.04 MB
.tif         171    848.57 MB
.txt           3    2.15 MB
.wav        2061    1.67 TB
.xml          31    1.20 MB
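
A per-type summary like the one on this slide can be produced by tallying files by extension. The sketch below is an assumption about how such a report might be generated, not the archive's own script; `summarise` is a hypothetical helper that takes (file name, size in bytes) pairs. The per-type counts from the slide are included only to check that they add up to the stated total.

```python
from collections import defaultdict
from pathlib import PurePath


def summarise(files):
    """Tally file counts and total bytes per lower-cased extension."""
    counts = defaultdict(int)
    sizes = defaultdict(int)
    for name, size in files:
        ext = PurePath(name).suffix.lower()
        counts[ext] += 1
        sizes[ext] += size
    return dict(counts), dict(sizes)


# Per-type counts as reported on the slide.
SLIDE_COUNTS = {
    ".jpg": 7791, ".mp3": 2061, ".pdf": 34, ".rtf": 8,
    ".tif": 171, ".txt": 3, ".wav": 2061, ".xml": 31,
}
slide_total = sum(SLIDE_COUNTS.values())  # 12160, matching the stated 12,160 files
```

The equal counts for `.wav` and `.mp3` (2061 each) fit a workflow in which every archival WAV file has a compressed MP3 access copy, although the slide does not state this.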

25 Staff
- Director - 2 days per week
- Project Manager - 1 day per week
- Admin - 4 days per week
- Audio Engineer - fulltime
- Occasional contract work
- i.e. just over 2 fulltime positions

26 Conclusion
- Successful cooperation between technical expertise and discipline-based practitioners
- Need for involvement of practitioners
- Lack of resources to curate such collections means training practitioners to produce well-formed data for longterm accessibility
- Change in practice to reflect new technological possibilities for creation of innovative research objects

27 Further information http://paradisec.org.au

