East Meets Rest Adding East Asian Scripts to Harvard’s ILS Prepared for presentation to the North American Aleph Users’ Group 2 June 2003 Charles Husbands,


East Meets Rest Adding East Asian Scripts to Harvard’s ILS Prepared for presentation to the North American Aleph Users’ Group 2 June 2003 Charles Husbands, HUL Office for Information Systems

A short history of HOLLIS (Harvard Online Library Information System)
- 1985: NOTIS-derived Acquisitions and Cataloging
- 1987: Circulation implementation begins
- 1988: OPAC implementation makes HOLLIS a real Integrated Library System (ILS)
- Ca. 1995: Thinking about next generation begins
- November 2000: Aleph contract signed
- July 2002: Aleph 15.2 installed as new ILS
- 2002: The name HOLLIS now encompasses Aleph ILS and other catalogs and electronic resources

Non-latin scripts at Harvard
- Pre-Aleph system could use only latin script data
- Aleph support priorities for HOLLIS
  1. CJK
  2. Arabic and Hebrew
  3. Cyrillic and Greek
- CJK first
  - Over 500,000 records
    - 60% Chinese
    - 25% Japanese
    - 15% Korean

Challenges
- Huge character repertoire
- Homonyms
- Other one-to-many issues
- Collating sequence
- Input method
- Display
- MARC management

Simplified and traditional forms and homonyms

Starting from Jerusalem and Beijing
- ExLibris’s “CJK” efforts as of Mar
  - Designed for Chinese sites
    - Automatic pinyination
    - Text “segmentation”
    - Chinese Windows required
    - Collation by pinyin
  - Inhospitable to Japanese or Korean
  - Not yet a mature product
  - Unicode-based: a big plus

Coming to Cambridge
- Harvard scholars’ requirements
  - Truly “CJK”
  - Search traditional & simplified Chinese together
  - Search in original script or romanization
  - Cross-language character search

Coming to Cambridge
- Other development issues
  - Word division
  - Facilitating staff use
  - Retagging 880 fields
  - MARC compatibility
  - Desktop requirements
  - Input methods
- Joint specification - Jan. to Oct
- Programming Oct to Nov plus

Results of word search development
- For word searches on CJK characters:
  - Adjacency implied automatically
  - Multilanguage results
    - Hence, no special indexes
  - One search retrieves both simplified and traditional forms
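A minimal sketch of how one search can retrieve both simplified and traditional forms with adjacency implied. It assumes that headings and queries are folded to a shared index form before comparison; the table `VARIANT_FOLD` and the functions `fold` and `matches` are hypothetical names for illustration, not part of Aleph, and a real variant table would be far larger.

```python
# Hypothetical variant table (illustration only): maps a traditional form
# to the simplified form used here as the shared index key.
VARIANT_FOLD = {
    "國": "国",
    "學": "学",
    "圖": "图",
    "書": "书",
    "館": "馆",
}


def fold(text: str) -> str:
    """Replace each character with its index form; others pass through."""
    return "".join(VARIANT_FOLD.get(ch, ch) for ch in text)


def matches(query: str, heading: str) -> bool:
    """True when the folded query occurs as an adjacent run in the heading."""
    return fold(query) in fold(heading)


# A simplified-form query finds a traditional-form heading, and vice versa.
found_traditional = matches("图书馆", "哈佛大學圖書館")
found_simplified = matches("圖書館", "哈佛大学图书馆")
```

Because the query is compared as one contiguous run of characters, the searcher does not need to know where a cataloger would have placed word boundaries.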

How come implied adjacency?
- Word division issues
  - Utilities’ practices differ
    - RLIN aggregates/segments
    - OCLC does not
  - Harvard chooses not to separate words
    - Reflects the written language
    - fix_doc_delete_chi_spaces
    - Great flexibility for searcher
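The effect of removing word-division spaces from RLIN-style data can be sketched as follows. This is not the code of fix_doc_delete_chi_spaces; it is an illustration that assumes only spaces standing between two Han ideographs (basic CJK Unified Ideographs block) are dropped. The function name `delete_han_spaces` is hypothetical.

```python
import re

# One Han ideograph from the basic CJK Unified Ideographs block.
# Assumption: extension blocks and other scripts are out of scope here.
_HAN = r"[\u4e00-\u9fff]"

# Whitespace that has a Han ideograph on both sides.
_SPACE_BETWEEN_HAN = re.compile(rf"(?<={_HAN})\s+(?={_HAN})")


def delete_han_spaces(text: str) -> str:
    """Drop spaces between Han ideographs; leave all other spacing alone."""
    return _SPACE_BETWEEN_HAN.sub("", text)


segmented = "中国 图书馆 学报"
unsegmented = delete_han_spaces(segmented)
```

Spaces next to latin-script text are left untouched, so romanized fields and mixed-script strings keep their word boundaries.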

Results of browse development
- For browses:
  - Language-specific indexes
    - Chinese
      - Pinyin order
      - subarranged by Unicode values
      - character by character
    - Japanese and Korean
      - By Unicode values
      - Less than ideal

On language-specific CJK browse
- Paradox
  - Other browses not language-specific
- Chinese
  - Like Asian Aleph installations
  - Original script to pinyin dictionary
  - Indexing by automatically-generated pinyin
  - Potentially different from cataloger-input
- Japanese and Korean
  - Analogous treatment in future?
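The Chinese browse order described above (pinyin order, subarranged by Unicode values, character by character) can be sketched as a sort key. The tiny `PINYIN` dictionary and the function `browse_key` are hypothetical stand-ins for the original-script-to-pinyin dictionary; the real dictionary and the actual Aleph filing routine are not shown in the slides.

```python
# Hypothetical original-script-to-pinyin dictionary (illustration only).
PINYIN = {
    "中": "zhong",
    "国": "guo",
    "國": "guo",
    "北": "bei",
    "京": "jing",
}


def browse_key(heading: str) -> tuple:
    """Build a key that compares character by character:
    first by generated pinyin, then by Unicode code point."""
    return tuple((PINYIN.get(ch, ""), ord(ch)) for ch in heading)


headings = ["中國", "北京", "中国"]
ordered = sorted(headings, key=browse_key)
```

Under these assumptions the simplified and traditional forms of a heading share the same pinyin and so file next to each other, but remain separate entries because their Unicode values differ.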

An aside
- HOLLIS language-specific browse for other non-latin scripts?
  - “Han”-based writing systems (CJK)
    - Huge repertoire
    - Many homonyms
    - Divergent sequencing principles
  - Alphabets and syllabaries
    - Small repertoire
    - Divergent sequences, but
    - More like latin-script languages, where English wins

Notes on CJK browsing
- When browsing in the HOLLIS Catalog:
  - CJK browse indexes
    - Enter search in the original script
  - CJK in main indexes
    - Enter search in romanized form
- In CJK browse indexes
  - Unicode values distinct for simplified & traditional
  - A mistake?

Browse index display

OPAC full record display

MARC21 compatibility issues: “alternative graphic representation”
- Paired fields from 880 and mate
  - Simpler index construction
  - Better display for catalogers
  - Maintained as a pair
- Subfield 9 in ex-880
  - Automatically generated
  - Contains a language code from 008 or 041
  - Can be overridden by cataloger
  - Only one subfield 9 allowed per pair
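The pairing idea can be sketched on a toy record. The sketch assumes fields are plain dictionaries, that an 880 field is retagged to the tag named in its subfield 6, and that the language code is taken from 008 positions 35-37 (the MARC21 bibliographic language position). The function names `language_from_008`, `link_of`, and `pair_fields` are hypothetical; this is not the Aleph implementation.

```python
def language_from_008(field_008: str) -> str:
    """MARC21 bibliographic 008 positions 35-37 hold the language code."""
    return field_008[35:38]


def link_of(field: dict) -> str:
    """Return the field's subfield 6 value, or '' when it has none."""
    for code, value in field["subfields"]:
        if code == "6":
            return value
    return ""


def pair_fields(fields: list, lang: str) -> list:
    """Retag each 880 to its linked tag, add subfield 9, and unite pairs."""
    paired = []
    for field in fields:
        link = link_of(field)
        if field["tag"] == "880" and link[:3].isdigit():
            # Subfield 6 in an 880 starts with the linked tag, e.g. "245-01/$1".
            field = {
                "tag": link[:3],
                "subfields": field["subfields"] + [("9", lang)],
            }
        paired.append(field)
    # Stable sort by tag, then by the occurrence number in subfield 6,
    # so the romanized field stays ahead of its original-script mate.
    return sorted(paired, key=lambda f: (f["tag"], link_of(f)[4:6]))


record_008 = "x" * 35 + "chi" + "  "  # placeholder 008 with language "chi"
record = [
    {"tag": "245", "subfields": [("6", "880-01"), ("a", "Zhongguo tu shu guan")]},
    {"tag": "260", "subfields": [("a", "Beijing")]},
    {"tag": "880", "subfields": [("6", "245-01/$1"), ("a", "中国图书馆")]},
]
result = pair_fields(record, language_from_008(record_008))
```

Only the ex-880 member of each pair receives subfield 9 in this sketch, which matches the one-subfield-9-per-pair rule on the slide.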

Paired fields in cataloger’s view

MARC21 compatibility issues: “alternative graphic representation”
- Typical p_manage_25 tab_fix group for importing CJK MARC21 records to Aleph

  fix_doc_delete_chi_spaces    modify RLIN-style data
  fix_doc_880                  retag fields
  fix_doc_sort                 rearrange fields by tag
  fix_doc_sort_sub6            subarrange to unite pairs
  fix_doc_marc21_spaces        “standard” blank replacement
  fix_doc_do_file_08 x.fix     other fussing as needed locally, e.g. delete unwanted fields

MARC21 compatibility issues: “alternative graphic representation”
- Exporting CJK MARC21 records from Aleph
  - Variant procedures required depending on the character encoding desired: UTF8 or MARC8.
  - Two new Ex Libris routines required for non-latin export are in hand but not yet tested.
  - A tab_fix group for p_print_03 will include

    fix_doc_redo_880      restore 880 fields
    fix_doc_create_066    only for MARC8 output; 066 not defined in UTF8 records

MARC21 compatibility issues: Character encoding
- MARC8 EACC and Unicode CJK
  - More variants encoded separately in EACC
- Harvard’s decision:
  - Go with Unicode
  - Modify Ex Libris CJK conversion table
    - Two EACC values can become one Unicode value
    - Imperfect reversibility
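Why a many-to-one conversion table cannot be reversed perfectly can be shown with a toy mapping. The source codes below are placeholders, not real EACC values, and the names `EACC_TO_UNICODE`, `invert`, and `round_trip` are hypothetical.

```python
# Placeholder source codes (NOT real EACC values): two variants that the
# conversion table maps to the same Unicode character, plus one other code.
EACC_TO_UNICODE = {
    "EACC-variant-1": "\u5b66",
    "EACC-variant-2": "\u5b66",
    "EACC-other": "\u4e66",
}


def invert(table: dict) -> dict:
    """Build the reverse table; the first source code seen wins."""
    reverse = {}
    for source, target in table.items():
        reverse.setdefault(target, source)
    return reverse


UNICODE_TO_EACC = invert(EACC_TO_UNICODE)


def round_trip(code: str) -> str:
    """Convert a source code to Unicode and back again."""
    return UNICODE_TO_EACC[EACC_TO_UNICODE[code]]


survives = round_trip("EACC-variant-1") == "EACC-variant-1"
lost = round_trip("EACC-variant-2") != "EACC-variant-2"
```

Once two source values share a Unicode value, the export direction has to pick one of them, so a record that entered with the second variant comes back out with the first.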

Harvard desktop requirements for CJK
- Staff client for CJK character input
  - Windows 2000 Professional
    - “Language setting for the system”: Japanese, Korean, Chinese traditional, Chinese simplified
    - “Input locales” as needed
  - MS Arial Unicode font
- Staff client for view-only CJK
  - Windows 2000 Professional or NT 4.0
  - A CJK enabler such as Unionway’s Asian Suite
  - MS Arial Unicode font

Harvard desktop requirements for CJK
- Web browser/OPAC for all users
  - Windows 2000 or NT 4.0
  - Internet Explorer 5.01 or higher
  - MS Arial Unicode font
  - For NT
    - IE Language packs: Chinese simplified, Chinese traditional, Japanese, Korean
  - For 2000
    - “Language setting for the system”: Japanese, Korean, Chinese traditional, Chinese simplified
    - “Input locales” as needed

Things as they are today
- CJK added to 0.5 million existing records
- In production
  - Cataloging
  - OCLC XPO
  - RLIN PUT
- In testing
  - OCLC batch record import
  - Export of MARC records