Japanese Records and Whether or Not to Switch from MARC 8 to Unicode Storage (with an Innovative Interfaces Millennium local system)

Presentation transcript:

Japanese Records and Whether or Not to Switch from MARC 8 to Unicode Storage (with an Innovative Interfaces Millennium local system)
The University of Washington Law Library's Decision-making Process

Differences in Storage and/or Export Settings With Different Local Systems

Your Mileage May Vary
It's important to note that different local systems vary widely in whether and how data is stored, imported, and exported. These differences will have a huge impact on the experience of librarians making decisions on whether or not to export records in Unicode from OCLC to the local system.

Innovative Interfaces Millennium Local Systems
Do not allow import of records encoded differently than the encoding for storage. In other words, if III storage is set to Unicode, records must be imported from OCLC in Unicode. If storage is set to MARC 8, records must be imported in MARC 8.

Voyager Local Systems (CJK version)
Can be set to convert imported MARC 8 records to Unicode on the fly for storage. This makes the decision about exporting from OCLC Connexion in Unicode vs. MARC 8 less important (almost irrelevant).

Other Local Systems?
Local systems that store data in MARC 8 cannot import and display Unicode records unless they convert the records to MARC 8. Conversely, local systems storing data in Unicode cannot import MARC 8 records unless the data is converted to Unicode.

Ask these questions about your local system:
What encoding is used for storage?
Is there a required encoding for imported records?
If not, are imported records automatically converted to the appropriate encoding for storage?
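The question of which encoding an exported record uses can be checked on the record itself: in MARC 21, Leader position 09 is blank for MARC 8 and "a" for UCS/Unicode. The following is a minimal sketch of such a check, not a feature of Millennium, Voyager, or Connexion. The function names and the two sample leaders are made up for illustration.

```python
# Hypothetical helpers; only the Leader/09 convention comes from MARC 21.
MARC8 = "MARC-8"
UNICODE = "UCS/Unicode"


def record_encoding(leader: str) -> str:
    """Return the character coding scheme declared in Leader/09."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    flag = leader[9]
    if flag == " ":
        return MARC8
    if flag == "a":
        return UNICODE
    raise ValueError(f"unexpected Leader/09 value: {flag!r}")


def can_import_unconverted(leader: str, storage_encoding: str) -> bool:
    """True when the record already matches the local storage encoding."""
    return record_encoding(leader) == storage_encoding


# Made-up sample leaders that differ only in position 09.
marc8_leader = "00714cam  2200205 a 4500"
unicode_leader = "00714cam a2200205 a 4500"

print(record_encoding(marc8_leader))
print(can_import_unconverted(unicode_leader, MARC8))
```

A system that requires imports to match its storage encoding, as described above for Millennium, would in effect need `can_import_unconverted` to be true for every record in an export file.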

Our Library is trying to decide… To switch, or not to switch…

Innovative Interfaces Millennium System
OCLC Connexion
Japanese Records
Marian Gould Gallagher Law Library
MARC 8 OR Unicode Storage??

Unicode vs. MARC 8 Basics

Computers store text as numeric codes. Unicode has become the standard for text storage worldwide. Its use facilitates the storage, transfer, and display of text in a wide range of computer software environments (the internet, databases, browsers, word processors, etc.).

What is MARC 8?
MARC 8 has been the North American library community's text storage standard: "The group of 7/8-bit and 24-bit character sets used to encode MARC 21 records. These sets are specified in MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media, Character Sets, Part 1." (1)

What is Unicode?
Unicode has become the international standard for text storage: the Universal Character Set (UCS), which is an ISO standard, and its industry counterpart, Unicode. (1)

(1) Source: LC's MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS
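To make "text as numeric codes" concrete, here is a short sketch that lists the Unicode code points of a Japanese word and the bytes produced when it is stored as UTF-8. The sample word is chosen only for illustration and does not come from the presentation.

```python
# Sample text chosen for illustration only.
text = "日本語"

# Each character has one Unicode code point (a number).
code_points = [f"U+{ord(ch):04X}" for ch in text]

# An encoding such as UTF-8 turns those numbers into bytes for storage.
utf8_bytes = text.encode("utf-8")

print(code_points)
print(utf8_bytes.hex(" "))
print(len(text), "characters,", len(utf8_bytes), "bytes in UTF-8")
```

The code points stay the same in every Unicode environment. Only the byte-level encoding (UTF-8, UTF-16, and so on) varies.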

What Problems are Specific to Japanese?

Q: Do some problems associated with Unicode vs. MARC 8 storage affect one language (such as Japanese) more than others?

A: Not really. Problems with character display for specific languages are more often an issue of font availability. Each application must have access to a font that will display the proper characters. Arial Unicode MS can display most Unicode characters. In library records, an additional issue is converting between MARC 8 and Unicode. But these issues can affect many languages and scripts, not just Japanese.

What Problems are Specific to Japanese?

Q: So are there any Japanese-specific problems?

A: Not when it comes to Unicode storage itself. But there are common problems with display of Kanji and Japanese romanization in library catalogs. These are mainly font-availability issues, not Unicode storage issues.

Examples of Font-based Problems Specific to Japanese

Romanization (Diacritic Problem)
Alif (ʼ), as in konʼin

Kanji
Examples of Japanese Kanji not in EACC (Different Unicode Code Point Required for Verified Catalog Record in OCLC)
MARC 8/EACC: 說 (U+8AAA) instead of 説 (U+8AAC)
MARC 8/EACC: 虛 (U+865B) instead of 虚 (U+865A)
MARC 8/EACC: 卷 (U+5377) instead of 巻 (U+5DFB)
MARC 8/EACC: 錄 (U+9304) instead of 録 (U+9332)
MARC 8/EACC: 查 (U+67E5) instead of 査 (U+67FB)
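The five pairs above are distinct Unicode code points, so software treats the two forms as different characters even where they look alike on screen. The sketch below shows how text could be scanned for the EACC-era forms. The table holds only the five pairs listed on this slide, and the function names are hypothetical. Blind replacement would be wrong for Chinese-language records and for records that must keep the form verified in OCLC, so the rewrite function is illustrative only.

```python
# EACC-era code point -> Japanese form, limited to the five pairs listed above.
EACC_TO_JAPANESE = {
    "\u8AAA": "\u8AAC",
    "\u865B": "\u865A",
    "\u5377": "\u5DFB",
    "\u9304": "\u9332",
    "\u67E5": "\u67FB",
}


def find_substituted_kanji(text):
    """Return (position, found, japanese_form) for each listed EACC-era form."""
    return [
        (i, ch, EACC_TO_JAPANESE[ch])
        for i, ch in enumerate(text)
        if ch in EACC_TO_JAPANESE
    ]


def to_japanese_forms(text):
    """Replace the listed EACC-era forms with the Japanese forms (illustrative only)."""
    return text.translate(str.maketrans(EACC_TO_JAPANESE))


sample = "\u8AAA\u660E"  # U+8AAA followed by U+660E
for position, found, expected in find_substituted_kanji(sample):
    print(position, f"U+{ord(found):04X}", "->", f"U+{ord(expected):04X}")
print(to_japanese_forms(sample) == "\u8AAC\u660E")
```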

Why Switch to Unicode Storage?

Q: If there are no problems with MARC 8 storage specific to Japanese, then why should our library switch to Unicode storage?

A: Consider this quote from Microsoft: "Deciding whether to store non-DBCS [double-byte character set] data as Unicode is generally determined by an awareness of the effects on storage, and about how much sorting, conversion, and possible data corruption might happen during client interactions with the data... However, for most applications the effect is negligible. Databases with well-designed indexes are especially unlikely to be affected…"

Why Switch to Unicode Storage?

A: (continued) "Most of the time, the decision to store character data, even non-DBCS data, in Unicode should be based more on business needs instead of performance. In a global economy that is encouraged by rapid growth in Internet traffic, it is becoming more important than ever to support client computers that are running different locales. Additionally, it is becoming increasingly difficult to pick a single code page that supports all the characters required by a worldwide audience." (2)

(2) See the Microsoft article "Storage and Performance Effects of Unicode": http://msdn2.microsoft.com/en-us/library/ms[...].aspx
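The storage effect mentioned in the quote can be estimated with a few lines of code. This sketch compares byte counts for a romanized string and a kanji string in two common Unicode encodings. The sample strings are made up, and the numbers say nothing about how Millennium or any other local system actually stores records.

```python
def storage_sizes(text):
    """Return character count and byte counts in two Unicode encodings."""
    return {
        "characters": len(text),
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(text.encode("utf-16-le")),  # no byte-order mark
    }


# Made-up sample strings for illustration.
romanized = "Nihon no horitsu"
kanji = "日本の法律"

for label, text in (("romanized", romanized), ("kanji", kanji)):
    print(label, storage_sizes(text))
```

Plain romanized text is smaller in UTF-8, while kanji and kana are smaller in UTF-16. At the scale of a single catalog record the difference is small either way, which fits the "negligible for most applications" point in the quote.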

What are the Pros and Cons to Converting our Local System to Unicode Storage?

Advantages of Staying with MARC 8
May not be possible to back out of the switch to Unicode if problems crop up
Your records have no risk of being damaged
Could be faster than Unicode (but probably is not)
In a phrase: If it ain't broke, don't fix it!

Advantages of Switching to Unicode
Could enhance data exchange capabilities:
    Export/Import
    Copy/Paste between applications
    Network printing
Allows for display of your records in a wide variety of worldwide computing environments
May improve some long-standing problems with local system software (such as printing, display)
Supporting the international Unicode standard is one way of presenting your library catalog as a global resource
Nothing ventured, nothing gained!

In our library: Who decides whether to flip the switch to Unicode Storage?

The Head of Technical Services
Main contact with Innovative
Requests information about successes/problems at other libraries

East Asian Law Department
Responsible for Chinese, Japanese, and Korean records
Works together with Tech Services

[Diagram: OCLC Connexion; Gallagher Law Library Local System (an Innovative Interfaces, Inc. Millennium local system); MARC 8 Storage / Unicode Storage]

What will our library do?

Undetermined! Our library is still in the decision process
We're considering all of the information noted in this presentation
We will probably decide soon!

University of Washington Marian Gould Gallagher Law Library

What sources of information are there?

Your Local System Guides
Library of Congress Guides, such as LC's MARC 21 Specifications for Record Structure, Character Sets, and Exchange Media: CHARACTER SETS
OCLC CJK Help
Microsoft Guides, such as "Storage and Performance Effects of Unicode": http://msdn2.microsoft.com/en-us/library/ms[...].aspx
Unicode Consortium
OCLC CJK listserv
Eastlib listserv

Flipping the switch… is up to you and your library…

MARC 8 Storage / Unicode Storage