
CJK Character Validation – Impact from EACC to Unicode Migration 2006 CEAL Conference Committee on Technical Processing Ai-lin Yang East Asian Library,


1 CJK Character Validation – Impact from EACC to Unicode Migration
2006 CEAL Conference, Committee on Technical Processing
Ai-lin Yang, East Asian Library, UC Berkeley
April 5, 2006

2 EACC/MARC21 and Unicode
East Asian Character Code (EACC) is the MARC-8 CJK character set in MARC21
Migration to Unicode:
  Library of Congress database
  RLG's Union Catalog database
  OCLC's WorldCat database
CJK bibliographic records are restricted to EACC characters

3 Microsoft IME Variants
Non-MARC21 characters
  Duplicate CJK characters (e.g., U+F937 and U+8DEF, both rendered as 路)
  Close variants (e.g., U+6B65 歩 and U+6B69 步)
  Typically one of these variants is a MARC21 character
CJK character validation errors in OCLC
  OCLC XWC (Extended WorldCat), held in an Oracle database, is built on Unicode
  OCLC online cataloging follows MARC21 standards
  CJK scripts are input using Microsoft Global Input Method Editors (IMEs)
  Non-MARC21 characters cause CJK character validation errors
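The two kinds of variant on this slide behave differently under Unicode normalization, which the following minimal Python sketch illustrates. It is not part of the original presentation and is not how OCLC validates records; the helper names `unify` and `is_duplicate` are made up for illustration. It relies only on the standard `unicodedata` module: a duplicate such as U+F937 has a canonical mapping to its unified ideograph, while close variants such as U+6B65 and U+6B69 are separate characters that normalization leaves alone.

```python
import unicodedata

def unify(ch: str) -> str:
    """Apply NFC normalization; duplicate (compatibility) ideographs
    collapse to their unified code point, other characters are unchanged."""
    return unicodedata.normalize("NFC", ch)

def is_duplicate(ch: str) -> bool:
    """True if normalization maps ch to a different character."""
    return unify(ch) != ch

# Duplicate pair from the slide: U+F937 normalizes to U+8DEF.
dup = "\uF937"
print(f"U+{ord(dup):04X} -> U+{ord(unify(dup)):04X}")

# Close variants from the slide: normalization does not merge them,
# so they need a separate lookup (for example the Unihan variant data).
for ch in ("\u6B65", "\u6B69"):
    print(f"U+{ord(ch):04X} duplicate? {is_duplicate(ch)}")
```

Normalization therefore helps only with the duplicate case; deciding which of two close variants is the MARC21 character still requires a lookup table.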

4 OCLC Connexion / IME Online Cataloging Examples
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 2 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: (traditional )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )

5 OCLC Connexion / IME Online Cataloging Examples
Title: (simplified )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
(traditional )
  245 (non-Latin) occurrence 1, $a occurrence 1, position 1 - invalid character - data must be valid non-Latin characters
  Valid when changed to: (traditional )
Title: can be found only in the traditional list; this character does not exist in the simplified list
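The error messages in these examples report the position of each offending character within the subfield. A minimal Python sketch of that kind of check follows. It is an illustration only, not OCLC's implementation: the function name `invalid_positions` is made up, the one-character `ALLOWED` set is a placeholder rather than the real MARC21 repertoire, and the 1-based position numbering is an assumption based on the messages shown above.

```python
def invalid_positions(text: str, allowed: set) -> list:
    """Return the 1-based positions of characters not in the allowed set."""
    return [pos for pos, ch in enumerate(text, start=1) if ch not in allowed]

# Placeholder repertoire for illustration only; the real set of
# MARC21 (EACC) characters is far larger and is not reproduced here.
ALLOWED = {"\u8DEF"}

# A two-character subfield whose second character is the duplicate U+F937.
subfield_a = "\u8DEF\uF937"

for pos in invalid_positions(subfield_a, ALLOWED):
    print(f"245 (non-Latin) occurrence 1, $a occurrence 1, "
          f"position {pos} - invalid character")
```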

6 Solutions
Unihan Database
CJK Compatibility Database
OCLC CJK E-dictionary

7 Unihan Database
Unihan database index
  Unihan grid index
  Unihan radical-stroke index
Unihan database information
  (I) Several different glyphs for the character
  (N) Different representations of the character's scalar value
  (N) Mappings to the IRG sources for the character
  (I) Mappings to major industrial and national standards and other character collections
  (N) Positions in the four dictionaries used by the IRG
  (I) Positions in other commonly-used dictionaries
  (I) Radical-stroke counts as derived from different sources
  (I) Phonetic data derived from various sources
  (I) Other dictionary data
  (I) Variants (with links to the variant forms)
  Compounds containing the character
  (I) Other information contained in the Unihan database

8 Unihan Database Search (U+6237)

9 Unihan Database Search (U+6236)

10 CJK Compatibility Database
Replace a non-MARC21 character with its MARC21 equivalent
Steps for using the CJK Compatibility Database:
  1) Copy the invalid character from your bibliographic record
  2) Open the CJK Compatibility Page
  3) Paste the invalid character in the white box and use the index "Invalid character"
  4) Click "Submit"
  5) Copy and paste the valid alternative into your bibliographic record
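The copy, look up, and paste steps above amount to a table-driven character substitution. The Python sketch below shows the idea under stated assumptions: the `REPLACEMENTS` table and the function `replace_invalid` are hypothetical stand-ins, not the contents or interface of the CJK Compatibility Database, and the single entry is the duplicate pair U+F937 and U+8DEF mentioned earlier in the presentation.

```python
# Hypothetical stand-in for the database: invalid character -> valid alternative.
REPLACEMENTS = {
    "\uF937": "\u8DEF",  # duplicate ideograph -> unified ideograph
}

def replace_invalid(text: str, table: dict) -> str:
    """Replace every character found in the table; leave the rest untouched."""
    return text.translate(str.maketrans(table))

record_title = "\uF937 (road)"
fixed = replace_invalid(record_title, REPLACEMENTS)
print(" ".join(f"U+{ord(ch):04X}" for ch in fixed[:1]))
```

A batch script along these lines would still need a human check for close variants, since the right replacement can depend on the language of the record.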

11 CJK Compatibility Database Search

12 OCLC CJK E-Dictionary

13 OCLC CJK E-Dictionary Search

14

15 CJK Character Validation Thank you!

