
1 Unicode 3.0.1 Mark Davis

2 Unicode 3.0 New 3.0 Characters
  Category                   V 2.1    V 3.0
  Alphabetics, Symbols       6,511   10,236
  CJK Ideographs            21,204   27,786
  Hangul Syllables          11,172   11,172
  Assigned characters       38,887   49,194
  Unassigned code values    18,134    7,827
  Synced with ISO/IEC 10646, 2nd edition

3 Unicode 3.0 New 3.0 Blocks
     80  Syriac
    192  Thaana
    128  Sinhala
    160  Myanmar
    384  Ethiopic
     96  Cherokee
    640  U.C. Ab. Syl.
     32  Ogham
     96  Runic
    128  Khmer
    176  Mongolian
    256  Braille
    128  CJK Rad. Sup.
    224  Kangxi Rad.
     16  Ideo. Desc.
     32  Bopomofo Ext.
  6,582  CJK Ideo. A
  1,168  Yi Syllables
     64  Yi Radicals

4 Unicode 3.0 Property Updates (1)
  Bidirectional properties
  Byte order mark
  Capital letters with iota adscript
  Case
  Combining classes
  Decompositions
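Several of the properties listed on this slide can be inspected from code. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `describe` and the sample characters are illustrative choices, and the module reports the Unicode version bundled with the interpreter, which is not necessarily 3.0.

```python
import unicodedata

# Assumption: values come from the Unicode version shipped with this
# Python build (see unicodedata.unidata_version), not from 3.0 itself.

def describe(ch):
    """Return a few character properties for a single character."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "bidi": unicodedata.bidirectional(ch),
        "combining": unicodedata.combining(ch),
        "decomposition": unicodedata.decomposition(ch),
        "mirrored": bool(unicodedata.mirrored(ch)),
    }

# Hebrew alef, combining acute accent, A with ring above, left parenthesis
for ch in ("\u05D0", "\u0301", "\u00C5", "("):
    print(f"U+{ord(ch):04X}", describe(ch))
```

The dictionary keys are hypothetical labels; only the `unicodedata` functions themselves are part of the standard library.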

5 Unicode 3.0 Property Updates (2)
  Identifier Syntax
  Layout controls
  Linebreak properties
  East-Asian width properties
  Misc. Characters: Figure Space, Tilde,…
  Ligature Control
  Unassigned Code Points

6 Unicode 3.0 Conformance
  Unicode Transformation Formats: UTF-16BE, UTF-16LE, UTF-16, UTF-8
  Unicode Bidirectional Behavior
  Other normative character property values
  Clause numbering maintained!
  Stability Policies
  Clarification of noncharacters
  Normalization Conformance Test
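The difference between the transformation formats named on this slide is easiest to see on one character. A small sketch, assuming Python's built-in codecs (`utf-16-be`, `utf-16-le`, `utf-16`, `utf-8`) as stand-ins for the formats; the sample character U+4E2D is an arbitrary choice.

```python
ch = "\u4E2D"  # a CJK ideograph that fits in one 16-bit code unit

utf16_be = ch.encode("utf-16-be")  # big-endian, no byte order mark
utf16_le = ch.encode("utf-16-le")  # little-endian, no byte order mark
utf16 = ch.encode("utf-16")        # byte order mark, then platform byte order
utf8 = ch.encode("utf-8")          # three bytes for this code point

for label, data in [
    ("UTF-16BE", utf16_be),
    ("UTF-16LE", utf16_le),
    ("UTF-16", utf16),
    ("UTF-8", utf8),
]:
    print(f"{label:9} {data.hex()}")

# Every form decodes back to the same character.
assert utf16.decode("utf-16") == ch
```

Only the unmarked `utf-16` form carries a byte order mark here; the BE and LE forms rely on the label to convey byte order.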

7 Unicode 3.0 Unicode Standard Annexes (UAX)
  Integral part of 3.0.1 Standard
  UAX #09: BIDI
  UAX #11: East Asian Width
  UAX #13: Newline Guidelines
  UAX #14: Line Breaking
  UAX #15: Normalization
  Included in any reference to version 3.0 or later
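Two of these annexes, normalization (UAX #15) and East Asian Width (UAX #11), have direct counterparts in Python's `unicodedata` module. The sketch below is illustrative only; the variable names and sample characters are assumptions, and results reflect the Unicode version bundled with the interpreter.

```python
import unicodedata

# UAX #15: canonical composition and decomposition
composed = unicodedata.normalize("NFC", "A\u030A")     # A + combining ring above
decomposed = unicodedata.normalize("NFD", "\u00C5")    # precomposed A with ring

# UAX #15: compatibility normalization folds the "fi" ligature
compat = unicodedata.normalize("NFKC", "\uFB01")

# UAX #11: East Asian Width classes for a few sample characters
widths = {
    ch: unicodedata.east_asian_width(ch)
    for ch in ("A", "\uFF21", "\u4E2D")  # Latin A, fullwidth A, CJK ideograph
}

print(repr(composed), repr(decomposed), repr(compat), widths)
```

Comparing strings only after bringing both to the same normalization form is the usual way to make canonically equivalent sequences compare equal.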

8 Unicode 3.0 Unicode Technical Standards (UTS)
  UTS #06: Compression
    – IANA name: SCSU
  UTS #10: Collation
    – Note: defined over all Unicode code points
    – Values will be updated soon for better ordering

9 Unicode 3.0 Technical Reports
  UTR #07: Language Tags
  UTR #16: UTF-EBCDIC
  UTR #17: Character Encoding Model
  UTR #18: Regular Expressions
  UTR #19: UTF-32
  UTR #21: Case Mappings
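As an illustration of why case mappings merit their own report, some mappings change the length of a string. The sketch below uses Python's built-in `str.upper` and `str.casefold`; the helper `caseless_equal` is a hypothetical name, and the behavior shown is that of the interpreter's bundled Unicode data rather than a statement about UTR #21 itself.

```python
def caseless_equal(a, b):
    """Compare two strings after case folding (hypothetical helper)."""
    return a.casefold() == b.casefold()

sharp_s = "\u00DF"      # LATIN SMALL LETTER SHARP S
fi_ligature = "\uFB01"  # LATIN SMALL LIGATURE FI

# Both characters expand to two characters when uppercased.
for ch in (sharp_s, fi_ligature):
    print(f"U+{ord(ch):04X} upper={ch.upper()!r} casefold={ch.casefold()!r}")

print(caseless_equal("Stra\u00DFe", "STRASSE"))
```

Code that assumes uppercasing preserves string length will mis-handle such characters, which is the practical reason to use full rather than single-character mappings.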

10 Unicode 3.0 Draft Technical Reports
  UTR #20: Unicode in XML…
  UTR #22: Character Mapping Tables
  UTR #24: Script Names
  Open for public comment

11 Unicode 3.0 Unicode Character Database
  More Documentation, More Data
    – UnicodeData
    – Blocks
    – ArabicShaping
    – Jamo
    – CompositionExclusions
    – SpecialCasing
    – EastAsianWidth
    – LineBreak
    – Unihan
    – BidiMirroring
    – CaseFolding
    – NormalizationTest
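The main UnicodeData file is a plain-text table with semicolon-separated fields, one character per line. The following is a rough sketch of a reader for that layout, assuming the classic 15-field format; the `Record` type, the `parse_line` helper, and the embedded sample lines are illustrative and cover only a subset of the fields.

```python
from typing import NamedTuple, Optional

# Sample lines in the style of UnicodeData.txt (illustrative excerpt).
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00C5;LATIN CAPITAL LETTER A WITH RING ABOVE;Lu;0;L;0041 030A;;;;N;LATIN CAPITAL LETTER A RING;;;00E5;
"""

class Record(NamedTuple):
    code_point: int
    name: str
    general_category: str
    combining_class: int
    bidi_class: str
    decomposition: str
    lowercase: Optional[int]

def parse_line(line):
    """Parse one semicolon-separated line into a Record (hypothetical helper)."""
    fields = line.rstrip("\n").split(";")
    if len(fields) != 15:
        raise ValueError(f"expected 15 fields, got {len(fields)}")
    return Record(
        code_point=int(fields[0], 16),
        name=fields[1],
        general_category=fields[2],
        combining_class=int(fields[3]),
        bidi_class=fields[4],
        decomposition=fields[5],
        lowercase=int(fields[13], 16) if fields[13] else None,
    )

records = [parse_line(line) for line in SAMPLE.splitlines() if line.strip()]
for rec in records:
    print(f"U+{rec.code_point:04X} {rec.name} -> {rec}")
```

A real reader would also need to handle the range entries (pairs of lines marking the first and last code point of a large block), which this sketch omits.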

12 Unicode 3.0 Website changes
  New Look & Feel
  New Navigation
  Enhanced FAQ
  Glossary
  What is Unicode?
  Where is my character?

13 Unicode 3.0 Beyond 3.0
  Characters
    – CJK characters, symbols, music systems, ancient scripts, extra characters, etc.
    – First allocated surrogate pairs
  Properties
    – essential for Unicode enablement
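Characters allocated beyond the 16-bit range are represented in UTF-16 as a surrogate pair. A minimal sketch of that arithmetic follows; the function names are hypothetical, and the sample code point U+1D11E is an arbitrary supplementary value chosen for illustration, not a claim about which characters a given version assigns.

```python
def to_surrogates(cp):
    """Split a supplementary code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in the supplementary range")
    offset = cp - 0x10000                    # 20-bit value
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low

def from_surrogates(high, low):
    """Recombine a surrogate pair into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x1D11E)
print(f"U+1D11E -> {high:04X} {low:04X}")

# Cross-check against Python's own UTF-16 encoder.
assert chr(0x1D11E).encode("utf-16-be") == bytes.fromhex(f"{high:04X}{low:04X}")
```

The 2,048 surrogate code values therefore address 1,024 × 1,024 additional code points.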

14 Unicode 3.0
  Major new version
  Over 10,000 new characters
  Enhanced character data for implementations
  Reorganized text for better reference
  The version for normalization
  Unicode Character Database 3.0.0
  Available now!

15 Unicode 3.0 Q & A

16 Unicode 3.0 Backup Slides

17 Unicode 3.0 ICU: Paid Advertisement
  Open Source Unicode Enablement Library
    – ICU: C/C++ and Java Versions
    – IBM Public License
    – Friday, 10:00 Helena Shih

18 Unicode 3.0 Enumerated Versions
  Unicode 1.0.0, Unicode 1.0.1
  Unicode 1.1.0, Unicode 1.1.5
  Unicode 2.0.0
  Unicode 2.1.2, Unicode 2.1.5, Unicode 2.1.8, Unicode 2.1.9
  Unicode 3.0.0

19 Unicode 3.0 Editorial Committee
  Joan Aliprand
  Julie Allen (editor)
  Joe Becker
  Mark Davis
  Asmus Freytag
  John Jenkins
  Mike Ksar
  Rick McGowan
  Lisa Moore
  Ken Whistler

20 Unicode 3.0 New Characters (2)
  Category                   V 2.1    V 3.0
  Private Use                6,400    6,400
  Surrogates                 2,048    2,048
  Controls                      65       65
  Not Characters                 2        2
  Assigned code values      47,402   57,709
  Unassigned code values    18,134    7,827
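The totals in the two character-count slides can be cross-checked: the assigned characters plus private use, surrogates, controls, and not-characters give the assigned code values, and assigned plus unassigned code values fill the 65,536-value 16-bit space. A short sketch of that arithmetic, with all figures copied from the slides and the variable names chosen for illustration:

```python
# Alphabetics/Symbols, CJK Ideographs, Hangul Syllables per version
graphic = {
    "V 2.1": [6511, 21204, 11172],
    "V 3.0": [10236, 27786, 11172],
}
# Private Use, Surrogates, Controls, Not Characters (same in both versions)
other = [6400, 2048, 65, 2]
unassigned = {"V 2.1": 18134, "V 3.0": 7827}

summary = {}
for version, counts in graphic.items():
    assigned_characters = sum(counts)
    assigned_values = assigned_characters + sum(other)
    total = assigned_values + unassigned[version]
    summary[version] = (assigned_characters, assigned_values, total)
    print(version, assigned_characters, assigned_values, total)

# Net growth in assigned characters between the two versions
growth = summary["V 3.0"][0] - summary["V 2.1"][0]
print("new characters:", growth)
```

The difference of 10,307 matches the "over 10,000 new characters" figure quoted in the summary slide.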

21 Unicode 3.0 Reference to Versions
  Open repertoire, but backwards compatible
  Characters only added, not removed
    – Two early exceptions: ISO sync. & Korean
  Don't overspecify the version:
    – Version 2.1.0 vs. Version 2.1 vs. Version 2 or later
  Includes Technical Reports!!

22 Unicode 3.0 Versions of the Standard
  major - significant additions
    – published as a book
  minor - character additions or more significant normative changes
    – published as a Technical Report
  update - any other changes
    – on the website in /standard/versions/
  Example: 2.1.9
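The major.minor.update scheme lends itself to simple tuple comparison when checking an "or later" requirement. The sketch below is one possible reading, not a rule from the standard: the function names are hypothetical, and treating a missing field as zero (so "2.1" means 2.1.0 as a minimum) is an assumption.

```python
def parse_version(text):
    """Parse 'major[.minor[.update]]' into a 3-tuple of ints.

    Assumption: omitted fields default to 0.
    """
    parts = [int(p) for p in text.split(".")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"not a major.minor.update version: {text!r}")
    parts += [0] * (3 - len(parts))
    return tuple(parts)

def satisfies(actual, minimum):
    """True if `actual` is the same as or later than `minimum`."""
    return parse_version(actual) >= parse_version(minimum)

major, minor, update = parse_version("2.1.9")
print(major, minor, update)
print(satisfies("3.0.1", "2.1"), satisfies("2.0.0", "2.1"))
```

Requiring "2.1 or later" rather than exactly "2.1.9" keeps a reference valid as update versions are published.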

23 Unicode 3.0
  Versioning
  Characters
  Properties
  Conformance
  Technical Reports
  Unicode Character Database
  Future

24 Unicode 3.0 Reorganized Text
  6: Punctuation
  7: European Alphabetics
  8: Middle Eastern
  9: South Asian
  10: East Asian
  11: Other (Mongolian, etc.)
  12: Symbols
  13: Formatting, Controls, Specials

25 Unicode 3.0 Additionally
  Shift-JIS Index
  Full Radical Stroke Index
    – CJK split in several blocks
  Improved Charts
    – Especially for CJK Ideographs
  Improved Implementation Guidelines
  General Clarifications

