International Domain Name TWNIC Nai-Wen Hsu

1 International Domain Name TWNIC Nai-Wen Hsu snw@twnic.net.tw

2 Domain name (RFC 1035): A label cannot be longer than 63 characters. A domain name cannot be longer than 255 characters. Maximum labels: 127. Only a-z, 0-9, and '-' are accepted in a domain name: a limited ASCII repertoire of 37 LDH (Letter-Digit-Hyphen) code points.
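The limits on this slide can be checked mechanically. The following is a minimal sketch, not part of the original slides: the function name `is_ldh_hostname` is hypothetical, the 255 limit is counted in wire-format octets (one length octet per label plus the root label), and rejecting a leading or trailing hyphen is an added assumption.

```python
import string

# The 37 LDH code points: 26 letters, 10 digits, and the hyphen.
LDH = set(string.ascii_lowercase + string.digits + "-")
MAX_LABEL_LEN = 63
MAX_WIRE_LEN = 255


def is_ldh_hostname(name: str) -> bool:
    """Return True if `name` fits the LDH rule and the length limits above."""
    if name.endswith("."):
        name = name[:-1]  # tolerate a single trailing root dot
    labels = name.lower().split(".")
    for label in labels:
        if not 1 <= len(label) <= MAX_LABEL_LEN:
            return False
        if any(ch not in LDH for ch in label):
            return False
        # Assumption: a label may not start or end with a hyphen.
        if label.startswith("-") or label.endswith("-"):
            return False
    # Wire format: one length octet per label, plus the empty root label.
    wire_len = sum(len(label) + 1 for label in labels) + 1
    return wire_len <= MAX_WIRE_LEN


print(is_ldh_hostname("twnic.net.tw"))
print(is_ldh_hostname("bücher.example"))
```

Under this counting, 127 one-character labels fill exactly 255 octets, which is where the "Maximum labels: 127" figure comes from.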

3 International Domain Name: The IETF IDN WG adopted Unicode 3.2: Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, … 95,156 characters

4 International Domain Name samples: レコード会社.jp, gwmöbler.com, 慎昌鐘錶.tw, 阿克苏诺贝尔油漆公司.cn, 소프트웨어.kr, ישראל.קום

5 IETF IDN Standards: IDNA (RFC 3490), Internationalizing Domain Names in Applications; NAMEPREP (RFC 3491), A Stringprep Profile for Internationalized Domain Names; PUNYCODE (RFC 3492), A Bootstring encoding of Unicode for Internationalized Domain Names in Applications; STRINGPREP (RFC 3454), Preparation of Internationalized Strings

6 IDNA components and interfaces. [Figure: User ↔ IDNA-aware application (input and display: local interface methods such as pen, keyboard, ...; ToASCII and ToUnicode operations may be called here). The application calls the resolver (call to resolver: ACE), which talks to the DNS servers (DNS protocol: ACE). The application also talks to application servers (application-specific protocol: ACE, unless the protocol is updated to handle other encodings). User, application, and resolver make up the end system.] "Application" is where the application splits a host name into labels, sets the appropriate flags, and performs the ToASCII and ToUnicode operations. ACE example: xn--de-jg4avhby1noc0d
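The per-label ToASCII and ToUnicode step performed by the application can be sketched with Python's standard `encodings.idna` module, which implements the RFC 3490 operations. The helper names `host_to_ascii` and `host_to_unicode` are hypothetical, and the sketch assumes labels are separated only by U+002E and that the name has no trailing dot.

```python
from encodings import idna


def host_to_ascii(host: str) -> str:
    """Split a host name into labels and apply ToASCII to each one."""
    labels = host.split(".")
    return ".".join(idna.ToASCII(label).decode("ascii") for label in labels)


def host_to_unicode(host: str) -> str:
    """Split a host name into labels and apply ToUnicode to each one."""
    labels = host.split(".")
    return ".".join(idna.ToUnicode(label) for label in labels)


ace = host_to_ascii("bücher.example")
print(ace)                   # the form handed to the resolver
print(host_to_unicode(ace))  # the form shown to the user
```

Only the ACE form crosses the resolver and DNS protocol boundaries; the Unicode form stays at the user interface.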

7 IDNA Structure. [Figure: User input (Unicode) → NAMEPREP (mapping, normalization, prohibited-output check; built on STRINGPREP) → ACE (Punycode) → to resolver. IDNA operations: ToASCII, ToUnicode.] Nameprep: A Stringprep Profile for Internationalized Domain Names

8 NAMEPREP: A Stringprep Profile for Internationalized Domain Names. Mapping: Stringprep tables B.1, B.2. Normalization: Form KC. Prohibited output: Stringprep tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
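The three steps listed on this slide can be sketched with Python's standard `stringprep` module, which exposes the RFC 3454 tables, and the Unicode 3.2 database in `unicodedata.ucd_3_2_0`. This is an illustration only: `nameprep_sketch` is a hypothetical name, and the bidirectional-text check that full NAMEPREP also performs is left out.

```python
import stringprep
import unicodedata

# Prohibited-output tables named on the slide (C.1.2, C.2.2, C.3 to C.9).
PROHIBITED_TABLES = (
    stringprep.in_table_c12,
    stringprep.in_table_c22,
    stringprep.in_table_c3,
    stringprep.in_table_c4,
    stringprep.in_table_c5,
    stringprep.in_table_c6,
    stringprep.in_table_c7,
    stringprep.in_table_c8,
    stringprep.in_table_c9,
)


def nameprep_sketch(label: str) -> str:
    """Apply mapping, NFKC normalization, and the prohibited-output check."""
    # Step 1: mapping. Drop B.1 characters, case-fold the rest with B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )
    # Step 2: normalization form KC, using Unicode 3.2 data.
    normalized = unicodedata.ucd_3_2_0.normalize("NFKC", mapped)
    # Step 3: reject any code point found in a prohibited table.
    for ch in normalized:
        if any(in_table(ch) for in_table in PROHIBITED_TABLES):
            raise ValueError(f"prohibited code point U+{ord(ch):04X}")
    return normalized


print(nameprep_sketch("TWNIC"))
print(nameprep_sketch("\u3371"))
```

For production use, `encodings.idna.nameprep` in the standard library performs the same steps together with the bidirectional check.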

9 NAMEPREP -- Mapping. Commonly mapped to nothing: 27. Mapping for case-folding used with NFKC: 1371. Ex: A → a (U+0041 → U+0061); Ϋ → ϋ (U+03AB → U+03CB); ㍱ → hpa (U+3371 → U+0068 U+0070 U+0061)

10 NAMEPREP -- Normalization Unicode normalization with form KC

11 NAMEPREP -- Normalization: ‘u’ + combining diaeresis ‘¨’ → ‘ü’; ‘ a ’ → ‘ a ’
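Normalization form KC both composes combining sequences and folds compatibility characters. A short sketch with Python's `unicodedata` module follows; the fullwidth letter U+FF41 is an illustrative choice made here, not a character taken from the slide.

```python
import unicodedata

# Composition: 'u' followed by U+0308 COMBINING DIAERESIS becomes one code point.
composed = unicodedata.normalize("NFKC", "u\u0308")

# Compatibility folding: U+FF41 FULLWIDTH LATIN SMALL LETTER A becomes plain 'a'.
folded = unicodedata.normalize("NFKC", "\uFF41")

print([f"U+{ord(ch):04X}" for ch in composed])
print([f"U+{ord(ch):04X}" for ch in folded])
```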

12 NAMEPREP – Prohibited output. Non-ASCII space characters: 17, Ex: U+00A0 (NO-BREAK SPACE). Non-ASCII control characters: 54, Ex: U+0090 (DEVICE CONTROL STRING). Private use: 133371. Non-character code points: 49. Surrogate codes: 2048

13 NAMEPREP – Prohibited output. Inappropriate for plain text: 4. Inappropriate for canonical representation: 12. Change display properties or are deprecated: 13. Tagging characters: 97

14 PUNYCODE: A Bootstring encoding of Unicode for IDNA. One kind of ACE (ASCII Compatible Encoding). Translates non-ASCII characters to ASCII characters. Prefix: xn--. Ex: 慎昌鐘錶.tw → xn--ciun9hb52c2za.tw
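Python ships a `punycode` codec, so the encoding and the `xn--` prefix can be tried directly. The sketch below uses hypothetical helper names `to_ace` and `from_ace` and deliberately skips NAMEPREP, so it shows only the Punycode step, not the full ToASCII operation.

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Punycode-encode a label and add the ACE prefix (no NAMEPREP applied)."""
    if label.isascii():
        return label  # pure ASCII labels are left alone
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(label: str) -> str:
    """Reverse of to_ace for labels that carry the ACE prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


for name in ("bücher", "慎昌鐘錶"):
    ace = to_ace(name)
    print(name, "->", ace, "->", from_ace(ace))
```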

15 Insufficiency of the IDN standard: The current IDN standards (IDNA, NAMEPREP, PUNYCODE) cannot meet Chinese domain name requirements. Traditional/Simplified Chinese mapping, Ex: 台 and 臺. Writing-variant mapping, Ex: 峰 and 峯

17 Insufficiency of the IDN standard: Characters with the same meaning are written differently in different countries. In China: 劝 (529D). In Japan: 勧 (52E7). In Taiwan: 勸 (52F8)
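That NAMEPREP does not unify these three characters can be checked with the standard library. A minimal sketch, assuming the code points given on the slide; the dictionary keys are labels chosen here for illustration.

```python
from encodings import idna

# Code points as listed on the slide.
same_meaning = {
    "China": chr(0x529D),
    "Japan": chr(0x52E7),
    "Taiwan": chr(0x52F8),
}

# NAMEPREP leaves each character unchanged, so no folding takes place.
prepared = {region: idna.nameprep(ch) for region, ch in same_meaning.items()}

# Each one therefore yields its own ACE label, i.e. a separate domain name.
ace_labels = {region: idna.ToASCII(ch) for region, ch in same_meaning.items()}

for region, ch in same_meaning.items():
    print(region, f"U+{ord(ch):04X}", ace_labels[region].decode("ascii"))
```

Three distinct ACE labels mean three unrelated names as far as the DNS is concerned, which is the gap the registration policy on the following slides addresses.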

18 IDN administration guideline: A registration policy to solve the problems listed above. Every language has a variant table with 3 fields: valid code point, recommended variant, character variant

19 Variant Table sample. Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks. Entries: 丁 (4E01) Singular-relation character (1) 丄 (4E04) 上 (4E0A) 丄 (4E04) 上 (4E0A) Pair-relation characters (2.1) 上 (4E0A) 丄 (4E04) 上 (4E0A) 万 (4E07) 萬 (842C) Pair-relation characters (2.2) 萬 (842C) 万 (4E07) 萬 (842C)

20 Variant Table sample. Columns: Valid code point (VCP) | Recommended variants by .tw (twRV) | Recommended variants by .cn (cnRV) | Character Variant(s) (CV) | Remarks. Entries: 叶 (53F6) 葉 (8449) 叶 (53F6) 葉 (8449) Pair-relation characters (2.3) 葉 (8449) 叶 (53F6) 葉 (8449) 个 (4E2A) 個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87) Multiple-relation characters 個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87) 箇 (7B87) 個 (500B) 个 (4E2A) 个 (4E2A) 個 (500B) 箇 (7B87)

21 Variant Table. Singular-relation character (VCP=twRV=cnRV=CV): 13888 (66.4%). VCP=twRV≠cnRV: 2783 (13.3%). VCP=cnRV≠twRV: 2453 (11.7%). VCP≠(twRV=cnRV): 333 (1.6%). VCP≠twRV≠SCR: 387 (1.9%)

22 Variant Table. Number of character variant(s): number of characters (share). 1: 13888 (66.4%); 2: 5156 (24.7%); 3: 1158 (5.5%); 4: 424 (2.0%); 5: 165 (0.79%); 6: 60 (0.29%); 7: 35 (0.17%); 8: 16 (0.08%)
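The percentages on this slide follow from the counts. A quick arithmetic check, using only the figures given above:

```python
# Number of character variants -> number of characters, as listed on the slide.
counts = {1: 13888, 2: 5156, 3: 1158, 4: 424, 5: 165, 6: 60, 7: 35, 8: 16}

total = sum(counts.values())
shares = {n: 100 * count / total for n, count in counts.items()}

print("total characters in the table:", total)
for n, share in shares.items():
    print(f"{n} variant(s): {counts[n]:>6} ({share:.2f}%)")
```

The counts add up to 20,902 characters in the table.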

23 Variant Table: The table draft has been prepared by the CCMT Task Force, organized by TWNIC, since January 2002. The task force has 9 members: linguists, computer experts, and DNS experts. The draft has been submitted to the Bureau of Standards, Ministry of Economic Affairs, for final review.

24 Registration procedure: A registrant should select the language(s). The registry should activate the requested domain name(s) and reserve the equivalent name(s) within the language-based character set. The registrant can request activation of the reserved equivalent domain name(s) at any time

25 Registration sample: A user selects the zh-tw and zh-cn languages with the domain name 丁上萬.com. Result: 丁上萬.com (recommended variant for zh-tw); 丁上万.com (recommended variant for zh-cn); 丁丄万.com (character variant); 丁丄萬.com (character variant)
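The sample above can be reproduced from a small variant table. The sketch below is an assumption-laden illustration: the `VARIANT_TABLE` layout and the function `registration_bundle` are hypothetical, and the three table rows are inferred from the registration sample rather than copied from the official table.

```python
from itertools import product

# Hypothetical in-memory variant table: one row per valid code point.
VARIANT_TABLE = {
    "丁": {"tw": "丁", "cn": "丁", "cv": ["丁"]},
    "上": {"tw": "上", "cn": "上", "cv": ["丄", "上"]},
    "萬": {"tw": "萬", "cn": "万", "cv": ["万", "萬"]},
}


def registration_bundle(label: str) -> dict:
    """Return the recommended variants and the remaining character variants."""
    rows = [VARIANT_TABLE[ch] for ch in label]  # KeyError if a character is not valid
    tw_form = "".join(row["tw"] for row in rows)
    cn_form = "".join(row["cn"] for row in rows)
    # Every combination of per-character variants is an equivalent label.
    all_forms = {"".join(combo) for combo in product(*(row["cv"] for row in rows))}
    reserved = sorted(all_forms - {tw_form, cn_form})
    return {"zh-tw": tw_form, "zh-cn": cn_form, "variants": reserved}


bundle = registration_bundle("丁上萬")
print(bundle["zh-tw"] + ".com", "(recommended variant for zh-tw)")
print(bundle["zh-cn"] + ".com", "(recommended variant for zh-cn)")
for form in bundle["variants"]:
    print(form + ".com", "(character variant)")
```

Taking the Cartesian product of the per-character variants keeps the sketch short, but the number of equivalent labels grows multiplicatively with label length, which is one reason a registry might reserve the equivalents rather than activate them all.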

26 Q & A

