Discussion on Chinese Domain Name technology including encoding, testing.


Clean 8-bit & UTF-8 problems
The escape rule for "\" must be clear. Ex. 成功 (in Big5, the second byte of 功 is "\").
Other characters are special in the UNIX shell. Ex. 教育 contains "|"; quoted as "教育" it works.

Clean 8-bit & UTF-8 problems
Windows 9X
  Ex. 統一企業 produces an error.
  "\" is inserted automatically in DNS but not in DHCP. Ex. ping 成功\大學
Windows 2K
  UTF-8 in the resolver (ping, ftp).
  Clean 8-bit in nslookup.
  Double encoding in IE5 and the resolver.

Transcoding between Windows client and server

Suggestion
For sub-domain names that mix Chinese and alphanumeric characters: if the sub-domain contains any 8-bit character, the whole sub-domain is case sensitive.

Suggestion
For example: 王.tw and 王.TW are the same.
For example: A王.tw and a王.tw are different.

Multi-lingual
The problem of multi-byte characters versus single-byte characters.
Multilingual text uses multi-byte characters.

Problem (1)
A multi-byte character can contain a byte that is identical to a single-byte ASCII code, and some intermediate software (Ex. BIND, sendmail, web proxies) cannot tell the two apart. This matters most for control characters ("\", "|", …).
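The collision in Problem (1) can be reproduced with a short Python sketch. It assumes the names are encoded in Big5 (one of the lead-byte encodings the slides mention); the helper name colliding_bytes is hypothetical and only for illustration.

```python
# Illustrative sketch, assuming Big5 encoding; not taken from the slides.
# ASCII bytes that intermediate software or a shell treats specially.
SPECIAL = {0x5C: "\\", 0x7C: "|"}

def colliding_bytes(text, encoding="big5"):
    """Return (character, special) pairs for every encoded byte that
    is identical to one of the ASCII special characters above."""
    hits = []
    for ch in text:
        for b in ch.encode(encoding):
            if b in SPECIAL:
                hits.append((ch, SPECIAL[b]))
    return hits

if __name__ == "__main__":
    for name in ("成功", "教育"):
        print(name, name.encode("big5").hex(), colliding_bytes(name))
```

A byte-oriented program that scans for "\" or "|" sees the trailing byte of 功 or 育 as that control character, which is the failure the slide describes.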

Solutions
Solution 1: Write each multi-byte character as \nnn\nnn.
Solution 2: Transform non-ASCII codes to UTF-8.
Solution 3: Transform all characters to pure ASCII code (UTF-7, UTF-5).
Solution 4: Clear byte stuffing with an escape-code rule ("\\", …).
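A minimal sketch of Solution 1 follows. It assumes that nnn is a three-digit decimal byte value, as in the \DDD escape of DNS master files (RFC 1035), and that the source encoding is Big5; both are assumptions, and escape_name / unescape_name are hypothetical names.

```python
# Sketch of Solution 1 under stated assumptions (decimal \nnn, Big5 input).

def escape_name(name, encoding="big5"):
    """Escape every byte of each multi-byte character (and a literal
    backslash) as \\nnn; leave other single-byte characters as-is."""
    parts = []
    for ch in name:
        data = ch.encode(encoding)
        if len(data) == 1 and ch != "\\":
            parts.append(ch)
        else:
            parts.extend("\\%03d" % b for b in data)
    return "".join(parts)

def unescape_name(text):
    """Reverse escape_name: turn each \\nnn back into one raw byte."""
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "\\":
            out.append(int(text[i + 1:i + 4]))
            i += 4
        else:
            out.append(ord(text[i]))
            i += 1
    return bytes(out)

if __name__ == "__main__":
    escaped = escape_name("成功.tw")
    print(escaped)
    print(unescape_name(escaped).decode("big5"))
```

The escaped form is pure ASCII, so the "\" byte inside 功 can no longer be confused with an escape character, at the cost of four ASCII characters per original byte.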

Problem (2)
All-alphanumeric domain names are case insensitive, but multi-byte characters are case sensitive.

Solutions
Solution 1: Convert alphanumeric characters to lower (or upper) case first. (client, iDNS, UTF-5)
Solution 2: Transform all multi-byte characters to UTF-8, so that every byte of a multi-byte character has its 8th bit set (a negative byte) and is easily recognized. (Win 2K DNS server)
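Solutions 1 and 2 can be combined in a few lines. The sketch below folds only A-Z to lower case and then encodes to UTF-8, where every byte of a non-ASCII character has the high bit set. The function names are hypothetical, and this is not a description of what iDNS or the Windows 2000 DNS server actually does internally.

```python
# Sketch: ASCII case folding followed by UTF-8 encoding.

def fold_ascii_case(name):
    """Lower-case only A-Z; leave every other character untouched."""
    return "".join(c.lower() if "A" <= c <= "Z" else c for c in name)

def to_wire(name):
    """Fold ASCII case, then encode the name as UTF-8 bytes."""
    return fold_ascii_case(name).encode("utf-8")

def multibyte_positions(wire):
    """Indexes of bytes with the 8th bit set. Read as a signed char,
    each of these bytes is negative."""
    return [i for i, b in enumerate(wire) if b & 0x80]

if __name__ == "__main__":
    wire = to_wire("A王.TW")
    print(wire, multibyte_positions(wire))
```

Because no byte of a UTF-8 multi-byte sequence falls in the ASCII range, lower-casing or scanning for "\" on the byte level cannot touch a Chinese character by accident.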

Solutions
Solution 3: If a sub-domain name contains at least one multi-byte character, then that sub-domain is case sensitive. (BIND server)
Ex.: 王.tw; "A王" is case sensitive.
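The comparison rule of Solution 3 might look like the sketch below. It works on raw bytes in a lead-byte encoding (Big5 is assumed in the example) and is not BIND's actual code; labels_equal and names_equal are hypothetical names.

```python
# Sketch of the Solution 3 rule: a label that contains any 8-bit byte is
# compared exactly; a pure ASCII label is compared case-insensitively.

def has_8bit(label):
    """True if any byte in the label has its 8th bit set."""
    return any(b & 0x80 for b in label)

def labels_equal(a, b):
    if has_8bit(a) or has_8bit(b):
        return a == b                  # case sensitive
    return a.lower() == b.lower()      # classic DNS behaviour

def names_equal(a, b):
    """Compare two dotted names label by label."""
    la, lb = a.split(b"."), b.split(b".")
    return len(la) == len(lb) and all(
        labels_equal(x, y) for x, y in zip(la, lb))

if __name__ == "__main__":
    wang = "王".encode("big5")
    print(names_equal(wang + b".tw", wang + b".TW"))
    print(names_equal(b"A" + wang + b".tw", b"a" + wang + b".tw"))
```

Treating the whole label as case sensitive avoids case-folding the trailing byte of a two-byte character, which in encodings such as Big5 can fall in the ASCII letter range.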

Why Solution 3 is needed
Clean 8-bit is possible.
Lead-byte encodings are already widely used (BIG5, GB, JIS, …).
Compression ratio and conversion efficiency?
An intermediate stage toward Unicode.