Unicode/IDN Security Mark Davis President, Unicode Consortium Chief SW Globalization Arch., IBM.

Slides:



Advertisements
Similar presentations
Unicode from a distance…
Advertisements

Unicode Security Mark Davis. The Unicode Consortium Software globalization standards: define properties and behavior for every character in every script.
New in Unicode Mark Davis, John Jenkins. Agenda Unicode UCA Regular Expressions Security Considerations Character Mapping Common Locale Data.
Globalization Gotchas
Whats New in Globalization Mark Davis. Unicode Character Database: UCD 5.0 Schedule Currently in β2 Due June, 2006 Major part of the Unicode Standard.
Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect
Unicode Normalization Mark Davis
Mark Davis President, Unicode Consortium
Unicode Mark Davis Unicode Consortium President IBM Chief SW Globalization Architect.
CSCI N241: Fundamentals of Web Design Copyright ©2004 Department of Computer & Information Science Introducing XHTML: Module B: HTML to XHTML.
Internationalizing WHOIS Preliminary Approaches for Discussion Internationalized Registration Data Working Group ICANN Meeting, Brussels, Belgium Jeremy.
ICANN Rio Meeting IDN Authorization for TLDs with ICANN agreements 26 March, 2003 Andrew McLaughlin.
Internationalized Domain Names Introduction & Update MENOG 1 Bahrain April 3-5, 2007 By: Baher Esmat Middle East Liaison.
Internationalized Domain Names Tutorial Material December 2006 Tina Dam IDN Program Director ICANN
IDN Variant Issues Project (VIP) Project Update and Next Steps 11 April 2012.
DC Architecture WG meeting Monday Sept 12 Slot 1: Slot 2: Location: Seminar Room 4.1.E01.
Worldwide typography (and how to apply JIS-X to Unicode) Michel Suignard Microsoft Corporation.
Global Registry Services 1 INTERNATIONALIZED Domain Names Testbed presented to ITU/WIPO Joint Symposium Geneva 6-7 Dec An Overview On VeriSign Global.
1 Long term changes to P3P Long Term Future of P3P Workshop Giles Hogben Joint Research Centre European Commission.
4. Internet Programming ENG224 INFORMATION TECHNOLOGY – Part I
1 Character Conversions and Mapping Tables Presented By: Markus Scherer George Rhoten Raghuram (Ram) Viswanadha.
Office Links - Sharing Data in Microsoft Office A Mixed Bag of Treasures Chester N. Barkan Registrar Long Island University, C.W.Post Campus.
Internationalization Status and Directions: IETF, JET, and ICANN John C Klensin October 2002 © 2002 John C Klensin.
Text #ICANN50. Text #ICANN50 IDN Variant TLD Program GNSO Update Saturday 21 June 2014.
Internationalized Domain Names Status Report Prepared for: ICANN Meeting, Lisbon 29 March, 2007 Tina Dam IDN Program Director ICANN
International Domain Name TWNIC Nai-Wen Hsu
Thayer School of Engineering Dartmouth Lecture 2 Overview Web Services concept XML introduction Visual Studio.net.
1 HTML’s Transition to XHTML. 2 XHTML is the next evolution of HTML Extensible HTML eXtensible based on XML (extensible markup language) XML like HTML.
Introducing XHTML: Module B: HTML to XHTML. Goals Understand how XHTML evolved as a language for Web delivery Understand the importance of DTDs Understand.
Configuration Management
Introducing HTML & XHTML:. Goals  Understand hyperlinking  Understand how tags are formed and used.  Understand HTML as a markup language  Understand.
Introduction to Chinese Domain Name ZHANG Hong Aug 24, 2003.
1 © 2000, Cisco Systems, Inc. DNSSEC IDN Patrik Fältström
IDN over EPP (IDNPROV) IETF BOF, Washington DC November 2004.
Introduction to Human Language Technologies Tomaž Erjavec Karl-Franzens-Universität Graz Tomaž Erjavec Lecture: Character sets
August Chapter 1 - Essential PHP spring into PHP 5 by Steven Holzner Slides were developed by Jack Davis College of Information Science and Technology.
IDN Standards and Implications Kenny Huang Board, PIR
Creating Interfaces: Localization Language & other issues character codes Homework: preparation for future topics.
CcTLD IDN TF Report ccTLD Meeting, Rio de Janero Mar. 25, 2003 Young-Eum Chair, ccTLD IDN TF.
1 herbert van de sompel CS 502 Computing Methods for Digital Libraries Cornell University – Computer Science Herbert Van de Sompel
NASK: at the cutting edge of technology Andrzej Bartosiewicz ITU-T SG17 meeting Moscow, 2005.
1 CS 502: Computing Methods for Digital Libraries Lecture 4 Text.
Chapter 1 Internet & Web Basics Key Concepts Copyright © 2013 Terry Ann Morris, Ed.D. Revised 1/12/2015 by William Pegram 1.
Internationalized Domain Names (IDN) APAN Busan James Seng former co-chair, IDN Working Group.
XML The Overview. Three Key Questions What is XML? What Problems does it solve? Where and how is it used?
JavaScript, Fourth Edition
Chapter 8 Cookies And Security JavaScript, Third Edition.
Internationalized Domain Names Dr. Cary Karp MUSENIC Project Manager Second MUSENIC Project Workshop Stockholm, March 2004 MUSENIC – The Museum Network.
ccTLD IDN Report ccTLD Meeting, Montreol June 24, 2003 Young-Eum
SSAC Report on Domain Name Registration Data Model Jim Galvin.
XP Tutorial 8 Adding Interactivity with ActionScript.
Forms Collecting Data CSS Class 5. Forms Create a form Add text box Add labels Add check boxes and radio buttons Build a drop-down list Group drop-down.
IDNAbis and Security Protocols or Internationalization Issues with Short Strings John C Klensin SAAG – 26 July 2007.
27 Mar 2000IETF IDN-WG1 Requirements for IDN and its Implementations from Japan Yoshiro YONEYA JPNIC IDN-TF / NTT Software Co.
Multilingual Domain Name 22 Feb 2001 YONEYA, Yoshiro JPNIC IDN-TF.
The original Internationalized Domain Name (IDN) WG set the requirements for international characters in domain names in RFC 3454, RFC3490, RFC3491 and.
Internationalization of Domain Names James Seng CTO, i-DNS.net International co-chair, IETF IDN Working Group.
Tutorial 9 Working with XHTML. New Perspectives on HTML, XHTML, and XML, Comprehensive, 3rd Edition 2 Objectives Describe the history and theory of XHTML.
1 Lesson 14 Sharing Documents Computer Literacy BASICS: A Comprehensive Guide to IC 3, 4 th Edition Morrison / Wells.
Session 11: Cookies, Sessions ans Security iNET Academy Open Source Web Development.
IDN issues ITU-T SG16-Q7 26 July Objective Provide a set of examples why IDN study at ITU-T could be considered Demonstrate the need of ITU recommendation.
1 Internationalized Domain Names Paul Twomey 7 April 2008.
Director of Data Communications
Internationalized Domain Names
IDN Variant TLDs Program Update
Unicode from a distance…
Requirements for IDN and its Implementations from Japan
Requirements for IDN and its Implementations from Japan
John C Klensin APNIC Beijing, 25 August 2009
IETF 105 REGEXT Presenter: Gustavo Lozano
Presentation transcript:

Unicode/IDN Security Mark Davis President, Unicode Consortium Chief SW Globalization Arch., IBM

The Unicode Consortium Software globalization standards: define properties and behavior for every character in every script Unicode Standard: a unique code for every character Common Locale Data Repository: LDML format plus repository for required locale data Collation, line breaking, regex, charset mapping, … Used by every major modern operating system, browser, office software, email client,… Core of XML, HTML, Java, C#, C (with ICU), Javascript, …

Security ~ Identity System A X = x System B X ≠ x

IDN You get an email about your paypal.com account, click on the link… You carefully examine your browser's address box to make sure that it is actually going to http://paypal.com/ … But actually it is going to a spoof site: “paypal.com” with the Cyrillic letter “p”. You (System A) think that they are the same DNS (System B) thinks they are different

Examples: Letters p in Latin vs p in Cyrillic Cross-Script In-Script Sequences rn may appear at display sizes like m अ + ा typically looks identical to आ so̷s looks like søs Rendering Support ä with two umlauts may look the same as ä with one eḷ is actually e + l ̣

০ ১ ২ ৩ ৫ ৬ ৮ ৯ ৪ ৭ Examples: Numbers Western Bengali Oriya ୦ ୧ ୨ ୩ ୪ 1 2 3 4 5 6 7 8 9 Bengali ০ ১ ২ ৩ ৪ ৫ ৬ ৭ ৮ ৯ Oriya ୦ ୧ ୨ ୩ ୪ ୫ ୬ ୭ ୮ ୯ Thus ৪୨ = 42

Syntax Spoofing http://example.org/1234/not.mydomain.com / = fraction-slash Also possible without Unicode: http://example.org--long-and-obscure-list-of-characters.mydomain.com

UTR #36: Security Recommendations General Security Issues (not just IDN) V1 approved mid-2005; V2 in progress http://unicode.org/draft/reports/tr36/tr36.html Describes the problems, recommends best practices Users Programmers User-Agents (browsers, email, office apps) Registries Registrars

UTS #39: Security Mechanisms Supplies data /algorithms for implementations Restricted character repertoire: Based on Unicode Identifier Profile Intersect with current NamePrep Characters → scripts, confusable characters Originally in UTR #36 Version 1; split out for clarity http://www.unicode.org/draft/reports/tr39/tr39.html

Current NamePrep ≠ Unicode Identifiers U3.2 Symbols (2,974) Non-Mod. (52,842) U3.2 Alphanum* (37,200) U5.0 Alphanum* (+2,810) ℞ § ♔ / ∞ ☃ ➥ √ … a œ и ש س ௫ ཀ ฎ अ ꀅ あ タ 入 ৫ 2 … ਁ ჹ ਃ ჺ ሇ ऄ ঽ … http://unicode.org/reports/tr36/idn-chars.html

Restriction Levels* * Subject also to restrictions on confusables 2. Highly Restrictive All characters from a single script, or from limited combinations: Han + Hiragana + Katakana; Han + Bopomofo; or Han + Hangul No characters in the identifier can be outside of the Identifier Profile includes Letters, Numbers; excludes Symbols, Punctuation,… 3. Moderately Restrictive Allow Latin with other scripts except Cyrillic, Greek, Cherokee: ip-アドレス.co.jp خدمة-rss.eg 4. Minimally Restrictive Allow arbitrary mixtures of scripts: sony-βίντεο.gr xml-документы.ru игро-shop.com … * Subject also to restrictions on confusables

ICANN Guidelines v2 http://icann. org/general/idn-guidelines-14nov05 Improvement on v1,… but needs new revision: Procedurally Insufficient time for thorough review The disposition (with rationale) of comments not available Only single cycle of public review Technically Any specification needs a much clearer structure – the exact implications of a claim to adhere to the guidelines are currently impossible to measure, and useless for security #3 (script/language limitations) has far too many loopholes. #4 (symbols) is too permissive, and not well-defined #5 (registration) should use the post-nameprep’ed form

Guideline 3 (lang./script limitations) Associate with script except with language and script, or except with set of languages, or except with “more than one designator” Publish set of code points, define variant code points; indicate script/language. Why language? (too fuzzy to be testable) Why script? (derivable from characters) Single script in label, except when language requires, except with mixed-script confusables, except with “policy & table” defined. Who decides when required? Allows single-script confusables. All registry policies documented and publicly available, with table for each set of code points Machine readable? Discursive description?

Guideline 4 (disallowed symbols) Line symbol-drawing characters (as those in the Unicode Box Drawing block) One small set of the many symbols Symbols and icons that are neither alphanumeric nor ideographic language characters, Numbers? Combining Marks? Letter modifiers? Kana length mark? Ill-defined, untestable. Characters with well-established functions as protocol elements / is confusable with a “protocol element” but isn’t one. Ill-defined, untestable. Punctuation marks used solely to indicate the structure of sentences Em-dash? Who decides? Ill-defined, untestable. Punctuation marks that are used within words except “essential to the language” & “associated with explicit prescriptive rules” Ill-defined, untestable. Except “under corresponding conditions, a single specified character may be used as a separator within a label, … by designating a functionally equivalent punctuation mark from within the script.”

Guideline 5 (registration) A registry will define an IDN registration in terms of both its Unicode and ASCII-encoded representations. Should use output Unicode representation (after mapping and normalization): otherwise many more visually confusable characters are present Should say “ACE”, not ASCII.

Unicode Recommendations Precise Specification, Mechanically Testable: Guideline 3 (script/language limitations) → Publicly document the Restriction Level being enforced (≤ Level 4) Publicly document the enforcement policy on confusables: whether any two domain names are allowed to be whole-script or mixed script confusables according to [UTR39]. Guideline 4 (symbols) → Only characters in IDN Security Profiles for Identifiers [UTR39]. Guideline 5 (registration) → Define an IDN registration in terms of its: Nameprep-Normalized Unicode representation (output format) ACE representation Work with IETF to update NamePrep to Unicode 5.0 (+)

Backup Slides

Agenda Unicode Background Security Issues

Domain Names ät.com ät.com tοp.com so̷s.com søs.com String UTF-16   String UTF-16 Internal - IDNA 1a ät.com 0061 0308 0074 002E 0063 006F 006D xn--t-zfa.com 1b ät.com 00E4 0074 002E 0063 006F 006D 2a tοp.com 0074 03BF 0070 002E 0063 006F 006D xn--tp-jbc.com 2b 0074 006F 0070 002E 0063 006F 006D top.com 4a so̷s.com 0073 006F 0337 0073 002E 0063 006F 006D xn--sos-rjc.com 4b søs.com 0073 00F8 0073 002E 0063 006F 006D xn--ss-lka.com

Non-Visual Attacks Exploiting Expectations Casing: Encoding: Collation: X < Y, so X + H < Y + H wrong Casing: len(X) = len(toUpper(X)) wrong Encoding: ‘/’ is always represented by 2F16 wrong

UAX #31: Identifier & Pattern Syntax For identification of entities (programming variables, resources, domain names, ... Appropriate characters -- stable across versions Not all natural language words: can’t U.S.A. Provides Foundation: specifications can “tailor” it for different environments: adding or removing characters.

“StringPrep” Processing Map A → a Normalize c + ¸ → ç ㄱ +ㅏ → 가 カ → カ fi → f + i Prohibit & / . , …

UAX #15: Unicode Normalization Forms Normalizes most visually confusable sequences to unique form c + ¸ → ç ㄱ +ㅏ → 가 カ → カ fi → f + i Core part of StringPrep, other Identifier Profiles