Agenda: Guidelines for Supporting Complex Scripts* on Windows 2000  Key Concepts  Overview of Unicode  Migrating existing applications  Using Unicode.

Presentation transcript:

Agenda: Guidelines for Supporting Complex Scripts* on Windows 2000  Key Concepts  Overview of Unicode  Migrating existing applications  Using Unicode text in resources *Such as Devanagari and Tamil

Definitions  Enabling for a script: Adding support for input, display, and output of the script  Localization: Translating user interface elements  Globalization: Developing software such that feature design and code design are not limited to a single locale or script

Requirements for Enabling Indian Scripts in Applications on Windows 2000:  Use Unicode to encode text  Enable for complex scripts Note: Many Microsoft products do not yet meet these requirements. However, we’re working on it!

Overview of Unicode

Character Set Evolution  MS-DOS: OEM character sets  Windows 3.x: ANSI character sets  Windows 9x: ANSI character sets  Windows NT: Unicode; supported for compatibility: OEM (console) character sets, ANSI character sets

Why do character set differences matter?  Historically, they fragmented code bases for both Windows and applications: single byte (European editions), double byte (Far East editions), bi-directional (Middle East editions)  Make it difficult to share data  Make it difficult to develop multilingual applications

What is Unicode?  A 16-bit character encoding: a mapping of characters to numbers, plus syntax rules for display of complex scripts; not a font or glyph encoding, and not a sort algorithm!  Includes all characters in common use in modern scripts (and others)  Basis for the ISO character encoding standard  Native text encoding for Windows NT

Unicode™ / ISO 10646  16-bit international character encoding  Windows 2000 uses Unicode version 2.0  [Figure: map of the 16-bit code space from 0x0000 to 0xFFFF, with regions labeled ASCII, Latin, Greek, Arabic, Hebrew, Indian, Thai, Punctuation, Symbols, Kana, Hangul, Ideographs (Hanzi, Kanji, Hanja), Private use, Compatibility, and Future use]
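Because Unicode maps every character to a number, a program can tell which script a character belongs to from its numeric range alone. The following is a minimal sketch of that idea; the function name `script_of` is hypothetical, and only the Devanagari (U+0900 to U+097F) and Tamil (U+0B80 to U+0BFF) blocks plus ASCII are covered.

```c
/* Returns a script name for a 16-bit code point, based only on its
   numeric range. Hypothetical helper: only two Indic blocks and ASCII
   are recognized in this sketch. */
const char *script_of(unsigned int cp)
{
    if (cp >= 0x0900 && cp <= 0x097F) return "Devanagari";
    if (cp >= 0x0B80 && cp <= 0x0BFF) return "Tamil";
    if (cp <= 0x007F) return "ASCII";
    return "other";
}
```

For example, `script_of(0x0915)` (DEVANAGARI LETTER KA) falls in the first range and `script_of(0x0B95)` (TAMIL LETTER KA) in the second.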

Relatives of Unicode  ISO/IEC 10646: 32-bit ISO standard of 64K × 64K “planes”; the Unicode repertoire is plane 0  UTF-7: 7-bit transformation format; not widely used  UTF-8: 8-bit transformation format; used in web pages and some …
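To make "8 bit transformation format" concrete, here is a rough sketch of how a single 16-bit Unicode value is turned into one to three UTF-8 bytes. The function name `utf8_encode_bmp` is an assumption for illustration; surrogate values and anything above 0xFFFF are rejected rather than handled.

```c
#include <stddef.h>

/* Encodes one 16-bit code point as UTF-8.
   Returns the number of bytes written (1 to 3), or 0 if the value is
   a surrogate or does not fit in 16 bits (out of scope for this sketch). */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80) {                    /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

ASCII stays one byte per character, while Devanagari and Tamil characters each take three bytes; U+0915, for instance, becomes the bytes E0 A4 95.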

Why Should I Use Unicode and Win32 for Indian Text? My application works fine now!

Benefits of Using Unicode on Windows 2000  Share data (e.g., cut and paste) with other Win32 applications  Make use of full Win32 API for text processing  Support multilingual documents, including multiple Indian scripts  Use industry standard encoding

Summary: Use Unicode. It is the ultimate character encoding  Represent all text with one unambiguous encoding  Support multilingual text easily  Avoid special processing for variable byte-length characters  Use standard encoding recognized throughout the industry and the world  Support new scripts that are only supported through Unicode

Migrating Existing Applications to Support Indian Text on Windows 2000… Three Migration Scenarios: 1. ANSI application to Unicode 2. Standard Win32 application to complex script enabled 3. Existing Indian language application to Unicode and Win32

Migrating ANSI applications to Unicode  Overview of “A” and “W” entry points  How to build a Unicode Win32 Application  Unicode Applications on Windows 98

Review of the W and A APIs  Two kinds of window classes: Unicode, ANSI  Win32 API has two versions of most functions: the “W” (wide) version handles Unicode; the “A” (ANSI) version assumes the system default code page (character encoding)  Macros resolve to W or A entry point  Example: Macro for RegisterClassEx
#ifdef UNICODE
#define RegisterClassEx RegisterClassExW
#else
#define RegisterClassEx RegisterClassExA
#endif
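The same compile-time switch can be sketched in portable C. Everything named here (`CountCharsA`, `CountCharsW`, `CountChars`, `MY_TEXT`, `my_tchar`, `title_length`) is a hypothetical stand-in, not part of the Win32 API; the point is only that caller code is written once against the generic name and the `UNICODE` define picks the entry point.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "A" and "W" entry points for one logical function. */
size_t CountCharsA(const char *s)    { return strlen(s); }
size_t CountCharsW(const wchar_t *s) { return wcslen(s); }

/* The generic names resolve at compile time, in the same way the
   RegisterClassEx macro does. */
#ifdef UNICODE
typedef wchar_t my_tchar;
#define MY_TEXT(s) L##s
#define CountChars CountCharsW
#else
typedef char my_tchar;
#define MY_TEXT(s) s
#define CountChars CountCharsA
#endif

/* Caller code uses only the generic names, so it builds either way. */
size_t title_length(void)
{
    const my_tchar *title = MY_TEXT("Unicode");
    return CountChars(title);
}
```

Compiling with `-DUNICODE` switches `title_length` to the wide-character path without touching its source.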

To Build a Unicode-enabled Application:  Automatic in Visual Studio: compile with options -DUNICODE and -D_UNICODE; specify WinMainCRTStartup in ProjectSettings/Link/Output/EntryPointSymbol  Or, use only the “W” routines from Win32 API  Metafiles: use Enhanced Metafiles (EMF); Windows Metafiles (WMF) don’t support Unicode

For Applications that Must Also Run on Windows 98…  Use Unicode everywhere with single binary, two code paths: on Windows NT, use W entry points; on Windows 98, convert Unicode → ANSI and use A entry points; see sample GLOBALDV for example  See April 1999 Microsoft Systems Journal for details and other options

Migrating Standard Win32 Application to Support Complex Scripts Good news: In a Unicode application, it basically just works!

Simple, Plain-text Applications  Use standard edit control in Visual C/C++  Use standard Win32 API functions: ExtTextOutW or DrawTextW; ScriptString API in Uniscribe

Pitfalls in Enabling for Complex Scripts  When displaying typed text: do not output characters one by one! Do save text in a buffer and display the whole string with Uniscribe or Win32 API  To measure line lengths: do not sum cached character widths; do use a GetTextExtent function or Uniscribe

Simple Applications With Formatted Text  Use rich edit control in Visual C/C++  Internet Explorer 5.0: Use Document Object Model (more later)

Applications With Advanced Formatting and Layout  Use script APIs (“Uniscribe”)  See MSJ article of November 1998

What about Visual Basic, Visual J++?  Visual Basic 6.0: standard controls are ANSI, not Unicode; use “MS Forms 2.0” controls to use Unicode in controls; resource editor does support Unicode  Visual J++: resource editor supports Unicode; text output is ANSI only  Future Plans: Make Unicode work everywhere in Visual Studio

Migrating Existing Indian language applications to Win32 and Unicode

Step 1 in Migrating Existing Indian Applications Follow guidelines for Unicode enabling and complex script enabling

Step 2 in Migrating Existing Indian Applications …  Provide conversion facility to migrate documents: from your format to ISCII; from ISCII to Unicode  MultiByteToWideChar(, … Devanagari is codepage; Tamil is codepage  See UCONVERT sample: included on your CD; modified from UCONVERT in Win32 SDK

Using Unicode Text in Resources  Getting Unicode into Win32 resources  Multilingual Visual C/C++ applications

Getting Unicode into Win32 Resources  Create Unicode RC file. Resource editor in Visual Studio does not support Unicode yet, so: generate rc file for English using IDE; translate to target language with Unicode editor (e.g., Notepad or Word); save as Unicode  Compile with resource compiler RC.EXE: RC.EXE does support Unicode; compile within Visual Studio IDE

Implementing Multilanguage User Interface in Applications  Use satellite resource DLLs  Default to user settings, but  Allow user to change  For details, see: April 1999 Microsoft Systems Journal; GLOBALDV sample code

Multilanguage User Interface  Initialize to current UI language: on Windows 2000, GetUserDefaultUILanguage(); on others, use the language of the O/S  Allow user to select UI language: put language-dependent resources in resource DLLs; use naming convention, e.g., res.dll; find all resource DLLs, put up list box of choices
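One way to choose the initial UI language from the resource DLLs that were found can be sketched as follows. The function `pick_ui_language` and its fallback order (exact match, then same primary language, then a default) are assumptions made for illustration, not the logic of the GLOBALDV sample. The sketch relies only on the fact that the low 10 bits of a Windows language identifier hold the primary language.

```c
#include <stddef.h>

/* The low 10 bits of a Windows language identifier are the primary language. */
#define PRIMARY_LANG(id) ((id) & 0x3FFu)

/* Hypothetical helper: picks the language to load resources for.
   preferred  - language identifier reported for the user
   available  - identifiers for which a resource DLL was found
   fallback   - identifier to use when nothing suitable exists */
unsigned int pick_ui_language(unsigned int preferred,
                              const unsigned int *available, size_t count,
                              unsigned int fallback)
{
    size_t i;
    for (i = 0; i < count; i++)          /* 1. exact match */
        if (available[i] == preferred)
            return preferred;
    for (i = 0; i < count; i++)          /* 2. same primary language */
        if (PRIMARY_LANG(available[i]) == PRIMARY_LANG(preferred))
            return available[i];
    return fallback;                     /* 3. nothing suitable */
}
```

With resource DLLs for 0x0409 (English, United States) and 0x0439 (Hindi), a user whose language is 0x0439 gets Hindi, a user with 0x0809 (English, United Kingdom) gets the 0x0409 resources, and a user with 0x0449 (Tamil) gets the fallback until a Tamil DLL is added.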

Agenda: Using Unicode and Complex Scripts in Enterprise Applications  Intranet/internet applications  Unicode support in SQL Server 7.0  Other Considerations

Intranet/Internet Applications  Internet Explorer 5.01 on Win32 Platforms: displays multilingual text including complex scripts; supports complex scripts in Document Object Model; supports Indian text through Unicode

Encodings for Multi-lingual Text in Web Pages  Raw Unicode: OK for intranet on Windows NT networks; not good for internet pages  Number entities, e.g., &#2325;: OK for occasional use, e.g., inserting characters not in the main script of page; not good for large documents  UTF-8: recommended encoding; works just about everywhere; supported by IE 4.0+, Netscape 4.0+
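The number-entity option can be illustrated with a short sketch that writes 16-bit Unicode text as plain ASCII HTML, replacing each non-ASCII character with a decimal entity. The function name `escape_to_ncr` is hypothetical, surrogate pairs are not handled, and HTML-special characters such as `<` and `&` are copied unchanged.

```c
#include <stddef.h>
#include <stdio.h>

/* Copies ASCII characters as-is and writes every other 16-bit value as a
   decimal numeric character reference such as "&#2325;".
   Returns the number of bytes written (excluding the terminating NUL),
   or 0 if the output buffer is too small. */
size_t escape_to_ncr(const unsigned short *text, size_t count,
                     char *out, size_t cap)
{
    size_t used = 0;
    size_t i;

    if (cap == 0)
        return 0;
    out[0] = '\0';

    for (i = 0; i < count; i++) {
        int n;
        if (text[i] < 0x80)
            n = snprintf(out + used, cap - used, "%c", (int)text[i]);
        else
            n = snprintf(out + used, cap - used, "&#%u;",
                         (unsigned int)text[i]);
        if (n < 0 || (size_t)n >= cap - used)
            return 0;                   /* output buffer too small */
        used += (size_t)n;
    }
    return used;
}
```

Each Devanagari or Tamil character costs seven bytes in this form versus three in UTF-8, which is why entities suit occasional characters rather than large documents.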

Creating UTF-8 Webpages  Use charset=UTF-8 in META tag  Save HTML page as UTF-8 using Notepad, Word, etc.  Saving as UTF-8 in Word: select File/Save As WebPage/Tools; select Web Options/Encoding; change charset designation to UTF-8

Embedded Fonts in Web Pages  Downloadable fonts used only in web pages  Deleted when page is closed  WEFT tool: creates embedded font from TTF file; saves download time/space by using only those glyphs required for the page  On Microsoft website, see workshop/author/fontembed/font_embed.asp

Introduction to DHTML  Based on Document Object Model: objects in HTML document, text in objects (including titles, headers, etc.), and attributes such as font, color, etc. are accessible via scripts, e.g., JScript or VBScript; supported in IE 4.0+  See various documents under for overview

Examples of DHTML
Heading tag:
<H1 id=Head1 style="font-weight: normal" onmouseover="makeItalic();" onmouseout="makeNormal();">Sample Dynamic HTML</H1>
JScript functions that change the style of the heading text:
<script>
function makeItalic() { Head1.style.fontStyle = "italic"; }
function makeNormal() { Head1.style.fontStyle = "normal"; }
</script>

Using Indian Scripts in DHTML  Use same design rules as static HTML: encode in UTF-8; use embedded fonts if needed  Consider multilingual pages: display initial page in English; offer option to change to other …

Unicode Support in SQL Server 7.0  Unicode datatypes in SQL Server 7.0: NCHAR, NVARCHAR, NTEXT  Indicate Unicode text by N'text' in SQL queries:
create table myTable (col1 CHAR(8), col2 NCHAR(8))
insert into myTable (col1, col2) values ('Japan', N'日本')
 Utilities for entering/retrieving Unicode data: Query Analyzer; Data Transformation Services; client application using ODBC

Accessing Data Through ODBC  ODBC supports Unicode data access  Use Visual C/C++ for read/write: use SQL ‘W’ routines, e.g., SQLExecDirectW(SQLHSTMT, LPWSTR, int); specify data type SQL_C_WCHAR as needed: SQLBindCol(hstmt, nColumn, SQL_C_WCHAR, szCol, nMaxCol, &cbName);  See GLOBALDV sample  Use Visual Basic to retrieve and display

Accessing SQL Server 7.0 Unicode Data through ASP Webpages  Use standard encodings: UTF-8 in web pages; Unicode in SQL Server 7.0  Access data through JScript/ODBC  JScript automatically translates Unicode to current codepage in web page: defaults to system codepage; specify UTF-8 “codepage” using: // Scope=session // Scope=page

Summary of SQL Server 7.0 Unicode Access

Other Considerations …  Handling Indian text in network applications: Indic Language Group must be installed on clients; only necessary on server if display and input is required locally  Sharing Documents: Word 2000 documents must have Indic language group installed on local machine; HTML can use embedded fonts

Break!

OpenType Layout David C. Brown Development Lead, and David Meltzer Program Manager Microsoft Corporation

OpenType Layout  File Format  Benefits of OpenType  Layout Features  Indic Features

OpenType File Format  sfnt table structure: extension of the current TrueType file format  A single font file may contain: TrueType outline data; PostScript (CFF) outline data

Benefits of OpenType  Support for large character sets  Multi-script character sets  Unicode support  Glyph alternates supported  Advanced typography supported  Better protection of font data  Font embedding controls

Layout Features  Glyph substitution  Glyph positioning  Script and Language information

Glyph Substitution  Single glyph substitution  One-to-many substitution  Multiple glyph substitution  Aesthetic alternatives  Contextual glyph substitution

Glyph Positioning  Two-dimensional positioning  Single glyph adjustment  Adjustment of paired glyphs  Cursive attachment  Mark attachment  Contextual positioning

Script and Language Information  Layout features encoded by: scripts; languages within scripts

Indic Features  Language Forms  Conjuncts and Typographical Forms  Glyph Positioning

Language Forms  Nukta  Akhand  Reph  Below-base Form  Half Form  Post-base Form  Vattu Variants

Example: Below-base form

Conjuncts and Typographical Forms  Pre-base substitutions  Below-base substitutions  Above-base substitutions  Post-base substitutions  Halant Forms

Example: Pre-base consonant conjunct

Glyph Positioning  Below-base marks  Above-base marks  Distance control

Coming Tools for Developing OpenType Fonts  VTT (Visual TrueType)  VOLT (Visual OpenType Layout Tool)

Installing Sample Fonts …  copy …\cssamp\fonts.exe c:\temp  cd c:\temp  fonts /T:c:\temp /C  Use explorer to drag mangal.ttf and latha.ttf into your winnt\fonts directory.

Resources  OpenType Specification  Indic Encoding Specification: early draft available on your CD