Agenda: Guidelines for Supporting Complex Scripts* on Windows 2000  Key Concepts  Overview of Unicode  Migrating existing applications  Using Unicode.


1 Agenda: Guidelines for Supporting Complex Scripts* on Windows 2000  Key Concepts  Overview of Unicode  Migrating existing applications  Using Unicode text in resources *Such as Devanagari and Tamil

2 Definitions  Enabling for a script: Adding support for input, display, and output of the script  Localization: Translating user interface elements  Globalization: Developing software such that feature design and code design are not limited to a single locale or script

3 Requirements for Enabling Indian Scripts in Applications on Windows 2000:  Use Unicode to encode text  Enable for complex scripts Note: Many Microsoft products do not yet meet these requirements. However, we’re working on it!

4 Overview of Unicode

5 Character Set Evolution  MS-DOS: OEM character sets  Windows 3.x: ANSI character sets  Windows 9x: ANSI character sets  Windows NT: Unicode; supported for compatibility: OEM (console) character sets, ANSI character sets

6 Why do character set differences matter?  Historically, they fragmented code bases for both Windows and applications: single byte (European editions); double byte (Far East editions); bi-directional (Middle East editions)  Make it difficult to share data  Make it difficult to develop multilingual applications

7 What is Unicode?  A 16-bit character encoding: a mapping of characters to numbers; syntax rules for display of complex scripts; not a font or glyph encoding; not a sort algorithm  Includes all characters in common use in modern scripts (and others)  Basis for the ISO 10646 character encoding standard  Native text encoding for Windows NT

8 Unicode™ / ISO 10646  16-bit international character encoding  Windows 2000 uses Unicode version 2.0 [Figure: layout of the 16-bit code space from 0x0000 to 0xFFFF, with regions for ASCII, Latin, Greek, Arabic and Hebrew, Indian scripts, Thai, punctuation, symbols, Kana, Hangul, ideographs (Hanzi, Kanji, Hanja), private use, compatibility, and future use. Example code values: 0041 (A), 9662, FF96, 4F85, 0000 (null).]
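Because every script sits in a fixed range of the 16-bit code space, an application can tell which script a character belongs to with a simple range check. The following is a minimal sketch, not code from the presentation: the names `script_t`, `classify_script`, and `count_script` are hypothetical, and the ranges used are the Unicode Devanagari block (U+0900 to U+097F) and Tamil block (U+0B80 to U+0BFF).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical script tags used only for this illustration. */
typedef enum { SCRIPT_OTHER, SCRIPT_DEVANAGARI, SCRIPT_TAMIL } script_t;

/* Classify one 16-bit code value by the Unicode block it falls in. */
script_t classify_script(uint16_t cp)
{
    if (cp >= 0x0900 && cp <= 0x097F) return SCRIPT_DEVANAGARI;
    if (cp >= 0x0B80 && cp <= 0x0BFF) return SCRIPT_TAMIL;
    return SCRIPT_OTHER;
}

/* Count how many code values in a 16-bit text buffer belong to a script. */
size_t count_script(const uint16_t *text, size_t len, script_t wanted)
{
    size_t i, n = 0;
    assert(text != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (classify_script(text[i]) == wanted) n++;
    }
    return n;
}
```

A multilingual document can mix these ranges freely in one buffer, which is the property the later slides rely on.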

9 Relatives of Unicode  ISO/IEC 10646: 32-bit ISO standard of 64K x 64K “planes”; Unicode repertoire is plane 0  UTF-7: 7-bit transformation format; not widely used  UTF-8: 8-bit transformation format; used in web pages and some email
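To make the "8-bit transformation format" concrete, here is a minimal sketch of how one 16-bit code value is turned into UTF-8 bytes. It is an illustration only: the function name `utf8_encode_bmp` is hypothetical, it covers only values up to U+FFFF, and it does not reject surrogate values.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one 16-bit code value (U+0000..U+FFFF) as UTF-8.
 * Writes 1 to 3 bytes into out and returns the number of bytes written.
 * Surrogate values (0xD800..0xDFFF) are not checked in this sketch. */
size_t utf8_encode_bmp(uint16_t cp, unsigned char out[3])
{
    assert(out != NULL);
    if (cp < 0x80) {                 /* ASCII: one byte, unchanged */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}
```

For example, the Devanagari letter KA (U+0915) becomes the three bytes E0 A4 95, while ASCII text passes through as single bytes.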

10 Why Should I Use Unicode and Win32 for Indian Text? My application works fine now! ??

11 Benefits of Using Unicode on Windows 2000  Share data (e.g., cut and paste) with other Win32 applications  Make use of full Win32 API for text processing  Support multilingual documents, including multiple Indian scripts  Use industry standard encoding

12 Summary: Use Unicode – It is the ultimate character encoding  Represent all text with one unambiguous encoding  Support multilingual text easily  Avoid special processing for variable-byte-length characters  Use standard encoding recognized throughout the industry and the world  Support new scripts that are only supported through Unicode

13 Migrating Existing Applications to Support Indian Text on Windows 2000… Three Migration Scenarios: 1. ANSI application to Unicode 2. Standard Win32 application to complex script enabled 3. Existing Indian language application to Unicode and Win32

14 Migrating ANSI applications to Unicode  Overview of “A” and “W” entry points  How to build a Unicode Win32 Application  Unicode Applications on Windows 98

15 Review of the W and A APIs  Two kinds of window classes: Unicode, ANSI  Win32 API has two versions of most functions: “W” (wide) version handles Unicode; “A” (ANSI) version assumes the system default code page (character encoding)  Macros resolve to W or A entry point  Example: Macro for RegisterClassEx
    #ifdef UNICODE
    #define RegisterClassEx RegisterClassExW
    #else
    #define RegisterClassEx RegisterClassExA
    #endif
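The same macro pattern can be tried outside the Windows headers. The sketch below is a portable illustration, not Win32 code: `TextLengthA`, `TextLengthW`, `TCHAR_DEMO`, and `TEXT_DEMO` are hypothetical names that stand in for an A/W function pair, the character type, and the string-literal macro. Only the `UNICODE` symbol matches the real convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical "A" entry point: counts bytes in a narrow string. */
size_t TextLengthA(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Hypothetical "W" entry point: counts wide characters. */
size_t TextLengthW(const wchar_t *s)
{
    assert(s != NULL);
    return wcslen(s);
}

/* The generic name resolves to one entry point at compile time. */
#ifdef UNICODE
typedef wchar_t TCHAR_DEMO;
#define TEXT_DEMO(s) L##s
#define TextLength TextLengthW
#else
typedef char TCHAR_DEMO;
#define TEXT_DEMO(s) s
#define TextLength TextLengthA
#endif
```

Source written against the generic names, such as `TextLength(TEXT_DEMO("abc"))`, compiles unchanged with or without `-DUNICODE`, which is why the build options on the next slide are enough to switch a well-behaved application.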

16 To Build a Unicode-enabled Application:  Automatic in Visual Studio: compile with options -DUNICODE and -D_UNICODE; specify WinMainCRTStartup in ProjectSettings/Link/Output/EntryPointSymbol  Or, use only the “W” routines from Win32 API  Metafiles: use Enhanced Metafiles (EMF); Windows Metafiles (WMF) don’t support Unicode

17 For Applications that Must Also Run on Windows 98…  Use Unicode everywhere with single binary, two code paths: on Windows NT use W entry points; on Windows 98, convert Unicode to ANSI, use A entry points; see sample GLOBALDV for example  See April Microsoft Systems Journal for details and other options

18 Migrating Standard Win32 Application to Support Complex Scripts Good news: In a Unicode application, it basically just works!

19 Simple, Plain-text Applications  Use standard edit control in Visual C/C++  Use standard Win32 API functions: ExtTextOutW or DrawTextW; ScriptString API in Uniscribe

20 Pitfalls in Enabling for Complex Scripts  When displaying typed text: Do not output characters one by one! Do save text in a buffer and display the whole string with Uniscribe or Win32 API  To measure line lengths: Do not sum cached character widths. Do use a GetTextExtent function or Uniscribe

21 Simple Applications With Formatted Text  Use rich edit control in Visual C/C++  Internet Explorer 5.0: Use Document Object Model (more later)

22 Applications With Advanced Formatting and Layout  Use script APIs (“Uniscribe”)  See MSJ article of November 1998

23 What about Visual Basic, Visual J++?  Visual Basic 6.0: standard controls are ANSI, not Unicode; use “MS Forms 2.0” controls to use Unicode in controls; resource editor does support Unicode  Visual J++: resource editor supports Unicode; text output is ANSI only  Future Plans: Make Unicode work everywhere in Visual Studio

24 Migrating Existing Indian language applications to Win32 and Unicode

25 Step 1 in Migrating Existing Indian Applications Follow guidelines for Unicode enabling and complex script enabling

26 Step 2 in Migrating Existing Indian Applications …  Provide conversion facility to migrate documents: from your format to ISCII; from ISCII to Unicode  MultiByteToWideChar(, …: Devanagari is codepage 57002; Tamil is codepage 57004  See UCONVERT sample: included on your CD; modified from UCONVERT in Win32 SDK

27 Using Unicode Text in Resources  Getting Unicode into Win32 resources  Multilingual Visual C/C++ applications

28 Getting Unicode into Win32 Resources  Create Unicode RC file. Resource editor in Visual Studio does not support Unicode yet, so: generate rc file for English using IDE; translate to target language with Unicode editor (e.g., notepad or Word); save as Unicode  Compile with resource compiler RC.EXE: RC.EXE does support Unicode; compile within Visual Studio IDE

29 Implementing Multilanguage User Interface in Applications  Use satellite resource DLLs  Default to user settings, but allow user to change  For details, see: April 1999 Microsoft Systems Journal; GLOBALDV sample code

30 Multilanguage User Interface  Initialize to current UI language: Windows 2000: GetUserDefaultUILanguage(); others: use the language of the O/S  Allow user to select UI language: put language-dependent resources in resource DLLs; use naming convention, e.g., res.dll; find all resource DLLs, put up list box of choices


33 Agenda: Using Unicode and Complex Scripts in Enterprise Applications  Intranet/internet applications  Unicode support in SQL Server 7.0  Other Considerations

34 Intranet/Internet Applications  Internet Explorer 5.01 on Win32 Platforms: displays multilingual text including complex scripts; supports complex scripts in Document Object Model; supports Indian text through Unicode

35 Encodings for Multi-lingual Text in Web Pages  Raw Unicode: OK for intranet on Windows NT networks; not good for internet pages  Number entities, e.g., &#2325;: OK for occasional use, e.g., inserting characters not in the main script of page; not good for large documents  UTF-8: recommended encoding; works just about everywhere; supported by IE 4.0+, Netscape 4.0+
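A number entity is just the decimal value of the character's code point wrapped in `&#` and `;`. The sketch below shows one way to generate such a reference. It is an illustration under stated assumptions: the function name `format_ncr` is hypothetical, and it relies on the C99 `snprintf` function.

```c
#include <assert.h>
#include <stdio.h>

/* Format a code point as a decimal numeric character reference,
 * e.g. "&#2325;". Returns the number of characters written
 * (excluding the terminating NUL), or -1 if the buffer is too small. */
int format_ncr(unsigned int cp, char *buf, size_t buflen)
{
    int n;
    assert(buf != NULL);
    n = snprintf(buf, buflen, "&#%u;", cp);
    if (n < 0 || (size_t)n >= buflen) {
        return -1;   /* output was truncated or failed */
    }
    return n;
}
```

Calling `format_ncr(0x0915, buf, sizeof buf)` yields the seven characters `&#2325;`, the Devanagari letter KA. Each character costs seven bytes this way versus three in UTF-8, which is one reason the slide advises against entities for large documents.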

36 Creating UTF-8 Webpages  Use charset=UTF-8 in META tag  Save HTML page as UTF-8 using notepad, Word, etc.  Saving as UTF-8 in Word: select File/Save As WebPage/Tools; select Web Options/Encoding; change charset designation to UTF-8

37 Embedded Fonts in Web Pages  Downloadable fonts used only in web pages  Deleted when page is closed  WEFT tool: creates embedded font from TTF file; saves download time/space by using only those glyphs required for the page  On Microsoft website, see workshop/author/fontembed/font_embed.asp

38 Introduction to DHTML  Based on Document Object Model: objects in HTML document, text in objects (including titles, headers, etc.), and attributes such as font, color, etc. are accessible via scripts, e.g., JScript or VBScript; supported in IE 4.0+  See various documents under www.microsoft.com/workshop/author for overview

39 Examples of DHTML
    <H1 id=Head1 style="font-weight: normal" onmouseover="makeItalic();" onmouseout="makeNormal();">Sample Dynamic HTML</H1>
    <script>
    function makeItalic() { Head1.style.fontStyle = "italic"; }
    function makeNormal() { Head1.style.fontStyle = "normal"; }
    </script>
Callouts: the H1 element is the heading tag; makeItalic and makeNormal are JScript functions that change the style of the heading text.

40 Using Indian Scripts in DHTML  Use same design rules as static HTML: encode in UTF-8; use embedded fonts if needed  Consider multilingual pages: display initial page in English; offer option to change to other

41 Unicode Support in SQL Server 7.0  Unicode datatypes in SQL Server 7.0: NCHAR, NVARCHAR, NTEXT; indicate Unicode text by N'text' in SQL queries:
    create table myTable (col1 CHAR(8), col2 NCHAR(8))
    insert into myTable (col1, col2) values ('Japan', N'日本')
 Utilities for entering/retrieving Unicode data: Query Analyzer; Data Transformation Services; client application using ODBC

42 Accessing Data Through ODBC  ODBC supports Unicode data access  Use Visual C/C++ for read/write: use SQL ‘W’ routines, e.g., SQLExecDirectW(SQLHSTMT, LPWSTR, int). Specify data type SQL_C_WCHAR as needed, e.g., SQLBindCol(hstmt, nColumn, SQL_C_WCHAR, szCol, nMaxCol, &cbName).  See GLOBALDV sample  Use Visual Basic to retrieve and display

43 Accessing SQL Server 7.0 Unicode Data through ASP Webpages  Use standard encodings: UTF-8 in web pages; Unicode in SQL Server 7.0  Access data through JScript/ODBC  JScript automatically translates Unicode to current codepage in web page: defaults to system codepage; specify UTF-8 “codepage” using: // Scope=session // Scope=page

44 Summary of SQL Server 7.0 Unicode Access

45 Other Considerations …  Handling Indian text in network applications: Indic Language Group must be installed on clients; only necessary on server if display and input is required locally  Sharing Documents: Word 2000 documents: must have Indic language group installed on local machine; HTML: can use embedded fonts


48 Break!

49 OpenType Layout  David C. Brown, Development Lead, and David Meltzer, Program Manager, Microsoft Corporation

50 OpenType Layout  File Format  Benefits of OpenType  Layout Features  Indic Features

51 OpenType File Format  sfnt table structure: extension of the current TrueType file format  A single font file may contain: TrueType outline data; PostScript (CFF) outline data

52 Benefits of OpenType  Support for large character sets  Multi-script character sets  Unicode support  Glyph alternates supported  Advanced typography supported  Better protection of font data  Font embedding controls

53 Layout Features  Glyph substitution  Glyph positioning  Script and Language information

54 Glyph Substitution  Single glyph substitution  One-to-many substitution  Multiple glyph substitution  Aesthetic alternatives  Contextual glyph substitution

55 Glyph Positioning  Two-dimensional positioning  Single glyph adjustment  Adjustment of paired glyphs  Cursive attachment  Mark attachment  Contextual positioning

56 Script and Language Information  Layout features encoded by: scripts; languages within scripts

57 Indic Features  Language Forms  Conjuncts and Typographical Forms  Glyph Positioning

58 Language Forms  Nukta  Akhand  Reph  Below-base Form  Half Form  Post-base Form  Vattu Variants

59 Example: Below-base form

60 Conjuncts and Typographical Forms  Pre-base substitutions  Below-base substitutions  Above-base substitutions  Post-base substitutions  Halant Forms

61 Example: Pre-base consonant conjunct

62 Glyph Positioning  Below-base marks  Above-base marks  Distance control

63 Coming Tools for Developing OpenType Fonts  VTT (Visual TrueType)  VOLT (Visual OpenType Layout Tool)

64 Installing Sample Fonts …  copy …\cssamp\fonts.exe c:\temp  cd c:\temp  fonts /T:c:\temp /C  Use explorer to drag mangal.ttf and latha.ttf into your winnt\fonts directory.

65 Resources  OpenType Specification: http://www.microsoft.com/typography/otspec  Indic Encoding Specification: early draft available on your CD; contact davidm@microsoft.com
