Developing World-Ready Applications for Windows 2000/XP

Houman Pournasseh, Lead Program Manager
Russ Rolfe, Program Manager
Windows Division

My name is Houman Pournasseh. I am a Globalization/Evangelism Program Manager in Microsoft’s Windows Division, dealing with international challenges with both operating system groups and external developers.

My name is Russ Rolfe. I am a Globalization/Evangelism Program Manager in Microsoft’s Windows International Division. I work with designers both inside and outside of Microsoft, educating them about the globalization features of Windows XP and future operating systems.

Our group’s mission is to:
- Inform others on issues regarding the creation of World-Ready products.
- Provide solutions to World-Ready questions.
- Act as a clearinghouse for all resources pertaining to World-Ready product development, both inside and outside of Microsoft.

20th International Unicode Conference

Agenda
- Definitions
- Why invest in World-Ready products?
- Globalization – step-by-step
- Universal encoding - Unicode
- Locale aware
- Handle different input methods
- Complex script aware
- Font independency
- Multi-lingual UI aware
- Mirroring aware
- Localizability
- Conclusion & References

In today’s tutorial we will first define some key words that we will use through the rest of the talk. We will then talk about the advantages of investing in World-Ready applications. Then we will walk you through the globalization process for both Win32 and Web-based applications. All the considerations and challenges that you should be aware of will be described step by step. We do not expect you to become globalization specialists, but we will give you guidelines and directives to achieve your globalization goals. We will then summarize this tutorial with a call to action and give you multiple online references and support channels. Although a Q&A session will be held at the end of the presentation, please feel free to ask questions during the tutorial as well.


Definitions
- World-Ready: Properly globalized and localizable.
- Globalization: The process of designing and implementing source code so that it can accommodate any local market (locale) or script.
- Localizability: Designing software code and resources such that resources can be localized for any local market (locale) without changing the source code.
- Localization: The process of adapting a product (including both text and non-text elements) to meet the language, cultural, and political expectations and/or requirements of a specific local market (locale).

World-Ready: A product is World-Ready when:
- It provides for the input, display, and output of a defined set of Unicode-supported language scripts and data relating to specific geographic areas (Globalization), and
- It is designed and coded in such a way that its data and resources can be localized without any source code changes (Localizability).

Globalization: Designing and developing a software product to function in multiple locales. This process involves identifying the locales that must be supported, designing features which support those locales, and writing code that functions equally well in any of the supported locales.

Localizability: The process of developing a program core whose features and code design don’t make assumptions based on a single locale or script, and which is ready to be localized with no changes to the source code.

Localization: The process of customizing or translating the separated data and resources needed for a specific region or language. L10N is a common abbreviation for Localization because the L in Localization is followed by 10 letters and ends with the letter “N”. This adaptation of the User Interface (UI) should not involve any code changes.

Users and Locales:
- To define their geographical location, users set the location
- To select a UI language, users set the UI language
- To define formatting for date, time…, users set the user locale
- To run legacy applications (non-Unicode), users set the system locale
- To enter text in different languages, users set the input locale

Location: Or geographic ID. This per-user variable is newly introduced in Windows ME and Windows XP to define the region where the user lives. Use GetUserGeoID to retrieve this value and GetGeoInfo to query information about a location. Users can change it through the Region Options tab of the Regional and Language Options applet (can be set on the fly).

User Locale: Or “Standards and formats” in Windows XP. This per-user variable defines the user’s preferences for formatting locale-sensitive data (date, time, currency ...). Your application should use this setting to display formatted data. Use GetUserDefaultLCID to retrieve this value. No API is available to set this locale (by design). Users can change it through the Region Options tab of the Regional and Language Options applet (can be set on the fly).

Input Locale: This per-process locale is a combination of input language (e.g. Greek) and input method (e.g. keyboard). Use GetKeyboardLayout and LoadKeyboardLayout to retrieve/set this value. Users can add/remove input locales through the Languages tab of the Regional and Language Options applet (can be set on the fly).

System Locale: Or “Language for non-Unicode programs” in Windows XP. This per-system variable does not affect anything but code-page based applications. It allows the OS to emulate these non-Unicode applications by using the selected language’s code page to convert between ANSI/OEM and Unicode encodings. Use the GetACP / GetOEMCP APIs to retrieve this value. No API is available to set this locale (by design). Users can change it through the Advanced tab of the Regional and Language Options applet (requires a reboot).

UI Language: This per-user variable defines the language in which menus, dialog boxes, help files ... are translated. On a MUI version of Windows 2000/XP, users can select the user interface language through the Languages tab of the Regional and Language Options applet (requires a log-off/log-on). Use Get[User/System]DefaultUILanguage to retrieve this value. No API is available to set this locale (by design).

Windows XP International Enhancements
- Nine (9) new locales added to the previous list of 126: Punjabi, Gujarati, Telugu, Kannada, Kyrgyz, Mongolian (Cyrillic), Galician, Divehi, Syriac
- New Indic and Arabic scripts: Gujarati, Gurmukhi, Telugu, Kannada, Syriac, Divehi
- More robust font display for East Asian languages
- Improved Regional Settings options
- Largely improved MUI support
- New location (GEO)
- Support for GB18030

Windows XP includes the following international features:

Nine new locales have been added to the previous list of 126 (Punjabi, Gujarati, Telugu, Kannada, Kyrgyz, Mongolian (Cyrillic), Galician, Divehi, Syriac), and locale support has been improved for some of the existing ones (Farsi and Urdu).

To support these locales, over 10 new languages and scripts have been added. All of this was made possible by Windows XP being Unicode based (the new languages only have Unicode code points and not their own individual code pages).

New font fallback support is available to accommodate East Asian languages whenever the currently selected font and the font linking mechanism fail to provide the appropriate glyph to be displayed. (In other words, if the font doesn’t have a glyph for the character you are trying to display, the system will find a font that does. This only happens if the East Asian language support has been installed.)

The Regional and Language Options control panel has been extensively redesigned to improve the user experience and to integrate new international functionality.

The Multilanguage User Interface (MUI) Pack is much closer to the localized experience. Besides having around 97% of the menus, dialogs, and system messages localized, MUI now displays the localized help files. New keywords in the MUI unattended setup mode allow for a more precise definition of how an MUI version will operate.

A new location (GEO) locale was added to allow services to provide local content per user.

The Windows XP support package for GB18030 is bundled with the Simplified Chinese version of Windows XP sold in China. This package is also available for download on all language versions of Windows XP / Windows 2000.

Agenda: Why invest in World-Ready products?

Why should you invest in globalized applications? We hope there are not too many people here at a Unicode conference tutorial asking this question. In this section we will give you some reasons to do so.

Why invest in World Ready products?
- Get into the international market (World Wide Web era)
- Create a single functionality binary to:
  - Reduce development effort and cost
  - Ease support and maintenance pain

Because applications can be distributed via the Internet, the world as a market has become more accessible to everyone. (Remember, it is the world-wide web.) The Internet has also made sharing data between users (multinational corporations with cosmopolitan staff) a lot easier. (Sharing the pieces of data is made easier, not the actual communication.)

Shipping one functional core binary to all platforms and for all language versions reduces your development hassle and costs significantly. (Single binary means that the functional part of the binary, that is, any code excluding UI, can handle any given language.) A few advantages of a single binary are: 1) it gets rid of conditional compiling, and 2) it removes the need to maintain separate source code bases.

Microsoft Windows 2000/XP, Microsoft Office 2000/XP, and Microsoft Internet Explorer 5+ are all single binaries. This means, for example, that the same core component file “gdi32.dll” has been shipped on the US English, the Japanese, and the Arabic versions of Windows 2000/XP. A potential update to this module in a future Service Pack can be applied to all languages with no additional engineering effort.

Why invest in World Ready products?
- Sim-shipping all language versions at once avoids lost revenue.

[Timeline figure, January to September. English Dev Team: Release Eng Ver 1.0, Release Eng Ver 1.1. German Loc Team: Release Ger Ver 1.0, Release Ger Ver 1.1.]

By shipping all language versions of your software at the same time, you don’t delay your customers’ deployment. Imagine the case where you release the English version 1.0 of your app on the first of January and the German version 1.0 on the first of April. Your first update for the English version (1.1) is planned for May. If your German customer knows he needs the functionality of 1.1, he may forgo the purchase of the German 1.0 version and wait another 4 months to get 1.1 in his own language. In this case, the German customer wouldn’t deploy your software, released in January, before sometime in August!

Agenda: Globalization – step-by-step

In the past, most companies have stayed away from globalizing their code. They had heard how hard it was, that it would cost them a fortune, and that they would need some type of wizard to accomplish it. Ten years ago it was all that and more, but not anymore. With the creation of APIs that are Unicode based (no more checking whether a character is one, two, or more bytes wide) and locale aware (no more researching how dates are formatted in India and Turkey), the process has become much easier and more cost efficient. The 5-10% extra effort to globalize a product could save you 20-70% in localization costs per language.

Below are eight guidelines for creating World-Ready applications:
- Write fully implemented Unicode applications
- Don’t make locale-specific assumptions when displaying data
- Handle different languages and methods for inputting characters
- Don’t depend on a given font
- Be aware that you might need to handle complex scripts
- Be ready to turn mirroring on/off in your applications
- Allow the user to change the language of the UI
- Make the localization process a simple translation job

Transforms of Unicode
- UTF-7: 7-bit transformation format, seldom used
- UTF-8: 8-bit transformation format
  - For transmission over unknown lines, e.g. Web pages
  - Code page number CP_UTF8 = 65001
- UTF-16 and UCS-2
  - Microsoft uses UTF-16 little-endian as its standard for Unicode encoding
- UTF-32 and UCS-4
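To make the "8-bit transformation format" concrete, here is a minimal sketch in portable C of how one Unicode code point is turned into one to four UTF-8 bytes. The helper name `utf8_encode` is hypothetical and is not part of any Windows API; on Windows the system conversion routines with CP_UTF8 would normally do this work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a Windows API): encode one code point as UTF-8.
   Returns the number of bytes written (1 to 4), or 0 when cp cannot be
   encoded (a surrogate code point, or a value above U+10FFFF). */
static size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80) {                        /* 7 bits: plain ASCII byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                       /* 11 bits: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)       /* surrogates are not characters */
        return 0;
    if (cp < 0x10000) {                     /* 16 bits: 1110xxxx + 2 trail bytes */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                   /* 21 bits: 11110xxx + 3 trail bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

ASCII text stays one byte per character, which is why UTF-8 passes safely through byte-oriented channels, while a Devanagari letter such as U+0915 takes three bytes.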

Windows 2000/XP: Unicode & Single Binary
- Built-in support for hundreds of languages
- Any (well-behaved) language Win32 application can run on any language version of Windows 2000/XP
- Native Unicode support for new scripts
- Support for supplementary characters

All language versions and all flavors of Windows 2000/XP have built-in, out-of-the-box support for over 60 scripts (hundreds of languages). Any well-written Win32 application can run on any of these OS versions. That’s right! For example, you can run a Japanese application such as Ichitaro on Spanish Windows 2000/XP. There are some important setup steps required for this, however.

If the application is an ANSI application (i.e., it stores text internally in byte-based code pages), you must first set the system default locale to the one required by that application, and install a keyboard or IME via the Input Locale tab in the Regional settings control panel applet. For a Japanese application running on French Windows 2000/XP, you would set the system default locale to Japanese and install a Japanese IME. NOTE: Setting the system default locale to Japanese may prevent ANSI French applications from behaving correctly.

Some applications register themselves as ANSI applications, but convert all text to Unicode and deal exclusively with Unicode internally. Such an application should work fine with any system default locale setting, assuming it checks for keyboard changes so that it uses the appropriate code page to convert to/from Unicode.

If the application is a pure Unicode application, then it will run with any system default locale setting. This is the ideal situation. However, there are still not very many Unicode applications available. (For example, Microsoft Office and Internet Explorer are Unicode based.)

Scripts such as Divehi, Syriac, Telugu (and a lot more) are supported for the very first time. We have decided to support these scripts through Unicode only and not define any ANSI or OEM code pages for them.

Support for supplementary characters (characters represented by a pair of dedicated surrogate code units that together identify a single character) has also been implemented in Windows 2000/XP. That is, changes have been made to the code in Windows 2000/XP to allow a user to display, print, sort, and edit text containing surrogates. These changes are transparent to end users and developers.
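As an illustration of how a surrogate pair maps to one supplementary character, the following is a minimal sketch in portable C. All function names are hypothetical; the arithmetic follows the UTF-16 definition, in which a high surrogate (D800 to DBFF) followed by a low surrogate (DC00 to DFFF) identifies a code point from U+10000 upward.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: classify UTF-16 code units. */
static int is_high_surrogate(uint16_t u) { return u >= 0xD800 && u <= 0xDBFF; }
static int is_low_surrogate(uint16_t u)  { return u >= 0xDC00 && u <= 0xDFFF; }

/* Combine a high/low surrogate pair into the code point it represents. */
static uint32_t combine_surrogates(uint16_t high, uint16_t low)
{
    assert(is_high_surrogate(high) && is_low_surrogate(low));
    return 0x10000u
         + (((uint32_t)high - 0xD800u) << 10)
         + ((uint32_t)low - 0xDC00u);
}

/* Count characters (code points) rather than 16-bit units: a well-formed
   surrogate pair occupies two units but counts as one character. */
static size_t utf16_count_code_points(const uint16_t *s, size_t n_units)
{
    assert(s != NULL || n_units == 0);
    size_t count = 0;
    for (size_t i = 0; i < n_units; ++i) {
        if (is_high_surrogate(s[i]) && i + 1 < n_units &&
            is_low_surrogate(s[i + 1]))
            ++i;                 /* skip the low half of the pair */
        ++count;
    }
    return count;
}
```

The practical consequence is that the number of 16-bit units in a string is an upper bound on its character count, not the count itself, so code that splits or truncates text should avoid cutting between the two halves of a pair.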

Unicode Encoding
- The behavior of non-Unicode applications depends on the user’s settings and makes data exchange between OS language versions impossible.

The screen shot above shows an ANSI message box that contains an Arabic string in its body. The system locale is set to English, and therefore the conversion between ANSI and Unicode done by the system maps Arabic characters to random high ANSI Latin characters (from code page 1252!). The only user solution at this point is to set the system locale to Arabic, reboot, and re-execute the application. The real coding solution is to move the application from ANSI to Unicode.

Legacy systems support
Few exceptions for apps that are not fully Unicode:
- App has to run on Win9x and NT
- Existing Internet protocols and standards require special encoding

Supporting apps that need to run on Win9x:
- Create two separate binaries: one ANSI and one Unicode
- Register as ANSI and internally convert to/from Unicode as needed
- Use MSLU!

Unless your single binary is also targeting Win9x (not fully Unicode), or unless you have to deal with Internet standards and protocols that use a different encoding (UTF-8, for example), your application should be fully Unicode. Even if you need to support platforms that are not fully Unicode, there are techniques for creating hybrid ANSI/Unicode applications:
- Create two binaries: a default compile (‘A’ routines, the ANSI-based routines) for Windows 9x and a Unicode compile for NT and Windows 2000.
  Advantages: Runs on both platforms.
  Disadvantages: Maintenance of two binaries is messy.
- Always register as an ANSI application and convert to/from Unicode as needed.
  Uses Unicode on Windows 9x whenever possible.
  Does not support “new” scripts (e.g., Devanagari, Tamil, Armenian, Georgian).
  Multi-script support is more difficult.
- Finally, you can use MSLU (the Microsoft Layer for Unicode). For more information on this DLL, check out:

Data types

For 8-bit and double-byte characters:

    typedef char CHAR;            // 8 bit character
    typedef char *LPSTR;          // pointer to 8 bit string

For Unicode (“Wide”) characters:

    typedef unsigned short WCHAR; // 16 bit character
    typedef WCHAR *LPWSTR;        // pointer to 16 bit string

Generic type    Compiled Unicode    Compiled ANSI
TCHAR           wchar_t             char
LPTSTR          wchar_t *           char *

Programming with Unicode: Unicode is supported in the Windows 32-bit API by creating a new string data type and providing a separate set of entry points and messages to support this new data type. The API provides a series of macros and naming conventions that make migration to Unicode transparent. Compiling a non-Unicode version and a Unicode version of an application from the same set of sources, for example, is a straightforward matter. Implementing Unicode as a separate data type allows the compiler’s type checking to ensure that only functions expecting Unicode strings are called with Unicode parameters.

Data types: Most string operations for Unicode can be coded with the same logic used for handling the Windows ANSI character set. The difference is that the basic unit of operation is a 16-bit quantity instead of an 8-bit one. The header files provide a number of type definitions that make it easy to create sources that can be compiled for Unicode or the ANSI character set. The generic data type TCHAR resolves to the traditional char type if the code is compiled ANSI, or to the wide data type WCHAR if compiled Unicode.

Win32 API prototypes

Generic function prototypes:

    // winuser.h
    #ifdef UNICODE
    #define SetWindowText SetWindowTextW
    #else
    #define SetWindowText SetWindowTextA
    #endif // UNICODE

- A routines behavior under Windows 2000/XP
- W routines behavior under Win9x

Function prototypes are provided in three sets: a generic prototype and two explicit ones. The generic function prototype consists of the standard API function name implemented as a macro. The generic prototype resolves to one of the explicit function prototypes, depending on whether the compile-time manifest constant UNICODE is defined in a #define statement. The letter W or A is added at the end of the API function name in each explicit function prototype. Note how the generic prototype uses the generic type LPTSTR for the text parameter, but the A and W prototypes use the 8-bit type LPSTR or the wide-character type LPWSTR instead. This three-pronged approach applies to all functions with text arguments. In every case, a function with a name ending in W expects wide-character arguments, and a function with a name ending in A expects 8-bit character arguments. A generic function prototype should always be used with generic string and character types.

On Windows NT, the “A” routines convert ANSI text arguments to Unicode internally using the default system locale. The W version of the corresponding routine is then called.

On Windows 9x, the “A” routines are native. Only a limited number of Win32 APIs have a W version defined on Win9x. For the Win32 APIs that are not defined, a call to the “W” routine fails, and GetLastError returns ERROR_CALL_NOT_IMPLEMENTED.

A text macro (_T or TEXT) has also been defined to handle literal character strings as either ANSI or Unicode:

    #ifdef UNICODE
    #define TEXT(string) L##string
    #else
    #define TEXT(string) string
    #endif // UNICODE
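The generic-name pattern can be tried outside the Windows headers. The sketch below, in portable C, reproduces it with hypothetical names (`DemoLengthA`, `DemoLengthW`, `DemoLength`, `DEMO_TCHAR`, `DEMO_TEXT`); none of these exist in the Win32 API, they only mirror the A/W/generic convention described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two explicit entry points: one per character type (hypothetical names). */
static size_t DemoLengthA(const char *text)
{
    assert(text != NULL);
    return strlen(text);
}

static size_t DemoLengthW(const wchar_t *text)
{
    assert(text != NULL);
    return wcslen(text);
}

/* The generic name, character type, and literal macro all switch together
   on one compile-time constant, just as the Windows headers do. */
#ifdef UNICODE
typedef wchar_t DEMO_TCHAR;
#define DemoLength   DemoLengthW
#define DEMO_TEXT(s) L##s          /* token pasting turns "x" into L"x" */
#else
typedef char DEMO_TCHAR;
#define DemoLength   DemoLengthA
#define DEMO_TEXT(s) s
#endif

/* Caller written only against the generic names: it compiles unchanged
   with or without -DUNICODE. */
static size_t demo_title_length(void)
{
    const DEMO_TCHAR *title = DEMO_TEXT("World-Ready");
    return DemoLength(title);
}
```

Because the caller never names a concrete character type, mixing the generic function with an explicitly narrow or wide string becomes a compile-time type error in one of the two builds instead of a run-time surprise.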

String manipulation functions and macros
Generic CRT    8-bit code page    Unicode
_tcscpy        strcpy             wcscpy
_tcscmp        strcmp             wcscmp
Compile with -D_UNICODE to get the Unicode version.

Generic Win    8-bit code page    Unicode
lstrcpy        lstrcpyA           lstrcpyW
lstrcmp        lstrcmpA           lstrcmpW
Compile with -DUNICODE to get the Unicode version.

    #ifdef UNICODE
    #define TEXT(string) L##string
    #else
    #define TEXT(string) string
    #endif // UNICODE

A Unicode equivalent (wcs***) has been created for each traditional ANSI string manipulation function in the C run-time library (str***). The generic entry points for those functions, which resolve automatically according to the compile conditions, are named _tcs***. Use the compile flag -D_UNICODE to get the Unicode versions of the C run-time functions. The Win32 APIs, however, always offer better international support and easier locale-specific manipulation of text data. Use the UNICODE compile flag for the Win32 string manipulation APIs, where an A and a W version of each API has been created. Finally, for embedded text, the _TEXT macro has been defined as a generic name for either ANSI or Unicode data.

Unicode ↔ ANSI

Converting between ANSI and Unicode:
- MultiByteToWideChar for code page → Unicode
- WideCharToMultiByte for Unicode → code page
- CP can be any legal code page number or a predefined value such as CP_ACP, CP_SYMBOL, CP_UTF8, etc.

Tips for writing Unicode:
- Use generic data types and function prototypes
- Replace p++/p-- with CharNext/CharPrev
- Compute buffer sizes in TCHAR

Most applications today are still ANSI, and you might need to convert between Unicode and ANSI to interface with other applications or for internal use.

    int WideCharToMultiByte(
        UINT CodePage,             // code page
        DWORD dwFlags,             // performance and mapping flags
        LPCWSTR lpWideCharStr,     // wide-character string
        int cchWideChar,           // number of chars in string
        LPSTR lpMultiByteStr,      // buffer for new string
        int cbMultiByte,           // size of buffer
        LPCSTR lpDefaultChar,      // default for unmappable chars
        LPBOOL lpUsedDefaultChar); // set when default char used

The code page argument of this API can be any valid installed/supported code page (even outside the currently selected system locale). Also, since UTF-8 is a commonly used encoding of Unicode, MultiByteToWideChar and WideCharToMultiByte allow conversions between raw Unicode and UTF-8 by using the predefined value CP_UTF8.

We strongly recommend writing generic code by using the generic data types instead of explicitly calling the ANSI or Unicode version of Win32/C run-time functions (unless, of course, you are performing an internal conversion between those two encodings). This approach helps avoid confusion about data type handling and reduces the effort of migrating from ANSI to Unicode to a simple definition of compile flags.

Since your character string can be an array of CHAR or WCHAR (1 byte or 2 bytes per element) depending on whether the UNICODE flag is defined, the best approach when walking through a string is to avoid pointer increments/decrements and use predefined APIs such as CharNext and CharPrev.
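The "compute buffer sizes in TCHAR" tip is easy to get wrong once characters are wider than one byte. Here is a minimal sketch in portable C, using wchar_t in place of TCHAR; the macro `ELEMENT_COUNT` and the functions `copy_wide` and `demo_copy` are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Number of elements (characters) in an array, as opposed to sizeof(array),
   which is the number of bytes. Only valid on true arrays, not pointers. */
#define ELEMENT_COUNT(array) (sizeof(array) / sizeof((array)[0]))

/* Hypothetical helper: bounded copy whose capacity is given in characters.
   Always terminates the destination; returns characters copied. */
static size_t copy_wide(wchar_t *dst, size_t dst_chars, const wchar_t *src)
{
    assert(dst != NULL && src != NULL && dst_chars > 0);
    size_t i = 0;
    while (i + 1 < dst_chars && src[i] != L'\0') {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = L'\0';
    return i;
}

static size_t demo_copy(void)
{
    wchar_t buffer[8];
    /* Passing sizeof(buffer) here would claim room for more characters than
       the array holds whenever wchar_t is wider than one byte. */
    return copy_wide(buffer, ELEMENT_COUNT(buffer), L"globalization");
}
```

With an 8-element buffer the copy stops after 7 characters and leaves room for the terminator, whatever the width of wchar_t on the platform.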

Demo! Porting an ANSI application to Unicode

ANSI version:

    #define MAX_STR 256
    char g_szTemp[MAX_STR];
    ///////////////////////
    int PASCAL WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance, LPSTR lpszCmdLine, int nCmdShow)
    {
        // Load a string resource and display it in a MessageBox
        LoadString(g_hInst, IDS_SAMPLE, g_szTemp, MAX_STR);
        MessageBox(NULL, g_szTemp, "This is an ANSI message...", MB_OK);
        ...
        ExtTextOut(hDC, 10, 10, ETO_CLIPPED, NULL, g_szTemp, strlen(g_szTemp), NULL);
        return (TRUE);
    }

Unicode version (changed lines only):

    #include <tchar.h>
    TCHAR g_szTemp[MAX_STR];
    MessageBox(NULL, g_szTemp, _TEXT("This is an ANSI message..."), MB_OK);
    ExtTextOut(hDC, 10, 10, ETO_CLIPPED, NULL, g_szTemp, _tcslen(g_szTemp), NULL);

Encodings in Web pages
- ANSI code pages or ISO character encodings: mono-lingual or restricted to one script
- Raw Unicode (UTF-16): OK for Windows NT networks
- Number entities (e.g. &#2325; for क): OK for occasional use
- UTF-8: recommended encoding, supported by IE 4.0+ and Netscape 4.0+

ANSI or ISO encodings in Web pages can only be used for monolingual or mono-script web content. If you are submitting such content for localization, make sure to clearly specify the code page to be set and used in the localized content.

UTF-16, or raw Unicode, can be used safely on any NT network that has full Unicode support. It is not a suggested encoding for Internet sites, where the client web browser’s capabilities as well as the network’s Unicode support are not known.

Number entities can be used to represent a few symbols outside the currently selected code page or encoding. This approach makes it impractical to compose large amounts of text and makes editing your web content extremely hard.

Encoding your web content in UTF-8 is the best and safest approach. All versions of Internet Explorer 4 and beyond, as well as Netscape 4.0 and higher, support this encoding, which is not restricted by network/wire capabilities. The UTF-8 universal encoding allows you to create multilingual web content without changing the encoding based on the target language.

Setting web encoding
- HTML/DHTML: Tag in the head of the document
    <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=<value>">
- XML:
    <?xml version="1.0" encoding="<value>"?>
- ASP: Specify charset using ASP directives:
    Per session: <%Session.CodePage=<charset>%>
    Per page:

Setting encodings for .NET
- Namespace: System.Text
- Distinction between File, Request, and Response encodings
  - in code: Response.ContentEncoding=<value>
  - in page directive: ResponseEncoding=<value>%>
  - in configuration file:
      <globalization requestEncoding="<value>" responseEncoding="<value>" fileEncoding="<value>" />

Universally encoded page
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>International Text - Unicode UTF-8</title>
</head>
<body bgcolor="#ffffff">
Arabic - عندما يريد العالم أن يتكلّم، فهو يتحدّث بلغة يونيكود.
English - When the world wants to talk, it speaks Unicode
Korean - 세계를 향한 대화, 유니코드로 하십시오
Thai - 'ยูนิโค้ด'ภาษาเพื่อการสื่อสารของคนทั่วโลก
</body>
</html>

Windows 2000/XP: National Language Settings
NLS APIs allow you to automatically adjust to users’ formatting preferences:
- Date: 07/04/01 is 平成 13年7月4日 in Japan
- Time: 9:00PM is 21:00 in France
- Currency: $1,000.00 is 1.000,00 $ in Germany
- Large Numbers: 123,456,789 is 12,34,56,789 in Hindi
- Sort Order: German ä comes after a; Swedish ä comes after z

Windows 2000 has many NLS* APIs that are locale aware. All a programmer needs to do to display data in different formats is to use these APIs and make sure that the correct locale is set before calling them. (In other words, use the NLS APIs to remove locale dependencies from your code.)

NLS function categories:
- Getting and setting locales
- Querying locale information
- Formatting and converting locale-sensitive data

Date/Time format:
- Don’t assume the Gregorian calendar!
- Order and names of year, month, and day vary by locale. For example, what date does 07/04/01 represent?
    USA: July 4th, 2001
    UK: 7th April 2001
    Japan: 2007 – April 1st

Number format:
- Radix (decimal) separator
- The “thousands” separator is not necessarily on thousands (e.g., Indic locales)
- Local digit substitution

Others:
- Currency format
- Sort order

*National Language Settings
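To make the "separator is not necessarily on thousands" point concrete, here is a minimal sketch in portable C that groups the same digits two ways. The helper `group_digits` is hypothetical and only illustrates the rule; in a real application the grouping should come from the NLS formatting routines and the user locale rather than from hand-written code like this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert separators into a string of decimal digits.
   'first' is the size of the rightmost group, 'rest' the size of every
   group to its left (3 and 3 for US English, 3 and 2 for Hindi).
   Output is truncated if out_size is too small, and always terminated. */
static void group_digits(const char *digits, size_t first, size_t rest,
                         char sep, char *out, size_t out_size)
{
    assert(digits != NULL && out != NULL);
    assert(first > 0 && rest > 0 && out_size > 0);

    size_t len = strlen(digits);
    size_t o = 0;
    for (size_t i = 0; i < len; ++i) {
        size_t remaining = len - i;   /* digits from position i to the end */
        /* A group boundary falls before digit i when the digits from i on
           are exactly the rightmost group plus whole left-hand groups. */
        int boundary = i > 0 && remaining >= first &&
                       (remaining - first) % rest == 0;
        if (boundary && o + 1 < out_size)
            out[o++] = sep;
        if (o + 1 < out_size)
            out[o++] = digits[i];
    }
    out[o] = '\0';
}
```

Called with group sizes 3 and 3, the digits 123456789 come out as 123,456,789; with 3 and 2 they come out as 12,34,56,789, matching the Hindi example on the slide. Code that hard-codes groups of three produces the wrong result for the second case.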

Locale awareness
- Eliminate implicit locale assumptions from code:

    #define ToUpper(ch) \
        ((ch)<='Z' ? (ch) : (ch)+'A' - 'a')

- Query the system to format locale-dependent data using NLS APIs and LCIDs.

Old, English-centric, evil approach: the ToUpper macro above. Can you imagine what the result of this approach would be when dealing with Far East scripts, for example?

Modern, enlightened, locale-sensitive approach: use the NLS* API routines LCMapString or CharUpper().

*NLS: National Language Support

Categories of NLS functions:
- Getting and setting locales
- Querying locale information
- Formatting and converting locale-sensitive data

These APIs use:
- Language ID (LANGID): primary language + sub-language, e.g. French, France
- Locale ID (LCID): LANGID + Sort ID
*See the predefined macros MAKELANGID and MAKELCID

Locale ID (32 bits), from high bits to low bits:
    Reserved (12 bits) | Sort ID (4 bits) | Language ID (16 bits)
Language ID (16 bits), from high bits to low bits:
    Sub-language (6 bits) | Primary Language (10 bits)
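The bit layout of LANGID and LCID can be checked with a few lines of portable C. The functions below are hypothetical stand-ins that mirror what the MAKELANGID and MAKELCID macros do; the numeric values used in the usage note (0x0C for French, 0x01 for its France sub-language, 0 for the default sort) are copied by value as an assumption about the Windows headers rather than included from them.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MAKELANGID: sub-language in the top 6 bits,
   primary language in the low 10 bits of a 16-bit value. */
static uint16_t demo_make_langid(uint16_t primary, uint16_t sub)
{
    assert(primary <= 0x3FF && sub <= 0x3F);
    return (uint16_t)((sub << 10) | primary);
}

static uint16_t demo_primary_langid(uint16_t langid)
{
    return (uint16_t)(langid & 0x3FF);
}

static uint16_t demo_sub_langid(uint16_t langid)
{
    return (uint16_t)(langid >> 10);
}

/* Hypothetical stand-in for MAKELCID: language ID in the low 16 bits,
   sort ID in the next 4 bits, upper 12 bits reserved (zero). */
static uint32_t demo_make_lcid(uint16_t langid, uint16_t sort_id)
{
    assert(sort_id <= 0xF);
    return ((uint32_t)sort_id << 16) | langid;
}

static uint16_t demo_langid_from_lcid(uint32_t lcid)
{
    return (uint16_t)(lcid & 0xFFFF);
}

static uint16_t demo_sortid_from_lcid(uint32_t lcid)
{
    return (uint16_t)((lcid >> 16) & 0xF);
}
```

For example, `demo_make_lcid(demo_make_langid(0x0C, 0x01), 0)` yields 0x040C, the value commonly seen for French (France) with the default sort order.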

NLS APIs Getting and setting locales
Querying locales:
    LCID GetSystemDefaultLCID()
    EnumSystemLocales
    LCID GetUserDefaultLCID()
    LCID GetThreadLocale()

Setting locales:
    BOOL SetThreadLocale(LCID dwNewLocale)
    BOOL SetLocaleInfo(LCID,…) // Works for standard locales only!
    No APIs to set System locale, User locale, and UI language

Querying locales:

The GetSystemDefaultLCID function can be used to retrieve the active system locale identifier. This value can help you identify the current ANSI/OEM code pages as well as the default charset being used in the system for ANSI apps.

Depending on the flags passed to it, the EnumSystemLocales function enumerates the locales that are either installed on or supported by a system. (Installed locales are those for which a cp*.nls file has been installed and can be used to convert between Unicode and ANSI using the MultiByteToWideChar and WideCharToMultiByte APIs.)

The GetUserDefaultLCID function retrieves the user-selected locale identifier. This value can be used to format date/time/currency… according to the user’s preferred standards.

Upon its creation, the main thread of each process inherits the user locale value. Applications might, on rare occasions and for very specific reasons, decide to change that value (for example, a server-based app that presents data to a client machine and wants to format data in the client’s locale). The GetThreadLocale function returns the calling thread’s current locale.

Setting locales:

The SetThreadLocale function sets the calling thread’s current locale.

The SetLocaleInfo function sets an item of locale information. This setting only affects the user override portion of the locale settings; it does not set the system defaults.

NLS APIs Querying locale information
To retrieve information specific to a given locale: GetLocaleInfo
  Gives information for any valid locale (takes an LCID).
  The LCTYPE input tells which type of info to retrieve for the locale (e.g. currency symbol, names of months…).
  Returns the info in a string buffer (LPTSTR).
To retrieve information specific to a location: GetGeoInfo
  Gives information for any valid location (takes a GEOID).
  The SYSGEOTYPE input tells which type of info to retrieve for the location (e.g. LCID, time zones…).
The GetLocaleInfo function retrieves information about a locale.
  int GetLocaleInfo(
      LCID   Locale,    // locale identifier
      LCTYPE LCType,    // information type
      LPTSTR lpLCData,  // information buffer
      int    cchData    // size of buffer
  );
The LCType argument can be any of the predefined flags, which query a wide range of locale-specific information, from the native country name to native month or day names. See the MSDN documentation for more info:
The GetGeoInfo function gets information about a specified location.
  int GetGeoInfo(
      GEOID   GeoId,      // location identifier
      GEOTYPE GeoType,    // type of information requested
      LPTSTR  lpGeoData,  // buffer for results
      int     cchData,    // size of buffer
      LANGID  language    // language id
  );
You can see a complete list of Win2000 supported locales and some of their associated settings at:

NLS APIs Formatting data
To enumerate formats:
  EnumCalendarInfo(Ex)
  EnumDateFormats
  EnumTimeFormats
To format data directly:
  GetCurrencyFormat
  GetDateFormat
  GetTimeFormat
Enumerate formats: The EnumCalendarInfo(Ex) function enumerates calendar information for a specified locale. The CalType parameter specifies the type of calendar information to enumerate. Depending on the value of the Calendar parameter, the function returns the specified calendar information either for all applicable calendars of the locale or for a single requested calendar.
The EnumDateFormats(Ex) function enumerates the long or short date formats that are available for a specified locale, including date formats for any alternate calendars. The value of the dwFlags parameter determines whether the long date, short date, or year/month formats are enumerated.
The EnumTimeFormats function enumerates the time formats that are available for a specified locale.
Formatting data: The GetCurrencyFormat function formats a number string as a currency string for a specified locale. The GetDateFormat function formats a date as a date string for a specified locale. The GetTimeFormat function formats a time as a time string for a specified locale.
You can see date and time formatting for all Win2000 supported locales at:

Demo! A locale aware application
You can see this application and its source code at:

Locale awareness in web pages
To retrieve the user locale:
  A server variable: Request.ServerVariables("HTTP_ACCEPT_LANGUAGE")
  A property of the navigator object: navigator.userLanguage
To set a locale:
  In DHTML:
    SetLocale("de")
    DateData = FormatDateTime(now(), vbShortDate)
  In ASP:
    <% Session.LCID = 1041 %>
    <% Response.Write( FormatDateTime(dtNow) ) %>
Check out the code samples on how to retrieve the currently selected user/browser locale and how to format locale-specific data in your web pages.
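The value of HTTP_ACCEPT_LANGUAGE is a comma-separated list of language tags, optionally followed by quality values, for example "de-AT,de;q=0.8,en;q=0.5". As a minimal sketch of what a server does with it, the hypothetical C helper below extracts the first tag. It assumes the browser lists its preferred language first, which is common but is not guaranteed by the quality values, so a production implementation would compare the q-values as well.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the first language tag of an Accept-Language value into out.
   Returns 1 on success, 0 if the header holds no tag or the tag does not fit. */
int first_accept_language(const char *header, char *out, size_t out_size)
{
    size_t len;

    assert(header != NULL && out != NULL);
    while (*header != '\0' && isspace((unsigned char)*header))
        header++;                       /* skip leading white space */
    len = strcspn(header, ",; \t");     /* the tag ends at ',', ';' or white space */
    if (len == 0 || len >= out_size)
        return 0;
    memcpy(out, header, len);
    out[len] = '\0';
    return 1;
}
```

The resulting tag (such as "de-AT") still has to be mapped to a locale the formatting layer understands, for instance the LCID assigned to Session.LCID in the ASP example.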

Locale awareness in .NET
Namespace: System.Globalization
CultureInfo: a set of preferences based on language and culture.
  Pattern: xx-XX, such as fr-CA, de-AT (RFC 1766)
Setting the CultureInfo:
  Implicit: picked up from the user locale
  Explicit:
    In code: Thread.CurrentThread.CurrentCulture = new CultureInfo("de-DE")
    In page directive: <%@ Page Culture=<value> %>
    In config: <globalization culture=<value> />

Demo! Locale aware web site
You can see this site and its source code at:
Under the “sniffing the browser” article.

Handling Input methods
Easiest: use edit controls (recommended).
Responding directly to user input:
  Input locales (language + input method): HKL
    GetKeyboardLayout
    ActivateKeyboardLayout
    LoadKeyboardLayout
  Windows messages:
    WM_INPUTLANGCHANGEREQUEST
    WM_INPUTLANGCHANGE
    WM_IME_* (for IME support only)
    WM_CHAR and WM_IME_CHAR
Most applications can safely choose not to handle input locales and let the operating system handle that complex operation by using standard edit controls. Web pages have nothing to worry about and can rely on their HTML rendering engine for that matter.
The input locale is represented by an HKL (incorrectly called “handle to keyboard layout” in the documentation). An HKL is a DWORD in which the LO word is the language ID and the HI word is the input method ID (usually ignored by apps).
If you want to handle input locales directly, here are the most commonly used APIs:
  ActivateKeyboardLayout: set the input locale
  LoadKeyboardLayout: load an input locale into the system
  GetKeyboardLayout: retrieve the active input locale
  GetKeyboardLayoutList: retrieve the input locale handles corresponding to the current set of input locales
Window messages to monitor for a change of input locale:
  WM_INPUTLANGCHANGEREQUEST
    wParam = (BOOL) bLangInSystemCharset (Unicode apps can ignore this)
    lParam = new (requested) HKL
    Return DefWindowProc to accept the request, 0 to reject it.
  WM_INPUTLANGCHANGE
    wParam = (UINT) BaseCharset
    lParam = new HKL
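The LO word / HI word split described above can be sketched in portable C. The input_locale type and the three helpers are hypothetical; they model an HKL as a plain 32-bit value and do not touch the Win32 API, where LOWORD and HIWORD perform the same extraction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an input locale identifier (HKL) as a 32-bit value. */
typedef uint32_t input_locale;

/* LO word: the language ID of the input locale. */
uint16_t input_language(input_locale hkl) { return (uint16_t)(hkl & 0xffffu); }

/* HI word: the input method ID, which most applications ignore. */
uint16_t input_method(input_locale hkl) { return (uint16_t)(hkl >> 16); }

/* True when switching from old_hkl to new_hkl changes the language,
   as opposed to switching input method within the same language. */
int language_changed(input_locale old_hkl, input_locale new_hkl)
{
    return input_language(old_hkl) != input_language(new_hkl);
}
```

A WM_INPUTLANGCHANGE handler could use a check like language_changed to decide whether a language indicator or the font charset needs updating, and ignore switches that only change the input method.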

Windows 2000/XP: Complex Scripts
Complex scripts have one or more of the following attributes:
  Bi-directional (BiDi) reordering (Arabic, Hebrew)
  Contextual shaping (Arabic, Indic family)
  Display of combining characters (Arabic, Thai, Indic)
  Specialized word-breaking (Thai)
  Text justification (Arabic)
A script is defined as complex whenever additional handling (not required for Latin) is needed to display its characters. The attributes of complex scripts are:
Bi-directionality of text, when Arabic or Hebrew scripts (right-to-left) are mixed with other scripts such as Latin (left-to-right). Reading order, alignment of text and cursor positions are some of the considerations.
Contextual shaping. For Arabic and Indic scripts, the shape of a character depends on its context (position) within the word and results in a different glyph.
Combining characters. Two or more characters can be combined to define a whole new glyph.
Word-breaking and justification. The Thai script has no spacing between words, but text must still wrap on word boundaries, so an internal dictionary is required to implement word breaking. For Arabic, introducing spaces between characters to justify text simply breaks the character shaping; Kashidas are inserted at appropriate positions between characters instead.
Currently supported complex scripts are: Arabic, Divehi, Hebrew, Indic family, Syriac, Thai, and Vietnamese.

Complex Scripts BiDi reordering
Arabic and Hebrew scripts are not only right-justified but also read from right to left (RTL): they have an RTL reading order. For those scripts, the logical order (the order in which the user enters the text as a sequence of virtual key inputs) and the visual order (the order in which the characters are presented to the user) differ in most cases. Caret movement and the mixture of RTL and LTR scripts are challenging tasks when dealing with BiDi languages. For more information about BiDi algorithms, refer to the Unicode organization guidelines.

Complex Scripts Contextual Shaping
In Latin script, the shape of a character (its glyph) remains unchanged no matter where the character sits within a word. For Arabic and the Indic family of languages, a character's glyph depends on its position within the word and on the preceding and following characters. In Arabic, the same character can have up to five different shapes depending on the context.

Complex Scripts Combining Characters
For Latin script, there is usually a direct one-to-one mapping between a character and its glyph (the character “e” is always represented by the same glyph “e”). For complex scripts, several characters can combine to create a whole new glyph that is independent of the original characters. There are also cases where the number of resulting glyphs is larger than the number of characters used to generate them.

Complex Scripts Justification
To justify Latin text, spaces are added between words and characters. This approach cannot be used to justify Arabic text, or the contextual shaping will break. Instead, continuous lines (Kashidas) are inserted between joining characters to make each word look longer.
Figure: insertion of Kashidas on character combinations.

Uniscribe
Clients: Windows 2000/XP, Trident, Microsoft Office 2000/XP
A collection of exported APIs (high and low level)
Hides implementation details
A shaping engine per language
Diagram components: Application, USER, GDI, LPK.DLL, USP
Language details handled:
  Syllable structure (Indian, Thai)
  Contextual shaping (Arabic, Indic)
  Caret placement (all)
  Wordbreak (Thai)
  National digits (Arabic, Indic, Thai)
  Bi-directional layout (Arabic, Hebrew)
Uniscribe (USP10.DLL) is the system engine used to shape and lay out complex scripts under Windows 2000/XP. It ships with Microsoft Windows 2000, Microsoft Internet Explorer (4.x and 5.0) and Microsoft Office 2000. For more technical details on how to format and display text using Uniscribe (usp10.dll for version 1.0) refer to:

Options to display text
Plain text in an application: use a standard edit control or the Win32 API (ExtTextOut / DrawText).
Simple formatted text: in Win32 apps, use the Richedit control. For Web pages, use the Document Object Model (DHTML).
Advanced formatting: use Uniscribe (see the SDK and the MSJ article).
To take advantage of the complex script (CS) support provided by the system when displaying plain text, the recommended approach is to use either edit controls or standard Win32 APIs such as DrawText or ExtTextOut. Those APIs and controls automatically interface with Uniscribe when needed to output CS text.

Special considerations
When dealing with BiDi, set the RTL reading order and alignment:
  SetTextAlign / GetTextAlign with TA_RIGHT
  ExtTextOut with ETO_RTLREADING
  DrawText with DT_RTLREADING
To measure line lengths:
  Do not sum cached character widths.
  Do use a GetTextExtent function or Uniscribe.
When displaying typed text:
  Do not output characters one at a time!
  Do save the text in a buffer and display the whole string with Uniscribe or the Win32 API.
We have already mentioned the attributes of a complex script and why these scripts are so complicated to handle. Let's now talk about the options you have when dealing with them.
Before talking about all complex scripts, a quick heads-up about RTL (right-to-left) languages (Arabic, Farsi, Hebrew, Urdu). No matter which approach you adopt to handle complex scripts, when displaying RTL scripts you need to be aware of:
  Alignment of the text: RTL languages are right-aligned.
  Reading order: the reading order also goes from right to left. This can be tricky when RTL and LTR characters are mixed in a BiDi context.
Standard text output functions have special flags to handle this kind of formatting:
  Function       Flags
  DrawText       DT_RIGHT / DT_RTLREADING
  ExtTextOut     ETO_RTLREADING
  SetTextAlign   TA_RIGHT / TA_RTLREADING (per DC [Device Context])
  GetTextAlign   (per DC [Device Context])
Because of the contextual shaping and combining character attributes of complex scripts, the width of a laid-out character string can be completely different from the sum of the widths of the individual characters used to generate that string. Use a GetTextExtent API to retrieve the correct size of the text.
To make sure that your displayed text is laid out and shaped properly, output text strings as a whole instead of one character at a time (remember the importance of context in selecting the right glyph to display).

Windows 2000/XP: Font support
Introduction of OpenType fonts: extended TTF with glyphs for PE, ME, Thai, Greek, Turkish, Cyrillic…
Font fallback mechanism for CS and East Asian scripts, used by Uniscribe
Font linking mechanism, used by GDI
One of the biggest challenges in enabling the OS for international character sets is the ability to select and represent the right glyph. In addition to the old font substitution (font association) technique used since early versions of Windows, Windows 2000/XP take advantage of new font technologies and features to overcome these challenges:
The new OpenType fonts, with extended support for Pan European (PE) and Middle East (ME)/Thai scripts.
Font linking, a new technique that allows the display of scripts not supported by the selected font (the base font) by appending to that base font a linked font with glyphs for the desired script. (Note: not all fonts support all scripts. For example, you cannot use “Arabic Transparent” to display Japanese glyphs.) Arial and Tahoma are examples of linked fonts. Note that applications and users cannot modify the font linking behavior or extend the list of link candidates for a given font.
Font fallback, used by Uniscribe (the shaping engine for complex scripts) to provide an alternate font for runs of characters that cannot be represented by the original font. As opposed to font linking, a fallback font replaces the originally selected font. Applications and users cannot add or modify these fallback fonts.

Font independency Win32 programming
Not to do:
  Hard code font face names
  Assume a given font is installed
  Assume the selected font supports the desired script
To do:
  Use the MS Shell Dlg face name in dialog resources
  Use EnumFontFamiliesEx or ChooseFont to select fonts
Even if you use an OpenType font, like Arial, that is font-linked for East Asian locales, hard coding the font face name still exposes your application's behavior to user changes. For example, the Win2000 version of Arial might be replaced with an older NT4.0 version which did not support Arabic or Hebrew, and your program then loses its capability to support these languages.
As displayed on the image above, whenever the system fails to retrieve the proper glyph to display a script, the default glyph (empty square boxes or bold vertical lines) is displayed.
In resource files, never hard code a dialog box font (e.g. Microsoft Sans Serif). Always use the face name MS Shell Dlg, which gets mapped to a font that supports your system's UI language.
At run-time, when using font creation APIs (e.g. CreateFont), do not hard code font names (e.g. “Arial”) or assume a font is installed on the system. To select the font that best matches your requirements (LOGFONT), use the EnumFontFamiliesEx() or ChooseFont() APIs.
To set lfCharSet in the LOGFONT structure, use the charset available from:
  WM_INPUTLANGCHANGE
  GetTextCharsetInfo
  TranslateCharsetInfo
For scripts with NO charset (e.g. Armenian, Devanagari, Georgian, Tamil):
  Use DEFAULT_CHARSET
  Get the FONTSIGNATURE structure from EnumFontFamExProc
  Match the font to the Unicode range of the script using FONTSIGNATURE
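Matching a font to a script's Unicode range comes down to testing one bit in the 128-bit Unicode subset bitfield of FONTSIGNATURE. The sketch below models that test in portable C. The unicode_ranges struct is a hypothetical stand-in for the four-DWORD fsUsb member, and the two bit numbers are assumptions to be checked against the published Unicode subset bitfield table before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the fsUsb member of FONTSIGNATURE:
   four 32-bit words that together form a 128-bit bitfield. */
typedef struct { uint32_t usb[4]; } unicode_ranges;

/* Assumed bit numbers; verify them against the Unicode subset bitfield table. */
enum { RANGE_ARMENIAN = 10, RANGE_DEVANAGARI = 15 };

/* Return 1 if the font signature advertises the given Unicode subrange bit. */
int font_covers_range(const unicode_ranges *sig, unsigned bit)
{
    assert(sig != NULL && bit < 128);
    return (int)((sig->usb[bit / 32] >> (bit % 32)) & 1u);
}
```

In an EnumFontFamExProc callback, a check of this kind lets the application skip fonts that do not advertise the needed script rather than relying on a charset value that does not exist for it.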

Font independency In Web pages
Avoid placing text formatting values into in-line styles, e.g.:
  <span style="font-size: 10pt; font-family: Arial;">Hello</span>
Declare the text style in CSS files:
  <style> .myStyle {font-size: 10pt; font-family: Arial;} </style>
  <span class="myStyle">Hello</span>
Use WEFT to embed fonts in your web pages (IE only):
When creating web pages, avoid placing font attribute values into in-line styles. This is the same as hard coding text in your source code: if you need to make changes, you have to go to that page and make the necessary changes for each language you are localizing into.
A better way to handle font attributes is to use Cascading Style Sheets (CSS). You create one CSS for each language you are localizing into. When a web page is created, it incorporates the localized CSS, which contains the attributes needed for the current language. In our example we create a style class called myStyle, which contains our font family and font size, allowing them to change depending on the language you are rendering your content in. In the web page itself, all you need to do is “span” whatever text you want formatted with your myStyle class.
Font embedding has been a feature of Microsoft applications such as Word and PowerPoint® for several years. It allows the fonts used in the creation of a document to travel with that document, ensuring that a user sees documents exactly as the designer intended them to be seen. Font embedding technology is also built into Microsoft Internet Explorer (version 4 and above), bringing embedded fonts to the Web. Microsoft provides a Web Embedding Fonts Tool (WEFT) which lets Web authors create 'font objects' that are linked to their Web pages, so that when an Internet Explorer user views the pages they see them displayed in the font style contained within the font object.
If the viewer is not using IE, they will see the text displayed either in the second-choice font specified by the page or in their default font, depending on the browser they are using and the fonts they have installed.

Windows 2000/XP: Multilanguage UI
The Multilanguage version of Windows 2000/XP allows you to:
  Switch the language of the UI without rebooting
  Set the language of the UI per user
  Add/remove language modules
  Offer your own solution for a multilingual UI
The multilingual user interface feature in Windows 2000/XP allows the user interface language to be specified for each user. This means that you can set up the profiles of several users on the same machine so that each one has a different user interface language, e.g. Japanese, German, English. You can add or remove languages from the set you make available to users at any time (note that the UI language is different from script support). The language change applies only to resources such as menus and dialog boxes. It does not change the locale for things like sorting rules, date and time formats, etc.
The multilanguage user interface feature for Windows 2000/XP is only offered in the “Windows 2000 MultiLanguage version” and the “Windows XP MultiLanguage Pack”. You can take advantage of this functionality to implement your own solutions for multilanguage user interfaces.
How does the MultiLanguage system differ from the corresponding localized version of Windows 2000?
  It is not fully localized (over 90% of the UI is localized for Windows 2000 and over 97% for Windows XP; exceptions are registry keys and folder names).
  Additional disk space is required for the language-specific MUI files.

Multilingual UI Applications Possible options
Option 1: one localized .exe per target language (Eng.exe, Ger.exe, Jpn.exe)
Option 2: one multilingual language resource DLL (Myapp.exe plus a single DLL holding Eng, Ger, Jpn)
Option 3: one resource DLL per target language (Myapp.exe plus Eng.dll, Ger.dll, Jpn.dll)
One single EXE per language:
  Advantage:
    Easiest to implement.
  Disadvantages:
    Need to maintain a separate source tree for each language
    Any resource change requires a full binary compilation
    Need to start a different instance of the application to switch UI language
    Waste of disk space: multiple copies of the core binary
All language versions in the same resource file:
  Advantages:
    Relatively easy to implement
    Allows user interface switching on the fly
    No-compile resource update possible
  Disadvantages:
    Difficult to update with new languages
    Difficult to install a subset of UI languages
    Unsupported languages waste memory and disk space
    No straightforward way to use the direct resource loading interfaces (LoadMenu, LoadString, etc.)
Satellite DLL:
  Advantages:
    Allows user interface switching
    Gives complete control over which languages are installed
    Easy to update with a new language
    Language-specific updates do not affect all languages
    Can be implemented for Win9x, NT and Windows 2000
  Disadvantage:
    Need to keep all language satellite DLLs synchronized.

Satellite DLL
Initialize to the current UI language.
  Windows 2000/XP: GetUserDefaultUILanguage()
  Down-level platforms: see “Writing Multilingual User Interface Applications” on Globaldev.
Allow the user to select the UI language.
  Use a naming convention, for example: res<LANGID>.dll
  Find all resource DLLs using FindFirstFile and FindNextFile
  Use LoadLibrary(Ex) to load the DLL file
A well-behaved application should display its UI in the language of the OS user interface (the user's default/preferred language) and should also allow the user to change the user interface language.
The following example shows how to find the current UI language and load the appropriate resource DLL for an application.
  wLangId = GetUserDefaultUILanguage();
  _stprintf(g_tcsTemp, _TEXT("res%x.dll"), wLangId);
  if ((hRes = LoadLibrary(g_tcsTemp)) == NULL)
  {
      // We didn't find the desired language satellite DLL, so go with
      // French (or any default / preferred language).
      hRes = LoadLibrary(_TEXT("res40c.dll"));
  }
To see how to do this for legacy OSs, see “Writing Multilingual User Interface Applications” at:
Applications should also allow users to select a different UI language. By adopting a proper naming convention for satellite resources (for example, one subdirectory per language, named after the LangID of the resources), it is easy to detect at run time the list of languages for which resources are available. Check out the source code of the Global.exe application at the link below for a complete example of this:
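The naming convention and the fallback step can be sketched in portable C. This illustration deliberately avoids the Win32 calls (GetUserDefaultUILanguage, FindFirstFile, LoadLibrary): the helper names are hypothetical, and the list of available files stands in for whatever a directory scan would return.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build "res<LANGID>.dll" with the LANGID in hexadecimal, e.g. res40c.dll.
   Returns 1 on success, 0 if the name does not fit in the buffer. */
int satellite_name(unsigned lang_id, char *out, size_t out_size)
{
    int n;

    assert(out != NULL && out_size > 0);
    n = snprintf(out, out_size, "res%x.dll", lang_id);
    return n > 0 && (size_t)n < out_size;
}

/* Pick the DLL for the UI language if present, else the application default.
   `available` is a hypothetical NULL-terminated list of file names found on disk.
   Returns NULL when neither candidate is available. */
const char *choose_satellite(unsigned ui_lang, unsigned default_lang,
                             const char *const *available)
{
    unsigned candidates[2];
    char name[32];
    size_t i, j;

    candidates[0] = ui_lang;
    candidates[1] = default_lang;
    for (i = 0; i < 2; i++) {
        if (!satellite_name(candidates[i], name, sizeof name))
            continue;
        for (j = 0; available[j] != NULL; j++)
            if (strcmp(available[j], name) == 0)
                return available[j];
    }
    return NULL;
}
```

Keeping the candidate list in one place makes it straightforward to add further fallbacks later, such as trying the primary language alone before the application default.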

Windows 2000/XP: Mirroring technology
Purpose: to create an automatic right-to-left layout of the user interface for localized versions of bidirectional languages (Arabic and Hebrew).
The Arabic and Hebrew versions of Windows 98, and Windows 2000, introduced mirroring support to give a perfect RTL look and feel to the UI. In the implementation, the user interface is mirrored (flipped from LTR to RTL) by reversing the (0,0) origin of the screen (more details later). The image above was taken from a localized Arabic Explorer on Windows 2000. All window elements have been mirrored (title bar, menus, tree view, toolbar, scroll bars, …).

Coordinate transformation
Origin (0,0) in the upper RIGHT corner of the window
X scale factor = -1
X values increase from right to left
Figure: a default (LTR) window with the origin at the top left and x increasing to the right, next to a mirrored (RTL) window with the origin at the top right and x increasing to the left.
This technique is based on moving the (0,0) origin of the screen from the top-left to the top-right. To minimize the amount of code changes required by applications to support mirroring, the GDI and User modules have been modified so that mirroring can be turned on and off with almost no additional changes.
Changing the origin of the screen from top-left to top-right affects both screen and client coordinates. The ScreenToClient and ClientToScreen APIs do not return the expected coordinates of a given window if mirroring is turned on. You should use MapWindowPoints instead, which implicitly gives you the right coordinates: the coordinate calculation takes into account whether mirroring is turned on or off at the OS level.
To avoid confusion around coordinates, move your design away from the concepts of “left” and “right” and think in terms of “near” and “far.”
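The arithmetic behind the flipped origin can be shown with a small portable C sketch. It is an illustration only: the rect type and both helpers are hypothetical, the right edge is treated as exclusive (as in the Win32 RECT convention), and real code should let MapWindowPoints do the conversion.

```c
#include <assert.h>

/* Hypothetical rectangle in client coordinates; right and bottom are exclusive. */
typedef struct { int left, top, right, bottom; } rect;

/* Map the x position of a pixel between the LTR and the mirrored system
   for a client area that is client_width pixels wide. */
int mirror_x(int x, int client_width)
{
    assert(client_width > 0);
    return client_width - 1 - x;
}

/* Mirror a rectangle horizontally; its width and vertical extent are unchanged. */
rect mirror_rect(rect r, int client_width)
{
    rect m = r;

    assert(client_width > 0 && r.left <= r.right);
    m.left  = client_width - r.right;
    m.right = client_width - r.left;
    return m;
}
```

Applying either function twice returns the original value, which matches the “near”/“far” view: the distance from the near edge is the same number in both layouts, only the edge it is measured from changes.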

Controlling the mirroring style
Per process:
  GetProcessDefaultLayout
  SetProcessDefaultLayout (LAYOUT_RTL)
Per window:
  CreateWindowEx (WS_EX_LAYOUTRTL | WS_EX_NOINHERITLAYOUT)
  SetWindowLong
Per DC:
  GetLayout / SetLayout
  LAYOUT_BITMAPORIENTATIONPRESERVED
Win32 APIs have been created to allow mirroring to be activated and deactivated at different levels.
To turn mirroring on or off per process, the GetProcessDefaultLayout and SetProcessDefaultLayout APIs are available. Once a process is mirrored, all windows created during the lifetime of this process will also be mirrored.
For windows created by calls to CreateWindowEx, you can specify WS_EX_LAYOUTRTL to force a mirrored layout. Note that all child windows of this parent window will then be mirrored by default unless the flag WS_EX_NOINHERITLAYOUT is specified. Inheritance does not apply to owned windows. At run-time, you can call SetWindowLong with the WS_EX_LAYOUTRTL flag to mirror a window, followed by an InvalidateRect to refresh the display area.
In a mirrored window, all components are automatically mirrored. However, most of the time, bitmaps and icons should not be mirrored (flipped about the vertical axis). To avoid this, turn mirroring off for the affected DC by calls to GetLayout/SetLayout with the LAYOUT_BITMAPORIENTATIONPRESERVED flag.
For more information about mirroring technology and other considerations, you can refer to the following MSJ article:

Controlling the mirroring style
Dialog resources: set WS_EX_LAYOUTRTL in the dialog template
Message boxes: use the MB_RTLLAYOUT option
BitBlt/StretchBlt: use the NOMIRRORBITMAP flag
Unlike windows, dialog boxes cannot be mirrored at run-time. The mirroring style (WS_EX_LAYOUTRTL) should be added to the dialog resources. To mirror a message box that does not belong to a mirrored window, the MB_RTLLAYOUT flag can be specified to automatically mirror the message box. To avoid the mirroring of images and bitmaps, you can use the NOMIRRORBITMAP flag in your calls to BitBlt or StretchBlt.

Mirroring common issues
Above are two common mirroring bugs.
Mirrored bitmap: the Windows logo is drawn in a mirrored DC (Device Context) inherited from the parent window. Solution: call SetLayout to turn mirroring off on this DC, or call BitBlt or StretchBlt with the NOMIRRORBITMAP flag.
Off-screen bitblt: because the owner-drawn command or bitmap is placed using absolute positioning, it is drawn outside the desired area. Solution: get the right coordinates of the host rectangle by calling MapWindowPoints.

BiDi & mirroring in web pages
In a web context, mirroring and RTL reading order go hand-in-hand. Using the DIR attribute will:
  Set the “right” alignment of the text
  Set the right-to-left reading order of the text
  Mirror the page context
  Leave the orientation of stationary elements unchanged
To set the DIR attribute:
  HTML: <html dir=RTL>
  At an element level
  DHTML object: document.dir = "rtl"
To set the mirroring style in an HTML page, the DIR attribute can be added manually to the HTML tag as shown above. In Internet Explorer, you can also change this setting for a web page by right-clicking within the page and selecting the encoding option.
To set the DIR attribute at run-time in DHTML pages, you can use an approach similar to the one shown below, where the mirror function reverses the DIR attribute of the page:
  function mirror()
  {
      if (document.dir == "rtl")
          document.dir = "ltr";
      else
          document.dir = "rtl";
  }

Tips for BiDi web pages
Directional images: <IMG style=filter:flipH SRC=arrow.jpg>
Avoid explicit alignments: the obsolete usage of “align=left” in tables and cells
Avoid absolute positioning of elements
Remember: tables get mirrored automatically, so use them for robust reversibility!
Stationary elements and images keep their orientation when an RTL DIR attribute is applied. For directional images (for example an arrow pointing to a text), the flipH style filter can be used to force reversibility. The code below shows how this can be implemented:
  <img ID="myID" src=arrow.jpg>
  if (document.dir == "rtl")
      strFlip = "";
  else
      strFlip = "flipH";
  myID = document.all("myID");
  myID.style.filter = strFlip;
Explicit alignments override the DIR attribute settings. English text is aligned to the left by default, so to accommodate BiDi localization, avoid an explicit, obsolete left alignment.
Setting the DIR=RTL attribute repositions all controls within the web page. To avoid breaking the functionality or look of HTML dialogs or elements, DO NOT use absolute positioning. Instead, place all your web or dialog elements within table cells, which get rearranged automatically at run-time.

Demo! Mirrored DHTML

Localizability
Localization of software: adapting user interface elements to a specific language
Localization should require no engineering changes!
Changes to code are part of the localizability process
Source code changes due to localization are bugs in the core code!
Besides creating applications that can be used in many different locales (i.e. bigger market, more revenue), the other big reason for globalization and localizability is to keep localization costs down, and localizability is the area that helps most. Once your localization team begins to localize your software, the cost of every code change that could have been avoided through localizability is multiplied by the number of language markets you are preparing your application for. (For example, if you're going into 24 markets, every localizability issue not addressed in the core code can cost up to 24 times its original cost to fix, because you may have to fix it in every language version.)

Challenges
Developers are focused on their primary language
Coding “tricks” to save work (saves a few $):
  Hardcoded strings
  Creating text strings from phrases by concatenation
  Creating dialog boxes using overlapped controls
The “tricks” cost lots of $ because:
  Bugs are difficult to detect before localization is done
  A code change is required to address the issue
We all tend to focus on what is comfortable and familiar to us. We learned our native language when we were young, and it just happened, so it becomes a background task that we take for granted. The same is true when it comes to dealing with languages in applications. Most developers don't give language a second thought, so they come up with “clever” coding “tricks” to save work and supposedly save money. Some of these tricks, and the programmers' reasons behind them, are:
  “Hardcoding” strings in the code, to avoid taking the time to create a messaging service.
  Creating text strings from phrases by concatenation, e.g. “The [flood gate/control rods/exit door] can not be [closed/open/reset].”, to save storage space.
  Overlapping a dialog box's controls, to save display space in the dialog box.
Although these tricks are clever and may save the developer some work, in the long run they cause extra costs when you localize your applications.

Unbreakable rules (1/3)
Remove all localizable resources from source code and place them in a standard resource file
Do not place non-localizable strings in the resource file
Removing all localizable resources (text, graphics, font names, etc.) from the code allows localizers to translate the resources without having to go into your code. One little-known feature of Windows 9x and NT resources is that they have always been stored as Unicode. This is useful because many development editors and environments are still not Unicode: you can create your resources in some other encoding and still use them in your Unicode-aware application. In the same light, make sure that you do not put non-localizable strings (internal commands, file names, etc.) in these resources. If you do, the localization team may translate strings that you are counting on being the same from language to language, thus breaking the application.
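The separation described above can be sketched in a few lines. This is a minimal illustration in Python, not the Windows resource mechanism itself: the RESOURCES table, the IDS_ identifiers, the load_string helper, and the German text are all hypothetical stand-ins for per-language resource files.

```python
# Hypothetical in-memory stand-in for per-language resource files.
# Only text that a localizer is allowed to translate lives here.
RESOURCES = {
    "en": {
        "IDS_SAVE_PROMPT": "Do you want to save your changes?",
        "IDS_CANCEL": "Cancel",
    },
    "de": {
        # Illustrative translation; IDS_CANCEL is deliberately not yet translated.
        "IDS_SAVE_PROMPT": "Möchten Sie Ihre Änderungen speichern?",
    },
}

# Non-localizable strings (file names, internal commands) stay in code,
# where the localization team cannot change them by accident.
CONFIG_FILE_NAME = "settings.ini"


def load_string(resource_id, language, fallback="en"):
    """Return the string for resource_id, falling back to the source language."""
    table = RESOURCES.get(language, {})
    if resource_id in table:
        return table[resource_id]
    return RESOURCES[fallback][resource_id]
```

The calling code only ever names an identifier, so adding a language means adding a table, not editing code.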

Unbreakable rules (2/3)
Avoid composite strings that are built at runtime:
Wrong way:
  var t1_text = "Not enough memory to";
  var t2_text = "the file";
  var v1_text = "open";
  var v2_text = "copy";
  var v3_text = "save";
  ...
  text = t1_text + " " + v2_text + " " + t2_text + " " + filename + ".";
Right way:
  var t1_text = "Not enough memory to open the file %1.";
  var t2_text = "Not enough memory to copy the file %1.";
  var t3_text = "Not enough memory to save the file %1.";
Use FormatMessage for multiple variable sentences
Thus: "Not enough memory to %1 the file %2."
Becomes: "Liian vähän muistia tiedoston %2 %1."
When sentences are created from concatenated fragments at run time, your localization team does not know in what context the fragments are used. This leads to many incorrect usages of the target language, because in many languages a fragment is translated differently depending on the context of the whole sentence. When it is absolutely necessary for a string to have more than one variable, use FormatMessage to display it. FormatMessage edits the message by substituting the supplied parameter values for the numbered placeholders in the message. Because the placeholders are numbered, the translator can reorder them, as in the Finnish example above, where the file name (%2) comes before the verb (%1).
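The effect of numbered placeholders can be shown with a small sketch. The Python function below is a simplified stand-in written for illustration, not the Win32 FormatMessage API: it handles only plain %1, %2, ... inserts, and the name format_message and the MESSAGES table are assumptions.

```python
import re


def format_message(template, *args):
    """Replace each numbered insert (%1, %2, ...) with the matching argument.

    Simplified illustration only: no escape sequences or format specifiers.
    """
    def substitute(match):
        index = int(match.group(1)) - 1  # %1 refers to the first argument
        return str(args[index])

    return re.sub(r"%(\d+)", substitute, template)


# One complete sentence per language; the translator decides the word order.
MESSAGES = {
    "en": "Not enough memory to %1 the file %2.",
    "fi": "Liian vähän muistia tiedoston %2 %1.",
}
```

The caller passes the arguments in the same order for every language, for example format_message(MESSAGES[lang], verb, filename), and the template alone determines where each one lands.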

Unbreakable rules (3/3)
Do not reuse string resources: if the same string resource is to be used in more than one place, create one instance of the resource per use
Use the same resource identifiers throughout the life of a product
String resources should never be reused in different contexts. Take for example the word “File”. Is this the verb “to file” as in “file the letter”, or the verb “to file” meaning to make something smaller and smoother, as in “file down your nails” (and are those your fingernails or the iron nails that you use to build with wood)? Each of these may, and most of the time does, have a different corresponding word in other languages. As part of using a messaging system such as resource files, each resource should be given a unique identifier. If you change this identifier, you will have to go back to each of your localized versions of the resource and change the identifier there as well.
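A brief sketch of both rules, under stated assumptions: the identifiers, the tables, and the German words below are illustrative, and id_mismatches is a hypothetical helper, not part of any Windows tool.

```python
# One resource per use: the English text is identical, the identifiers are not.
STRINGS_EN = {
    "IDS_MENU_FILE": "File",           # noun: the File menu
    "IDS_BUTTON_FILE_LETTER": "File",  # verb: put a letter away
}

# Illustrative German table: the two uses need two different words.
STRINGS_DE = {
    "IDS_MENU_FILE": "Datei",
    "IDS_BUTTON_FILE_LETTER": "Ablegen",
}


def id_mismatches(source, localized):
    """Return (missing, unexpected) identifier sets for a localized table.

    A renamed identifier in the source shows up as one missing and one
    unexpected entry in every localized table.
    """
    missing = set(source) - set(localized)
    unexpected = set(localized) - set(source)
    return missing, unexpected
```

Running a check like this against every localized table makes the cost of renaming an identifier visible immediately.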

Text Expansion
Allow for text expansion
Rule of thumb: fewer than 10 characters = 300%; more than 10 characters = 30%
After failing to separate localizable resources from source code, not allowing for text expansion is the second biggest localizability problem. Most languages take up more space than English, so designers and developers need to leave room for the longer localized text. A good rule of thumb for software is to allow for 300% expansion if your text is shorter than 10 characters, and for at least 30% expansion if it is longer than 10 characters. Other ways to allow for expansion are:
  Turn wrap on for controls. This is important to remember because the default is nowrap. This simple change makes it possible for text to wrap if necessary.
  When creating radio buttons and check boxes, leave room for the text to wrap if necessary.
  Leave room for vertical expansion as well as horizontal expansion. Some languages (Arabic, Japanese, etc.) need bigger point sizes in order to be readable.
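The rule of thumb is easy to apply mechanically. The sketch below rests on two assumptions that the slide leaves open: "300% expansion" is read as growth on top of the original length, and a string of exactly 10 characters is treated like the longer strings. The function name expansion_budget is hypothetical.

```python
def expansion_budget(text):
    """Return the number of characters to reserve for translated text.

    Assumptions: expansion is added to the original length, and a string
    of exactly 10 characters gets the 30% allowance.
    """
    length = len(text)
    percent = 300 if length < 10 else 30
    # Integer ceiling division avoids floating-point rounding surprises.
    extra = (length * percent + 99) // 100
    return length + extra
```

Under these assumptions a 4-character button label such as "Save" gets room for 16 characters, while a 21-character check box label gets room for 28.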

Text Expansion (web pages)
Design so the entire dialog consists of tables:
  <body>
    <table width=100%>
      <… can contain other tables …>
    </table>
  </body>
Avoid fixed-width items
Each control should be in a separate cell
Allow text wrapping: do not use “nowrap”
Separate check boxes and radio buttons from their labels
Design dialogs to take advantage of the available width and height by building them from tables sized to width=100%. Fixed dialog widths should be used only when absolutely necessary (e.g., the dialog must match the size of other tabs in a tabbed dialog). Of course, this is not an easy thing to do. It takes careful planning, and you need to decide explicitly, up front, how controls are related to each other. Key points to decide are:
  Which controls are the same size. These must be contained in the same column of the table, but can be on different rows.
  Which controls are left or right aligned with each other.
  Which controls can push other controls to the right.
  Which controls have to have a fixed size.
Try your best to put each control into a separate cell. This is needed to allow the text to wrap independently and for flipping and alignment to work in Middle Eastern (right-to-left) languages. Do not use the “nowrap” attribute for table cells that contain text, because it prevents the text from wrapping when it grows longer. Use it only when you want the text to stay on one line and there is enough room for it to grow in all languages. The second major advantage of tables is that text wraps automatically if it does not fit in a table cell, which greatly increases the amount of space available for translated text. Place the labels for check boxes and radio buttons in cells separate from the controls, so that they wrap correctly when the text grows longer. However, you should still design the dialog so that the text has a reasonable chance of staying on one line when it is translated.

Text Expansion HTML Dialogs
Here is an example of what appears to a user to be a single dialog box but in reality is several tables inside a defined parent table. Notice how the text wraps automatically to fit the cell as its size changes (“Find what” and “Match whole word only”). Also notice that because the check boxes and the text are in two different cells, text that wraps to the next line aligns with the text above it rather than under the check box (“Match whole word only”).

Mirroring HTML Dialogs
As mentioned before, another advantage of using tables to create dialog boxes and web pages is that it makes mirroring the dialog much easier for Hebrew and Arabic. To mirror this example, the only change was setting the DIR attribute to “RTL” instead of “LTR”. Notice how both the text alignment and the positioning of the check boxes and buttons were handled automatically.
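The table-based layout and the single-attribute mirroring switch can be sketched together. The Python helper below only generates markup for illustration; the function name render_dialog and the exact markup are assumptions, not the dialog shown on the slides.

```python
from html import escape


def render_dialog(checkbox_labels, direction="ltr"):
    """Build a table-based dialog with one check box per row.

    Each control and its label sit in separate cells, no cell uses nowrap,
    and the only difference between the LTR and RTL versions is the dir value.
    """
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    rows = []
    for index, label in enumerate(checkbox_labels):
        control_id = f"option{index}"
        rows.append(
            "    <tr>\n"
            f'      <td><input type="checkbox" id="{control_id}"></td>\n'
            f'      <td><label for="{control_id}">{escape(label)}</label></td>\n'
            "    </tr>"
        )
    body = "\n".join(rows)
    return (
        f'<body dir="{direction}">\n'
        '  <table width="100%">\n'
        f"{body}\n"
        "  </table>\n"
        "</body>"
    )
```

Because the layout never refers to left or right explicitly, the same function serves both directions.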

Localizability
Pseudo-localization is a good way to test localizability
One way to test how well your product can be localized is to create a pseudo-localization that your source-language testers can test as part of their overall plan. A pseudo-localization can be created by replacing all unaccented vowels with accented ones (a -> å, e -> ê, etc.), adding expansion characters (!!! !!! !!!), and replacing other characters with close representations (0 -> Θ, n -> ñ, s -> ş, etc.). Also, since resource files are encoded as Unicode, characters from all of these different languages can be bundled together in one pseudo-localization.
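A pseudo-localizer along these lines takes only a few lines. This is a minimal Python sketch: the character table is a small illustrative subset, the padding is fixed rather than proportional, and leaving numbered placeholders such as %1 untouched is an added assumption so that formatted messages keep working.

```python
import re

# Small illustrative subset of look-alike replacements.
LOOKALIKES = str.maketrans({
    "a": "å", "e": "ê", "i": "î", "o": "ô", "u": "û",
    "n": "ñ", "s": "ş", "0": "Θ",
})

# Numbered inserts such as %1 must survive unchanged.
PLACEHOLDER = re.compile(r"(%\d+)")


def pseudo_localize(text, padding=" !!! !!! !!!"):
    """Return text with look-alike characters and trailing expansion padding."""
    parts = PLACEHOLDER.split(text)
    converted = [
        part if PLACEHOLDER.fullmatch(part) else part.translate(LOOKALIKES)
        for part in parts
    ]
    return "".join(converted) + padding
```

The result is still readable by source-language testers, while hardcoded strings, clipped text, and encoding problems stand out on screen.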

Final Conclusions
The benefits of investing in the development of World-Ready applications are real
Windows 2000/XP eases the pain and sets the standard
The biggest task in implementing World-Ready applications is setting the designers' and engineers' mind-set to think GLOBAL
At Microsoft, we have seen the benefits of investing in developing World-Ready applications. Currently over 50% of our revenue comes from outside the United States. We have also seen that putting in place a Unicode-based, locale-aware application framework allows us to move into new markets much more quickly than before. This is a by-product of treating English as just another language to be handled by our global products. Our biggest task is getting designers and engineers to think globally rather than locally, but that task has become much easier because we can deliver globally aware development tools.

Resources
MSDN for the latest documentation about new APIs
Developing International Software for Windows 95 and Windows NT
Windows 2000/XP Globalization: World-Ready Guide; You are not World-Ready If…
aliases:
The MSJ articles on Uniscribe and on Writing a Unicode Application for Win9x, by our colleagues F. Avery Bishop and David Brown, are invaluable.
“Developing International Software for Windows 95 and Windows NT” by Nadine Kano, Microsoft Press (no longer published). Online at:
Nadine's book is still the single best reference. Some of the material is now a bit dated because it doesn't cover the new features in Windows 2000, but what is there is generally still accurate.
The Windows 2000 Software Developers Kit has a lot of good information, especially if you want to write a new input method or keyboard driver.
Visit our web site:
Check out our guidelines: World-Ready Guide; You are not World-Ready if…
Subscribe to our site update notification: and feel free to contact us.

Questions?
