1 Knotty problems in date/time parsing and formatting and time zones
Yoshito Umaoka, IBM Globalization Center of Competency
32nd Internationalization and Unicode Conference
© 2008 IBM Corporation

IUC32: Knotty problems in date/time parsing and formatting and time zones © 2008 IBM Corporation

2 Agenda
• Challenges for Implementing Date and Time UI
• Understanding Time Zone Formatting and Parsing

3 Challenges for Implementing Date and Time UI
• Two examples
  – Google Calendar
  – IBM Lotus Notes
• Walking through various requirements for displaying date and time
• Solutions provided by CLDR
• Design/implementation tips

4 Google Calendar

5 Lotus Notes 8 Calendar

6 Date Format Types
• Basic: July 27, 2008; Relative: Today
• Basic: July 28, 2008; Relative: Tomorrow
• Basic: August 3, 2008; Relative: August 3, 2008
• Interval: July 27–28, 2008; Duration: 1 day
• Interval: July 27 – August 3, 2008; Duration: 7 days
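The relative and duration formats above can be sketched as follows. This is a minimal English-only illustration: the function names, the hard-coded month names, and the one-day window for "Yesterday/Today/Tomorrow" are assumptions, and real code would take the words, patterns, and plural rules from locale data.

```python
from datetime import date

# English month names, hard-coded for this sketch only.
MONTHS_EN = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]


def format_basic(d: date) -> str:
    """Basic long date, e.g. 'July 27, 2008' (English pattern assumed)."""
    return f"{MONTHS_EN[d.month - 1]} {d.day}, {d.year}"


def format_relative(d: date, today: date) -> str:
    """Use a relative word near 'today', otherwise fall back to the basic format."""
    relative_words = {-1: "Yesterday", 0: "Today", 1: "Tomorrow"}
    return relative_words.get((d - today).days, format_basic(d))


def format_duration_days(start: date, end: date) -> str:
    """English-only plural handling; other languages need CLDR plural rules."""
    days = (end - start).days
    return f"{days} day" if days == 1 else f"{days} days"


today = date(2008, 7, 27)
print(format_relative(date(2008, 7, 28), today))      # Tomorrow
print(format_relative(date(2008, 8, 3), today))       # August 3, 2008
print(format_duration_days(today, date(2008, 8, 3)))  # 7 days
```

Falling back to the basic format keeps "relative" output meaningful for dates far from today, which matches the August 3 example on the slide.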

7 Mini Calendar
• Month
  – Some locales use a different form of the month name when it appears without a day
  – E.g., Polish: lipiec (nominative) vs. lipca (genitive)
    – lipiec 2008
    – 28 lipca 2008
• Day of week
  – Very short abbreviations are needed
  – They are not always the first letter of the day-of-week name
  – E.g., Chinese: 星期日 ⇒ 日
• The first day of the week
  – Sunday is the first day of the week in many regions, but not in all of them
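A mini calendar has to start each row on the locale's first day of the week. The sketch below uses Python's calendar module, where the first weekday is passed in explicitly; the narrow English day letters and the helper names are illustrative assumptions, and in practice the first day would come from locale data (see the week data in slide 14).

```python
import calendar
from typing import List

# English narrow day names, Monday..Sunday (Python's weekday numbering).
NARROW_EN = ["M", "T", "W", "T", "F", "S", "S"]


def mini_calendar(year: int, month: int, first_weekday: int) -> List[List[int]]:
    """Rows of day numbers for one month; 0 marks a cell outside the month.

    first_weekday: 0 = Monday ... 6 = Sunday (Python's convention).
    """
    return calendar.Calendar(firstweekday=first_weekday).monthdayscalendar(year, month)


def header(first_weekday: int) -> List[str]:
    """Column headings, starting from the given first day of the week."""
    return [NARROW_EN[d] for d in calendar.Calendar(first_weekday).iterweekdays()]


print(header(6), mini_calendar(2008, 7, 6)[0])  # Sunday first (e.g. the US)
print(header(0), mini_calendar(2008, 7, 0)[0])  # Monday first (e.g. ISO 8601)
```

July 1, 2008 was a Tuesday, so the first row has two leading blanks when weeks start on Sunday and one when they start on Monday.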

8 Month/Day of Week Names in CLDR
• 3 different widths: wide / abbreviated / narrow
• 2 context types: format / stand-alone

Month name example: January
Locale | format wide | format abbreviated | format narrow | stand-alone wide | stand-alone abbreviated | stand-alone narrow
en_US | January | Jan | J | January | Jan | J
pl_PL | stycznia | sty | s | styczeń | sty | s
ru_RU | января | янв. | Я | Январь | янв. | Я

Day of week name example: Sunday
Locale | format wide | format abbreviated | format narrow | stand-alone wide | stand-alone abbreviated | stand-alone narrow
en_US | Sunday | Sun | S | Sunday | Sun | S
zh_Hans_CN | 星期日 | 周日 | 日 | 星期日 | 周日 | 日
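To show why the two contexts matter, here is a minimal sketch that picks the Polish month form by context, using only the names shown on this slide and in slide 7. The data layout and function names are hypothetical; LDML patterns express the same distinction with "MMMM" (format) versus "LLLL" (stand-alone).

```python
from datetime import date

# Excerpt of the Polish month names shown above (January and July only);
# keys are (context, width). A real application would take these from CLDR/ICU.
PL_MONTHS = {
    ("format", "wide"): {1: "stycznia", 7: "lipca"},
    ("stand-alone", "wide"): {1: "styczeń", 7: "lipiec"},
    ("format", "abbreviated"): {1: "sty"},
    ("stand-alone", "abbreviated"): {1: "sty"},
}


def month_name(month: int, context: str, width: str = "wide") -> str:
    return PL_MONTHS[(context, width)][month]


def format_day_month_year(d: date) -> str:
    # Month inside a full date: format (genitive) form, like LDML "d MMMM y".
    return f"{d.day} {month_name(d.month, 'format')} {d.year}"


def format_month_year(year: int, month: int) -> str:
    # Month without a day, e.g. a mini-calendar title:
    # stand-alone (nominative) form, like LDML "LLLL y".
    return f"{month_name(month, 'stand-alone')} {year}"


print(format_day_month_year(date(2008, 7, 28)))  # 28 lipca 2008
print(format_month_year(2008, 7))                # lipiec 2008
```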

9 Date and Time Interval
• When displaying a date interval, duplicated date fields can be omitted
  – 3 possible patterns, depending on which fields differ between the start date and the end date
    – July 20–26, 2008
    – July 20 – August 1, 2008
    – July 20, 2008 – July 19, 2009
  – Different locales use different combination patterns, e.g.:
    – 20–26 July 2008
    – 20 July – 1 August 2008
    – 20 July 2008 – 19 July 2009

10 Date/Time Interval in CLDR
• Example: English interval patterns for the skeleton yMMMd
  – greatest difference y: MMM d, yyyy – MMM d, yyyy
  – greatest difference M: MMM d – MMM d, yyyy
  – greatest difference d: MMM d–d, yyyy
• Each intervalFormatItem element is associated with a “skeleton” pattern and contains one or more patterns
• A greatestDifference element contains the pattern used when the greatest difference between the two given dates matches its “id” attribute
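The greatest-difference lookup can be sketched in a few lines. Assumptions: the three English patterns on this slide are hand-translated into Python format strings (CLDR stores LDML patterns, which a real formatter has to split into a start-date part and an end-date part), only this one month-day-year format is covered, and the names are hypothetical.

```python
from datetime import date
from typing import Optional

MONTHS_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# The English interval patterns above, rewritten by hand as Python format
# strings keyed by the greatest difference (a simplification of CLDR data).
INTERVAL_PATTERNS = {
    "y": "{m1} {d1}, {y1} – {m2} {d2}, {y2}",
    "M": "{m1} {d1} – {m2} {d2}, {y1}",
    "d": "{m1} {d1}–{d2}, {y1}",
}


def greatest_difference(a: date, b: date) -> Optional[str]:
    """Return the largest calendar field that differs, or None if equal."""
    if a.year != b.year:
        return "y"
    if a.month != b.month:
        return "M"
    if a.day != b.day:
        return "d"
    return None


def format_interval(a: date, b: date) -> str:
    fields = {"m1": MONTHS_ABBR[a.month - 1], "d1": a.day, "y1": a.year,
              "m2": MONTHS_ABBR[b.month - 1], "d2": b.day, "y2": b.year}
    diff = greatest_difference(a, b)
    if diff is None:
        return "{m1} {d1}, {y1}".format(**fields)  # same day: a single date
    return INTERVAL_PATTERNS[diff].format(**fields)


print(format_interval(date(2008, 7, 20), date(2008, 7, 26)))  # Jul 20–26, 2008
print(format_interval(date(2008, 7, 20), date(2008, 8, 1)))   # Jul 20 – Aug 1, 2008
print(format_interval(date(2008, 7, 20), date(2009, 7, 19)))  # Jul 20, 2008 – Jul 19, 2009
```

Supporting another locale only means swapping the pattern table, which is the point of keeping these patterns in locale data rather than in code.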

11 Other Challenges
• Various combinations of date fields and widths
  – “Sat 7/26”: the UI needs a short format with month, day of month, and day of week, but no year
  – The pattern may change depending on the locale
    – “Sat 26/7” for en_GB
    – “7/26(土)” for ja_JP
• Week number
  – Week numbers are commonly used in European countries
  – The way week numbers within a year are calculated varies with local conventions

12 Flexible Date Format Support in CLDR (1)
• availableFormats contains various dateFormatItem elements
• Each dateFormatItem has an id attribute representing a “skeleton”
• A “skeleton” contains only field information, in a canonical order
• A CLDR consumer provides a “skeleton”
  – When a matching “skeleton” is available in the locale, the associated pattern is returned. If not, the closest match that contains all requested fields is returned.
• Example patterns: E d MMM d MMMM dd/MM d/M MMM yy MM/yyyy MMMM yyyy
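A rough sketch of the skeleton lookup follows. The locale data is hypothetical (modeled on the “Sat 7/26” examples in slide 11, not copied from CLDR), and so is the distance score; a real implementation will use its own scoring and may also adjust field widths in the returned pattern.

```python
import re
from typing import Dict, Optional

# Hypothetical skeleton -> pattern data modeled on slide 11; not CLDR data.
EN_US = {"MEd": "E M/d", "MMMd": "MMM d", "yMMM": "MMM y"}
EN_GB = {"MEd": "E d/M", "MMMd": "d MMM", "yMMM": "MMM y"}
JA_JP = {"MEd": "M/d(E)"}


def parse_skeleton(skeleton: str) -> Dict[str, int]:
    """Map each field letter to its width, e.g. 'MMMd' -> {'M': 3, 'd': 1}."""
    return {run.group(0)[0]: len(run.group(0)) for run in re.finditer(r"(.)\1*", skeleton)}


def best_pattern(requested: str, available: Dict[str, str]) -> Optional[str]:
    if requested in available:                      # exact skeleton match
        return available[requested]
    want = parse_skeleton(requested)
    best, best_score = None, None
    for skeleton, pattern in available.items():
        have = parse_skeleton(skeleton)
        if not set(want) <= set(have):              # must contain all requested fields
            continue
        # Hypothetical distance: width differences plus a penalty per extra field.
        score = sum(abs(have[f] - want[f]) for f in want) + 10 * len(set(have) - set(want))
        if best_score is None or score < best_score:
            best, best_score = pattern, score
    return best                                     # None: fall back to appending fields


print(best_pattern("MEd", EN_US), "|", best_pattern("MEd", EN_GB), "|", best_pattern("MEd", JA_JP))
print(best_pattern("MMMMd", EN_US))  # closest match: MMM d
print(best_pattern("yMMMd", EN_US))  # None
```

The caller asks for fields, not for a pattern, so the same call yields "E M/d", "E d/M", or "M/d(E)" depending on which locale's data is passed in.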

13 Flexible Date Format Support in CLDR (2)
• When no dateFormatItem element satisfies the matching criteria, use the rules defined by appendItems to append the missing fields to one of the existing formats
• Example appendItem patterns ({0} is the existing pattern, {1} the pattern for the missing field, {2} the field's display name): {0} ({2}: {1}), {0} {1}, {0} ({2}: {1}), {0} {1}, {0} ({2}: {1}), {0} {1}
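Appending a missing field can be sketched like this. Which of the two templates applies to which field, the English display names, and the quoting of {2} are assumptions of this sketch, not CLDR data.

```python
from typing import Dict

# Illustrative appendItems data; the field-to-template assignment is assumed.
APPEND_ITEMS: Dict[str, str] = {
    "Day-Of-Week": "{0} {1}",
    "Week": "{0} ({2}: {1})",
}
DISPLAY_NAMES_EN = {"Day-Of-Week": "Day of the Week", "Week": "Week"}


def append_field(base_pattern: str, field: str, field_pattern: str) -> str:
    """{0} = existing pattern, {1} = pattern for the missing field, {2} = field name."""
    template = APPEND_ITEMS.get(field, "{0} ({2}: {1})")
    # Quote the display name so its letters are not read as LDML pattern letters.
    name = "'" + DISPLAY_NAMES_EN.get(field, field) + "'"
    return template.format(base_pattern, field_pattern, name)


print(append_field("MMM d", "Day-Of-Week", "E"))  # MMM d E
print(append_field("MMM d", "Week", "w"))         # MMM d ('Week': w)
```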

14 Week Data in CLDR
• weekData
  – minDays: minimum number of days in the first week of the year
  – firstDay: first day of the week
  – weekendStart/weekendEnd: start/end day of the weekend
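The interaction of firstDay and minDays is easiest to see in code. This sketch (function names hypothetical) defines week 1 as the first week, starting on firstDay, that contains at least minDays days of the new year; with Monday and 4 it agrees with ISO 8601 week numbering, while Sunday and 1 is the common US convention.

```python
from datetime import date, timedelta
from typing import Tuple


def week1_start(year: int, first_day: int, min_days: int) -> date:
    """First day of week 1 of `year`.

    first_day uses Python's numbering (0 = Monday ... 6 = Sunday); week 1 is the
    first week that has at least `min_days` days in the new year.
    """
    jan1 = date(year, 1, 1)
    offset = (jan1.weekday() - first_day) % 7  # days from week start to Jan 1
    start = jan1 - timedelta(days=offset)
    if 7 - offset < min_days:                   # too few days: use the next week
        start += timedelta(days=7)
    return start


def week_of_year(d: date, first_day: int, min_days: int) -> Tuple[int, int]:
    """Return (week-based year, week number) for `d`."""
    year = d.year
    if d >= week1_start(year + 1, first_day, min_days):
        year += 1                                # late December in next year's week 1
    elif d < week1_start(year, first_day, min_days):
        year -= 1                                # early January in last year's last week
    return year, (d - week1_start(year, first_day, min_days)).days // 7 + 1


# Jan 1, 2010 with ISO-style (Monday, 4) vs US-style (Sunday, 1) week data.
print(week_of_year(date(2010, 1, 1), first_day=0, min_days=4))  # (2009, 53)
print(week_of_year(date(2010, 1, 1), first_day=6, min_days=1))  # (2010, 1)
```

The same date lands in different weeks, and even in different week-based years, depending only on the two week-data values.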

15 Comparison of Format Functions
Feature | Standard C library | Microsoft .NET | JDK | ICU
Basic format function | strftime/wcsftime | DateTime | SimpleDateFormat | SimpleDateFormat
Predefined format patterns | LC_TIME: date; time; date & time | DateTimeFormatInfo: date (long/short); time (long/short); date & time; month and day; year and month | DateFormat constructor: date; time; date & time; 4 different lengths for each (full/long/medium/short) | DateFormat constructor: same as JDK, plus support for arbitrary combinations of date fields using a “skeleton” pattern
Localized month/day names | LC_TIME: full & abbreviated | DateTimeFormatInfo: full & abbreviated; genitive month names; shortest day names | DateFormatSymbols: full & abbreviated | DateFormatSymbols: full/abbreviated/narrow; formatting/stand-alone
Relative | n/a | n/a | n/a | DateFormat (RelativeDateFormat)
Interval | n/a | n/a | n/a | DateIntervalFormat
Duration | n/a | n/a | n/a | TimeUnitFormat
Calendar systems | Gregorian and its variants | 15 calendar types | Gregorian, Thai Buddhist, and Japanese | 11 calendar types

16 Design/Implementation Tips
• Keep the internal date/time representation locale-independent
  – Localized formats may vary depending on the implementation
  – Use a standard format such as ISO 8601 for data exchange
• Do not hardcode format patterns in your source code
• Do not put format patterns in resource bundles with other localizable messages!
  – Locale support is more than UI translation
  – Translation vendors are usually not able to handle regional variants
  – You should be able to find solutions in CLDR/ICU; if a solution is not available, file bugs to request new features
• Avoid date/time data entry by text
  – Formatting date/time is complicated, and so is parsing
  – Use a UI widget to eliminate ambiguous data entry
• Understand the regional conventions of the calendar system
  – Rules for calculating some calendar fields may vary
• Be prepared to support non-Gregorian calendar systems, for example:
  – The Buddhist calendar is the most preferred calendar system in Thailand
  – Japanese calendar support may be required depending on the target sectors
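As a minimal sketch of the first tip, an application can store and exchange instants as ISO 8601 text in UTC and format them for display only at the UI layer. Python's datetime module is used here; the function names are illustrative.

```python
from datetime import datetime, timedelta, timezone


def to_exchange_format(dt: datetime) -> str:
    """Serialize an aware datetime as ISO 8601 in UTC, independent of any locale."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: time zone unknown")
    return dt.astimezone(timezone.utc).isoformat()


def from_exchange_format(text: str) -> datetime:
    """Parse the ISO 8601 text produced above back into an aware datetime."""
    return datetime.fromisoformat(text)


# 07:30 at a fixed UTC-07:00 offset (chosen for illustration) is 14:30 UTC.
local = datetime(2008, 7, 27, 7, 30, tzinfo=timezone(timedelta(hours=-7)))
wire = to_exchange_format(local)
print(wire)                                 # 2008-07-27T14:30:00+00:00
print(from_exchange_format(wire) == local)  # True: same instant
```

Rejecting naive values is a deliberate choice in this sketch: a time without a zone cannot be converted to UTC unambiguously.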

17 Understanding Time Zone Formatting and Parsing
• CLDR’s approach to supporting time zone formatting
• Choosing the right time zone format type for your needs
• Tips for processing date/time with time zones

18 Time Zone Implementations
• The tz database (a.k.a. Olson database)
  – 568 zones (436 unique zones / 132 aliases) (2008d)
  – Supports historic time transitions since the late 19th century
  – At least 1 zone per country/region
  – Time zone abbreviations for display (3 or 4 ASCII letters), such as “EST”, “JST”…
  – Used by *nix systems (Solaris, Linux, AIX, Mac OS X…) and Java
• MS Windows time zones
  – 84 zones (Windows Vista), some of them obsolete
  – Historic rules (2005 and beyond) are supported in Vista/Windows Server 2008 (Dynamic DST)
  – A zone is shared by multiple cities/countries
  – Time zone display names include the standard offset and a common name or exemplar cities, such as “(GMT-05:00) Eastern Time (US & Canada)”, “(GMT+09:00) Osaka, Sapporo, Tokyo”…

19 Time Zone Format Types in CLDR (1)
• Generic location format
  – Designed for populating choice lists for time zones
  – Uniquely mapped to “canonical” zone IDs
  – Examples
    – Europe/Rome ⇔ Italy Time [en]
    – America/New_York ⇔ United States (New York) Time [en]
    – America/New_York ⇔ Hora de Estados Unidos (New York) [es]
• Generic non-location format
  – Designed for recurring events, meetings, or anywhere people do not want to be overly specific
  – Two widths: long/short
  – Examples
    – America/New_York ⇒ ET [en/short]
    – America/New_York ⇒ Eastern Time [en/long]
    – America/Montreal ⇒ Eastern Time [en/long]

20 Time Zone Format Types in CLDR (2)
• Generic partial location format
  – A variant of the generic non-location format, used as a fallback name when the generic non-location format is not specific enough
  – Two widths: long/short
  – Examples
    – America/Mexico_City ⇒ Hora central (Ciudad de México) [es_US/short/Mar 9 – April 6, 2008]
    – America/Chicago ⇒ Hora central (Chicago) [es_MX/short/Mar 9 – April 6, 2008]
• Specific (non-location) format
  – Designed to distinguish between standard time and daylight time
  – Two widths: long/short
  – Examples
    – America/New_York ⇒ EST [en/short/standard time]
    – America/New_York ⇒ EDT [en/short/daylight time]
    – America/New_York ⇒ Eastern Standard Time [en/long/standard time]
    – America/Montreal ⇒ Eastern Standard Time [en/long/standard time]

21 Time Zone Format Types in CLDR (3)
• Localized GMT format
  – Designed for representing the offset from GMT
  – Local decimal digits are used
  – Examples
    – America/New_York ⇒ GMT-05:00 [en/standard time]
    – America/New_York ⇒ GMT-04:00 [en/daylight time]
    – America/New_York ⇒ Гриинуич-0500 [bg/standard time]
• RFC 822 format
  – Locale-insensitive “fixed” format representing the offset from GMT, as defined by RFC 822
  – ASCII decimal digits are always used
  – Examples
    – America/New_York ⇒ -0500 [standard time]
    – America/New_York ⇒ -0400 [daylight time]
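The two offset formats differ mainly in which digits and separators they use. The sketch below is a simplification under stated assumptions: the GMT prefix and the digit set are plain parameters (function and constant names are hypothetical), whereas CLDR also localizes the whole pattern, as the Bulgarian example shows.

```python
from datetime import timedelta
from typing import Tuple

ASCII_DIGITS = "0123456789"
# Arabic-Indic digits U+0660..U+0669, as one example of local digits.
ARABIC_INDIC_DIGITS = "".join(chr(0x0660 + i) for i in range(10))


def _sign_hours_minutes(offset: timedelta) -> Tuple[str, int, int]:
    total = int(offset.total_seconds())
    sign = "-" if total < 0 else "+"
    minutes = abs(total) // 60  # any seconds are dropped
    return sign, minutes // 60, minutes % 60


def format_rfc822(offset: timedelta) -> str:
    """Locale-insensitive: always ASCII digits, e.g. '-0500'."""
    sign, hours, minutes = _sign_hours_minutes(offset)
    return f"{sign}{hours:02d}{minutes:02d}"


def format_localized_gmt(offset: timedelta, prefix: str = "GMT",
                         digits: str = ASCII_DIGITS) -> str:
    """Localized GMT sketch: only the prefix and the digits vary here."""
    sign, hours, minutes = _sign_hours_minutes(offset)
    body = f"{hours:02d}:{minutes:02d}"
    return prefix + sign + "".join(digits[int(c)] if c.isdigit() else c for c in body)


print(format_rfc822(timedelta(hours=-5)))         # -0500
print(format_localized_gmt(timedelta(hours=-4)))  # GMT-04:00
print(format_localized_gmt(timedelta(hours=3), digits=ARABIC_INDIC_DIGITS))
```

The last line prints "GMT+" followed by 03:00 written in Arabic-Indic digits, while the RFC 822 form of the same offset would still be "+0300".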

22 CLDR Metazone
• A metazone is a grouping of one or more internal zones that share common non-location display names
  – The following zones are currently associated with the metazone “America_Eastern” (CLDR 1.6.1): America/Nassau, America/Resolute, America/Coral_Harbour, America/Thunder_Bay, America/Nipigon, America/Toronto, America/Montreal, America/Iqaluit, America/Pangnirtung, America/Port-au-Prince, America/Jamaica, America/Cayman, America/Panama, America/Grand_Turk, America/Indiana/Vincennes, America/Indiana/Petersburg, America/Indiana/Marengo, America/Indiana/Winamac, America/Indianapolis, America/Louisville, America/Indiana/Vevay, America/Kentucky/Monticello, America/Detroit, America/New_York
• Each metazone has a set of localizable names
  – The following names are used for the metazone “America_Eastern” (CLDR 1.6.1):

Locale | long generic | long standard | long daylight | short generic | short standard | short daylight
en | Eastern Time | Eastern Standard Time | Eastern Daylight Time | ET | EST | EDT
fr | Heure de l’Est | Heure normale de l’Est | Heure avancée de l’Est | HE | HNE | HAE
zh_Hans | 美国东部时间 | 东部标准时间 | 东部夏令时间 | ET | EST | EDT
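Conceptually, a display name is found in two steps: zone to metazone, then metazone to localized name. This sketch uses a small excerpt of the slide's data; the dictionary layout and name keys are hypothetical, and it ignores that a real mapping may also depend on the date.

```python
from typing import Optional

# Excerpt of the CLDR 1.6.1 data above: zone -> metazone, and per-locale names.
ZONE_TO_METAZONE = {
    "America/New_York": "America_Eastern",
    "America/Toronto": "America_Eastern",
    "America/Jamaica": "America_Eastern",
}
METAZONE_NAMES = {
    ("America_Eastern", "en"): {"long_generic": "Eastern Time",
                                "long_standard": "Eastern Standard Time",
                                "short_standard": "EST"},
    ("America_Eastern", "fr"): {"long_generic": "Heure de l’Est",
                                "short_standard": "HNE"},
}


def metazone_name(zone_id: str, locale: str, kind: str) -> Optional[str]:
    """Zone -> metazone -> localized name; None means 'use a fallback format'."""
    metazone = ZONE_TO_METAZONE.get(zone_id)
    if metazone is None:
        return None
    return METAZONE_NAMES.get((metazone, locale), {}).get(kind)


print(metazone_name("America/Toronto", "en", "long_generic"))    # Eastern Time
print(metazone_name("America/Jamaica", "fr", "short_standard"))  # HNE
print(metazone_name("Europe/Rome", "en", "long_generic"))        # None
```

The payoff is that translators supply one set of names per metazone instead of one per zone; 24 zones share the names in this case.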

23 Time Zone Short Abbreviation Problem
• Abbreviations of 2 to 4 ASCII letters are used for short names, such as ET, EST, PDT…
• How well time zone abbreviations are understood varies heavily by region
  – For example, how many people in the US recognize EAT (East Africa Time)?
• CLDR’s solution: a boolean value “commonlyUsed”, associated with a zone/metazone, enables or disables short abbreviations
  – Metazone “Africa_Eastern” has the short standard name “EAT” in English locales
  – For metazone “Africa_Eastern”:
    – commonlyUsed = true in en_ZA [English (South Africa)]
    – commonlyUsed = false in en_US [English (United States)]
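The commonlyUsed switch amounts to a fallback rule. In this sketch (data from this slide; the +03:00 offset of East Africa Time is added, and the helper names are hypothetical), the abbreviation is emitted only where it is marked as commonly used, and the localized GMT format is used otherwise. Treating a missing flag as false is also an assumption.

```python
from datetime import timedelta

# From the slide: English short standard name for metazone Africa_Eastern,
# plus per-locale commonlyUsed flags.
SHORT_STANDARD_EN = {"Africa_Eastern": "EAT"}
COMMONLY_USED = {("Africa_Eastern", "en_ZA"): True,
                 ("Africa_Eastern", "en_US"): False}


def localized_gmt(offset: timedelta) -> str:
    """English localized GMT format, e.g. 'GMT+03:00'."""
    total = int(offset.total_seconds())
    sign = "-" if total < 0 else "+"
    hours, minutes = divmod(abs(total) // 60, 60)
    return f"GMT{sign}{hours:02d}:{minutes:02d}"


def short_specific_name(metazone: str, locale: str, offset: timedelta) -> str:
    """Use the abbreviation only where it is commonly used; otherwise fall back."""
    if COMMONLY_USED.get((metazone, locale), False) and metazone in SHORT_STANDARD_EN:
        return SHORT_STANDARD_EN[metazone]
    return localized_gmt(offset)


print(short_specific_name("Africa_Eastern", "en_ZA", timedelta(hours=3)))  # EAT
print(short_specific_name("Africa_Eastern", "en_US", timedelta(hours=3)))  # GMT+03:00
```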

24 Ambiguous Time with Generic Format
• Daylight ⇒ Standard transition
  – Sunday, November 2, 1:30:00 Pacific Time?
  – Valid, but happens twice
  – The generic format cannot distinguish between 1:30 PST and 1:30 PDT
• Standard ⇒ Daylight transition
  – Sunday, March 9, 2:30:00 Pacific Time?
  – Invalid!
  – 30 minutes 1 second after 01:59:59? Or 30 minutes before 03:00:00?
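Both problems show up when converting a Pacific wall-clock time back to an instant. The sketch below hand-codes the 2008 US Pacific transitions (an assumption valid for 2008 only; real code would use tz database rules) and returns every matching instant: two around the November transition, none in the March gap. The function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Hand-coded US Pacific rule for 2008 only, as naive UTC datetimes:
# DST from 2008-03-09 10:00 UTC (02:00 PST) to 2008-11-02 09:00 UTC (02:00 PDT).
DST_START_UTC = datetime(2008, 3, 9, 10)
DST_END_UTC = datetime(2008, 11, 2, 9)


def pacific_offset(utc: datetime) -> timedelta:
    in_dst = DST_START_UTC <= utc < DST_END_UTC
    return timedelta(hours=-7) if in_dst else timedelta(hours=-8)


def resolve_wall_time(local: datetime) -> List[Tuple[datetime, str]]:
    """Return every UTC instant (naive, in UTC) whose Pacific wall time is `local`."""
    results = []
    for hours, name in ((-8, "PST"), (-7, "PDT")):
        offset = timedelta(hours=hours)
        utc = local - offset  # candidate instant if this offset applied
        if pacific_offset(utc) == offset:
            results.append((utc, name))
    return sorted(results)


print(resolve_wall_time(datetime(2008, 11, 2, 1, 30)))  # two instants: PDT, then PST
print(resolve_wall_time(datetime(2008, 3, 9, 2, 30)))   # [] (this wall time never occurs)
```

A parser of the generic format has to pick one of the two results (or reject the input) for the November case, and must decide how to map the nonexistent March time; neither choice can be recovered from the text alone.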

25 CLDR Time Zone Formatting Patterns (1)
Letter | Width | Format description (⇒ marks a fallback) | Examples | Roundtrip time | Roundtrip canonical zone
z | 1…3 | Specific non-location short format (commonlyUsed = true) ⇒ Localized GMT format | PST, PDT, GMT-08:00 | yes | no
z | 4 | Specific non-location long format ⇒ Localized GMT format | Pacific Standard Time, Pacific Daylight Time, GMT-08:00 | yes | no
Z | 1…3 | RFC 822 format | -0800 | yes | no
Z | 4 | Localized GMT format | GMT-08:00, Гриинуич-0800 | yes | no

26 CLDR Time Zone Formatting Patterns (2)
Letter | Width | Format description (⇒ marks a fallback) | Examples | Roundtrip time | Roundtrip canonical zone
v | 1 | Generic non-location short format (commonlyUsed = true) ⇒ Generic partial location short format (commonlyUsed = true) ⇒ Localized GMT format | PT, PT (Canada), PT (Yellowknife), GMT-08:00 | no (at transition) | no
v | 4 | Generic non-location long format ⇒ Generic partial location long format ⇒ Localized GMT format | Pacific Time, Pacific Time (Canada), Pacific Time (Yellowknife), GMT-08:00 | no (at transition) | no
V | 1 | Specific non-location short format ⇒ Localized GMT format | PST, PDT, GMT-08:00 | yes | no
V | 4 | Generic location format ⇒ Localized GMT format (only for GMT-style time zones such as Etc/GMT+8) | Italy Time, United States (Los Angeles) Time, GMT-08:00 | no (at transition) | yes

27 Tips for Processing Date/Time with Time Zones
• When serializing future date/time data in text format, use the RFC 822 offset together with the zone ID
  – Time zone rules may change
  – The GMT offset together with the zone ID is sufficient to fix up the data
• The result of java.util.Date#toString() may be ambiguous
  – “CST” is used for both “America/Chicago” and “Asia/Shanghai” in Java
  – CLDR does not use the same name for multiple time zones or metazones
• Many zones in the tz database use LMT (Local Mean Time) as their initial offset
  – LMT is calculated from the longitude, so its GMT offset includes a fraction of a minute
  – ISO 8601, RFC 822, and Java’s GMT format have no seconds field, so such offsets may not round-trip
• Minimize dependencies on Windows time zones in multi-platform applications
  – Some Windows time zones are not well maintained
  – There is no historic time zone rule support before Vista/Windows Server 2008
  – The mapping between Windows time zones and the tz database is 1-to-n
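The first tip can be sketched as follows: storing the offset along with the zone ID lets an application notice when the rules have changed since the data was written. The serialization layout ("wall time + offset, a space, the zone ID") and the function names are assumptions, and the two rule functions hand-code the 2007 US change for America/Los_Angeles (DST moved from the first Sunday in April to the second Sunday in March) for local times from January through October 2007 only.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Tuple

FMT = "%Y-%m-%dT%H:%M:%S%z"  # local wall time plus an RFC 822-style offset
OffsetRule = Callable[[datetime], timedelta]


def la_offset_old_rules(local: datetime) -> timedelta:
    # Hypothetical pre-2007 US rule: DST from the first Sunday in April (April 1, 2007).
    return timedelta(hours=-7 if local >= datetime(2007, 4, 1, 2) else -8)


def la_offset_new_rules(local: datetime) -> timedelta:
    # Hypothetical 2007+ US rule: DST from the second Sunday in March (March 11, 2007).
    return timedelta(hours=-7 if local >= datetime(2007, 3, 11, 2) else -8)


def serialize(local: datetime, zone_id: str, offset_rule: OffsetRule) -> str:
    aware = local.replace(tzinfo=timezone(offset_rule(local)))
    return f"{aware.strftime(FMT)} {zone_id}"


def check(stored: str, offset_rule: OffsetRule) -> Tuple[str, datetime, timedelta, timedelta, bool]:
    """Re-evaluate stored data with current rules; report whether the offset changed."""
    stamp, zone_id = stored.split(" ")
    aware = datetime.strptime(stamp, FMT)
    wall_time = aware.replace(tzinfo=None)
    current = offset_rule(wall_time)
    return zone_id, wall_time, aware.utcoffset(), current, current != aware.utcoffset()


# A 9:00 meeting on March 20, 2007, saved before the US rule change took effect.
stored = serialize(datetime(2007, 3, 20, 9), "America/Los_Angeles", la_offset_old_rules)
print(stored)                                   # 2007-03-20T09:00:00-0800 America/Los_Angeles
print(check(stored, la_offset_new_rules)[-1])   # True: the zone's offset is now -07:00
```

When a change is detected, the application still has to decide whether the stored offset (the original instant) or the zone ID (the original wall time) wins; for a 9:00 meeting, keeping the wall time is often what users expect.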

28 Links
• Unicode CLDR project
• UTS #35: Unicode Locale Data Markup Language (LDML)
• ICU Project
• tz database

