© 1998, Progress Software Corporation 1 Migration of a 4GL and Relational Database to Unicode Tex Texin International Product Manager.


1 Migration of a 4GL and Relational Database to Unicode
Tex Texin, International Product Manager

2 Presentation Goals
Outline Migration Steps
Describe Design Considerations
Leverage Existing Double-byte Implementation
Describe Impact on 4GL and Report Formats

3 PROGRESS Application Development Suite
Powerful tools for the rapid creation of distributed business applications
Creates character, GUI, or web-based clients with common source
Host-based, client-server, or n-tier distribution on a variety of platforms
Scalable, robust RDBMS and open
International, double-byte enabled

4 Possible Configuration Options
[Diagram. Labels: Optional n-tier Application Server; Database Server; Progress Database; Host-based Character Client; GUI Client; Client-Server; Web-based Client; Other Database]

5 Why do our customers need Unicode?
Many do not... However,
Multinationals deploy across regions with incompatible character sets, yet they must share data between them.
Programs are distributed worldwide with one container of text in many languages.
Certain applications require multilingual databases, e.g. translation systems and web-based applications.

6 The Existing Architecture
1.5M lines of C code
0.3M lines of 4GL code
Double-byte enabled
  – CJK, 9 double-byte charsets supported
  – 2-byte only, no 3 or 4-byte
  – No shift-sequenced charsets
  – DBE changes earmarked, easy to find
  – 4 years, 3 developers, 2 QA

7 Estimated cost of implementing UCS-2 was very big!
Changing to 16-bit text units affects almost every source module
  – Largest cost is separating char variables based on usage for text or binary data
  – Use 16-bit null terminators, ignore 8-bit zero bytes
    A: 0041, 0000
    Ā: 0100, 0000
  – Pointer arithmetic (advance 2 bytes)
  – Sizing (bytes or characters)
  – New API to use new WIDE TEXT datatype
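The null-terminator problem on this slide can be illustrated with a small sketch. It is not Progress code: the helper names `strlen_8bit` and `strlen_16bit` are hypothetical, and big-endian byte order is assumed only so the bytes match the hex values shown on the slide. It shows why a byte-oriented length routine cannot be reused on 16-bit text.

```python
def strlen_8bit(buf: bytes) -> int:
    """Length up to the first zero byte, the way a byte-oriented strlen scans."""
    idx = buf.find(b"\x00")
    return len(buf) if idx < 0 else idx


def strlen_16bit(buf: bytes) -> int:
    """Count 16-bit units up to the first 0x0000 unit (aligned scan)."""
    count = 0
    for i in range(0, len(buf) - 1, 2):
        if buf[i:i + 2] == b"\x00\x00":
            break
        count += 1
    return count


# "A" (U+0041) followed by A-with-macron (U+0100), plus a 16-bit terminator.
# Big-endian bytes: 00 41 01 00 00 00
ucs2_text = "A\u0100".encode("utf-16-be") + b"\x00\x00"

byte_oriented_length = strlen_8bit(ucs2_text)   # stops at the very first byte
unit_oriented_length = strlen_16bit(ucs2_text)  # sees both characters
```

Every place in the code that scans, sizes, or advances through text makes the same assumption as `strlen_8bit`, which is consistent with the slide's point that almost every source module would be affected.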

8 Product requirements for a multilingual version
Minimize cost for application migration
Minimize cost for application upgrade
Minimize support cost
  – One executable!
Maintain user-definable character sets
Add UTF-8 as just another character set
  – UTF-8 algorithms are compatible with other charsets

9 Scaled down multilingual proposal: UTF-8 implementation
Implement UTF-8 as 3-byte character set
  – Leverage & extend double-byte enabling
  – Places to change are already earmarked
  – Restrict to composed characters for now
  – Restrict to no surrogates
Supports all the markets we are in
UTF-8-enable 4GL and RDBMS first
  – Provides multilingual logic and storage
  – Java+other client technologies coming
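Why "no surrogates" makes UTF-8 a 3-byte character set can be checked with a short sketch. The function name `utf8_encoded_length` is an illustrative assumption, not part of the product; the thresholds are the standard UTF-8 encoding boundaries. Code points up to U+FFFF need at most 3 bytes, and only the code points that UTF-16 reaches through surrogate pairs need a fourth.

```python
def utf8_encoded_length(code_point: int) -> int:
    """Number of bytes UTF-8 uses for one code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


# Cross-check against Python's own encoder for every code point up to U+FFFF,
# skipping the surrogate range, which is not encodable on its own.
bmp_code_points = [cp for cp in range(0x10000) if not 0xD800 <= cp <= 0xDFFF]
mismatches = [
    cp for cp in bmp_code_points
    if len(chr(cp).encode("utf-8")) != utf8_encoded_length(cp)
]
longest_bmp_sequence = max(utf8_encoded_length(cp) for cp in bmp_code_points)
```

Under that restriction, code that already distinguishes lead bytes from trail bytes for 2-byte characters only has to be extended by one more byte.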

10 Architecture changes: UTF-8-enabling the string library
N-byte enable character+string functions
  – GetNextChar, GetPreviousChar
  – GetCharacterSize (table-based)
  – Modified IsFirstByte
New GetColumnLength
New datatype normalized BIG char
Minor algorithm changes for efficiency
  – Find Character
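A rough sketch of what N-byte string primitives of this kind might look like follows. The function names echo the slide, but the bodies, signatures, and the 256-entry size table are assumptions made for illustration, not the Progress implementation. The sketch follows the deck's restriction to sequences of at most 3 bytes.

```python
def _build_size_table() -> list:
    """256-entry table: lead byte -> character size in bytes (0 = not a lead byte)."""
    table = [0] * 256
    for b in range(0x00, 0x80):
        table[b] = 1          # ASCII
    for b in range(0xC2, 0xE0):
        table[b] = 2          # 2-byte lead bytes
    for b in range(0xE0, 0xF0):
        table[b] = 3          # 3-byte lead bytes
    return table


CHAR_SIZE = _build_size_table()


def is_first_byte(byte: int) -> bool:
    """True unless the byte is a UTF-8 continuation byte (10xxxxxx)."""
    return (byte & 0xC0) != 0x80


def get_character_size(lead_byte: int) -> int:
    size = CHAR_SIZE[lead_byte]
    if size == 0:
        raise ValueError(f"not a supported lead byte: {lead_byte:#04x}")
    return size


def get_next_char(buf: bytes, pos: int) -> int:
    """Offset of the character after the one starting at pos."""
    return pos + get_character_size(buf[pos])


def get_previous_char(buf: bytes, pos: int) -> int:
    """Offset of the character before pos, found by skipping continuation bytes."""
    pos -= 1
    while pos > 0 and not is_first_byte(buf[pos]):
        pos -= 1
    return pos


# Walk "a", e-acute, and a CJK ideograph: 1-, 2-, and 3-byte characters.
buf = "a\u00e9\u4e2d".encode("utf-8")
offsets = []
pos = 0
while pos < len(buf):
    offsets.append(pos)
    pos = get_next_char(buf, pos)
```

Because UTF-8 continuation bytes are recognizable on their own, stepping backwards needs no state, which is one way UTF-8 is easier to handle than the legacy double-byte sets where a trail byte can look like a lead byte.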

11 Architecture changes: UTF-8-enabling character tables
String libraries use character tables
  – Alphanumeric, Lead-byte, Tail-byte
  – Upper, lower case (700+ characters)
New property ColumnCount
New table formats
  – Old architecture presumed 256 byte table
  – Now organized by range lists and trie
Update table compiler & allow hex entry
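A range list is one way to replace a flat 256-entry table once the character space grows to tens of thousands of code points. The sketch below looks up a column-count property with a binary search over sorted ranges. The names `WIDE_RANGES`, `column_count`, and `column_length` are hypothetical, and the ranges are a small illustrative subset (Hiragana, CJK Unified Ideographs, Hangul Syllables, Fullwidth Forms), not a complete width table.

```python
from bisect import bisect_right

# Sorted, non-overlapping (first, last) code point ranges that occupy two
# display columns. Illustrative subset only.
WIDE_RANGES = [
    (0x3041, 0x3096),  # Hiragana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0xFF01, 0xFF60),  # Fullwidth Forms
]
_RANGE_STARTS = [first for first, _ in WIDE_RANGES]


def column_count(code_point: int) -> int:
    """Display columns for one code point: 2 if inside a wide range, else 1."""
    i = bisect_right(_RANGE_STARTS, code_point) - 1
    if i >= 0 and code_point <= WIDE_RANGES[i][1]:
        return 2
    return 1


def column_length(text: str) -> int:
    """Total display columns for a string."""
    return sum(column_count(ord(ch)) for ch in text)


mixed_width = column_length("A\u4e2d")  # Latin "A" plus one CJK ideograph
```

A range list keeps the table small when a property is constant over long runs of code points; a trie suits properties such as case mapping, where individual characters map to individual values.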

12 Architecture changes: UTF-8-enabling sorting
How to sort multilingual data?
Binary sort used for double-byte data
With UTF-8, Europe is 2-byte, CJK 3-byte
Solution
  – Binary sort on server
  – Client uses native sort
Bump key length limit for UTF-8
Next phase will be enhanced sort
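One property that makes a binary sort a reasonable first phase is that comparing UTF-8 byte strings gives the same order as comparing code points. The sketch below checks this on a small made-up word list; it also shows that this order is not a linguistic one, and that byte lengths exceed character counts, which may be why the key length limit had to grow.

```python
words = ["zebra", "Zebra", "\u00e9clair", "apple", "\u4e2d\u6587", "\u65e5\u672c"]

# Binary sort: compare the raw UTF-8 bytes, as an index on the server might.
binary_order = sorted(words, key=lambda w: w.encode("utf-8"))

# Code point sort: Python compares str values code point by code point.
code_point_order = sorted(words)

# Index keys hold bytes, so non-ASCII text needs more room than its
# character count suggests.
key_bytes = {w: len(w.encode("utf-8")) for w in words}
```

In the result, "Zebra" sorts before "apple" and the accented word sorts after "zebra", which is the kind of ordering a client-side native sort or a later enhanced sort would correct.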

13 Architecture changes: Character conversion algorithms
Existing, user-definable, conversions
  – Single-byte character set table maps
  – Double-byte Shift-JIS - EUCJIS algorithm
New table-driven automated conversions
  – Single-byte to UTF-8, and back
  – Double-byte to UCS-2 and back
  – UTF-8 - UCS-2
  – Trie for speed and memory optimization
Requires significant QA for data integrity
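A table-driven single-byte to UTF-8 conversion, and its inverse, can be sketched as follows. This is an assumption-laden illustration, not the product's converter: ISO 8859-15 is picked arbitrarily as the single-byte set, Python's built-in codec is used only to generate the table, a plain dictionary stands in for the trie mentioned on the slide, and the names `TO_UTF8`, `FROM_UTF8`, `to_utf8`, and `from_utf8` are hypothetical.

```python
# Forward table: one entry per byte value, holding its UTF-8 byte sequence.
TO_UTF8 = [bytes([b]).decode("iso8859_15").encode("utf-8") for b in range(256)]

# Reverse table: UTF-8 byte sequence -> original byte value.
FROM_UTF8 = {seq: b for b, seq in enumerate(TO_UTF8)}


def to_utf8(data: bytes) -> bytes:
    """Convert single-byte text to UTF-8 with one table lookup per byte."""
    return b"".join(TO_UTF8[b] for b in data)


def from_utf8(data: bytes) -> bytes:
    """Convert UTF-8 back to the single-byte set; reject unmappable input."""
    out = bytearray()
    i = 0
    while i < len(data):
        # UTF-8 is prefix-free, so at most one of these lengths can match.
        for n in (1, 2, 3):
            seq = data[i:i + n]
            if seq in FROM_UTF8:
                out.append(FROM_UTF8[seq])
                i += n
                break
        else:
            raise ValueError(f"no mapping for bytes at offset {i}")
    return bytes(out)


# Data-integrity check in the spirit of the slide: every byte must round-trip.
all_bytes = bytes(range(256))
round_trip_ok = from_utf8(to_utf8(all_bytes)) == all_bytes
```

An exhaustive round-trip like this is cheap for a single-byte set; the double-byte conversions have far more code points to verify, which fits the slide's remark about QA cost.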

14 Architecture changes: Impact on the 4GL user
4GL is character set independent
Almost all functions are character-based
3 functions require optional byte-basing
  – Length, Substring, Overlay
  – Options: Byte, Character
Add new option: Column
Format (Picture) Phrase
  – XXXX has different meaning for UTF-8
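The difference between the Byte, Character, and Column options can be shown with a small sketch. It is written in Python rather than 4GL, and the function `length` with its `unit` parameter is a hypothetical stand-in, not the 4GL LENGTH signature. Column width is approximated here from the Unicode East Asian Width property, counting wide and fullwidth characters as two columns.

```python
import unicodedata


def length(text: str, unit: str = "character") -> int:
    """Measure text in characters, UTF-8 bytes, or display columns."""
    if unit == "character":
        return len(text)
    if unit == "byte":
        return len(text.encode("utf-8"))
    if unit == "column":
        return sum(
            2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
            for ch in text
        )
    raise ValueError(f"unknown unit: {unit}")


sample = "A\u4e2d\u00e9"  # Latin A, CJK ideograph, e-acute
measurements = {unit: length(sample, unit) for unit in ("character", "byte", "column")}
```

The three answers differ for the same string, which suggests why a picture such as XXXX is ambiguous under UTF-8: four characters can occupy anywhere from 4 to 12 bytes and from 4 to 8 columns.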

15 Status
Functioning Well
Going to second beta
Implemented with very low cost
Performance is OK
  – Metrics not yet available
Testing is most significant cost
  – Reviewing all character set properties
  – Evaluating all conversions

16 Pièce de Résistance

17 Futures
For the Progress International Team
  – Multilingual Clients
  – Enhanced Character Folding
  – Enhanced Sorting
For Progress Customers
  – Deployment of multilingual databases
  – Worldwide access to these databases
  – Worldwide deployment of multi-language applications

18 Conclusions
Migration can be achieved in phases
Migration through UTF-8 can be low cost
Double-byte applications can migrate easily to UTF-8
Asian users can integrate with other languages now
Non-English users can integrate with Asian languages now

19 Any questions?

