© 1998, Progress Software Corporation
Migration of a 4GL and Relational Database to Unicode
Tex Texin, International Product Manager

Presentation Goals
Outline Migration Steps
Describe Design Considerations
Leverage Existing Double-byte Implementation
Describe Impact on 4GL and Report Formats

PROGRESS Application Development Suite
Powerful tools for the rapid creation of distributed business applications
Creates character, GUI, or web-based clients from common source
Host-based, client-server, or n-tier distribution on a variety of platforms
Scalable, robust, and open RDBMS
International, double-byte enabled

Possible Configuration Options
[Diagram labels: Optional n-tier Application Server; Database Server; Progress Database; Host-based; Character Client; GUI Client; Client-Server; Web-based Client; Other Database]

Why do our customers need Unicode?
Many do not. However:
Multinationals deploy across regions with incompatible character sets, yet they must share data between them.
Programs are distributed worldwide with a single container of text in many languages.
Certain applications require multilingual databases, e.g. translation systems and web-based applications.

The Existing Architecture
1.5M lines of C code
0.3M lines of 4GL code
Double-byte enabled
–CJK, 9 double-byte charsets supported
–2-byte characters only, no 3- or 4-byte characters
–No shift-sequenced charsets
–Double-byte enabling (DBE) changes earmarked, easy to find
–Effort: 4 years, 3 developers, 2 QA

The estimated cost of implementing UCS-2 was very large!
Changing to 16-bit text units affects almost every source module
–Largest cost is separating char variables by usage: text or binary data
–Use 16-bit null terminators and ignore 8-bit zero bytes: A is 0041 then 0000; Ā is 0100 then 0000 (the 00 bytes inside 0041 and 0100 are not terminators)
–Pointer arithmetic (advance 2 bytes)
–Sizing (bytes or characters)
–New API to use new WIDE TEXT datatype

Product requirements for a multilingual version
Minimize cost for application migration
Minimize cost for application upgrade
Minimize support cost
–One executable!
Maintain user-definable character sets
Add UTF-8 as just another character set
–UTF-8 algorithms are compatible with other charsets

Scaled-down multilingual proposal: UTF-8 implementation
Implement UTF-8 as a character set of at most 3 bytes per character
–Leverage & extend double-byte enabling
–Places to change are already earmarked
–Restrict to precomposed characters for now
–Restrict to no surrogates (Basic Multilingual Plane only)
Supports all the markets we are in
UTF-8-enable the 4GL and RDBMS first
–Provides multilingual logic and storage
–Java and other client technologies coming

Architecture changes: UTF-8-enabling the string library
N-byte-enable character and string functions
–GetNextChar, GetPreviousChar
–GetCharacterSize (table-based)
–Modified IsFirstByte
New GetColumnLength
New datatype: normalized BIG char
Minor algorithm changes for efficiency
–Find Character

Architecture changes: UTF-8-enabling character tables
String libraries use character tables
–Alphanumeric, lead-byte, tail-byte
–Upper, lower case (700+ characters)
New property: ColumnCount
New table formats
–Old architecture presumed a 256-entry table (one entry per byte value)
–Now organized by range lists and trie
Update table compiler & allow hex entry

Architecture changes: UTF-8-enabling sorting
How to sort multilingual data?
Binary sort used for double-byte data
With UTF-8, most non-ASCII European characters take 2 bytes and CJK characters take 3 bytes
Solution
–Binary sort on server
–Client uses native sort
Bump key length limit for UTF-8
Next phase will be enhanced sort

Architecture changes: character conversion algorithms
Existing, user-definable conversions
–Single-byte character set table maps
–Double-byte Shift-JIS/EUC-JIS algorithm
New table-driven automated conversions
–Single-byte to UTF-8, and back
–Double-byte to UCS-2, and back
–UTF-8/UCS-2
–Trie for speed and memory optimization
Requires significant QA for data integrity

Architecture changes: impact on the 4GL user
The 4GL is character-set independent
Almost all functions are character-based
3 functions require optional byte-basing
–Length, Substring, Overlay
–Options: Byte, Character
–Add new option: Column
Format (Picture) Phrase
–XXXX has a different meaning for UTF-8

Status
Functioning well
Going to second beta
Implemented at very low cost
Performance is OK
–Metrics not yet available
Testing is the most significant cost
–Reviewing all character set properties
–Evaluating all conversions

Pièce de Résistance

Futures
For the Progress International Team
–Multilingual Clients
–Enhanced Character Folding
–Enhanced Sorting
For Progress Customers
–Deployment of multilingual databases
–Worldwide access to these databases
–Worldwide deployment of multi-language applications

Conclusions
Migration can be achieved in phases
Migration through UTF-8 can be low cost
Double-byte applications can migrate easily to UTF-8
Asian users can integrate with other languages now
Non-English users can integrate with Asian languages now

Any questions?