ObjectStudio for Unicode: Getting ready for global markets (Alexander Augustin)

1 ObjectStudio for Unicode: Getting ready for global markets. Alexander Augustin

2 Overview
- Problem description
- History of character sets and encoding
- Goals and approach
- Features and technologies
- Limitations
- Conclusions

3 ObjectStudio
- ObjectStudio is an integrated Smalltalk environment for the Windows platform.
- It provides access to the most common Windows services and database systems, such as DLL functions, COM, ODBC, Oracle, …
- It's Smalltalk, so almost anything is possible, except easy localization and processing of multilingual data.

4 ObjectStudio in a Unicode World
[Diagram: ObjectStudio (ANSI/OEM) alongside the Operating System (Unicode), Other programs (Unicode), and Data sources (Unicode); the connections between them are marked with question marks]

5 Go Multilingual!
- Applications in a global market must represent texts and names from Eastern Europe and Asia.
- User interfaces must be localizable.
- Applications must be able to handle multilingual data.
- This must be supported by both the runtime environment and the development system.
[Screenshot: Japanese version of Microsoft Word]

6 ObjectStudio
- Supports ANSI (CP1252) and OEM (CP850) 8-bit characters
- Adequate for:
  - writing source code
  - creating English UIs
  - processing English text files
  - accessing databases with English texts
[Screenshot: ObjectStudio environment]

7 Overview
- Problem description
- History of character sets and encoding (current section)
- Goals and approach
- Features and technologies
- Limitations
- Conclusions

8 The history of character sets
- Punch card: late 18th century; enhanced by Hollerith (patented 1889, used for the 1890 US census)
- 5-channel punch tape: 19th century
  - 2^5 = 32 codes, not enough for 26 letters + 10 digits
  - Solution: a shift code sent as a prefix that switches the state (letters/figures shift)
- 8-channel punch tape: mid 20th century
  - 7-bit US-ASCII + parity
  - No support for umlauts
- DEC VT220 terminal: its DEC Multinational Character Set became the basis of ISO 8859-1, which is similar to Microsoft code page 1252
- Many character encodings for many languages: EBCDIC, KOI8, Shift_JIS, …

9 Unicode
- Unicode is a standard defined by the Unicode Consortium.
- Unicode assigns a unique number (code point) to each character.
- Version … reserves more than … code points.
- Several transformation formats exist for the binary representation of Unicode code points: UCS-2 (2 bytes/char), UTF-8 (1 to 4 bytes/char), UTF-16 (2 or 4 bytes/char).
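
A quick way to see these sizes is to encode a few characters and count the bytes. The sketch below uses Python purely for illustration (it is not ObjectStudio code); `byte_lengths` is a hypothetical helper, Python's `utf-16-le` codec stands in for UTF-16, and the UCS-2 check simply asks whether the code point fits in one 16-bit unit.

```python
def byte_lengths(ch: str) -> dict:
    """Return the code point of a single character and its encoded sizes."""
    code_point = ord(ch)
    return {
        "code_point": code_point,
        # UTF-8 uses 1 to 4 bytes per code point
        "utf8_bytes": len(ch.encode("utf-8")),
        # UTF-16 uses 2 bytes, or 4 bytes (a surrogate pair) above U+FFFF
        "utf16_bytes": len(ch.encode("utf-16-le")),
        # UCS-2 has exactly one 16-bit unit per character, so it can only
        # hold code points of the Basic Multilingual Plane (U+0000..U+FFFF)
        "fits_ucs2": code_point <= 0xFFFF,
    }


# "\U0001F600" (an emoji) lies outside the Basic Multilingual Plane
for sample in ["A", "Ö", "€", "\U0001F600"]:
    print(f"U+{ord(sample):04X}", byte_lengths(sample))
```

The last sample shows the limit of a UCS-2 string representation: a character above U+FFFF needs a surrogate pair in UTF-16 and has no single UCS-2 code.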

10 Unicode
- A worldwide unification effort covering all characters of the world
- Supported by all major vendors!
- The solution for ObjectStudio!

11 Encoding
[Diagram: Character → Code → Binary representation]
- Transforming characters into their binary representation in another encoding
- One of the main problems when accessing external data sources
- Distinguish between specialized encodings and Unicode
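
Transcoding between two byte encodings has to pass through the character (code point) level. A minimal Python sketch, assuming Python's standard `cp850` and `cp1252` codecs stand in for the OEM and ANSI code pages ObjectStudio supports; `transcode` is a hypothetical helper and this is not ObjectStudio code:

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Convert bytes from one encoding to another via Unicode code points."""
    text = data.decode(source)   # bytes -> characters
    return text.encode(target)   # characters -> bytes in the target encoding


oem_bytes = "Öl".encode("cp850")                      # OEM code page: Ö is 0x99
ansi_bytes = transcode(oem_bytes, "cp850", "cp1252")  # ANSI code page: Ö is 0xD6
# Reading the OEM bytes as if they were ANSI silently yields the wrong text
misread = oem_bytes.decode("cp1252")
print(oem_bytes, ansi_bytes, ascii(misread))
```

The misread value shows why the encoding of external data must be known: byte 0x99 is Ö in CP850 but the trademark sign in CP1252.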

12 Byte Encodings
- Differ in the value that represents a character in the encoding
- Do not differ in the binary format of the code (always 1 byte)
[Diagram: Character → Code → Binary representation]

Decimal value / hexadecimal binary representation:

| Encoding    | Ö      | €      |
|-------------|--------|--------|
| CP1252      | 214/D6 | 128/80 |
| CP852       | 153/99 | --     |
| ISO 8859-15 | 214/D6 | 164/A4 |
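
The table can be reproduced with Python's standard code-page codecs (`cp1252`, `cp852`, `iso8859_15`). This is an illustrative Python sketch, not ObjectStudio code; `byte_value` is a hypothetical helper.

```python
ENCODINGS = ["cp1252", "cp852", "iso8859_15"]
CHARACTERS = ["Ö", "€"]


def byte_value(ch: str, encoding: str):
    """Return the single byte value of ch in a one-byte encoding, or None."""
    try:
        encoded = ch.encode(encoding)
    except UnicodeEncodeError:
        return None  # the character does not exist in this code page
    assert len(encoded) == 1  # a byte encoding uses exactly one byte per character
    return encoded[0]


for enc in ENCODINGS:
    cells = []
    for ch in CHARACTERS:
        value = byte_value(ch, enc)
        cells.append("--" if value is None else f"{value}/{value:02X}")
    print(f"{enc:<11}", *cells)
```

Running it prints the same three rows as the table, with `--` where the code page has no euro sign.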

13 Unicode Encodings
- Do not differ in the value (code point) that is assigned to a character
- Differ in the binary format of that value
[Diagram: Character → Code point → Binary representation]

Hexadecimal binary representation:

| Encoding              | Ö (code point 214) | € (code point 8364) |
|-----------------------|--------------------|---------------------|
| UCS-2 (little-endian) | D6 00              | AC 20               |
| UTF-8                 | C3 96              | E2 82 AC            |
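
Python's `utf-16-le` and `utf-8` codecs reproduce these byte sequences; for characters in the Basic Multilingual Plane, such as Ö and €, UTF-16LE output is byte-for-byte identical to UCS-2 little-endian. Illustrative Python only; `hex_bytes` is a hypothetical helper.

```python
def hex_bytes(ch: str, encoding: str) -> str:
    """Format the encoded bytes of ch as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode(encoding))


# For characters in the Basic Multilingual Plane, Python's "utf-16-le"
# codec produces the same bytes as UCS-2 little-endian.
for ch in ["Ö", "€"]:
    print(f"U+{ord(ch):04X} ({ord(ch)})",
          "UCS-2 LE:", hex_bytes(ch, "utf-16-le"),
          "UTF-8:", hex_bytes(ch, "utf-8"))
```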

14 Goals
1. Enable Unicode!
   - Extend encoding capabilities
   - Provide native multilingual I/O support
2. Extend external access features
   - Add Unicode file access
   - Add Unicode database access

15 Changes
- Create a Unicode VM: make ObjectStudio a native Windows Unicode application
- Adapted class library:
  - make Smalltalk String/Symbol objects 16-bit Unicode strings (UCS-2)
  - add encodings
- External interfaces and resources:
  - C calls
  - Unicode file access
  - database access (ODBC, OCI)

16 Stream Encoding
- Ported from VisualWorks
- Uses StreamEncoders and CharacterEncoders that "know" the encoding
- Can be applied to any kind of stream with a byte-like buffer to encode or decode data
[Diagram: EncodedStream, Stream, StreamEncoder, Buffer, CharacterEncoder]
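
Python's `io.TextIOWrapper` follows a similar layering: a text layer that knows the encoding, placed over any binary stream. The sketch below is an analogy only, not a port of the ObjectStudio classes; the helpers `encode_through_stream` and `decode_through_stream` are hypothetical.

```python
import io


def encode_through_stream(text: str, encoding: str) -> bytes:
    """Write text through an encoding layer into an in-memory byte buffer."""
    buffer = io.BytesIO()
    # TextIOWrapper acts as the encoding layer on top of the byte stream
    stream = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    stream.write(text)
    stream.flush()
    stream.detach()  # release the buffer without closing it
    return buffer.getvalue()


def decode_through_stream(data: bytes, encoding: str) -> str:
    """Read text back from a byte buffer through a decoding layer."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding=encoding, newline="")
    return stream.read()


utf8_bytes = encode_through_stream("Öl €", "utf-8")
ansi_bytes = encode_through_stream("Öl €", "cp1252")
print(utf8_bytes, ansi_bytes)
```

Only the encoding argument changes between the two calls; the underlying byte stream is the same kind of object either way, which is the point of separating the encoder from the stream.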

17 Stream Encoding
[Diagram: EncodedStream, StreamEncoder, CharacterEncoder, Stream, and Buffer; Character → Code → Binary representation]

18 StreamEncoding use cases
- Accessing external services and storage without UCS-2 support (e.g. ANSI C calls)
- Examples:
  - access to databases without UCS-2 support
  - calling ANSI DLL functions without UCS-2 support
  - string transfer via TCP/IP
  - access to text files with foreign encodings
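
For the TCP/IP case, the sender has to turn characters into bytes with an agreed encoding and the receiver has to decode with the same one. A minimal Python sketch over a local socket pair, assuming UTF-8 as the agreed wire encoding; `send_text` and `receive_text` are hypothetical helpers, not ObjectStudio API.

```python
import socket


def send_text(sock: socket.socket, text: str, encoding: str = "utf-8") -> None:
    """Encode text explicitly and send all resulting bytes."""
    sock.sendall(text.encode(encoding))


def receive_text(sock: socket.socket, encoding: str = "utf-8") -> str:
    """Collect bytes until the peer stops sending, then decode them once."""
    chunks = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty result: the peer shut down its sending side
            break
        chunks.append(chunk)
    # Decoding after joining avoids splitting a multi-byte UTF-8 sequence
    # that happened to be divided between two recv() calls
    return b"".join(chunks).decode(encoding)


sender, receiver = socket.socketpair()
try:
    send_text(sender, "Grüße aus Köthen")
    sender.shutdown(socket.SHUT_WR)  # signal end of data to the receiver
    received = receive_text(receiver)
finally:
    sender.close()
    receiver.close()
print(received == "Grüße aus Köthen")
```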

19 Text file access
- Read/write access to any kind of text file:
  - UTF-8, UTF-16, UCS-2 little-endian, …
  - CP1252 (Windows ANSI)
  - CP850 (Windows OEM)
  - and many more
- Uses EncodedStreams and NewFileStreams

Example: read a UTF-8 encoded file

```smalltalk
| fileStream encoder encodedStream result |
"Open the file in binary mode; the encoder, not the file stream, interprets the bytes"
fileStream := NewFileStream
    file: 'example.txt'
    mode: #binary
    onError: [self error: 'could not open file'].
"Create a UTF-8 stream encoder and wrap the file stream with it"
encoder := StreamEncoder new: #utf8.
encodedStream := EncodedStream on: fileStream encodedBy: encoder.
"Read and decode the whole file into a String"
result := encodedStream upToEnd.
encodedStream close
```
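
The same idea can be sketched with Python's built-in `open()`, which takes an `encoding` argument. The sketch writes one word in several of the encodings listed above (Python's `utf-16-le` stands in for UCS-2 little-endian, which is equivalent for this text) and reads it back. It illustrates the concept only and is not ObjectStudio code; the helper `round_trip` and the file name are hypothetical.

```python
import os
import tempfile


def round_trip(text: str, encoding: str) -> tuple:
    """Write text to a temporary file in the given encoding and read it back.

    Returns (file size in bytes, decoded text).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        # newline="" keeps Python from translating line endings
        with open(path, "w", encoding=encoding, newline="") as f:
            f.write(text)
        size = os.path.getsize(path)
        with open(path, "r", encoding=encoding, newline="") as f:
            decoded = f.read()
    return size, decoded


sizes = {}
for enc in ["utf-8", "utf-16-le", "cp1252", "cp850"]:
    size, decoded = round_trip("Grüße", enc)
    assert decoded == "Grüße"  # same text, as long as writer and reader agree
    sizes[enc] = size
print(sizes)
```

The decoded text is identical in every case, while the file sizes differ (5 bytes for the single-byte code pages, 7 for UTF-8, 10 for UTF-16LE), which is why a reader must know a file's encoding before it can interpret the bytes.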

20 External Database Access
- Supported Unicode database interfaces: ODBC, OCI (Oracle Call Interface)
- Features:
  - native access to Unicode data sources
  - no application modifications needed
- Requirements:
  - ODBC: version 3.5
  - OCI: OCI client version 9i or higher

21 Limitations
- Source files continue to be OEM encoded: store Unicode text data in text files or external databases.
- UI sources can't contain Unicode strings: use external files or databases to store Unicode data for localizing UIs. Some localization support is planned.
- Implicit conversions between Strings and ByteArrays cannot be supported: use encoded streams or #asByteArrayEncoding:.
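
Python enforces a comparable rule at the type level, which may help explain why implicit conversions are ruled out: `str` and `bytes` never convert implicitly, and every conversion names an encoding. This is an illustrative Python sketch only; it does not model ObjectStudio's `#asByteArrayEncoding:`, and the variable names are hypothetical.

```python
text = "Größe: 10 €"

# str -> bytes always needs an explicit encoding ...
utf8_bytes = text.encode("utf-8")
ansi_bytes = text.encode("cp1252")
# ... and the byte count depends on which encoding was chosen
print(len(text), len(utf8_bytes), len(ansi_bytes))  # 11 15 11

# bytes -> str needs the encoding that produced the bytes
round_trip = utf8_bytes.decode("utf-8")

# Mixing the two types implicitly is rejected
try:
    text + utf8_bytes
    mixed_rejected = False
except TypeError:
    mixed_rejected = True
print(round_trip == text, mixed_rejected)
```

Because the same string yields 15 bytes in UTF-8 but 11 in CP1252, no single implicit String/ByteArray conversion could be correct for every external consumer.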

22 Limitations
- Existing image files are not compatible: compile the class files and create new images.

23 Conclusion
[Diagram: ObjectStudio (Unicode) alongside the Operating System (Unicode), Other programs (Unicode), and Data sources (Unicode)]

24 Availability
ObjectStudio 7.0 for Unicode is available on the new Cincom Smalltalk CD, together with VisualWorks 7.3.

25 Contact Information
We provide project support to internationalize your ObjectStudio application.
- Georg Heeg eK, Baroper Str. 337, D Dortmund, Tel: Fax:
- Georg Heeg AG, Seestr. 131, CH-8027 Zürich, Tel:
- Georg Heeg eK, Mühlenstr. 19, D Köthen, Tel: Fax:

26 © 2004 Cincom Systems, Inc. All Rights Reserved. Developed in the U.S.A. CINCOM, …, and The World's Most Experienced Software Company are trademarks or registered trademarks of Cincom Systems, Inc. All other trademarks belong to their respective companies.

