Oracle Character sets, Aino Andriessen


1 NAAM Oracle Character sets Aino Andriessen

2 Demo1

3 demo1.sql
rem
rem name     demo1.sql
rem created  jan 18, 2009
rem purpose  1st script demonstrating the nls_length_semantics parameter on tables
rem remarks  desc of the demo2 table would show varchar2(4 char), but I don't want to
rem          reveal that yet, so a select statement that mimics desc is used instead.
rem
cl scr
set echo off
set pagesize 100
set feedback off
drop table demo;
drop table demo2;
cl scr

-- Create demo2 table with CHAR length semantics
set feedback off
create table demo2 (naam varchar2(4 char));
prompt Table demo2 created
prompt
prompt desc demo2
COLUMN Name FORMAT A42
COLUMN Null FORMAT A8
COLUMN Type FORMAT A27
select column_name Name, Null, data_type || '(' || char_length || ')' Type
from   cols
where  table_name = 'DEMO2';
set feedback on
pause
prompt insert into demo2 values ('Rene');
insert into demo2 values ('Rene');
pause
prompt insert into demo2 values ('René');
insert into demo2 values ('René');
commit;
pause
prompt select * from demo2;
spool demo1.log
select * from demo2;
spool off
pause
cl scr

-- Create demo table with the default length semantics (BYTE)
set feedback off
create table demo (naam varchar2(4));
prompt Table demo created
set feedback on
prompt
prompt desc demo
desc demo
pause
prompt insert into demo values ('Rene');
insert into demo values ('Rene');
pause
prompt insert into demo values ('René');
-- in a multibyte database (e.g. AL32UTF8) 'René' needs 5 bytes, so this insert fails
insert into demo values ('René');
commit;
pause
prompt select * from demo;
select * from demo;
/*
select parameter, value
from   nls_database_parameters
where  parameter = 'NLS_LENGTH_SEMANTICS';
*/

4 nls_length_semantics
- Initialization parameter: CHAR or BYTE (default)
- Applies to multibyte character sets
- Defines the unit (characters or bytes) for the length of character columns and variables
- alter session set nls_length_semantics=CHAR;
  - not retroactive: existing columns keep their semantics
  - recompile PL/SQL if needed
- alter system
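The effect of the parameter on demo1 can be checked outside the database. The Python sketch below is a hypothetical model, not Oracle code: it assumes an AL32UTF8 database, uses Python's UTF-8 codec to count bytes, and lets an explicit BYTE/CHAR in a declaration override NLS_LENGTH_SEMANTICS, whose default is BYTE. The function names are invented for this illustration.

```python
# Hypothetical model of VARCHAR2 length checks under BYTE vs CHAR semantics.
# Assumption: the database character set is AL32UTF8, modelled with Python's UTF-8 codec.

def effective_semantics(explicit, nls_length_semantics="BYTE"):
    """An explicit CHAR/BYTE in the declaration wins; otherwise the parameter applies."""
    return (explicit or nls_length_semantics).upper()


def fits(value, max_len, semantics):
    """True if value fits in a VARCHAR2(max_len) declared with the given semantics."""
    if semantics == "CHAR":
        used = len(value)                   # characters (code points)
    elif semantics == "BYTE":
        used = len(value.encode("utf-8"))   # bytes in AL32UTF8
    else:
        raise ValueError(f"unknown length semantics: {semantics}")
    return used <= max_len


# demo:  naam varchar2(4)       -> no explicit unit, so NLS_LENGTH_SEMANTICS (BYTE) applies
# demo2: naam varchar2(4 char)  -> explicit CHAR
columns = {"demo": None, "demo2": "CHAR"}
for table, explicit in columns.items():
    semantics = effective_semantics(explicit)
    for value in ("Rene", "Ren\u00e9"):     # "René" with a precomposed é (U+00E9)
        print(table, value, semantics, fits(value, 4, semantics))
```

Under these assumptions only 'René' in demo is rejected: é takes two bytes in UTF-8, so the value needs 5 bytes. In the database itself this typically shows up as ORA-12899 (value too large for column).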

5 nls_length_semantics (2)
Explicitly specify the length unit of character columns and variables:
- create table demo (naam varchar2(4 char))
- create table demo (naam varchar2(4 byte))
- t_naam varchar2(4 char);
- t_naam demo2.naam%TYPE

6 Demo2

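7 demo2.sql
rem
rem name     demo2.sql
rem created  jan 18, 2009
rem purpose  2nd script demonstrating the nls_length_semantics parameter in PL/SQL
rem remarks
rem
set serveroutput on
declare
  t_naam   varchar2(4);        -- no explicit unit: NLS_LENGTH_SEMANTICS applies (BYTE by default)
  t_naamc  demo2.naam%TYPE;    -- inherits varchar2(4 char) from the column
  r_demo2  demo2%ROWTYPE;      -- not used: the cursor FOR loop declares its own r_demo2
  cursor c_demo2 is
    select naam from demo2;
begin
  for r_demo2 in c_demo2 loop
    dbms_output.put_line (r_demo2.naam);
    t_naamc := r_demo2.naam;
    dbms_output.put_line (t_naamc);
    -- in a multibyte database 'René' (5 bytes) does not fit and raises ORA-06502
    t_naam := r_demo2.naam;
    dbms_output.put_line (t_naam);
  end loop;
end;
/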
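The same rule applies to PL/SQL variables. Below is a hypothetical Python model of the demo2.sql loop, again assuming an AL32UTF8 database: t_naamc inherits VARCHAR2(4 CHAR) through %TYPE, while t_naam, declared as plain varchar2(4), gets BYTE semantics. The ValueTooLarge exception is an invented stand-in for PL/SQL's ORA-06502 (character string buffer too small).

```python
class ValueTooLarge(Exception):
    """Hypothetical stand-in for ORA-06502: character string buffer too small."""


def assign(value, max_len, semantics):
    """Return value if it fits the declared variable, otherwise raise ValueTooLarge."""
    size = len(value) if semantics == "CHAR" else len(value.encode("utf-8"))
    if size > max_len:
        raise ValueTooLarge(f"{value!r} needs {size} ({semantics}), limit is {max_len}")
    return value


def run_demo2(rows):
    """Mirror the cursor FOR loop of demo2.sql and collect what put_line would print."""
    output = []
    for naam in rows:                               # for r_demo2 in c_demo2 loop
        output.append(naam)                         # put_line (r_demo2.naam)
        output.append(assign(naam, 4, "CHAR"))      # t_naamc := r_demo2.naam  (4 CHAR)
        output.append(assign(naam, 4, "BYTE"))      # t_naam  := r_demo2.naam  (4 BYTE)
    return output


print(run_demo2(["Rene"]))                          # ['Rene', 'Rene', 'Rene']
try:
    run_demo2(["Rene", "Ren\u00e9"])
except ValueTooLarge as exc:
    print("error:", exc)
```

The %TYPE assignment succeeds for both rows; only the byte-limited variable fails on 'René', which is why anchoring variables to the column type is the safer choice.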

8 Character encoding

9 Character set
A character set defines the mapping between a binary/hexadecimal code and a character
- UTF8
- WE8MSWIN1252
- WE8ISO8859P1
- JA16EUC
- US7ASCII
- WE8DEC
- ...
Code pages
- IBM / Windows terminology
- ~ analogous to a character set
- a code page per language

10 Character sets (2)
ASCII
- 1 byte
- 128 characters
- standard English letters without accents
ISO 8859 and Latin-1
- 1 byte (8 bits)
- 256 characters
CP-1252
- Windows variant of Latin-1
UTF8
- variable length, multibyte
- max 4 bytes
- ~1 million characters available
- multilingual
- ASCII codes are identical
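The figures on this slide can be checked with Python's codecs, used here as stand-ins for the Oracle character sets (ascii for US7ASCII, latin-1 for WE8ISO8859P1, utf-8 for AL32UTF8). This is an illustration of the encodings, not of Oracle behaviour.

```python
def decodable_byte_values(codec):
    """Count how many single byte values 0x00-0xFF the codec maps to a character."""
    count = 0
    for b in range(256):
        try:
            bytes([b]).decode(codec)
            count += 1
        except UnicodeDecodeError:
            pass
    return count


ascii_size = decodable_byte_values("ascii")      # 7-bit: 128 characters
latin1_size = decodable_byte_values("latin-1")   # 8-bit: 256 characters

# UTF-8 is variable length: the highest code point, U+10FFFF, needs the maximum of 4 bytes.
utf8_max_bytes = max(len(chr(cp).encode("utf-8")) for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF))
unicode_code_points = 0x110000                   # 1,114,112: the "~1 million" on the slide

# ASCII codes are identical in UTF-8: each ASCII character is the same single byte.
ascii_compatible = all(chr(b).encode("utf-8") == bytes([b]) for b in range(128))

print(ascii_size, latin1_size, utf8_max_bytes, unicode_code_points, ascii_compatible)
```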

11 Examples
Character set | Hex code of € (euro)
AL32UTF8 | E282AC
WE8MSWIN1252 | 80 (128)
ASCII | -
WE8ISO8859P1 | -
WE8ISO8859P15 | A4 (164)

Character set | Hex code of é
AL32UTF8 | C3A9 (50089)
WE8MSWIN1252 | E9 (233)
ASCII | -
WE8ISO8859P1 | E9
WE8ISO8859P15 | E9
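The table can be reproduced with Python's codecs. The mapping from Oracle character set names to Python codec names is an assumption made for this sketch, not an Oracle API; '-' marks a character that the character set cannot represent.

```python
ORACLE_TO_PYTHON = {            # assumed equivalents, for illustration only
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859_15",
}


def hex_code(char, oracle_charset):
    """Return the encoding of char as upper-case hex, or '-' if it cannot be encoded."""
    try:
        return char.encode(ORACLE_TO_PYTHON[oracle_charset]).hex().upper()
    except UnicodeEncodeError:
        return "-"


for char in ("\u20ac", "\u00e9"):                  # the euro sign and é
    for charset in ORACLE_TO_PYTHON:
        code = hex_code(char, charset)
        decimal = f" ({int(code, 16)})" if code != "-" else ""
        print(f"{char}  {charset:<14} {code}{decimal}")
```

The decimal in parentheses is the same number that CHR expects, e.g. 0xC3A9 = 50089.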

12 Unicode / UTF-8 example
The image shows the number of bytes needed to store different kinds of characters in the UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes. The supplementary character (treble clef sign) requires 4 bytes of storage.
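The byte counts can be reproduced with Python's utf-8 codec, which corresponds to AL32UTF8. The Asian character in the original figure is not preserved, so 中 (U+4E2D) is an assumed example; the treble clef is U+1D11E MUSICAL SYMBOL G CLEF.

```python
# Number of UTF-8 (AL32UTF8) bytes per character.
samples = [
    "C", "t", "d",                      # ASCII: 1 byte
    "\u00e1", "\u00f6", "\u00d8",       # á, ö, Ø: 2 bytes
    "\u4e2d",                           # 中, a CJK ideograph chosen for illustration: 3 bytes
    "\U0001d11e",                       # MUSICAL SYMBOL G CLEF (treble clef): 4 bytes
]


def utf8_bytes(char):
    """The bytes stored for char in a UTF-8 database."""
    return char.encode("utf-8")


for char in samples:
    encoded = utf8_bytes(char)
    print(f"U+{ord(char):04X} {char}  {len(encoded)} byte(s): {encoded.hex().upper()}")
```

Oracle's older UTF8 character set is CESU-8 and stores a supplementary character such as the treble clef as two 3-byte surrogates (6 bytes), so these counts apply to AL32UTF8.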

13 Diacritics and special characters
Diacritics are marks placed with a letter (above, below or even through it) to change its pronunciation, giving the (modified) letter a language-specific sound.
- àÿęňĜş etc.
Special characters
- ßæ¿

14 Diacritics and special characters
Single-byte character sets
- 1 byte for a composed character
- not all combinations are possible
- code pages
UTF-8
- a diacritic has its own encoding
- a composed character has its own encoding, usually (always) the composition of the original character + diacritic
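Both points can be illustrated with Python's unicodedata module: é exists as one precomposed code point and as e plus a combining accent, and normalization (NFC/NFD) converts between the two. The cp1252 codec stands in for WE8MSWIN1252 to show that a single-byte code page only covers the combinations it reserved a byte for.

```python
import unicodedata

composed = "\u00e9"                                  # é as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # e followed by COMBINING ACUTE ACCENT


def code_points(text):
    """List the Unicode code points of text as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in text]


print(code_points(composed), composed.encode("utf-8").hex())      # ['U+00E9'] c3a9
print(code_points(decomposed), decomposed.encode("utf-8").hex())  # ['U+0065', 'U+0301'] 65cc81
print(composed == decomposed)                                     # False: different code points
print(unicodedata.normalize("NFC", decomposed) == composed)       # True: recomposed

# A single-byte code page only has the combinations it reserved a byte for.
print(composed.encode("cp1252"))                                  # b'\xe9'
try:
    "\u0119".encode("cp1252")                                     # ę is not in Windows-1252
except UnicodeEncodeError:
    print("\u0119 cannot be stored in WE8MSWIN1252")
```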

15 Database functions
Character functions
- substr - substrb - substrc - substr2
- instr -...
- length - lengthb
chr (n)
- Returns the character whose code in the database character set is the number n
- select chr (50089) from dual;
dump
- Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of the expression. The returned result is always in the database character set.
- select dump (naam, 1017) from demo2;
convert
- Converts a character string from one character set to another
utl_raw
- select utl_raw.cast_to_raw (naam) from demo2;
unistr ()
- Converts a string, in which \xxxx escapes denote UTF-16 code units, to the national character set
- select unistr ('Ren\00e9') from dual;
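Hypothetical Python analogues of some of these functions make the character/byte difference concrete for a value stored in an AL32UTF8 database. They handle positive positions only, and substrb_bytes returns raw bytes: Oracle's SUBSTRB returns a string and applies its own rules when the cut falls inside a multibyte character, which this sketch does not model.

```python
def length(s):
    """LENGTH: number of characters."""
    return len(s)


def lengthb(s):
    """LENGTHB: number of bytes in the database character set (here UTF-8)."""
    return len(s.encode("utf-8"))


def substr(s, position, count):
    """SUBSTR for position >= 1: 1-based, counted in characters."""
    return s[position - 1:position - 1 + count]


def substrb_bytes(s, position, count):
    """The byte range SUBSTRB works on (raw bytes only, no partial-character handling)."""
    return s.encode("utf-8")[position - 1:position - 1 + count]


def cast_to_raw(s):
    """Like UTL_RAW.CAST_TO_RAW: the stored bytes, shown as hex."""
    return s.encode("utf-8").hex().upper()


naam = "Ren\u00e9"                                # René
print(length(naam), lengthb(naam))                # 4 5
print(substr(naam, 1, 4))                         # René
print(substrb_bytes(naam, 1, 4))                  # b'Ren\xc3': cut in the middle of é
print(cast_to_raw(naam))                          # 52656EC3A9
```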

16 Demo3

17 demo3.sql
rem
rem name     demo3.sql
rem created  jan 18, 2009
rem purpose  3rd script demonstrating various character set functions
rem remarks
rem
select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET';
select chr (50089) from dual;
select dump (naam, 1017) from demo2;
select utl_raw.cast_to_raw (naam) from demo2;
select substr (naam,1,4) from demo2;
select substrb (naam,1,4) from demo2;
select '*' || substrb (naam,1,4) || '*' from demo2;
select utl_raw.cast_to_raw (substrb (naam,1,4)) from demo2;
select naam, length (naam) from demo2;
select naam, lengthb (naam) from demo2;
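Two of the demo3 results can be predicted with a hypothetical sketch, assuming an AL32UTF8 database: CHR(50089) interprets 50089 = 0xC3A9 as the two bytes C3 A9, which is é in UTF-8, and DUMP(naam, 1017) lists the stored bytes. The dump_1017 function only approximates Oracle's output format (type code 1 for VARCHAR2, printable ASCII bytes as characters, other bytes in hex); both function names are invented here.

```python
def oracle_chr(n, codec="utf-8"):
    """Sketch of CHR(n): decode the big-endian bytes of n in the database character set."""
    size = max(1, (n.bit_length() + 7) // 8)
    return n.to_bytes(size, "big").decode(codec)


def dump_1017(value, charset_name="AL32UTF8", codec="utf-8"):
    """Approximation of DUMP(value, 1017) for a VARCHAR2 value."""
    raw = value.encode(codec)
    parts = [chr(b) if 0x20 <= b < 0x7F else format(b, "x") for b in raw]
    return f"Typ=1 Len={len(raw)} CharacterSet={charset_name}: " + ",".join(parts)


print(oracle_chr(50089))                          # é (bytes C3 A9)
print(oracle_chr(233, "cp1252"))                  # é in a WE8MSWIN1252 database (byte E9)
print(dump_1017("Ren\u00e9"))                     # Typ=1 Len=5 CharacterSet=AL32UTF8: R,e,n,c3,a9
```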

18 nls_lang
Client character set
When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is already in the same (correct) encoding, so, for performance reasons, no conversion or validation takes place. The data is stored exactly as delivered by the client, bit by bit.
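What "stored bit by bit" means can be sketched in Python. The scenario is hypothetical: a Windows console that really produces code page 850 bytes, while NLS_LANG claims WE8MSWIN1252, the same as the database, so nothing is converted or validated.

```python
typed = "Ren\u00e9"                   # what the user types in the DOS box: René
sent = typed.encode("cp850")          # bytes the console actually produces (é = 0x82)
stored = sent                         # same character set as the database: no conversion
shown = stored.decode("cp1252")       # a Windows GUI tool reads the bytes as WE8MSWIN1252

print(sent.hex())                     # 52656e82
print(shown)                          # Ren‚  (0x82 is a low quotation mark in Windows-1252)

# Same idea with a UTF-8 database: Windows-1252 bytes labelled as AL32UTF8 are not valid UTF-8.
latin_bytes = typed.encode("cp1252")                      # b'Ren\xe9'
print(latin_bytes.decode("utf-8", errors="replace"))      # Ren + U+FFFD replacement character
```

Because the wrong bytes are stored unchanged, the error only becomes visible later, in a client that uses a different (correct) character set.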

19 nls_lang (2)
language_country.character set
- american_america.UTF8
- dutch_the netherlands.WE8MSWIN1252
- american_THE NETHERLANDS.WE8MSWIN1252
Environment variable, nls_lang
Difference between the Windows GUI (WE8MSWIN1252) and the command line (WE8PC850)
Not used by Java clients
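The format and the GUI versus command line difference can be illustrated with a small Python helper. parse_nls_lang and the name mapping are hypothetical, written for this sketch: the same é is byte E9 in WE8MSWIN1252 but 82 in WE8PC850, so a command-line client that reuses the GUI setting sends bytes that end up as the wrong characters.

```python
# Assumed mapping from the Oracle names on this slide to Python codecs (illustration only).
# Oracle's UTF8 is CESU-8; it matches Python's utf-8 only for characters up to U+FFFF.
ORACLE_TO_PYTHON = {"UTF8": "utf-8", "WE8MSWIN1252": "cp1252", "WE8PC850": "cp850"}


def parse_nls_lang(value):
    """Split LANGUAGE_TERRITORY.CHARSET; the territory may contain spaces."""
    language_territory, _, charset = value.rpartition(".")
    language, _, territory = language_territory.partition("_")
    return language, territory, charset


for nls_lang in ("american_america.UTF8",
                 "dutch_the netherlands.WE8MSWIN1252",
                 "AMERICAN_THE NETHERLANDS.WE8PC850"):
    language, territory, charset = parse_nls_lang(nls_lang)
    e_acute = "\u00e9".encode(ORACLE_TO_PYTHON[charset]).hex().upper()
    print(f"{language} | {territory} | {charset} | \u00e9 = {e_acute}")
```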

20 Demo4

21 demo4.bat
rem
rem name     demo4.bat
rem created  jan 18, 2008 AA
rem purpose  Set the nls_lang parameter to the one that is used in the DOS window
rem remarks  Only use this to select and insert from the command line.
rem          Do not run scripts: they are saved in another character set / code page,
rem          which is different from the one in the DOS box.
rem          If you run these scripts, unexpected character conversion occurs,
rem          resulting in weird, unexpected characters.
rem
rem local NLS_LANG :
rem   DUTCH_THE NETHERLANDS.WE8MSWIN1252
rem   AMERICAN_THE NETHERLANDS.WE8MSWIN1252
rem   AMERICAN_THE NETHERLANDS.WE8PC850
set NLS_LANG=AMERICAN_THE NETHERLANDS.WE8PC850

22 National character set
- Support for another character set next to the database character set, e.g. to allow Japanese in a WE8MSWIN1252 or ISO 8859 database
- Less necessary in a UTF8 database
- Multibyte
- nvarchar2, nclob etc.
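Why a national character set helps in a single-byte database, sketched with Python codecs as assumed stand-ins: cp1252 for WE8MSWIN1252, latin-1 for WE8ISO8859P1, utf-16-be for AL16UTF16 (Oracle's default national character set) and utf-8 for AL32UTF8.

```python
japanese = "\u65e5\u672c"             # 日本 ("Japan")


def can_store(text, codec):
    """True if every character of text exists in the character set behind codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(can_store(japanese, "cp1252"))       # False: no Japanese in WE8MSWIN1252
print(can_store(japanese, "latin-1"))      # False: nor in ISO 8859-1
print(can_store(japanese, "utf-16-be"))    # True: an NVARCHAR2 column can hold it
print(len(japanese.encode("utf-16-be")))   # 4 bytes (2 per character)
print(len(japanese.encode("utf-8")))       # 6 bytes (3 per character)
```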

23 Case
TELETEX character set
- no longer exists in Oracle
- select convert(naam, 'TELETEX', 'UTF8') from tabel;
Locale builder

24 Oracle Locale Builder
- .nlb files in the ORA_NLS33 directory
- SQL> select convert('test', 'TELETEX', 'UTF8') from dual;
Oracle Locale Builder
- LXINST
- LX22711.NLT
- LX22711.NLB
- LX0BOOT.NLT

25
sql> select name from emp
sql> select utl_raw.cast_to_varchar2 (utl_raw.cast_to_raw (name)) from
sql> select utl_raw.cast_to_varchar2 (name) from
sql> select name from

26 Question
- Accent-insensitive (diacritic-free) searching
- Case-insensitive searching
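One way to answer the question outside the database is a Python sketch: case-fold the text, decompose it (NFD), drop the combining marks, then compare. Letters such as ø or æ have no base-letter-plus-diacritic decomposition and stay as they are. Inside Oracle, a common approach is NLS_COMP=LINGUISTIC combined with NLS_SORT=BINARY_CI (case-insensitive) or BINARY_AI (accent- and case-insensitive); the functions below are hypothetical and are not an Oracle feature.

```python
import unicodedata


def fold(text):
    """Case-fold, decompose, and drop combining marks: 'RENÉ' -> 'rene'."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def search(rows, term):
    """Accent- and case-insensitive substring search."""
    key = fold(term)
    return [row for row in rows if key in fold(row)]


rows = ["Rene", "Ren\u00e9", "REN\u00c9", "Renate"]   # Rene, René, RENÉ, Renate
print(search(rows, "ren\u00e9"))                     # ['Rene', 'René', 'RENÉ']
```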

27 Summary
- nls_length_semantics: always explicitly define a character column with its length semantics (CHAR or BYTE)
- Oracle performs automatic character set conversion
- wysinawyg (what you see is not always what you get)
- Use a Java client
- Working with character sets can be confusing
- UTF8 is often the preferred character set


