Migrating to a database that supports Unicode is a challenge. You are safe if you are currently using US7ASCII for you Oracle database because Oracle provides a mechanism with ALTER DATABASE SET CHARACTERSET UTF8 which allows you by just bouncing the database to switch to this new character set. But what if you are currently on a different character set? In Europe we have a bunch of WE8 (Western European) or EE8 (Eastern European) character sets and as they are not a strict subset of Unicode the ALTER Database does not work. So the only chance is to export the whole database, create a new one (with UTF8 or similar) and import it again.
Is it that easy? No! The reason is that in most databases the so called length semantics hasn’t been used while creating a table. Since Oracle9i the correct syntax for creating is:
CREATE TABLE test1
(custid NUMBER(10),
lastname VARCHAR2(6 CHAR))
Have you used this syntax? No of course not! We all used the old version:
CREATE TABLE test1
(custid NUMBER(10),
lastname VARCHAR2(6))
But what is the difference? The default behavior for an Oracle database is to use “BYTE” as length semantic. And this can cause huge problems while migrating to Unicode.
Let’s assume your current database is working with WE8MSWIN1252 (common on western European Windows Databases). Entering the following name into the customer table is safe:
INSERT INTO customer (custid, lastname)
VALUES
(1,’Müller’);
COMMIT;
If you try to enter the same row in an Oracle database with AL32UTF8 you get the following error message:
ERROR at line 2:
ORA-12899: value too large for column "DEMO"."TEST1"."LASTNAME" (actual: 7, maximum: 6)
The reason is that a German umlaut takes two bytes in Unicode as well as most special characters for other languages.
When using Oracle export/import or datapump the situation doesn’t change because both utilities are using the exported format while importing into an new database.
The only chance is to first create the objects and then do the import.
Toad has a very nice switch under options -> data types which is called “Include Byte/Char spec when creating DDL scripts from 9i databases”. This behavior is not limited to 9i but all existing Oracle versions you can switch off the “BYTE or CHAR” specification for all scripts.
Now you can generate the schema script (under menu export) for all schemas you want to recreate and execute a user export or full export of the database.
On your target you first create the user (maybe with CREATE LIKE…) using Toad and load your generated schema script. Before you execute it ,please add one command to the script:
ALTER SESSION SET NLS_LENGTH_SEMANTICS = 'CHAR';
This will switch the actual session into a mode where the default length semantic is now CHAR instead of BYTE. This works for a session as well as for the database but there are several known bugs if you switch it for the database so better only use it for the actual session.
The test table is now created like:
CREATE TABLE test1
(custid NUMBER(10),
lastname VARCHAR2(6 CHAR));
which allows you to insert the lastname “Müller” now.
After all objects are successfully created you can run the import for the table data but don’t forget to switch on “IGNORE ERRORS” as the “CREATE TABLE” scripts are included in the export file.