Written by:
John Pocknell
Monday, June 08, 2009
Part 1 of this two-part blog asked “What problem does Unicode seek to address?”, “What is Unicode?” and “How does Oracle implement it in its databases?”
In this part, I hope to help you understand how Unicode is handled on your Windows PC or notebook, and how to configure that machine so you can use Unicode effectively in Toad.
What settings do I need to change on my PC?
Make sure the NLS_LANG value on your Oracle client (registry or environment variable) matches your Windows client character set. This has nothing to do with the database you are connecting to.
Oracle searches for an NLS_LANG value in this order:
- User environment variable
- System environment variable
- Registry entry for the active Home (Note: Each Oracle Home has its own NLS_LANG value – see Toad Advisor below)
If no NLS_LANG value is found, the value AMERICAN_AMERICA.US7ASCII is used, not the Oracle database character set as is often believed (Oracle states explicitly that this is a myth). The registry location is HKLM\SOFTWARE\ORACLE\ (“NLS_LANG” key). Environment variables should not supplant the registry; use them judiciously to override the registry value only when needed.
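The search order and fallback described above can be sketched in a few lines of Python. This is only an illustration of the lookup logic, not Oracle's code: the function name `resolve_nls_lang` and the three dictionaries standing in for the user environment, the system environment and the active Home's registry entries are assumptions made for the example.

```python
from typing import Mapping

# Fallback used when NLS_LANG is not found anywhere (as stated in the text).
DEFAULT_NLS_LANG = "AMERICAN_AMERICA.US7ASCII"


def resolve_nls_lang(user_env: Mapping[str, str],
                     system_env: Mapping[str, str],
                     home_registry: Mapping[str, str]) -> str:
    """Return the first NLS_LANG found, checking sources in search order.

    The three mappings are hypothetical stand-ins for the user environment,
    the system environment and the registry entries of the active Home.
    """
    for source in (user_env, system_env, home_registry):
        value = source.get("NLS_LANG")
        if value:
            return value
    return DEFAULT_NLS_LANG
```

With this sketch, a user environment variable wins over a registry entry, and an empty configuration falls back to US7ASCII rather than to anything the database reports.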
To discover your active Windows character set, look in Regional Settings > Advanced > “Language for non-Unicode programs”, or in the Registry at HKLM\SYSTEM\CurrentControlSet\Control\Nls\CodePage “ACP” value.
There is a new warning in Toad Advisor that tells you if NLS_LANG does not properly match your character set. (A connection is required only so that an active Home is set; the NLS_LANG for that Home is then read.)
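A check of this kind can be approximated as follows. This is a rough sketch and not how Toad Advisor is implemented: the helper names are made up for the example, and the code page table is a partial list of commonly cited Windows code page to Oracle character set pairings that you should verify against Oracle's documentation for your version.

```python
# Partial, illustrative mapping from Windows ANSI code page (the "ACP" value)
# to the Oracle client character set name. Verify against Oracle's docs.
ACP_TO_ORACLE = {
    932: "JA16SJIS",
    936: "ZHS16GBK",
    950: "ZHT16MSWIN950",
    1250: "EE8MSWIN1250",
    1251: "CL8MSWIN1251",
    1252: "WE8MSWIN1252",
    1254: "TR8MSWIN1254",
}


def nls_lang_charset(nls_lang: str) -> str:
    """Extract CHARSET from a LANGUAGE_TERRITORY.CHARSET value."""
    return nls_lang.rpartition(".")[2].upper()


def matches_windows_codepage(nls_lang: str, acp: int) -> bool:
    """True if the NLS_LANG character set is the one expected for this ACP."""
    expected = ACP_TO_ORACLE.get(acp)
    return expected is not None and nls_lang_charset(nls_lang) == expected
```

For instance, `matches_windows_codepage("AMERICAN_AMERICA.US7ASCII", 1252)` returns `False`, which is the situation the warning is meant to catch.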

NLS_LANG on your client computer is how Oracle knows what your client character set is, and therefore whether it needs to transform characters going back and forth to your client (this is carried out by SQL*Net). This seems to be one of the most misunderstood aspects of getting Unicode working properly in Oracle. NLS_LANG has nothing to do with NLS_CHARACTERSET, which is the database character set. If NLS_LANG matches the database character set, the Oracle client will not do any transformation, but that is not our concern here.
How does Oracle manage multiple clients accessing multiple databases?
The database has a value indicating its character set and the client has a value indicating the active character set of Windows. Before a client application sends data to the database, if these two values are different, Oracle on the client transforms the characters being sent to the database so they may be encoded properly. This process usually occurs on the client for performance.

Here is a more detailed look at what occurs. When a client application starts and creates a session, the database character set is read and the client character set is determined through NLS_LANG. If these values differ, data transferred to and from the database is transformed into the proper bytes for its destination.
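To make the byte transformation concrete, here is a small Python imitation. It is not Oracle's conversion code: the function `transcode` is hypothetical, and Python's `cp1252` and `utf-8` codecs are used only as stand-ins for a Windows client character set and a Unicode database character set.

```python
def transcode(data: bytes, source_charset: str, target_charset: str) -> bytes:
    """Re-encode bytes from one character set to another.

    When both sides use the same character set, the bytes pass through
    untouched, mirroring the "no transformation" case.
    """
    if source_charset == target_charset:
        return data
    return data.decode(source_charset).encode(target_charset)


# "café" as a cp1252 client would hold it: the é is the single byte E9.
client_bytes = "caf\u00e9".encode("cp1252")

# The same text re-encoded for a UTF-8 destination: é becomes C3 A9.
database_bytes = transcode(client_bytes, "cp1252", "utf-8")
```

The text is the same on both sides; only the bytes that represent it change, and the conversion is reversible as long as every character exists in both character sets.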

It should now be clear that NLS_LANG is concerned with the client, not the database. You cannot query the database to discover the value of NLS_LANG, and you cannot set NLS_LANG on the database. NLS_LANG is all about the client character set. The Oracle client software uses it before sending character data to the database, so that the database receives proper values.
What happens if I want to connect to different databases which have different character sets?
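Here’s an example: suppose you work with two databases, one using an Asian character set and one using an American (Latin) character set.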
In this case you would create two Oracle Homes, one to access the Asian character set database and one to access the American (Latin) character set database. Alternatively, before running an application such as SQL*Plus, a local NLS_LANG environment variable can be assigned. An easy way to manage this would be to create two different batch files to launch two different types of SQL*Plus sessions, one for each database:
AsianSQLPlusSession.bat
set NLS_LANG=AMERICAN_AMERICA.ZHT16MSWIN950
sqlplusw.exe
AmericanSQLPlusSession.bat
set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
sqlplusw.exe
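The same idea, one NLS_LANG per launched session, can also be expressed as a small Python launcher. This is a sketch under assumptions: the function names are invented for the example, and it assumes `sqlplusw.exe` is on the PATH exactly as the batch files do.

```python
import os
import subprocess
from typing import Mapping, Optional, Sequence


def env_with_nls_lang(nls_lang: str,
                      base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Copy an environment and set NLS_LANG in the copy only."""
    env = dict(os.environ if base_env is None else base_env)
    env["NLS_LANG"] = nls_lang
    return env


def launch(command: Sequence[str], nls_lang: str) -> subprocess.Popen:
    """Start a program with its own NLS_LANG, leaving this process untouched."""
    return subprocess.Popen(list(command), env=env_with_nls_lang(nls_lang))


# Example calls (not executed here), equivalent to the two batch files:
# launch(["sqlplusw.exe"], "AMERICAN_AMERICA.ZHT16MSWIN950")
# launch(["sqlplusw.exe"], "AMERICAN_AMERICA.WE8MSWIN1252")
```

As with `set` inside a batch file, the override applies only to the launched session; the registry value and other running programs are unaffected.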
What else do I need to do?
Toad 10 will be a fully Unicode application. However, the various tools Toad calls (Oracle utilities, SQL*Plus and so forth) are not. They therefore require NLS_LANG to be set properly and, in most cases, require the data they work with to be available in the client character set. For example, if NLS_LANG is set to a Big5 double-byte Chinese character set (which means your Windows “Language for non-Unicode programs”, XP’s silly name for your character set, is set to Chinese), you can run “Select Chinese Stuff from dual” in SQL*Plus. You cannot run “Select Turkish stuff from dual”, because the Turkish characters are not available in your client character set, and SQL*Plus will complain. This *will* work in Toad, however, because Toad switches on Unicode mode in the OCI, so all data is Unicode-encoded on the client regardless of the various encodings of the databases.
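The Chinese-versus-Turkish situation can be reproduced with Python's codecs. This is only an analogy: `cp950` stands in for a Big5 client character set and `cp1254` for a Turkish one, and the helper `representable` is a made-up name for this example.

```python
def representable(text: str, client_codec: str) -> bool:
    """True if every character of text exists in the client character set."""
    try:
        text.encode(client_codec)
        return True
    except UnicodeEncodeError:
        return False


chinese = "\u4e2d\u6587"   # two Chinese characters
turkish = "\u011f\u015f"   # g with breve, s with cedilla

chinese_on_big5_client = representable(chinese, "cp950")   # True
turkish_on_big5_client = representable(turkish, "cp950")   # False
turkish_on_unicode_client = representable(turkish, "utf-8")  # True
```

A non-Unicode tool is in the position of the `cp950` client: it can only handle what its character set contains. A Unicode client is in the position of the `utf-8` case, where both strings are representable.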
For the purposes of saving data from Toad to an external file, the Data Grid “Export Dataset” dialog (replacing the existing “Save As”) has a new drop-down, ‘Encoding’, where you can choose an encoding to use. If you use the typical choice, ANSI, no header/BOM (Byte Order Mark; see Part 1) will be used and things operate as they always have.
Where another encoding such as UTF-8 is selected, the files will potentially contain Unicode data, and this drop-down ensures they are saved properly even if your Windows character set differs from the character data in the file. Such files will begin with the UTF-8 BOM, EF BB BF (hex). These are non-visible bytes that Unicode applications use to determine what encoding a file is in. If you use Toad-created files in other applications and they hiccup somehow, it is possible they don’t know what to do with that BOM. I can’t think of any cases, beyond perhaps Editor files, where this could be an issue for your other applications.
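If another program does stumble over the BOM, it helps to see exactly what those three bytes are. The sketch below uses Python's standard `utf-8-sig` codec to produce and consume a BOM; the helper `strip_utf8_bom` is a hypothetical name, and nothing here describes how Toad itself writes files.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM (EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Encoding with "utf-8-sig" prepends the BOM; plain "utf-8" does not.
with_bom = "Toad".encode("utf-8-sig")
without_bom = "Toad".encode("utf-8")

# A BOM-aware reader drops the marker; a naive UTF-8 reader keeps it as
# an invisible U+FEFF character at the start of the text.
aware = with_bom.decode("utf-8-sig")
naive = with_bom.decode("utf-8")
```

The leftover U+FEFF in the naive case is the kind of thing that makes a BOM-unaware application hiccup on the first line of a file.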
What does it mean if I see gobbledygook displayed?
If you see squares somewhere in a window, you are not using a Unicode-friendly font. Try a Unicode font. In general, squares mean the data is fine and the font just can’t show it. Question marks generally mean the data has been corrupted; for example, one byte of a double-byte character was lost, so the character is no longer known.1
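Why question marks usually signal real data loss can be shown with a short Python experiment. It is an analogy rather than a trace of Oracle's behavior: Python's `errors="replace"` option is assumed here to play the role of a lossy character set conversion, and `store_in_charset` is an invented helper.

```python
def store_in_charset(text: str, codec: str) -> bytes:
    """Encode text, substituting "?" for characters the codec lacks."""
    return text.encode(codec, errors="replace")


original = "\u011f"                            # Turkish g with breve
stored = store_in_charset(original, "cp1252")  # cp1252 has no such character
restored = stored.decode("cp1252")             # what comes back later
```

Here `stored` is the single byte for "?", so `restored` is a literal question mark. No font change can bring the original character back, which is the difference from the squares case, where the underlying data is still intact.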
How do I try out Unicode support in Toad?
For those of you who are keen to see Unicode support working in Toad right now, ahead of the official Toad 10.0 release, you can download a beta version of Toad 9.8 from www.toadsoft.com/beta.html. You need a commercial license of Toad for Oracle, and it must be installed on the desktop on which you wish to install the Toad 9.8 beta.
Please note: Toad 9.8 is a Quest Software internal release focused only on testing the conversion to Delphi 2009, the upgrade of all third-party components, and Unicode support; it will (mostly) exclude any new functionality.
Thank you again to Mark Lerch from the Toad Development Team for his kind permission to reproduce articles he has written on the subject.
1Mark Lerch – Toad development