An email to the Beta group went as follows:
From: email@example.com [mailto:firstname.lastname@example.org] On Behalf Of Brad Boddicker
Sent: Thursday, April 30, 2009 10:22 AM
Subject: [toadbeta] Beta 220.127.116.11 released
Unicode support added
Mark Lerch almost immediately replied with the following:
Emphasis in various places by Norm
This is huge. The most joyous email I've ever seen posted on the boards. I'm going to save it. Brad's email was intentionally subdued, as though we just "slipped it in." I love it. We're all very happy here these days regarding this.
Way back in May of 1999, Vinny (Quest's owner and CEO at the time) tasked us with putting Unicode support into Toad, as he saw the Asian markets as critical to Toad's future. Of course, we had to wait this long for Borland to put Unicode support into Delphi before we could do it effectively. It has certainly been an albatross around our neck for a long time. ("Toad International" was basically a one-off hack, a port done by an outside company we hired.) And it [Unicode support] doesn't become official until the commercial release of Toad 10 later this fall, but the great majority of it should already be working very well.
I'll anticipate the most common problem right up front:
This has nothing to do with the database you are connecting to. There is a new warning in Toad Advisor telling you if NLS_LANG does not properly match your character set. (A connection is required only so that an active Home is set; Toad then reads the NLS_LANG for that Home.)
NLS_LANG on your client computer is how Oracle knows what your client character set is, so it knows if it needs to transform characters going back & forth to your client. This seems to be one of the most misunderstood things out there related to getting Unicode properly working in Oracle. It has nothing to do with NLS_CHARACTERSET, which is the database character set. If NLS_LANG matches the database character set, the Oracle client will not do any transformation. But that's not our concern anyway.
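To make the NLS_LANG versus database character set distinction concrete, here is a minimal Python sketch. It assumes the usual LANGUAGE_TERRITORY.CHARSET layout of an NLS_LANG value. The names `parse_nls_lang` and `client_converts` are made up for illustration and are not part of Toad or the Oracle client; the check only mirrors the rule stated above (same character set on both sides, no transformation).

```python
from typing import NamedTuple


class NlsLang(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_nls_lang(value: str) -> NlsLang:
    """Split an NLS_LANG value of the form LANGUAGE_TERRITORY.CHARSET."""
    # Everything after the first dot is the client character set.
    locale_part, _, charset = value.strip().partition(".")
    # Language and territory are separated by the first underscore.
    language, _, territory = locale_part.partition("_")
    return NlsLang(language.upper(), territory.upper(), charset.upper())


def client_converts(nls_lang: str, db_charset: str) -> bool:
    """True when the client character set differs from the database one,
    i.e. when characters would have to be transformed in transit."""
    return parse_nls_lang(nls_lang).charset != db_charset.strip().upper()


if __name__ == "__main__":
    # Hypothetical values: a Windows-1252 client talking to a UTF-8 database.
    print(parse_nls_lang("AMERICAN_AMERICA.WE8MSWIN1252"))
    print(client_converts("AMERICAN_AMERICA.WE8MSWIN1252", "AL32UTF8"))
```

Only the part after the dot matters for this comparison; the language and territory parts affect things like messages and date formats, not character conversion.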
Toad itself is now a fully Unicode application. However, the various tools Toad calls (Oracle Utilities, SQL Plus and so forth) are not. This means they require NLS_LANG to be properly set and, in most cases, require the data they work with to be available in the client character set. For example, if you have NLS_LANG set to a Big5 (double-byte) Chinese character set, which means your Windows "Language for non-Unicode Programs" setting, XP's silly name for your character set, is set to Chinese, you can "Select Chinese Stuff from dual" and run that in SQL Plus. You cannot do "Select Turkish stuff from dual", because the Turkish characters won't be available in your client character set. SQL Plus will complain. This *will* work in Toad, however, because the OCI has a Unicode switch flipped on via Toad, so all data is Unicode-encoded on the client regardless of the encoding of the database.
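The Chinese versus Turkish example can be tried outside Oracle with a small Python sketch. It uses Python's own `big5` codec as a rough stand-in for a Big5 client character set, which is an assumption: Oracle's Big5 character sets are not guaranteed to match Python's mapping byte for byte. The helper name `fits_client_charset` is invented for this illustration.

```python
def fits_client_charset(text: str, codec: str) -> bool:
    """Return True if every character in text can be encoded in codec."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


chinese = "中文"
turkish = "ğş"  # Turkish letters that Big5 has no code for

if __name__ == "__main__":
    # A non-Unicode tool limited to the client character set:
    print(fits_client_charset(chinese, "big5"))
    print(fits_client_charset(turkish, "big5"))
    # A Unicode application has no such restriction:
    print(fits_client_charset(chinese + turkish, "utf-16-le"))
```

The first check succeeds and the second fails, which is the same situation SQL Plus is in; a Unicode encoding accepts both strings at once.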
Clear so far? Good. : )
For most external files Toad creates, we chose UTF-8 as the encoding to use. Some files can potentially contain Unicode data, so this ensures they can be properly saved if your Windows character set is different than the character data in the file. Therefore, external files will contain the UTF-8 BOM (Byte Order Mark) of EF BB BF (hex). These are non-visible bytes at the start of the file which applications use to know what encoding the file is in. If you are using Toad-created files in other applications and they have a hiccup somehow, it is possible they don't know what to do with that BOM. I can't think of any cases, perhaps beyond Editor files, where this could potentially be an issue for your other applications. But notice that the Editor "Save As" dialog has a new drop-down, "Encoding", where you can choose an encoding to use. If you use the typical choice, ANSI, no header/BOM will be used and things operate as they always have.
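If another tool does trip over the BOM, it helps to see exactly what those three bytes look like to a program. The following Python sketch works on in-memory bytes rather than real Toad files, and the helper names are invented for illustration. Python's `utf-8-sig` codec writes the BOM when encoding and strips it when decoding, while plain `utf-8` leaves it in the text as an invisible U+FEFF character.

```python
import codecs


def encode_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with the leading EF BB BF marker."""
    return text.encode("utf-8-sig")


def has_utf8_bom(data: bytes) -> bool:
    """Check whether the data starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)


def decode_dropping_bom(data: bytes) -> str:
    """Decode UTF-8 data, removing the BOM if one is present."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    data = encode_with_bom("select 'ğ' from dual")
    print(data[:3].hex(" "))                        # the three BOM bytes
    print(repr(data.decode("utf-8")[0]))            # BOM survives as U+FEFF
    print(repr(decode_dropping_bom(data)[0]))       # BOM removed
```

A program that reads the file as plain UTF-8 without expecting a BOM will see that extra U+FEFF at the start of the first line, which is the typical "hiccup".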
If you see squares somewhere in a window, you are not using a Unicode-friendly font. Try a Unicode font. In general, squares mean the data is fine and the font just can't show it. Question marks generally mean the data has been corrupted; for example, one byte of a double-byte character was lost, so the character is no longer known.
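The "question marks mean corruption" case can be reproduced with a short Python sketch. It again uses Python's `big5` codec as a stand-in for a double-byte client character set (an assumption, as is every helper name here). It shows the two usual ways a character gets lost: being forced into a character set that has no code for it, and losing one byte of a double-byte character.

```python
def force_into_charset(text: str, codec: str = "big5") -> str:
    """Round-trip text through codec, substituting '?' for anything
    the character set cannot represent."""
    return text.encode(codec, errors="replace").decode(codec)


def lose_last_byte(text: str, codec: str = "big5") -> str:
    """Drop the final byte of the encoded text and decode what is left,
    marking the undecodable remainder with a replacement character."""
    raw = text.encode(codec)
    return raw[:-1].decode(codec, errors="replace")


if __name__ == "__main__":
    print(force_into_charset("ğş"))   # Turkish letters forced into Big5
    print(lose_last_byte("中文"))      # second character loses a byte
```

In both cases the original character cannot be recovered from the result, unlike the "squares" case, where switching fonts is enough.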
We have a very unofficial list of issues we have found, things not supported. Qualifications, additions, subtractions are likely – I simply did a quick snag of various comments we put into some to-do list files to pull these out for the sake of sharing:
I think that's enough for one email, if anyone has lasted this long.