Date: 2006-03-04
The Unicode patch for DBD::ODBC was initially written by Alexander Foken (alexander at foken dot de). I wrote this patch during my working hours for a project that needs Unicode support in DBD::ODBC on Windows, and I have the permission of my former employer to publish this patch.
You may distribute under the terms of either the GNU General Public License or the Artistic License, as specified in the Perl README file.
patch -p1 < DBD-ODBC-1.13-Unicode.patch
perl Makefile.pl
, make
, make test
,
make install
)
You need to install SP1, SP2, and SP3 on both client and server, because DBD::ODBC needs a recent set of MDAC libraries.
You have to set the environment variables NLS_NCHAR=AL32UTF8
and
NLS_LANG=AMERICAN_AMERICA.AL32UTF8
(or any other language setting ending with
``.AL32UTF8
'') before loading the patched DBD::ODBC to make Oracle return
Unicode data. (See also ``Oracle and Unicode'' in the POD of DBD::Oracle.)
And you have to enable the ``Force SQL_WCHAR support
'' Workaround in the Oracle
ODBC Driver Configuration to make Oracle return Unicode to a non-Unicode
application. Alternatively, you can include ``FWC=T
'' in your connect string.
Better try to use DBD::Oracle to access Oracle with less overhead and better support for Oracle-specific stuff.
Quoting the DBD::Oracle man page: ``Oracle 8 client libraries have a number of bugs related to character set handling, especially when connected to an Oracle 9+ server. [...] If you wish to use Unicode, I recommend upgrading client and server to Oracle 9 or later.''
You may want to try the MSDE, it has the SQL Server engine, but with a lower connection limit and without GUI tools. There are several 3rd party GUIs for the MSDE.
Unicode tests fail because PostgreSQL seems not to give any hints about Unicode, so all data is treated as non-Unicode.
Better try to use DBD::Pg to access PostgreSQL with less overhead and better
support for PostgreSQL-specific stuff. DBD::Pg has a driver attribute named
pg_enable_utf8
, set it to 1 and you have proper Unicode support.
(Yes, this list should be longer.)
Perl 5.8.1 or newer is required. Older Perl before 5.8.0 lacked proper Unicode support. Perl 5.8.0 lacks some auxillary functions for Unicode.
Unicode is supported only for SQL statement parameters and data returned by the fetch methods, SQL statements are still treated as native encoding. If you need a unicode constant in an SQL statement, you have to pass it as parameter or use SQL functions to convert your constant from native encoding to Unicode.
All data passed to the patched DBD::ODBC for SQL_C_WCHAR
, SQL_WCHAR
,
SQL_WVARCHAR
, and SQL_WLONGVARCHAR
is treated as Unicode, even if it is
not Unicode. unicode_helper.c should check the UTF8 flag of the scalar and
pass a value different from CP_UTF8
as first argument to
WideCharToMultiByte()
. The problem is to know what encoding is used for the
data in the scalar.
Binding of unicode output parameters is untested (I don't need them) and likely to fail.
The patched DBD::ODBC may fail to compile on non-Win32 platforms. It needs a header file named wchar.h defining at least the following:
WCHAR
data type capable of storing a single Unicode character.typedef wchar_t WCHAR
in wchar.h, and typedef unsigned
short wchar_t;
in ctype.h.
WCHAR * wcscpy(WCHAR *, const WCHAR *)
strcpy()
for wide characters. Microsoft declares this function in both
string.h and wchar.h.
size_t wcslen(const WCHAR *)
strlen()
for wide characters, returns character count, not bytes. Microsoft
declares this function in both string.h and wchar.h.
WideCharToMultiByte()
and MultiByteToWideChar()
functionsCP_UTF8
argument tells the
function that the MultiByte string is in UTF-8 encoding.
This patch adds support for SQL_C_WCHAR
, SQL_WCHAR
, SQL_WVARCHAR
, and
SQL_WLONGVARCHAR
. Strings written to columns that are reported as
SQL_WCHAR
, SQL_WVARCHAR
, or SQL_WLONGVARCHAR
are automatically
converted to 16 bit Unicode using the Windows API function
WideCharToMultiByte()
, return values reported as SQL_C_WCHAR
,
SQL_WCHAR
, SQL_WVARCHAR
, or SQL_WLONGVARCHAR
are converted back to 8
bit Unicode (UTF-8) using the Windows API function MultiByteToWideChar()
and
have Perl's UTF8 flag set except for empty strings.
This patch adds two new tests, t/40UnicodeRoundTrip.t and t/41Unicode.t. Test 40 checks that Unicode strings can be entered as bound parameters and are returned unmodified. Test 41 creates a table, writes and reads unicode data using various bind variants.
When using Oracle, the empty string test in t/40UnicodeRoundTrip.t is skipped because Oracle converts empty strings to NULL in this situation.
I had to add SQL_WCHAR
, SQL_WVARCHAR
, SQL_WLONGVARCHAR
to
t/ODBCTEST.pm, because Oracle in the setup described above returns Unicode
more often than expected.
I added t/UChelp.pm, that exports two utility functions for unicode string tests:
dumpstr($)
utf_eq_ok($$$)
pass()
or
cmp_ok()
.