FAQ
As an FAQ you can easily navigate to the
sections of interest.
It is recommended to start with the NLS_LANG Parameter Fundamentals
section
first to get a basic understanding of how the NLS_LANG parameter works.
Checking
the current NLS_LANG
Setting
Parameter Fundamentals
A
locale is a set of information addressing linguistic and cultural
requirements
that corresponds to a given language and country. Traditionally, the
data
associated with a locale provides support for formatting and parsing of
dates,
times, numbers, and currencies, etc. Providing current and correct
locale data
has historically been the responsibility of each platform owner or
vendor,
leading to inconsistencies and errors in locale data.
Setting
the NLS_LANG
environment parameter is the simplest way
to specify locale behavior for Oracle software. It sets the language
and
territory used by the client application and the database server. It
also
indicates the client's character set, which corresponds to the
character set
for data to be entered or displayed by a client program.
NLS_LANG
is set as a local environment variable on
UNIX platforms. NLS_LANG
is set in the registry on Windows platforms.
The NLS_LANG
parameter has three components: language,
territory, and character set. Specify it in the following format,
including the
punctuation:
NLS_LANG
= language_territory.charset
Each
component of the NLS_LANG
parameter controls the operation of a
subset of globalization support features:
Specifies
conventions such as the language
used for Oracle messages, sorting, day names, and month names. Each
supported language
has a unique name; for example, AMERICAN
, FRENCH
, or GERMAN
. The language argument specifies default values
for the territory and character set arguments. If the language is not
specified, then the value defaults to AMERICAN
.
Specifies
conventions such as the default
date, monetary, and numeric formats. Each supported territory has a
unique
name; for example, AMERICA
, FRANCE
, or CANADA
. If the territory is not specified, then the
value is derived from the language value.
Specifies
the character set used by the
client application (normally the Oracle character set that corresponds
to the
user's terminal character set or the OS character set). Each supported
character set has a unique acronym, for example, US7ASCII
, WE8ISO8859P1
, WE8DEC
, WE8MSWIN1252
, or JA16EUC
. Each language has a default character
set associated with it.
Note:
All components
of the NLS_LANG
definition are optional; any item that is not specified uses its
default value. If you specify territory or character set, then you must
include the preceding delimiter [underscore (_) for territory, period
(.) for character set]. Otherwise, the value is parsed as a language
name.
For example,
to set only the territory portion of NLS_LANG ,
use the following format: NLS_LANG=_JAPAN
|
The remainder of
this document
will focus on the charset component of the NLS_LANG setting, as it is
the least
understood and most important piece to set correctly.
Top of the
Document
·
Setting
the NLS_LANG to the character set of the database MAY be correct but IS
often
not correct. DO NOT assume that NLS_LANG needs to be the same as the
database
character set. THIS IS OFTEN NOT TRUE.
·
The
character set defined with the NLS_LANG parameter does NOT CHANGE your
client's
character set. It is used to let Oracle know what character set you are
USING
on the client side, so Oracle can do the proper conversion. You cannot
change
the character set of your client by using a different NLS_LANG!
·
If
you don't set the NLS_LANG on the client it uses the NLS_LANG of the
server.
This is also NOT true! For
example, if the Oracle Installer does not populate NLS_LANG
, and it is not otherwise set then its
value by default is AMERICAN_AMERICA.US7ASCII
. The language is AMERICAN
, the territory is AMERICA
, and the character set is US7ASCII
.
·
Setting
the LANGUAGE and TERRITORY parameters of NLS_LANG has nothing to do
with the
ability to store characters in a database. A NLS_LANG set to
JAPANESE_JAPAN.WE8MSWIN1252 will not allow you to store Japanese, as
WE8MSWIN1252 doesn't support Japanese characters. However a NLS_LANG
set to
AMERICAN_AMERICA.JA16SJIS will allow you to store Japanese providing
the input
data is truly JA16SJIS and if the database is also in a character set
that can
store Japanese like UTF8 or JA16SJIS)
Top of the
Document
Checking
the current NLS_LANG Setting
In many cases the
NLS_LANG has
been already set during the Oracle install or thereafter manually. To
be sure
you can use these methods to get back the value of NLS_LANG for
SQL*Plus:
On UNIX:
SQL> HOST ECHO
$NLS_LANG
This returns the
value of the
parameter.
On Windows:
On Windows you have
two possible options,
normally the NLS_LANG is set in the registry, but it can also be set in
the
environment, however this is not often done. The value in the
environment takes
precedence over the value in the registry and is used for ALL
Oracle_Homes on
the server. Also note that any USER environment variable takes
precedence over
any SYSTEM environment variable (this is Windows behavior, and has
nothing to
do with Oracle) if set.
To check if it's set
in the
environment:
SQL> HOST ECHO %NLS_LANG%
If this
reports just
%NLS_LANG% back, the variable is not set in the environment.
If it's set it
reports something
like
ENGLISH_UNITED KINGDOM.WE8ISO8859P1
If
NLS_LANG is not
set in the environment, check the value in the registry:
SQL>@.[%NLS_LANG%].
If you get something
like:
Unable to open file.[ENGLISH_UNITED KINGDOM.WE8ISO8859P1].
The "file name"
between
the braces is the value of the registry parameter.
If you get this as
result:
Unable to open file
".[%NLS_LANG%]." then the parameter NLS_LANG is also not set in the
registry.
Note the @.[%NLS_LANG%].
technique reports the NLS_LANG
known by the SQL*Plus executable, it will not read the registry itself.
But if
you run the HOST command first and the NLS_LANG is not set in the
environment then
you can be sure the variable is set in the registry if the @.[%NLS_LANG%].
returns a valid value.
All other NLS
parameters can be
retrieved by a:
SELECT * FROM NLS_SESSION_PARAMETERS;
Note:
SELECT USERENV
('language') FROM DUAL; gives the session's
_ but the DATABASE character set not
the client, so the value returned is not the client's complete NLS_LANG
setting!
|
Top of the
Document
This
section explains the order in which NLS parameters are taken into
account in
the database client/server model. (This does NOT cover Thin JDBC
connections)
There
are 3 levels at which you can set NLS parameters: Database, Instance and
Session.
If a parameter is defined at more than one level then the rules on
which one
takes precedence are quite straightforward:
1.
NLS database settings are superseded by NLS instance settings
2.
NLS database & NLS instance settings are superseded by NLS session
settings
SELECT
* from NLS_SESSION_PARAMETERS;
These
are the settings used for the current SQL session.
These
reflect (in this order):
1)
The values of NLS parameters set by "ALTER SESSION "
ALTER
SESSION set NLS_DATE_FORMAT = 'DD/MM/YYYY';
2)
If there is no explicit "ALTER SESSION " statement done then it
reflects the setting of the corresponding NLS parameter on the client
derived
from the NLS_LANG variable.
3)
If NLS_LANG is specified with only the part then
AMERICAN is
used as default .
So
if you set NLS_LANG=_BELGIUM.
WE8MSWIN1252 then you get this:
PARAMETER
VALUE
------------------------------
--------------
NLS_LANGUAGE
AMERICAN
NLS_TERRITORY
BELGIUM
NLS_CURRENCY
NLS_ISO_CURRENCY
BELGIUM
....
Note:
The difference
between NLS_LANG=_BELGIUM.WE8MSWIN1252
(correct) and
NLS_LANG=BELGIUM.WE8MSWIN1252
(incorrect), you need to set the "_" as separator.
|
4)
If NLS_LANG is specified with only the part then the
defaults to a setting based on .
So
if you set NLS_LANG=ITALIAN_.WE8MSWIN1252 then you get this:
PARAMETER
VALUE
------------------------------
--------------
NLS_LANGUAGE
ITALIAN
NLS_TERRITORY
ITALY
NLS_CURRENCY
NLS_ISO_CURRENCY
ITALY
.....
Note:
Note the
difference between NLS_LANG=ITALIAN_.WE8MSWIN1252
(correct) and
NLS_LANG=ITALIAN.WE8MSWIN1252
(incorrect), you need to set the "_" as separator.
|
5)
If NLS_LANG is specified without the _
part
then the _ part defaults to
AMERICAN_AMERICA.
So
if you set NLS_LANG=.WE8MSWIN1252 then you get this:
PARAMETER
VALUE
------------------------------
----------
NLS_LANGUAGE
AMERICAN
NLS_TERRITORY
AMERICA
NLS_CURRENCY
$
NLS_ISO_CURRENCY
AMERICA
....
Note:
The difference
between NLS_LANG=.WE8MSWIN1252
(correct) and
NLS_LANG=WE8MSWIN1252
(incorrect), you need to set the "." as separator.
|
6)
If the NLS_LANG is set (either like in point 3, 4 or 5) then parameters
like
NLS_SORT,
NLS_DATE_FORMAT, etc. can be set as a "standalone" setting and will
overrule the defaults derived from NLS_LANG
_
part.
So
if you set NLS_LANG=AMERICAN_AMERICA.WE8MSWIN1252
and NLS_ISO_CURRENCY=FRANCE
then you get this:
PARAMETER
VALUE
------------------------------
-----------
NLS_LANGUAGE
AMERICAN
NLS_TERRITORY
AMERICA
NLS_CURRENCY
$
NLS_ISO_CURRENCY
FRANCE
...
Defaults:
---------
*
If NLS_DATE_LANGUAGE or NLS_SORT are not set then they are derived from
NLS_LANGUAGE.
*
If NLS_CURRENCY, NLS_DUAL_CURRENCY, NLS_ISO_CURRENCY, NLS_DATE_FORMAT,
NLS_TIMESTAMP_FORMAT, NLS_TIMESTAMP_TZ_FORMAT, NLS_NUMERIC_CHARACTERS
are not
set then they are derived from NLS_TERRITORY
7)
If the NLS_LANG is not set at all, then it defaults to
_.US7ASCII
and the values for the
_
part used are the ones found in
NLS_INSTANCE_PARAMETERS.
Parameters like NLS_SORT defined as "standalone" on the client side
are ignored.
Note:
* If set,
client parameters (NLS_SESSION_PARAMETERS) always take precedence over
NLS_INSTANCE_PARAMETERS and NLS_DATABASE_PARAMETERS.
* This
behavior cannot be disabled on/from the server, so a parameter set on
the client always has precedence above an instance or database
parameter.
* NLS_LANG
cannot be changed by ALTER SESSION, NLS_LANGUAGE and NLS_TERRITORY can.
However NLS_LANGUAGE and /or NLS_TERRITORY cannot be set as
"standalone" parameters in the environment or registry on the client.
*
NLS_SESSION_PARAMETERS is NOT visible for other sessions. If you need
to trace this then you have to use a logon trigger to create your own
logging table (based on session parameters)
* The
part of NLS_LANG is NOT shown in any
system table or view.
* On Windows
you have two possible options, normally the NLS_LANG is set in the
registry, but it can also be set in the environment, however this is
not often done and generally not recommended to do so. The value in the
environment takes precedence over the value in the registry and is used
for ALL Oracle_Homes on the server if defined as a system environment
variable.
* NLS_LANGUAGE
in the session parameters also declares the language for the client
error messages.
* You cannot
"set" a NLS parameter in an SQL script; you need to use ALTER SESSION.
|
Top of the
Document
SELECT
* from NLS_INSTANCE_PARAMETERS;
These
are the settings in the init.ora of the database at the moment that the
database
was started or set through ALTER SYSTEM.
If
the parameter is not explicitly set in the init.ora or defined by ALTER
SYSTEM
then its value is NOT derived from a "higher" parameter (we are
talking about parameters like NLS_SORT that derive a default from
NLS_LANGUAGE
in NLS_SESSION_PARAMETERS, this is NOT the case for
NLS_INSTANCE_PARAMETERS)
Note:
* NLS_LANG is
not an init.ora parameter; NLS_LANGUAGE and NLS_TERRITORY are so you
need to set NLS_LANGUAGE and NLS_TERRITORY separately.
* You cannot
define the or NLS_LANG in the init.ora
The client
characterset is defined by the NLS_LANG on the client OS (see above).
* You cannot
define the database characterset in the init.ora. The database
characterset is defined by the "Create Database" command.
* These
settings take precedence above the NLS_DATABASE_PARAMETERS.
* These values
are used for the NLS_SESSION_PARAMETERS if the client the
NLS_LANG is
NOT set.
* Oracle
strongly recommends that you set the NLS_LANG on the client at least to
NLS_LANG=.
* The
NLS_LANGUAGE in the instance parameters also declares the language for
the server error messages in alert.log and in trace files.
|
Top of the
Document
SELECT
* from NLS_DATABASE_PARAMETERS;
Defaults
to AMERICAN_AMERICA if there are no parameters explicitly set in the
init.ora
during database creation time. If there is parameters set in the
init.ora
during database creation you see them here. There is no way to change
these
after the database creation. Do NOT attempt to update system tables to
bypass
these settings! These settings are used to give the database a default
if the
INSTANCE and SESSION parameters are not set.
Note:
* NLS_LANG is
not an init.ora parameter, NLS_LANGUAGE and NLS_TERRITORY are.
So you need to
set NLS_LANGUAGE and NLS_TERRITORY separately.
* These
parameters are overridden by NLS_INSTANCE_PARAMETERS and
NLS_SESSION_PARAMETERS.
* You cannot
define the or NLS_LANG in the init.ora.
The client character set is defined by the NLS_LANG on the client OS.
* You cannot
define the database character set in the init.ora.
The database
(national) character set NLS_(NCHAR)_CHARACTERSET) is defined by the
"Create Database" command.
* The
NLS_CHARACTERSET and NLS_NCHAR_CHARACTERSET parameters cannot be
overridden by instance or session parameters.
They are
defined by the value specified in the "CREATE DATABASE command and are
not intended to be changed afterwards dynamically. Do NOT update system
tables to change the character set. This can corrupt your database and
potentially make it impossible to open the database again.
* Setting the
NLS_LANG during the creation of the database does not influence the
NLS_DATABASE_PARAMETERS.
* The NLS_LANG
set during the database creation has NO impact on the database National
Characterset.
|
Additional
SELECT statements:
A)
SELECT name,value$ from sys.props$ where name like '%NLS%';
This
gives the same info as NLS_DATABASE_PARAMETERS.
You
should use NLS_DATABASE_PARAMETERS instead of props$.
Note
the UPPERCASE '%NLS%'
B)
SELECT * from v$nls_parameters;
This
view shows the current session parameters and the *DATABASE*
characterset as
seen in the NLS_DATABASE_PARAMETERS view.
C)
SELECT name,value from v$parameter where name like '%NLS%';
This
view gives the same information as NLS_INSTANCE_PARAMETERS.
Note
the LOWERCASE '%NLS%'
D)
SELECT userenv ('language') from dual;
and
SELECT
sys_context('userenv','language') from dual;
Both
these SELECT statements give the session's
_
and the
DATABASE
character set. The database character set is not the same as the
character set of
the NLS_LANG that you started this connection with! So don't be fooled,
although the output of this query looks like the value of a NLS_LANG
variable,
it is NOT.
E)
SELECT userenv ('lang')
from dual;
This
SELECT gives the short code that Oracle uses for the Language defined
by
NLS_LANGUAGE setting for this session. If NLS_LANGUAGE is set to French
then
this will return "F", if NLS_LANGUAGE is set to English then this
will return "GB"
If
NLS_LANGUAGE is set to American then this will return "US", and so on...
F)
SHOW parameter NLS%
This
will give the same as the NLS_INSTANCE_PARAMETERS
Top of the
Document
A
database is created on a UNIX system with the US7ASCII character set. A
Windows
client connecting to the database works with the WE8MSWIN1252 character
set
(regional settings -> Western Europe /ACP 1252) and the DBA, use the
UNIX
shell (ROMAN8) to work on the database. The NLS_LANG is set to
american_america.US7ASCII on the clients and the server.
Note:
This is an
INCORRECT setup to explain character set conversion, don't use it in
your environment!
|
A
very important point (as mentioned before):
When
the client NLS_LANG character set is set to the same value as the
database
character set, Oracle assumes that the data being sent or received are
of the
same (correct) encoding, so no conversions or validations may occur for
performance reasons. The data is just stored as delivered by the
client, bit by
bit.
From Windows insert an ‘é’
(LATIN SMALL LETTER E WITH ACUTE) into a table NLS_TEST containing one
column
‘TEST’ of the type 'char'.
As
long as you insert into and select from the column on Windows with the
WE8MSWIN1252 character set everything runs smoothly. No conversion is
done and
8 bits are inserted and read back, even if the character set of the
database is
defined as 7 bits. This happens because a byte is 8 bits and Oracle is
ALWAYS
using 8 bits even with a 7 bit character set. In a correct setup the
most
Significant Bit is just not used and only 7 bits are taken into
account.
For
one reason or another you need to insert from the UNIX server. When you
SELECT
from tables where data is inserted by the Windows clients you get a ‘Ò’
(LATIN CAPITAL LETTER O WITH TILDE) instead of the ‘é’.
If
you insert ‘é’ on the UNIX server and you
SELECT the row at the Windows client you get an ‘Å’
(LATIN CAPITAL LETTER A WITH RING ABOVE) back.
Bottom
line is that you have INCORRECT data in the database. You store the
numeric
value for ‘é’ of the WE8MSWIN1252 character
set in the database but you tell Oracle this is US7ASCII data, so
Oracle is NOT
converting anything and just stores the numeric value (again: Oracle
thinks
that the client is giving US7ASCII codes because the NLS_LANG is set to
US7ASCII, and the database character set is also US7ASCII -> no
conversion
done).
When you SELECT the same column back on
the
UNIX server, Oracle is again expecting that the value is correct and
passes the
value to the UNIX terminal without any conversion.
Now
the problem is that in the WE8MSWIN1252 character set the ‘é’
has the hexadecimal value 'E9’and in the Roman8 character set the
hexadecimal
value for ‘é’ is 'C5'. Oracle just passes the value stored in the
database ('E9') to the UNIX terminal, and the UNIX terminal thinks this
is the
letter ‘?’ because in its (Roman8) character set the hexadecimal value
'E9' is
representing the letter ‘Ò’. So instead of the ‘é’
you get ‘Ò’ on the
UNIX terminal screen.
The inverse (the insert on the UNIX and
the
SELECT on the Windows client) is the same story, but you get other
results.
The solution is creating the database
with a
character set that contains ‘é’
(WE8MSWIN1252,
WE8ISO89859P1, UTF-8, etc.) and setting the NLS_LANG on the client to
WE8MSWIN1252 and on the server to WE8ROMAN8. If you then insert an ‘é’
on both sides, you will get an ‘é’ back regardless of where you
SELECT them. Oracle knows then that a hexadecimal value of 'C5’
inserted by the
UNIX and an 'E9’ from a WE8MSWIN1252 client are both ‘é’
and inserts ‘é’ into the database (the code in
the database depends on the character set you have chosen).
You don't have to switch between UNIX,
Windows or other OS clients to run into this kind of problem. The
same problem appears if you add Windows
clients that are using another character set and have an incorrect
NLS_LANG
set.
Top
of the Document
Properly for UNIX
To
specify the locale behavior of your client Oracle software, you have to
set
your NLS_LANG parameter. It sets the language, territory and also the
character
set of your client. You need to check the locale environment settings
to set
your NLS_LANG 3rd field (character set) in accordance with it. To do
this, use
the "locale" command like this:
$ locale
Example
of output:
LANG=fr_FR
LC_CTYPE="fr_FR.iso885915@euro"
LC_COLLATE="fr_FR.iso885915@euro"
LC_MONETARY="fr_FR.iso885915@euro"
LC_NUMERIC="fr_FR.iso885915@euro"
LC_TIME="fr_FR.iso885915@euro"
LC_MESSAGES="fr_FR.iso885915@euro"
LC_ALL=fr_FR.iso885915@euro
|
The output of this command is not exactly the same on all the Unix
environments.
On some platforms, it can be useful to use the following syntax to have
more
details about the codepage really used:
$ locale
LC_CTYPE | head
Example of output in a HP-UX environment:
""
""
"iso885915"
""
Example of output in a Linux environment:
upper;lower;alpha;digit;xdigit;space;print;graph;blank;cntrl;
punct;alnum;
combining;combining_level3
toupper;tolower;totitle
16
1
ISO-8859-15
70
84
1
0
1
|
In
these cases, the NLS_LANG 3rd field should be set to WE8ISO8859P15. On
Solaris,
AIX, TRU64, this syntax doesn't give interesting complementary
information. To
find more details about these settings:
On Solaris, look in /usr/lib/locale
On AIX, look in /usr/lib/nls/README
On TRU64, look in /usr/lib/nls
On HP-UX, look in /usr/lib/nls/config
On Linux, look in /usr/share/locale/locale.alias
To
set a chosen value for these "locale" settings, it's needed to know
which values are available. To know that, use the following syntax:
$ locale -a
Then, when you have chosen a value, for example UTF-8 on Linux, you can
set it
like this:
$ export
LC_ALL=UTF-8
or
% setenv
LC_ALL UTF-8
Example
of output after the setenv:
$ locale
LANG=fr_FR
LC_CTYPE="UTF-8"
LC_NUMERIC="UTF-8"
LC_TIME="UTF-8"
LC_COLLATE="UTF-8"
LC_MONETARY="UTF-8"
LC_MESSAGES="UTF-8"
LC_PAPER="UTF-8"
LC_NAME="UTF-8"
LC_ADDRESS="UTF-8"
LC_TELEPHONE="UTF-8"
LC_MEASUREMENT="UTF-8"
LC_IDENTIFICATION="UTF-8"
LC_ALL=UTF-8
$ locale LC_CTYPE | head
upper;lower;alpha;digit;xdigit;space;print;
graph;blank;cntrl;punct;alnum;combining;combining_level3
toupper;tolower;totitle
16
6
UTF-8
70
84
1
0
1
|
In
this case, the 3rd field (character set) of NLS_LANG should be set to
UTF8.
%
setenv NLS_LANG American_America.UTF8
Top of the
Document
On Windows systems,
the encoding scheme
(character set) is specified by a code page. Code pages are defined to
support
specific languages or groups of languages, which share common writing
systems.
From Oracle point of view the terms code page and character set mean
the same.
Note that in non Chinese-Japanese-Korean environments, the Windows GUI
and DOS
command prompt do not use the same code page.
As a result Windows
uses 2 different
character sets for the ANSI (sqlplusw.exe) and the OEM (dos box -
sqlplus.exe)
environments.
On
Windows systems, you should make sure that you have set an NLS_LANG
registry
subkey for each of your Oracle Homes:
You
can easily modify this subkey with the Windows Registry Editor:
Start
-> Run...
Type
"regedit", and click "ok"
Edit
the following registry entry:
For
Oracle version 7:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
For
Oracle Database versions 8, 8i and 9i:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx\
where
"x" is the unique number identifying the Oracle home.
HOME0
is the first installation
For
Oracle Database 10g:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\KEY_
There
you have an entry with name NLS_LANG
When
starting an Oracle tools, like SQL*Plusw, it will read the content of
the
oracle.key file located in the same directory to determine which
registry tree
will be used, therefore which NLS_LANG subkey will be used.
Note:
Some people
are confused by finding a NLS_LANG set to "NA" in
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE when no version 7 was installed.
This is used for backwards compatibility, and can be ignored.
|
Although
the Registry is the primary repository for settings on Windows, it is
not the
only place where parameters can be set. Even if not at all recommended,
you can
set the NLS_LANG as a System or User Environment Variable in the System
properties.
This
setting will be used for ALL Oracle homes.
To
check and modify them:
Right-click
the 'My Computericon -> 'Properties'
Select
the 'Advanced Tab -> Click on 'Environment Variables'
The
'User Variables list contains the settings for the specific OS user
currently
logged on and the 'System variables system-wide variables for all users.
Since
these environment variables take precedence over the parameters already
set in
your Registry, you should not set Oracle parameters at this location
unless you
have a very good reason.
Before
using an Oracle command line tool you need to MANUALLY SET the NLS_LANG
parameter. In an MS-DOS command prompt, use the set command, for
example:
C:\>
set NLS_LANG=american_america.WE8PC850
Top of the
Document
Now
that you know what the NLS_LANG is currently set to you can check to
see if it
properly agrees with the current ANSI code page. The ACP (ANSI Code
Page) is
defined by the "default locale" setting of Windows, so if you have a
UK Windows 2000 client and you want to input Cyrillic (Russian) you
need to
change the ACP (by changing the "default locale") in order to be able
to input Russian.
You'll
find its value in the registry:
Start
-> Run...
Type
"regedit", and click "ok"
Browse
the following registry entry:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\NLS\CodePage\
There
you have (all the way down) an entry with as name ACP. The value of ACP
is your
current GUI Codepage, for the mapping to the Oracle name. Since there
are many
registry entries with very similar names, please make sure that you are
looking
at the right place in the registry.
Additionally,
the following URL provides a list of the default code pages for all
Windows
versions:
http://www.microsoft.com/globaldev/reference/
(under the REFERENCE tab on the left of the page)
OEM
= the command line codepage, ANSI = the GUI codepage
Note
that the Honk Kong HKSCS is listed here:
Find
the correspondent Oracle client character set:
Find
the Oracle client character set in the table below based on the ACP you
found
above. Note that there is only ONE CORRECT value for a given ACP.
ANSI
CodePage (ACP)
|
Oracle
Client character set (3rd part of NLS_LANG)
|
1250
|
EE8MSWIN1250
|
1251
|
CL8MSWIN1251
|
1252
|
WE8MSWIN1252
|
1253
|
EL8MSWIN1253
|
1254
|
TR8MSWIN1254
|
1255
|
IW8MSWIN1255
|
1256
|
AR8MSWIN1256
|
1257
|
BLT8MSWIN1257
|
1258
|
VN8MSWIN1258
|
874
|
TH8TISASCII
|
932
|
JA16SJIS
|
936
|
ZHS16GBK
|
949
|
KO16MSWIN949
|
950
|
ZHT16MSWIN950
- except for Hong Kong (see below)
|
This
is the character set used by the GUI SQL*Plus (sqlplusW.exe/
plus80W.exe / plus33W.exe
) that you start through the Windows start menu. Please note the
difference
between the GUI SQL*Plus and the "DOS mode" SQL*Plus.
You
can use UTF8 as Oracle client character set (=NLS_LANG) on Windows NT,
2000 and
XP but you will be limited to use only client programs that explicitly
support
this configuration. This is because the user interface of Win32 is not
UTF8,
therefore the client program have to perform explicit conversions
between UTF8
(used on Oracle side) and UTF16 (used on Win32 side).
Set
it in your Registry:
Use
the Windows Registry Editor to set up the NLS_LANG in your Oracle Home
with the
value you have just found above.
Top of the
Document
MS-DOS
mode uses, with a few exceptions like CJK, (Japanese, Korean,
Simplified
Chinese, and Traditional Chinese) a different code page (called OEM
code page)
than Windows GUI (ANSI code page). Meaning that before using an Oracle
command
line tool such as SQL*Plus (sqlplus.exe/ plus80.exe / plus33.exe ) en
svrmgrl
in a command prompt then you need to MANUALLY SET the NLS_LANG
parameter as an
environment variable with the set DOS command BEFORE using the tool.
For
Japanese, Korean, Simplified Chinese, and Traditional Chinese, the
MS-DOS OEM
code page (CJK) is identical to the ANSI code page meaning that, in
this
particular case, there is no need to set the NLS_LANG parameter in
MS-DOS mode.
In
all other cases, you need to set it in order to overwrite the NLS_LANG
registry
key already matching the ANSI code page. The new "MS-DOS dedicated"
NLS_LANG needs to match the MS-DOS OEM code page that could be
retrieved by
typing chcp in a Command Prompt:
C:\>
chcp
Active
code page: 437
C:\>
set NLS_LANG=american_america.US8PC437
If
the NLS_LANG parameter for the MS-DOS mode session is not set
appropriately,
error messages and data can be corrupted due to incorrect character set
conversion.
Use
the following list to find the Oracle character set that fits to your
MS-DOS
code page in use on your locale system:
MS-DOS
codepage
|
Oracle
Client character set (3rd part of NLS_LANG)
|
437
|
US8PC437
|
737
|
EL8PC737
|
850
|
WE8PC850
|
852
|
EE8PC852
|
857
|
TR8PC857
|
858
|
WE8PC858
|
861
|
IS8PC861
|
862
|
IW8PC1507
|
865
|
N8PC865
|
866
|
RU8PC866
|
Top of the
Document
Note:
this is the correct setting for the GUI SQL*Plus version,
(sqlplusW.exe/
plus80W.exe / plus33W.exe )
if
you are testing with "special" characters please DO use the GUI and
not the "DOS box" sqlplus.exe !
Operating
System Locale
|
NLS_LANG
Value
|
Arabic
(U.A.E.)
|
ARABIC_UNITED
ARAB EMIRATES.AR8MSWIN1256
|
Bulgarian
|
BULGARIAN_BULGARIA.CL8MSWIN1251
|
Catalan
|
CATALAN_CATALONIA.WE8MSWIN1252
|
Chinese (PRC)
|
SIMPLIFIED
CHINESE_CHINA.ZHS16GBK
|
Chinese
(Taiwan)
|
TRADITIONAL
CHINESE_TAIWAN.ZHT16MSWIN950
|
Chinese (Hong
Kong HKCS)
|
TRADITIONAL
CHINESE_HONG KONG.ZHT16HKSCS
|
Chinese (Hong
Kong HKCS2001)
|
TRADITIONAL
CHINESE_HONG KONG.ZHT16HKSCS2001 (new in 10gR1)
|
Croatian
|
CROATIAN_CROATIA.EE8MSWIN1250
|
Czech
|
CZECH_CZECH
REPUBLIC.EE8MSWIN1250
|
Danish
|
DANISH_DENMARK.WE8MSWIN1252
|
Dutch
(Netherlands)
|
DUTCH_THE
NETHERLANDS.WE8MSWIN1252
|
Dutch
(Belgium)
|
DUTCH_BELGIUM.WE8MSWIN1252
|
English
(United Kingdom)
|
ENGLISH_UNITED
KINGDOM.WE8MSWIN1252
|
English
(United States)
|
AMERICAN_AMERICA.WE8MSWIN1252
|
Estonian
|
ESTONIAN_ESTONIA.BLT8MSWIN1257
|
Finnish
|
FINNISH_FINLAND.WE8MSWIN1252
|
French
(Canada)
|
CANADIAN
FRENCH_CANADA.WE8MSWIN1252
|
French (France)
|
FRENCH_FRANCE.WE8MSWIN1252
|
German (Germany)
|
GERMAN_GERMANY.WE8MSWIN1252
|
Greek
|
GREEK_GREECE.EL8MSWIN1253
|
Hebrew
|
HEBREW_ISRAEL.IW8MSWIN1255
|
Hungarian
|
HUNGARIAN_HUNGARY.EE8MSWIN1250
|
Icelandic
|
ICELANDIC_ICELAND.WE8MSWIN1252
|
Indonesian
|
INDONESIAN_INDONESIA.WE8MSWIN1252
|
Italian (Italy)
|
ITALIAN_ITALY.WE8MSWIN1252
|
Japanese
|
JAPANESE_JAPAN.JA16SJIS
|
Korean
|
KOREAN_KOREA.KO16MSWIN949
|
Latvian
|
LATVIAN_LATVIA.BLT8MSWIN1257
|
Lithuanian
|
LITHUANIAN_LITHUANIA.BLT8MSWIN1257
|
Norwegian
|
NORWEGIAN_NORWAY.WE8MSWIN1252
|
Polish
|
POLISH_POLAND.EE8MSWIN1250
|
Portuguese (Brazil)
|
BRAZILIAN PORTUGUESE_BRAZIL.WE8MSWIN1252
|
Portuguese (Portugal)
|
PORTUGUESE_PORTUGAL.WE8MSWIN1252
|
Romanian
|
ROMANIAN_ROMANIA.EE8MSWIN1250
|
Russian
|
RUSSIAN_CIS.CL8MSWIN1251
|
Slovak
|
SLOVAK_SLOVAKIA.EE8MSWIN1250
|
Spanish (Spain)
|
SPANISH_SPAIN.WE8MSWIN1252
|
Swedish
|
SWEDISH_SWEDEN.WE8MSWIN1252
|
Thai
|
THAI_THAILAND.TH8TISASCII
|
Spanish (Mexico)
|
MEXICAN SPANISH_MEXICO.WE8MSWIN1252
|
Spanish (Venezuela)
|
LATIN AMERICAN
SPANISH_VENEZUELA.WE8MSWIN1252
|
Turkish
|
TURKISH_TURKEY.TR8MSWIN1254
|
Ukrainian
|
UKRAINIAN_UKRAINE.CL8MSWIN1251
|
Vietnamese
|
VIETNAMESE_VIETNAM.VN8MSWIN1258
|
Top of the Document
List of common NLS_LANG
settings used in the
Command Prompt (DOS box)
Note:
this is the correct setting for the DOS BOX SQL*Plus version,
(sqlplus.exe/
plus80.exe / plus33.exe )
Operating
System Locale
|
Oracle
Client character set (3rd part of NLS_LANG)
|
Arabic
|
AR8ASMO8X
|
Catalan
|
WE8PC850
|
Chinese (PRC)
|
ZHS16GBK
|
Chinese
(Taiwan)
|
ZHT16MSWIN950
|
Czech
|
EE8PC852
|
Danish
|
WE8PC850
|
Dutch
|
WE8PC850
|
English
(United Kingdom)
|
WE8PC850
|
English
(United States)
|
US8PC437
|
Finnish
|
WE8PC850
|
French
|
WE8PC850
|
German
|
WE8PC850
|
Greek
|
EL8PC737
|
Hebrew
|
IW8PC1507
|
Hungarian
|
EE8PC852
|
Italian
|
WE8PC850
|
Japanese
|
JA16SJIS
|
Korean
|
KO16MSWIN949
|
Norwegian
|
WE8PC850
|
Polish
|
EE8PC852
|
Portuguese
|
WE8PC850
|
Romanian
|
EE8PC852
|
Russian
|
RU8PC866
|
Slovak
|
EE8PC852
|
Slovenian
|
EE8PC852
|
Spanish
|
WE8PC850
|
Swedish
|
WE8PC850
|
Turkish
|
TR8PC857
|
Top of the
Document
The
language component of the NLS_LANG
parameter controls the operation of a subset of globalization support
features.
It specifies conventions such as the language used for Oracle messages,
sorting, day names, and month names. Each supported language has a
unique name;
for example, AMERICAN
,
FRENCH
,
or GERMAN
.
The language argument specifies default values for the territory and
character
set arguments. If the language is not specified, then the value
defaults to AMERICAN
.
The
territory component of the NLS_LANG
parameter controls the operation of a subset of globalization support
features.
It specifies conventions such as the default date, monetary, and
numeric
formats. Each supported territory has a unique name; for example, AMERICA
,
FRANCE
,
or CANADA
.
If the territory is not specified, then the value is derived from the
language
value.
Top
of the Document
To find the real
numeric value
for a character stored in the database use the dump command:
The
syntax of the function call is:
DUMP(
[, [, [, ] ]
] )
where:
value
- is the value to be displayed
format
- is a number which describes the format in which bytes of the value
are to be
displayed: 8 - means octal, 10 - means decimal, 16 - means hexadecimal;
other
values between 0 and 16 mean decimal; values greater then 16 are a
little
confusing and mean: print bytes as ASCII characters if they correspond
to
printable ASCII codes, print them as "^x" if they correspond to ASCII
control codes and print them in hexadecimal otherwise; adding 1000 to
the
format number will add character set information for the character data
type
values to the return value offset - is the offset of the first byte of
the
value to display; negative values mean counting from the end length -
is the
number of bytes to display. So for example,
SQL>
SELECT DUMP(col,1016)FROM table;
Typ=1
Len=39 CharacterSet=UTF8: 227,131,143,227,131,170
returns
the value of a column consisting of 3 Japanese characters in UTF8
encoding .
For example the 1st char is 227(*255)+131. You will probably need to
convert this
to UCS2 to verify the codepoint value with the Unicode Standard
codepage.
Normally conversion
is done at
client side for performance reasons. This is true from Version 8.0.4
onwards.
If the database is using a character set not known by the client then
the
conversion is done at server side. This is true from Version 8.1.6
onwards.
?
You see black
squares instead of the
characters you probably don’t have the right font defined for your
codepage. A
font is a collection of glyphs (from "hieroglyphs") that share common
appearance (typeface, character size). A font is used by the operating
system
to convert a numeric value into a graphical representation on screen. A
font
does not necessarily contain a graphical representation for all numeric
values
defined in the code page you are using. That's why you sometimes get
black
squares on the screen if you change fonts and the new font has no
representation for a certain symbol.
The
Windows "Character Set Map" utility can be used to see which glyphs
are parts of a certain font.
On
Windows 2000 and XP:
Start
-> Run...
Type
"charmap", and click "ok".
When
characters are converted between the client and the database character
set, or
vice versa, the character should exist in both. If it does not exist in
the
character set being converted to (the destination) then a replacement
character
is used. Some character sets have specific replacement characters
defined when
translating from other specific character sets but where this is not
done a
default replacement character, such as a ?, is used. Conversion from a
replacement character back to the original character is not possible.
On Windows OS, yes,
On Unix OS's, no. All the database utilties, including Import, Export,
SQL*Loader, SQL*Plus, can act as a UTF-8 client if the OS locale is
UTF-8 (e.g., en_US.UTF-8 on Linux) and NLS_LANG character set is set to
UTF8 or AL32UTF8.
How to check the code
points
managed by a UNIX Operating System?
To know which code point is generated for a character in a Unix
Environment,
you can use the "od" command:
$ echo "" |
od -xc
You can also check the character corresponding to a code point using
the
"echo" command like this:
for Solaris, AIX, HP-UX, TRU64:
$echo '\0351'
for Linux:
$echo -e '\0351'
You can use Locale Builder or a printed code page (see links below) to
verify
that your native code page and NLS_LANG setting properly correspond.
If
there is any ambiguity then use the command above to get the values for
more
than one character. For printed code page:
Top of the
Document
Typically
the NLS_LANG needs to match the MS-DOS OEM code page that could be
retrieved by
typing chcp in a Command Prompt:
C:\>
chcp
Active
code page: 437
C:\>
set NLS_LANG=american_america.US8PC437
For
tools like SQL*Loader you can temporarily change the NLS_LANG to the
character
set of the FILE you are loading. An alternative to changing NLS_LANG is
to
specify the character set of the data in the datafile using the
characterset
keyword in the .ctl file. In that case, SQL*Loader will interpret the
data in
the datafile as that character set regardless of the client character
set
setting of NLS_LANG. Here is an example .ctl file for utf16. This
example ships
in the demo area:
--
Copyright (c) 2001 by Oracle Corporation
-- NAME
-- ulcase11.ctl - Load Data in the Unicode
Character
Set UTF-16
-- DESCRIPTION
-- Loads data in the Unicode character set
UTF-16. The
data is in little
-- Endean byte order. This means that depending
on
whether SQL*Loader is
-- running on a little Endean or a big Endean
system,
it will have to
-- byte swap the UTF-16 character data as
necessary.
This load uses
-- character length semantics, the default for
the
character set UTF-16.
--
-- This case study is modeled after case study
3
(ulcase3), which loads
-- variable length delimited (terminated and
enclosed)
data.
--
-- RETURNS
--
-- NOTES
-- None
-- MODIFIED (MM/DD/YY)
-- rpfau 02/06/01 -
Merged
rpfau_sqlldr_add_case_study_11
-- rpfau 01/30/01 -
Creation
--
LOAD
DATA
CHARACTERSET utf16
BYTEORDER little
INFILE ulcase11.dat
REPLACE
INTO
TABLE EMP
FIELDS TERMINATED BY X'002c' OPTIONALLY ENCLOSED BY X'0022'
(empno integer external (5), ename, job, mgr,
hiredate DATE(20) "DD-Month-YYYY",
sal, comm,
deptno CHAR(5) TERMINATED BY ":",
projno,
loadseq SEQUENCE(MAX,1) )
In
Oracle9i the Export utility always exports user data, including
Unicode
data, in the character set of the database. The Import utility
automatically
converts the data to the character set of the target database.
In
Oracle8i the Export utility exports user data converting them
from the
database character set to the character set of the NLS_LANG of the
Export
session. The Import utility first converts the data to the character
set of the
NLS_LANG of the Import session and then converts them to the character
set of
the target database. Care must be taken that the character set of the
NLS_LANG
for Export and Import sessions contain all characters to be migrated.
This
character set is usually chosen to be either the source database or the
target
database character set and it is usually the same for both Export and
Import
sessions. This choice is recommended especially with multibyte
character sets,
which pose some restrictions on export files. The
Oracle8i
conversions to and from the NLS_LANG character set happen in Oracle9i
for DDL statements contained in the Export file.
Top of the Document
The NLS_LANG on the
server (or
client) has no influence on character set conversion through a database
link.
Oracle will do the conversion from the character set of the source
database to
the character set of the target database (or reverse).
There is nothing
special with
NLS_LANG and the multiple homes on Windows. The parameter taken into
account is
the one specified in the ORACLE_HOME registry key used by the
executable. If
the NLS_LANG is set in the environment, it takes precedence over the
value in
the registry and is used for ALL Oracle_Homes on the server/client.
The NLS_LANG can be
found in
these registry keys:
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE
or
HKEY_LOCAL_MACHINE\SOFTWARE\ORACLE\HOMEx
On
Windows there are two kinds of tools / applications:
1)
A fully Unicode enabled applications which accepts Unicode codepoints
and which
can render them. It's the application that needs to deal with the
Unicode.
Windows provides the Unicode API but the GUI system itself is NOT
Unicode
"by nature".
A
fully Unicode application can only show one glyph for a given Unicode
code
point. So there is NO confusion possible here, this application will
need to
use a full Unicode font. If you have a full Unicode application, then
you need
to set the NLS_LANG to UTF8.
Note
that there are currently not many applications like this and if not
explicitly
stated by the vendor it's most likely an ANSI application. So don’t set
the
NLS_LANG to UTF8 if you are not sure!
The
only Unicode capable client that is included in the Oracle database is
iSQL*Plus.
2)
Standard ANSI application (like sqlplusw.exe) cannot use Unicode code
points.
So the Unicode code point stored in the database needs to be CONVERTED
to an
ANSI code point based on the correct setting of the NLS_LANG. This
allows
Oracle to map the Unicode code point to the character set of the
client. If the
Unicode code point does not have a mapping to the character set of the
client
then a replacement character is used.
A character set is just an
agreement on what numeric value a symbol has. A computer does not know
‘A’ or
‘B ', it only knows the (binary) numeric value for that symbol, defined
in the
character set used by its Operating System (OS) or in hardware
(firmware) for
terminals. A computer can only manipulate numbers, which is why there
is a need
for character sets. An example is 'ASCII', an old 7 bit character set,
'ROMAN8’ a 8 bit character set on UNIX or 'UTF8’ a multibyte character
set.
A code page is the name for the
Windows/DOS encoding schemes, for Oracle NLS you can consider it the
same as a
character set. You also have to distinguish between a FONT and a
character
set/codepage. A font is used by the OS to convert a numeric value into
a
graphical 'print’ on screen. The
Wingdings Font on Windows is the best example of a font where an ‘A’ is
NOT
shown as an ‘A’ on screen, but for the OS the numeric value represents
an ‘A‘.
So you don't SEE it as an ‘A’, but for Windows it's an ‘A’ and will be
saved
(or used) as an ‘A’.
To better understand the
explanation above, just open MS Word, choose the Wingdings Font, type
your name
(you will see symbols) and save this as html, if you open the html file
with
Notepad you will see that in the