分类: Oracle
2006-11-07 16:25:31
Exp/imp中字符集转换问题
Johnson chen
1.在oracle9i及其以后版本中,对待导入/导出时的字符转换有不同的方式:(根据metalink文档Doc ID: Note:15095.1写)
9i之前的版本:在源数据库的字符集和export的session的NLS_LANG设置不同时,所有数据的字符集(用户数据和字典数据)均会转换;
在import过程中,如果import session的NLS_LANG和export时不一致,将会将dmp中的字符集转换成import session的NLS_LANG设置成的字符集;
当import session的NLS_LANG和目标数据库的字符集不一致时,将会发生公import session的NLS_LANG字符集到目标数据库的字符集转换
9i及之后的版本:在源数据库的字符集和export的session的NLS_LANG设置不同时,只有字典数据会发生字符集转换,用户数据则和源数据库的字符集一致,而忽略NLS_LANG的设置;
在import过程中,如果import session的NLS_LANG和export时不一致,将会将dmp中的字符集转换成import session的NLS_LANG设置成的字符集;
当import session的NLS_LANG和目标数据库的字符集不一致时,将会发生公import session的NLS_LANG字符集到目标数据库的字符集转换
注: 1)如果没有显示的设置NLS_LANG,系统的缺省为US7ASCII。
2)所有的数据转换都有导致数据丢失的可能(丢失发生在转化目标字符集不是源字符集的超集时,针对多字节字符而言),为了减少数据丢失的可能,在进行导入/导出过程中尽量减少字符集的转化,原则是在导出时将NLS_LANG设置成和源数据库一致(v$nls_parameters. NLS_CHARACTERSET),在导入时将NLS_LANG设置成和export时一致,这样就最多发生一次字符转换。
3)NCLOB的导入/导出使用UCS-2/AL16UTF16, NCHAR/NVARCHAR2字符在导入/导出时使用数据库的国家字符集设置;这两类会忽略NLS_LANG的设置
2. 对于dmp 文件,其export时使用的字符集信息位于文件的第二、三个字节中,在windows环境中可以用UltraEdit的十六进制方式打开dmp文件,其第二、三个自己反应了字符集信息
表示该dmp文件是以0354的字符集形式导出的,在数据库中可以用以下命令来获得字符集名称:
SQL> select nls_charset_name(to_number('0354','xxxx')) from dual;
ZHS16GBK
在unix环境中,我们可以用以下命令来获得dmp文件的字符集信息:
c3sun04:(71)/export/home/ccz> cat exp.dmp |od -x|head -1|awk '{print $2 $3}'|cut -c 3-6
0369
同样用命令来获得其字符集名称:
SQL> select nls_charset_name(to_number('0369','xxxx')) from dual;
AL32UTF8
3.字符集的子集/超集关系(Subset/Superset Pairs)Doc ID: Note:119164.1
当字符集转换发生时,如果目标字符集是当前字符集的超集时不会发生数据丢失,此时甚至可以在数据库中用'ALTER DATABASE CHARACTER SET xxxx;'来直接更改数据库的字符集而不发生数据丢失。对照表如下:
=========================== A. Current Char set B. New Char set (Superset of A.) ------------------- ----------------------------- US7ASCII WE8DEC US7ASCII US8PC437 US7ASCII WE8PC850 US7ASCII IN8ISCII US7ASCII WE8PC858 US7ASCII WE8ISO8859P1 US7ASCII EE8ISO8859P2 US7ASCII SE8ISO8859P3 US7ASCII NEE8ISO8859P4 US7ASCII CL8ISO8859P5 US7ASCII AR8ISO8859P6 US7ASCII EL8ISO8859P7 US7ASCII IW8ISO8859P8 US7ASCII WE8ISO8859P9 US7ASCII NE8ISO8859P10 US7ASCII TH8TISASCII US7ASCII BN8BSCII US7ASCII VN8VN3 US7ASCII VN8MSWIN1258 US7ASCII WE8ISO8859P15 US7ASCII WE8NEXTSTEP US7ASCII AR8ASMO708PLUS US7ASCII EL8DEC US7ASCII TR8DEC US7ASCII LA8PASSPORT US7ASCII BG8PC437S US7ASCII EE8PC852 US7ASCII RU8PC866 US7ASCII RU8BESTA US7ASCII IW8PC1507 US7ASCII RU8PC855 US7ASCII TR8PC857 US7ASCII CL8MACCYRILLICS US7ASCII WE8PC860 US7ASCII IS8PC861 US7ASCII EE8MACCES US7ASCII EE8MACCROATIANS US7ASCII TR8MACTURKISHS US7ASCII EL8MACGREEKS US7ASCII IW8MACHEBREWS US7ASCII EE8MSWIN1250 US7ASCII CL8MSWIN1251 US7ASCII ET8MSWIN923 US7ASCII BG8MSWIN US7ASCII EL8MSWIN1253 US7ASCII IW8MSWIN1255 US7ASCII LT8MSWIN921 US7ASCII TR8MSWIN1254 US7ASCII WE8MSWIN1252 US7ASCII BLT8MSWIN1257 US7ASCII N8PC865 US7ASCII BLT8CP921 US7ASCII LV8PC1117 US7ASCII LV8PC8LR US7ASCII LV8RST104090 US7ASCII CL8KOI8R US7ASCII BLT8PC775 US7ASCII WE8DG US7ASCII WE8NCR4970 US7ASCII WE8ROMAN8 US7ASCII WE8MACROMAN8S US7ASCII TH8MACTHAIS US7ASCII HU8CWI2 US7ASCII EL8PC437S US7ASCII LT8PC772 US7ASCII LT8PC774 US7ASCII EL8PC869 US7ASCII EL8PC851 US7ASCII CDN8PC863 US7ASCII HU8ABMOD US7ASCII AR8ASMO8X US7ASCII AR8NAFITHA711T US7ASCII AR8SAKHR707T US7ASCII AR8MUSSAD768T US7ASCII AR8ADOS710T US7ASCII AR8ADOS720T US7ASCII AR8APTEC715T US7ASCII AR8NAFITHA721T US7ASCII AR8HPARABIC8T US7ASCII AR8NAFITHA711 US7ASCII AR8SAKHR707 US7ASCII AR8MUSSAD768 US7ASCII AR8ADOS710 US7ASCII AR8ADOS720 US7ASCII AR8APTEC715 US7ASCII AR8MSAWIN US7ASCII AR8NAFITHA721 US7ASCII AR8SAKHR706 US7ASCII AR8ARABICMACS US7ASCII LA8ISO6937 US7ASCII JA16VMS US7ASCII JA16EUC US7ASCII JA16SJIS US7ASCII KO16KSC5601 US7ASCII KO16KSCCS US7ASCII KO16MSWIN949 US7ASCII ZHS16CGB231280 US7ASCII ZHS16GBK US7ASCII ZHT32EUC US7ASCII ZHT32SOPS US7ASCII ZHT16DBT US7ASCII ZHT32TRIS US7ASCII ZHT16BIG5 US7ASCII ZHT16CCDC US7ASCII ZHT16MSWIN950 US7ASCII AL24UTFFSS US7ASCII UTF8 US7ASCII JA16TSTSET2 US7ASCII JA16TSTSET =============== US7ASCII ZHT16HKSCS US7ASCII KO16TSTSET WE8DEC TR8DEC WE8DEC WE8NCR4970 WE8PC850 WE8PC858 D7DEC D7SIEMENS9780X I7DEC I7SIEMENS9780X WE8ISO8859P1 WE8MSWIN1252 AR8ISO8859P6 AR8ASMO708PLUS AR8ISO8859P6 AR8ASMO8X IW8EBCDIC424 IW8EBCDIC1086 IW8EBCDIC1086 IW8EBCDIC424 LV8PC8LR LV8RST104090 DK7SIEMENS9780X N7SIEMENS9780X N7SIEMENS9780X DK7SIEMENS9780X I7SIEMENS9780X I7DEC D7SIEMENS9780X D7DEC WE8NCR4970 WE8DEC WE8NCR4970 TR8DEC AR8SAKHR707T AR8SAKHR707 AR8MUSSAD768T AR8MUSSAD768 AR8ADOS720T AR8ADOS720 AR8NAFITHA711 AR8NAFITHA711T AR8SAKHR707 AR8SAKHR707T AR8MUSSAD768 AR8MUSSAD768T AR8ADOS710 AR8ADOS710T AR8ADOS720 AR8ADOS720T AR8APTEC715 AR8APTEC715T AR8NAFITHA721 AR8NAFITHA721T AR8ARABICMAC AR8ARABICMACT AR8ARABICMACT AR8ARABICMAC KO16KSC5601 KO16MSWIN949 WE16DECTST2 WE16DECTST WE16DECTST WE16DECTST2 =============== US7ASCII BLT8ISO8859P13 US7ASCII CEL8ISO8859P14 US7ASCII CL8ISOIR111 US7ASCII CL8KOI8U US7ASCII AL32UTF8 BLT8CP921 BLT8ISO8859P13 US7ASCII AR8MSWIN1256 UTF8 AL32UTF8 (added in patchset Character Set Subset/Superset Pairs Obsolete from ======================================================= US7ASCII AR8MSAWIN AR8ARABICMAC AR8ARABICMACT =============== US7ASCII JA16EUCTILDE US7ASCII JA16SJISTILDE US7ASCII ZHS32GB18030 US7ASCII ZHT32EUCTST WE8ISO8859P9 TR8MSWIN1254 LT8MSWIN921 BLT8ISO8859P13 LT8MSWIN921 BLT8CP921 BLT8CP921 LT8MSWIN921 AR8ARABICMAC AR8ARABICMACT ZHT32EUC ZHT32EUCTST UTF8 AL32UTF8 Character Set Subset/Superset Pairs Obsolete from ======================================================= LV8PC8LR LV8RST104090 |