Chinaunix首页 | 论坛 | 博客
  • 博客访问: 2880837
  • 博文数量: 599
  • 博客积分: 16398
  • 博客等级: 上将
  • 技术积分: 6875
  • 用 户 组: 普通用户
  • 注册时间: 2009-11-30 12:04
个人简介

WINDOWS下的程序员出身,偶尔也写一些linux平台下小程序, 后转行数据库行业,专注于ORACLE和DB2的运维和优化。 同时也是ios移动开发者。欢迎志同道合的朋友一起研究技术。 数据库技术交流群:58308065,23618606

文章分类

全部博文(599)

文章存档

2014年(12)

2013年(56)

2012年(199)

2011年(105)

2010年(128)

2009年(99)

分类: Oracle

2011-11-14 14:10:07

10gR2 new feature: Case Insensitive Sorts & Compares

 

One handy new feature of 10gR2 is its vastly expanded ability to do case-insensitive sorts and compares (a subset of its expanded ability with other special sorts and compares, such as special linguistic sorts).

To get us started, here's a simple sample table with a few rows in it.

SQL> select * from invoice_demo order by invoice_id

SQL> /

 

INVOICE_ID SUPPLIER_NAME

---------- ------------------------------

      1000 Max Books

      1001 MAX BOOKS

      1002 max books

      1003 janus pet supply

The default sort order is always binary, meaning that the characters are sorted in the order of their character number in the character set (aka code point or encoded value). For example, compare the sort order on SUPPLIER_NAME with the character number of the first character in the field:

SQL> select invoice_id, supplier_name, ascii(substr(supplier_name,1,1))

  2  from  invoice_demo

  3  order by supplier_name

SQL> /

 

INVOICE_ID SUPPLIER_NAME        ASCII(SUBSTR(SUPPLIER_NAME,1,1))

---------- -------------------- --------------------------------

      1001 MAX BOOKS                                          77

      1000 Max Books                                          77

      1003 janus pet supply                                  106

      1002 max books                                         109

 

7 rows selected.

All the lower-case characters occur after all the upper-case characters in ASCII, WE89ISO8859P1, and other Latin character sets, so 'M' comes before 'j'. A pretty common requirement is to do a case-insensitive sort, where 'j' would come before 'M'. Oracle has offered various ways to do this since at least 9i, but 10gR2 improves on the offering significantly.

In all versions since at least 9i, you can do a case insensitive sort by setting NLS_COMP to ANSI and choosing a case insensitive sort order. (The default setting for NLS_COMP is BINARY, which produces the binary sort we saw in the first query.)

Choosing a case-insensitive sort order is easy. You can take any of the Oracle-defined linguistic sort orders - the default is BINARY - and append "_CI" to it. For example:

SQL> alter session set NLS_COMP='ANSI' ;

 

Session altered.

 

SQL> alter session set NLS_SORT='BINARY_CI' ;

 

Session altered.

 

SQL> select * from invoice_demo order by supplier_name ;

 

INVOICE_ID SUPPLIER_NAME

---------- ------------------------------

      1003 janus pet supply

      1001 MAX BOOKS

      1000 Max Books

      1002 max books

Setting NLS_COMP to ANSI causes Oracle to use the sort order specified in NLS_SORT when doing an ORDER BY. But one big limitation is that when NLS_COMP is set to ANSI, only certain SQL functions and operations will use the NLS_SORT sort order. The rest will still use the default BINARY sort order. For example, the = (equality) operator will do a case-insensitive comparison, but the "like" operator will not:

SQL> select * from invoice_demo

  2  where supplier_name='MAX BOOKS';

 

INVOICE_ID SUPPLIER_NAME

---------- ------------------------------

      1000 Max Books

      1001 MAX BOOKS

      1002 max books

 

SQL> select * from invoice_demo where supplier_name like 'M%';

 

INVOICE_ID SUPPLIER_NAME

---------- ------------------------------

      1000 Max Books

      1001 MAX BOOKS

And neither will most other SQL functions and operators; for example:

SQL> select min(supplier_name) from invoice_demo ;

 

MIN(SUPPLIER_NAME)

------------------------------

MAX BOOKS

 

SQL> select distinct (supplier_name) from invoice_demo

  2  where substr(supplier_name,1,1)='M' ;

 

SUPPLIER_NAME

------------------------------

Max Books

max books

MAX BOOKS

10gR2 introduced a new value for NLS_COMP which extends the NLS_SORT sort order to cover all SQL sorts and comparisons. Here are the results of the same queries as above, with NLS_COMP set to LINGUISTIC.

SQL> alter session set nls_comp='LINGUISTIC';

 

Session altered.

 

SQL> select * from invoice_demo

  2  where supplier_name like 'M%';

 

INVOICE_ID SUPPLIER_NAME

---------- ------------------------------

      1000 Max Books

      1001 MAX BOOKS

      1002 max books

 

SQL> select min(supplier_name) from invoice_demo ;

 

MIN(SUPPLIER_NAME)

------------------------------------------

janus pet supply

 

SQL> select distinct (supplier_name) from invoice_demo

  2  where substr(supplier_name,1,1)='M' ;

 

SUPPLIER_NAME

------------------------------

Max Books

With NLS_COMP set to LINGUISTIC, every SQL sort and comparison uses the NLS_SORT linguistic sort order. So, if you have set your sort order to a case insensitive one by appending _CI, all sorts and comparisons in that session will be case insensitive. Very handy!

You might want to mix case-sensitive and case-insensitive behaviour in the same query. For example, I might want to group case-sensitively and sort case-insensitively. First I'll add a bit more data to the table to make this example clearer:

SQL> select * from invoice_demo

  2  where supplier_name like 'm%'

  3  order by invoice_id

 

INVOICE_ID SUPPLIER_NAME

---------- -------------------------------

      1000 Max Books

      1001 MAX BOOKS

      1002 max books

      1010 MAX BOOKS

      1011 max books

In this example, I want to know how many occurrences of each supplier_name show up in the table, case-sensitively, to see how the data is being entered; but I still want 'j' to come before 'M' in the results. So I want to group case-sensitively, and sort case-insensitively. The easiest way to do this is to use the SQL function NLSSORT. This function takes a string and an NLS sort order, and returns a hex value. The following example shows the values returned by NLSSORT for the BINARY sort order, and for the BINARY_CI sort order, side-by-side.

SQL> select invoice_id, supplier_name,

  2  nlssort(supplier_name,'NLS_SORT=BINARY'),

  3  nlssort(supplier_name,'NLS_SORT=BINARY_CI')

  4  from invoice_demo

  5  where supplier_name like 'm%'

  6  order by invoice_id

 

SQL> /

 

INVOICE_ID SUPPLIER_NAME        NLSSORT(SUPPLIER_NAME,'NLS_SORT=BIN NLSSORT(SUPPLIER_NAME,'NLS_SORT=BIN

---------- -------------------- ----------------------------------- -----------------------------------

      1000 Max Books            4D617820426F6F6B7300                6D617820626F6F6B7300

      1001 MAX BOOKS            4D415820424F4F4B5300                6D617820626F6F6B7300

      1002 max books            6D617820626F6F6B7300                6D617820626F6F6B7300

      1010 MAX BOOKS            4D415820424F4F4B5300                6D617820626F6F6B7300

      1011 max books            6D617820626F6F6B7300                6D617820626F6F6B7300

 

5 rows selected.

This function makes it easy to combine case-sensitive and case-insensitive sorts and comparisons in the same query. In this example, the SQL would be:

SQL> alter session set nls_comp='binary' ;

 

Session altered.

 

SQL> alter session set nls_sort='binary' ;

 

Session altered.

 

SQL> select supplier_name, count(*)

  2  from invoice_demo

  3  group by supplier_name

  4  order by nlssort(supplier_name,'NLS_SORT=BINARY_CI') ;

 

SUPPLIER_NAME          COUNT(*)

-------------------- ----------

janus pet supply              1

MAX BOOKS                     2

max books                     2

Max Books                     1

 

4 rows selected.

A normal index uses the default sort order, by default, and so it's useless in a case-insensitive search - and Oracle won't use it. For large tables, the resulting full table scan can be quite a performance penalty. Fortunately, it's easy to create an index that uses a specific sort order. You simply create a function-based index that uses the NLSSORT function we saw above.

SQL> create index ind_idemo_sname_ci

  2  on invoice_demo (nlssort(supplier_name, 'NLS_SORT=BINARY_CI'))

SQL> /

 

Index created.

The main thing to watch out for here is that because this is a function-based index, it is subject to all the restrictions on function-based indexes. For example, Oracle will not use the index if it's theoretically possible for the result set to include NULLS in the indexed column. Oracle will only use the index if either (1) the column that the index is on is declared as NOT NULL, or (2) you have a where clause in the query that specifies that the function on the indexed column return NOT NULL, such as WHERE NLSSORT(SUPPLIER_NAME,'NLS_SORT=BINARY_CI') IS NOT NULL .

A related feature is what one might call "accent insensitivity," or the ability to ignore diacritics in the sort order. For example, you might want "à la mode" (note the accent over the "a") and "a la mode" (no accent) to sort to the same position. Just as you can append _CI to any of Oracle's linguistic sort orders, you can append _AI to get case insensitivity and accent insensitivity.

For this example, we'll need some data with accented characters:

INVOICE_ID SUPPLIER_NAME

---------- --------------------

      2002 École Lyonnaise

      2000 ecole lyonnaise

      2001 école lyonnaise

In WE8ISO8859P1, and other Latin character sets, characters with diacritics come after all of the regular ASCII characters. So the "E"s with accents will sort after everything else in the default binary sort order.

SQL> alter session set nls_sort='BINARY';

 

Session altered.

 

SQL> select supplier_name from invoice_demo order by supplier_name ;

 

SUPPLIER_NAME

--------------------

MAX BOOKS

MAX BOOKS

Max Books

ecole lyonnaise

janus pet supply

max books

max books

École Lyonnaise

école lyonnaise

 

9 rows selected.

If we use the BINARY_CI sort order, "ecole lyonnaise" will sort before "Max Books", but "école lyonnaise" will still come last.

SQL> alter session set nls_sort='BINARY_CI' ;

 

Session altered.

 

SQL> select supplier_name

  2  from invoice_demo order by supplier_name ;

 

SUPPLIER_NAME

--------------------

ecole lyonnaise

janus pet supply

Max Books

max books

max books

MAX BOOKS

MAX BOOKS

École Lyonnaise

école lyonnaise

 

9 rows selected.

But if we use the "accent insensitive" sort order BINARY_AI, we get both case and accent insensitivity:

SQL> alter session set nls_sort='BINARY_AI' ;

 

Session altered.

 

SQL> select supplier_name

  2  from invoice_demo order by supplier_name ;

 

SUPPLIER_NAME

--------------------

ecole lyonnaise

École Lyonnaise

école lyonnaise

janus pet supply

max books

Max Books

max books

MAX BOOKS

MAX BOOKS

 

9 rows selected.

The new NLS_COMP=LINGUISTIC setting doesn't just apply to case-insensitive and accent-insensitive binary searches. Any of Oracle's linguistic sort orders can be used with it. For example, you can choose NLS_SORT=XSPANISH to force all SQL to use traditional Spanish sort and comparison semantics, in which "ch" and "ll" are both treated as one character - so "llaves" does not meet the condition "like 'l%' ", for example. And you can apply the _CI and _AI suffixes to any linguistic sort order, eg. NLS_SORT=XSPANISH_CI .

Refer to Chapter 5 of the Oracle 10gR2 Database Globalization Support Guide.

Natalka Roshak is a senior Oracle and Sybase database administrator, analyst, and architect. She is based in Kingston, Ontario, and consults across North America. More of her scripts and tips can be found in her online DBA toolkit at .

 

阅读(1172) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~