About me:Oracle ACE pro,optimistic,passionate and harmonious. Focus on ORACLE,MySQL and other database programming,peformance tuning,db design, j2ee,Linux/AIX,Architecture tech,etc
全部博文(172)
分类: Oracle
2020-06-30 09:11:46
LISTAGG函数是11g R2引入用于合并字符串(列转行),可替换wmsys.wm_concat功能,而且效率更好。wm_concat函数是自定义聚集函数,而且是undocument的,从12C开始已经去掉,从11g R2开始,有字符串合并的需求,最好使用LISTAGG替代,以获得更好的性能。
但是,LISTAGG函数在11g R2中有个缺点,就是不能直接DISTINCT,在ORACLE 19C中,ORACLE给它增加了DISTINCT功能,这样可以剔除重复的字符串合并。19C的完整LISTAGG语法如下:
从语法图上可以看出,LISTAGG函数可以是普通的组函数,也可以用于分析函数,并且12C开始增加了OVERFLOW语法,用于字符串过长的处理。
如下例所示:
需求:对emp表,按照部门分组,按逗号合并部门员工名。
使用LISTAGG实现如下:
select deptno, listagg(ename,',') within group(order by deptno) as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
下面给DEPTNO=10的部门插入2个重复名字的员工,如下所示:
INSERT INTO emp VALUES (8000,'KING','ANALYST',7782,to_date('1983-1-1','yyyy-mm-dd'),2000,NULL,10);
INSERT INTO emp VALUES (9000,'KING','MANADER',7782,to_date('1984-5-1','yyyy-mm-dd'),2500,NULL,10);
COMMIT;
再次查询,发现有重复的数据:
SQL> select deptno, listagg(ename,',') within group(order by deptno) as enames
2 from emp
3 group by deptno
4 order by deptno;
DEPTNO ENAMES
------ --------------------------------------------------------------------------------
10 CLARK,KING,KING,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
一般遇到这种有重复数据的情况,需要剔除重复数据,在19C之前,需要先剔除重复数据:
--红色部分使用分析函数剔重,当然这里直接distinct也可以,结果与最前面的一致
select deptno, listagg(ename, ',') within group(order by deptno) as enames
from (select deptno,
ename,
row_number() over(partition by deptno, ename order by empno) rn
from emp)
where rn = 1
group by deptno
order by deptno;
或者直接DISTINCT :
select deptno, listagg(ename, ',') within group(order by deptno) as enames
from (select distinct deptno,
ename
from emp)
group by deptno
order by deptno;
DEPTNO ENAMES
-------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
到了19C,LISTAGG提供了直接DISTINCT的功能,可以很简单地实现剔除重复数据,然后合并,如下所示:
select deptno, listagg(distinct ename,',') within group(order by deptno) as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
-------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
默认情况下包含所有数据(ALL不用写默认),也就是LISTAGG(ALL column),如下所示:
select deptno, listagg(all ename,',') within group(order by deptno) as enames
from emp
group by deptno
order by deptno;
如果使用wm_concat,则是:
select deptno, wm_concat(distinct ename) as enames
from emp
group by deptno
order by deptno;
DEPTNO ENAMES
------ --------------------------------------------------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30
ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
wm_concat这个undocument函数可以直接distinct,但是它与LISTAGG比缺点很明显,一是性能差,而是undocument函数,比较如下:
1)wm_concat可以distinct去除重复的(用于分析函数,distinct不能带order by),19C之前的listagg不可以,可以先剔除重复,然后做listagg,19C之后的listagg可以distinct。
2)wm_concat不能保证排序,listagg可以保证排序.
3.WMSYS.WM_CONCAT是undocument的函数,最好不要使用,要使用也用自定义分析函数
LISTAGG函数在12.2开始,合并长度最多32767字节,依赖于MAX_STRING_SIZE参数,如下所示:
1) 如果MAX_STRING_SIZE=EXTEND,则对于VARCHAR2和RAW类型,最多返回32767字节
2) 如果MAX_STRING_SIZE=STANDARD,则对于VARCHAR2最多4000字节,对于RAW类型最多2000字节
那么在合并的字符串超过限制,溢出时,默认报错,但是12.2引入了ON OVERFLOW可以截断处理,如下:
ON OVERFLOW TRUNCATE默认对溢出数据后面用…(count),如下所示:
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOW TRUNCATE) WITHIN GROUP (ORDER BY deptno) AS enames
FROM (select deptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
DEPTNO ENAMES
------ ----------------------------------------------------------------------------------------------------
10 CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLAR
省略
ARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,
CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,CLARK,...(4334)
20 ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAM
S,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,AD
AMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,
省略
ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,ADAMS,...(4334)
30 ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLE
N,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,AL
LEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,
省略
LEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,
ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,ALLEN,...(5333)
其他用法如下:
--1.改变省略的格式,换成~~~
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOW TRUNCATE '~~~') WITHIN GROUP (ORDER BY deptno) AS enames
FROM (select deptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
--2.通过WINTHOUT COUNT省略掉计数
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOW TRUNCATE WITHOUT COUNT) WITHIN GROUP (ORDER BY deptno) AS enames
FROM (select deptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
--3.默认格式ON OVERFLOW ERROW,溢出则报错ORA-01489: result of string concatenation is too long
SELECT deptno, LISTAGG(ename, ',' ON OVERFLOW ERROR) WITHIN GROUP (ORDER BY deptno) AS enames
FROM (select deptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
针对超过字符串长度溢出的问题,如果要完整显示,可以使用如下方式:
1) 自定义wm_concat返回值类型为CLOB
2) 使用XMLAGG函数获取CLOB值
下面使用XMLAGG函数处理超长合并字符串问题,如下所示:
SELECT deptno, RTRIM(xmlagg(xmlelement(c, ename || ',')
ORDER BY deptno).extract('//text()').getclobval(),
',') AS enames
FROM (select deptno,ename from emp,(select level from dual connect by level<1000))
GROUP BY deptno
ORDER BY deptno;
总结:ORACLE 19C针对LISTAGG增加了很多改进,在以后有字符串合并需求(列转行)的时候,要优先使用LISTAGG,而不是WM_CONCAT,对于自定义聚集函数,性能较差,而且功能也没有LISTAGG强大。