You can use the GROUP BY clause to divide the rows in a table into groups. You can then use the group functions to return summary information for each group. Group functions can appear in select lists and in ORDER BY and HAVING clauses. The Oracle Server applies the group functions to each group of rows and returns a single result row for each group.
・Each of the group functions AVG, SUM, MAX, MIN, COUNT, STDDEV, and VARIANCE accept one argument.
・The functions AVG, SUM, STDDEV, and VARIANCE operate only on numeric values(只运算数字类型的值).
:: The data types for the arguments can be CHAR, VARCHAR2, NUMBER, or DATE.
:: All group functions except COUNT(*) ignore null values. To substitute(为NULL替换) a value for null values, use the NVL function. COUNT returns either a number or zero.
:: The Oracle Server implicitly sorts(隐含) the results set in ascending order of the grouping columns specified(ASC), when you use a GROUP BY clause. To override this default ordering, you can use DESC in an ORDER BY clause.
Groups are formed and group functions are calculated before the HAVING clause is applied to the groups. The HAVING clause can precede the GROUP BY clause, but it is recommended that you place the GROUP BY clause first because it is more logical.
Note: When working with ROLLUP and CUBE, make sure that the columns following the GROUP BY clause have meaningful, real-life relationships with each other; otherwise the operators return irrelevant information(不相关的信息).
・The ROLLUP and CUBE operators are available only in Oracle8i and later releases.
The ROLLUP operator delivers aggregates and superaggregates for expressions within a GROUP BY statement. The ROLLUP operator can be used by report writers to extract statistics and summary information from results sets.(从结果集中得到精确的计算和信息概要) The cumulative aggregates can be used in reports, charts,and graphs.
The ROLLUP operator creates groupings by moving in one direction(从一个方向移动), from right to left, along the list of columns specified in the GROUP BY clause. It then applies the aggregate function to these groupings.
Note: To produce subtotals in n dimensions (that is, n columns in the GROUP BY clause) without a ROLLUP operator, n+1 SELECT statements must be linked with UNION ALL. (如果没有rollup操作,而产生N元小计值,要使用N+1个select操作用NUION ALL连接)This makes the query execution inefficient, because each of the SELECT statements causes table access. The ROLLUP operator gathers its results with just one table access. The ROLLUP operator is useful if there are many columns involved(许多与列有关的) in producing the subtotals.
The ROLLUP operator creates subtotals that roll up from the most detailed level to a grand total, following the grouping list specified in the GROUP BY clause. First it calculates the standard aggregate values for the groups specified in the GROUP BY clause (in the example, the sum of salaries grouped on each job within a department). Then it creates progressively higher-level subtotals, moving from right to left through the list of grouping columns. (In the preceding example, the sum of salaries for each department is calculated, followed by the sum of salaries for all departments.)
:: Given n expressions in the ROLLUP operator of the GROUP BY clause, the operation results in n + 1 = 2 + 1 = 3 groupings.
:: Rows based on the values of the first n expressions are called rows or regular rows and the others are called superaggregate rows.
The CUBE operator is an additional switch in the GROUP BY clause in a SELECT statement. The CUBE operator can be applied to all aggregate functions(能够用于所有聚集函数), including AVG, SUM, MAX, MIN, and COUNT. It is used to produce results sets that are typically used for cross-tabular reports. While ROLLUP produces only a fraction of possible subtotal combinations, CUBE produces subtotals for all possible combinations of groupings specified in the GROUP BY clause, and a grand total.
The CUBE operator is used with an aggregate function to generate additional rows in a results set. Columns included in the GROUP BY clause are cross-referenced to produce a superset of groups. The aggregate function specified in the select list is applied to these groups to produce summary values for the additional superaggregate rows. The number of extra groups in the results set is determined by the number of columns included in the GROUP BY clause.
In fact, every possible combination of the columns or expressions in the GROUP BY clause is used to produce superaggregates.
If you have n columns or expressions in the GROUP BY clause, there will be 2n possible superaggregate combinations. Mathematically, these combinations form an n-dimensional cube, which is how the operator got its name.
By using application or programming tools, these superaggregate values can then be fed into charts and graphs that convey results and relationships visually and effectively.
In the preceding example, all rows indicated as 1 are regular rows, all rows indicated as 2 and 4 are superaggregate rows, and all rows indicated as 3 are cross-tabulation values.
The CUBE operator has also performed the ROLLUP operation to display the subtotals for those departments whose department ID is less than 60 and the total
salary for those departments whose department ID is less than 60, irrespective of the job titles. Additionally, the CUBE operator displays the total salary for every job irrespective of the department.
※Note: Similar to the ROLLUP operator, producing subtotals in n dimensions (that is, n columns in the GROUP BY clause) without a CUBE operator requires 2n SELECT statements to be linked with UNION ALL. Thus, a report with three dimensions requires 2的3次方= 8 SELECT statements to be linked with UNION ALL.
■GROUPING 函数
SELECT [column,] group_function(column) ,
GROUPING(expr)
FROM table
[WHERE condition]
[GROUP BY [ROLLUP][CUBE] group_by_expression]
[HAVING having_expression]
[ORDER BY column];
:: GROUPING 函数既能够被用于 CUBE 操作,也可以被用于 ROLLUP 操作
:: 用 GROUPING 函数模拟能够发现在一行中的构成小计的分组
:: 用 GROUPING 函数,你能够从 ROLLUP 或 CUBE 创建的空值中区分存储的 NULL 值
:: GROUPING 函数返回 0 或 1
The GROUPING Function
The GROUPING function can be used with either the CUBE or ROLLUP operator to help you understand how a summary value has been obtained.
The GROUPING function uses a single column as its argument(使用单一列). The expr(表达式) in the GROUPING function must match one of the expressions in the GROUP BY clause. The function returns a value of 0 or 1.The values returned by the GROUPING function(被这个函数返回的值经常用于) are useful to:
:: Determine the level of aggregation of a given subtotal; that is, the group or groups on which the subtotal is based//确定组的小计值
:: Identify whether a NULL value in the expression column of a row of the result set indicates: 识别在结果集的列的表达式列中NULL值的说明:
-A NULL value from the base table (stored NULL value)
-A NULL value created by ROLLUP/CUBE (as a result of a group function on that expression)
A value of 0 returned by the GROUPING function based on an expression indicates one of the following:
:: The expression has been used to calculate the aggregate value.表达式已计算出聚集值.
:: The NULL value in the expression column is a stored NULL value.表达式中的NULL值是存储的NULL值
A value of 1 returned by the GROUPING function based on an expression indicates one of the following:
:: The expression has not been used to calculate the aggregate value.
:: The NULL value in the expression column is created by ROLLUP or CUBE as a result of grouping.
eg:
①
SELECT department_id DEPTID, job_id JOB,
SUM(salary),
GROUPING(department_id) GRP_DEPT,
GROUPING(job_id) GRP_JOB
FROM employees
WHERE department_id < 50
GROUP BY ROLLUP(department_id, job_id);
DEPTID JOB SUM(SALARY) GRP_DEPT GRP_JOB
---------- ---------- ----------- ---------- ----------
10 AD_ASST 4400 0 0
10 4400 0 1
20 MK_MAN 13000 0 0
20 MK_REP 6000 0 0
20 19000 0 1
30 PU_MAN 11000 0 0
30 PU_CLERK 13900 0 0
30 24900 0 1
40 HR_REP 6500 0 0
40 6500 0 1
54800 1 1
②
SQL> SELECT department_id DEPTID, job_id JOB,
2 SUM(salary),
3 GROUPING(department_id) GRP_DEPT
4 FROM employees
5 WHERE department_id < 50
6 GROUP BY ROLLUP(department_id, job_id);
DEPTID JOB SUM(SALARY) GRP_DEPT
---------- ---------- ----------- ----------
10 AD_ASST 4400 0
10 4400 0
20 MK_MAN 13000 0
20 MK_REP 6000 0
20 19000 0
30 PU_MAN 11000 0
30 PU_CLERK 13900 0
30 24900 0
40 HR_REP 6500 0
40 6500 0
54800 1
Instructor Note
Explain that if the same example is run with the CUBE operator, it returns a results set that has 1 for GROUPING(department_id) and 0 for GROUPING(job_id) in the cross-tabulation rows, because the subtotal values are the result of grouping on job irrespective of department number.
SQL> SELECT department_id DEPTID, job_id JOB,
2 SUM(salary),
3 GROUPING(department_id) GRP_DEPT,
4 GROUPING(job_id) GRP_JOB
5 FROM employees
6 WHERE department_id < 50
7 GROUP BY CUBE(department_id, job_id);
8 ORDER BY department_id;
DEPTID JOB SUM(SALARY) GRP_DEPT GRP_JOB
---------- ---------- ----------- ---------- ----------
10 AD_ASST 4400 0 0
10 4400 0 1
20 MK_MAN 13000 0 0
20 MK_REP 6000 0 0
20 19000 0 1
30 PU_CLERK 13900 0 0
30 PU_MAN 11000 0 0
30 24900 0 1
40 HR_REP 6500 0 0
40 6500 0 1
AD_ASST 4400 1 0
HR_REP 6500 1 0
MK_MAN 13000 1 0
MK_REP 6000 1 0
PU_CLERK 13900 1 0
PU_MAN 11000 1 0
54800 1 1
■分组集合(GROUPING SETS)
:: GROUPING SETS 是 GROUP BY 子句更进一步的扩展
:: 你能够用 GROUPING SETS 在同一查询中定义多重分组
:: Oracle 服务器计算在 GROUPING SETS 子句中指定的所有分组,并且用 UNION ALL 操作组合单个的分组结果
:: 分组集合的效率:
-对基表仅进行一个查询
-不需要写复杂的 UNION 语句
-GROUPING SETS 有更多的元素,更好的执行性能
GROUPING SETS
GROUPING SETS are a further extension of the GROUP BY clause that let you specify multiple groupings of data(可以指定多个组的数据). Doing so facilitates efficient aggregation and hence facilitates analysis of data across multiple dimensions.
A single SELECT statement can now be written using 'GROUPING SETS' to specify various groupings (that can also include ROLLUP or CUBE operators), rather than multiple SELECT statements combined by UNION ALL operators. For example, you can say:
eg:
①
SELECT department_id, job_id, manager_id, AVG(salary)
FROM employees
GROUP BY
GROUPING SETS
((department_id, job_id, manager_id),(department_id, manager_id),(job_id, manager_id));
DEPARTMENT_ID JOB_ID MANAGER_ID AVG(SALARY)
------------- ---------- ---------- -----------
SA_REP 149 7000
10 AD_ASST 101 4400
20 MK_MAN 100 13000
20 MK_REP 201 6000
30 PU_MAN 100 11000
30 PU_CLERK 114 2780
40 HR_REP 101 6500
50 ST_MAN 100 7280
50 SH_CLERK 120 2900
50 ST_CLERK 120 2625
50 SH_CLERK 121 3675
50 ST_CLERK 121 2675
50 SH_CLERK 122 3200
50 ST_CLERK 122 2700
50 SH_CLERK 123 3475
50 ST_CLERK 123 3000
50 SH_CLERK 124 2825
50 ST_CLERK 124 2925
60 IT_PROG 102 9000
60 IT_PROG 103 4950
70 PR_REP 101 10000
80 SA_MAN 100 12200
80 SA_REP 145 8500
80 SA_REP 146 8500
80 SA_REP 147 7766.66667
80 SA_REP 148 8650
80 SA_REP 149 8600
90 AD_PRES 24000
90 AD_VP 100 17000
100 FI_MGR 101 12000
100 FI_ACCOUNT 108 7920
110 AC_MGR 101 12000
110 AC_ACCOUNT 205 8300
149 7000
10 101 4400
20 100 13000
20 201 6000
30 100 11000
30 114 2780
40 101 6500
50 100 7280
50 120 2762.5
50 121 3175
50 122 2950
50 123 3237.5
50 124 2875
60 102 9000
60 103 4950
70 101 10000
80 100 12200
80 145 8500
80 146 8500
80 147 7766.66667
80 148 8650
80 149 8600
90 24000
90 100 17000
100 101 12000
100 108 7920
110 101 12000
110 205 8300
AD_VP 100 17000
AC_MGR 101 12000
FI_MGR 101 12000
HR_REP 101 6500
MK_MAN 100 13000
MK_REP 201 6000
PR_REP 101 10000
PU_MAN 100 11000
SA_MAN 100 12200
SA_REP 145 8500
SA_REP 146 8500
SA_REP 147 7766.66667
SA_REP 148 8650
SA_REP 149 8333.33333
ST_MAN 100 7280
AD_ASST 101 4400
AD_PRES 24000
IT_PROG 102 9000
IT_PROG 103 4950
PU_CLERK 114 2780
SH_CLERK 120 2900
SH_CLERK 121 3675
SH_CLERK 122 3200
SH_CLERK 123 3475
SH_CLERK 124 2825
ST_CLERK 120 2625
ST_CLERK 121 2675
ST_CLERK 122 2700
ST_CLERK 123 3000
ST_CLERK 124 2925
AC_ACCOUNT 205 8300
FI_ACCOUNT 108 7920
This statement calculates aggregates over three groupings:
(department_id, job_id, manager_id), (department_id, manager_id) and (job_id, manager_id)
Without this enhancement in Oracle9i, multiple queries combined together with UNION ALL are required to get the output of the preceding SELECT statement. A multiquery approach is inefficient, for it requires multiple scans of the same data.
②
Compare the preceding statement with this alternative:
1,
SELECT department_id, job_id, manager_id, AVG(salary)
FROM employees
GROUP BY CUBE(department_id, job_id, manager_id);
The preceding statement computes all the 8 (2 *2 *2) groupings, though only the groups (department_id, job_id, manager_id), (department_id, manager_id) and (job_id, manager_id)are of interest to you.
2,
SELECT department_id, job_id, manager_id, AVG(salary)
FROM employees
GROUP BY //移过来.
GROUPING SETS
((department_id, job_id, manager_id),(department_id, manager_id),(job_id, manager_id));
Another alternative is the following statement: //结果与上面的一样
3,
SELECT department_id, job_id, manager_id, AVG(salary)
FROM employees
GROUP BY department_id, job_id, manager_id
UNION ALL
SELECT department_id, NULL, manager_id, AVG(salary)
FROM employees
GROUP BY department_id, manager_id
UNION ALL
SELECT NULL, job_id, manager_id, AVG(salary)
FROM employees
GROUP BY job_id, manager_id;
This statement requires three scans of the base table, making it inefficient.
※※
※※CUBE and ROLLUP can be thought of as grouping sets with very specific semantics. The following equivalencies show this fact:
CUBE (a,b,c) GROUPING SETS //2的3次方
is equivalent to ((a,b,c),(a,b),(a,c),(b,c),(a),(b),(c)())
ROLLUP(a,b,c) GROUPING SETS
is equivalent to ((a,b,c),(a,b),(a),())
③
SELECT department_id, job_id,
manager_id,avg(salary)
FROM employees
GROUP BY GROUPING SETS
((department_id,job_id), (job_id,manager_id));
DEPARTMENT_ID JOB_ID MANAGER_ID AVG(SALARY)
------------- ---------- ---------- -----------
SA_REP 7000
10 AD_ASST 4400
20 MK_MAN 13000
20 MK_REP 6000
30 PU_MAN 11000
30 PU_CLERK 2780
40 HR_REP 6500
50 ST_MAN 7280
50 SH_CLERK 3215
50 ST_CLERK 2785
60 IT_PROG 5760
70 PR_REP 10000
80 SA_MAN 12200
80 SA_REP 8396.55172
90 AD_VP 17000
90 AD_PRES 24000
100 FI_MGR 12000
100 FI_ACCOUNT 7920
110 AC_MGR 12000
110 AC_ACCOUNT 8300
AD_VP 100 17000
AC_MGR 101 12000
FI_MGR 101 12000
HR_REP 101 6500
MK_MAN 100 13000
MK_REP 201 6000
PR_REP 101 10000
PU_MAN 100 11000
SA_MAN 100 12200
SA_REP 145 8500
SA_REP 146 8500
SA_REP 147 7766.66667
SA_REP 148 8650
...
The query in the slide calculates aggregates over two groupings. The table is divided into the following groups:
-Department ID, Job ID
-Job ID, Manager ID
The average salaries for each of these groups are calculated. The results set displays average salary for each of the two groups.
In the output, the group marked as 1 can be interpreted as:(标记为1的说明)
:: The average salary of all employees with the job ID AD_ASST in the department 10 is 4400.
:: The average salary of all employees with the job ID MK_MAN in the department 20 is 13000.
:: The average salary of all employees with the job ID MK_REP in the department 20 is 6000.
:: The average salary of all employees with the job ID ST_CLERK in the department 50 is 2925 and so on.
DEPARTMENT_ID JOB_ID MANAGER_ID AVG(SALARY)
10 AD_ASST 4400
20 MK_MAN 13000 1
20 MK_REP 6000
50 ST_CLERK 2925
...
...
The group marked as 2 in the output is interpreted as:
:: The average salary of all employees with the job ID FI_MGR, who report to the manager with the manager ID 101, is 12000.
:: The average salary of all employees with the job ID HR_REP, who report to the manager with the manager ID 101, is 6500, and so on.
DEPARTMENT_ID JOB_ID MANAGER_ID AVG(SALARY)
------------- ---------- ---------- -----------
FI_MGR 101 12000 2
HR_REP 101 6500
The example in the slide can also be written as:
SELECT department_id, job_id, NULL as manager_id,
AVG(salary) as AVGSAL
FROM employees
GROUP BY department_id, job_id
UNION ALL
SELECT NULL, job_id, manager_id, avg(salary) as AVGSAL
FROM employees
GROUP BY job_id, manager_id;
In the absence of an optimizer that looks across query blocks to generate the execution plan, the preceding query would need two scans of the base table, EMPLOYEES. This could be very inefficient. Hence the usage of the GROUPING SETS statement is recommended.
■复合列(Composite Columns )
复合列是一个作为整体被处理的列集合
:: 为了指定复合列,用 GROUP BY 子句来分组在圆括号内的列,因此,Oracle 服务器在进行 ROLLUP 或 CUBE 操作时将它们作为一个整体来处理
::当使用 ROLLUP 或 CUBE 时,复合列将跳过在确定级别上的集合
A composite column is a collection of columns that are treated as a unit during the computation of groupings. You specify the columns in parentheses(括号) as in the following statement:
ROLLUP (a, (b, c), d) (b,c)就是一个复合列,并且做为一个单元处理.
Here, (b,c) form a composite column and are treated as a unit. In general, composite columns are useful in ROLLUP, CUBE, and GROUPING SETS. For example, in CUBE or ROLLUP, composite columns would mean skipping aggregation across certain levels.
※That is, GROUP BY ROLLUP(a, (b, c))is equivalent to
GROUP BY a, b, c UNION ALL
GROUP BY a UNION ALL
GROUP BY ()
Here, (b, c) are treated as a unit and rollup will not be applied across (b, c). It is as if you have an alias, for example z, for (b, c), and the GROUP BY expression reduces to GROUP BY ROLLUP(a, z).
Note: GROUP BY( ) is typically a SELECT statement with NULL values for the columns a and b and only the aggregate function. This is generally used for generating the grand totals.
SELECT NULL, NULL, aggregate_col
GROUP BY ( );
※
Compare this with the normal ROLLUP as in:
GROUP BY ROLLUP(a, b, c)
which would be
GROUP BY a, b, c UNION ALL
GROUP BY a, b UNION ALL
GROUP BY a UNION ALL
GROUP BY ().
※
Similarly,
GROUP BY CUBE((a, b), c)
would be equivalent to
GROUP BY a, b, c UNION ALL
GROUP BY a, b UNION ALL
GROUP BY c UNION ALL
GROUP By ()
※※
The following table shows grouping sets specification and equivalent GROUP BY specification.
GROUPING SETS Satements Equivalent GROUP BY Statements
GROUP BY a UNION ALL
GROUP BY GROUPING SETS(a,b,c) GROUP BY b UNION ALL
GROUP BY c
GROUP BY GROUPING SETS(a,b,(b,c)) GROUP BY a UNION ALL
(The GROUPING SETS expression has a GROUP BY b UNION ALL
composite column) GROUP BY b,c
GROUP BY GROUPING SETS((a,b,c)) GROUP BY a,b,c
GROUP BY a UNION ALL
GROUP BY GROUPING SETS(a,(b),()) GROUP BY b UNION ALL
GROPY BY ()
GROUP BY GROUPING SETS(a,ROLLUP(b,c)) GROUP BY a UNION ALL
(The GROUPING SETS expression has a GROUP BY ROLLUP(b,c)
composite column)
eg:
1)
SELECT department_id, job_id, manager_id, SUM(salary)
FROM employees
GROUP BY ROLLUP( department_id,(job_id, manager_id));
把(job_id, manager_id)看成一个单元处理.
grouping by :
1. (department_id,job_id,manager_id)
2. (department_id)
3. ( )
2)
Consider the example:
SELECT department_id, job_id,manager_id, SUM(salary)
FROM employees
GROUP BY ROLLUP( department_id,job_id, manager_id);
The preceding query results in the Oracle Server computing the following groupings:
1. (department_id, job_id, manager_id)
2. (department_id, job_id)
3. (department_id)
4. ( ) //所有的sum(salary)
■连接分组
:: 连接分组提供一种简明的方式来生成有用的分组组合
:: 为了指定连接分组集合,用逗号分开多重分组集合, ROLLUP,和 CUBE 操作,以便 Oracle 服务器将它们组合在一个单个的 GROUP BY 子句中
:: 分组是每个分组集合交叉乘积的结果
Concatenated Columns
Concatenated groupings offer a concise way to generate useful combinations of groupings. The concatenated groupings are specified simply by listing multiple grouping sets, cubes, and rollups, and separating them with commas. Here is an example of concatenated grouping sets:
GROUP BY GROUPING SETS(a, b), GROUPING SETS(c, d)
The preceding SQL defines the following groupings:
(a, c), (a, d), (b, c), (b, d)
Concatenation of grouping sets is very helpful for these reasons:
:: Ease of query development: you need not manually enumerate all groupings//不用手动列举所有的组.
:: Use by applications: SQL generated by OLAP applications often involves concatenation of grouping sets, with each grouping set defining groupings needed for a dimension
eg:
SELECT department_id, job_id, manager_id,
SUM(salary)
FROM employees
GROUP BY department_id,ROLLUP(job_id),CUBE(manager_id);
Concatenated Groupings Example
The example in the slide results in the following groupings:
-(department_id, manager_id, job_id )
-(department_id, manager_id)
-(department_id, job_id)
-(department_id)
The total salary for each of these groups is calculated.
The example in the slide displays the following:
:: Total salary for every department, job ID, manager
:: Total salary for every department, manager ID
:: Total salary for every department, job ID
:: Total salary for every department