一直在思考,记录一点一滴
分类: Mysql/postgreSQL
2017-06-09 10:24:18
网上搜索 group by,结果一大堆,千篇一律 ……
现在从浅入深的讲一下 group by的应用
group by 简单来讲,就是分组,配合计算使用,计算方法如SUM、count、avg、max、min等…
应用举例 : 从交易表中查询出今天有交易的商户账户。
查询order_amount表中,是否有指定的账户 : 使用group by 就能做到 : select account_id from order_amount
group by account_id;
同样distinct 也能做到 : 使用 distinct 也能做到 : select distinct account_id from
order_amount ;
如上使用 group by 和 distinct 的效率是一样的,如果是查找是否存在,建议使用 order by + limit 1,扫描到第一条数据就完成,效率最高
select
account_id from
order_amount where
account_id =
xxx limit 1,
举例1:select count(distinct account_id) from order_amount ;
举例2:select count(account_id) from order_amount group by account_id;
感兴趣的话,可以自己运行一下。
差异:
group by : 先分组,再计算
distinct :先去重,再计算,distinct 只是去重,不太适合条件和计算类查找
举例,获取一批商户,符合指定条件的最后一个订单详情(如:12点到15点之间,参与秒杀商户的最后一个订单)
方法1:
1、select * from (select * from order_amount where xxx yyy zzz) as sel group by xy
方法2:
2、select * from order_amount group by xy having id = (select max(id) from order_amount where xxx yyy zzz)
注 distinct 无法完成本类需求,只能静静的看着 group by 表演了。
select * from (select * from order_amount where xxx yyy zzz) as sel group by xy
优点:
可以实现功能
缺点:
1、group by 中的,除了算法使用的字段和group by 以外的字段,其它字段的值是随机的,默认获取的是选择查询索引(where或者group by)的第一条符合分组的记录填充的。
2、当子查询的结果非常大的时候,数据库服务器的临时表空间会用完,因此多余的查询数据会丢失
3、子查询生成的临时表,没有索引可用,如果临时表数据很大,则主select语句的效率也很低
4、子查询结果很大的时候,生成临时表的时间也很长
select * from order_amount group by xy having id = (select max(id) from order_amount where xxx yyy zzz)
优点:
可以实现功能,合理的利用了having 语句,查询结果集很小,无临时表空间占满的问题
缺点:
1、效率偏低。
select * from order_amount where id in (select max(id) from order_amount where xxx yyy zzz group by xy)
优点:
可以实现功能,查询结果集很小,无临时表空间占满的问题,效率应该比网友指路要好很多
缺点:
不能说没有缺点,暂时是最好的选择。
1、group by 非计算列,非group by列,如何自行控制?
解决方法:子查询,子查询按需进行排序
select * from (select * from order_amount order by complete_time desc, create_time asc where xxx yyy zzz) as sel group by xy
2、group by 也是优先使用索引。
3、group by 一次可以完成多个函数,可以通过多个字段进行分组
select count(amount) as cnt, SUM(amount) as total_amount, avg(amount) as avg_amount, max(id) as max_id, min(id) as min_id, xy, za, hs from order_amount group by xy, za,hs
4、同时可以使用 with rollup再获取上级汇总
select count(amount) as cnt, SUM(amount) as total_amount, avg(amount) as avg_amount, max(id) as max_id, min(id) as min_id, xy, za, hs from order_amount group by xy, za,hs with rollup
5、group by 之后的结果也可以排序,并非select的条件,且不影响select的结果。
select count(amount) as cnt, SUM(amount) as total_amount, avg(amount) as avg_amount, max(id) as max_id, min(id) as min_id, xy, za, hs from order_amount group by xy asc, za desc,hs asc
6、使用 group by 的时候,难免会用到子查询,一定要严格审视子查询结果的大小和性能