Chinaunix首页 | 论坛 | 博客
  • 博客访问: 4187797
  • 博文数量: 240
  • 博客积分: 11504
  • 博客等级: 上将
  • 技术积分: 4277
  • 用 户 组: 普通用户
  • 注册时间: 2006-12-28 14:24
文章分类

全部博文(240)

分类: Python/Ruby

2014-05-14 09:26:13

大部分数据库都提供了窗口函数,比如RANK,ROW_NUMBER等等。 MySQL 这方面没有直接提供,但是可以变相的实现,我以前写了row_number 的实现,今天有时间把 rank 的实现贴出来。


这里,我用MySQL 以及Python 分别实现了rank 窗口函数。
原始表信息:

点击(此处)折叠或打开

  1. t_girl=# \d group_concat;
  2.            Table "ytt.group_concat"
  3.   Column | Type | Modifiers
  4. ----------+-----------------------+-----------
  5.  rank | integer |
  6.  username | character varying(20) |


表数据

点击(此处)折叠或打开

  1. t_girl=# select * from group_concat;
  2.  rank | username
  3. ------+----------
  4.   100 | Lucy
  5.   127 | Lucy
  6.   146 | Lucy
  7.   137 | Lucy
  8.   104 | Lucy
  9.   121 | Lucy
  10.   136 | Lily
  11.   100 | Lily
  12.   100 | Lily
  13.   105 | Lily
  14.   136 | Lily
  15.   149 | ytt
  16.   116 | ytt
  17.   116 | ytt
  18.   149 | ytt
  19.   106 | ytt
  20.   117 | ytt
  21. (17 rows)


  22. Time: 0.638 ms

PostgreSQL 的rank 窗口函数示例:

点击(此处)折叠或打开

  1. t_girl=# select username,rank,rank() over(partition by username order by rank desc) as rank_cnt from group_concat;
  2.  username | rank | rank_cnt
  3. ----------+------+----------
  4.  Lily | 136 | 1
  5.  Lily | 136 | 1
  6.  Lily | 105 | 3
  7.  Lily | 100 | 4
  8.  Lily | 100 | 4
  9.  Lucy | 146 | 1
  10.  Lucy | 137 | 2
  11.  Lucy | 127 | 3
  12.  Lucy | 121 | 4
  13.  Lucy | 104 | 5
  14.  Lucy | 100 | 6
  15.  ytt | 149 | 1
  16.  ytt | 149 | 1
  17.  ytt | 117 | 3
  18.  ytt | 116 | 4
  19.  ytt | 116 | 4
  20.  ytt | 106 | 6
  21. (17 rows)


  22. Time: 131.150 ms

MySQL 提供了group_concat 聚合函数可以变相的实现:

点击(此处)折叠或打开

  1. mysql>
  2. select a.username, a.rank, find_in_set(a.rank,b.rank_gp) as rank_cnt
  3. from group_concat as a ,
  4. (select username,group_concat(rank order by rank desc separator ',') as rank_gp from group_concat group by username
  5. ) b
  6. where a.username = b.username order by a.username asc,a.rank desc;
  7. +----------+------+----------+
  8. | username | rank | rank_cnt |
  9. +----------+------+----------+
  10. | Lily | 136 | 1 |
  11. | Lily | 136 | 1 |
  12. | Lily | 105 | 3 |
  13. | Lily | 100 | 4 |
  14. | Lily | 100 | 4 |
  15. | Lucy | 146 | 1 |
  16. | Lucy | 137 | 2 |
  17. | Lucy | 127 | 3 |
  18. | Lucy | 121 | 4 |
  19. | Lucy | 104 | 5 |
  20. | Lucy | 100 | 6 |
  21. | ytt | 149 | 1 |
  22. | ytt | 149 | 1 |
  23. | ytt | 117 | 3 |
  24. | ytt | 116 | 4 |
  25. | ytt | 116 | 4 |
  26. | ytt | 106 | 6 |
  27. +----------+------+----------+
  28. 17 rows in set (0.02 sec)

当然了,如果MySQL SQL不太熟悉,可以用程序来处理,比如我下面用python 实现了rank 函数,执行结果如下:(脚本源代码最后



点击(此处)折叠或打开

  1. >>> ================================ RESTART ================================
  2. >>>
  3.  username | rank | rank_cnt
  4. --------------------------------
  5. ytt |149 |1
  6. ytt |149 |1
  7. ytt |117 |3
  8. ytt |116 |4
  9. ytt |116 |4
  10. ytt |106 |6
  11. Lucy |146 |1
  12. Lucy |137 |2
  13. Lucy |127 |3
  14. Lucy |121 |4
  15. Lucy |104 |5
  16. Lucy |100 |6
  17. Lily |136 |1
  18. Lily |136 |2
  19. Lily |105 |3
  20. Lily |100 |4
  21. Lily |100 |4
  22. (17 Rows.)
  23. Time: 0.162 Seconds.



附上脚本代码:

点击(此处)折叠或打开

  1. from __future__ import print_function
  2. from datetime import date, datetime, timedelta
  3. import mysql.connector
  4. import time
  5. # Created by ytt 2014/5/14.
  6. # Rank function implement.
  7. def db_connect(is_true):
  8.     cnx = mysql.connector.connect(host='192.168.1.131',port='3306',user='python_user', password='python_user',database='t_girl',autocommit=is_true)
  9.     return cnx
  10. def db_rs_rank(c1 ='username desc' ,c2 = ' rank desc'):
  11.     # c1: partition column.
  12.     # c2: sort column.
  13.     time_start = time.time()
  14.     cnx = db_connect(True)
  15.     rs = cnx.cursor()
  16.     query0 = "select username,rank from group_concat order by " + c1 + ", " + c2
  17.     rs.execute(query0,multi=False)
  18.     if rs.with_rows:
  19.         rows = rs.fetchall()
  20.     else:
  21.         return "No rows affected."
  22.     i = 0
  23.     j = 0
  24.     k = 1
  25.     result = []
  26.     field1_compare = rows[0][0]
  27.     field2_compare = rows[0][1]
  28.     while i < len(rows):
  29.         if field1_compare == rows[i][0]:
  30.             j += 1
  31.             if field2_compare != rows[i][1]:
  32.                 field2_compare =rows[i][1]
  33.                 k = j
  34.             result.append((rows[i][0],rows[i][1],k))
  35.         else:
  36.             j = 1
  37.             k = 1
  38.             field1_compare = rows[i][0]
  39.             result.append((rows[i][0],rows[i][1],k))
  40.         i += 1
  41.     i = 0
  42.     rows_header = list(rs.column_names)
  43.     rows_header.append('rank_cnt')
  44.     print (rows_header[0].center(10,' ') + '|' + rows_header[1].center(10,' ') + '|' + rows_header[2].center(10,' '))
  45.     print ('-'.center(32,'-'))
  46.     while i < len(result):
  47.         print (result[i][0].ljust(10,' ') + '|' + str(result[i][1]).ljust(10,' ') + '|' + str(result[i][2]).ljust(10,' '))
  48.         i += 1
  49.     rs.close()
  50.     cnx.close()
  51.     time_end = time.time()
  52.     print ('(' + str(len(rows))+ ' Rows.)')
  53.     print ("Time:" + str(round((time_end-time_start),3)).rjust(10,' ') + ' Seconds.')
  54. if __name__=='__main__':
  55.     db_rs_rank()



阅读(6988) | 评论(0) | 转发(0) |
给主人留下些什么吧!~~