当今的直播行业越来越火爆,相对应的需求就越来越多,如今天的:求直播同时在线人数的最大值
原始数据
0: jdbc:hive2://localhost:10000> select * from online_problem;
+--------------------+----------------------+----------------------+
| online_problem.id | online_problem.stt | online_problem.edt |
+--------------------+----------------------+----------------------+
| 1001 | 2021-06-14 12:12:12 | 2021-06-14 18:12:12 |
| 1003 | 2021-06-14 13:12:12 | 2021-06-14 16:12:12 |
| 1004 | 2021-06-14 13:15:12 | 2021-06-14 20:12:12 |
| 1002 | 2021-06-14 15:12:12 | 2021-06-14 16:12:12 |
| 1005 | 2021-06-14 15:18:12 | 2021-06-14 20:12:12 |
| 1001 | 2021-06-14 20:12:12 | 2021-06-14 23:12:12 |
| 1006 | 2021-06-14 21:12:12 | 2021-06-14 23:15:12 |
| 1007 | 2021-06-14 22:12:12 | 2021-06-14 23:10:12 |
+--------------------+----------------------+----------------------+
8 rows selected (0.192 seconds)
需求分析
乍一看对于这种静态的数据似乎无从下手,但是要学会跳出问题,我们从流式计算的角度看问题,在流式计算中我们如何求当前在线人数呢?其实很简单,只要主播上线就 +1,有主播下线就 -1。那对于上述数据我们可以按上线下线时间拆分同时拓展一个字段,如果是上线就是 1,下线就是 -1,最后按照时间排序按行求和即可找出最大的直播人数
SQL 实现
拆分数据并拓展列
0: jdbc:hive2://localhost:10000> select id, stt dt, 1 flag
. . . . . . . . . . . . . . . .> from online_problem t
. . . . . . . . . . . . . . . .> union
. . . . . . . . . . . . . . . .> select id, edt dt, -1 flag
. . . . . . . . . . . . . . . .> from online_problem t;
+---------+----------------------+-----------+
| _u1.id | _u1.dt | _u1.flag |
+---------+----------------------+-----------+
| 1001 | 2021-06-14 12:12:12 | 1 |
| 1001 | 2021-06-14 18:12:12 | -1 |
| 1001 | 2021-06-14 20:12:12 | 1 |
| 1001 | 2021-06-14 23:12:12 | -1 |
| 1002 | 2021-06-14 15:12:12 | 1 |
| 1002 | 2021-06-14 16:12:12 | -1 |
| 1003 | 2021-06-14 13:12:12 | 1 |
| 1003 | 2021-06-14 16:12:12 | -1 |
| 1004 | 2021-06-14 13:15:12 | 1 |
| 1004 | 2021-06-14 20:12:12 | -1 |
| 1005 | 2021-06-14 15:18:12 | 1 |
| 1005 | 2021-06-14 20:12:12 | -1 |
| 1006 | 2021-06-14 21:12:12 | 1 |
| 1006 | 2021-06-14 23:15:12 | -1 |
| 1007 | 2021-06-14 22:12:12 | 1 |
| 1007 | 2021-06-14 23:10:12 | -1 |
+---------+----------------------+-----------+
16 rows selected (14.63 seconds)
按时间排序,累加flag到当前行
0: jdbc:hive2://localhost:10000> select id,dt,sum(flag) over (order by dt) crs
. . . . . . . . . . . . . . . .> from (
. . . . . . . . . . . . . . . .> select id, stt dt, 1 flag
. . . . . . . . . . . . . . . .> from online_problem t
. . . . . . . . . . . . . . . .> union
. . . . . . . . . . . . . . . .> select id, edt dt, -1 flag
. . . . . . . . . . . . . . . .> from online_problem t
. . . . . . . . . . . . . . . .> ) t1;
+-------+----------------------+------+
| id | dt | crs |
+-------+----------------------+------+
| 1001 | 2021-06-14 12:12:12 | 1 |
| 1003 | 2021-06-14 13:12:12 | 2 |
| 1004 | 2021-06-14 13:15:12 | 3 |
| 1002 | 2021-06-14 15:12:12 | 4 |
| 1005 | 2021-06-14 15:18:12 | 5 |
| 1003 | 2021-06-14 16:12:12 | 3 |
| 1002 | 2021-06-14 16:12:12 | 3 |
| 1001 | 2021-06-14 18:12:12 | 2 |
| 1005 | 2021-06-14 20:12:12 | 1 |
| 1001 | 2021-06-14 20:12:12 | 1 |
| 1004 | 2021-06-14 20:12:12 | 1 |
| 1006 | 2021-06-14 21:12:12 | 2 |
| 1007 | 2021-06-14 22:12:12 | 3 |
| 1007 | 2021-06-14 23:10:12 | 2 |
| 1001 | 2021-06-14 23:12:12 | 1 |
| 1006 | 2021-06-14 23:15:12 | 0 |
+-------+----------------------+------+
16 rows selected (31.464 seconds)
从上面这个结果我们可以看出主播的上下线情况,那同时在线的最大任务就是 max(crs) 了
0: jdbc:hive2://localhost:10000> select max(crs)
. . . . . . . . . . . . . . . .> from (
. . . . . . . . . . . . . . . .> select id, dt, sum(flag) over (order by dt) crs
. . . . . . . . . . . . . . . .> from (
. . . . . . . . . . . . . . . .> select id, stt dt, 1 flag
. . . . . . . . . . . . . . . .> from online_problem t
. . . . . . . . . . . . . . . .> union
. . . . . . . . . . . . . . . .> select id, edt dt, -1 flag
. . . . . . . . . . . . . . . .> from online_problem t
. . . . . . . . . . . . . . . .> ) t1
. . . . . . . . . . . . . . . .> ) t2;
+------+
| _c0 |
+------+
| 5 |
+------+
1 row selected (41.981 seconds)
SQL 系列文章可能就要告一段落了!完结撒花!!!
评论区