> 데이터 베이스 > MySQL 튜토리얼 > Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

Hive分析窗口函数(五) GROUPING SETS,GROUPING__ID,CUBE,ROLLUP

WBOY
풀어 주다: 2016-06-07 14:51:08
원래의
1767명이 탐색했습니다.

1.GROUPING SETS与另外哪种方式等价? 2.根据GROUP BY的维度的所有组合进行聚合由哪个关键字完成? 3.ROLLUP与ROLLUP关系是什么? GROUPING SETS,GROUPING__ID,CUBE,ROLLUP这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统

1.GROUPING SETS与另外哪种方式等价?
2.根据GROUP BY的维度的所有组合进行聚合由哪个关键字完成?

3.ROLLUP与ROLLUP关系是什么?


GROUPING SETS,GROUPING__ID,CUBE,ROLLUP 这几个分析函数通常用于OLAP中,不能累加,而且需要根据不同维度上钻和下钻的指标统计,比如,分小时、天、月的UV数。 Hive版本为 apache-hive-0.13.1 数据准备:
    2015-03,2015-03-10,cookie1
    2015-03,2015-03-10,cookie5
    2015-03,2015-03-12,cookie7
    2015-04,2015-04-12,cookie3
    2015-04,2015-04-13,cookie2
    2015-04,2015-04-13,cookie4
    2015-04,2015-04-16,cookie4
    2015-03,2015-03-10,cookie2
    2015-03,2015-03-10,cookie3
    2015-04,2015-04-12,cookie5
    2015-04,2015-04-13,cookie6
    2015-04,2015-04-15,cookie3
    2015-04,2015-04-15,cookie2
    2015-04,2015-04-16,cookie1

    CREATE EXTERNAL TABLE lxw1234 (
    month STRING,
    day STRING,
    cookieid STRING
    ) ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    stored as textfile location '/tmp/lxw11/';


    hive> select * from lxw1234;
    OK
    2015-03 2015-03-10      cookie1
    2015-03 2015-03-10      cookie5
    2015-03 2015-03-12      cookie7
    2015-04 2015-04-12      cookie3
    2015-04 2015-04-13      cookie2
    2015-04 2015-04-13      cookie4
    2015-04 2015-04-16      cookie4
    2015-03 2015-03-10      cookie2
    2015-03 2015-03-10      cookie3
    2015-04 2015-04-12      cookie5
    2015-04 2015-04-13      cookie6
    2015-04 2015-04-15      cookie3
    2015-04 2015-04-15      cookie2
    2015-04 2015-04-16      cookie1
로그인 후 복사

GROUPING SETS
在一个GROUP BY查询中,根据不同的维度组合进行聚合,等价于将不同维度的GROUP BY结果集进行UNION ALL
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    GROUPING SETS (month,day)
    ORDER BY GROUPING__ID;

    month      day            uv      GROUPING__ID
    ------------------------------------------------
    2015-03    NULL            5       1
    2015-04    NULL            6       1
    NULL       2015-03-10      4       2
    NULL       2015-03-12      1       2
    NULL       2015-04-12      2       2
    NULL       2015-04-13      3       2
    NULL       2015-04-15      2       2
    NULL       2015-04-16      2       2


    等价于
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
로그인 후 복사

再如:
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    GROUPING SETS (month,day,(month,day))
    ORDER BY GROUPING__ID;

    month         day             uv      GROUPING__ID
    ------------------------------------------------
    2015-03       NULL            5       1
    2015-04       NULL            6       1
    NULL          2015-03-10      4       2
    NULL          2015-03-12      1       2
    NULL          2015-04-12      2       2
    NULL          2015-04-13      3       2
    NULL          2015-04-15      2       2
    NULL          2015-04-16      2       2
    2015-03       2015-03-10      4       3
    2015-03       2015-03-12      1       3
    2015-04       2015-04-12      2       3
    2015-04       2015-04-13      3       3
    2015-04       2015-04-15      2       3
    2015-04       2015-04-16      2       3


    等价于
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
    UNION ALL
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day
로그인 후 복사

其中的 GROUPING__ID,表示结果属于哪一个分组集合。

CUBE
根据GROUP BY的维度的所有组合进行聚合。
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID
    FROM lxw1234
    GROUP BY month,day
    WITH CUBE
    ORDER BY GROUPING__ID;


    month                              day             uv     GROUPING__ID
    --------------------------------------------
    NULL            NULL            7       0
    2015-03         NULL            5       1
    2015-04         NULL            6       1
    NULL            2015-04-12      2       2
    NULL            2015-04-13      3       2
    NULL            2015-04-15      2       2
    NULL            2015-04-16      2       2
    NULL            2015-03-10      4       2
    NULL            2015-03-12      1       2
    2015-03         2015-03-10      4       3
    2015-03         2015-03-12      1       3
    2015-04         2015-04-16      2       3
    2015-04         2015-04-12      2       3
    2015-04         2015-04-13      3       3
    2015-04         2015-04-15      2       3



    等价于
    SELECT NULL,NULL,COUNT(DISTINCT cookieid) AS uv,0 AS GROUPING__ID FROM lxw1234
    UNION ALL
    SELECT month,NULL,COUNT(DISTINCT cookieid) AS uv,1 AS GROUPING__ID FROM lxw1234 GROUP BY month
    UNION ALL
    SELECT NULL,day,COUNT(DISTINCT cookieid) AS uv,2 AS GROUPING__ID FROM lxw1234 GROUP BY day
    UNION ALL
    SELECT month,day,COUNT(DISTINCT cookieid) AS uv,3 AS GROUPING__ID FROM lxw1234 GROUP BY month,day
로그인 후 복사

ROLLUP
是CUBE的子集,以最左侧的维度为主,从该维度进行层级聚合。
    比如,以month维度进行层级聚合:
    SELECT
    month,
    day,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM lxw1234
    GROUP BY month,day
    WITH ROLLUP
    ORDER BY GROUPING__ID;

    month                              day             uv     GROUPING__ID
    ---------------------------------------------------
    NULL             NULL            7       0
    2015-03          NULL            5       1
    2015-04          NULL            6       1
    2015-03          2015-03-10      4       3
    2015-03          2015-03-12      1       3
    2015-04          2015-04-12      2       3
    2015-04          2015-04-13      3       3
    2015-04          2015-04-15      2       3
    2015-04          2015-04-16      2       3

    可以实现这样的上钻过程:
    月天的UV->月的UV->总UV

复制代码

    --把month和day调换顺序,则以day维度进行层级聚合:

    SELECT
    day,
    month,
    COUNT(DISTINCT cookieid) AS uv,
    GROUPING__ID  
    FROM lxw1234
    GROUP BY day,month
    WITH ROLLUP
    ORDER BY GROUPING__ID;


    day                                month              uv     GROUPING__ID
    -------------------------------------------------------
    NULL            NULL               7       0
    2015-04-13      NULL               3       1
    2015-03-12      NULL               1       1
    2015-04-15      NULL               2       1
    2015-03-10      NULL               4       1
    2015-04-16      NULL               2       1
    2015-04-12      NULL               2       1
    2015-04-12      2015-04            2       3
    2015-03-10      2015-03            4       3
    2015-03-12      2015-03            1       3
    2015-04-13      2015-04            3       3
    2015-04-15      2015-04            2       3
    2015-04-16      2015-04            2       3

    可以实现这样的上钻过程:
    天月的UV->天的UV->总UV
    (这里,根据天和月进行聚合,和根据天聚合结果一样,因为有父子关系,如果是其他维度组合的话,就会不一样)
로그인 후 복사


관련 라벨:
원천:php.cn
본 웹사이트의 성명
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.
인기 튜토리얼
더>
최신 다운로드
더>
웹 효과
웹사이트 소스 코드
웹사이트 자료
프론트엔드 템플릿