Hive: finding ids that have records for two specific field values (how many devices visited both pagea and pageb?)

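In Hive, table tb has two columns: uuid is the device id and page is the name of the visited page. Each page visit by a device inserts one row into the table. How many devices have visited both pagea and pageb (counting each device once)?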

My first version was:

SQL1:

select
    count(distinct tb3.uuid)
from
(
    -- devices that visited pagea
    select
        uuid
    from
        tb
    where
        page = 'pagea'
) tb3
inner join
(
    -- devices that visited pageb
    select
        uuid
    from
        tb
    where
        page = 'pageb'
) tb4
on tb3.uuid = tb4.uuid
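
One thing to keep in mind with SQL1: a device with m rows for pagea and n rows for pageb produces m × n joined rows before count(distinct) collapses them. A possible way to avoid that, sketched here with the tb / uuid / page names from the problem statement, is Hive's LEFT SEMI JOIN, which only checks whether a matching row exists. It still reads the table twice, and it has not been run against the data in this post.

```sql
-- Sketch: count devices that have a pagea row and at least one pageb row.
-- With LEFT SEMI JOIN the right-hand table (b) may only be referenced in the ON clause.
select
    count(distinct a.uuid) as device_cnt
from tb a
left semi join tb b
    on a.uuid = b.uuid
   and b.page = 'pageb'
where a.page = 'pagea';
```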


What if the table should be scanned only once, instead of being queried twice and then joined?

hive> desc formatted tmp.test_0630_ord_det ;
OK
# col_name            	data_type           	comment

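cate_id             	string              	category id
sale_order_id       	string              	order id; one order can contain several items, so order ids repeat
item_sku_id         	string              	item SKU
item_sku_name       	string              	item name
sale_price          	double              	unit price of the item in this purchase
sale_qtty           	bigint              	quantity of the item in this purchase
actual_payment      	double              	amount actually paid for the item in this purchase
payment_mode        	string              	payment method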

Find the order ids that bought items from both cate_id = 12 and cate_id = 13. This is the same problem as the one above:

 

SQL2:

select
 count(sale_order_id)
 from (
      select 
      sale_order_id,
      cate_id,
      row_number() over(partition by sale_order_id order by cate_id ) as row_num 
      from tmp.test_0630_ord_det where dt = '2020-07-01' 
      and  cate_id in ( '12','13')
      group by sale_order_id,cate_id
  )t2 where row_num = 2 
 -- returns 2
 
 select
 sale_order_id
 from (
      select 
      sale_order_id,
      cate_id,
      row_number() over(partition by sale_order_id order by cate_id ) as row_num 
      from tmp.test_0630_ord_det where dt = '2020-07-01' 
      and  cate_id in ( '12','13')
      group by sale_order_id,cate_id
  )t2 where row_num = 2 
  order by sale_order_id ; -- result 5

select
 sale_order_id
 from (
      select 
      sale_order_id,
      cate_id,
      row_number() over(partition by sale_order_id ) as row_num  -- order by cate_id can be omitted here: only the existence of a second row matters
      from tmp.test_0630_ord_det where dt = '2020-07-01' 
      and  cate_id in ( '12','13')
      group by sale_order_id,cate_id
  )t2 where row_num = 2 
  order by sale_order_id ;
-- 1	1203233325	
-- 2	1203233331



select
 sale_order_id
 from (
      select 
      sale_order_id,
       -- cate_id,
      row_number() over(partition by sale_order_id ) as row_num 
      from tmp.test_0630_ord_det where dt = '2020-07-01' 
      and  cate_id in ( '12','13')
      group by sale_order_id -- ,cate_id
  )t2 where row_num = 2 
  order by sale_order_id ;
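 -- returns no rows: cate_id cannot be dropped from the group by.
 -- Grouping by sale_order_id alone leaves one row per order, so row_num is never 2.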
 

SQL3:

 select
 sale_order_id
 from (
      select 
      sale_order_id,
      sum(case when cate_id = '12' then 1 else 0 end) as cate_id12_cnt,
      sum(case when cate_id = '13' then 1 else 0 end) as cate_id13_cnt
      from tmp.test_0630_ord_det where dt = '2020-07-01' 
      and  cate_id in ( '12','13')
      group by sale_order_id
  )t2 where cate_id12_cnt >0  and cate_id13_cnt > 0 
  order by sale_order_id ; -- result 8
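
The single-scan idea in SQL3 can probably be written more compactly with HAVING. The two queries below are a sketch and were not run against the data in this post: the first assumes the tmp.test_0630_ord_det table and dt partition filter used above, the second assumes the tb table with uuid and page columns from the original question.

```sql
-- Orders that contain both categories: after filtering to the two
-- categories, an order qualifies when it has two distinct cate_id values.
select
    sale_order_id
from tmp.test_0630_ord_det
where dt = '2020-07-01'
  and cate_id in ('12', '13')
group by sale_order_id
having count(distinct cate_id) = 2
order by sale_order_id;

-- Same pattern for the original question: number of devices that
-- visited both pagea and pageb, reading tb only once.
select
    count(*) as device_cnt
from (
    select
        uuid
    from tb
    where page in ('pagea', 'pageb')
    group by uuid
    having count(distinct page) = 2
) t;
```

The check count(distinct ...) = 2 is only valid because the where clause has already restricted the rows to the two values of interest. Without that filter, any two different pages or categories would satisfy it.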

 

Attempts:
