Hive string join int -> double / (string + int ) join int -> double

同Java语言一样,Hive也包括 隐式转换(implicit conversions)和显式转换(explicitly conversions)。
  Hive在需要的时候将会对numeric类型的数据进行隐式转换。比如我们对两个不同数据类型的数字进行比较,假如一个数据类型是INT型,另一个 是SMALLINT类型,那么SMALLINT类型的数据将会被隐式转换地转换为INT类型,这个到底和Java中的一样;但是我们不能隐式地将一个 INT类型的数据转换成SMALLINT或TINYINT类型的数据,这将会返回错误,除非你使用了CAST操作。

  任何整数类型都可以隐式地转换成一个范围更大的类型。TINYINT,SMALLINT,INT,BIGINT,FLOAT和STRING都可以隐式 地转换成DOUBLE;是的你没看出,STRING也可以隐式地转换成DOUBLE!但是你要记住,BOOLEAN类型不能转换为其他任何数据类型!

  下标列出了Hive内置的数据类型之间是否可以进行隐式的转换操作:(横着看)

 bltinyintsiintbigintfloatdoubledmstringvctsdateba
booleantruefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalse
tinyintfalsetruetruetruetruetruetruetruetruetruefalsefalsefalse
smallintfalsefalsetruetruetruetruetruetruetruetruefalsefalsefalse
intfalsefalsefalsetruetruetruetruetruetruetruefalsefalsefalse
bigintfalsefalsefalsefalsetruetruetruetruetruetruefalsefalsefalse
floatfalsefalsefalsefalsefalsetruetruetruetruetruefalsefalsefalse
doublefalsefalsefalsefalsefalsefalsetruetruetruetruefalsefalsefalse
decimalfalsefalsefalsefalsefalsefalsefalsetruetruetruefalsefalsefalse
stringfalsefalsefalsefalsefalsefalsetruetruetruetruefalsefalsefalse
varcharfalsefalsefalsefalsefalsefalsetruetruetruetruefalsefalsefalse
tsfalsefalsefalsefalsefalsefalsefalsefalsetruetruetruefalsefalse
datefalsefalsefalsefalsefalsefalsefalsefalsetruetruefalsetruefalse
binaryfalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsefalsetrue

  注:由于表格比较大,这里对一些比较长的字符串进行缩写,ts是timestamp的缩写,bl是boolean的缩写,sl是smallint的缩写,dm是decimal的缩写,vc是varchar的缩写,ba是binary的缩写。

  我们可以用CAST来显式的将一个类型的数据转换成另一个数据类型。如何使用?CAST的语法为cast(value AS TYPE)

https://blog.csdn.net/yimingsilence/article/details/70057638

 

非数字的string转为double后变为NULL,如果不对NULL做处理的话,NULL值会输入到一个Reduce Task中,容易导致数据倾斜!

1 string join int -> double,2个字段都会转化成double(非数字的string值会转化为NULL)

 > explain
    > select t1.c1,t2.c1
    > from (
    > select '1' as c1
    >
    > union all
    > select '2' as c1
    >
    > union all
    > select '34' as c1
    >
    > union all
    > select '3ab4' as c1
    >
    > union all
    > select 'ab' as c1
    >
    > union all
    > select '' as c1
    >
    > union all
    > select ' ' as c1
    >
    > union all
    > select null as c1
    > ) t1
    >
    > join
    >
    > (
    > select 1 as c1
    >
    > union all
    > select 2 as c1
    >
    > ) t2
    > on t1.c1 = t2.c1
    >
    > ;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('1') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '1' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('2') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '2' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('34') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '34' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('3ab4') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '3ab4' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('ab') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 'ab' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(' ') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: ' ' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: UDFToString(null) (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
              Union
                Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: UDFToDouble(_col0) is not null (type: boolean)
                  Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: UDFToDouble(_col0) (type: double)
                    sort order: +
                    Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(1) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                      value expressions: _col0 (type: int)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(2) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                      value expressions: _col0 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 UDFToDouble(_col0) (type: double)
            1 UDFToDouble(_col0) (type: double)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.148 seconds, Fetched: 248 row(s)
hive>

    Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 UDFToDouble(_col0) (type: double)
            1 UDFToDouble(_col0) (type: double)

 

 

2 (string + int ) join int -> double ,2个字段都会转化成double(非数字的string值会转化为NULL)

  > explain
    > select t1.c1,t2.c1
    > from (
    > select 1 as c1
    >
    > union all
    > select 2 as c1
    >
    > union all
    > select '34' as c1
    >
    > union all
    > select '3ab4' as c1
    >
    > union all
    > select 'ab' as c1
    >
    > union all
    > select '' as c1
    >
    > union all
    > select ' ' as c1
    >
    > union all
    > select null as c1
    > ) t1
    >
    > join
    >
    > (
    > select 1 as c1
    >
    > union all
    > select 2 as c1
    >
    > ) t2
    > on t1.c1 = t2.c1
    >
    >
    > ;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(1) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(2) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('34') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '34' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('3ab4') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '3ab4' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('ab') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 'ab' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble('') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(' ') is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: ' ' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                      value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Select Operator
              expressions: UDFToString(null) (type: string)
              outputColumnNames: _col0
              Statistics: Num rows: 1 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
              Union
                Statistics: Num rows: 8 Data size: 184 Basic stats: COMPLETE Column stats: COMPLETE
                Filter Operator
                  predicate: UDFToDouble(_col0) is not null (type: boolean)
                  Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                  Reduce Output Operator
                    key expressions: UDFToDouble(_col0) (type: double)
                    sort order: +
                    Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                    Statistics: Num rows: 4 Data size: 736 Basic stats: COMPLETE Column stats: COMPLETE
                    value expressions: _col0 (type: string)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(1) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                      value expressions: _col0 (type: int)
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: UDFToDouble(2) is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: UDFToDouble(_col0) is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: UDFToDouble(_col0) (type: double)
                      sort order: +
                      Map-reduce partition columns: UDFToDouble(_col0) (type: double)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                      value expressions: _col0 (type: int)
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 UDFToDouble(_col0) (type: double)
            1 UDFToDouble(_col0) (type: double)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 4 Data size: 809 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.763 seconds, Fetched: 248 row(s)
hive>

 

3 string join string -> string

hive> explain
    > select t1.c1,t2.c1
    > from (
    > select '1' as c1
    >
    > union all
    > select '2' as c1
    >
    > ) t1
    >
    > join
    >
    > (
    > select '1' as c1
    >
    > union all
    > select '2' as c1
    >
    > ) t2
    > on t1.c1 = t2.c1
    >
    > ;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: '1' is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '1' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: '2' is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '2' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: '1' is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '1' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: '2' is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: '2' (type: string)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: string)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: string)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0 (type: string)
            1 _col0 (type: string)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.083 seconds, Fetched: 115 row(s)
hive>

 

4 int join int -> int

> explain
    > select t1.c1,t2.c1
    > from (
    > select 1 as c1
    >
    > union all
    > select 2 as c1
    >
    > ) t1
    >
    > join
    >
    > (
    > select 1 as c1
    >
    > union all
    > select 2 as c1
    >
    > ) t2
    > on t1.c1 = t2.c1
    > ;
OK
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: 1 is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: 2 is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: 1 is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 1 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
          TableScan
            alias: _dummy_table
            Row Limit Per Split: 1
            Statistics: Num rows: 1 Data size: 1 Basic stats: COMPLETE Column stats: COMPLETE
            Filter Operator
              predicate: 2 is not null (type: boolean)
              Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
              Select Operator
                expressions: 2 (type: int)
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                Union
                  Statistics: Num rows: 2 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                  Filter Operator
                    predicate: _col0 is not null (type: boolean)
                    Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
                    Reduce Output Operator
                      key expressions: _col0 (type: int)
                      sort order: +
                      Map-reduce partition columns: _col0 (type: int)
                      Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: COMPLETE
      Reduce Operator Tree:
        Join Operator
          condition map:
               Inner Join 0 to 1
          keys:
            0 _col0 (type: int)
            1 _col0 (type: int)
          outputColumnNames: _col0, _col1
          Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.TextInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.156 seconds, Fetched: 115 row(s)
hive>

以上Hive2.0.0

相关推荐
©️2020 CSDN 皮肤主题: 大白 设计师:CSDN官方博客 返回首页