参考:
http:
//
blog.csdn.net/qiaochao911/article/details/8613988
http:
//xm-king.iteye.com/blog/1088422
HIVE分区,实际上是通过一个路径来标识的,而不是在物理数据中。比如每天的数据,可能分区是pt=20121023这样,那么路径中它就会变成:/hdfs/path/pt=
20121023
/
data_files。通过路径来标识的好处是,如果我们需要取特定分区的数据,只需要把这个路径下的数据取出来就可以了,不用扫描全部的数据。
HIVE默认是静态分区。但是有时候可能需要动态创建不同的分区,比如商品信息,我想根据它是否在线分成两个分区,这样后续如果要取在线商品,就只需要从在线的分区获取即可。动态分区可以通过下面的设置来打开:
[sql] view plaincopy
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=
nonstrict;
然后代码里就可以这么写:
[sql] view plaincopy
insert overwrite table tbl_name partition(
pt, if_online
)
select
field1, field2, ...,
pt, if_online
from
tbl
where
xxx;
注意输入字段的最后面必须是动态分区字段
。
看一下与静态分区写法的区别:
[sql] view plaincopy
insert overwrite table tbl_name partition(
pt
=20121023, if_online=1
)
select
field1, field2, ..., fieldn
from
tbl
where
xxx;
动
态分区与静态分区还有一个细微的差别是,静态分区一定会创建分区,不管SELECT语句的结果有没有数据。而动态分区,只有在SELECT结果的记录数
>
0的时候,才会创建分区
。因此在不同的业务场景下,可能会选择不同的方案。
另外使用动态分区时需要注意的比较重要的一点是,动态分区会为每一个分区分配reduce数。比如说你在脚本上面写了:
set
mapred.reduce.tasks=
100
;
并且有两个分区:pt, if_online。如果结果集中pt
=
20121023
,if_online=
0
/
1
,那么它就会为pt=
20121023
/if_online=
0
,pt=
20121023
/if_online=
1各分配100个reduce。也就是说,namenode会同时处理200个文件的写操作。这在分区值很多的情况下,会成为一个灾难,容易直接把namenode给搞挂掉,是非常危险的。因此使用动态分区时,一定要清楚地知道产生的动态分区值,并且合理地设置reduce数量。
hive可以通过partitions将表粗粒度划分为不同的目录来提高查询的效率,例如包含
时间戳的日志文件
,如果我们按照时间来把日志文件分在不同的目录下,那么相同日期的记录就会存储在同一个分区目录下面,那我们就可以更高效率地查询特定某个时间的记录了
hive (economy)
>
describe stocks;
OK
exchange
string
symbol
string
ymd
string
price_open
float
price_high
float
price_low
float
price_close
float
volumn
int
price_adj_close
float
Time taken:
0.108
seconds
hive (economy)
> alter table partition_stocks add
if
not exists
> partition(exchange=
'
NASDAQ
'
,symbol=
'
ABXA
'
);
OK
Time taken:
1.834
seconds
hive (economy)
>
show partitions partition_stocks;
OK
exchange
=NASDAQ/symbol=
ABXA
Time taken:
0.155
seconds
hive (economy)
> insert overwrite table partition_stocks partition(exchange=
'
NASDAQ
'
,
> symbol=
'
ABXA
'
)
>
select
s.ymd,s.price_open,s.price_high,s.price_low,s.price_close,s.volumn,
> s.price_adj_close
from
stocks s
>
where
s.exchange=
'
NASDAQ
'
and s.symbol=
'
ABXA
'
;
Total MapReduce jobs
=
3
Launching Job
1
out
of
3
Number of reduce tasks
is
set
to
0
since there
'
s no reduce operator
Starting Job = job_201303271617_0021, Tracking URL = http:
//
localhost:50030/jobdetails.jsp?jobid=job_201303271617_0021
Kill Command = /home/landen/UntarFile/hadoop-
1.0
.
4
/libexec/../bin/hadoop job -
kill job_201303271617_0021
Hadoop job information
for
Stage-
1
: number of mappers:
2
; number of reducers:
0
2013
-
03
-
31
20
:
50
:
01
,
631
Stage-
1
map =
0
%, reduce =
0
%
2013
-
03
-
31
20
:
50
:
10
,
656
Stage-
1
map =
16
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
11
,
659
Stage-
1
map =
16
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
12
,
662
Stage-
1
map =
16
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
13
,
665
Stage-
1
map =
20
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
14
,
681
Stage-
1
map =
20
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
15
,
684
Stage-
1
map =
20
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
16
,
687
Stage-
1
map =
52
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
17
,
720
Stage-
1
map =
52
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
18
,
723
Stage-
1
map =
52
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
19
,
787
Stage-
1
map =
57
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
20
,
791
Stage-
1
map =
57
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
21
,
797
Stage-
1
map =
57
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
22
,
800
Stage-
1
map =
84
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
23
,
803
Stage-
1
map =
84
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
24
,
807
Stage-
1
map =
84
%, reduce =
0
%, Cumulative CPU
8.34
sec
2013
-
03
-
31
20
:
50
:
25
,
837
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
26
,
841
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
27
,
844
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
28
,
848
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
29
,
852
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
30
,
884
Stage-
1
map =
100
%, reduce =
0
%, Cumulative CPU
21.0
sec
2013
-
03
-
31
20
:
50
:
31
,
888
Stage-
1
map =
100
%, reduce =
100
%, Cumulative CPU
21.0
sec
MapReduce Total cumulative CPU time:
21
seconds
0
msec
Ended Job
=
job_201303271617_0021
Ended Job
=
61034062
, job
is
filtered
out
(removed at runtime).
Ended Job
=
914476855
, job
is
filtered
out
(removed at runtime).
Launching Job
3
out
of
3
Number of reduce tasks
is
set
to
0
since there
'
s no reduce operator
Starting Job = job_201303271617_0022, Tracking URL = http:
//
localhost:50030/jobdetails.jsp?jobid=job_201303271617_0022
Kill Command = /home/landen/UntarFile/hadoop-
1.0
.
4
/libexec/../bin/hadoop job -
kill job_201303271617_0022
Hadoop job information
for
Stage-
3
: number of mappers:
1
; number of reducers:
0
2013
-
03
-
31
20
:
50
:
40
,
727
Stage-
3
map =
0
%, reduce =
0
%
2013
-
03
-
31
20
:
50
:
46
,
740
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
47
,
743
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
48
,
746
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
49
,
749
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
50
,
753
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
51
,
760
Stage-
3
map =
100
%, reduce =
0
%, Cumulative CPU
1.05
sec
2013
-
03
-
31
20
:
50
:
52
,
763
Stage-
3
map =
100
%, reduce =
100
%, Cumulative CPU
1.05
sec
MapReduce Total cumulative CPU time:
1
seconds
50
msec
Ended Job
=
job_201303271617_0022
Loading data to table economy.partition_stocks partition (exchange
=NASDAQ, symbol=
ABXA)
Deleted hdfs:
//
localhost:9000/home/landen/UntarFile/hive-0.10.0/user/hive/warehouse/economy.db/partition_stocks/exchange=NASDAQ/symbol=ABXA
Partition economy.partition_stocks{exchange=NASDAQ, symbol=ABXA} stats: [num_files:
1
, num_rows:
0
, total_size:
67679
, raw_data_size:
0
]
Table economy.partition_stocks stats: [num_partitions:
1
, num_files:
1
, num_rows:
0
, total_size:
67679
, raw_data_size:
0
]
1608
Rows loaded to partition_stocks
MapReduce Jobs Launched:
Job
0
: Map:
2
Cumulative CPU:
21.0
sec HDFS Read:
481098497
HDFS Write:
67679
SUCCESS
Job
1
: Map:
1
Cumulative CPU:
1.05
sec HDFS Read:
67989
HDFS Write:
67679
SUCCESS
Total MapReduce CPU Time Spent:
22
seconds
50
msec
OK
Time taken:
66.143
seconds
hive (economy)
>
select
*
from
partition_stocks
>
where
exchange =
'
NASDAQ
'
and symbol =
'
ABXA
'
;
2005
-
03
-
16
7.97
7.97
7.91
7.92
300200
7.92
NASDAQ ABXA
2005
-
03
-
15
7.92
7.97
7.92
7.97
42500
7.97
NASDAQ ABXA
2005
-
03
-
14
7.93
7.99
7.92
7.92
109700
7.92
NASDAQ ABXA
2005
-
03
-
11
7.96
7.96
7.91
7.94
188200
7.94
NASDAQ ABXA
2005
-
03
-
10
7.99
8.0
7.93
7.96
1275900
7.96
NASDAQ ABXA
2005
-
03
-
09
8.01
8.02
7.78
8.0
198600
8.0
NASDAQ ABXA
2005
-
03
-
08
8.1
8.1
8.0
8.01
81600
8.01
NASDAQ ABXA
2005
-
03
-
07
8.05
8.11
8.05
8.11
70300
8.11
NASDAQ ABXA
2005
-
03
-
04
8.19
8.19
8.05
8.12
116300
8.12
NASDAQ ABXA
2005
-
03
-
03
8.09
8.2
8.09
8.2
64200
8.2
NASDAQ ABXA
2005
-
03
-
02
8.29
8.45
8.18
8.2
197600
8.2
NASDAQ ABXA
2005
-
03
-
01
8.07
8.35
8.06
8.25
332000
8.25
NASDAQ ABXA
2005
-
02
-
28
8.0
8.09
7.81
8.05
241000
8.05
NASDAQ ABXA
2005
-
02
-
25
7.55
8.3
7.55
8.1
1114100
8.1
NASDAQ ABXA
2005
-
02
-
24
7.01
7.3
7.0
7.3
174100
7.3
NASDAQ ABXA
2005
-
02
-
23
7.22
7.22
6.9
7.06
542100
7.06
NASDAQ ABXA
2005
-
02
-
22
7.56
7.56
7.25
7.26
209000
7.26
NASDAQ ABXA
2005
-
02
-
18
7.7
7.8
7.58
7.58
1525500
7.58
NASDAQ ABXA
2005
-
02
-
17
7.41
7.7
7.41
7.65
75200
7.65
NASDAQ ABXA
2005
-
02
-
16
7.51
7.51
7.38
7.45
1297600
7.45
NASDAQ ABXA
2005
-
02
-
15
7.58
7.61
7.41
7.56
248400
7.56
NASDAQ ABXA
2005
-
02
-
14
7.78
7.81
7.53
7.6
1487200
7.6
NASDAQ ABXA
2005
-
02
-
11
7.92
7.92
7.73
7.88
265400
7.88
NASDAQ ABXA
2005
-
02
-
10
8.03
8.03
7.91
7.93
205100
7.93
NASDAQ ABXA
Now Let's learn about dynamic partitions properties:
1. hive.exec.dynamic.partition = false by default. Set to true to enable dynamic partitioning;
2. hive.exec.dynamic.partition.mode = strict by default. Set to nonstrict to enable all partitions to be determined dynamiclly;
3. hive.exec.max.dynamic.partition.pernode = 100 by default. The maximum number of dynamic partitions that can be created by mapper or reducer. Raises a fatal error if
one mapper or reducer attempts to create more than the threshold;
4. hive.exec.max.dynamic.partitions = +1000 by default. The total number of dynamic partitions that can be created by one statement with dynamic partitioning. Raises a fatal error
if the limit is exceeded;
5. hive.exec.max.created.files = 100000 by default. The maximum total number of files that can be created globally. A Hadoop counter is used to track the number of files created.
Raises a fatal error if the limit is exceeded.

