参考:
http: // blog.csdn.net/qiaochao911/article/details/8613988
http: //xm-king.iteye.com/blog/1088422
HIVE分区,实际上是通过一个路径来标识的,而不是在物理数据中。比如每天的数据,可能分区是pt=20121023这样,那么路径中它就会变成:/hdfs/path/pt= 20121023 / data_files。通过路径来标识的好处是,如果我们需要取特定分区的数据,只需要把这个路径下的数据取出来就可以了,不用扫描全部的数据。 HIVE默认是静态分区。但是有时候可能需要动态创建不同的分区,比如商品信息,我想根据它是否在线分成两个分区,这样后续如果要取在线商品,就只需要从在线的分区获取即可。动态分区可以通过下面的设置来打开: [sql] view plaincopy set hive.exec.dynamic.partition=true; set hive.exec.dynamic.partition.mode= nonstrict; 然后代码里就可以这么写: [sql] view plaincopy insert overwrite table tbl_name partition( pt, if_online ) select field1, field2, ..., pt, if_online from tbl where xxx; 注意输入字段的最后面必须是动态分区字段 。 看一下与静态分区写法的区别: [sql] view plaincopy insert overwrite table tbl_name partition( pt =20121023, if_online=1 ) select field1, field2, ..., fieldn from tbl where xxx; 动 态分区与静态分区还有一个细微的差别是,静态分区一定会创建分区,不管SELECT语句的结果有没有数据。而动态分区,只有在SELECT结果的记录数 > 0的时候,才会创建分区 。因此在不同的业务场景下,可能会选择不同的方案。 另外使用动态分区时需要注意的比较重要的一点是,动态分区会为每一个分区分配reduce数。比如说你在脚本上面写了: set mapred.reduce.tasks= 100 ; 并且有两个分区:pt, if_online。如果结果集中pt = 20121023 ,if_online= 0 / 1 ,那么它就会为pt= 20121023 /if_online= 0 ,pt= 20121023 /if_online= 1各分配100个reduce。也就是说,namenode会同时处理200个文件的写操作。这在分区值很多的情况下,会成为一个灾难,容易直接把namenode给搞挂掉,是非常危险的。因此使用动态分区时,一定要清楚地知道产生的动态分区值,并且合理地设置reduce数量。 hive可以通过partitions将表粗粒度划分为不同的目录来提高查询的效率,例如包含 时间戳的日志文件 ,如果我们按照时间来把日志文件分在不同的目录下,那么相同日期的记录就会存储在同一个分区目录下面,那我们就可以更高效率地查询特定某个时间的记录了 hive (economy) > describe stocks; OK exchange string symbol string ymd string price_open float price_high float price_low float price_close float volumn int price_adj_close float Time taken: 0.108 seconds hive (economy) > alter table partition_stocks add if not exists > partition(exchange= ' NASDAQ ' ,symbol= ' ABXA ' ); OK Time taken: 1.834 seconds hive (economy) > show partitions partition_stocks; OK exchange =NASDAQ/symbol= ABXA Time taken: 0.155 seconds hive (economy) > insert overwrite table partition_stocks partition(exchange= ' NASDAQ ' , > symbol= ' ABXA ' ) > select s.ymd,s.price_open,s.price_high,s.price_low,s.price_close,s.volumn, > s.price_adj_close from stocks s > where s.exchange= ' NASDAQ ' and s.symbol= ' ABXA ' ; Total MapReduce jobs = 3 Launching Job 1 out of 3 Number of reduce tasks is set to 0 since there ' s no reduce operator Starting Job = job_201303271617_0021, Tracking URL = http: // localhost:50030/jobdetails.jsp?jobid=job_201303271617_0021 Kill Command = /home/landen/UntarFile/hadoop- 1.0 . 4 /libexec/../bin/hadoop job - kill job_201303271617_0021 Hadoop job information for Stage- 1 : number of mappers: 2 ; number of reducers: 0 2013 - 03 - 31 20 : 50 : 01 , 631 Stage- 1 map = 0 %, reduce = 0 % 2013 - 03 - 31 20 : 50 : 10 , 656 Stage- 1 map = 16 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 11 , 659 Stage- 1 map = 16 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 12 , 662 Stage- 1 map = 16 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 13 , 665 Stage- 1 map = 20 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 14 , 681 Stage- 1 map = 20 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 15 , 684 Stage- 1 map = 20 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 16 , 687 Stage- 1 map = 52 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 17 , 720 Stage- 1 map = 52 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 18 , 723 Stage- 1 map = 52 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 19 , 787 Stage- 1 map = 57 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 20 , 791 Stage- 1 map = 57 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 21 , 797 Stage- 1 map = 57 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 22 , 800 Stage- 1 map = 84 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 23 , 803 Stage- 1 map = 84 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 24 , 807 Stage- 1 map = 84 %, reduce = 0 %, Cumulative CPU 8.34 sec 2013 - 03 - 31 20 : 50 : 25 , 837 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 26 , 841 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 27 , 844 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 28 , 848 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 29 , 852 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 30 , 884 Stage- 1 map = 100 %, reduce = 0 %, Cumulative CPU 21.0 sec 2013 - 03 - 31 20 : 50 : 31 , 888 Stage- 1 map = 100 %, reduce = 100 %, Cumulative CPU 21.0 sec MapReduce Total cumulative CPU time: 21 seconds 0 msec Ended Job = job_201303271617_0021 Ended Job = 61034062 , job is filtered out (removed at runtime). Ended Job = 914476855 , job is filtered out (removed at runtime). Launching Job 3 out of 3 Number of reduce tasks is set to 0 since there ' s no reduce operator Starting Job = job_201303271617_0022, Tracking URL = http: // localhost:50030/jobdetails.jsp?jobid=job_201303271617_0022 Kill Command = /home/landen/UntarFile/hadoop- 1.0 . 4 /libexec/../bin/hadoop job - kill job_201303271617_0022 Hadoop job information for Stage- 3 : number of mappers: 1 ; number of reducers: 0 2013 - 03 - 31 20 : 50 : 40 , 727 Stage- 3 map = 0 %, reduce = 0 % 2013 - 03 - 31 20 : 50 : 46 , 740 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 47 , 743 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 48 , 746 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 49 , 749 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 50 , 753 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 51 , 760 Stage- 3 map = 100 %, reduce = 0 %, Cumulative CPU 1.05 sec 2013 - 03 - 31 20 : 50 : 52 , 763 Stage- 3 map = 100 %, reduce = 100 %, Cumulative CPU 1.05 sec MapReduce Total cumulative CPU time: 1 seconds 50 msec Ended Job = job_201303271617_0022 Loading data to table economy.partition_stocks partition (exchange =NASDAQ, symbol= ABXA) Deleted hdfs: // localhost:9000/home/landen/UntarFile/hive-0.10.0/user/hive/warehouse/economy.db/partition_stocks/exchange=NASDAQ/symbol=ABXA Partition economy.partition_stocks{exchange=NASDAQ, symbol=ABXA} stats: [num_files: 1 , num_rows: 0 , total_size: 67679 , raw_data_size: 0 ] Table economy.partition_stocks stats: [num_partitions: 1 , num_files: 1 , num_rows: 0 , total_size: 67679 , raw_data_size: 0 ] 1608 Rows loaded to partition_stocks MapReduce Jobs Launched: Job 0 : Map: 2 Cumulative CPU: 21.0 sec HDFS Read: 481098497 HDFS Write: 67679 SUCCESS Job 1 : Map: 1 Cumulative CPU: 1.05 sec HDFS Read: 67989 HDFS Write: 67679 SUCCESS Total MapReduce CPU Time Spent: 22 seconds 50 msec OK Time taken: 66.143 seconds hive (economy) > select * from partition_stocks > where exchange = ' NASDAQ ' and symbol = ' ABXA ' ; 2005 - 03 - 16 7.97 7.97 7.91 7.92 300200 7.92 NASDAQ ABXA 2005 - 03 - 15 7.92 7.97 7.92 7.97 42500 7.97 NASDAQ ABXA 2005 - 03 - 14 7.93 7.99 7.92 7.92 109700 7.92 NASDAQ ABXA 2005 - 03 - 11 7.96 7.96 7.91 7.94 188200 7.94 NASDAQ ABXA 2005 - 03 - 10 7.99 8.0 7.93 7.96 1275900 7.96 NASDAQ ABXA 2005 - 03 - 09 8.01 8.02 7.78 8.0 198600 8.0 NASDAQ ABXA 2005 - 03 - 08 8.1 8.1 8.0 8.01 81600 8.01 NASDAQ ABXA 2005 - 03 - 07 8.05 8.11 8.05 8.11 70300 8.11 NASDAQ ABXA 2005 - 03 - 04 8.19 8.19 8.05 8.12 116300 8.12 NASDAQ ABXA 2005 - 03 - 03 8.09 8.2 8.09 8.2 64200 8.2 NASDAQ ABXA 2005 - 03 - 02 8.29 8.45 8.18 8.2 197600 8.2 NASDAQ ABXA 2005 - 03 - 01 8.07 8.35 8.06 8.25 332000 8.25 NASDAQ ABXA 2005 - 02 - 28 8.0 8.09 7.81 8.05 241000 8.05 NASDAQ ABXA 2005 - 02 - 25 7.55 8.3 7.55 8.1 1114100 8.1 NASDAQ ABXA 2005 - 02 - 24 7.01 7.3 7.0 7.3 174100 7.3 NASDAQ ABXA 2005 - 02 - 23 7.22 7.22 6.9 7.06 542100 7.06 NASDAQ ABXA 2005 - 02 - 22 7.56 7.56 7.25 7.26 209000 7.26 NASDAQ ABXA 2005 - 02 - 18 7.7 7.8 7.58 7.58 1525500 7.58 NASDAQ ABXA 2005 - 02 - 17 7.41 7.7 7.41 7.65 75200 7.65 NASDAQ ABXA 2005 - 02 - 16 7.51 7.51 7.38 7.45 1297600 7.45 NASDAQ ABXA 2005 - 02 - 15 7.58 7.61 7.41 7.56 248400 7.56 NASDAQ ABXA 2005 - 02 - 14 7.78 7.81 7.53 7.6 1487200 7.6 NASDAQ ABXA 2005 - 02 - 11 7.92 7.92 7.73 7.88 265400 7.88 NASDAQ ABXA 2005 - 02 - 10 8.03 8.03 7.91 7.93 205100 7.93 NASDAQ ABXA
Now Let's learn about dynamic partitions properties:
1. hive.exec.dynamic.partition = false by default. Set to true to enable dynamic partitioning;
2. hive.exec.dynamic.partition.mode = strict by default. Set to nonstrict to enable all partitions to be determined dynamiclly;
3. hive.exec.max.dynamic.partition.pernode = 100 by default. The maximum number of dynamic partitions that can be created by mapper or reducer. Raises a fatal error if
one mapper or reducer attempts to create more than the threshold;
4. hive.exec.max.dynamic.partitions = +1000 by default. The total number of dynamic partitions that can be created by one statement with dynamic partitioning. Raises a fatal error
if the limit is exceeded;
5. hive.exec.max.created.files = 100000 by default. The maximum total number of files that can be created globally. A Hadoop counter is used to track the number of files created.
Raises a fatal error if the limit is exceeded.