Category: LINUX

2010-01-17 00:03:37

How to store Hive's metadata in Oracle

By default, Hive stores its metadata in an embedded Derby database. In real-world scenarios, where multiple Hive queries are executed from multiple sessions, a database server such as MySQL, Oracle, or a networked Derby server is required. For this example I will use Oracle. Before starting, make sure the cluster is down.


1) Create the Oracle user:

SQL> create user hive identified by hive
     default tablespace hive;
SQL> grant dba to hive;

(Granting DBA keeps the example simple, but it is far too broad for production; grant only the privileges the metastore actually needs, such as CONNECT and RESOURCE.)

2) Edit hive/conf/hive-default.xml. Only the metastore settings (hive.metastore.local, the javax.jdo.* connection properties, and the datanucleus.* properties) need to change to point at Oracle; the remaining properties keep their stock defaults:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out the number of reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>Max number of reducers that will be used. If the value specified in the configuration parameter mapred.reduce.tasks is negative, Hive will use this as the max number of reducers when automatically determining the number of reducers.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>Whether Hive is running in test mode. If yes, it turns on sampling and prefixes the output table name.</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>If Hive is running in test mode, prefixes the output table by this string</description>
</property>

<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>If Hive is running in test mode and the table is not bucketed, sampling frequency</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>If Hive is running in test mode, don't sample the above comma-separated list of tables</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
  <description>Class implementing the JDO persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>Detaches all objects from the session so that they can be used after the transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>Reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against the metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>Password to use against the metastore database</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>Metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
  <description>Creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once.</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>Throw an exception if the metadata tables are incorrect</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description></description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>true</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of the default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to the metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
  <description>Default file format for CREATE TABLE statements. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override.</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
  <description>Whether to use map-side aggregation in Hive GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in the data, to optimize GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.mapaggr.checkinterval</name>
  <value>100000</value>
  <description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>

<property>
  <name>hive.mapred.local.mem</name>
  <value>0</value>
  <description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
  <description>Portion of total memory to be used by the map-side group aggregation hash table</description>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
  <description>Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.</description>
</property>

<property>
  <name>hive.optimize.cp</name>
  <value>true</value>
  <description>Whether to enable the column pruner</description>
</property>

<property>
  <name>hive.optimize.ppd</name>
  <value>true</value>
  <description>Whether to enable predicate pushdown</description>
</property>

<property>
  <name>hive.optimize.pruner</name>
  <value>true</value>
  <description>Whether to enable the new partition pruner, which depends on predicate pushdown. If this is disabled, the old partition pruner, which is based on the AST, will be used.</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
  <description>How many rows in the right-most join operand Hive should buffer before emitting the join result</description>
</property>

<property>
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>The mode in which Hive operations are performed. In strict mode, some risky queries are not allowed to run.</description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
  <description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.</description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
  <description>This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*.</description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
  <description>This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*.</description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>${HIVE_HOME}/lib/hive-hwi.war</value>
  <description>This is the WAR file with the JSP content for the Hive Web Interface</description>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>Pre-execute hook for tests</description>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
  <description>Merge small files at the end of any job (map only or map-reduce)</description>
</property>

<property>
  <name>hive.heartbeat.interval</name>
  <value>1000</value>
  <description>Send a heartbeat after this interval - used by mapjoin and filter operators</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.script.auto.progress</name>
  <value>false</value>
  <description>Whether the Hive Transform/Map/Reduce clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script outputs to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent infinite loops in the scripts from being killed by the TaskTracker.</description>
</property>

</configuration>
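The file follows Hadoop's standard <property><name>/<value> layout. As a quick sanity check after editing, a small Python sketch (a hypothetical helper, not part of Hive) can pull the metastore-related properties back out of the XML:

```python
import xml.etree.ElementTree as ET

# A fragment of the edited config, inlined here so the sketch is self-contained.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.driver.OracleDriver</value>
  </property>
</configuration>"""

def read_properties(root, prefix=""):
    """Collect <property><name>/<value> pairs, optionally filtered by name prefix."""
    return {p.findtext("name"): p.findtext("value", default="")
            for p in root.iter("property")
            if p.findtext("name", default="").startswith(prefix)}

props = read_properties(ET.fromstring(SAMPLE), prefix="javax.jdo")
print(props["javax.jdo.option.ConnectionDriverName"])
# oracle.jdbc.driver.OracleDriver
```

In practice you would pass ET.parse("hive-default.xml").getroot() instead of the inlined fragment.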


3) Download the Oracle JDBC driver (ojdbc14.jar) and place it on Hive's classpath (e.g. in hive/lib).
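The ConnectionURL used in step 2 follows the thin driver's EZConnect-style syntax, jdbc:oracle:thin:@host:port/service_name. When a connection fails, it helps to check each part; this hypothetical sketch splits such a URL (it covers only this form, not full TNS descriptors):

```python
import re

# EZConnect-style thin URL: jdbc:oracle:thin:@host:port/service_name
THIN_URL = re.compile(r"jdbc:oracle:thin:@(?P<host>[^:/]+):(?P<port>\d+)/(?P<service>\w+)")

def parse_thin_url(url):
    """Return (host, port, service) from an Oracle thin JDBC URL, or raise ValueError."""
    m = THIN_URL.fullmatch(url)
    if not m:
        raise ValueError("not a thin-driver URL: %r" % url)
    return m.group("host"), int(m.group("port")), m.group("service")

print(parse_thin_url("jdbc:oracle:thin:@192.168.1.101:1521/ORCL"))
# ('192.168.1.101', 1521, 'ORCL')
```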

  

4) Test it:

[root@boss ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>


5) Log in to Oracle and check whether the metastore tables were created automatically:

[oracle@boss ~]$ sqlplus hive/hive@192.168.1.101:1521/spgjmega

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> show user;
USER is "HIVE"
SQL> select * from tab;

TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES TABLE
A TABLE
DBS TABLE
SEQUENCE_TABLE TABLE
SERDES TABLE
TBLS TABLE
SDS TABLE
PARTITION_KEYS TABLE
COLUMNS TABLE
BUCKETING_COLS TABLE
SD_PARAMS TABLE

TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
SORT_COLS TABLE
SERDE_PARAMS TABLE
TABLE_PARAMS TABLE

14 rows selected.

SQL> set line 1000
SQL> r
1* select * from TBLS

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME
------ ----------- ----- ---------------- ----- --------- ----- --------
     1  1263655926     1                0 root          0     1 test

SQL>
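The CREATE_TIME column in TBLS stores the table creation time as a Unix epoch in seconds. Converting the value above confirms it matches the Hive session that created the table:

```python
from datetime import datetime, timezone

create_time = 1263655926  # CREATE_TIME of the 'test' table from the TBLS row above
utc = datetime.fromtimestamp(create_time, tz=timezone.utc)
print(utc.strftime("%Y-%m-%d %H:%M:%S"))
# 2010-01-16 15:32:06
```

That is 23:32 local time (UTC+8), consistent with the Hive history file timestamp 201001162330 in step 4.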




http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx

