Category: LINUX

2010-01-17 00:03:37

How to store Hive's metadata in Oracle

By default, Hive stores its metadata in an embedded Derby database. In real-world scenarios, where multiple Hive queries are executed from multiple sessions, a database server such as MySQL, Oracle, or a networked Derby server is required. For this example I will use Oracle. Before starting, make sure the cluster is down.


1) Create the Oracle user:

SQL> create user hive identified by hive
     default tablespace hive;
SQL> grant dba to hive;

(Granting DBA keeps the example simple, but it is far too broad for production; grant only the privileges the metastore actually needs, such as CONNECT and RESOURCE.)

2) Edit hive/conf/hive-default.xml. Only the metastore settings (hive.metastore.local, the javax.jdo.* connection properties, and the datanucleus.* properties) need to change to point at Oracle; the remaining properties keep their stock defaults:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>mapred.reduce.tasks</name>
  <value>-1</value>
  <description>The default number of reduce tasks per job. Typically set to a prime close to the number of available hosts. Ignored when mapred.job.tracker is "local". Hadoop sets this to 1 by default, whereas Hive uses -1 as its default value. By setting this property to -1, Hive will automatically figure out the number of reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.bytes.per.reducer</name>
  <value>1000000000</value>
  <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
</property>

<property>
  <name>hive.exec.reducers.max</name>
  <value>999</value>
  <description>Max number of reducers that will be used. If the value specified in the configuration parameter mapred.reduce.tasks is negative, Hive will use this as the max number of reducers when automatically determining the number of reducers.</description>
</property>

<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive-${user.name}</value>
  <description>Scratch space for Hive jobs</description>
</property>

<property>
  <name>hive.test.mode</name>
  <value>false</value>
  <description>Whether Hive is running in test mode. If yes, it turns on sampling and prefixes the output table name.</description>
</property>

<property>
  <name>hive.test.mode.prefix</name>
  <value>test_</value>
  <description>If Hive is running in test mode, prefixes the output table by this string</description>
</property>

<property>
  <name>hive.test.mode.samplefreq</name>
  <value>32</value>
  <description>If Hive is running in test mode and the table is not bucketed, sampling frequency</description>
</property>

<property>
  <name>hive.test.mode.nosamplelist</name>
  <value></value>
  <description>If Hive is running in test mode, don't sample the above comma-separated list of tables</description>
</property>

<property>
  <name>hive.metastore.local</name>
  <value>true</value>
  <description>Controls whether to connect to a remote metastore server or open a new metastore server in the Hive client JVM</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>oracle.jdbc.driver.OracleDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>

<property>
  <name>javax.jdo.PersistenceManagerFactoryClass</name>
  <value>org.datanucleus.jdo.JDOPersistenceManagerFactory</value>
  <description>Class implementing the JDO persistence</description>
</property>

<property>
  <name>javax.jdo.option.DetachAllOnCommit</name>
  <value>true</value>
  <description>Detaches all objects from the session so that they can be used after the transaction is committed</description>
</property>

<property>
  <name>javax.jdo.option.NonTransactionalRead</name>
  <value>true</value>
  <description>Reads outside of transactions</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against the metastore database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>Password to use against the metastore database</description>
</property>

<property>
  <name>datanucleus.validateTables</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateColumns</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.validateConstraints</name>
  <value>false</value>
  <description>Validates the existing schema against the code. Turn this on if you want to verify the existing schema.</description>
</property>

<property>
  <name>datanucleus.storeManagerType</name>
  <value>rdbms</value>
  <description>Metadata store type</description>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>true</value>
  <description>Creates the necessary schema on startup if one doesn't exist. Set this to false after creating it once.</description>
</property>

<property>
  <name>datanucleus.autoStartMechanismMode</name>
  <value>checked</value>
  <description>Throw an exception if the metadata tables are incorrect</description>
</property>

<property>
  <name>datanucleus.transactionIsolation</name>
  <value>read-committed</value>
  <description></description>
</property>

<property>
  <name>datanucleus.cache.level2</name>
  <value>true</value>
  <description>Use a level 2 cache. Turn this off if metadata is changed independently of the Hive metastore server.</description>
</property>

<property>
  <name>datanucleus.cache.level2.type</name>
  <value>SOFT</value>
  <description>SOFT=soft reference based cache, WEAK=weak reference based cache</description>
</property>

<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>Location of the default database for the warehouse</description>
</property>

<property>
  <name>hive.metastore.connect.retries</name>
  <value>5</value>
  <description>Number of retries while opening a connection to the metastore</description>
</property>

<property>
  <name>hive.metastore.rawstore.impl</name>
  <value>org.apache.hadoop.hive.metastore.ObjectStore</value>
  <description>Name of the class that implements the org.apache.hadoop.hive.metastore.rawstore interface. This class is used to store and retrieve raw metadata objects such as tables and databases.</description>
</property>

<property>
  <name>hive.default.fileformat</name>
  <value>TextFile</value>
  <description>Default file format for CREATE TABLE statements. Options are TextFile and SequenceFile. Users can explicitly say CREATE TABLE ... STORED AS &lt;TEXTFILE|SEQUENCEFILE&gt; to override.</description>
</property>

<property>
  <name>hive.map.aggr</name>
  <value>true</value>
  <description>Whether to use map-side aggregation in Hive GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.skewindata</name>
  <value>false</value>
  <description>Whether there is skew in the data, to optimize GROUP BY queries</description>
</property>

<property>
  <name>hive.groupby.mapaggr.checkinterval</name>
  <value>100000</value>
  <description>Number of rows after which the size of the grouping keys/aggregation classes is checked</description>
</property>

<property>
  <name>hive.mapred.local.mem</name>
  <value>0</value>
  <description>For local mode, memory of the mappers/reducers</description>
</property>

<property>
  <name>hive.map.aggr.hash.percentmemory</name>
  <value>0.5</value>
  <description>Portion of total memory to be used by the map-side group aggregation hash table</description>
</property>

<property>
  <name>hive.map.aggr.hash.min.reduction</name>
  <value>0.5</value>
  <description>Hash aggregation will be turned off if the ratio between hash table size and input rows is bigger than this number. Set to 1 to make sure hash aggregation is never turned off.</description>
</property>

<property>
  <name>hive.optimize.cp</name>
  <value>true</value>
  <description>Whether to enable the column pruner</description>
</property>

<property>
  <name>hive.optimize.ppd</name>
  <value>true</value>
  <description>Whether to enable predicate pushdown</description>
</property>

<property>
  <name>hive.optimize.pruner</name>
  <value>true</value>
  <description>Whether to enable the new partition pruner, which depends on predicate pushdown. If this is disabled, the old partition pruner, which is based on the AST, will be used.</description>
</property>

<property>
  <name>hive.join.emit.interval</name>
  <value>1000</value>
  <description>How many rows in the right-most join operand Hive should buffer before emitting the join result</description>
</property>

<property>
  <name>hive.mapred.mode</name>
  <value>nonstrict</value>
  <description>The mode in which Hive operations are performed. In strict mode, some risky queries are not allowed to run.</description>
</property>

<property>
  <name>hive.exec.script.maxerrsize</name>
  <value>100000</value>
  <description>Maximum number of bytes a script is allowed to emit to standard error (per map-reduce task). This prevents runaway scripts from filling log partitions to capacity.</description>
</property>

<property>
  <name>hive.exec.compress.output</name>
  <value>false</value>
  <description>This controls whether the final outputs of a query (to a local/HDFS file or a Hive table) are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*.</description>
</property>

<property>
  <name>hive.exec.compress.intermediate</name>
  <value>false</value>
  <description>This controls whether intermediate files produced by Hive between multiple map-reduce jobs are compressed. The compression codec and other options are determined from the Hadoop config variables mapred.output.compress*.</description>
</property>

<property>
  <name>hive.hwi.listen.host</name>
  <value>0.0.0.0</value>
  <description>This is the host address the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.listen.port</name>
  <value>9999</value>
  <description>This is the port the Hive Web Interface will listen on</description>
</property>

<property>
  <name>hive.hwi.war.file</name>
  <value>${HIVE_HOME}/lib/hive-hwi.war</value>
  <description>This is the WAR file with the JSP content for the Hive Web Interface</description>
</property>

<property>
  <name>hive.exec.pre.hooks</name>
  <value></value>
  <description>Pre-execute hook for tests</description>
</property>

<property>
  <name>hive.merge.mapfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-only job</description>
</property>

<property>
  <name>hive.merge.mapredfiles</name>
  <value>false</value>
  <description>Merge small files at the end of any job (map only or map-reduce)</description>
</property>

<property>
  <name>hive.heartbeat.interval</name>
  <value>1000</value>
  <description>Send a heartbeat after this interval - used by mapjoin and filter operators</description>
</property>

<property>
  <name>hive.merge.size.per.task</name>
  <value>256000000</value>
  <description>Size of merged files at the end of the job</description>
</property>

<property>
  <name>hive.script.auto.progress</name>
  <value>false</value>
  <description>Whether the Hive Transform/Map/Reduce clause should automatically send progress information to the TaskTracker to avoid the task getting killed because of inactivity. Hive sends progress information when the script outputs to stderr. This option removes the need to periodically produce stderr messages, but users should be cautious because it may prevent infinite loops in the scripts from being killed by the TaskTracker.</description>
</property>

</configuration>
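The file follows Hadoop's standard <property><name>/<value> layout. As a quick sanity check after editing, a small Python sketch (a hypothetical helper, not part of Hive) can pull the metastore-related properties back out of the XML:

```python
import xml.etree.ElementTree as ET

# A fragment of the edited config, inlined here so the sketch is self-contained.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:oracle:thin:@192.168.1.101:1521/ORCL</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>oracle.jdbc.driver.OracleDriver</value>
  </property>
</configuration>"""

def read_properties(root, prefix=""):
    """Collect <property><name>/<value> pairs, optionally filtered by name prefix."""
    return {p.findtext("name"): p.findtext("value", default="")
            for p in root.iter("property")
            if p.findtext("name", default="").startswith(prefix)}

props = read_properties(ET.fromstring(SAMPLE), prefix="javax.jdo")
print(props["javax.jdo.option.ConnectionDriverName"])
# oracle.jdbc.driver.OracleDriver
```

In practice you would pass ET.parse("hive-default.xml").getroot() instead of the inlined fragment.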


3) Download the Oracle JDBC driver (ojdbc14.jar) and place it on Hive's classpath (e.g. in hive/lib).
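The ConnectionURL used in step 2 follows the thin driver's EZConnect-style syntax, jdbc:oracle:thin:@host:port/service_name. When a connection fails, it helps to check each part; this hypothetical sketch splits such a URL (it covers only this form, not full TNS descriptors):

```python
import re

# EZConnect-style thin URL: jdbc:oracle:thin:@host:port/service_name
THIN_URL = re.compile(r"jdbc:oracle:thin:@(?P<host>[^:/]+):(?P<port>\d+)/(?P<service>\w+)")

def parse_thin_url(url):
    """Return (host, port, service) from an Oracle thin JDBC URL, or raise ValueError."""
    m = THIN_URL.fullmatch(url)
    if not m:
        raise ValueError("not a thin-driver URL: %r" % url)
    return m.group("host"), int(m.group("port")), m.group("service")

print(parse_thin_url("jdbc:oracle:thin:@192.168.1.101:1521/ORCL"))
# ('192.168.1.101', 1521, 'ORCL')
```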

  

4) Test it:

[root@boss ~]# hive
Hive history file=/tmp/root/hive_job_log_root_201001162330_322775424.txt
hive> show tables;
OK
Time taken: 6.008 seconds
hive> create table test(id bigint);
OK
Time taken: 0.495 seconds
hive>


5) Log in to Oracle and check whether the metastore tables were created automatically:

[oracle@boss ~]$ sqlplus hive/hive@192.168.1.101:1521/spgjmega

SQL*Plus: Release 10.2.0.1.0 - Production on Sat Jan 16 23:57:37 2010

Copyright (c) 1982, 2005, Oracle. All rights reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.1.0 - Production
With the Partitioning, OLAP and Data Mining options

SQL> show user;
USER is "HIVE"
SQL> select * from tab;

TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
NUCLEUS_TABLES TABLE
A TABLE
DBS TABLE
SEQUENCE_TABLE TABLE
SERDES TABLE
TBLS TABLE
SDS TABLE
PARTITION_KEYS TABLE
COLUMNS TABLE
BUCKETING_COLS TABLE
SD_PARAMS TABLE

TNAME TABTYPE CLUSTERID
------------------------------ ------- ----------
SORT_COLS TABLE
SERDE_PARAMS TABLE
TABLE_PARAMS TABLE

14 rows selected.

SQL> set line 1000
SQL> r
1* select * from TBLS

TBL_ID CREATE_TIME DB_ID LAST_ACCESS_TIME OWNER RETENTION SD_ID TBL_NAME
------ ----------- ----- ---------------- ----- --------- ----- --------
     1  1263655926     1                0 root          0     1 test

SQL>
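The CREATE_TIME column in TBLS stores the table creation time as a Unix epoch in seconds. Converting the value above confirms it matches the Hive session that created the table:

```python
from datetime import datetime, timezone

create_time = 1263655926  # CREATE_TIME of the 'test' table from the TBLS row above
utc = datetime.fromtimestamp(create_time, tz=timezone.utc)
print(utc.strftime("%Y-%m-%d %H:%M:%S"))
# 2010-01-16 15:32:06
```

That is 23:32 local time (UTC+8), consistent with the Hive history file timestamp 201001162330 in step 4.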




http://www.mazsoft.com/blog/post/2009/11/19/setting-up-hadoophive-cluster-on-Centos-5.aspx

