Category: Big Data
2013-01-18 12:55:56
${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false
${IMPALA_HOME}/bin/impala-shell.sh
Problem 1:
Although the Hive metastore for the local cluster had already been configured, and the impala-shell.sh script started successfully, running show databases did not list test.db, a database that had already been created in Hive. On top of that, the script had generated derby.log and metastore.db (a log file and a directory) right in the directory it was run from. Those belong to the embedded Derby metastore that ships with Impala, so the problem was clear: Impala was not picking up the Hive metastore configuration at all.
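As a quick sanity check, here is a small sketch (it assumes the shell was launched from ${IMPALA_HOME}; adjust the path if you ran it from somewhere else):

# If these two Derby artifacts exist next to where impala-shell.sh was run,
# Impala fell back to its embedded Derby metastore instead of the cluster's
# Hive metastore.
cd ${IMPALA_HOME}
ls -d derby.log metastore.db && echo "embedded Derby metastore was used"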
According to the official documentation, to tell Impala about the HDFS, HBase, and Hive metastore setup, the corresponding configuration files have to be placed in the fe/src/test/resources directory. That directory is put on the classpath by ${IMPALA_HOME}/bin/set-classpath.sh, which reads as follows:
#!/bin/sh
# Copyright 2012 Cloudera Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# This script explicitly sets the CLASSPATH for embedded JVMs (e.g. in
# Impalad or in runquery). Because embedded JVMs do not honour
# CLASSPATH wildcard expansion, we have to add every dependency jar
# explicitly to the CLASSPATH.
CLASSPATH=\
$IMPALA_HOME/fe/src/test/resources:\
$IMPALA_HOME/fe/target/classes:\
$IMPALA_HOME/fe/target/dependency:\
$IMPALA_HOME/fe/target/test-classes:\
${HIVE_HOME}/lib/datanucleus-core-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-enhancer-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-rdbms-2.0.3.jar:\
${HIVE_HOME}/lib/datanucleus-connectionpool-2.0.3.jar:${CLASSPATH}

for jar in `ls ${IMPALA_HOME}/fe/target/dependency/*.jar`; do
  CLASSPATH=${CLASSPATH}:$jar
done

export CLASSPATH
The problem, though, was that the fe/src directory of my freshly built source tree contained no resources directory at all. So I downloaded one from elsewhere, placed it in the right location, and edited the three configuration files core-site.xml, hdfs-site.xml, and hive-site.xml to match the cluster's configuration. After running source bin/set-classpath.sh, the first problem was solved!
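Spelled out, the fix amounts to something like the following sketch (the /etc/hadoop/conf and /etc/hive/conf locations are assumptions; copy from wherever your cluster actually keeps its configs):

# Put the cluster's config files where set-classpath.sh expects them,
# then put fe/src/test/resources on the classpath.
mkdir -p ${IMPALA_HOME}/fe/src/test/resources
cp /etc/hadoop/conf/core-site.xml ${IMPALA_HOME}/fe/src/test/resources/
cp /etc/hadoop/conf/hdfs-site.xml ${IMPALA_HOME}/fe/src/test/resources/
cp /etc/hive/conf/hive-site.xml   ${IMPALA_HOME}/fe/src/test/resources/
source ${IMPALA_HOME}/bin/set-classpath.sh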
Problem 2:
Once the first problem was out of the way, I started the same two scripts as before and ran the following commands:
Welcome to the Impala shell. Press TAB twice to see a list of available commands.
Copyright (c) 2012 Cloudera, Inc. All rights reserved.
(Build version: build version not available)
[Not connected] > connect hadoop-01
[hadoop-01:21000] > show databases;
default
test_impala
[hadoop-01:21000] > use test_impala;
[hadoop-01:21000] > show tables;
tab1
tab2
tab3
[hadoop-01:21000] > select * from tab3;
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > select * from tab1;
ERROR: Failed to open HDFS file hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv
Error(255): Unknown error 255
ERROR: Invalid query handle
[hadoop-01:21000] > quit
The impalad log on the backend showed the following:
13/01/18 11:50:46 INFO service.Frontend: createExecRequest for query select * from tab1
13/01/18 11:50:46 INFO service.JniFrontend:
Plan Fragment 0
  UNPARTITIONED
  EXCHANGE (1)
    TUPLE IDS: 0

Plan Fragment 1
  RANDOM
  STREAM DATA SINK
    EXCHANGE ID: 1
    UNPARTITIONED
  SCAN HDFS table=test_impala.tab1 (0)
    TUPLE IDS: 0

13/01/18 11:50:46 INFO service.JniFrontend: returned TQueryExecRequest2: TExecRequest(
  stmt_type:QUERY, sql_stmt:select * from tab1,
  request_id:TUniqueId(hi:-6897121767931491435, lo:-4792011001236606993),
  query_options:TQueryOptions(abort_on_error:false, max_errors:0, disable_codegen:false, batch_size:0, return_as_ascii:true, num_nodes:0, max_scan_range_length:0, num_scanner_threads:0, max_io_buffers:0, allow_unsupported_formats:false, partition_agg:false),
  query_exec_request:TQueryExecRequest(
    desc_tbl:TDescriptorTable(
      slotDescriptors:[
        TSlotDescriptor(id:0, parent:0, slotType:INT, columnPos:0, byteOffset:4, nullIndicatorByte:0, nullIndicatorBit:1, slotIdx:1, isMaterialized:true),
        TSlotDescriptor(id:1, parent:0, slotType:BOOLEAN, columnPos:1, byteOffset:1, nullIndicatorByte:0, nullIndicatorBit:0, slotIdx:0, isMaterialized:true),
        TSlotDescriptor(id:2, parent:0, slotType:DOUBLE, columnPos:2, byteOffset:8, nullIndicatorByte:0, nullIndicatorBit:2, slotIdx:2, isMaterialized:true),
        TSlotDescriptor(id:3, parent:0, slotType:TIMESTAMP, columnPos:3, byteOffset:16, nullIndicatorByte:0, nullIndicatorBit:3, slotIdx:3, isMaterialized:true)],
      tupleDescriptors:[TTupleDescriptor(id:0, byteSize:32, numNullBytes:1, tableId:1)],
      tableDescriptors:[TTableDescriptor(id:1, tableType:HDFS_TABLE, numCols:4, numClusteringCols:0,
        hdfsTable:THdfsTable(hdfsBaseDir:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1, partitionKeyNames:[], nullPartitionKeyValue:__HIVE_DEFAULT_PARTITION__,
          partitions:{-1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE),
                       1=THdfsPartition(lineDelim:10, fieldDelim:44, collectionDelim:44, mapKeyDelim:44, escapeChar:0, fileFormat:TEXT, partitionKeyExprs:[], blockSize:0, compression:NONE)}),
        tableName:tab1, dbName:test_impala)]),
    fragments:[
      TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:1, node_type:EXCHANGE_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false)]),
        output_exprs:[TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:INT, num_children:0, slot_ref:TSlotRef(slot_id:0))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:BOOLEAN, num_children:0, slot_ref:TSlotRef(slot_id:1))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:DOUBLE, num_children:0, slot_ref:TSlotRef(slot_id:2))]), TExpr(nodes:[TExprNode(node_type:SLOT_REF, type:TIMESTAMP, num_children:0, slot_ref:TSlotRef(slot_id:3))])],
        partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[])),
      TPlanFragment(plan:TPlan(nodes:[TPlanNode(node_id:0, node_type:HDFS_SCAN_NODE, num_children:0, limit:-1, row_tuples:[0], nullable_tuples:[false], compact_data:false, hdfs_scan_node:THdfsScanNode(tuple_id:0))]),
        output_sink:TDataSink(type:DATA_STREAM_SINK, stream_sink:TDataStreamSink(dest_node_id:1, output_partition:TDataPartition(type:UNPARTITIONED, partitioning_exprs:[]))),
        partition:TDataPartition(type:RANDOM, partitioning_exprs:[]))],
    dest_fragment_idx:[0],
    per_node_scan_ranges:{0=[TScanRangeLocations(scan_range:TScanRange(hdfs_file_split:THdfsFileSplit(path:hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, offset:0, length:192, partition_id:1)), locations:[TScanRangeLocation(server:THostPort(hostname:192.168.1.2, ipaddress:192.168.1.2, port:50010), volume_id:0)])]},
    query_globals:TQueryGlobals(now_string:2013-01-18 11:50:46.000000862)),
  result_set_metadata:TResultSetMetadata(columnDescs:[TColumnDesc(columnName:id, columnType:INT), TColumnDesc(columnName:col_1, columnType:BOOLEAN), TColumnDesc(columnName:col_2, columnType:DOUBLE), TColumnDesc(columnName:col_3, columnType:TIMESTAMP)]))

hdfsOpenFile(hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv):
FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
java.lang.IllegalArgumentException: Wrong FS: hdfs://hadoop-01.localdomain:8030/user/impala/warehouse/test_impala.db/tab1/tab1.csv, expected: hdfs://localhost:20500
    at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:547)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getPathName(DistributedFileSystem.java:169)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:245)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSyst

The key line is the Wrong FS error with expected: hdfs://localhost:20500. Yet the core-site.xml in the resources directory clearly specifies the namenode address with port 8030.
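For reference, a sketch of what that property looks like. The property name fs.default.name is the Hadoop-1.x-era name, and the file contents here are assumed rather than copied from the actual cluster:

$ cat ${IMPALA_HOME}/fe/src/test/resources/core-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <!-- assumed property name; newer Hadoop versions call this fs.defaultFS -->
    <name>fs.default.name</name>
    <value>hdfs://hadoop-01.localdomain:8030</value>
  </property>
</configuration>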
Only after reading the Impala source did I discover that hdfs-fs-cache.cc hard-codes defaults for the nn and nn_port flags:

DEFINE_string(nn, "localhost", "hostname or ip address of HDFS namenode");
DEFINE_int32(nn_port, 20500, "namenode port");

So when starting the impalad service, nn and nn_port have to be set explicitly to the namenode address and port configured for the cluster, as shown below:
${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false -nn=hadoop-01.localdomain -nn_port=8030

With that, the second problem, the expected: hdfs://localhost:20500 error, was solved as well, and every query ran without a hitch!
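To recap, the complete sequence that finally worked, combining both fixes described above, looks like this:

# Fix 1: make sure fe/src/test/resources (holding the cluster's config
# files) is on the classpath of impalad's embedded JVM.
source ${IMPALA_HOME}/bin/set-classpath.sh

# Fix 2: point impalad at the cluster's namenode instead of the
# localhost:20500 default compiled into hdfs-fs-cache.cc.
${IMPALA_HOME}/bin/start-impalad.sh -use_statestore=false -nn=hadoop-01.localdomain -nn_port=8030

${IMPALA_HOME}/bin/impala-shell.sh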