2014-04-25 09:23:05
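HBase-1.2.1 and Phoenix-4.7.0 Distributed Installation Guide.pdf
This article installs HBase-1.2.1 on top of Hadoop-2.7.2. For installing Hadoop-2.7.2, see the article "Hadoop-2.7.2 Distributed Installation Manual". The installation environment is 64-bit SuSE Linux 10.1.
This article follows the quickstart.html file provided by the HBase project, which can be found in the docs/getting_started directory or read online.
An external ZooKeeper is used. For installing ZooKeeper, see the article "ZooKeeper-3.4.6 Distributed Installation Guide".
For distributed installation, and for configuring HBase to use an external ZooKeeper, see the corresponding sections of the official online documentation.
All of the online documents are also available in the docs directory created when the binary package is unpacked.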
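A Region name identifies a Region. Its format is: table name, StartKey, generated RegionID. For example:
test,83--G40V6UdCnEHKSKqR_yjJo798594847946710200000795,1461323021820.d4cc7afbc2d6bf3843c121fedf4d696d.
Here test is the table name, 83--G40V6UdCnEHKSKqR_yjJo798594847946710200000795 is the StartKey, and the trailing 1461323021820.d4cc7afbc2d6bf3843c121fedf4d696d. is the Region ID (note that it contains two dots). For the first Region of a table the StartKey is empty, so the name looks like this:
t_user,,1461549916081.f4e17b0d99f2d77da44ccb184812c345.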
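The naming format above can be taken apart mechanically. The following is a minimal sketch, assuming the table name contains no comma and the name ends with a dot as in the two examples; `RegionName` and `parse_region_name` are hypothetical helper names, not part of HBase.

```python
from typing import NamedTuple


class RegionName(NamedTuple):
    table: str       # text before the first comma
    start_key: str   # may be empty for the first Region
    region_id: str   # numeric part before the first dot of the suffix
    encoded: str     # hex part between the two dots


def parse_region_name(full_name: str) -> RegionName:
    """Split 'table,startkey,id.encoded.' into its four parts."""
    if not full_name.endswith("."):
        raise ValueError("expected a region name ending with a dot")
    # The table name precedes the first comma. The start key may itself
    # contain commas, so the suffix is taken after the LAST comma.
    table, rest = full_name.split(",", 1)
    start_key, suffix = rest.rsplit(",", 1)
    region_id, encoded = suffix[:-1].split(".", 1)
    return RegionName(table, start_key, region_id, encoded)


if __name__ == "__main__":
    print(parse_region_name(
        "t_user,,1461549916081.f4e17b0d99f2d77da44ccb184812c345."))
```

Splitting on the first and last comma, rather than on every comma, keeps the sketch usable for row keys that happen to contain commas.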
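Assume Hadoop-2.7.2 is installed in /data/hadoop/current, where /data/hadoop/current is actually a symbolic link to /data/hadoop/hadoop-2.7.2.
The HBase installation directory is /data/hadoop/hbase, where /data/hadoop/hbase is actually a symbolic link to hbase-1.2.1-hadoop2.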
Port | Purpose
2888 | ZooKeeper: on the Leader, listens for Follower connections
3888 | ZooKeeper: used for Leader election
2181 | ZooKeeper: listens for client connections
16010 | hbase.master.info.port: HTTP port of the HMaster
16000 | hbase.master.port: RPC port of the HMaster
16030 | hbase.regionserver.info.port: HTTP port of the HRegionServer
16020 | hbase.regionserver.port: RPC port of the HRegionServer
8080 | hbase.rest.port: port of the HBase REST server
9095 | hbase.thrift.info.port: HTTP port of the HBase Thrift server
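Once the services are running, the ports listed above can be probed from any machine. This is a minimal sketch that only tests whether a TCP connection succeeds; the `HBASE_PORTS` dictionary and the `port_open` helper are names made up for this illustration, and the host passed in is whatever machine you want to check.

```python
import socket

# Port numbers copied from the table above (HBase-side ports only).
HBASE_PORTS = {
    "hbase.master.info.port": 16010,
    "hbase.master.port": 16000,
    "hbase.regionserver.info.port": 16030,
    "hbase.regionserver.port": 16020,
    "hbase.rest.port": 8080,
    "hbase.thrift.info.port": 9095,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # "localhost" is only a placeholder for the machine being checked.
    for name, port in HBASE_PORTS.items():
        state = "open" if port_open("localhost", port) else "closed"
        print(name, port, state)
```

A successful connection only shows that something is listening on the port; it does not prove that the listener is the expected HBase daemon.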
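A download link for HBase can be found on the official website.
A mirror site inside China can also be used. For HBase-1.2.1, download hbase-1.2.1-hadoop2-bin.tar.gz.
The regionservers file is similar to Hadoop's slaves file; it does not need to be modified on the RegionServer machines.
List the IP addresses or host names of all HRegionServers in the regionservers file, strictly one per line and never several on one line. The configuration used in this article is:
hadoop@VM_40_171_sles10_64:~/hbase/conf> cat regionservers
10.12.154.77
10.12.154.78
10.12.154.79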
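The one-host-per-line rule is easy to check before starting the cluster. The sketch below is an illustration only: `read_regionservers` is a hypothetical helper, and skipping blank lines is a convenience of this sketch rather than documented HBase behavior.

```python
from typing import List


def read_regionservers(text: str) -> List[str]:
    """Return the hosts listed in a regionservers file, one per line.

    Raises ValueError if a line holds more than one host.
    """
    hosts = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        entry = line.strip()
        if not entry:
            continue  # blank lines are ignored in this sketch
        if len(entry.split()) != 1:
            raise ValueError(
                "line {}: expected one host per line, got {!r}".format(
                    lineno, entry))
        hosts.append(entry)
    return hosts


if __name__ == "__main__":
    sample = "10.12.154.77\n10.12.154.78\n10.12.154.79\n"
    print(read_regionservers(sample))
```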
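The same changes must be made on every machine. A convenient way is to configure one machine first and then copy the files to the others with scp, for example: scp hbase-site.xml hadoop@10.12.154.79:/data/hadoop/hbase/conf/
hbase-site.xml is the HBase configuration file. The default hbase-site.xml is empty, as shown below:
<configuration>
</configuration>
That is fine; use it as the starting point. Do not start from hbase-default.xml in the docs directory, which is painful to read.
Edit hbase-site.xml and add the following (taken from the official documentation; search for "Fully-distributed"):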
hbase.cluster.distributed: false: standalone and pseudo-distributed setups with managed Zookeeper; true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
hbase.zookeeper.quorum: For example, "host1.mydomain.com,host2.mydomain.com,host3.mydomain.com". By default this is set to localhost for local and pseudo-distributed modes of operation. For a fully-distributed setup, this should be set to a full list of ZooKeeper quorum servers. If HBASE_MANAGES_ZK is set in hbase-env.sh this is the list of servers which we will start/stop ZooKeeper on.
hbase.zookeeper.quorum is set to the host names or IP addresses of the nodes in the ZooKeeper ensemble. The hdfs://172.25.40.171:9001 part of hbase.rootdir corresponds to dfs.namenode.rpc-address in hdfs-site.xml.
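If HDFS runs in cluster mode, change hbase.rootdir to the cluster form, that is, a value of the form hdfs://test/hbase.
In other words, the value is the fs.defaultFS value from core-site.xml followed by the hbase directory. The test in this example is the value of dfs.nameservices in hdfs-site.xml.
For more information, see the official documentation.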
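Because the same hbase-site.xml has to be present on every machine, it can help to generate it from one place. The following is a minimal sketch, assuming only the three properties discussed above are needed; `build_hbase_site` is a hypothetical helper, and the quorum hosts in the usage example are illustrative values, not the ZooKeeper hosts of this article's cluster.

```python
import xml.etree.ElementTree as ET
from typing import List


def build_hbase_site(rootdir: str, zk_quorum: List[str]) -> str:
    """Return hbase-site.xml text for a fully distributed setup."""
    settings = {
        "hbase.rootdir": rootdir,
        "hbase.cluster.distributed": "true",
        "hbase.zookeeper.quorum": ",".join(zk_quorum),
    }
    config = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values: an HA nameservice called "test" and three hosts.
    print(build_hbase_site("hdfs://test/hbase",
                           ["host1", "host2", "host3"]))
```

The output is written on a single line; it is valid XML, but you may want to pretty-print it before committing it to a configuration directory.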
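hbase.master.info.port specifies the HTTP port of the HMaster.
hbase.master.info.bindAddress specifies the IP address used by the HMaster HTTP server; if it is not set, an IPv6 address may be used.
The same changes must be made on every machine; configure one machine first and then copy the file to the others with scp. The changes to hbase-env.sh are as follows:
1) Set JAVA_HOME
# The java implementation to use. Java 1.6 required.
export JAVA_HOME=/data/jdk
Here /data/jdk is the JDK installation directory.
2) Set HBASE_MANAGES_ZK
# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false
If HBASE_MANAGES_ZK is true, HBase uses its bundled ZooKeeper. Deploying ZooKeeper separately is recommended, so that it can also serve other systems.
3) Set HBASE_CLASSPATH
# Extra Java CLASSPATH elements. Optional.
export HBASE_CLASSPATH=/data/hadoop/current/etc/hadoop
This setting may look confusing: why does a CLASSPATH point at Hadoop's conf directory? It is what lets HBase find the Hadoop configuration; the name is admittedly not well chosen.
Alternatively, consider creating a symbolic link to Hadoop's hdfs-site.xml in HBase's conf directory.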
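This step can be done at any time before starting HBase, but it requires root. Add two items, nofile and nproc, to the file /etc/security/limits.conf, for example:
hadoop - nofile 32768
hadoop hard nproc 320000
hadoop soft nproc 320000
nofile is the number of files a single process may open, and nproc is the maximum number of processes. Replace "hadoop" with the actual user name.
For the limits to take effect, make sure the file /etc/pam.d/login contains the following line:
session required pam_limits.so
If HBase is launched by crond, the same line must also be added to /etc/pam.d/crond.
After the change there is no need to reboot the machine; logging in again is enough. Use the command "ulimit -a" to compare the values before and after.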
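Besides "ulimit -a", the effective limits can be compared with the values above from a script run as the HBase user. This is a rough sketch using Python's standard resource module on Linux; `REQUIRED`, `current_soft_limits` and `shortfalls` are names invented for this example, and the thresholds are simply the numbers from the limits.conf lines above.

```python
import resource

# Values taken from the limits.conf example above.
REQUIRED = {"nofile": 32768, "nproc": 320000}


def current_soft_limits() -> dict:
    """Read the soft limits of the current process."""
    return {
        "nofile": resource.getrlimit(resource.RLIMIT_NOFILE)[0],
        "nproc": resource.getrlimit(resource.RLIMIT_NPROC)[0],
    }


def shortfalls(current: dict, required: dict = REQUIRED) -> dict:
    """Return {item: (current, required)} for limits that are too low."""
    short = {}
    for item, wanted in required.items():
        have = current[item]
        # RLIM_INFINITY means "unlimited", which always satisfies the need.
        if have != resource.RLIM_INFINITY and have < wanted:
            short[item] = (have, wanted)
    return short


if __name__ == "__main__":
    problems = shortfalls(current_soft_limits())
    print(problems if problems else "limits look sufficient")
```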
Go to the HBASE_HOME/bin directory and run start-hbase.sh to start HBase. Use the jps command shipped with the JDK to check that the HMaster and HRegionServer processes are running, and check the log files for errors.
Run "hbase shell" to enter the command-line interface. For details, see the official documentation.
# List the tables
list
hbase(main):003:0> create 'test', 'cf' # create table test with one column family, cf
0 row(s) in 1.2200 seconds
hbase(main):003:0> list 'test'
..
1 row(s) in 0.0550 seconds
hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1' # put value1 into column a of column family cf in table test
0 row(s) in 0.0560 seconds
hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
hbase(main):007:0> scan 'test' # scan table test
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1288380727188, value=value1
row2 column=cf:b, timestamp=1288380738440, value=value2
row3 column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds
hbase(main):008:0> get 'test', 'row1' # get one row from table test
COLUMN CELL
cf:a timestamp=1288380727188, value=value1
1 row(s) in 0.0400 seconds
# Get the data of one column
get 'test', 'row1', 'cf1:col1'
# or
get 'test', 'row1', {COLUMN=>'cf1:col1'}
hbase(main):012:0> disable 'test'
0 row(s) in 1.0930 seconds
hbase(main):013:0> drop 'test'
0 row(s) in 0.0770 seconds
# Empty a table
truncate 'test'
# Count the rows of a table
count 'test'
# Delete one column value from a row
delete 't1','row1','cf1:col1'
# Delete an entire row
deleteall 't1','row1'
# Exit the hbase shell
hbase(main):014:0> exit
A second way to count the rows of a table:
bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'test'
The simplest way to split a Region is the Split function of the HBase web UI: just enter the key of the Region to split. For example, to split the Region named "test,03333333,1467613810867.38b8ef87bbf2f1715998911aafc8c7b3.", enter test,03333333,1467613810867 and click Split.
38b8ef87bbf2f1715998911aafc8c7b3 is the ENCODED name of the Region. It is an MD5 value, namely the result of md5(test,03333333,1467613810867).
In the hbase shell the operation is: split 'regionName', 'splitKey'.
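The relationship described above, where the encoded name is the MD5 of the part before it, can be checked with a few lines. This is only a sketch: `split_full_name` and `md5_hex` are hypothetical helpers, and whether the computed digest equals the suffix for a given Region is something to confirm by running it against your own cluster's names.

```python
import hashlib
from typing import Tuple


def split_full_name(full_name: str) -> Tuple[str, str]:
    """Return (key, encoded) for a name of the form 'key.encoded.'.

    The key is the text to type into the web UI Split box.
    """
    if not full_name.endswith("."):
        raise ValueError("expected a region name ending with a dot")
    key, encoded = full_name[:-1].rsplit(".", 1)
    return key, encoded


def md5_hex(key: str) -> str:
    """MD5 of the key bytes, as 32 lowercase hex characters."""
    return hashlib.md5(key.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    name = "test,03333333,1467613810867.38b8ef87bbf2f1715998911aafc8c7b3."
    key, encoded = split_full_name(name)
    print("key:", key)
    print("encoded suffix:", encoded)
    print("md5 of key:   ", md5_hex(key))
```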
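When Regions are pre-split, some of them may end up too small or empty; in that case consider merging the empty and undersized Regions.
To merge Regions you can use the tool org.apache.hadoop.hbase.util.Merge, but it requires the cluster to be stopped, for example: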
$ ./hbase org.apache.hadoop.hbase.util.Merge
For hadoop 0.21+, Usage: bin/hbase org.apache.hadoop.hbase.util.Merge [-Dfs.defaultFS=hdfs://nn:port]
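The hbase shell has a built-in command for merging regions, merge_region.
The hbase shell implements many of its commands by calling Ruby scripts under the lib/ruby directory; the command scripts are all written in Ruby and live in lib/ruby/shell/commands. The scripts in lib/ruby/shell/commands cannot be run directly, because they are only Ruby modules implementing the individual features; they must be run from inside the hbase shell. The file name is the command name, and running a command without arguments prints its usage, for example: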
hbase(main):001:0> merge_region
ERROR: wrong number of arguments (0 for 2)
Here is some help for this command: Merge two regions. Passing 'true' as the optional third parameter will force a merge ('force' merges regardless else merge will fail unless passed adjacent regions. 'force' is for expert use only).
NOTE: You must pass the encoded region name, not the full region name so this command is a little different from other region operations. The encoded region name is the hash suffix on region names: e.g. if the region name were TestTable,0094429456,1289497600452.527db22f95c8a9e0116f0cc13c680396. then the encoded region name portion is 527db22f95c8a9e0116f0cc13c680396
Examples:
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME'
hbase> merge_region 'ENCODED_REGIONNAME', 'ENCODED_REGIONNAME', true
In fact, the encoded Region name ENCODED_REGIONNAME is an MD5 value. An online merge example:
hbase(main):003:0> merge_region '000d96eef8380430d650c6936b9cef7d','b27a07c88dbbc070f716ee87fab15106'
0 row(s) in 0.0730 seconds
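When many small or empty Regions need merging, the merge_region commands can be generated rather than typed. The sketch below is one possible approach, not something prescribed by HBase: it assumes you already have the encoded names and sizes of a table's Regions in start-key order (for example, collected by hand from the web UI), and `merge_commands` and the size threshold are hypothetical.

```python
from typing import List, Tuple


def merge_commands(regions: List[Tuple[str, int]],
                   max_size_mb: int) -> List[str]:
    """Pair up adjacent Regions whose combined size stays small.

    regions: (encoded_name, size_mb) tuples in start-key order.
    Returns hbase shell merge_region command lines.
    """
    commands = []
    i = 0
    while i + 1 < len(regions):
        (left, left_size), (right, right_size) = regions[i], regions[i + 1]
        if left_size + right_size <= max_size_mb:
            commands.append("merge_region '{}','{}'".format(left, right))
            i += 2  # both Regions are consumed by this merge
        else:
            i += 1  # keep the left Region, try the next adjacent pair
    return commands


if __name__ == "__main__":
    # Sizes are made-up numbers for illustration.
    sample = [("000d96eef8380430d650c6936b9cef7d", 0),
              ("b27a07c88dbbc070f716ee87fab15106", 12)]
    for line in merge_commands(sample, max_size_mb=100):
        print(line)
```

Only adjacent Regions are paired, which matches the help text above: merging non-adjacent Regions requires the force parameter and is meant for expert use.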
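There can be zero or more backup HMasters. Their configuration is identical to that of the primary HMaster, so it is enough to copy an already configured HMaster to the new machine and start it with the same command. Once it is running, HBase shell commands can be executed there as well.
To enable HBase access control, add the following two configuration items to the hbase-site.xml file: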
org.apache.hadoop.hbase.security.token.TokenProvider,org.apache.hadoop.hbase.security.access.AccessController
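Permissions can be managed from the HBase shell at two levels, table (Table) and column family (Column Family); superuser is the super user:
grant
permissions is a combination of zero or more of the letters R, W, C and A (R: read, W: write, C: create, A: admin).
revoke
alter 'tablename', {OWNER => 'username'}
To see which permissions a user has: user_permission
The following commands are all run directly in the hbase shell: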
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter
import org.apache.hadoop.hbase.filter.CompareFilter
import org.apache.hadoop.hbase.util.Bytes
# Return all columns
scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}
# Return all columns except the one used by the filter
import org.apache.hadoop.hbase.filter.SingleColumnValueExcludeFilter
scan 'test',{STARTROW =>'2016081100AA1600011516', STOPROW =>'2016081124ZZ1600011516',LIMIT=>2, FILTER=>SingleColumnValueExcludeFilter.new(Bytes.toBytes('cf1'),Bytes.toBytes('id'),CompareFilter::CompareOp.valueOf('EQUAL'),Bytes.toBytes('1299840901201608111600011516'))}
# Create a pre-split table (the splits apply to the whole table, not to one column family, hence the separate {})
create 'test',{NAME => 'cf1', VERSIONS => 1},{SPLITS_FILE => 'splits.txt'}
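The splits.txt file referenced by SPLITS_FILE holds one split key per line. How the keys are chosen depends on the row key design; the sketch below assumes, purely for illustration, that row keys start with an evenly distributed 8-character lowercase hex prefix. `hex_split_keys` and `write_splits_file` are hypothetical helpers.

```python
from typing import List


def hex_split_keys(num_regions: int, width: int = 8) -> List[str]:
    """Return num_regions - 1 evenly spaced hex split keys."""
    if num_regions < 2:
        return []
    space = 16 ** width
    return [format(i * space // num_regions, "0{}x".format(width))
            for i in range(1, num_regions)]


def write_splits_file(path: str, keys: List[str]) -> None:
    """Write one split key per line, as SPLITS_FILE expects."""
    with open(path, "w") as handle:
        for key in keys:
            handle.write(key + "\n")


if __name__ == "__main__":
    keys = hex_split_keys(5)
    print(keys)
    write_splits_file("splits.txt", keys)
```

A table with N Regions needs N - 1 split keys, because the first Region has an empty StartKey.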
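The following errors were encountered while working through this article:
1) Error 1: Host key not found from database
The error below means that password-less login to DEVNET-154-70, DEVNET-154-77 and DEVNET-154-79 does not work. Assuming the user name is hadoop, try ssh hadoop@DEVNET-154-70 to check whether password-less login works: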
./start-hbase.sh
DEVNET-154-70: Host key not found from database.
DEVNET-154-70: Key fingerprint:
DEVNET-154-70: xihad-rotuf-lykeh-mapup-kylin-kybub-sohid-bucaf-gafyg-vecuc-tyxux
DEVNET-154-70: You can get a public key's fingerprint by running
DEVNET-154-70: % ssh-keygen -F publickey.pub
DEVNET-154-70: on the keyfile.
DEVNET-154-70: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
DEVNET-154-77: Host key not found from database.
DEVNET-154-77: Key fingerprint:
DEVNET-154-77: xuhog-tavip-donon-vuvac-tycyh-sysyz-zacur-didoz-fugif-vosar-ruxyx
DEVNET-154-77: You can get a public key's fingerprint by running
DEVNET-154-77: % ssh-keygen -F publickey.pub
DEVNET-154-77: on the keyfile.
DEVNET-154-77: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
DEVNET-154-79: Host key not found from database.
DEVNET-154-79: Key fingerprint:
DEVNET-154-79: xolim-mysyg-bozes-zilyz-futaf-tatig-zaryn-pilaf-betyf-meduf-tixux
DEVNET-154-79: You can get a public key's fingerprint by running
DEVNET-154-79: % ssh-keygen -F publickey.pub
DEVNET-154-79: on the keyfile.
DEVNET-154-79: warning: tcgetattr failed in ssh_rl_set_tty_modes_for_fd: fd 1: Invalid argument
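2) Error 2: Failed deleting my ephemeral node
A possible cause is an earlier misconfiguration, for example HBase was once started with its bundled ZooKeeper and was later restarted against an external ZooKeeper.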
2014-04-22 16:26:17,452 WARN [regionserver60020] zookeeper.RecoverableZooKeeper: Node /hbase/rs/DEVNET-154-79,60020,1398155173411 already deleted, retry=false
2014-04-22 16:26:17,453 WARN [regionserver60020] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/rs/DEVNET-154-79,60020,1398155173411
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:156)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1273)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1262)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1273)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1003)
at java.lang.Thread.run(Thread.java:744)
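3) Error 3: Master rejected startup because clock is out of sync
This comes from the RegionServer log: the HMaster refuses the RegionServer's connection because the clocks of the HMaster and the RegionServer differ by more than 30 seconds. There are two fixes: synchronize the clocks, or raise hbase.master.maxclockskew in the hbase-site.xml file on the HMaster side.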
2014-04-22 16:34:36,701 FATAL [regionserver60020] regionserver.HRegionServer: Master rejected startup because clock is out of sync
org.apache.hadoop.hbase.ClockOutOfSyncException: org.apache.hadoop.hbase.ClockOutOfSyncException: Server DEVNET-154-79,60020,1398155672511 has been rejected; Reported time is too far out of sync with master. Time difference of 175968ms > max allowed of 30000ms
at org.apache.hadoop.hbase.master.ServerManager.checkClockSkew(ServerManager.java:316)
at org.apache.hadoop.hbase.master.ServerManager.regionServerStartup(ServerManager.java:216)
at org.apache.hadoop.hbase.master.HMaster.regionServerStartup(HMaster.java:1281)
at org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:5085)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2008)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:92)
at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:73)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:744)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:284)
at org.apache.hadoop.hbase.regionserver.HRegionServer.reportForDuty(HRegionServer.java:1998)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:839)
at java.lang.Thread.run(Thread.java:744)
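To make hbase.master.maxclockskew tolerate 10 minutes (600000 ms):
<property>
<name>hbase.master.maxclockskew</name>
<value>600000</value>
</property>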
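The check behind this error is a plain comparison of the reported time difference with the allowed maximum. As a small illustration, the sketch below extracts both numbers from the log message shown earlier and repeats the comparison; `parse_skew` and `clock_skew_ok` are hypothetical helper names, and the regular expression assumes the message wording seen in this log.

```python
import re
from typing import Optional, Tuple

SKEW_PATTERN = re.compile(
    r"Time difference of (\d+)ms > max allowed of (\d+)ms")


def parse_skew(message: str) -> Optional[Tuple[int, int]]:
    """Return (difference_ms, max_allowed_ms) found in a log message."""
    match = SKEW_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def clock_skew_ok(difference_ms: int, max_skew_ms: int = 30000) -> bool:
    """True if the difference is within the allowed skew."""
    return abs(difference_ms) <= max_skew_ms


if __name__ == "__main__":
    line = ("Reported time is too far out of sync with master. "
            "Time difference of 175968ms > max allowed of 30000ms")
    difference, allowed = parse_skew(line)
    print(clock_skew_ok(difference, allowed))   # default limit from the log
    print(clock_skew_ok(difference, 600000))    # limit raised to 10 minutes
```

Raising the limit only hides the symptom; synchronizing the clocks remains the cleaner of the two fixes.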
4) UnknownHostException: mycluster
The error below occurs because the dfs.nameservices setting in hdfs-site.xml of the underlying HDFS was changed; hbase.rootdir in hbase-site.xml must be updated to match:
2015-12-01 15:33:23,200 ERROR [main] regionserver.HRegionServerCommandLine: Region server exiting
java.lang.RuntimeException: Failed construction of Regionserver: class org.apache.hadoop.hbase.regionserver.HRegionServer
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2636)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:64)
at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2651)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:2634)
... 5 more
Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: mycluster
at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:373)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:258)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:153)
at org.apache.hadoop.hdfs.DFSClient.
at org.apache.hadoop.hdfs.DFSClient.
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:139)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2625)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2607)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:296)
at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1002)
at org.apache.hadoop.hbase.regionserver.HRegionServer.
bin/hbase-daemon.sh start thrift2 --framed --hsha --workers 100
--hsha selects the HshaServer, and --workers sets the number of HshaServer worker threads. For more information, see:
https://hbase.apache.org/devapidocs/org/apache/hadoop/hbase/thrift2/package-summary.html
The default port is 9090, and the corresponding HTTP port is 9095.
bin/hbase-daemon.sh start rest -p 8080
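A few simple access examples (assuming the HBase REST server was started on 10.143.136.232):
1) Show the HBase version
2) Show the cluster status
3) List all non-system tables
4) List all regions of table test
5) Get the whole row whose rowkey is 100000797550117 (the result must be base64-decoded)
6) Get column field0 of column family cf1 for rowkey 100000797550117 (the result must be base64-decoded)
For more, see the official documentation.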
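Since the REST server returns row keys, column names and values base64-encoded, and expects them base64-encoded on writes, a small helper saves repeated manual conversion. The sketch below assumes the JSON layout used by the JSON write example later in this article ("Row", "key", "Cell", "column", "$") and assumes the stored bytes are UTF-8 text; `decode_rows` and `build_put_body` are hypothetical helpers.

```python
import base64
import json
from typing import Dict, List, Tuple


def _b64decode(text: str) -> str:
    return base64.b64decode(text).decode("utf-8")


def _b64encode(text: str) -> str:
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def decode_rows(response_text: str) -> List[Tuple[str, Dict[str, str]]]:
    """Decode a JSON row response into (row_key, {column: value}) pairs."""
    rows = []
    for row in json.loads(response_text)["Row"]:
        cells = {_b64decode(c["column"]): _b64decode(c["$"])
                 for c in row["Cell"]}
        rows.append((_b64decode(row["key"]), cells))
    return rows


def build_put_body(row_key: str, column: str, value: str) -> str:
    """Build the JSON body for writing one cell."""
    return json.dumps({"Row": [{
        "key": _b64encode(row_key),
        "Cell": [{"column": _b64encode(column), "$": _b64encode(value)}],
    }]})


if __name__ == "__main__":
    print(build_put_body("row5", "cf:e", "value5"))
```

One detail worth noticing: the strings in this article's JSON write example decode to values with a trailing newline (as produced by piping `echo` without `-n` into base64), whereas `build_put_body` encodes exactly the text it is given.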
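Endpoint: /version/cluster
HTTP verb: GET
Description: Show the HBase version
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/version/cluster"

Endpoint: /status/cluster
HTTP verb: GET
Description: Show the cluster status
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/status/cluster"

Endpoint: /
HTTP verb: GET
Description: List all non-system tables
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/"

Note: these URLs can also be opened directly in a browser.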
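Endpoint: /namespaces
HTTP verb: GET
Description: List all namespaces
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/"

Endpoint: /namespaces/namespace
HTTP verb: GET
Description: Describe the specified namespace
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP verb: POST
Description: Create a new namespace
Example:
curl -vi -X POST \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace/tables
HTTP verb: GET
Description: List all tables in the specified namespace
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns/tables"

Endpoint: /namespaces/namespace
HTTP verb: PUT
Description: Modify an existing namespace
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Endpoint: /namespaces/namespace
HTTP verb: DELETE
Description: Delete a namespace; the namespace must already be empty
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/namespaces/special_ns"

Note: names such as special_ns are placeholders to be replaced with your own values.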
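Endpoint: /table/schema
HTTP verb: GET
Description: Show the schema of the specified table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: POST
Description: Create a new table from a schema, or modify the schema of an existing table
Example:
curl -vi -X POST \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: PUT
Description: Update an existing table from a schema
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '
"http://example.com:8000/users/schema"

Endpoint: /table/schema
HTTP verb: DELETE
Description: Delete the table
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/users/schema"

Endpoint: /table/regions
HTTP verb: GET
Description: List all regions of the table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/regions"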
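Endpoint: /table/row/column:qualifier/timestamp
HTTP verb: GET
Description: Get the value of the given column in the given column family of the given table at the given timestamp. The returned value is base64-encoded, so it must be base64-decoded before use
Examples:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1"
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a/1458586888395"

Endpoint: /table/row/column:qualifier
HTTP verb: GET
Description: Get the value of the given column in the given column family of the given table
Examples:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a"
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a/"

Endpoint: /table/row/column:qualifier/?v=number_of_versions
HTTP verb: GET
Description: Get the given number of versions of the given column in the given column family of the given table
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/row1/cf:a?v=2"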
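Endpoint: /table/scanner/
HTTP verb: PUT
Description: Create a scanner
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '
"http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/
HTTP verb: PUT
Description: Create a scanner with a Filter. The filter can be written in a text file, in a format such as:
{
"type": "PrefixFilter",
"value": "u123"
}
Example:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type:text/xml" \
-d @filter.txt \
"http://example.com:8000/users/scanner/"

Endpoint: /table/scanner/scanner-id
HTTP verb: GET
Description: Fetch the next batch of data; when no data is left, the HTTP status code returned is 204
Example:
curl -vi -X GET \
-H "Accept: text/xml" \
"http://example.com:8000/users/scanner/145869072824375522207"

Endpoint: /table/scanner/scanner-id
HTTP verb: DELETE
Description: Delete the specified scanner and release its resources
Example:
curl -vi -X DELETE \
-H "Accept: text/xml" \
"http://example.com:8000/users/scanner/145869072824375522207"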
Endpoint: /table/row_key
HTTP verb: PUT
Description: Write one row to the specified table. Note that the row key, column family, column name and column value must all be base64-encoded
Examples:
curl -vi -X PUT \
-H "Accept: text/xml" \
-H "Content-Type: text/xml" \
-d '
"http://example.com:8000/users/fakerow"
curl -vi -X PUT \
-H "Accept: text/json" \
-H "Content-Type: text/json" \
-d '{"Row":[{"key":"cm93NQo=", "Cell": [{"column":"Y2Y6ZQo=", "$":"dmFsdWU1Cg=="}]}]}' \
"http://example.com:8000/users/fakerow"
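"HBase-1.2.1 Distributed Installation Guide"
"Hive 0.12.0 Installation Guide"
"ZooKeeper-3.4.6 Distributed Installation Guide"
"Hadoop 2.3.0 Source Code Reverse Engineering"
"Compiling Hadoop-2.7.2 on Linux"
"Accumulo-1.5.1 Installation Guide"
"Drill 1.0.0 Installation Guide"
"Shark 0.9.1 Installation Guide"
For more, follow the author's technical blog: http://aquester.cublog.cn.
The directory structure of hbase in ZooKeeper: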
[zk: localhost:2181(CONNECTED) 24] ls /hbase
[replication, meta-region-server, rs, splitWAL, backup-masters, table-lock, flush-table-proc, region-in-transition, online-snapshot, acl, master, running, recovering-regions, draining, namespace, hbaseid, table]
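Starting with version 0.96, root-region-server was replaced by meta-region-server and the old root was removed. Like the old root, the new meta has exactly one Region and never has more than one.
Version 0.96 also introduced namespaces and removed the -ROOT- table; the former .META. table was replaced by the hbase:meta table, where hbase is the namespace name. A namespace can be thought of as similar to a database name in MySQL and is used to group tables logically.
Client DML operations on hbase do not need to contact the master, but DDL operations depend on the master, and so does list in the hbase shell.
On the web UI of the active hbase master, three system tables are visible: hbase:acl, hbase:meta and hbase:namespace. Note that the metadata of hbase:acl and hbase:namespace is also stored in hbase:meta, which can be observed by running scan 'hbase:meta' in the hbase shell.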
hbase(main):015:0* scan 'hbase:meta',{LIMIT=>10}
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:regioninfo, timestamp=1460426830411, value={ENCODED => 0bbdf170c309223c0ce830facdff9edd, NAME => 'hbase:acl,,1460426731436.0bbdf
facdff9edd. 170c309223c0ce830facdff9edd.', STARTKEY => '', ENDKEY => ''}
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:seqnumDuringOpen, timestamp=1461653766642, value=\x00\x00\x00\x00\x00\x00\x002
facdff9edd.
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:server, timestamp=1461653766642, value=hadoop-034:16020
facdff9edd.
hbase:acl,,1460426731436.0bbdf170c309223c0ce830 column=info:serverstartcode, timestamp=1461653766642, value=1461653610096
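The first column is the Region name. serverstartcode is the start code (startup timestamp) of the Region server hosting the region, and server is the address and port of that Region server. The regioninfo structure is:
1) ENCODED: the MD5 value of the Region name
2) NAME: the Region name
3) STARTKEY: empty means this is the first Region
4) ENDKEY: if this is also empty, the table has only one Region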
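Installing Phoenix is very simple. The official site has instructions (http://phoenix.incubator.apache.org/download.html), and the binary package can be downloaded from http://www.apache.org/dyn/closer.cgi/incubator/phoenix/. This article uses phoenix-4.7.0-incubating.tar.gz. Note the compatibility with HBase versions:
Phoenix version | HBase version
Phoenix 2.x | HBase 0.94.x
Phoenix 3.x | HBase 0.94.x
Phoenix 4.x | HBase 0.98.1+
The installation steps are:
1) Upload phoenix-4.7.0-incubating.tar.gz to the Phoenix client machine; assume it is installed under /data/hadoop
2) Unpack phoenix-4.7.0-incubating.tar.gz, which creates the phoenix-4.7.0-incubating directory
3) Create a symbolic link: ln -s phoenix-4.7.0-incubating phoenix
4) Add /data/hadoop/phoenix/hadoop-2/phoenix-4.7.0-incubating-client.jar to the CLASSPATH
5) Copy phoenix-core-4.7.0-incubating.jar from the phoenix/common directory into the CLASSPATH of every HBase region server, for example HBase's lib directory
6) Restart the HBase cluster
Running phoenix is also very simple. The command format is:
sqlline.py zookeeper file.sql
Example: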
hadoop@VM-40-171-sles10-64:~/phoenix/bin> ./sqlline.py 10.12.154.78
Setting property: [isolation, TRANSACTION_READ_COMMITTED]
issuing: !connect jdbc:phoenix:10.12.154.78 none none org.apache.phoenix.jdbc.PhoenixDriver
Connecting to jdbc:phoenix:10.12.154.78
Connected to: Phoenix (version 4.0)
Driver: org.apache.phoenix.jdbc.PhoenixDriver (version 4.0)
Autocommit status: true
Transaction isolation: TRANSACTION_READ_COMMITTED
Building list of tables and columns for tab-completion (set fastconnect to true to skip)...
53/53 (100%) Done
Done
sqlline version 1.1.2
0: jdbc:phoenix:10.12.154.78> select * from test;
Error: ERROR 1012 (42M03): Table undefined. tableName=TEST (state=42M03,code=1012)
0: jdbc:phoenix:10.12.154.78> create table test ( a int, b string);
Error: ERROR 601 (42P00): Syntax error. Unsupported sql type: INT (state=42P00,code=601)
0: jdbc:phoenix:10.12.154.78> create table test (a integer, b integer);
Error: ERROR 509 (42888): The table does not have a primary key. tableName=TEST (state=42888,code=509)
0: jdbc:phoenix:10.12.154.78> create table test (a integer primary key, b integer) ;
No rows affected (1.424 seconds)
0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (1, 1);
1 row affected (0.099 seconds)
0: jdbc:phoenix:10.12.154.78> UPSERT INTO TEST VALUES (2, 12);
1 row affected (0.02 seconds)
0: jdbc:phoenix:10.12.154.78> select * from test;
+------------+------------+
| A | B |
+------------+------------+
| 1 | 1 |
| 2 | 12 |
+------------+------------+
2 rows selected (0.116 seconds)
0: jdbc:phoenix:10.12.154.78>
For the syntax, see http://phoenix.incubator.apache.org/language/index.html; for the data types, see http://phoenix.incubator.apache.org/language/datatypes.html.