Category: HADOOP
2014-12-03 09:23:50
Installing Sqoop on Linux
Environment:
OS: Red Hat Linux AS5
sqoop-1.4.5
1. Installation steps
Download the installation package; download address: sqoop/
Choose the version appropriate for your environment; the one downloaded here is sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz.
Log in as the hadoop1 user.
Copy the installation file to the /usr1 directory:
[hadoop1@node1 ~]$ cp sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz /usr1/
Extract the archive:
[hadoop1@node1 usr1]$ tar -zxvf sqoop-1.4.5.bin__hadoop-1.0.0.tar.gz
Rename the directory:
[hadoop1@node1 usr1]$ mv sqoop-1.4.5.bin__hadoop-1.0.0 sqoop
Grant ownership of the sqoop directory to the hadoop1 user (run as root):
[root@node1 usr1]# chown -R hadoop1:hadoop1 ./sqoop
Set SQOOP_HOME to /usr1/sqoop and add $SQOOP_HOME/bin to the PATH in the hadoop1 user's .bash_profile. The modified file:
[hadoop1@node1 ~]$ more .bash_profile
# .bash_profile
# Get the aliases and functions
if [ -f ~/.bashrc ]; then
. ~/.bashrc
fi
# User specific environment and startup programs
export JAVA_HOME=/usr/java/jdk1.8.0_05
export JRE_HOME=/usr/java/jdk1.8.0_05/jre
export CLASSPATH=$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_HOME=/usr1/hadoop
HIVE_HOME=/usr1/hive
export SQOOP_HOME=/usr1/sqoop
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$SQOOP_HOME/bin:$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
PATH=$PATH:$HOME/bin
export PATH
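After saving the file, you can reload it so the new variables take effect in the current session (or simply log in again):
[hadoop1@node1 ~]$ source ~/.bash_profile
[hadoop1@node1 ~]$ echo $SQOOP_HOME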
Go to the conf directory:
[hadoop1@node1 conf]$ cd $SQOOP_HOME/conf
[hadoop1@node1 conf]$ mv sqoop-env-template.sh sqoop-env.sh
Edit the sqoop-env.sh file and add the following:
#Set path to where hadoop-*-core.jar is available
export HADOOP_HOME=/usr1/hadoop
#set the path to where bin/hbase is available
export HBASE_HOME=/usr1/hbase
#Set the path to where bin/hive is available
export HIVE_HOME=/usr1/hive
#Set the path for where zookeper config dir is
export ZOOKEEPER_HOME=/usr1/zookeeper
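With the environment configured, a quick way to sanity-check the installation is to print the Sqoop version (assuming $SQOOP_HOME/bin is already on the PATH):
[hadoop1@node1 ~]$ sqoop version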
First, copy the MySQL JDBC driver to the $SQOOP_HOME/lib directory:
[hadoop1@node1 ext]$ cp mysql-connector-java-5.1.32-bin.jar /usr1/sqoop/lib
List the databases in MySQL:
[hadoop1@node1 ~]$ sqoop list-databases --connect jdbc:mysql://192.168.56.103:3306/ --username root --password mysql
Warning: /usr1/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /usr1/sqoop/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: $HADOOP_HOME is deprecated.
14/11/28 13:42:56 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5
14/11/28 13:42:56 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/11/28 13:42:56 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
information_schema
BACKUP
hxl
hxlbak
mysql
ndb_2_fs
ndbinfo
performance
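As the warning in the output notes, putting --password on the command line is insecure; the same listing can be run with -P so that Sqoop prompts for the password instead, for example:
[hadoop1@node1 ~]$ sqoop list-databases --connect jdbc:mysql://192.168.56.103:3306/ --username root -P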
Create a MySQL table:
CREATE TABLE tb_mysql_to_hdfs
(
id int,
NAME varchar(20)
);
Insert some data:
INSERT INTO tb_mysql_to_hdfs
VALUES
(1,'name1'),
(2,'name1'),
(3,'name1'),
(4,'name1'),
(5,'name1'),
(6,'name1'),
(7,'name1'),
(8,'name1'),
(9,'name1'),
(10,'name1');
mysql> select * from tb_mysql_to_hdfs;
+------+-------+
| id | NAME |
+------+-------+
| 1 | name1 |
| 2 | name1 |
| 3 | name1 |
| 4 | name1 |
| 5 | name1 |
| 6 | name1 |
| 7 | name1 |
| 8 | name1 |
| 9 | name1 |
| 10 | name1 |
+------+-------+
10 rows in set (0.00 sec)
Use sqoop to create a table in Hive that matches the MySQL table to be synced (only the table structure is created):
sqoop create-hive-table --connect jdbc:mysql://192.168.56.103:3306/hxl --table tb_mysql_to_hdfs --username root --password mysql --hive-table tb_hive_from_mysql --fields-terminated-by "|" --lines-terminated-by "\n";
You can verify in Hive that the table has been created:
hive> show tables;
OK
tb_emp_info
tb_hive_from_mysql
x
Time taken: 0.819 seconds, Fetched: 3 row(s)
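Since create-hive-table only copies the table structure, you can also describe the new table to confirm the columns came across (an extra check, not required):
hive> describe tb_hive_from_mysql;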
Import the data into HDFS:
[hadoop1@node1 ~]$ sqoop import --connect jdbc:mysql://192.168.56.103:3306/hxl --username root --password mysql --table tb_mysql_to_hdfs --hive-import --hive-database hxl --hive-table tb_hive_from_mysql -m 1 --fields-terminated-by "|";
-m specifies the number of parallel map tasks to use for the import.
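If you wanted more than one mapper on this table, which has no primary key, you would also have to name a split column; a hypothetical variant of the same import using four mappers split on the id column:
[hadoop1@node1 ~]$ sqoop import --connect jdbc:mysql://192.168.56.103:3306/hxl --username root --password mysql --table tb_mysql_to_hdfs --hive-import --hive-database hxl --hive-table tb_hive_from_mysql -m 4 --split-by id --fields-terminated-by "|";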
View the data imported into HDFS:
hive> select * from tb_hive_from_mysql;
OK
1 name1
2 name1
3 name1
4 name1
5 name1
6 name1
7 name1
8 name1
9 name1
10 name1
Time taken: 0.07 seconds, Fetched: 10 row(s)
To export data back to MySQL, first create the target table in advance:
CREATE TABLE tb_from_hadoop
(
id int,
NAME varchar(20)
);
Contents of the file in HDFS:
[hadoop1@node1 ~]$ hadoop fs -cat /user/hive/warehouse/hxl.db/tb_hive_from_mysql/part-m-00000
Warning: $HADOOP_HOME is deprecated.
1|name1
2|name1
3|name1
4|name1
5|name1
6|name1
7|name1
8|name1
9|name1
10|name1
Export the data from HDFS into the MySQL table tb_from_hadoop:
$sqoop export --connect jdbc:mysql://192.168.56.103:3306/hxl --username root --password mysql --table tb_from_hadoop --export-dir /user/hive/warehouse/hxl.db/tb_hive_from_mysql/part-m-00000 --input-fields-terminated-by '|'
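Here --export-dir points at a single part file; it can also point at the table's warehouse directory so that every part file under it is exported, for example (same delimiter assumed):
$sqoop export --connect jdbc:mysql://192.168.56.103:3306/hxl --username root --password mysql --table tb_from_hadoop --export-dir /user/hive/warehouse/hxl.db/tb_hive_from_mysql --input-fields-terminated-by '|'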
Check the data in MySQL:
mysql> SELECT * FROM tb_from_hadoop;
+------+-------+
| id | NAME |
+------+-------+
| 9 | name1 |
| 10 | name1 |
| 1 | name1 |
| 2 | name1 |
| 3 | name1 |
| 7 | name1 |
| 8 | name1 |
| 4 | name1 |
| 5 | name1 |
| 6 | name1 |
+------+-------+
10 rows in set (0.00 sec)
Note: the following error can be reported during the import:
14/11/28 13:58:19 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
14/11/28 13:58:19 ERROR tool.ImportTool: Error during import: No primary key could be found for table tb_mysql_to_hdfs. Please specify one with --split-by or perform a sequential import with '-m 1'.
The error above occurs because the MySQL table has no primary key.
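To avoid it, either add a primary key to the source table, or pass --split-by / '-m 1' as the message suggests. A sketch of the first option, assuming the id column is unique:
mysql> ALTER TABLE tb_mysql_to_hdfs ADD PRIMARY KEY (id);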