Using the Sqoop Tool (Part 1) - hexel - ChinaUnix Blog

Using the Sqoop Tool (Part 1)

Category: HADOOP

2014-04-09 19:48:22

1. Installation and configuration

(1) Download the software:

[mongodb_f002 ~]#wget

(2) Extract the archive into the target directory:

[mongodb_f002 ~]#tar -zxvf sqoop-1.4.3-cdh4.5.0.tar.gz -C /hadoop

(3) Make sure the following environment variables are set in .bash_profile:

export SQOOP_HOME=/hadoop/sqoop-1.4.3-cdh4.5.0

export HADOOP_MAPRED_HOME=${HADOOP_HOME}

export HADOOP_COMMON_HOME=${HADOOP_HOME}

export HIVE_HOME=/hadoop/hive-0.10.0-cdh4.5.0

export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
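Several of the variables above are derived from ${HADOOP_HOME}, so they silently end up empty if HADOOP_HOME itself is not defined earlier in the profile. The following is a minimal sketch for checking this after sourcing .bash_profile; the helper name missing_sqoop_vars is hypothetical and not part of Sqoop or Hadoop.

```shell
# Variable names taken from the .bash_profile listing above.
REQUIRED_VARS="SQOOP_HOME HADOOP_MAPRED_HOME HADOOP_COMMON_HOME HIVE_HOME HADOOP_CONF_DIR YARN_CONF_DIR"

# Print the name of every required variable that is unset or empty.
# (missing_sqoop_vars is a hypothetical helper, not a Sqoop command.)
missing_sqoop_vars() {
  for name in $REQUIRED_VARS; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      printf '%s\n' "$name"
    fi
  done
}

missing=$(missing_sqoop_vars)
if [ -n "$missing" ]; then
  echo "Unset or empty variables:"
  echo "$missing"
else
  echo "All Sqoop-related variables are set."
fi
```

Run it in the same login shell that will start Sqoop, so that it sees the same environment.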

(4) Install the JDBC drivers for the relational databases you need:

[bigdata1 hadoop]#cp ojdbc6.jar sqoop-1.4.3-cdh4.5.0/lib/

[bigdata1 hadoop]#cp hive-0.10.0-cdh4.5.0/lib/mysql-connector-java-5.1.29-bin.jar sqoop-1.4.3-cdh4.5.0/lib/

2. Basic Sqoop operations:

See also:

(1) View the help information

[bigdata1 ~]$sqoop help

Warning: /usr/lib/hbase does not exist! HBase imports will fail.

Please set $HBASE_HOME to the root of your HBase installation.

Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.

Please set $HCAT_HOME to the root of your HCatalog installation.

14/04/09 13:58:40 INFO sqoop.Sqoop: Running Sqoop version: 1.4.3-cdh4.5.0

usage: sqoop COMMAND [ARGS]

Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information

See 'sqoop help COMMAND' for information on a specific command.

To view the help for an individual tool:

[bigdata1 ~]$sqoop help import

Generic Hadoop command-line arguments:
(must preceed any tool-specific arguments)
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

(2) A first look at Sqoop:

sqoop import --connect jdbc:mysql://bigdata1:3307/test --username hive --password hive --table test \
--split-by id --target-dir /user/yarn/test1

To keep the password off the command line, replace --password hive with -P; Sqoop then prompts for the password when the import runs:

[bigdata1 ~]$sqoop import --connect jdbc:mysql://bigdata1:3307/test --table test -m 1 --username=hive -P

Using an options file:

sqoop --options-file ./import.txt

The contents of import.txt are as follows:

import
--connect
jdbc:mysql://bigdata1:3307/test
--username
hive
--table
test
--split-by
id
--target-dir
/user/yarn/test1
--fields-terminated-by
'\t'
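An options file holds one option or value per line, and Sqoop ignores empty lines and lines that start with #. As a minimal sketch, the file can be generated from a shell here-document. The value id after --split-by is an assumption taken from the earlier command line, and OPTIONS_FILE is a hypothetical variable used only in this example.

```shell
# Where to write the options file (hypothetical variable for this sketch).
OPTIONS_FILE="${OPTIONS_FILE:-./import.txt}"

# The quoted 'EOF' keeps the shell from expanding anything inside the file.
cat > "$OPTIONS_FILE" <<'EOF'
# Tool name first, then one option or value per line.
import
--connect
jdbc:mysql://bigdata1:3307/test
--username
hive
--table
test
--split-by
id
--target-dir
/user/yarn/test1
--fields-terminated-by
'\t'
EOF

# Count the lines that are neither comments nor empty.
grep -v -e '^#' -e '^$' "$OPTIONS_FILE" | wc -l

# Then, on a machine with Sqoop installed:
#   sqoop --options-file ./import.txt -P
```

Options given on the command line are combined with those read from the file, which is why -P can be supplied separately to prompt for the password.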

After the job finishes, check the result on HDFS:

[bigdata1 ~]$hdfs dfs -cat test1/*

1 1 lili

2 2 nana
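Because the import used a tab as the field delimiter, the output can be processed with ordinary text tools. The sketch below works on two rows shaped like the ones above; it assumes the columns are separated by tab characters and that the third column holds a name, neither of which is stated explicitly in the listing.

```shell
# Two sample rows, tab-separated, mirroring the HDFS output above.
rows=$(printf '1\t1\tlili\n2\t2\tnana\n')

# cut splits on tabs by default; print only the third field of each row.
names=$(printf '%s\n' "$rows" | cut -f3)
printf '%s\n' "$names"
```

On the cluster, the same filter could be applied to the real data by piping hdfs dfs -cat test1/* into cut -f3.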

3. Sqoop tools in detail:

(1) The import tool

Import control arguments:

Argument                          Description
--append                          Append data to an existing dataset in HDFS
--as-avrodatafile                 Imports data to Avro Data Files
--as-sequencefile                 Imports data to SequenceFiles
--as-textfile                     Imports data as plain text (default)
--boundary-query <statement>      Boundary query to use for creating splits
--columns <col,col,col…>          Columns to import from table
--delete-target-dir               Delete the import target directory if it exists
--direct                          Use direct import fast path
--direct-split-size <n>           Split the input stream every n bytes when importing in direct mode
--fetch-size <n>                  Number of entries to read from database at once.
--inline-lob-limit <n>            Set the maximum size for an inline LOB
-m,--num-mappers <n>              Use n map tasks to import in parallel
-e,--query <statement>            Import the results of statement.
--split-by <column-name>          Column of the table used to split work units
--table <table-name>              Table to read
--target-dir <dir>                HDFS destination dir
--warehouse-dir <dir>             HDFS parent for table destination
--where <where clause>            WHERE clause to use during import
-z,--compress                     Enable compression
--compression-codec <c>           Use Hadoop codec (default gzip)
--null-string <null-string>       The string to be written for a null value for string columns
--null-non-string <null-string>   The string to be written for a null value for non-string columns

The --null-string and --null-non-string arguments are optional. If not specified, then the string "null" will be used.

The following example illustrates some commonly used options:

sqoop import --connect jdbc:mysql://bigdata1:3307/test --username hive -P --table test -m 1 --target-dir /user/yarn/test \
--columns "id,name" --where "id != 0" --fields-terminated-by '\t' --append --direct --direct-split-size 2000 \
-- --default-character-set=utf8

--connect: the JDBC connection string for the database

--username: the database user name

--table: the table to import

--target-dir: the HDFS path where the imported data is stored

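-m 1: the number of map tasks used for the import. Sqoop takes the minimum and maximum values of a column, divides that range evenly into as many parts as there are map tasks, and imports the parts in parallel. If the table has a primary key, Sqoop splits on it by default. If it does not, name a column with --split-by, preferably one whose values are unique and evenly distributed so that each task receives a similar amount of data. If no such column exists, use -m 1 to run a single map task.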
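To make the range splitting concrete, here is a simplified illustration in shell arithmetic. It is not Sqoop's actual splitter, and the bounds 1 and 100, the column name id, and the function name split_ranges are hypothetical values chosen for the example.

```shell
# Divide the inclusive range [min, max] into at most `tasks` contiguous parts
# and print the condition each map task would use.
# (split_ranges is a hypothetical helper, not part of Sqoop.)
split_ranges() {
  min=$1
  max=$2
  tasks=$3
  # Size of each part, rounded up so that the parts cover the whole range.
  size=$(( (max - min + tasks) / tasks ))
  lo=$min
  while [ "$lo" -le "$max" ]; do
    hi=$(( lo + size - 1 ))
    if [ "$hi" -gt "$max" ]; then
      hi=$max
    fi
    echo "id >= $lo AND id <= $hi"
    lo=$(( hi + 1 ))
  done
}

# Four map tasks over ids 1..100.
split_ranges 1 100 4
```

If most rows fall into one of these ranges, the task that owns it does most of the work, which is why an evenly distributed split column matters.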

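--columns: the columns to import; all columns are imported by default

--where: equivalent to a SQL WHERE clause; it filters the rows to import. Note that, as far as I could tell at the time of writing, compound conditions joined with or/and were not supported in --where.

--fields-terminated-by '\t': the field delimiter used in the output files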

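--query: imports the result of a query; in effect it combines --table, --columns, and --where. The query must contain the $CONDITIONS token, and --target-dir must be given.

For example, the command above can be rewritten as:

sqoop import --connect jdbc:mysql://bigdata1:3307/test --username hive -P --query 'select id,name from test where id != 0 and $CONDITIONS' -m 1 --target-dir /user/yarn/test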
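The quoting of the query matters, because Sqoop needs to receive the literal text $CONDITIONS. The short sketch below only compares two shell strings and does not call Sqoop; the variable names query_single and query_double are hypothetical.

```shell
# Single quotes: the shell passes $CONDITIONS through untouched.
query_single='select id,name from test where id != 0 and $CONDITIONS'

# Double quotes: the dollar sign must be escaped, otherwise the shell would
# substitute the (usually empty) value of a shell variable named CONDITIONS.
query_double="select id,name from test where id != 0 and \$CONDITIONS"

if [ "$query_single" = "$query_double" ]; then
  echo "both forms pass the literal token to Sqoop:"
  echo "$query_single"
fi
```

Either form can be given to --query; forgetting the backslash inside double quotes would remove the token from the query before Sqoop ever sees it.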

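--boundary-query: by default Sqoop queries the minimum and maximum values of the split-by column and divides the work according to that range. This is not always optimal, so a custom query can be supplied with --boundary-query.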
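As a hedged sketch of what such an invocation might look like: the boundary query should return a single row with two numeric columns, the lower and upper bound of the split column. The query text, the target directory test_boundary, the choice of four map tasks, and the run helper are all assumptions for illustration. The helper only prints the command unless DRY_RUN=0 is set, so the sketch can be tried without a cluster.

```shell
# Hypothetical helper: print the command instead of running it,
# unless DRY_RUN=0 is set on a machine where sqoop is installed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# One row, two numeric columns: lower and upper bound for the split column.
BOUNDARY_QUERY='select min(id), max(id) from test where id != 0'

run sqoop import \
  --connect jdbc:mysql://bigdata1:3307/test --username hive -P \
  --table test --split-by id -m 4 \
  --boundary-query "$BOUNDARY_QUERY" \
  --target-dir /user/yarn/test_boundary
```

The printed line drops the quotes around the boundary query, so it is meant for inspection rather than for copying back into a shell.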

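--direct: uses the database's own dump tool instead of SQL queries over JDBC. For MySQL this is mysqldump, which must be on the PATH.

--direct-split-size: with --direct, splits the output files at the given size in bytes

--: because --direct is in use, options can be passed through to the underlying tool after a bare --, for example mysqldump's --default-character-set=utf8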

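--append: if the HDFS directory given by --target-dir already exists, the import normally fails. With --append, Sqoop first writes the data to a temporary directory and, once the import completes, moves the files into the directory given by --target-dir.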
