1: First write the UDF in IDEA, for example HelloUDF.
package com.hynoo.bigdata;
//package org.apache.hadoop.hive.ql.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

/**
 * Purpose: given input xxx, output Hello:xxx
 * Created by nero on 2017/7/5.
 * Steps for developing a UDF:
 * 1) extend UDF
 * 2) implement the evaluate method; note that it can be overloaded
 */
public class HelloUDF extends UDF {

    // One-argument form: greets by name.
    public Text evaluate(Text name) {
        return new Text("hello11:" + name);
    }

    // Two-argument overload: greets by name and appends the age.
    public Text evaluate(Text name, IntWritable age) {
        return new Text("hello:" + name + ",age" + age);
    }

    // Quick local test of both overloads before touching the Hive source.
    public static void main(String[] args) {
        HelloUDF udf = new HelloUDF();
        System.out.println(udf.evaluate(new Text("zhangsan")));
        System.out.println(udf.evaluate(new Text("zhangsan"), new IntWritable(18)));
    }
}
The local test is done!
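To make the "evaluate can be overloaded" point concrete, here is a minimal sketch of how an overload can be picked at run time from the argument types. It is only an illustration and not Hive's actual resolver: the names OverloadSketch, HelloSketch and call are hypothetical, and String/Integer stand in for Text/IntWritable so that it runs without any Hadoop jars.

```java
import java.lang.reflect.Method;

public class OverloadSketch {

    // Hypothetical stand-in for HelloUDF; String/Integer replace Text/IntWritable.
    public static class HelloSketch {
        public String evaluate(String name) {
            return "hello:" + name;
        }

        public String evaluate(String name, Integer age) {
            return "hello:" + name + ",age" + age;
        }
    }

    // Look up the evaluate overload whose parameter types equal the
    // runtime classes of the arguments, then invoke it reflectively.
    public static Object call(Object udf, Object... args) {
        Class<?>[] types = new Class<?>[args.length];
        for (int i = 0; i < args.length; i++) {
            types[i] = args[i].getClass();
        }
        try {
            Method m = udf.getClass().getMethod("evaluate", types);
            return m.invoke(udf, args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("no matching evaluate overload", e);
        }
    }

    public static void main(String[] args) {
        HelloSketch udf = new HelloSketch();
        System.out.println(call(udf, "zhangsan"));     // prints hello:zhangsan
        System.out.println(call(udf, "zhangsan", 18)); // prints hello:zhangsan,age18
    }
}
```

The sketch matches on exact classes only, which is enough to show why two evaluate methods with different parameter lists can live side by side in one UDF class.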
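2: Download the Hive source code. I used version 1.1.0-cdh5.7.0; pick that release on the download page. After downloading, I put the archive in /Users/nero/sourceCode. Extract it with tar -zxvf hive-1.1.0-cdh5.7.0-src.tar.gz and then enter the directory with cd hive-1.1.0-cdh5.7.0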
3: Permanent (built-in) functions are registered in the FunctionRegistry class, which lives in the ql module. So first copy the UDF's .java file into /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/
Then go to /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/exec, where you will find the FunctionRegistry.java we need.
Open it for editing: vi FunctionRegistry.java
Add the import: import org.apache.hadoop.hive.ql.udf.HelloUDF;
Then register the function alongside the existing built-in registrations: system.registerUDF("sayhello1", HelloUDF.class, false);
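Conceptually, a registration like this adds a name-to-class entry to a table that Hive consults when it resolves a function name. The sketch below is a toy model of that idea under my own assumptions, not Hive's FunctionRegistry: MiniRegistry, HelloUDFStub, lookup and showFunctions are hypothetical names, and the case-insensitive lookup is a choice made for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class MiniRegistry {

    // Hypothetical placeholder for the real HelloUDF class.
    public static class HelloUDFStub { }

    // Function table: lower-cased function name -> implementing class.
    private final Map<String, Class<?>> functions = new LinkedHashMap<>();

    // Register a class under a function name (names are stored lower-cased).
    public void registerUDF(String name, Class<?> udfClass) {
        functions.put(name.toLowerCase(Locale.ROOT), udfClass);
    }

    // Resolve a function name to its class, or null if it is unknown.
    public Class<?> lookup(String name) {
        return functions.get(name.toLowerCase(Locale.ROOT));
    }

    // Names in registration order, loosely analogous to "show functions".
    public Set<String> showFunctions() {
        return functions.keySet();
    }

    public static void main(String[] args) {
        MiniRegistry registry = new MiniRegistry();
        registry.registerUDF("sayhello1", HelloUDFStub.class);
        System.out.println(registry.showFunctions());
        System.out.println(registry.lookup("SAYHELLO1").getSimpleName());
    }
}
```

Because the entry is compiled into the registry itself, the function is available in every session without an ADD JAR or CREATE FUNCTION step, which is the whole point of patching the source.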
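4: Go back to the Hive source root: cd ~/sourceCode/hive-1.1.0-cdh5.7.0
Build the ql module with the following command. It takes a while.
mvn install -pl ql -am -DskipTests
The build fails, however, with the following errors: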
[ERROR] COMPILATION ERROR :
[INFO] -------------------------------------------------------------
[ERROR] /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/HelloUDF.java:[15,8] duplicate class: com.hynoo.bigdata.HelloUDF
[ERROR] /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:[44,37] cannot access org.apache.hadoop.hive.ql.udf.HelloUDF
bad source file: /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/HelloUDF.java
file does not contain class org.apache.hadoop.hive.ql.udf.HelloUDF
Please remove or make sure it appears in the correct subdirectory of the sourcepath.
[INFO] 2 errors
[INFO] -------------------------------------------------------------
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive ............................................... SUCCESS [ 1.116 s]
[INFO] Hive Shims Common .................................. SUCCESS [ 1.432 s]
[INFO] Hive Shims 0.23 .................................... SUCCESS [ 1.305 s]
[INFO] Hive Shims Scheduler ............................... SUCCESS [ 1.270 s]
[INFO] Hive Shims ......................................... SUCCESS [ 0.675 s]
[INFO] Hive Common ........................................ SUCCESS [ 3.139 s]
[INFO] Hive Serde ......................................... SUCCESS [ 1.169 s]
[INFO] Hive Metastore ..................................... SUCCESS [ 3.512 s]
[INFO] Hive Ant Utilities ................................. SUCCESS [ 0.160 s]
[INFO] Spark Remote Client ................................ SUCCESS [ 2.640 s]
[INFO] Hive Query Language ................................ FAILURE [ 5.453 s]
[INFO] Hive Service ....................................... SKIPPED
[INFO] Hive Accumulo Handler .............................. SKIPPED
[INFO] Hive JDBC .......................................... SKIPPED
[INFO] Hive Beeline ....................................... SKIPPED
[INFO] Hive CLI ........................................... SKIPPED
[INFO] Hive Contrib ....................................... SKIPPED
[INFO] Hive HBase Handler ................................. SKIPPED
[INFO] Hive HCatalog ...................................... SKIPPED
[INFO] Hive HCatalog Core ................................. SKIPPED
[INFO] Hive HCatalog Pig Adapter .......................... SKIPPED
[INFO] Hive HCatalog Server Extensions .................... SKIPPED
[INFO] Hive HCatalog Webhcat Java Client .................. SKIPPED
[INFO] Hive HCatalog Webhcat .............................. SKIPPED
[INFO] Hive HCatalog Streaming ............................ SKIPPED
[INFO] Hive HWI ........................................... SKIPPED
[INFO] Hive ODBC .......................................... SKIPPED
[INFO] Hive Shims Aggregator .............................. SKIPPED
[INFO] Hive TestUtils ..................................... SKIPPED
[INFO] Hive Packaging ..................................... SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 22.564 s
[INFO] Finished at: 2017-07-07T15:34:49+08:00
[INFO] Final Memory: 106M/1492M
[INFO] ------------------------------------------------------------------------
[WARNING] The requested profile "hadoop-2.dist" could not be activated because it does not exist.
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure:
[ERROR] /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/HelloUDF.java:[15,8] duplicate class: com.hynoo.bigdata.HelloUDF
[ERROR] /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java:[44,37] cannot access org.apache.hadoop.hive.ql.udf.HelloUDF
[ERROR] bad source file: /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/HelloUDF.java
[ERROR] file does not contain class org.apache.hadoop.hive.ql.udf.HelloUDF
[ERROR] Please remove or make sure it appears in the correct subdirectory of the sourcepath.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1]
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn -rf :hive-exec
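The cause is that the package declared in our UDF is not the one FunctionRegistry.java imports, and it does not match the directory the file now sits in. The file still declares package com.hynoo.bigdata; so open our copy under /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/src/java/org/apache/hadoop/hive/ql/udf/ with vi HelloUDF.java
and change the package declaration from package com.hynoo.bigdata; to package org.apache.hadoop.hive.ql.udf;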
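The rule behind this error is that a source file's package declaration has to mirror its directory under the source root. As a rough illustration, the following sketch derives the expected package from a file path and compares it with a declared one. PackagePathCheck and its methods are hypothetical helpers written for this post, and the paths are shortened to be relative to the Hive source root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class PackagePathCheck {

    // Derive the package a file must declare from its location
    // relative to the source root (directory names joined by dots).
    public static String expectedPackage(Path sourceRoot, Path javaFile) {
        Path dir = sourceRoot.relativize(javaFile).getParent();
        if (dir == null) {
            return ""; // file sits directly in the source root: default package
        }
        StringBuilder sb = new StringBuilder();
        for (Path part : dir) {
            if (sb.length() > 0) {
                sb.append('.');
            }
            sb.append(part.toString());
        }
        return sb.toString();
    }

    // True when the declared package matches the directory layout.
    public static boolean matches(Path sourceRoot, Path javaFile, String declaredPackage) {
        return expectedPackage(sourceRoot, javaFile).equals(declaredPackage);
    }

    public static void main(String[] args) {
        Path root = Paths.get("ql/src/java");
        Path file = Paths.get("ql/src/java/org/apache/hadoop/hive/ql/udf/HelloUDF.java");
        System.out.println(expectedPackage(root, file));
        System.out.println(matches(root, file, "com.hynoo.bigdata"));             // false
        System.out.println(matches(root, file, "org.apache.hadoop.hive.ql.udf")); // true
    }
}
```

With the original declaration the check fails, which corresponds to the "file does not contain class" message; after the edit the declared package and the directory agree.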
Then recompile. This time the build succeeds and we see:
[INFO]
[INFO] --- maven-install-plugin:2.4:install (default-install) @ hive-exec ---
[INFO] Installing /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/target/hive-exec-1.1.0-cdh5.7.0.jar to /Users/nero/apache-maven-3.3.9/repository/org/apache/hive/hive-exec/1.1.0-cdh5.7.0/hive-exec-1.1.0-cdh5.7.0.jar
[INFO] Installing /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/dependency-reduced-pom.xml to /Users/nero/apache-maven-3.3.9/repository/org/apache/hive/hive-exec/1.1.0-cdh5.7.0/hive-exec-1.1.0-cdh5.7.0.pom
[INFO] Installing /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/target/hive-exec-1.1.0-cdh5.7.0-tests.jar to /Users/nero/apache-maven-3.3.9/repository/org/apache/hive/hive-exec/1.1.0-cdh5.7.0/hive-exec-1.1.0-cdh5.7.0-tests.jar
[INFO] Installing /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/target/hive-exec-1.1.0-cdh5.7.0-core.jar to /Users/nero/apache-maven-3.3.9/repository/org/apache/hive/hive-exec/1.1.0-cdh5.7.0/hive-exec-1.1.0-cdh5.7.0-core.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Hive ............................................... SUCCESS [ 0.866 s]
[INFO] Hive Shims Common .................................. SUCCESS [ 1.192 s]
[INFO] Hive Shims 0.23 .................................... SUCCESS [ 1.175 s]
[INFO] Hive Shims Scheduler ............................... SUCCESS [ 0.907 s]
[INFO] Hive Shims ......................................... SUCCESS [ 0.558 s]
[INFO] Hive Common ........................................ SUCCESS [ 2.322 s]
[INFO] Hive Serde ......................................... SUCCESS [ 1.042 s]
[INFO] Hive Metastore ..................................... SUCCESS [ 2.713 s]
[INFO] Hive Ant Utilities ................................. SUCCESS [ 0.156 s]
[INFO] Spark Remote Client ................................ SUCCESS [ 2.081 s]
[INFO] Hive Query Language ................................ SUCCESS [ 15.828 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 29.468 s
[INFO] Finished at: 2017-07-07T17:09:01+08:00
[INFO] Final Memory: 122M/1488M
[INFO] -------------------------------------------------
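In the log above you can see the line [INFO] Installing /Users/nero/sourceCode/hive-1.1.0-cdh5.7.0/ql/target/hive-exec-1.1.0-cdh5.7.0.jar to /Users/nero/apache-maven-3.3.9/repository/org/apache/hive/hive-exec/1.1.0-cdh5.7.0/hive-exec-1.1.0-cdh5.7.0.jar
hive-exec-1.1.0-cdh5.7.0.jar is the jar we need (look for it under your own path). Copy it into the lib directory of the Hive installation you are actually using. lib already contains a jar with the same name, so rename the original first and then put the newly built jar in its place. Restart Hive and run show functions; our own UDF, registered as sayhello1, now appears in the list.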