Ingesting Data into HBase with Flume: A Detailed Guide to the Flume HBase Sinks
https://blog.csdn.net/mnasd/article/details/81878944
1. Using the Three Serializers of the HBase Sinks
1.1 HBaseSink with SimpleHbaseEventSerializer
The following shows how to configure HBaseSink with SimpleHbaseEventSerializer:
agenttest.channels = memoryChannel-1
agenttest.sinks = hbaseSink-1
agenttest.sinks.hbaseSink-1.type = org.apache.flume.sink.hbase.HBaseSink
# HBase table name
agenttest.sinks.hbaseSink-1.table = test_hbase_table
# column family to write into
agenttest.sinks.hbaseSink-1.columnFamily = familycolumn-1
agenttest.sinks.hbaseSink-1.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
# column qualifier (under the column family) that receives the event body
agenttest.sinks.hbaseSink-1.serializer.payloadColumn = columnname
agenttest.sinks.hbaseSink-1.channel = memoryChannel-1
Note: when targeting a specific column under a column family, payloadColumn is the only property the serializer reads. You cannot write:
agenttest.sinks.hbaseSink-1.columnName = columnname
or:
agenttest.sinks.hbaseSink-1.column = columnname
Both spellings circulate online and are simply wrong; the same applies to the other two serializers. (Also note that Flume .properties files have no inline comment syntax, which is why the annotations in the listings above sit on their own # lines, and why a sink is wired with the singular property channel, not channels.)
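To see why only payloadColumn takes effect: Flume serializers look their settings up by exact key in the sink's Context, so a misspelled key silently falls back to the default ("pCol" is the default SimpleHbaseEventSerializer uses in the versions I have checked). A minimal, runnable sketch of that lookup behavior:

import org.apache.flume.Context;

// Demonstrates why only the literal key "payloadColumn" takes effect:
// any other key is simply never read, and the default wins.
public class PayloadColumnDemo {
    public static void main(String[] args) {
        Context ctx = new Context();
        ctx.put("column", "columnname");                             // wrong key, ignored
        System.out.println(ctx.getString("payloadColumn", "pCol")); // prints: pCol (default)
        ctx.put("payloadColumn", "columnname");                      // the key the serializer reads
        System.out.println(ctx.getString("payloadColumn", "pCol")); // prints: columnname
    }
}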
1.2 HBaseSink with RegexHbaseEventSerializer
The following shows how to configure RegexHbaseEventSerializer, which splits each event with a regular expression and stores the captured groups into multiple columns of the HBase table:
agenttest.channels = memoryChannel-2
agenttest.sinks = hbaseSink-2
agenttest.sinks.hbaseSink-2.type = org.apache.flume.sink.hbase.HBaseSink
agenttest.sinks.hbaseSink-2.table = test_hbase_table
agenttest.sinks.hbaseSink-2.columnFamily = familycolumn-2
agenttest.sinks.hbaseSink-2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# Example: splitting nginx log lines of the form "[xxx] [yyy] [zzz] [nnn]",
# so each bracketed field becomes one capture group:
agenttest.sinks.hbaseSink-2.serializer.regex = \\[(.*?)\\]\\ \\[(.*?)\\]\\ \\[(.*?)\\]\\ \\[(.*?)\\]
# column qualifiers (under familycolumn-2) for the four captured fields
agenttest.sinks.hbaseSink-2.serializer.colNames = column-1,column-2,column-3,column-4
#agenttest.sinks.hbaseSink-2.serializer.payloadColumn = test
agenttest.sinks.hbaseSink-2.channel = memoryChannel-2
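A quick way to sanity-check the regex before deploying: in a .properties file each \\[ is read as the regex token \[, so the effective pattern is the one compiled below. RegexHbaseEventSerializer requires the pattern to match the whole event body, one capture group per column in colNames order. The sample line is illustrative:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Offline check of the serializer regex against a sample event body.
public class RegexCheck {
    public static void main(String[] args) {
        Pattern p = Pattern.compile("\\[(.*?)\\] \\[(.*?)\\] \\[(.*?)\\] \\[(.*?)\\]");
        Matcher m = p.matcher("[2016-12-22 20:00:12] [/index] [200] [18]"); // sample line
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println("column-" + i + " = " + m.group(i));
            }
        }
    }
}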
1.3 AsyncHBaseSink with SimpleAsyncHbaseEventSerializer
The following shows how to configure AsyncHBaseSink with SimpleAsyncHbaseEventSerializer:
agenttest.channels = memoryChannel-3
agenttest.sinks = hbaseSink-3
agenttest.sinks.hbaseSink-3.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agenttest.sinks.hbaseSink-3.table = test_hbase_table
agenttest.sinks.hbaseSink-3.columnFamily = familycolumn-3
agenttest.sinks.hbaseSink-3.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
# column qualifier (under the column family) that receives the event body
agenttest.sinks.hbaseSink-3.serializer.payloadColumn = columnname
agenttest.sinks.hbaseSink-3.channel = memoryChannel-3
2. A Worked Example: Building a Data Collection and Aggregation System with Flume + HBase
2.1 Using SimpleHbaseEventSerializer
First create a table mikeal-hbase-table in HBase with two column families, familyclom1 and familyclom2:
hbase(main):102:0> create 'mikeal-hbase-table','familyclom1','familyclom2'
0 row(s) in 1.2490 seconds
=> Hbase::Table - mikeal-hbase-table
Then write a Flume configuration file, test-flume-into-hbase.conf:
# read live lines from a file and store them into HBase without transformation
agent.sources = logfile-source
agent.channels = file-channel
agent.sinks = hbase-sink
# logfile-source configuration
agent.sources.logfile-source.type = exec
agent.sources.logfile-source.command = tail -f /data/flume-hbase-test/mkhbasetable/data/nginx.log
agent.sources.logfile-source.checkperiodic = 50
# wire the source to the channel
agent.sources.logfile-source.channels = file-channel
# channel configuration: a local file channel
agent.channels.file-channel.type = file
agent.channels.file-channel.checkpointDir = /data/flume-hbase-test/checkpoint
agent.channels.file-channel.dataDirs = /data/flume-hbase-test/data
# sink configuration: HBaseSink with SimpleHbaseEventSerializer
agent.sinks.hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
# HBase table name
agent.sinks.hbase-sink.table = mikeal-hbase-table
# column family to write into
agent.sinks.hbase-sink.columnFamily = familyclom1
agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleHbaseEventSerializer
# column qualifier (under the column family) that receives the event body
agent.sinks.hbase-sink.serializer.payloadColumn = cloumn-1
# wire the sink to the channel
agent.sinks.hbase-sink.channel = file-channel
As the configuration shows, the local log file /data/flume-hbase-test/mkhbasetable/data/nginx.log is the real-time source, the local directory /data/flume-hbase-test/data backs the file channel, and HBase is the sink, i.e. the data flows into HBase.
Note: the user that launches the flume-ng job (for example a dedicated flume user) needs read/write access to /data/flume-hbase-test/mkhbasetable/data/nginx.log and to the /data/flume-hbase-test/data directory, as well as write permission on the HBase table; a preparation sketch follows.
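A minimal preparation sketch for those permissions (paths taken from this example; "flume" stands for whatever account actually runs the agent):

# create the directories used by the source and the file channel
mkdir -p /data/flume-hbase-test/mkhbasetable/data
mkdir -p /data/flume-hbase-test/checkpoint /data/flume-hbase-test/data
touch /data/flume-hbase-test/mkhbasetable/data/nginx.log
# hand everything to the user that will run flume-ng
chown -R flume:flume /data/flume-hbase-test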
Start Flume:
bin/flume-ng agent --name agent --conf /etc/flume/conf/agent/ --conf-file /etc/flume/conf/agent/test-flume-into-hbase.conf -Dflume.root.logger=DEBUG,console
In another shell, run:
echo "nging-1" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
echo "nging-2" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
Then scan the mikeal-hbase-table table: the two lines have been inserted as cell values.
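For reference, the scan output looks roughly like this (illustrative; the row keys are generated by the serializer, by default a "default"-prefixed UUID in the versions I have checked, and will differ on every run):

hbase(main):103:0> scan 'mikeal-hbase-table'
ROW                                          COLUMN+CELL
 default8a2c5f31-...                         column=familyclom1:cloumn-1, timestamp=..., value=nging-1
 defaultb97d0e42-...                         column=familyclom1:cloumn-1, timestamp=..., value=nging-2
2 row(s) in 0.0350 seconds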
2.2 Using SimpleAsyncHbaseEventSerializer
To keep the example clean, first truncate mikeal-hbase-table:
truncate 'mikeal-hbase-table'
(truncate, like delete, removes only the data and keeps the table structure; drop would remove the structure together with its dependent constraints, triggers, and indexes)
Then write a Flume configuration file, test-flume-into-hbase-2.conf:
# read live lines from a file and store them into HBase without transformation
agent.sources = logfile-source
agent.channels = file-channel
agent.sinks = hbase-sink
# logfile-source configuration
agent.sources.logfile-source.type = exec
agent.sources.logfile-source.command = tail -f /data/flume-hbase-test/mkhbasetable/data/nginx.log
agent.sources.logfile-source.checkperiodic = 50
# channel configuration: a local file channel
agent.channels.file-channel.type = file
agent.channels.file-channel.checkpointDir = /data/flume-hbase-test/checkpoint
agent.channels.file-channel.dataDirs = /data/flume-hbase-test/data
# sink configuration: AsyncHBaseSink
agent.sinks.hbase-sink.type = org.apache.flume.sink.hbase.AsyncHBaseSink
agent.sinks.hbase-sink.table = mikeal-hbase-table
agent.sinks.hbase-sink.columnFamily = familyclom1
agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.SimpleAsyncHbaseEventSerializer
agent.sinks.hbase-sink.serializer.payloadColumn = cloumn-1
# wire source, sink, and channel
agent.sources.logfile-source.channels = file-channel
agent.sinks.hbase-sink.channel = file-channel
Start Flume:
bin/flume-ng agent --name agent --conf /etc/flume/conf/agent/ --conf-file /etc/flume/conf/agent/test-flume-into-hbase-2.conf -Dflume.root.logger=DEBUG,console
In another shell, run:
echo "nging-1" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
echo "nging-two" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
echo "nging-three" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
Then scan mikeal-hbase-table again: the three new values appear under familyclom1, just as in the previous example.
2.3 Using RegexHbaseEventSerializer
RegexHbaseEventSerializer splits each event with a regular expression and stores the captured groups into multiple columns of an HBase table. This section shows a simple example of that.
To keep the example clean, first truncate mikeal-hbase-table:
truncate 'mikeal-hbase-table'
Then write a Flume configuration file, test-flume-into-hbase-3.conf:
# read live lines from a file and store them into HBase
agent.sources = logfile-source
agent.channels = file-channel
agent.sinks = hbase-sink
# logfile-source configuration
agent.sources.logfile-source.type = exec
agent.sources.logfile-source.command = tail -f /data/flume-hbase-test/mkhbasetable/data/nginx.log
agent.sources.logfile-source.checkperiodic = 50
# channel configuration: a local file channel
agent.channels.file-channel.type = file
agent.channels.file-channel.checkpointDir = /data/flume-hbase-test/checkpoint
agent.channels.file-channel.dataDirs = /data/flume-hbase-test/data
# sink configuration: HBaseSink with RegexHbaseEventSerializer
agent.sinks.hbase-sink.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase-sink.table = mikeal-hbase-table
agent.sinks.hbase-sink.columnFamily = familyclom1
agent.sinks.hbase-sink.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# The nginx log lines here look like "[xxx] [yyy] [zzz]"; each bracketed field
# becomes one capture group:
agent.sinks.hbase-sink.serializer.regex = \\[(.*?)\\]\\ \\[(.*?)\\]\\ \\[(.*?)\\]
agent.sinks.hbase-sink.serializer.colNames = time,url,number
# wire source, sink, and channel
agent.sources.logfile-source.channels = file-channel
agent.sinks.hbase-sink.channel = file-channel
Start Flume:
bin/flume-ng agent --name agent --conf /etc/flume/conf/agent/ --conf-file /etc/flume/conf/agent/test-flume-into-hbase-3.conf -Dflume.root.logger=DEBUG,console
In another shell, run:
echo "[2016-12-22-19:59:59] [] [10]" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
echo "[2016-12-22 20:00:12] [] [19]" >> /data/flume-hbase-test/mkhbasetable/data/nginx.log;
Then scan mikeal-hbase-table: the lines have been split by the ([xxx] [yyy] [zzz]) pattern and stored into the three columns time, url, and number of the table, as the sketch below shows.
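Illustrative scan output (row keys are generated by the serializer, a timestamp-based key by default, and will differ; note how each log line fans out into three cells of the same row):

hbase(main):105:0> scan 'mikeal-hbase-table'
ROW                              COLUMN+CELL
 1482407999159-ab12cd34-0        column=familyclom1:time, timestamp=..., value=2016-12-22-19:59:59
 1482407999159-ab12cd34-0        column=familyclom1:url, timestamp=..., value=
 1482407999159-ab12cd34-0        column=familyclom1:number, timestamp=..., value=10
 ...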
3. A More Complex Case: Multiple Sources, Channels, and Sinks
Next, a more involved case of importing data into HBase with Flume: multiple sources, multiple channels, and multiple sinks. To keep the example clean, first truncate mikeal-hbase-table:
truncate 'mikeal-hbase-table'
Then write a Flume configuration file, test-flume-into-hbase-multi-position.conf:
# read live lines from two files and store them into HBase
agent.sources = logfile-source-1 logfile-source-2
agent.channels = file-channel-1 file-channel-2
agent.sinks = hbase-sink-1 hbase-sink-2
# logfile-source configuration
agent.sources.logfile-source-1.type = exec
agent.sources.logfile-source-1.command = tail -f /data/flume-hbase-test/mkhbasetable/data/nginx.log
agent.sources.logfile-source-1.checkperiodic = 50
agent.sources.logfile-source-2.type = exec
agent.sources.logfile-source-2.command = tail -f /data/flume-hbase-test/mkhbasetable/data/tomcat.log
agent.sources.logfile-source-2.checkperiodic = 50
# channel configuration: local file channels
agent.channels.file-channel-1.type = file
agent.channels.file-channel-1.checkpointDir = /data/flume-hbase-test/checkpoint
agent.channels.file-channel-1.dataDirs = /data/flume-hbase-test/data
agent.channels.file-channel-2.type = file
agent.channels.file-channel-2.checkpointDir = /data/flume-hbase-test/checkpoint2
agent.channels.file-channel-2.dataDirs = /data/flume-hbase-test/data2
# sink configuration: two HBaseSinks with RegexHbaseEventSerializer
agent.sinks.hbase-sink-1.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase-sink-1.table = mikeal-hbase-table
agent.sinks.hbase-sink-1.columnFamily = familyclom1
agent.sinks.hbase-sink-1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
# The nginx log lines look like "[xxx] [yyy] [zzz]"; each bracketed field
# becomes one capture group:
agent.sinks.hbase-sink-1.serializer.regex = \\[(.*?)\\]\\ \\[(.*?)\\]\\ \\[(.*?)\\]
agent.sinks.hbase-sink-1.serializer.colNames = time,url,number
agent.sinks.hbase-sink-2.type = org.apache.flume.sink.hbase.HBaseSink
agent.sinks.hbase-sink-2.table = mikeal-hbase-table
agent.sinks.hbase-sink-2.columnFamily = familyclom2
agent.sinks.hbase-sink-2.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer
agent.sinks.hbase-sink-2.serializer.regex = \\[(.*?)\\]\\ \\[(.*?)\\]\\ \\[(.*?)\\]
agent.sinks.hbase-sink-2.serializer.colNames = time,IP,number
# wire sources, sinks, and channels
agent.sources.logfile-source-1.channels = file-channel-1
agent.sinks.hbase-sink-1.channel = file-channel-1
agent.sources.logfile-source-2.channels = file-channel-2
agent.sinks.hbase-sink-2.channel = file-channel-2
Start Flume:
bin/flume-ng agent --name agent --conf /etc/flume/conf/agent/ --conf-file /etc/flume/conf/agent/test-flume-into-hbase-multi-position.conf -Dflume.root.logger=DEBUG,console
In another shell, run (from inside /data/flume-hbase-test/mkhbasetable/data/, where the two tailed logs live):
echo "[2016-12-22 20:04:12] [] [16]" >> nginx.log;
echo "[2016-12-22 20:04:13] [123.41.90.135] [22]" >> tomcat.log;
echo "[2016-12-22 20:05:19] [] [24]" >> nginx.log;
echo "[2016-12-22 20:05:21] [134.92.146.109] [25]" >> tomcat.log;
Scan mikeal-hbase-table again: both logs have been split by the ([xxx] [yyy] [zzz]) pattern and stored into the table, with the nginx fields landing in three columns under familyclom1 and the tomcat fields in three columns under familyclom2.
In the following scheme the data is again stored in HBase through Flume's hbase sink, but in order to clean and transform the log data we implement our own AsyncHbaseEventSerializer (adapted from https://www.cnblogs.com/gaopeng527/p/5010985.html).
1. Implement the serializer:
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.AsyncHbaseEventSerializer;
import org.apache.flume.sink.hbase.SimpleRowKeyGenerator;
import org.hbase.async.AtomicIncrementRequest;
import org.hbase.async.PutRequest;

public class AsyncHbaseLTEEventSerializer implements AsyncHbaseEventSerializer {
    // table name
    private byte[] table;
    // column family
    private byte[] colFam;
    // event currently being serialized
    private Event currentEvent;
    // column qualifiers
    private byte[][] columnNames;
    // batched writes to HBase
    private final List<PutRequest> puts = new ArrayList<PutRequest>();
    private final List<AtomicIncrementRequest> incs = new ArrayList<AtomicIncrementRequest>();
    // row key of the current event
    private byte[] currentRowKey;
    private final byte[] eventCountCol = "eventCount".getBytes();

    @Override
    public void configure(Context context) {
        // read the column qualifiers from the sink configuration
        String cols = context.getString("columns");
        String[] names = cols.split(",");
        columnNames = new byte[names.length][];
        int i = 0;
        for (String name : names) {
            columnNames[i++] = name.getBytes();
        }
    }
    @Override
    public void configure(ComponentConfiguration conf) {
        // nothing to configure here
    }

    @Override
    public void cleanUp() {
        table = null;
        colFam = null;
        currentEvent = null;
        columnNames = null;
        currentRowKey = null;
    }
    @Override
    public List<PutRequest> getActions() {
        // split the event body to obtain the column values
        String eventStr = new String(currentEvent.getBody());
        String[] cols = logTokenize(eventStr);
        puts.clear();
        // timestamp field carried in the data; left-pad it to 13 digits
        String time = cols[1];
        int n1 = 13 - time.length();
        StringBuilder sb = new StringBuilder(time);
        for (int i = 0; i < n1; i++) {
            sb.insert(0, '0');
        }
        try {
            // generate the row key with Flume's built-in generator
            currentRowKey = SimpleRowKeyGenerator.getUUIDKey(cols[0] + "-" + sb.toString());
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        // currentRowKey = (cols[0]+"-"+System.currentTimeMillis()).getBytes();
        int n = cols.length;
        // one PutRequest per column value
        for (int i = 0; i < n; i++) {
            PutRequest putReq = new PutRequest(table, currentRowKey, colFam, columnNames[i], cols[i].getBytes());
            puts.add(putReq);
        }
        return puts;
    }
    @Override
    public List<AtomicIncrementRequest> getIncrements() {
        // bump the counter of received events
        incs.clear();
        incs.add(new AtomicIncrementRequest(table, "totalEvents".getBytes(), colFam, eventCountCol));
        return incs;
    }

    // initialize table name and column family
    @Override
    public void initialize(byte[] table, byte[] cf) {
        this.table = table;
        this.colFam = cf;
    }

    @Override
    public void setEvent(Event event) {
        this.currentEvent = event;
    }
    // extract the column values from a log line of the form "key:value,key:value,..."
    // (a combined-access-log variant of this tokenizer appears in
    // AsyncHbaseLogEventSerializer further below)
    public String[] logTokenize(String eventStr) {
        String[] s = eventStr.split("[:,]");
        int n = s.length;
        String[] columns = new String[n / 2];
        for (int i = 0; 2 * i + 1 < n; i++) {
            columns[i] = s[2 * i + 1];
        }
        return columns;
    }
}
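A standalone check of the split logic in logTokenize(): the event body is expected to look like "key:value,key:value,...", and only the values survive. The field names below are illustrative:

// Mirrors the split in AsyncHbaseLTEEventSerializer.logTokenize():
// split on ':' and ',' and keep the odd-indexed tokens, i.e. the values.
public class TokenizeCheck {
    public static void main(String[] args) {
        String eventStr = "cid:26,time:1482407999159,pci:101"; // illustrative fields
        String[] s = eventStr.split("[:,]");
        String[] columns = new String[s.length / 2];
        for (int i = 0; 2 * i + 1 < s.length; i++) {
            columns[i] = s[2 * i + 1];
        }
        for (String c : columns) {
            System.out.println(c); // prints 26, 1482407999159, 101
        }
    }
}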
2. Package the program above into a JAR and drop it into Flume's lib directory.
3. Configure Flume for collection and storage.
The configuration file flume-hbase.properties is as follows:
############################################
# flume-src-agent config
###########################################
#agent section
agent.sources = s
agent.channels = c
agent.sinks = r
#source section
#agent.sources.s.type = exec
#agent.sources.s.command = tail -f -n+1 /usr/local/test.log
agent.sources.s.type = spooldir
agent.sources.s.spoolDir = /usr/local/flume-hbase
agent.sources.s.fileHeader = true
agent.sources.s.batchSize = 100
agent.sources.s.channels = c
# Each sink's type must be defined
agent.sinks.r.type = asynchbase
agent.sinks.r.table = car_table
agent.sinks.r.columnFamily = lte
agent.sinks.r.batchSize = 100
agent.sinks.r.serializer = com.ncc.dlut.AsyncHbaseLTEEventSerializer
agent.sinks.r.serializer.columns = cid,time,pci,st,ed,ta,lng,lat
#Specify the channel the sink should use
agent.sinks.r.channel = c
# Each channel's type is defined.
agent.channels.c.type = memory
agent.channels.c.capacity = 1000
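Starting this agent follows the same pattern as the earlier examples (the config-file path is an assumption; point --conf-file at wherever flume-hbase.properties actually lives):

bin/flume-ng agent --name agent --conf /etc/flume/conf/agent/ \
  --conf-file /etc/flume/conf/agent/flume-hbase.properties \
  -Dflume.root.logger=DEBUG,console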
The following Tomcat access-log example is adapted from https://blog.csdn.net/yaoyasong/article/details/39400829.
1. First, enable access logging in Tomcat using the combined pattern.
Edit TOMCAT_PATH/conf/server.xml and add an access-log valve:
prefix="localhost_access_log." suffix=".txt" renameOnRotate="true"
pattern="combined" />
Tomcat will then create a localhost_access_log file under logs/ each day and record user accesses to it in real time.
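A combined-format line looks like the following (values illustrative); its nine fields are exactly the nine capture groups that logTokenize() in the serializer below extracts:

127.0.0.1 - admin [22/Dec/2016:20:04:12 +0800] "GET /index.html?from=test HTTP/1.1" 200 1024 "http://example.com/ref" "Mozilla/5.0"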
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.conf.ComponentConfiguration;
import org.apache.flume.sink.hbase.AsyncHbaseEventSerializer;
import org.apache.hadoop.hbase.util.Bytes;
import org.hbase.async.AtomicIncrementRequest;
import org.hbase.async.PutRequest;

public class AsyncHbaseLogEventSerializer implements AsyncHbaseEventSerializer {
    private byte[] table;
    private byte[] colFam;
    private Event currentEvent;
    private byte[][] columnNames;
    private final List<PutRequest> puts = new ArrayList<PutRequest>();
    private final List<AtomicIncrementRequest> incs = new ArrayList<AtomicIncrementRequest>();
    private byte[] currentRowKey;
    private final byte[] eventCountCol = "eventCount".getBytes();
    public void initialize(byte[] table, byte[] cf) {
        this.table = table;
        this.colFam = cf;
    }

    public void configure(Context context) {
        // read the column qualifiers from the sink configuration
        String cols = context.getString("columns");
        String[] names = cols.split(",");
        columnNames = new byte[names.length][];
        int i = 0;
        for (String name : names) {
            columnNames[i++] = name.getBytes();
        }
    }

    public void configure(ComponentConfiguration conf) {
    }
    public List<PutRequest> getActions() {
        // split the event body and get the values for the columns
        String eventStr = new String(currentEvent.getBody());
        String[] cols = logTokenize(eventStr);
        puts.clear();
        // request line, e.g. "GET /path?x=1 HTTP/1.1" -> normalized request path
        String req = cols[4];
        String reqPath = req.split(" ")[1];
        int pos = reqPath.indexOf("?");
        if (pos > 0) {
            reqPath = reqPath.substring(0, pos);
        }
        if (reqPath.length() > 1 && reqPath.trim().endsWith("/")) {
            reqPath = reqPath.substring(0, reqPath.length() - 1);
        }
        String req_ts_str = cols[3];
        long currTime = System.currentTimeMillis();
        String currTimeStr = null;
        if (req_ts_str != null && !req_ts_str.equals("")) {
            SimpleDateFormat df = new SimpleDateFormat("dd/MMM/yyyy:HH:mm:ss", Locale.US);
            SimpleDateFormat df2 = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
            try {
                currTimeStr = df2.format(df.parse(req_ts_str));
                currTime = df.parse(req_ts_str).getTime();
            } catch (ParseException e) {
                System.out.println("parse req time error, using system current time.");
            }
        }
        if (currTimeStr == null) {
            // fall back to the system clock when the request timestamp is absent or unparsable
            currTimeStr = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date(currTime));
        }
        // reversed timestamp so that newer requests sort first
        long revTs = Long.MAX_VALUE - currTime;
        currentRowKey = (Long.toString(revTs) + reqPath).getBytes();
        System.out.println("currentRowKey: " + new String(currentRowKey));
        for (int i = 0; i < cols.length; i++) {
            PutRequest putReq = new PutRequest(table, currentRowKey, colFam, columnNames[i], cols[i].getBytes());
            puts.add(putReq);
        }
        // additional derived columns
        PutRequest reqPathPutReq = new PutRequest(table, currentRowKey, colFam, "req_path".getBytes(), reqPath.getBytes());
        puts.add(reqPathPutReq);
        PutRequest reqTsPutReq = new PutRequest(table, currentRowKey, colFam, "req_ts".getBytes(), Bytes.toBytes(currTimeStr));
        puts.add(reqTsPutReq);
        // ChannelUtil is the original author's helper (not shown here) that maps
        // the user-agent string to a traffic-channel label
        String channelType = ChannelUtil.getType(cols[8]);
        PutRequest channelPutReq = new PutRequest(table, currentRowKey, colFam, "req_chan".getBytes(), Bytes.toBytes(channelType));
        puts.add(channelPutReq);
        return puts;
    }
    public String[] logTokenize(String eventStr) {
        // combined access-log format: ip ident user [time] "request" status size "referer" "user-agent"
        String logEntryPattern = "^([\\d.]+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] \"(.+?)\" (\\d{3}) (\\d+|-) \"([^\"]+)\" \"([^\"]+)\"";
        Pattern p = Pattern.compile(logEntryPattern);
        Matcher matcher = p.matcher(eventStr);
        if (!matcher.matches()) {
            System.err.println("Bad log entry (or problem with RE?):");
            System.err.println(eventStr);
            return null;
        }
        String[] columns = new String[matcher.groupCount()];
        for (int i = 0; i < matcher.groupCount(); i++) {
            columns[i] = matcher.group(i + 1);
        }
        return columns;
    }
    public List<AtomicIncrementRequest> getIncrements() {
        incs.clear();
        incs.add(new AtomicIncrementRequest(table, "totalEvents".getBytes(), colFam, eventCountCol));
        return incs;
    }

    public void setEvent(Event event) {
        this.currentEvent = event;
    }

    public void cleanUp() {
        table = null;
        colFam = null;
        currentEvent = null;
        columnNames = null;
        currentRowKey = null;
    }
}
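Wiring this serializer into a sink mirrors the earlier flume-hbase.properties; the columns list must supply one qualifier per capture group of the regex. These nine names, and the table name, are illustrative; use the fully qualified class name if the class lives in a package:

agent.sinks.r.type = asynchbase
agent.sinks.r.table = access_log_table
agent.sinks.r.columnFamily = log
agent.sinks.r.serializer = AsyncHbaseLogEventSerializer
# nine qualifiers, one per capture group of logTokenize()'s regex
agent.sinks.r.serializer.columns = ip,ident,user,time,request,status,size,referer,agent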
Original article: https://blog.csdn.net/qq_22473611/article/details/88101426