使用flume+kafka+storm构建实时日志分析系统-yueys

yueys_canedy的ChinaUnix博客

首页　| 　博文目录　| 　关于我

yueys_canedy

博客访问： 426670
博文数量： 22
博客积分： 0
博客等级：民兵
技术积分： 1712
用户组：普通用户
注册时间： 2013-09-09 10:51

文章分类

全部博文（22）

flume（1）
ganglia（4）
python（2）
常用资料（2）
工具（3）
数据库（1）
Hadoop（2）
未分配的博文（7）

文章存档

2016年（3）

2015年（6）

2014年（1）

2013年（12）

我的朋友

相关博文

使用flume+kafka+storm构建实时日志分析系统

分类：大数据

2016-03-14 20:58:22

本文只会涉及flume和kafka的结合，kafka和storm的结合可以参考其他博客
1. flume安装使用
    下载flume安装包
    解压$ tar -xzvf apache-flume-1.5.2-bin.tar.gz -C /opt/flume
    flume配置文件放在conf文件目录下，执行文件放在bin文件目录下。
    1）配置flume
    进入conf目录将flume-conf.properties.template拷贝一份，并命名为自己需要的名字
    $ cp flume-conf.properties.template flume.conf
     修改flume.conf的内容，我们使用file sink来接收channel中的数据，channel采用memory channel，source采用exec source，配置文件如下：

点击(此处)折叠或打开

agent.sources = seqGenSrc
agent.channels = memoryChannel
agent.sinks = loggerSink
# For each one of the sources, the type is defined
agent.sources.seqGenSrc.type = exec
agent.sources.seqGenSrc.command = tail -F /data/mongodata/mongo.log
#agent.sources.seqGenSrc.bind = 172.168.49.130
# The channel can be defined as follows.
agent.sources.seqGenSrc.channels = memoryChannel
# Each sink's type must be defined
agent.sinks.loggerSink.type = file_roll
agent.sinks.loggerSink.sink.directory = /data/flume
#Specify the channel the sink should use
agent.sinks.loggerSink.channel = memoryChannel
# Each channel's type is defined.
agent.channels.memoryChannel.type = memory
# Other config values specific to each type of channel(sink or source)
# can be defined as well
# In this case, it specifies the capacity of the memory channel
agent.channels.memoryChannel.capacity = 1000
agent.channels.memory4log.transactionCapacity = 100

    2）运行flume agent
    切换到bin目录下，运行一下命令：
    $ ./flume-ng agent --conf ../conf -f ../conf/flume.conf --n agent -Dflume.root.logger=INFO,console
    在/data/flume目录下可以看到生成的日志文件。

2. 结合kafka
由于flume1.5.2没有kafka sink，所以需要自己开发kafka sink
可以参考flume 1.6里面的kafka sink，但是要注意使用的kafka版本，由于有些kafka api不兼容的
这里只提供核心代码，process()内容。

点击(此处)折叠或打开

Sink.Status status = Status.READY;
Channel ch = getChannel();
Transaction transaction = null;
Event event = null;
String eventTopic = null;
String eventKey = null;
try {
transaction = ch.getTransaction();
transaction.begin();
messageList.clear();
if (type.equals("sync")) {
event = ch.take();
if (event != null) {
byte[] tempBody = event.getBody();
String eventBody = new String(tempBody,"UTF-8");
Map<String, String> headers = event.getHeaders();
if ((eventTopic = headers.get(TOPIC_HDR)) == null) {
eventTopic = topic;
}
eventKey = headers.get(KEY_HDR);
if (logger.isDebugEnabled()) {
logger.debug("{Event} " + eventTopic + " : " + eventKey + " : "
+ eventBody);
}
ProducerData<String, Message> data = new ProducerData<String, Message>
(eventTopic, new Message(tempBody));
long startTime = System.nanoTime();
logger.debug(eventTopic+"++++"+eventBody);
producer.send(data);
long endTime = System.nanoTime();
}
} else {
long processedEvents = 0;
for (; processedEvents < batchSize; processedEvents += 1) {
event = ch.take();
if (event == null) {
break;
}
byte[] tempBody = event.getBody();
String eventBody = new String(tempBody,"UTF-8");
Map<String, String> headers = event.getHeaders();
if ((eventTopic = headers.get(TOPIC_HDR)) == null) {
eventTopic = topic;
}
eventKey = headers.get(KEY_HDR);
if (logger.isDebugEnabled()) {
logger.debug("{Event} " + eventTopic + " : " + eventKey + " : "
+ eventBody);
logger.debug("event #{}", processedEvents);
}
// create a message and add to buffer
ProducerData<String, String> data = new ProducerData<String, String>
(eventTopic, eventBody);
messageList.add(data);
}
// publish batch and commit.
if (processedEvents > 0) {
long startTime = System.nanoTime();
long endTime = System.nanoTime();
}
}
transaction.commit();
} catch (Exception ex) {
String errorMsg = "Failed to publish events";
logger.error("Failed to publish events", ex);
status = Status.BACKOFF;
if (transaction != null) {
try {
transaction.rollback();
} catch (Exception e) {
logger.error("Transaction rollback failed", e);
throw Throwables.propagate(e);
}
}
throw new EventDeliveryException(errorMsg, ex);
} finally {
if (transaction != null) {
transaction.close();
}
}
return status;

下一步，修改flume配置文件，将其中sink部分的配置改成kafka sink，如：

点击(此处)折叠或打开

producer.sinks.r.type = org.apache.flume.sink.kafka.KafkaSink
producer.sinks.r.brokerList = bigdata-node00:9092
producer.sinks.r.requiredAcks = 1
producer.sinks.r.batchSize = 100
#producer.sinks.r.kafka.producer.type=async
#producer.sinks.r.kafka.customer.encoding=UTF-8
producer.sinks.r.topic = testFlume1

type指向kafkasink所在的完整路径
下面的参数都是kafka的一系列参数，最重要的是brokerList和topic参数

现在重新启动flume，就可以在kafka的对应topic下查看到对应的日志

阅读(22897) | 评论(0) | 转发(1) |

上一篇：用户名、密码连接mongodb数据库

下一篇：python3.x+requests 爬取网站遇到中文乱码的解决方案

给主人留下些什么吧！~~

感谢所有关心和支持过ChinaUnix的朋友们

16024965号-6