Category: Python/Ruby

2015-02-26 14:54:57

Merging multi-line data:

# with an input plugin:
# you can also use this codec with an output.
input {
  file {
    codec => multiline {
      charset => ... # string, one of ["ASCII-8BIT", "Big5", "Big5-HKSCS", "Big5-UAO", "CP949", "Emacs-Mule", "EUC-JP", "EUC-KR", "EUC-TW", "GB18030", "GBK", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-10", "ISO-8859-11", "ISO-8859-13", "ISO-8859-14", "ISO-8859-15", "ISO-8859-16", "KOI8-R", "KOI8-U", "Shift_JIS", "US-ASCII", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-32BE", "UTF-32LE", "Windows-1251", "GB2312", "IBM437", "IBM737", "IBM775", "CP850", "IBM852", "CP852", "IBM855", "CP855", "IBM857", "IBM860", "IBM861", "IBM862", "IBM863", "IBM864", "IBM865", "IBM866", "IBM869", "Windows-1258", "GB1988", "macCentEuro", "macCroatian", "macCyrillic", "macGreek", "macIceland", "macRoman", "macRomania", "macThai", "macTurkish", "macUkraine", "CP950", "CP951", "stateless-ISO-2022-JP", "eucJP-ms", "CP51932", "GB12345", "ISO-2022-JP", "ISO-2022-JP-2", "CP50220", "CP50221", "Windows-1252", "Windows-1250", "Windows-1256", "Windows-1253", "Windows-1255", "Windows-1254", "TIS-620", "Windows-874", "Windows-1257", "Windows-31J", "MacJapanese", "UTF-7", "UTF8-MAC", "UTF-16", "UTF-32", "UTF8-DoCoMo", "SJIS-DoCoMo", "UTF8-KDDI", "SJIS-KDDI", "ISO-2022-JP-KDDI", "stateless-ISO-2022-JP-KDDI", "UTF8-SoftBank", "SJIS-SoftBank", "BINARY", "CP437", "CP737", "CP775", "IBM850", "CP857", "CP860", "CP861", "CP862", "CP863", "CP864", "CP865", "CP866", "CP869", "CP1258", "Big5-HKSCS:2008", "eucJP", "euc-jp-ms", "eucKR", "eucTW", "EUC-CN", "eucCN", "CP936", "ISO2022-JP", "ISO2022-JP2", "ISO8859-1", "CP1252", "ISO8859-2", "CP1250", "ISO8859-3", "ISO8859-4", "ISO8859-5", "ISO8859-6", "CP1256", "ISO8859-7", "CP1253", "ISO8859-8", "CP1255", "ISO8859-9", "CP1254", "ISO8859-10", "ISO8859-11", "CP874", "ISO8859-13", "CP1257", "ISO8859-14", "ISO8859-15", "ISO8859-16", "CP878", "CP932", "csWindows31J", "SJIS", "PCK", "MacJapan", "ASCII", "ANSI_X3.4-1968", "646", "CP65000", "CP65001", "UTF-8-MAC", "UTF-8-HFS", "UCS-2BE", "UCS-4BE", "UCS-4LE", "CP1251", "external", "locale"] (optional), default: "UTF-8"
      multiline_tag => ... # string (optional), default: "multiline"
      negate => ... # boolean (optional), default: false
      pattern => ... # string (required)
      patterns_dir => ... # array (optional), default: []
      what => ... # string, one of ["previous", "next"] (required)
    }
  }
}
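The negate field is a switch that selects between positive and inverted matching: with negate => false (the default), lines that match the pattern are merged; with negate => true, lines that do not match the pattern are merged.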
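
To make the effect of negate concrete, here is a minimal Ruby sketch of the merging rule for what => "previous". It is a simplified model and not the codec's actual implementation: merge_lines is a hypothetical helper, the sample log lines are made up, and the real codec also handles buffering and flushing, which this ignores.

```ruby
# Simplified model of the multiline codec with what => "previous".
# merge_lines is a hypothetical helper, not part of Logstash.
def merge_lines(lines, pattern, negate: false)
  events = []
  lines.each do |line|
    matched = !(line =~ pattern).nil?
    # negate flips the result: with negate => true, the lines that do
    # NOT match the pattern are the ones merged into the previous event.
    belongs_to_previous = negate ? !matched : matched
    if belongs_to_previous && !events.empty?
      events[-1] = events[-1] + "\n" + line
    else
      events << line
    end
  end
  events
end

# Made-up sample input: a log entry followed by a stack trace.
lines = [
  "2015-02-26 14:54:57 ERROR something failed",
  "java.lang.RuntimeException: boom",
  "    at com.example.Main.run(Main.java:10)",
  "2015-02-26 14:54:58 INFO recovered",
]

# Every line that does not start with "2015" joins the previous event.
events = merge_lines(lines, /^2015/, negate: true)
events.each { |e| puts e, "--" }
```

With negate: true and the pattern ^2015, the two stack-trace lines are attached to the ERROR line, so the four input lines become two events.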

Reference: https://github.com/chenryn/logstash-best-practice-cn/blob/master/codec/multiline.md
Reference: http://www.logstash.net/docs/1.4.2/codecs/multiline

Copying the @timestamp field:

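filter {
    # Copy @timestamp as-is, so the copy keeps the same type.
    # This is Logstash 1.4-style event access (newer releases use event.get / event.set).
    ruby {
        code => "event['read_time'] = event['@timestamp']"
    }
    # Copy @timestamp as a string through field interpolation.
    mutate {
        add_field => ["read_time_string", "%{@timestamp}"]
    }
}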
Reference: http://stackoverflow.com/questions/25189872/logstash-how-to-make-a-copy-of-the-timestamp-field-while-maintaining-the-same
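
The difference between the two copies can be sketched in plain Ruby. This is only an illustration under stated assumptions: a plain Hash stands in for the Logstash event, and the ISO8601 formatting of the string copy is assumed rather than taken from the Logstash source.

```ruby
require 'time'

# A plain Hash standing in for a Logstash event (an assumption made for
# illustration; real events are LogStash::Event objects).
event = { '@timestamp' => Time.utc(2015, 2, 26, 14, 54, 57) }

# Like the ruby filter: assign the value itself, so the type is kept.
event['read_time'] = event['@timestamp']

# Like mutate add_field with "%{@timestamp}": the value is rendered
# into a string (ISO8601 formatting is assumed here).
event['read_time_string'] = event['@timestamp'].iso8601(3)

puts event['read_time'].class
puts event['read_time_string'].class
```

The first copy can still be compared or reformatted as a time value, while the second is only text.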

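Multi-line matching:

When grok is used together with codec/multiline, there is one thing to watch out for: a grok regular expression, like an ordinary regular expression, does not match across carriage returns and newlines by default. Just as you would need =~ //m in code, you have to enable this explicitly, by placing the (?m) flag at the start of the expression, as shown here: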

match => {
    "message" => "(?m)\s+(?<request_time>\d+(?:\.\d+)?)\s+"
}
The original text of this passage comes from: https://github.com/chenryn/logstash-best-practice-cn/blob/master/filter/grok.md
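
The effect of the (?m) flag can be checked in plain Ruby, whose regular expression syntax is the one the passage alludes to with =~ //m. The message below is a made-up example of a merged multi-line event, not taken from the original post.

```ruby
# A merged multi-line event, as the multiline codec might produce it
# (made-up sample text).
message = "2015-02-26 14:54:57 ERROR failed\n    at com.example.Main.run"

# Without the m flag, "." stops at the newline, so the match fails.
without_flag = message =~ /ERROR.*Main/

# With (?m) at the start of the pattern, "." also matches "\n".
with_flag = message =~ /(?m)ERROR.*Main/

puts without_flag.inspect
puts with_flag.inspect
```

The first match returns nil because ".*" cannot cross the line break; the second returns the position where "ERROR" starts.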

The final configuration file:

input {
    file {
        type => "type"
        # info.log is a placeholder; the file input expects an absolute path
        path => ["info.log"]
        exclude => ["*.gz", "access.log"]
        # Lines that do not start with "2015" are appended to the previous line
        codec => multiline {
            pattern => "^2015"
            negate => true
            what => "previous"
        }
    }
}

filter {
    # Extract the log's own timestamp into the logtime field
    grok {
        match => {
            "message" => "(?m)%{TIMESTAMP_ISO8601:logtime}"
        }
    }
    # Keep the time at which Logstash read the event
    ruby {
        code => "event['readtime'] = event['@timestamp']"
    }
    # Parse logtime into @timestamp, then drop the helper field
    date {
        #locale => "en"
        match => ["logtime", "YYYY-MM-dd HH:mm:ss"]
        #timezone => "UTC"
        #target => "logtimestamp"
        remove_field => ["logtime"]
    }
}

output {
    stdout {}
    redis {
        host => "127.0.0.1"
        port => 6379
        data_type => "list"
        key => "key_count"
    }
}
Built-in grok patterns: https://github.com/elasticsearch/logstash/blob/v1.4.2/patterns/grok-patterns
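
As a rough illustration of what the three filter steps in the final configuration do to one merged event, the following Ruby sketch walks through them by hand. It is not how Logstash implements these filters: the Hash stands in for an event, the sample message is made up, and the regular expression is a simplified stand-in for %{TIMESTAMP_ISO8601}.

```ruby
require 'time'

# Hash standing in for an event already merged by the multiline codec
# (made-up sample message).
event = {
  '@timestamp' => Time.now, # the time the line was read
  'message'    => "2015-02-26 14:54:57 ERROR failed\n    at com.example.Main.run",
}

# grok step: capture the leading timestamp into "logtime". This regex is
# a simplified stand-in for %{TIMESTAMP_ISO8601}.
if (m = event['message'].match(/(?m)(?<logtime>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})/))
  event['logtime'] = m[:logtime]
end

# ruby step: keep the read time before @timestamp is overwritten.
event['readtime'] = event['@timestamp']

# date step: parse logtime into @timestamp, then drop the helper field.
if event['logtime']
  event['@timestamp'] = Time.strptime(event['logtime'], '%Y-%m-%d %H:%M:%S')
  event.delete('logtime')
end

puts event['@timestamp']
puts event['readtime']
```

After these steps @timestamp carries the time written in the log line, while readtime still holds the time the event was read.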

cq539 2017-06-06 17:40:35

How should I write a pattern such as pattern => \"^at  abcd\", where there are spaces after "at"?
