Category: Linux

2011-10-15 18:07:18

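The real strength of the log analysis tool AWStats is that it can be extended through configuration. With only a little knowledge of Perl regular expressions, you can make AWStats extremely powerful.

An AWStats extension is simply a custom report. To add one to your log analysis, you only need to edit the Extra Sections part of awstats.domain.conf. The documentation shipped in that file reads as follows: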

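#-----------------------------------------------------------------------------
# EXTRA SECTIONS
#-----------------------------------------------------------------------------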

# You can define your own charts, you choose here what are rows and columns
# keys. This feature is particularly useful for marketing purpose, tracking
# products orders for example.
# For this, edit all parameters of Extra section. Each set of parameter is a
# different chart. For several charts, duplicate section changing the number.
# Note: Each Extra section reduces AWStats speed by 8%.
#
# WARNING: A wrong setup of Extra section might result in too large arrays
# that will consume all your memory, making AWStats unusable after several
# updates, so be sure to setup it correctly.
# In most cases, you don't need this feature.
#
# ExtraSectionNameX is title of your personalized chart.
# ExtraSectionCodeFilterX is list of codes the record code field must match.
#   Put an empty string for no test on code.
# ExtraSectionConditionX are conditions you can use to count or not the hit,
#   Use one of the field condition
#   (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOSTINLOG,HOST,VHOST,extraX)
#   and a regex to match, after a comma. Use "||" for "OR".
# ExtraSectionFirstColumnTitleX is the first column title of the chart.
# ExtraSectionFirstColumnValuesX is a string to tell AWStats which field to
#   extract value from
#   (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOSTINLOG,HOST,VHOST,extraX)
#   and how to extract the value (using regex syntax). Each different value
#   found will appear in first column of report on a different row. Be sure
#   that list of different possible values will not grow indefinitely.
# ExtraSectionFirstColumnFormatX is the string used to write value.
# ExtraSectionStatTypesX are things you want to count. You can use standard
#   code letters (P for pages,H for hits,B for bandwidth,L for last access).
# ExtraSectionAddAverageRowX add a row at bottom of chart with average values.
# ExtraSectionAddSumRowX add a row at bottom of chart with sum values.
# MaxNbOfExtraX is maximum number of rows shown in chart.
# MinHitExtraX is minimum number of hits required to be shown in chart.
#

# Example to report the 20 products the most ordered by "order.cgi" script
#ExtraSectionName1="Product orders"
#ExtraSectionCodeFilter1="200 304"
#ExtraSectionCondition1="URL,\/cgi\-bin\/order\.cgi||URL,\/cgi\-bin\/order2\.cgi"
#ExtraSectionFirstColumnTitle1="Product ID"
#ExtraSectionFirstColumnValues1="QUERY_STRING,productid=([^&]+)"
#ExtraSectionFirstColumnFormat1="%s"
#ExtraSectionStatTypes1=PL
#ExtraSectionAddAverageRow1=0
#ExtraSectionAddSumRow1=1
#MaxNbOfExtra1=20
#MinHitExtra1=1
# There is also a global parameter ExtraTrackedRowsLimit that limits the
# number of possible rows an ExtraSection can report. This parameter is
# here to protect too much memory use when you make a bad setup in your
# ExtraSection. It applies to all ExtraSection independently meaning that
# none ExtraSection can report more rows than value defined by ExtraTrackedRowsLimit.
# If you know an ExtraSection will report more rows than its value, you should
# increase this parameter or AWStats will stop with an error.
# Example: 2000
# Default: 500
#
ExtraTrackedRowsLimit=500
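
Before enabling a section, it can help to estimate how many distinct rows its extraction regex would produce, so that the result stays below ExtraTrackedRowsLimit. The following is only a rough Python sketch and is not part of AWStats: the helper name `count_distinct_values` and the sample query strings are made up for illustration, and Python's `re` module merely approximates Perl's regex engine.

```python
import re

# Hypothetical helper, not part of AWStats: counts the distinct values
# that an extraction regex would capture from a list of field values.
def count_distinct_values(pattern, field_values):
    regex = re.compile(pattern)
    seen = set()
    for value in field_values:
        match = regex.search(value)
        if match:
            seen.add(match.group(1))
    return len(seen)

# Made-up query strings standing in for real log data.
sample_queries = [
    "productid=101&qty=2",
    "productid=102",
    "productid=101&coupon=abc",
    "page=3",
]

EXTRA_TRACKED_ROWS_LIMIT = 500  # the default value quoted above

distinct = count_distinct_values(r"productid=([^&]+)", sample_queries)
print(distinct, "distinct values; within limit:",
      distinct <= EXTRA_TRACKED_ROWS_LIMIT)
```

If the count on a realistic sample already approaches the limit, the regex is probably capturing something unbounded (session IDs, timestamps) and should be narrowed.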

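The Extra Sections part can define several reports. Each report has its own group of parameters, identified by a number. In the names below, X is that number: 1, 2, and so on (numbering must start at 1, otherwise processing fails):

  • ExtraSectionNameX: the title of the custom chart.
  • ExtraSectionCodeFilterX: the return codes a record must match, for example 200 304 in an HTTP log. An empty string disables the test.
  • ExtraSectionConditionX: the condition that decides whether a hit is counted. Name one of the fields (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOSTINLOG,HOST,VHOST,extraX), followed by a comma and a regex the field must match. Use "||" as "OR" to combine several conditions.
  • ExtraSectionFirstColumnTitleX: the title of the chart's first column.
  • ExtraSectionFirstColumnValuesX: the field to extract the row value from (URL,URLWITHQUERY,QUERY_STRING,REFERER,UA,HOSTINLOG,HOST,VHOST,extraX), followed by a comma and a regex that captures the value. The syntax is the same as for the condition, but the condition only tests whether a record qualifies, while this regex extracts the value shown in the report. Each distinct value found gets its own row, with the value in the first column. Make sure the set of possible values is bounded, so that it cannot grow indefinitely and exhaust memory.
  • ExtraSectionFirstColumnFormatX: the format string used to print the value, for example %s.
  • ExtraSectionStatTypesX: the things to count, given as standard code letters.
  • ExtraSectionAddAverageRowX: adds a row of averages at the bottom of the chart.
  • ExtraSectionAddSumRowX: adds a row of totals at the bottom of the chart.
  • MaxNbOfExtraX: the maximum number of rows shown in the chart.
  • MinHitExtraX: the minimum number of hits a row needs to be shown in the chart.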
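
How the code filter, the condition and the value regex interact can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not AWStats' actual implementation: the function `extra_section_row` and the dictionary layout of `record` are hypothetical, and the patterns are the ones from the "Product orders" example in the built-in documentation.

```python
import re

# Hypothetical simplification of how one Extra Section treats a log record.
def extra_section_row(record, code_filter, conditions, value_field, value_regex):
    """Return the row key for the record, or None if it is not counted."""
    # Code filter: an empty list means "no test on the status code".
    if code_filter and record["code"] not in code_filter:
        return None
    # Conditions are (field, regex) pairs combined with OR;
    # an empty list means every record qualifies.
    if conditions and not any(
        re.search(regex, record[field]) for field, regex in conditions
    ):
        return None
    # The value regex extracts the text that becomes the row label.
    match = re.search(value_regex, record[value_field])
    return match.group(1) if match else None

# A made-up, already parsed log record.
record = {
    "code": "200",
    "URL": "/cgi-bin/order.cgi",
    "QUERY_STRING": "productid=42&qty=1",
}
conditions = [
    ("URL", r"/cgi-bin/order\.cgi"),
    ("URL", r"/cgi-bin/order2\.cgi"),
]

row = extra_section_row(record, ["200", "304"], conditions,
                        "QUERY_STRING", r"productid=([^&]+)")
print(row)  # 42
```

A record is dropped as soon as one stage rejects it, so a report stays empty if either the code filter or the condition is too strict, even when the value regex itself is correct.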

ExtraSectionStatTypesX selects the kinds of data to display. The following codes are available:

  • U = Unique visitors
  • V = Visits
  • P = Number of pages
  • H = Number of hits (or mails)
  • B = Bandwidth (or total mail size for mail logs)
  • L = Last access date
  • E = Entry pages
  • X = Exit pages
  • C = Web compression (mod_gzip,mod_deflate)

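The powerful features described in Zhang Guoping's 《》 can also be implemented with AWStats. Below are a few example extensions to get the ball rolling; web analysts and SEO experts can take them further.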

#1 Show the top 20 referring sites

ExtraSectionName1="Top 20 Referrers by Domain"
ExtraSectionCodeFilter1="200 304"
ExtraSectionCondition1=""
ExtraSectionFirstColumnTitle1="Referring Domain"
ExtraSectionFirstColumnValues1="REFERER,^http:\/\/www\.([^\/]+)\/||REFERER,^http:\/\/([^\/]+)\/"
ExtraSectionFirstColumnFormat1="%s"
ExtraSectionStatTypes1=PHB
ExtraSectionAddAverageRow1=0
ExtraSectionAddSumRow1=1
MaxNbOfExtra1=20
MinHitExtra1=1
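
To see what the two alternatives in ExtraSectionFirstColumnValues1 capture, the patterns can be tried outside AWStats. The Python sketch below assumes that alternatives separated by "||" are tried from left to right and that the first match wins; the function name `referring_domain` and the sample referrers are invented for the illustration.

```python
import re

# The two alternatives from ExtraSectionFirstColumnValues1.
# Assumption: they are tried in order and the first match wins.
PATTERNS = [
    r"^http:\/\/www\.([^\/]+)\/",
    r"^http:\/\/([^\/]+)\/",
]

def referring_domain(referer):
    """Return the captured domain, or None if no alternative matches."""
    for pattern in PATTERNS:
        match = re.search(pattern, referer)
        if match:
            return match.group(1)
    return None

print(referring_domain("http://www.example.com/page.html"))  # example.com
print(referring_domain("http://blog.example.org/post/1"))    # blog.example.org
print(referring_domain("https://www.example.com/"))          # None
```

The first alternative strips a leading "www." so that both spellings of a domain land in the same row. As the last call shows, referrers served over HTTPS match neither alternative; writing `https?` instead of `http` in both patterns would be one way to include them.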

#2 Show the top 20 files crawled by Baidu

ExtraSectionName2="Baidu crawls - Top 20"
ExtraSectionCodeFilter2=""
ExtraSectionCondition2="UA,(.*Baiduspider.*)"
ExtraSectionFirstColumnTitle2="File name"
ExtraSectionFirstColumnValues2="URL,(.*)"
ExtraSectionFirstColumnFormat2="%s"
ExtraSectionStatTypes2=HBL
ExtraSectionAddAverageRow2=0
ExtraSectionAddSumRow2=1
MaxNbOfExtra2=20
MinHitExtra2=1

#3 Show the top 10 directories crawled by Baidu

ExtraSectionName3="Baidu crawls - Top 10 Dir"
ExtraSectionCodeFilter3=""
ExtraSectionCondition3="UA,(.*Baiduspider.*)"
ExtraSectionFirstColumnTitle3="Directory name"
ExtraSectionFirstColumnValues3="URL,(/.*?/).*?"
ExtraSectionFirstColumnFormat3="%s"
ExtraSectionStatTypes3=HBL
ExtraSectionAddAverageRow3=0
ExtraSectionAddSumRow3=1
MaxNbOfExtra3=10
MinHitExtra3=1
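
The directory regex in example #3 relies on a lazy quantifier, which is easy to misread. The Python sketch below applies the same pattern to a few made-up URLs; `first_directory` is a hypothetical helper, and Python's `re` is used here only as a stand-in for Perl's engine.

```python
import re

# The extraction pattern from ExtraSectionFirstColumnValues3.
DIR_REGEX = r"(/.*?/).*?"

def first_directory(url):
    """Return the first path segment including both slashes, or None."""
    match = re.search(DIR_REGEX, url)
    return match.group(1) if match else None

print(first_directory("/blog/2011/post.html"))  # /blog/
print(first_directory("/a/b/c/"))               # /a/
print(first_directory("/index.html"))           # None
```

Because `.*?` stops at the first following slash, only the top-level directory is captured, which keeps the number of rows small. URLs with a single slash, such as files in the site root, produce no value and therefore do not appear in the report.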

#4 Show the top 20 pages returning 500 errors

ExtraSectionName4="Internal Server Errors (500)"
ExtraSectionCodeFilter4="500"
ExtraSectionCondition4="URL,^.*$"
ExtraSectionFirstColumnTitle4="URL"
ExtraSectionFirstColumnValues4="URL,^(.*)$"
ExtraSectionFirstColumnFormat4="%s"
ExtraSectionStatTypes4=HBL
ExtraSectionAddSumRow4=1
MaxNbOfExtra4=20
MinHitExtra4=1

#5 Show the IP addresses of the Baidu spider

ExtraSectionName5="Baidu crawls IP"
ExtraSectionCodeFilter5=""
ExtraSectionCondition5="UA,(.*Baiduspider.*)"
ExtraSectionFirstColumnTitle5="IP address"
ExtraSectionFirstColumnValues5="HOST,(.*)"
ExtraSectionFirstColumnFormat5="%s"
ExtraSectionStatTypes5=HBL
ExtraSectionAddAverageRow5=0
ExtraSectionAddSumRow5=1
MaxNbOfExtra5=50
MinHitExtra5=1

The examples above are only a starting point; there is plenty more for you to build. To finish, here is a mind map of Perl regular expressions.
