> my $hr=($ix+1); if ($hr>12) { $hr=$hr-12; }
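Since AWStats 5.5, definitions for the major Chinese search engines have been built in. Below is the complete list after my additions (covering both the search pages of the major portals and the dedicated search portals):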
62c60
< "baidu\.com","search\.sina\.com","search\.sohu\.com",
---
> "baidu\.com","sina\.com","3721\.com","163\.com","tom\.com","sohu\.com",
153c144
< "baidu\.com","word=", "search\.sina\.com", "word=", "search\.sohu\.com","word=",
---
> "baidu\.com","word=", "sina\.com", "word=", "3721\.com", "name=","163\.com","q=","tom\.com","word=","sohu\.com","word=",
250c234
< "baidu\.com","baidu", "search\.sina\.com","sina", "search\.sohu\.com","sohu",
---
> "baidu\.com","baidu", "sina\.com","sina", "3721\.com","3721","163\.com","netease","tom\.com","tom","sohu\.com","sohu",
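The three hunks above touch three parallel tables: the list of engine domains, the query parameter that carries the keyword for each domain, and the display name for each domain. As a rough illustration of how such tables are used, here is a minimal sketch. The table layout and the helper `match_referrer` are my own assumptions for illustration, not the actual AWStats code; only the domain, parameter, and name values come from the diff.

```perl
use strict;
use warnings;

# Illustrative table only: [ domain pattern, keyword parameter, display name ],
# with values taken from the diff above.
my @engines = (
    [ qr/baidu\.com/, 'word=', 'baidu'   ],
    [ qr/3721\.com/,  'name=', '3721'    ],
    [ qr/163\.com/,   'q=',    'netease' ],
);

# Hypothetical helper: returns (engine name, raw keyword) for a referrer URL,
# or an empty list when no engine or keyword parameter matches.
sub match_referrer {
    my ($referrer) = @_;
    my ($host, $query) = $referrer =~ m{^https?://([^/?]+)[^?]*\?(.*)$} or return;
    for my $engine (@engines) {
        my ($pattern, $param, $name) = @$engine;
        next unless $host =~ $pattern;
        for my $pair (split /&/, $query) {
            # The keyword is whatever follows the engine's parameter name.
            return ($name, substr($pair, length $param)) if index($pair, $param) == 0;
        }
    }
    return;
}

my ($name, $keyword) = match_referrer('http://www.baidu.com/s?ie=gb2312&word=awstats');
print "$name: $keyword\n";
```

The keyword returned here is still URL-encoded; decoding it is a separate step.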
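Unicode queries coming from Google still need a patch:
By default, IE on Windows 2000 and later sends queries to Google encoded as UTF-8, while most other search engines use the system's local encoding, GB2312. So after URI-decoding the query you also have to check whether it is UTF-8 and, if so, transcode it to GB2312. Otherwise the same keyword ends up as two separate records in the statistics, one in UTF-8 and one in GB2312.
I added the following function to decode Google's UTF-8 keywords as well as queries that look like "\xc4\xbe\xd7\xd3\xc3\xc0":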
use URI::Escape qw(uri_unescape);
use Encode qw(decode encode);

sub utf8_to_ascii {
    my $string   = shift;
    my $encoding = shift;    # target local encoding, e.g. "gb2312"
    # change \xc4\xbe\xd7\xd3\xc3\xc0 into %c4%be%d7%d3%c3%c0
    $string =~ s/\\x([0-9a-f]{2})/%$1/gi;
    # uri unescape
    $string = uri_unescape($string);
    # transcode only when the bytes form well-formed UTF-8
    if ( $string =~ m/^(?:[\x00-\x7f]
                        |[\xc2-\xdf][\x80-\xbf]
                        |\xe0[\xa0-\xbf][\x80-\xbf]
                        |[\xe1-\xef][\x80-\xbf][\x80-\xbf]
                        |\xf0[\x90-\xbf][\x80-\xbf][\x80-\xbf]
                        |[\xf1-\xf7][\x80-\xbf][\x80-\xbf][\x80-\xbf])*$/x )
    {
        $string = decode("utf-8", $string);
        $string = encode($encoding, $string);
    }
    # trim space
    $string =~ s/^\s+//;
    $string =~ s/\s+$//;
    # remove ";" and turn each run of whitespace into "+"
    $string =~ s/;+//g;
    $string =~ s/\s+/+/g;
    #print $string."\n";
    return $string;
}
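To make the duplicate-record problem concrete, here is a minimal sketch that normalizes the same three-character keyword arriving once as UTF-8 and once as GB2312. It is not the function above: the helper `normalize_keyword` is hypothetical, it uses a plain regex instead of `uri_unescape` so that it needs only core modules, and it detects UTF-8 by attempting a strict decode rather than by the hand-written pattern.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: return the keyword as GB2312 bytes, whichever of the
# two encodings the search engine used in the referrer.
sub normalize_keyword {
    my ($raw) = @_;
    # Percent-decode by hand (stands in for URI::Escape's uri_unescape).
    (my $bytes = $raw) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    # A strict decode dies on byte sequences that are not well-formed UTF-8.
    my $chars = eval {
        Encode::decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    # Well-formed UTF-8: transcode. Anything else: assume it is already GB2312.
    return defined $chars ? Encode::encode('gb2312', $chars) : $bytes;
}

my $utf8_form   = '%E6%9C%A8%E5%AD%90%E7%BE%8E';   # keyword sent as UTF-8
my $gb2312_form = '%C4%BE%D7%D3%C3%C0';            # same keyword sent as GB2312

print normalize_keyword($utf8_form) eq normalize_keyword($gb2312_form)
    ? "same keyword\n"
    : "different keywords\n";
```

Either way the check is a heuristic: a short GB2312 string can happen to be well-formed UTF-8, in which case it would be transcoded incorrectly.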
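GeoIP and Geo::IPfree (AWStats 5.5+)
GeoIP and Geo::IPfree are both free IP-to-country mapping tables. They give more accurate statistics than reverse DNS lookups on the domain name, and they are faster. GeoIP's API is free and so is its default database; what you pay for is the data update service. With Geo::IPfree, both the code and the database are open, so you can customize it yourself. I once considered building a mapping from IP addresses to Chinese cities.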
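Installing GeoIP:
First download the C library (GeoIP C), unpack it, then:
%./configure; make
#make install
Then download the Perl module (GeoIP Perl), unpack it, then:
%perl Makefile.PL; make
#make install
Installing Geo::IPfree:
Download Geo::IPfree, unpack it, then:
%perl Makefile.PL
%make
#make install
Configuration: enable the geoip or Geo::IPfree plugin in the AWStats configuration file.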
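As a rough guide, enabling a plugin means uncommenting or adding a `LoadPlugin` line in the site's AWStats configuration file. The fragment below follows the form used in the sample configuration shipped with AWStats as far as I know. Check the comments in your own version's sample file, since plugin parameters can differ between releases.

```
# Use one of the two, not both.
# GeoIP (needs the GeoIP C library and the Perl module installed):
LoadPlugin="geoip GEOIP_STANDARD"
# Geo::IPfree (needs the Geo::IPfree Perl module installed):
#LoadPlugin="geoipfree"
```

Run the statistics update again after changing the configuration, so that new records are resolved through the plugin.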
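AWStats
http://awstats.sourceforge.net/
Webalizer
http://www.webalizer.org/
Log analysis tools
http://directory.google.com/top/computers/software/internet/site_management/log_analysis/
Commercial log statistics/analysis tools
http://directory.google.com/top/computers/software/internet/site_management/log_analysis/commercial/
Merging logs from multiple sites for combined statistics:
http://www.chedong.com/tech/rotate_merge_log.html
Log statistics are very important for analyzing how search engines affect a site:
http://www.chedong.com/tech/google.html
AWStats itself also ships with many contributed plugins, including one that aggregates the statistics of several sites into a combined report, IIS log time conversion, URL-to-title mapping, and more: http://awstats.sourceforge.net/awstats_contrib.html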