> "163\.com","q=",
> "3721\.com","name=",
228a237,238
> # minor chinese search engines
> "baidu\.","baidu", "163\.com","netease","sina\.","sina","sohu\.","sohu","3721\.com","3721",
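Patch for Google's Unicode queries:
Queries sent to Google from IE on Windows 2000 are UTF-8 encoded by default, whereas most other search engines use the system's local encoding, GB2312. After the query URI is unescaped, it must therefore also be transcoded to GB2312 whenever UTF-8 was used. Otherwise the same keyword leaves two separate records in the statistics, one in UTF-8 and one in GB2312.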
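The decode-then-transcode step described above can be sketched as a standalone routine. This is a minimal illustration, not the code from the patch: the helper name `normalize_query` and the assumption that the keywords arrive in a `q=` parameter are hypothetical, and percent-decoding is done with a plain regex (instead of `uri_unescape` from `URI::Escape`) so that the sketch depends only on core modules.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper (not part of the patch): take a raw query string
# such as "q=%E4%B8%AD%E6%96%87&ie=utf-8" and return the search keywords
# as GB2312 bytes, whichever encoding the browser used.
sub normalize_query {
    my ($query_string) = @_;
    my $utf8_detected = 0;
    my $keywords;

    # Scan every parameter first, so the UTF-8 marker is honoured even
    # when it appears after the keyword parameter.
    for my $param (split /&/, $query_string) {
        # Google signals UTF-8 with ie=utf-8, AllTheWeb with cs=utf-8.
        if (lc($param) eq 'ie=utf-8' || lc($param) eq 'cs=utf-8') {
            $utf8_detected = 1;
            next;
        }
        # Assumption: the keywords are carried in the "q" parameter.
        $keywords = $1 if $param =~ /^q=(.*)$/s;
    }
    return unless defined $keywords;

    # Undo form encoding: "+" means space, %XX is one raw byte.
    $keywords =~ tr/+/ /;
    $keywords =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;

    # Transcode only when the query was flagged as UTF-8; 'euc-cn' is
    # Encode's name for the byte encoding commonly called GB2312.
    if ($utf8_detected) {
        $keywords = encode('euc-cn', decode('UTF-8', $keywords));
    }
    return $keywords;
}
```

With this in place, `normalize_query("q=%E4%B8%AD%E6%96%87&ie=utf-8")` and `normalize_query("q=%D6%D0%CE%C4")` yield the same byte string, so both forms of the keyword would be counted under a single entry.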
15,16c15,16
< use Encode;
< use URI::Escape;
---
>
>
5692,5694d5691
< #utf-8 encoding detection
< my $unicodedetected = 0;
< my $searchquery = "";
5696,5701d5692
< # google use: ie=utf-8
< # alltheweb use: cs=utf-8
< if ($param eq "ie=utf-8" || $param eq "cs=utf-8") {
< $unicodedetected = 1;
< }
<
5704d5694
< $param = uri_unescape($param);
5708,5712c5698,5700
< $param =~ s/^ +//;
< $param =~ s/ +$//;
< $param =~ tr/ /\+/s;
< $param =~ s/\+/ /s;