Post-processing remote data fetched with XMLHTTP

[Added: August 18, 2005] [Updated: March 25, 2007]

Summary: excerpted from iuhxq's blog

<%

'Author information:
'Nickname: 小灰
'QQ: 103895
'http://asp2004.net
'http://blog.csdn.net/iuhxq

' Arguments: URL, start marker, end marker, regex pattern, replacement ($2 is the cell text).
hehe = hello("http://mmsg.qq.com/cgi-bin/gddylist?type=13&sort=1&page=3", "<html>", "</html>", ".*(<td width=""35%"" bgcolor=""#[\dabcde]{6}"">(.*)</td>)[.\n]*", "<font style=""font-size:9pt;"" color=blue>$2</font><br>")
Response.Write hehe

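' Fetches strurl, keeps the part between strstart and strend, then rewrites
' every match of patrn with replstr.
Function hello(strurl, strstart, strend, patrn, replstr)
    Dim str
    str = getbody(strurl)
    str = mymid(str, strstart, strend)
    str = replacetest(patrn, replstr, str)
    hello = str
End Function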

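' Returns the part of str from the first occurrence of strstart through the
' first following occurrence of strend (both markers included).
' An empty strstart means "from the beginning"; an empty strend means "to the end".
Function mymid(str, strstart, strend)
    Dim i, j
    mymid = ""
    If strstart = "" Then
        i = 1    ' VBScript strings are 1-based; Mid and InStr fail on position 0
    Else
        i = InStr(str, strstart)
        If i = 0 Then Exit Function    ' start marker not found
    End If
    If strend = "" Then
        j = Len(str)
    Else
        j = InStr(i, str, strend)
        If j = 0 Then Exit Function    ' end marker not found
        j = j + Len(strend) - 1        ' extend to the last character of the end marker
    End If
    mymid = Mid(str, i, j - i + 1)
End Function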

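' Runs patrn against str1 and, for every match, rewrites the matched text with
' replstr (which may reference submatches such as $2). The rewritten matches are
' concatenated; text outside the matches is discarded.
Function replacetest(patrn, replstr, str1)
    Dim regex, match, matches
    Set regex = New RegExp
    regex.Pattern = patrn
    regex.IgnoreCase = True
    regex.Global = True
    Set matches = regex.Execute(str1)
    replacetest = ""
    For Each match In matches
        replacetest = replacetest & regex.Replace(match.Value, replstr)
    Next
End Function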

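' Downloads url synchronously and returns the response decoded as GB2312 text.
Function getbody(url)
    Dim objxml
    Set objxml = CreateObject("Microsoft.XMLHTTP")
    With objxml
        .Open "GET", url, False, "", ""
        .Send
        getbody = .responseBody    ' raw bytes, not yet decoded
    End With
    getbody = bytestobstr(getbody, "gb2312")
    Set objxml = Nothing
End Function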

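' Converts a byte array to a string in the given character set, using ADODB.Stream.
Function bytestobstr(strbody, codebase)
    Dim objstream
    Set objstream = Server.CreateObject("ADODB.Stream")
    With objstream
        .Type = 1            ' adTypeBinary: accept raw bytes
        .Mode = 3            ' adModeReadWrite
        .Open
        .Write strbody
        .Position = 0        ' rewind before switching to text mode
        .Type = 2            ' adTypeText
        .Charset = codebase
        bytestobstr = .ReadText
        .Close
    End With
    Set objstream = Nothing
End Function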
%>
Another usage example:
hehe = hello("http://list.mp3.baidu.com/song/a.htm", "<table width=""90%"" border=""0"" align=""center"" cellpadding=""3"" cellspacing=""0"" bgcolor=""#f5f5f5"" >", "<div align=center>", ".*(<td width=""20%""><a href="".*\.htm"" target=_blank>)(.*)(</a></td>)[.\n]*", "<font style=""font-size:9pt;"" color=blue>$2</font><br>")

 
