2009-06-20

Detecting Search Engine Crawlers

Filed under: PHP | Tags: hansir85 @ 14:34

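How can you tell whether a request comes from a search engine crawler?
One way is to inspect the User-Agent header of the HTTP request, which the server exposes as HTTP_USER_AGENT.
(Other methods exist as well.)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
This, for example, is the user-agent string sent by Baidu's crawler.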

In PHP, the value is available in the variable $_SERVER['HTTP_USER_AGENT'].
Here is an example in PHP:

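<?php
function checkSpider()
{
    // The header may be absent, so fall back to an empty string.
    $userAgent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
    $spiders = array('Baiduspider', 'Googlebot'); // add the crawler keywords you need
    foreach ($spiders as $spider)
    {
        $spider = strtolower($spider);
        if (strpos($userAgent, $spider) !== false)
        {
            return 'is spider';
        }
    }
    // Reached only after every keyword has been checked without a match.
    return 'is not spider';
}
?>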
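
Any client can send a forged User-Agent, so a keyword match alone proves little. One of the other methods hinted at earlier is a reverse DNS lookup on the client IP followed by a forward lookup that must return the same IP; Google documents this check for Googlebot. The following is only a minimal sketch: the function name verifyCrawlerIp, the injectable $reverse and $forward parameters, and the domain suffixes passed in are assumptions made for illustration and do not come from the original post.

```php
<?php
/**
 * Sketch: confirm that an IP address really belongs to a crawler's operator.
 * $reverse and $forward default to PHP's own DNS functions; they are
 * parameters only so the logic can be exercised without network access.
 */
function verifyCrawlerIp($ip, array $allowedSuffixes, $reverse = 'gethostbyaddr', $forward = 'gethostbynamel')
{
    // Step 1: reverse lookup. gethostbyaddr() returns the IP unchanged
    // on failure and false on malformed input.
    $host = call_user_func($reverse, $ip);
    if ($host === false || $host === $ip) {
        return false;
    }
    $host = strtolower(rtrim($host, '.'));

    // Step 2: the host name must end in one of the allowed domains.
    // The leading dot prevents "evilgooglebot.com" from matching "googlebot.com".
    $inDomain = false;
    foreach ($allowedSuffixes as $suffix) {
        $suffix = '.' . strtolower(ltrim($suffix, '.'));
        if (substr($host, -strlen($suffix)) === $suffix) {
            $inDomain = true;
            break;
        }
    }
    if (!$inDomain) {
        return false;
    }

    // Step 3: forward lookup. The original IP must be among the results.
    // Note that gethostbynamel() only returns IPv4 addresses.
    $ips = call_user_func($forward, $host);
    return is_array($ips) && in_array($ip, $ips, true);
}
```

A call such as verifyCrawlerIp($_SERVER['REMOTE_ADDR'], array('googlebot.com', 'google.com')) would then be made only after the User-Agent keyword check has matched, since DNS lookups are slow compared with a string comparison. The suffix list for other crawlers would have to be taken from each operator's own documentation.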

Some common crawlers
Baidu crawler
Baiduspider+(+http://www.baidu.com/search/spider.htm)

Yahoo crawlers, for Yahoo China and for Yahoo's US headquarters respectively
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)

Sina iAsk crawler
iaskspider/2.0(+http://iask.com/help/help_index.html)
Mozilla/5.0 (compatible; iaskspider/1.0; MSIE 6.0)

Sogou crawler
Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou Push Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
[Earlier form: "sogou spider"]

Google crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Google AdSense ad content-matching crawler
Mediapartners-Google/2.1

NetEase crawler
Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/;)
[Earlier form: "OutfoxBot/0.5 (for internet experiments; http://"; outfoxbot@gmail.com)"]

Alexa ranking crawler
ia_archiver

MSN crawler
msnbot/1.0 (+http://search.msn.com/msnbot.htm)
Purpose unknown
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)

Reportedly the crawler of Peking University's Tianwang search engine
P.Arthur 1.1

Apparently Qihoo's crawler
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0)

Gigabot search engine crawler
Gigabot/2.0 (http://www.gigablast.com/spider.html)
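
The strings listed above can be folded into a small lookup table, so that a request is labeled with the crawler it claims to be rather than just flagged as a spider. This is a rough sketch: the function name identifyCrawler and the label strings are made up for illustration, and the keywords are simply the distinctive fragments of the user-agent strings above.

```php
<?php
/**
 * Sketch: map a User-Agent string to a crawler label.
 * More specific keywords are listed before the general ones they contain
 * ("Yahoo! Slurp China" before "Yahoo! Slurp", "msnbot-media" before "msnbot").
 */
function identifyCrawler($userAgent)
{
    $keywords = array(
        'Baiduspider'          => 'Baidu',
        'Yahoo! Slurp China'   => 'Yahoo China',
        'Yahoo! Slurp'         => 'Yahoo',
        'iaskspider'           => 'Sina iAsk',
        'Sogou'                => 'Sogou',
        'Mediapartners-Google' => 'Google AdSense',
        'Googlebot'            => 'Google',
        'YodaoBot'             => 'NetEase',
        'OutfoxBot'            => 'NetEase',
        'ia_archiver'          => 'Alexa',
        'msnbot-media'         => 'MSN media',
        'msnbot'               => 'MSN',
        'P.Arthur'             => 'Tianwang (reported)',
        'QihooBot'             => 'Qihoo',
        'Gigabot'              => 'Gigablast',
    );

    foreach ($keywords as $keyword => $label) {
        // stripos() is case-insensitive, so no strtolower() calls are needed.
        if (stripos((string) $userAgent, $keyword) !== false) {
            return $label;
        }
    }

    // No known keyword found: treat the client as an ordinary visitor.
    return null;
}
```

Because PHP arrays keep insertion order, the first matching keyword wins, which is why the ordering of overlapping keywords matters. The result only reflects what the client claims in its header.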

Source: the Internet. Reference: "How to identify search engine crawlers"
