<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>4's symfony blog &#187; robot, crawler</title>
	<atom:link href="http://www.foolbirds.com/t/robot%ef%bc%8c%e7%88%ac%e8%99%ab/feed" rel="self" type="application/rss+xml" />
	<link>http://www.foolbirds.com</link>
	<description>all about symfony</description>
	<lastBuildDate>Tue, 17 Aug 2010 01:22:43 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Detecting Search Engine Crawlers</title>
		<link>http://www.foolbirds.com/%e5%88%a4%e6%96%ad%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e%e7%88%ac%e8%99%ab.html</link>
		<comments>http://www.foolbirds.com/%e5%88%a4%e6%96%ad%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e%e7%88%ac%e8%99%ab.html#comments</comments>
		<pubDate>Sat, 20 Jun 2009 06:34:53 +0000</pubDate>
		<dc:creator>hansir85</dc:creator>
				<category><![CDATA[PHP]]></category>
		<category><![CDATA[robot, crawler]]></category>

		<guid isPermaLink="false">http://www.foolbirds.com/?p=863</guid>
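		<description><![CDATA[How to tell whether a request comes from a search engine.
One way is to check the User-Agent field in the HTTP request headers.
(Other methods exist as well.)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
This is the User-Agent string sent by Baidu's crawler.
In PHP, the value is available in $_SERVER['HTTP_USER_AGENT'].
A PHP example: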

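&#60;?php
// Meant to run inside a function or an included file, because it uses return.
// A missing User-Agent header is treated as an empty string.
$userAgent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');
$spiders = array('Baiduspider', 'Googlebot'); // add more crawler keywords as needed
foreach ($spiders as $spider)
{
$spider = strtolower($spider);
if (strpos($userAgent, $spider) !== false)
{
return 'is spider';
}
}
// Reached only when no keyword matched.
return 'is not spider';
?&#62;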

Some common crawlers
Baidu crawler
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Yahoo crawlers, from Yahoo China and the US headquarters respectively
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Sina iAsk crawler
iaskspider/2.0(+http://iask.com/help/help_index.html)
Mozilla/5.0 (compatible; iaskspider/1.0; MSIE 6.0)
Sogou crawler
Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
Sogou Push Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
[Earlier form: "sogou spider"]
Google crawler
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Google AdSense content-matching crawler
Mediapartners-Google/2.1
NetEase crawler
Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/;)
[Earlier versions used "OutfoxBot/0.5 (for internet experiments; [...]]]></description>
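			<content:encoded><![CDATA[<p>How to tell whether a request comes from a search engine.<br />
One way is to check the User-Agent field in the HTTP request headers.<br />
(Other methods exist as well.)<br />
Baiduspider+(+http://www.baidu.com/search/spider.htm)<br />
This is the User-Agent string sent by Baidu's crawler.</p>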
<p>In PHP, the value is available in $_SERVER['HTTP_USER_AGENT'].<br />
A PHP example:<br />
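<code><br />
&lt;?php<br />
// Meant to run inside a function or an included file, because it uses return.<br />
// A missing User-Agent header is treated as an empty string.<br />
$userAgent = strtolower(isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '');<br />
$spiders = array('Baiduspider', 'Googlebot'); // add more crawler keywords as needed<br />
foreach ($spiders as $spider)<br />
{<br />
$spider = strtolower($spider);<br />
if (strpos($userAgent, $spider) !== false)<br />
{<br />
return 'is spider';<br />
}<br />
}<br />
// Reached only when no keyword matched.<br />
return 'is not spider';<br />
?&gt;<br />
</code></p>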
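<p>The post notes that other methods exist besides the User-Agent check. Because a client can send any User-Agent it likes, one commonly used complement is a reverse DNS lookup on the client IP followed by a forward lookup to confirm it. The sketch below is only an illustration and is not taken from the original post: the function names <code>hostMatchesDomain</code> and <code>verifyCrawlerIp</code>, their parameters, and the injectable resolver callbacks are all hypothetical. It relies on PHP's built-in <code>gethostbyaddr()</code> and <code>gethostbyname()</code>, and the forward lookup handles IPv4 only.</p>

```php
<?php
// Sketch: confirm a claimed crawler by reverse DNS, then forward DNS.
// Function names, parameters, and the injectable resolvers are hypothetical.

/**
 * Returns true when $host equals one of $domains or is a subdomain of one.
 */
function hostMatchesDomain($host, array $domains)
{
    $host = strtolower(rtrim($host, '.'));
    foreach ($domains as $domain) {
        $domain = strtolower($domain);
        $suffix = '.' . $domain;
        if ($host === $domain || substr($host, -strlen($suffix)) === $suffix) {
            return true;
        }
    }
    return false;
}

/**
 * @param string   $ip      client address, e.g. $_SERVER['REMOTE_ADDR']
 * @param array    $domains domains the crawler's hosts are expected to belong to
 * @param callable $reverse IP to hostname lookup (defaults to gethostbyaddr)
 * @param callable $forward hostname to IP lookup (defaults to gethostbyname)
 */
function verifyCrawlerIp($ip, array $domains, $reverse = 'gethostbyaddr', $forward = 'gethostbyname')
{
    $host = call_user_func($reverse, $ip);
    // gethostbyaddr() returns the IP unchanged when the lookup fails,
    // and false on malformed input.
    if ($host === false || $host === $ip || !hostMatchesDomain($host, $domains)) {
        return false;
    }
    // The forward lookup must lead back to the same IP; a PTR record alone
    // is controlled by whoever owns the address block.
    return call_user_func($forward, $host) === $ip;
}
```

<p>A possible call for a request whose User-Agent claims to be Googlebot would be <code>verifyCrawlerIp($_SERVER['REMOTE_ADDR'], array('googlebot.com', 'google.com'))</code>. The domain list here is an assumption and should be checked against each search engine's own documentation. DNS lookups are slow, so results would normally be cached.</p>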
<p>Some common crawlers<br />
Baidu crawler<br />
Baiduspider+(+http://www.baidu.com/search/spider.htm)</p>
<p>Yahoo crawlers, from Yahoo China and the US headquarters respectively<br />
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)<br />
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)</p>
<p>Sina iAsk crawler<br />
iaskspider/2.0(+http://iask.com/help/help_index.html)<br />
Mozilla/5.0 (compatible; iaskspider/1.0; MSIE 6.0)</p>
<p>Sogou crawler<br />
Sogou web spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)<br />
Sogou Push Spider/3.0(+http://www.sogou.com/docs/help/webmasters.htm#07)<br />
[Earlier form: "sogou spider"]</p>
<p>Google crawler<br />
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</p>
<p>Google AdSense content-matching crawler<br />
Mediapartners-Google/2.1</p>
<p>NetEase crawler<br />
Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/;)<br />
[Earlier versions used "OutfoxBot/0.5 (for internet experiments; http://; outfoxbot@gmail.com)"]</p>
<p>Alexa ranking crawler<br />
ia_archiver</p>
<p>MSN crawler<br />
msnbot/1.0 (+http://search.msn.com/msnbot.htm)<br />
Characteristics unknown<br />
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)</p>
<p>Reportedly the crawler of Peking University's Tianwang search engine<br />
P.Arthur 1.1</p>
<p>Apparently Qihoo's<br />
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; QihooBot 1.0)</p>
<p>Gigabot search engine crawler<br />
Gigabot/2.0 (http://www.gigablast.com/spider.html)</p>
<p>Source: the Internet. Reference: <a href="http://www.160km.com/bear/?p=14">How to Identify Search Engine Crawlers</a></p>
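<p>The User-Agent strings listed above can be turned into a small lookup table that reports which crawler a request claims to be. This is a minimal sketch rather than a complete detector: the function name <code>identifyCrawler</code> and the display names in the table are hypothetical choices, and the keywords are simply the distinctive substrings of the strings in this post.</p>

```php
<?php
// Sketch: map User-Agent keywords (taken from the list in this post) to a
// crawler name. The function name and display names are hypothetical.

function identifyCrawler($userAgent)
{
    // More specific keywords come first, e.g. "msnbot-media" before "msnbot".
    $crawlers = array(
        'Baiduspider'          => 'Baidu',
        'Yahoo! Slurp China'   => 'Yahoo China',
        'Yahoo! Slurp'         => 'Yahoo',
        'iaskspider'           => 'Sina iAsk',
        'Sogou'                => 'Sogou',
        'Mediapartners-Google' => 'Google AdSense',
        'Googlebot'            => 'Google',
        'YodaoBot'             => 'NetEase',
        'OutfoxBot'            => 'NetEase',
        'ia_archiver'          => 'Alexa',
        'msnbot-media'         => 'MSN (msnbot-media)',
        'msnbot'               => 'MSN',
        'P.Arthur'             => 'Tianwang (reported)',
        'QihooBot'             => 'Qihoo',
        'Gigabot'              => 'Gigablast',
    );
    foreach ($crawlers as $keyword => $name) {
        // stripos() is the case-insensitive variant of strpos().
        if (stripos($userAgent, $keyword) !== false) {
            return $name;
        }
    }
    return null; // no known keyword found
}
```

<p>Calling <code>identifyCrawler($_SERVER['HTTP_USER_AGENT'])</code> returns a name from the table or <code>null</code>. Using <code>stripos()</code> avoids lowercasing both sides by hand, and ordering the table from specific to general keeps the shorter keywords from shadowing the longer ones.</p>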
]]></content:encoded>
			<wfw:commentRss>http://www.foolbirds.com/%e5%88%a4%e6%96%ad%e6%90%9c%e7%b4%a2%e5%bc%95%e6%93%8e%e7%88%ac%e8%99%ab.html/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
