python - How can I stop crawlers from scraping my website?
大家讲道理 2017-04-17 17:33:35

How can I stop crawlers from scraping my own website? What methods are there?


All replies (13)
迷茫

Add a robots.txt file with the following content:

User-agent: *
Disallow: /
刘奇

Adding robots.txt tells crawlers not to crawl your website, but it does not enforce anything. It is only a convention that both sides have to honor.
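To make the "convention" point concrete, here is a minimal sketch of what a well-behaved crawler does on its side, using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user agent `ExampleBot`, and the example.com URLs are made up for illustration; the rules are parsed from a list instead of being fetched from a real site.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)  # parse in-memory rules, no network access
    return parser.can_fetch(user_agent, url)


# The "block everything" rules quoted earlier in the thread.
block_all = ["User-agent: *", "Disallow: /"]

# A polite crawler runs this check before each request.
# Nothing on the server side forces it to.
print(is_allowed(block_all, "ExampleBot", "https://example.com/articles/1"))  # False
```

A crawler that never performs this check is unaffected by robots.txt, which is why the file only works against crawlers that choose to respect it.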

巴扎黑

I'm not sure whether you mean search-engine crawlers such as Baidu's, or crawlers that people write themselves.

For Baidu's crawler, the robots.txt approach above is enough. There are many ways to hinder other people's crawlers, for example generating all class and id values dynamically, because crawlers usually parse the HTML and locate what they want by class or id.
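As a rough illustration of the dynamic class idea, the sketch below swaps chosen class names for fresh random tokens each time a page is rendered. The function name `obfuscate_classes` and the sample markup are hypothetical, and it assumes class attributes are written with double quotes; a real site would also have to apply the same mapping to its stylesheet and scripts.

```python
import re
import secrets


def obfuscate_classes(html, names):
    """Replace each listed class name with a random token.

    Returns the rewritten HTML and the name -> token mapping, so the
    same mapping can be applied to the CSS served with the page.
    """
    mapping = {name: "c" + secrets.token_hex(4) for name in names}

    def swap(match):
        # Rewrite only the names we were asked to hide; keep the rest.
        classes = match.group(1).split()
        return 'class="' + " ".join(mapping.get(c, c) for c in classes) + '"'

    # Assumption: class attributes use double quotes.
    return re.sub(r'class="([^"]*)"', swap, html), mapping


page, mapping = obfuscate_classes('<p class="price big">9.99</p>', ["price"])
print(page)  # the "price" class is now a random token such as c1a2b3c4d
```

This only defeats selectors tied to fixed names. The text and the document structure are unchanged, so a crawler that matches on position or content still works.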

大家讲道理

It also depends on what kind of crawler it is.
A gentleman or a villain?
If the crawler abides by the robots.txt convention, then you're fine.
But that is only a gentleman's agreement.
If you run into a villain, there is nothing it can do for you.

迷茫

1) You can try gzip-compressing your JS. Many crawlers will not fetch gzip-compressed JS.
2) Analyze your web server's access logs. If someone is maliciously hitting your key resources and is coming from a fixed IP, you can try banning that IP.
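The log analysis in point 2 could be sketched as follows. This assumes an access log in the common/combined format, where the client address is the first whitespace-separated field. The function name `top_clients`, the sample lines, and the threshold are made up for illustration.

```python
from collections import Counter


def top_clients(log_lines, threshold):
    """Return {ip: hits} for clients with at least `threshold` requests."""
    hits = Counter()
    for line in log_lines:
        fields = line.split(maxsplit=1)
        if fields:  # skip blank lines
            hits[fields[0]] += 1
    return {ip: n for ip, n in hits.items() if n >= threshold}


# Hypothetical sample using documentation-reserved addresses.
sample = [
    '203.0.113.7 - - [17/Apr/2017:17:33:35 +0800] "GET /item/1 HTTP/1.1" 200 512',
    '198.51.100.2 - - [17/Apr/2017:17:33:36 +0800] "GET / HTTP/1.1" 200 1024',
    '203.0.113.7 - - [17/Apr/2017:17:33:37 +0800] "GET /item/2 HTTP/1.1" 200 498',
]
print(top_clients(sample, threshold=2))  # {'203.0.113.7': 2}
```

Addresses that show up here are only candidates for a ban. As the reply notes, this helps when the other party uses a fixed IP.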

黄舟

Realistically, you can't prevent it completely.

Peter_Zhu

It's no use. If your website is open to people, it is naturally open to crawlers too, unless you move it onto an internal network. Rather than putting your effort into blocking crawlers, you'd be better off improving the quality of the site. These days classified-listing sites all scrape each other, yet the user experience has barely improved.

迷茫

Heh, you can scramble the class and id values so there is no pattern to them, and then even regular expressions won't match.

阿神

I wonder whether it would work to generate all of the page content dynamically with JS.

巴扎黑

First of all, it is hard to keep out 100% of crawlers, unless the site is on an internal network as mentioned above.

But you can take some measures to prevent some low-tech crawlers from crawling your website.

For specific measures, see the article on Zhihu.

Hope this helps.
