Problem with entering Chinese in Python

Question

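I wrote a crawler for the WooYun vulnerability database. Its URLs have the form http://www.wooyun.org/corps/公司名称/page/1 (公司名称 = company name), and at the raw_input prompt at the end of the program you enter a company name to fetch that company's vulnerabilities. The problem is that I haven't handled Chinese encoding properly. If the company name is in English, such as...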

伊谢尔伦 · Answer

The URL contains Chinese characters, which must be escaped (percent-encoded).
Replace line 14 of your script with:

import urllib  # Python 2; quote() percent-encodes the raw_input bytes
url = 'http://www.wooyun.org/corps/' + urllib.quote(corpName) + '/page/' + str(pageNum)

Tested successfully in GNOME Terminal on Ubuntu, entering 百度 (Baidu) as the company name.
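For readers on Python 3, where `urllib.quote` moved to `urllib.parse.quote` and `raw_input` became `input`, the same fix might look like this. It is a minimal sketch, not the asker's script: the helper name `build_corp_url` is hypothetical, and it only builds URLs without fetching anything.

```python
from urllib.parse import quote

BASE = 'http://www.wooyun.org/corps/'

def build_corp_url(corp_name, page_num):
    """Build one listing-page URL for a company (hypothetical helper).

    quote() encodes corp_name as UTF-8 and turns every non-ASCII byte
    into a %XX escape; plain ASCII letters pass through unchanged.
    """
    return BASE + quote(corp_name) + '/page/' + str(page_num)

print(build_corp_url('百度', 1))
print(build_corp_url('alibaba', 2))
```

Interactively you would pass it the string returned by `input('Company name: ')`. English names come back unchanged, while Chinese names are replaced by their escaped bytes.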

阿神 · Answer

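I don't think this is a character-encoding problem. How could Chinese characters appear directly in a URL? Don't assume that because the browser displays http://www.wooyun.org/corps/公司名称/page/1, the 公司名称 (company name) part of the URL the browser actually requests is raw Chinese characters.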

In fact, when the browser sends the request, every Chinese character in the URL is URL-encoded (percent-encoded), so the real request contains no Chinese characters at all.
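What the browser does can be reproduced by hand: encode the text to bytes (UTF-8 here) and write each byte as `%` followed by two uppercase hex digits. The sketch below does this only for non-ASCII characters and compares the result with Python 3's `urllib.parse.quote`. The function name `percent_encode` is hypothetical, and it is deliberately simplified: it leaves all ASCII untouched, whereas `quote` also escapes spaces and reserved characters.

```python
from urllib.parse import quote

def percent_encode(text):
    """Hand-rolled percent-encoding of non-ASCII characters (illustration only)."""
    parts = []
    for ch in text:
        if ord(ch) < 128:
            parts.append(ch)  # simplification: every ASCII character is kept as-is
        else:
            # each UTF-8 byte of the character becomes a %XX escape
            parts.append(''.join('%{:02X}'.format(b) for b in ch.encode('utf-8')))
    return ''.join(parts)

segment = '公司名称'
print(percent_encode(segment))
print(percent_encode(segment) == quote(segment))  # True for this input
```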

For example, if you request http://www.wooyun.org/corps/阿里巴巴/page/1 as is, it will not succeed.
But if you write the URL as http://www.wooyun.org/corps/%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4/page/1, you can successfully request Alibaba's page.
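The escaped segment in that URL is simply 阿里巴巴 encoded as UTF-8 and percent-encoded. A quick Python 3 check, as a sketch that only builds strings and does not contact the site: `quote` produces the segment and `unquote` turns it back into the original name.

```python
from urllib.parse import quote, unquote

name = '阿里巴巴'
encoded = quote(name)
url = 'http://www.wooyun.org/corps/' + encoded + '/page/1'

print(url)  # http://www.wooyun.org/corps/%E9%98%BF%E9%87%8C%E5%B7%B4%E5%B7%B4/page/1
# unquote() decodes the %XX bytes as UTF-8, giving the original name back
print(unquote(encoded) == name)  # True
```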

巴扎黑 · Answer

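# -*- coding: utf-8 -*-
# Python 2: the coding line is required for a non-ASCII literal in the source
from urllib import quote

print quote('百度')  # prints %E7%99%BE%E5%BA%A6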
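That snippet is Python 2 (`print` statement, `urllib.quote`). A Python 3 equivalent, as a sketch: `quote` lives in `urllib.parse`, takes a `str`, and encodes it as UTF-8 unless you pass the `encoding` argument. The GBK line is there only to show that the escape sequence depends on the byte encoding chosen; which encoding a given server expects is an assumption you would need to check, and UTF-8 is the form shown working in the previous answer.

```python
from urllib.parse import quote

# Default: the str is encoded as UTF-8 before percent-encoding
print(quote('百度'))  # %E7%99%BE%E5%BA%A6

# Same characters, different bytes: GBK yields a different escape sequence
print(quote('百度', encoding='gbk'))
```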