Extracting the most popular Q&A content from Zhihu with Python

大家讲道理

Released: 2016-11-09 11:29:25


# -*- coding: utf-8 -*-
import re
import urllib.request

OUTPUT_PATH = '/root/Desktop/zhihu.txt'


def yunpan_search():
    url = "https://www.zhihu.com/explore"
    # Send browser-like headers with the request.
    req = urllib.request.Request(url, headers={
        'Connection': 'Keep-Alive',
        'Accept': 'text/html, application/xhtml+xml, */*',
        'Accept-Language': 'en-US,en;q=0.8,zh-Hans-CN;q=0.5,zh-Hans;q=0.3',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
    })
    with urllib.request.urlopen(req) as response:
        html = response.read().decode('utf-8')

    # Capture the text between each hidden <textarea> and the answer-date span.
    rex = r'(?<=<textarea class="content hidden">\n).*?(?=<span class="answer-date-link-wrap">)'
    m = re.findall(rex, html, re.S)
    with open(OUTPUT_PATH, 'w', encoding='utf-8') as f:
        for i in m:
            f.write(i)
            f.write('\n\n')
    print("Fetched successfully!")

    # Second pass: strip ASCII word characters (tag and entity names), then
    # the "&;" pairs left behind by entities such as &lt; and &gt;.
    # re.ASCII stands in for the original re.L, which Python 3.6+ rejects
    # for str patterns.
    p = re.compile(r'\w+', re.ASCII)
    pp = re.compile(r'(?:&;)+')
    with open(OUTPUT_PATH, 'r+', encoding='utf-8') as file:
        fullfile = file.readlines()
        text = [pp.sub('', p.sub('', line)) for line in fullfile]
        file.seek(0)
        file.truncate(0)
        file.writelines(text)
    print("Processing successful!")


if __name__ == '__main__':
    yunpan_search()
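
The extraction and cleanup steps of the script above can be tried offline, without fetching the page or writing a file. The following is a minimal sketch under a few assumptions: the helper names extract_answers and clean and the sample string are hypothetical, the sample only imitates the markup the pattern expects, and re.ASCII is used as a stand-in for locale-based matching so that Chinese text is kept while ASCII tag and entity names are dropped.

```python
import re

# Same extraction pattern as in the script above.
ANSWER_RE = re.compile(
    r'(?<=<textarea class="content hidden">\n).*?'
    r'(?=<span class="answer-date-link-wrap">)',
    re.S,
)


def extract_answers(html):
    """Return the raw answer bodies found in the page source."""
    return ANSWER_RE.findall(html)


def clean(text):
    """Drop ASCII word characters, then the '&;' left over from HTML entities."""
    text = re.sub(r'\w+', '', text, flags=re.ASCII)
    return re.sub(r'(?:&;)+', '', text)


# Hypothetical sample, shaped like the markup the pattern expects.
sample = (
    '<textarea class="content hidden">\n'
    '&lt;p&gt;你好，世界&lt;/p&gt;\n'
    '<span class="answer-date-link-wrap">'
)

answers = extract_answers(sample)
print(answers)
print([clean(a) for a in answers])
```

Note that this cleanup only removes ASCII letters, digits, underscores and the leftover "&;" pairs. Punctuation such as the "/" from a closing tag stays in the output, and any English words in an answer are removed along with the markup.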


Source: php.cn
