python - My crawler fetches page links; how do I tell which links are the newest?
黄舟 2017-04-18 09:46:23

As a Python practice project, I want to build a Weibo bot that automatically reposts news from a website.
I know I need to connect to the Weibo API and crawl the site's news links and titles.
But how do I extract only the latest news?
Here is my code, which prints every news item that passes my filter:

# Collect every <li> element that carries a data-label attribute
bar = soup.find_all('li', attrs={'data-label': True})
for item in bar:
    # Keep only items tagged with the keyword
    if u'巴塞罗那' in item['data-label'].split(','):
        print(item)

I want to print only the first item of the filtered results, but when I try, it prints len(bar) times and skips the filter rule entirely. How can I fix this?


All replies (4)
伊谢尔伦

Are you crawling the live-broadcast site?

You can set a variable lasttime to record the time of the last crawl

from datetime import datetime

# lasttime holds the datetime of the previous crawl; initialise it once,
# then update it after every crawl
lasttime = datetime.min

bar = soup.find_all('li', attrs={'data-label': True})
for item in bar:
    # The last 19 characters of the item text hold a "%Y-%m-%d %H:%M:%S" timestamp
    d = datetime.strptime(item.text[-19:], "%Y-%m-%d %H:%M:%S")
    if u'巴塞罗那' in item['data-label'].split(',') and d > lasttime:
        print(item)
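A self-contained sketch of this timestamp-comparison idea, stripped of the HTML parsing so it runs on plain data; `newer_items` is a hypothetical helper, and the (labels, text) pair layout and trailing-timestamp format are assumptions taken from the snippet above:

```python
from datetime import datetime

def newer_items(items, last_time, keyword=u"巴塞罗那"):
    """Return items tagged with `keyword` whose timestamp is after last_time.

    Each item is a (labels, text) pair where labels is a comma-separated
    tag string and text ends with a "%Y-%m-%d %H:%M:%S" timestamp.
    """
    fresh = []
    for labels, text in items:
        posted = datetime.strptime(text[-19:], "%Y-%m-%d %H:%M:%S")
        if keyword in labels.split(",") and posted > last_time:
            fresh.append((labels, text))
    return fresh

# Example: only the first item is both tagged with the keyword
# and newer than the last crawl.
items = [
    (u"足球,巴塞罗那", u"Messi scores twice 2016-10-18 21:30:00"),
    (u"足球,皇马", u"Real Madrid draw 2016-10-18 22:00:00"),
    (u"足球,巴塞罗那", u"Older match report 2016-10-17 10:00:00"),
]
print(newer_items(items, datetime(2016, 10, 18)))
```

After each crawl, set `last_time` to the newest timestamp you saw, so the next run only reports items posted since then.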
阿神

Actually this is a very common problem: deduplication. First, give each news item a unique identifier, such as a timestamp, or derive one from the URL format used by the zhibo8 live bar. From "http://news.zhibo8.cc/zuqiu/2016-10-18/5805df3d3422f" you can build:

20161018-5805df3d3422f

as the unique ID of the news item. Or, more strictly, prefix a category flag for football, e.g. 0:

0-20161018-5805df3d3422f

With a unique ID, the rest is much easier, and there are many approaches. For example, maintain a list in memory that stores, in order, the IDs of the news currently on the page. The next time you crawl, every item that appears before the first ID in your list is new. Then update the list, dropping the oldest entries: if n new items were added, delete the last n. That is efficient in both space and time.
If you also want to keep the old news, save the deleted entries to a database each time.
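The ID scheme above can be sketched as follows; `make_id` and `find_new` are hypothetical helper names, and the URL layout is assumed from the example link:

```python
def make_id(url):
    """Derive a unique ID like '20161018-5805df3d3422f' from a
    zhibo8-style news URL (.../zuqiu/2016-10-18/5805df3d3422f)."""
    date_part, slug = url.rstrip("/").split("/")[-2:]
    return date_part.replace("-", "") + "-" + slug

def find_new(page_ids, seen_ids):
    """Return the IDs that appear on the page before the first
    already-seen ID; those are the new items, in page order."""
    for i, pid in enumerate(page_ids):
        if pid in seen_ids:
            return page_ids[:i]
    return list(page_ids)  # no overlap with last crawl: whole page is new
```

After each crawl you would prepend the new IDs to your stored list and trim the same number of old entries off the end, keeping the list a fixed size.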

迷茫

Don’t all news pages have a time field?

大家讲道理

Your goal is to extract the latest news matching the keywords you set. The simplest approach is to call time.sleep(60), re-crawl the page after a minute, and compare with the previous result; anything not seen before is the latest news. Also, your question gives too little information.
