How to Efficiently Retrieve Multiple URLs Using QWebPage in Qt?

DDD

Published: 2024-10-27 11:42:30

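Retrieving Multiple URLs with QWebPage

In this scenario, you are using Qt's QWebPage to render dynamically updated pages, but the program frequently crashes when it tries to render the second page.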

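Problem Analysis

The problem lies in the approach: a new QApplication and a new QWebPage are being created for every URL fetch. Instead, keep a single QApplication and a single QWebPage, and use signals and custom handlers to process all of the URLs within that one instance.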
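The control flow described above can be sketched without Qt at all. The following is only an illustration under stated assumptions: `FakeEventLoop`, `SequentialFetcher`, and the `fake_pages` mapping are hypothetical stand-ins for QApplication's event loop, the page object, and real network loads. It shows one long-lived object walking the URL list, with each "load finished" callback triggering the next load and the loop quitting once the list is exhausted.

```python
from collections import deque


class FakeEventLoop:
    """Tiny stand-in for QApplication's event loop (illustration only)."""

    def __init__(self):
        self._events = deque()
        self._running = False

    def post(self, callback, *args):
        # Queue a callback to run on a later loop iteration.
        self._events.append((callback, args))

    def quit(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running and self._events:
            callback, args = self._events.popleft()
            callback(*args)


class SequentialFetcher:
    """One long-lived fetcher that handles many URLs, one load at a time."""

    def __init__(self, loop, fake_pages, on_ready):
        self._loop = loop
        self._pages = fake_pages    # hypothetical url -> html mapping
        self._on_ready = on_ready   # plays the role of an htmlReady signal
        self._urls = iter(())

    def process(self, urls):
        self._urls = iter(urls)
        self._fetch_next()

    def _fetch_next(self):
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        # "Loading" here just schedules the finished-callback on the loop.
        self._loop.post(self._handle_finished, url)
        return True

    def _handle_finished(self, url):
        self._on_ready(self._pages.get(url, ''), url)
        if not self._fetch_next():
            self._loop.quit()   # nothing left: stop the single event loop


results = []
loop = FakeEventLoop()
fetcher = SequentialFetcher(
    loop,
    {'https://example.com': '<p>one</p>', 'https://example2.com': '<p>two</p>'},
    lambda html, url: results.append((url, html)),
)
fetcher.process(['https://example.com', 'https://example2.com'])
loop.run()
```

The point of the design is that the loop and the fetcher are created exactly once; only the URL iterator advances.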

Suggested Solution

The WebPage Class

Below are custom WebPage classes for PyQt5 and PyQt4.

PyQt5 WebPage

<code class="python">from PyQt5.QtCore import pyqtSignal, QUrl
from PyQt5.QtWidgets import QApplication
from PyQt5.QtWebEngineWidgets import QWebEnginePage

class WebPage(QWebEnginePage):
    # Emitted with (html, url) once each page has finished loading.
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super().__init__()
        self._verbose = verbose
        self.loadFinished.connect(self.handleLoadFinished)

    def process(self, urls):
        # Start working through the URLs one at a time.
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Load the next URL; return False when none are left.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.load(QUrl(url))
        return True

    def processCurrentPage(self, html):
        self.htmlReady.emit(html, self.url().toString())
        # Quit the event loop once every URL has been handled.
        if not self.fetchNext():
            QApplication.instance().quit()

    def handleLoadFinished(self):
        # toHtml() is asynchronous and passes the HTML to the callback.
        self.toHtml(self.processCurrentPage)

    def javaScriptConsoleMessage(self, *args, **kwargs):
        # Suppress JavaScript console output unless verbose is set.
        if self._verbose:
            super().javaScriptConsoleMessage(*args, **kwargs)</code>

PyQt4 WebPage

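<code class="python">from PyQt4.QtCore import pyqtSignal, QUrl
from PyQt4.QtGui import QApplication
from PyQt4.QtWebKit import QWebPage

class WebPage(QWebPage):
    # Emitted with (html, url) once each page has finished loading.
    htmlReady = pyqtSignal(str, str)

    def __init__(self, verbose=False):
        super(WebPage, self).__init__()
        self._verbose = verbose
        # In QtWebKit the load signal lives on the page's main frame.
        self.mainFrame().loadFinished.connect(self.handleLoadFinished)

    def process(self, urls):
        # Start working through the URLs one at a time.
        self._urls = iter(urls)
        self.fetchNext()

    def fetchNext(self):
        # Load the next URL; return False when none are left.
        try:
            url = next(self._urls)
        except StopIteration:
            return False
        else:
            self.mainFrame().load(QUrl(url))
        return True

    def processCurrentPage(self):
        self.htmlReady.emit(self.mainFrame().toHtml(), self.mainFrame().url().toString())
        # Quit the event loop once every URL has been handled.
        if not self.fetchNext():
            QApplication.instance().quit()

    def handleLoadFinished(self):
        # toHtml() is synchronous here, so the page can be processed directly.
        self.processCurrentPage()

    def javaScriptConsoleMessage(self, *args, **kwargs):
        # Suppress JavaScript console output unless verbose is set.
        if self._verbose:
            super(WebPage, self).javaScriptConsoleMessage(*args, **kwargs)</code>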

Usage

<code class="python">import sys

def my_html_processor(html, url):
    # Placeholder handler (an example only): replace with your own logic.
    print('loaded: [%d chars] %s' % (len(html), url))

url_list = ['https://example.com', 'https://example2.com']

# PyQt5 (use this import together with the PyQt5 WebPage class)
from PyQt5.QtWidgets import QApplication

# PyQt4 (use this import instead, together with the PyQt4 WebPage class)
# from PyQt4.QtGui import QApplication

# The rest is identical for both versions: one application, one page.
app = QApplication(sys.argv)
webpage = WebPage(verbose=True)
webpage.htmlReady.connect(my_html_processor)
webpage.process(url_list)
sys.exit(app.exec_())</code>

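In this code, my_html_processor is a function you can customize to handle the HTML and URL of each loaded page. With this approach, you avoid the crashes and erratic behavior encountered earlier, giving you a more stable and efficient web scraping workflow.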
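As one possible shape for my_html_processor, the sketch below extracts each page's title using only the Python 3 standard library. It is an illustration, not part of the original solution: the `TitleParser` helper is a hypothetical name, and the only assumption taken from the code above is the `(html, url)` argument order of the htmlReady signal.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ''

    def handle_starttag(self, tag, attrs):
        if tag == 'title':
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == 'title':
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def my_html_processor(html, url):
    # Same (html, url) argument order as the htmlReady signal.
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    print('%s: %r (%d chars of HTML)' % (url, title, len(html)))
    return title
```

Because the function is an ordinary callable, it can be exercised on a literal HTML string before being connected to the signal, which keeps the parsing logic testable without starting a Qt event loop.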