Web crawler - Python + Selenium + PhantomJS crawler: how do I get the source of a newly opened page?
高洛峰
高洛峰 2017-04-18 10:21:55

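I am writing a Python crawler using the selenium library and the PhantomJS browser. On one page I trigger a click event that opens a new page, but browser.page_source then returns the source of the original page rather than the newly opened one. How can I get the source of the newly opened page?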

All replies (2)
黄舟

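If the link opens a new tab or window, your driver stays focused on the original window by default, which is why page_source still returns the original page.

To work with the new page, pass its "window handle" to the switch_to_window() method (spelled switch_to.window() in newer Selenium releases). Knowing this, you can iterate over every open window like so: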

for handle in driver.window_handles:
    driver.switch_to_window(handle)

window_handles holds one handle (an identifier string) for each open tab or window. If only one page was open before the click, the newly opened page is usually window_handles[1].
After switching to that handle, read page_source to get its source.
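Relying on index 1 assumes a particular ordering of the handles. A slightly more defensive sketch is to record the handles before the click and then switch to whichever handle is new. The helper name `source_of_new_window` and the `handles_before` parameter are hypothetical, chosen only for illustration; the driver attributes used (`window_handles`, `switch_to.window`, `page_source`) are the ones discussed above.

```python
def source_of_new_window(driver, handles_before):
    """Switch to the first window that was not open before the click
    and return its HTML source.

    driver: a Selenium WebDriver instance.
    handles_before: a copy of driver.window_handles taken before the click.
    (Hypothetical helper, not part of Selenium.)
    """
    # Any handle that was not present before the click belongs to a new window.
    new_handles = [h for h in driver.window_handles if h not in handles_before]
    if not new_handles:
        raise RuntimeError("the click did not open a new window")
    # Move the driver's focus to the new window, then read its source.
    driver.switch_to.window(new_handles[0])
    return driver.page_source


# Intended usage, assuming `browser` is your driver and `link` the element to click:
#   handles_before = list(browser.window_handles)
#   link.click()
#   html = source_of_new_window(browser, handles_before)
```

If the new window takes a moment to appear, the handle list may still be unchanged right after the click, so this may need to be combined with a wait.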

Peter_Zhu

If the link opens in the current window instead, the new page may not have finished loading when you read it, so its URL and data are not yet available. You can use an explicit wait with a condition that confirms the new page has loaded before proceeding. The code is as follows:

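from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

# Wait up to 5 seconds for the new page; "username" is the ID of an element expected on it
WebDriverWait(self.browser, 5).until(
    expected_conditions.presence_of_element_located((By.ID, "username"))
)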
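To make the idea of waiting on a condition concrete, here is a minimal hand-rolled polling loop that waits for the URL to change. It is only a sketch of what a wait does, not Selenium's own implementation, and in real code WebDriverWait is the better choice. The function name `wait_for_url_change` and its parameters are hypothetical; the only driver attribute it reads is `current_url`.

```python
import time


def wait_for_url_change(driver, old_url, timeout=5.0, poll=0.2):
    """Poll driver.current_url until it differs from old_url, then return it.

    Raises TimeoutError if the URL is still old_url after `timeout` seconds.
    (Hypothetical helper, shown only to illustrate polling on a condition.)
    """
    deadline = time.monotonic() + timeout
    while True:
        url = driver.current_url
        if url != old_url:
            return url  # navigation has at least started on the new page
        if time.monotonic() >= deadline:
            raise TimeoutError("URL still %r after %.1f s" % (old_url, timeout))
        time.sleep(poll)  # pause briefly before checking again


# Intended usage, assuming `browser` is your driver and `link` the element to click:
#   old_url = browser.current_url
#   link.click()
#   wait_for_url_change(browser, old_url)
#   html = browser.page_source
```

A changed URL does not guarantee that the page has fully rendered, so waiting for a specific element, as in the answer above, is usually the stronger condition.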