How to Extract Hidden Information from #shadow-roots Using Selenium Python?

Patricia Arquette

Released: 2024-10-19 06:44:01

Extracting Information from a #shadow-root using Selenium Python

Extracting data from elements inside a #shadow-root is a common obstacle in web scraping, because such elements cannot be reached with ordinary document-level locators. This article shows how to reach them with Selenium and Python.

Problem:

Consider the URL https://www.tiendasjumbo.co/buscar?q=mani from an online store. To extract product labels and other fields from this site, a user attempted the following approach:

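<code class="python">from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.service import Service

# Raw string so the backslashes in the Windows path are not treated as escapes.
service = Service(executable_path=r"C:\Program Files (x86)\geckodriver.exe")
driver = webdriver.Firefox(service=service)
driver.implicitly_wait(10)

url = "https://www.tiendasjumbo.co/buscar?q=mani"
driver.maximize_window()
driver.get(url)
# This lookup does not find the element: the h1 sits inside a shadow root,
# which a document-level XPath cannot reach.
driver.find_element(By.XPATH, '//h1[@class="impulse-title"]')</code>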

However, this approach failed, and switching iframes proved equally unsuccessful.

Solution:

The products on this page are rendered inside a #shadow-root, which is why the XPath lookup above finds nothing. shadowRoot is a property of the DOM element, not a Selenium method, so one way to reach the content is to run JavaScript through driver.execute_script(): select the shadow host (impulse-search), read its shadowRoot, and call querySelector() on it. The product label can then be extracted as follows:

<code class="python">driver.get('https://www.tiendasjumbo.co/buscar?q=mani')
item = driver.execute_script("return document.querySelector('impulse-search').shadowRoot.querySelector('div.group-name-brand h1.impulse-title span.formatted-text')")
print(item.text)</code>

Running this script outputs the product label:

<code class="text">La especial mezcla de nueces, maní, almendras y marañones x 450 g</code>
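When a page nests one shadow host inside another, the same chain of querySelector() and shadowRoot calls simply repeats once per level. The sketch below assembles that JavaScript string from a list of host selectors. The helper name build_shadow_query is hypothetical and not part of Selenium, and the approach assumes every shadow root along the path is open.

```python
import json


def build_shadow_query(host_selectors, target_selector):
    """Return a JavaScript snippet that walks through nested open shadow roots.

    host_selectors: CSS selectors for each shadow host, outermost first.
    target_selector: CSS selector evaluated inside the innermost shadow root.
    (Hypothetical helper for illustration; not a Selenium API.)
    """
    if not host_selectors:
        raise ValueError("at least one shadow host selector is required")
    # json.dumps produces a quoted string literal that JavaScript accepts.
    expr = "document"
    for selector in host_selectors:
        expr += f".querySelector({json.dumps(selector)}).shadowRoot"
    expr += f".querySelector({json.dumps(target_selector)})"
    return f"return {expr};"


script = build_shadow_query(
    ["impulse-search"],
    "div.group-name-brand h1.impulse-title span.formatted-text",
)
print(script)
```

With the driver from the article, the resulting string can be passed to driver.execute_script(script), which returns the matching element, or None if the final selector matches nothing.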

References:

For further insights, refer to the following resources:

Unable to locate the Sign In element within #shadow-root (open) using Selenium and Python
How to locate the First name field within shadow-root (open) within the website https://www.virustotal.com using Selenium and Python

Note:

Starting with version 96 of Microsoft Edge and Google Chrome, the value that a shadow root returns to Selenium changed. Handling this change differs between programming languages; see the resources linked from the solution for details.
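As an alternative to running JavaScript, recent Selenium 4 releases for Python expose a shadow_root property on WebElement, which returns a ShadowRoot object with its own find_element() method. The following is a minimal sketch under that assumption. The helper name find_in_shadow is hypothetical, and the locator string is written out literally so the snippet can be read without Selenium installed.

```python
# Same string value as selenium.webdriver.common.by.By.CSS_SELECTOR; spelled out
# here only so the sketch has no import-time dependency on Selenium.
CSS_SELECTOR = "css selector"


def find_in_shadow(driver, host_css, inner_css):
    """Locate inner_css inside the open shadow root attached to host_css.

    Hypothetical helper; `driver` is assumed to be a Selenium 4 WebDriver.
    """
    # Find the shadow host in the regular document first.
    host = driver.find_element(CSS_SELECTOR, host_css)
    # WebElement.shadow_root returns a ShadowRoot with its own find_element().
    root = host.shadow_root
    return root.find_element(CSS_SELECTOR, inner_css)
```

For the page discussed here, a call might look like find_in_shadow(driver, 'impulse-search', 'div.group-name-brand h1.impulse-title span.formatted-text'), after which the returned element's text attribute holds the product label.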
