Hi, I need help from someone experienced with web scraping, since I'm new to programming. My task is to extract the "About the Client" section from job links. My script extracts "About the Client" for only one link. For the other links it doesn't, and it raises an error. The job links come from an XML file, and when a link is opened, the page's HTML is generated by JavaScript, which is why I'm using Selenium. I've tried everything but haven't found a solution.

```python
import time

import numpy as np
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# `driver` is a Selenium WebDriver instance created elsewhere in the script.


def extract_client_info(job_url):
    client_info = {'About the Client': np.nan}

    if job_url and job_url != "N/A":
        try:
            # Open the job URL
            driver.get(job_url)

            # Wait for the page to load
            WebDriverWait(driver, 30).until(
                EC.presence_of_element_located((By.CSS_SELECTOR, '.cfe-about-client-v2'))
            )

            # Extract specific details
            about_client_section = driver.find_element(By.CSS_SELECTOR, '.cfe-about-client-v2')
            client_location = about_client_section.find_element(
                By.CSS_SELECTOR, '[data-qa="client-location"]').text.strip()
            client_job_posting_stats = (
                about_client_section.find_element(
                    By.CSS_SELECTOR, '[data-qa="client-job-posting-stats"]').text.strip()
                if about_client_section.find_elements(
                    By.CSS_SELECTOR, '[data-qa="client-job-posting-stats"]')
                else "N/A"
            )
            client_company_profile = about_client_section.find_element(
                By.CSS_SELECTOR, '[data-qa="client-company-profile"]').text.strip()

            # Combine extracted information
            client_info['About the Client'] = (
                f"Location: {client_location}\n"
                f"Job Posting Stats: {client_job_posting_stats}\n"
                f"Company Profile: {client_company_profile}"
            )
        except Exception as e:
            print(f"Failed to get 'About the Client' for {job_url}: {e}")
            client_info['About the Client'] = np.nan
        finally:
            # Wait for 10 seconds before making the next request
            time.sleep(10)

    return client_info
```
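One likely cause, which is an assumption since the exact error message isn't shown: `find_element()` raises `NoSuchElementException` whenever a job page lacks one of the `data-qa` fields, and that sends the whole page into the `except` branch. The script already guards `client-job-posting-stats` with `find_elements()`. Below is a minimal sketch that applies the same guard to every field. The helper name `first_text` is hypothetical. It passes the string `"css selector"` directly, because that is the value of Selenium's `By.CSS_SELECTOR`, so the helper accepts any `WebElement` or `WebDriver` without importing Selenium.

```python
def first_text(parent, css_selector, default="N/A"):
    """Return the stripped text of the first element under `parent` that
    matches `css_selector`, or `default` if nothing matches.

    `parent` is assumed to be a Selenium WebElement or WebDriver (anything
    with a compatible find_elements(by, value) method). "css selector" is
    the value of selenium.webdriver.common.by.By.CSS_SELECTOR.
    """
    # find_elements() returns an empty list instead of raising
    # NoSuchElementException, so a missing field no longer aborts the page.
    matches = parent.find_elements("css selector", css_selector)
    return matches[0].text.strip() if matches else default


# Hypothetical usage inside extract_client_info(), replacing the three
# find_element(...) lookups:
#
#   client_location = first_text(about_client_section, '[data-qa="client-location"]')
#   client_job_posting_stats = first_text(about_client_section, '[data-qa="client-job-posting-stats"]')
#   client_company_profile = first_text(about_client_section, '[data-qa="client-company-profile"]')
```

If the printed message names a `TimeoutException` instead, the `.cfe-about-client-v2` container itself never appeared within the 30-second wait on those pages. In that case, the page and the selector should be inspected in the browser. Making the individual fields optional won't help.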
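The post doesn't show how the job links are read from the XML file. Assuming that file is a standard sitemap (a `<urlset>` of `<url><loc>` entries in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace), a minimal standard-library sketch might look like this. The function name `job_links_from_sitemap` is hypothetical, and a different XML layout would need different tag names.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol (assumed format of the XML file).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def job_links_from_sitemap(xml_text):
    """Return the non-empty <loc> URLs from a sitemap document, in file order."""
    root = ET.fromstring(xml_text)
    # Each <url> entry stores its address in a <loc> child element.
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text and loc.text.strip()
    ]
```

Each returned URL can then be passed to `extract_client_info()` one at a time. That keeps the existing 10-second pause between requests.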