Python Web Scraping with Login: Example
1. Use requests.Session() to send the POST request and maintain a session when logging in to a site. 2. Parse the login page with BeautifulSoup to extract hidden fields such as the CSRF token. 3. Build the login payload from the username, password, and token, then submit it. 4. Verify the login succeeded by checking the response for markers like "Logout" or "Dashboard". 5. Once logged in, use the same session to visit protected pages and scrape the content you need. 6. For pages rendered dynamically with JavaScript, switch to Selenium and drive a real browser instead. 7. Always respect the site's robots.txt and terms of use, avoid hardcoding credentials in production (store sensitive values in environment variables), and make sure the scraping stays legal and compliant.
If you need to scrape a website that requires login, you'll typically need to send a POST request with your credentials first, maintain the session, and then access the protected pages. Here's a practical example using Python's `requests` and `BeautifulSoup` libraries to log in and scrape a page behind authentication.

We'll use a dummy login form structure (like many real sites) and show how to handle it.
✅ 1. Required Libraries
Install the needed packages if you haven't:

```bash
pip install requests beautifulsoup4
```
✅ 2. Example: Login and Scrape a Page
```python
import requests
from bs4 import BeautifulSoup

# Step 1: Start a session so cookies persist across requests
session = requests.Session()

# Step 2: URLs (placeholders - replace with the actual site's URLs)
login_url = 'https://example.com/login'
target_url = 'https://example.com/dashboard'  # Page you want to scrape after login

# Step 3: Get the login page (to extract hidden form fields like CSRF tokens if needed)
login_page = session.get(login_url)
soup = BeautifulSoup(login_page.content, 'html.parser')

# Optional: Extract hidden inputs (e.g., CSRF token); adjust the field name as needed
csrf_input = soup.find('input', {'name': 'csrf_token'})
csrf_token = csrf_input['value'] if csrf_input else None

# Step 4: Prepare the login payload
payload = {
    'username': 'your_username',
    'password': 'your_password',
}
if csrf_token:
    payload['csrf_token'] = csrf_token  # Include only if the form has one

# Step 5: Submit the login form
response = session.post(login_url, data=payload)

# Step 6: Check if login was successful
if "Logout" in response.text or "Dashboard" in response.text:
    print("✅ Login successful")
else:
    print("❌ Login failed")
    print(response.status_code)
    print(response.text[:500])  # Debug output
    raise SystemExit(1)

# Step 7: Scrape a protected page with the same session
protected_page = session.get(target_url)
soup = BeautifulSoup(protected_page.content, 'html.parser')

# Example: Extract the page title or specific content
print("Page Title:", soup.title.string)

# Or scrape data (adjust the selector for the target site)
data = soup.find_all('div', class_='content')
for item in data:
    print(item.get_text(strip=True))
```
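One practical wrinkle: many sites reject the default `python-requests` User-Agent before you ever reach the login form, so it is often worth giving the session browser-like headers up front. A minimal sketch, assuming nothing about the target site (the header values are just examples):

```python
import requests

session = requests.Session()

# Browser-like headers applied to every request this session makes.
# The User-Agent string is only an example; any realistic one works.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
    'Accept-Language': 'en-US,en;q=0.9',
})

resp = session.get('https://example.com/login')
```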
✅ 3. Key Points to Remember
- Session Persistence: Use `requests.Session()` to keep cookies and stay logged in.
- Inspect the Login Form: Use browser DevTools (F12) to:
  - Find the correct login URL (the form's `action` attribute)
  - Check the input field names (e.g., `username`, `email`, `password`, `csrf_token`)
- CSRF & Hidden Fields: Many sites require tokens; always check for hidden inputs (a generic sketch follows this list).
- HTTPS & Security: Never hardcode credentials in production. Use environment variables:

```python
import os

username = os.getenv('LOGIN_USER')
password = os.getenv('LOGIN_PASS')
```

- Respect `robots.txt` and Terms of Service: scraping may be prohibited.
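Hidden field names vary from site to site, so rather than hardcoding `csrf_token`, one generic approach is to seed the payload with every hidden input the form already carries and layer your credentials on top. A sketch along those lines (the URL and credential field names are placeholders):

```python
import requests
from bs4 import BeautifulSoup

session = requests.Session()
resp = session.get('https://example.com/login')  # placeholder URL
soup = BeautifulSoup(resp.text, 'html.parser')
form = soup.find('form')

# Seed the payload with every hidden input already present in the form
payload = {
    inp['name']: inp.get('value', '')
    for inp in form.find_all('input', type='hidden')
    if inp.get('name')
}

# Then add your credentials (field names depend on the site)
payload['username'] = 'your_username'
payload['password'] = 'your_password'
```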
✅ 4. Real-World Example (Generic Pattern)
```python
import requests
from bs4 import BeautifulSoup
import os

session = requests.Session()

# Load credentials from the environment
USER = os.getenv('USERNAME', 'test@example.com')
PASS = os.getenv('PASSWORD', 'secret')

# Fetch the login page
resp = session.get('https://example.com/login')
soup = BeautifulSoup(resp.text, 'html.parser')

# Extract the CSRF token (a Rails-style field name)
token = soup.find('input', {'name': 'authenticity_token'})['value']

# Login data
data = {
    'authenticity_token': token,
    'user[email]': USER,
    'user[password]': PASS,
    'commit': 'Log in'
}

# Post to the login endpoint
r = session.post('https://example.com/sessions', data=data)

# Now scrape
dashboard = session.get('https://example.com/my-account')
```
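This snippet doesn't verify the login, and string matching on the body (as in the first example) is brittle. Since most sites redirect away from the login page on success, and `requests` follows redirects by default, checking the final URL is a reasonable extra signal; a small sketch, continuing from the `r` above:

```python
# After the POST, landing somewhere other than the login page usually means success
if r.ok and 'login' not in r.url:
    print('Logged in, landed on:', r.url)
else:
    print('Login may have failed; still on:', r.url)
```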
✅ 5. Alternative: Use Selenium for JavaScript-heavy Sites
If the login is handled by JavaScript (e.g., React, Vue), use Selenium:
```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")

driver.find_element(By.NAME, "username").send_keys("your_user")
driver.find_element(By.NAME, "password").send_keys("your_pass")
driver.find_element(By.XPATH, "//button[@type='submit']").click()

# Wait, then go to the target page
driver.implicitly_wait(5)
driver.get("https://example.com/profile")

print(driver.page_source)
driver.quit()
```
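`implicitly_wait` is a blunt instrument: it sets a global polling timeout rather than waiting for anything specific. If the post-login page loads asynchronously, an explicit wait on a known element is usually more reliable. A sketch assuming the logged-in page contains an element with id `dashboard` (a hypothetical marker; substitute whatever the real page renders):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://example.com/login")

driver.find_element(By.NAME, "username").send_keys("your_user")
driver.find_element(By.NAME, "password").send_keys("your_pass")
driver.find_element(By.XPATH, "//button[@type='submit']").click()

# Block (up to 10s) until a post-login element appears; 'dashboard' is a placeholder id
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.ID, "dashboard"))
)

print(driver.page_source)
driver.quit()
```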
Basically, for simple forms a requests `Session` works great; for dynamic sites, go with Selenium. Always test on a small scale and check the site's policies.
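For the policy check, Python's standard library can do part of the work: `urllib.robotparser` reads a site's robots.txt and answers whether a given user agent may fetch a URL. A minimal sketch (the URLs and agent name are placeholders):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.set_url('https://example.com/robots.txt')
rp.read()

# can_fetch() returns True if the rules allow this agent to fetch the URL
if rp.can_fetch('my-scraper/1.0', 'https://example.com/dashboard'):
    print('Allowed by robots.txt')
else:
    print('Disallowed; do not scrape this URL')
```

Note that robots.txt covers crawling etiquette only; the site's terms of service still apply.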
