Can code skip an iteration when web scraping? IndexError: pop index out of range

Question

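So I have code that scrapes the names and prices of minerals from 14 pages (so far) and saves them to a .txt file. I first tried it with just Page 1, then wanted to add more pages to get more data. But then the code grabbed something it shouldn't have: a random name/string. I didn't expect it to pick that up, but it did, and it assigned the wrong price to it! It happens right after a mineral with one of these "unexpected names", and then the whole rest of the list has the wrong prices. See the image below: So, since that string and the other strings ...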

P粉391955763 · Answer

You can try the following example, which also handles pagination:

import requests
from bs4 import BeautifulSoup

headers = {"User-Agent": "Mozilla/5.0"}

# Step the First= query parameter in increments of 25 (0, 25, 50, 75).
for offset in range(0, 100, 25):
    url = f'https://www.fabreminerals.com/search_results.php?LANG=EN&SearchTerms=&submit=Buscar&MineralSpeciment=&Country=&Locality=&PriceRange=&checkbox=enventa&First={offset}'
    soup = BeautifulSoup(requests.get(url, headers=headers).text, "lxml")

    # Keep only the first 25 matches for names and for prices on each page.
    names = [x.get_text(strip=True) for x in soup.select('table tr td font a')][:25]
    print(names)
    prices = [x.get_text(strip=True) for x in soup.select('table tr td font:nth-child(3)')][:25]
    print(prices)

    # with open("Minerals.txt", "a+", encoding='utf-8') as file:
    #     for name, price in zip(names, prices):
    #             # print(f"{name}\n{price}")
    #             # print("-" * 50)
    #             filename = str(name)+" "+str(price)+"\n"
    #             split1 = filename.split(' / ')
    #             cutted1 = split1.pop(0)
    #             split2 = cutted1.split(": ")
    #             try:
    #                 cutted2 = split2.pop(1)
    #             except IndexError:
    #                 continue
    #             two_prices = cutted2+" "+split1.pop(0)+"\n"
    #             file.write(two_prices)

Output:

["NX51AH2:\n'lepidolite' after Elbaite with Elbaite", "TH27AL9:\n'Pearceite' with Calcite", "TFM69AN5:\n'Stilbite'", 'SM90CEX:\nAcanthite', 'TMA97AN5:\nAcanthite', 'TB90AE8:\n Acanthite', 'TZ71AK9:\nAcanthite', 'EC63G1:\nAcanthite', 'MN56K9:\nAcanthite', 'TF89AL3:\nAcanthite (Se-bearing) with Polybasite (Se-bearing) and Calcite', 'TP66AJ8:\nAcanthite (Se-bearing) with Pyrite', 'TY86AN2:\nAcanthite after Polybasite', 'TA66AF6:\nAcanthite with Calcite', 'JFD104AO2:\nAcanthite with Calcite', 'TX36AL6:\nAcanthite with Calcite', 'TA48AH1:\nAcanthite with Chalcopyrite', 'EF89L9:\nAcanthite with Pyrite and Calcite', 'TX89AN0:\nAcanthite with Siderite and Proustite', 'EA56K0:\nAcanthite with Silver', 'EC48K0:\nAcanthite with Silver', '11AT12:\nAcanthite, Calcite', '9EF89L9:\nAcanthite, Pyrite, Calcite', 'SM75TDA:\nAdamite', '2M14:\nAdamite', '20MJX66:\nAdamite']
['Price:€580 / US8 / ¥84010 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€450 / US4 / ¥65180 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€155 / US9 / ¥22450 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€1500 / US47 / ¥217290 / AUD10', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€1200 / US38 / ¥173830 / AUD50', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€4800 / US53 / ¥695320 / AUD00', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€290 / US9 / ¥42000 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€320 / US0 / ¥46350 / AUD0', 'Price:€75 / US / ¥10860 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD5']
['5TD76M9:\nAdamite', 'MA54AE9:\nAdamite (variety Cu-bearing adamite) with Calcite', 'EA11Y6:\nAdamite (variety cuprian)', 'EB14Y6:\nAdamite (variety cuprian)', 'MC11X8:\nAdamite (variety cuprian) with Smithsonite', 'JRM10AN8:\nAegirine', 'MFA46AP3:\nAegirine with Zircon, Orthoclase and Quartz (variety smoky)', 'EM48AF8:\nAlabandite with Calcite', 'MC92T6:\nAlabandite with Calcite and Rhodochrosite', 'TF16AN1:\nAlabandite with Rhodochrosite', 'TX17S1:\nAlabandite with Rhodochrosite', 'TD34S1:\nAlabandite with Rhodochrosite', '10TR46:\nAlmandine', 'HM90EJ:\nAnalcime', 'EFH36AP3:\nAnalcime with Natrolite, Rhodochrosite and Serandite', 'ELR67AP1:\nAnalcime with Quartz', 'EML88AP1:\nAnalcime with Quartz', 'TF87AF4:\nAndorite', 'TR88AJ3:\nAndorite', 'ND56AN0:\nAndorite with Zinkenite', 'SM180NH:\nAndradite (variety demantoid)', 'MT86AL3:\nAndradite (variety demantoid) with Calcite', 'MA27AL7:\nAndradite (variety demantoid) with Calcite', 'TC80TL:\nAndradite (variety topazolite) with Clinochlore', 'TC85TE:\nAndradite (variety topazolite) with Clinochlore']
['Price:€180 / US5 / ¥26070 / AUD0', 'Price:€840 / US6 / ¥121680 / AUD90', 'Price:€60 / US / ¥8690 / AUD', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€580 / US8 / ¥84010 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD68', 'Price:€2700 / US86 / ¥391120 / AUD60', 'Price:€740 / US3 / ¥107190 / AUD40', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€920 / US9 / ¥133270 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€130 / US4 / ¥18830 / AUD0', 'Price:€260 / US8 / ¥37660 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€390 / US2 / ¥56490 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€180 / US5 / ¥26070 / AUD0', 'Price:€1600 / US51 / ¥231770 / AUD60', 'Price:€2200 / US70 / ¥318690 / AUD90', 'Price:€80 / US / ¥11580 / AUD0', 'Price:€85 / US / ¥12310 / AUD0']
['T29NAK3:\nAndradite (variety topazolite) with Clinochlore', 'TC85TV:\nAndradite (variety topazolite) with Clinochlore', 'T89GH5:\nAndradite (variety topazolite) with Clinochlore', 'TQ94Q0:\nAndradite (variety topazolite) with Stilbite', 'SM140TFV:\nAndradite on Microcline', 'HM140NG:\nAndradite with Calcite', 'GM66R9:\nAndradite with Clinochlore', 'SM70TYW:\nAndradite with Epidote', 'TC290TVH:\nAndradite with Epidote and Microcline', 'TKX11AO7:\nAndradite with Microcline', 'TC2100TEJ:\nAndradite with Microcline', 'TH16AN2:\nAndradite with Microcline', 'TTX66AO7:\nAndradite with Microcline', 'TC2150TJL:\nAndradite with Microcline', 'TQ96AN2:\nAndradite with Microcline', 'TF48AF2:\nAnglesite', 'MA47AL4:\nAnglesite with Galena', 'LQ88AE6:\nAnglesite with Galena', 'ER90AL8:\nAnglesite with Galena', 'TP70AE1:\nAnglesite with Galena', 'N54NAL5:\nAnglesite with Galena', 'GV96R9:\nAnhydrite with Calcite and Pyrite', '11TV99:\nAnhydrite, Calcite', 'MG26AL4:\nAnorthoroselite with Calcite', 'XM260NFF:\nAragonite']
['Price:€240 / US7 / ¥34760 / AUD0', 'Price:€85 / US / ¥12310 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€980 / US11 / ¥141960 / AUD10', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€90 / US / ¥13030 / AUD0', 'Price:€70 / US / ¥10140 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€110 / US3 / ¥15930 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€380 / US2 / ¥55040 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€540 / US7 / ¥78220 / AUD0', 'Price:€940 / US9 / ¥136160 / AUD50', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€60 / US / ¥8690 / AUD'] 
['XM295EAR:\nAragonite', 'ETE46AP2:\nAragonite', 'EXM26AP0:\nAragonite', 'EYB26AP0:\nAragonite', 'EXE56AP2:\nAragonite', 'ETF46AP0:\nAragonite', 'XM2160ERF:\nAragonite', 'EXM46AP0:\nAragonite', 'XM2190MEX:\nAragonite', 'XM2780EFT:\nAragonite', 'EHM93AO9:\nAragonite', 'TYB37AO8:\nAragonite (variety Cu-bearing aragonite)', 'SM99AM3:\nAragonite (variety cuprian)', '1M06:\nAragonite (variety flos ferri)', 'TG69AL3:\nAragonite (variety tarnowitzite)', 'MLC96AO2:\nAragonite on Calcite', 'MLE68AO2:\nAragonite on Calcite', 'MTB66AP3:\nAragonite with Quartz (variety hematoide)', 'MXF96AP3:\nAragonite with Quartz (variety hematoide)', 'MRR47AP3:\nAragonite with Quartz (variety hematoide)', 'MTR37AP3:\nAragonite with Quartz (variety hematoide)', 'JFD193AP3:\nArfvedsonite with Microcline', 'TFX76AO7:\nArsenopyrite with Calcite, Pyrite, Sphalerite and Rhodochrosite', 'NB37AL3:\nArsenopyrite with Muscovite', 'HM220NX:\nArsenopyrite with Muscovite']
['Price:€95 / US / ¥13760 / AUD6', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€140 / US4 / ¥20280 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€150 / US4 / ¥21720 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€160 / US5 / ¥23170 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€780 / US4 / ¥112990 / AUD03', 'Price:€880 / US8 / ¥127470 / AUD50', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€480 / US5 / ¥69530 / AUD0', 'Price:€100 / US3 / ¥14480 / AUD0', 'Price:€460 / US4 / ¥66630 / AUD0', 'Price:€190 / US6 / ¥27520 / AUD0', 'Price:€360 / US1 / ¥52140 / AUD0', 'Price:€160 / US5 / ¥23170 / AUD6', 'Price:€190 / US6 / ¥27520 / AUD3', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€230 / US7 / ¥33310 / AUD4', 'Price:€240 / US7 / ¥34760 / AUD0', 'Price:€170 / US5 / ¥24620 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0', 'Price:€220 / US7 / ¥31860 / AUD0']
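
The try/except around pop() in the commented-out block is one way to skip an entry that does not have the expected shape. Below is a minimal sketch of the same idea using a regular expression instead of split() and pop(). The function name `pair_names_with_euro_prices`, the `EURO_PRICE` pattern, and the stray "Not a mineral" entry are assumptions made for illustration and are not part of the original code. Note that skipping only avoids the crash: if the names list contains an extra entry, every pair after it is still shifted.

```python
import re

# Assumed pattern: the euro amount in strings such as "Price:€90 / US / ¥13030 / AUD0".
EURO_PRICE = re.compile(r"Price:\s*€\s*(\d+)")


def pair_names_with_euro_prices(names, prices):
    """Return (code, title, euro_price) tuples, skipping malformed entries."""
    rows = []
    for name, price in zip(names, prices):
        match = EURO_PRICE.search(price)
        if match is None:
            # Skip this iteration instead of raising an exception.
            continue
        # The scraped names look like "CODE:\nTitle"; split on the first colon.
        code, _, title = name.partition(":")
        rows.append((code.strip(), title.strip(), int(match.group(1))))
    return rows


# Two entries copied from the output above plus one hypothetical stray entry.
names = ["SM90CEX:\nAcanthite", "Not a mineral", "TMA97AN5:\nAcanthite"]
prices = ["Price:€90 / US / ¥13030 / AUD0", "n/a", "Price:€240 / US7 / ¥34760 / AUD0"]

rows = pair_names_with_euro_prices(names, prices)
for code, title, euros in rows:
    print(f"{code} {title} €{euros}")
```

Checking for a match before using it keeps the control flow explicit and avoids relying on IndexError to detect bad rows.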

P粉677684876 · Answer

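You just need to make the CSS selector more specific, so that it only matches links that sit directly inside a font element (rather than several levels down):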

soup.select("table tr td font>a")

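It also helps to add a further condition: the link must point to an individual item rather than to the next/previous page links at the bottom of the page: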

soup.select("table tr td font>a[href*='CODE']")
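
To see how the three selectors differ, here is a small sketch run against a made-up HTML fragment. The markup, the href values, and the `texts` helper are assumptions for illustration only; the real page's structure may differ. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one item link, one link nested deeper inside <font>,
# and one pagination link. Not copied from the real site.
HTML = """
<table>
  <tr><td><font><a href="item.php?CODE=NX51AH2">NX51AH2: Elbaite</a></font></td></tr>
  <tr><td><font><b><a href="info.php">Unexpected nested link</a></b></font></td></tr>
  <tr><td><font><a href="search_results.php?First=25">Next page</a></font></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")


def texts(selector):
    """Return the stripped text of every element matched by the selector."""
    return [a.get_text(strip=True) for a in soup.select(selector)]


# Descendant combinator: matches links at any depth below <font>.
loose = texts("table tr td font a")
# Child combinator: matches only links that are direct children of <font>.
direct = texts("table tr td font>a")
# Child combinator plus an attribute filter on the href.
items = texts("table tr td font>a[href*='CODE']")

print(loose)
print(direct)
print(items)
```

In this fragment the child combinator drops the nested link, and the attribute filter additionally drops the pagination link, leaving only the item.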