Scrapy INSERT ke dalam 'new_table' hanya jika tiada rekod wujud dalam 'current table'
P粉122932466
P粉122932466 2024-03-29 19:36:16
0
1
317

Saya mencuba beberapa pengikisan tapak web. Saya berjaya mengikis data daripada jadual pangkalan data semasa. Tetapi saya mahu memasukkan "jadual_baru" hanya jika rekod tidak wujud dalam "jadual semasa"

Kod saya ialah (talian paip)

table = 'products' table2 = 'new_products'` def save(self, row): cursor = self.cnx.cursor() cursor.execute("SELECT DISTINCT product_id FROM products;") old_ids = [row[0] for row in cursor.fetchall()] create_query = ("INSERT INTO " + self.table + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)") cursor.execute(create_query, row) lastRecordId = cursor.lastrowid self.cnx.commit() cursor.close() print("Item saved with ID: {}" . format(lastRecordId)) if not product_id in old_ids: create_query = ("INSERT INTO " + self.table2 + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")

Ia tidak berfungsi dengan betul dan terdapat ralat.

2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507', 'listing_id': '0190199600119', 'price': '4199.00', 'product_id': '1209298', 'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah', 'rowid': 456274953331128512, 'url': 'https://www.mediamarkt.com.tr/tr/product/APPLE%2011%22%20Magic%20T%C3%BCrk%C3%A7e%20Q%20Klavye%20Siyah-1209298.html'} Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 62, in save if not product_id in old_ids: NameError: name 'product_id' is not defined Saving item into db ...

Saya mempunyai product_id yang unik.

Jika tiada product_id dalam jadual semasa, masukkan product_id ini ke dalam "new_products"

Bagaimana cara membuat ini?

Terima kasih.

Suntingan terakhir: Saya mendapat ralat ini.

2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622', 'listing_id': '8713439219357', 'price': '99.00', 'product_id': '1175529', 'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu', 'rowid': -411152717288573423, 'url': 'https://www.mediamarkt.com.tr/tr/product/TRUST%2021935%20NANGA%20USB%203.1%20Kart%20Okuyucu-1175529.html'} Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 523, in cmd_query self._cmysql.query(query, _mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 69, in save cursor.execute(create_query, row) File "/usr/local/lib/python3.8/dist-packages/mysql/connector/cursor_cext.py", line 269, in execute result = self._cnx.cmd_query(stmt, raw=self._raw, File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 528, in cmd_query raise errors.get_mysql_exception(exc.errno, msg=exc.msg, mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'

P粉122932466
P粉122932466

membalas semua (1)
P粉278379495

Kalau nak selit saja kalau tak ada, tak perlu buat macam mana. Tidak perlu memilih kesemuanya dan lihat sama ada yang anda cari ada di sana.

Apa yang anda perlukan ialah mencipta indeks unikuntukproduc_id

dalam jadual 2

Kemudian tukar kod kepada:

table = 'products' table2 = 'new_products'` def save(self, row): create_query = ("INSERT INTO " + self.table + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)") cursor.execute(create_query, row) lastRecordId = cursor.lastrowid self.cnx.commit() print("Item saved with ID: {}" . format(lastRecordId)) create_query = ("INSERT INTO " + self.table2 + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s) ON DUPLICATE KEY UPDATE product_id=product_id") cursor.execute(create_query, row) self.cnx.commit()

Jika anda menggunakan ON DUPLICATE KEY, apabila ia menjumpai baris pendua (id_produk sedia ada), sistem akan cuba mengemas kini product_id kepada product_id yang sama, jadi ia tidak akan berkesan.

Jika anda menetapkan autocommit=True, anda boleh memadamkan commit ini.

Edit

Jika, seperti yang anda katakan dalam ulasan anda, anda perlu memasukkan ke dalam jadual baharu hanya jika ia tidak wujud dalam jadual, anda boleh menukar kod anda seperti ini:

Anda perlu menukar nama pembolehubah dalam barisold_ids = [row[0] untuk row incursor.fetchall()]kerana anda menukar nilai parameterrow2. Masalah anda terletak pada pernyataan if, pembolehubah product_id tidak wujud dan perlu diubah suai

table = 'products' table2 = 'new_products'` def save(self, row): cursor = self.cnx.cursor() cursor.execute("SELECT DISTINCT product_id FROM products;") old_ids = [element[0] for element in cursor.fetchall()] create_query = ("INSERT INTO " + self.table + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)") cursor.execute(create_query, row) lastRecordId = cursor.lastrowid self.cnx.commit() cursor.close() print("Item saved with ID: {}" . format(lastRecordId)) if not row['product_id'] in old_ids: create_query = ("INSERT INTO " + self.table2 + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
    Muat turun terkini
    Lagi>
    kesan web
    Kod sumber laman web
    Bahan laman web
    Templat hujung hadapan
    Tentang kita Penafian Sitemap
    Laman web PHP Cina:Latihan PHP dalam talian kebajikan awam,Bantu pelajar PHP berkembang dengan cepat!