Saya mencuba beberapa pengikisan tapak web. Saya berjaya mengikis data daripada jadual pangkalan data semasa. Tetapi saya mahu memasukkan "jadual_baru" hanya jika rekod tidak wujud dalam "jadual semasa"
Kod saya ialah (talian paip)
table = 'products' table2 = 'new_products'` def save(self, row): cursor = self.cnx.cursor() cursor.execute("SELECT DISTINCT product_id FROM products;") old_ids = [row[0] for row in cursor.fetchall()] create_query = ("INSERT INTO " + self.table + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)") cursor.execute(create_query, row) lastRecordId = cursor.lastrowid self.cnx.commit() cursor.close() print("Item saved with ID: {}" . format(lastRecordId)) if not product_id in old_ids: create_query = ("INSERT INTO " + self.table2 + "(rowid, date, listing_id, product_id, product_name, price, url) " "VALUES (%(rowid)s, %(date)s, %(listing_id)s, %(product_id)s, %(product_name)s, %(price)s, %(url)s)")
Ia tidak berfungsi dengan betul dan terdapat ralat.
2022-05-06 12:26:57 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-06 12:26:57.575507', 'listing_id': '0190199600119', 'price': '4199.00', 'product_id': '1209298', 'product_name': 'APPLE 11" Magic Türkçe Q Klavye Siyah', 'rowid': 456274953331128512, 'url': 'https://www.mediamarkt.com.tr/tr/product/APPLE%2011%22%20Magic%20T%C3%BCrk%C3%A7e%20Q%20Klavye%20Siyah-1209298.html'} Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 62, in save if not product_id in old_ids: NameError: name 'product_id' is not defined Saving item into db ...
Saya mempunyai product_id yang unik.
Jika tiada product_id dalam jadual semasa, masukkan product_id ini ke dalam "new_products"
Bagaimana cara membuat ini?
Terima kasih.
Suntingan terakhir: Saya mendapat ralat ini.
2022-05-07 18:17:11 [scrapy.core.scraper] ERROR: Error processing {'date': '2022-05-07 18:17:11.902622', 'listing_id': '8713439219357', 'price': '99.00', 'product_id': '1175529', 'product_name': 'TRUST 21935 NANGA USB 3.1 Kart Okuyucu', 'rowid': -411152717288573423, 'url': 'https://www.mediamarkt.com.tr/tr/product/TRUST%2021935%20NANGA%20USB%203.1%20Kart%20Okuyucu-1175529.html'} Traceback (most recent call last): File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 523, in cmd_query self._cmysql.query(query, _mysql_connector.MySQLInterfaceError: Duplicate entry '-411152717288573423' for key 'products.rowid' During handling of the above exception, another exception occurred: Traceback (most recent call last): File "/usr/lib/python3/dist-packages/twisted/internet/defer.py", line 654, in _runCallbacks current.result = callback(current.result, *args, **kw) File "/usr/local/lib/python3.8/dist-packages/scrapy/utils/defer.py", line 162, in f return deferred_from_coro(coro_f(*coro_args, **coro_kwargs)) File "/root/teknosa/teknosa/pipelines.py", line 28, in process_item self.save(dict(item)) File "/root/teknosa/teknosa/pipelines.py", line 69, in save cursor.execute(create_query, row) File "/usr/local/lib/python3.8/dist-packages/mysql/connector/cursor_cext.py", line 269, in execute result = self._cnx.cmd_query(stmt, raw=self._raw, File "/usr/local/lib/python3.8/dist-packages/mysql/connector/connection_cext.py", line 528, in cmd_query raise errors.get_mysql_exception(exc.errno, msg=exc.msg, mysql.connector.errors.IntegrityError: 1062 (23000): Duplicate entry '-411152717288573423' for key 'products.rowid'
Kalau nak selit saja kalau tak ada, tak perlu buat macam mana. Tidak perlu memilih kesemuanya dan lihat sama ada yang anda cari ada di sana.
Apa yang anda perlukan ialah mencipta indeks unikuntukproduc_id
dalam jadual 2Kemudian tukar kod kepada:
Jika anda menggunakan ON DUPLICATE KEY, apabila ia menjumpai baris pendua (id_produk sedia ada), sistem akan cuba mengemas kini product_id kepada product_id yang sama, jadi ia tidak akan berkesan.
Jika anda menetapkan autocommit=True, anda boleh memadamkan commit ini.
Edit
Jika, seperti yang anda katakan dalam ulasan anda, anda perlu memasukkan ke dalam jadual baharu hanya jika ia tidak wujud dalam jadual, anda boleh menukar kod anda seperti ini:
Anda perlu menukar nama pembolehubah dalam barisold_ids = [row[0] untuk row incursor.fetchall()]kerana anda menukar nilai parameter
row
2. Masalah anda terletak pada pernyataan if, pembolehubah product_id tidak wujud dan perlu diubah suai