I'm new to operations and maintenance. I recently wrote a script that does some simple processing of our company logs: it collects a certain number of processed records in a list, then uses the ES interface to insert them in batches into another ElasticSearch instance, where they are displayed in various ways. The problem is what to do when an exception occurs during the insert, for example when the target ES host crashes while the data source is still fine. In that case the program should move the already-processed data to some kind of cache and then stop. (The data source is Kafka, so even if the program is stopped, the remaining data can still be consumed after a restart.)
So my question is: for a relatively lightweight Python program or script, how should exceptions like this be handled, and where should the data that needs to be saved first be stored?
When ES is unavailable and the Python script catches the exception, you can use cPickle or pickle to serialize the processed data and save it to a file. (Think of it as writing to a file; the difference is that cPickle and pickle let you write objects such as dictionaries or lists to the file directly.) Then wait for ES to recover and restart the script. On startup, the script should first check whether this temporary file exists. If it exists and is not empty, the data in it is loaded and written to ES.
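A minimal sketch of that approach, under some assumptions: the file name pending_batch.pkl and the helper names are made up for illustration, and send stands for whatever function you already use to push a batch to ES (it is assumed to raise an exception on failure). On Python 3, pickle already uses the C implementation, so cPickle is only needed on Python 2.

```python
import os
import pickle

SPILL_PATH = "pending_batch.pkl"  # hypothetical name for the temporary file


def save_pending(batch, path=SPILL_PATH):
    """Serialize the processed batch (a list of dicts, etc.) to a file."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(batch, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename after a complete write so a crash never leaves a half-written file.
    os.replace(tmp, path)


def load_pending(path=SPILL_PATH):
    """Return the saved batch, or an empty list if no usable file exists."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def clear_pending(path=SPILL_PATH):
    """Delete the temporary file once its data has reached ES."""
    if os.path.exists(path):
        os.remove(path)


def flush(batch, send, path=SPILL_PATH):
    """Send a batch; on any failure save it to disk and re-raise so the script stops."""
    try:
        send(batch)  # `send` is an assumed callable wrapping your ES insert
    except Exception:
        save_pending(batch, path)
        raise
    clear_pending(path)
```

At startup the script would call load_pending() first and, if the result is not empty, pass it to flush() before it begins consuming from Kafka again. Only unpickle files that the script itself wrote, since loading an untrusted pickle can execute arbitrary code.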