I'm new to operations work. I recently wrote a simple job that processes our company's logs: it collects a certain number of processed records in a list, then uses the ES interface to insert them in batches into another Elasticsearch instance, where they are later displayed in various ways. The problem is what to do when an exception occurs during the insert, for example when the target ES host crashes while the data source is still fine. In that case the program should save the already-processed data to some kind of cache and then stop. (The data source is Kafka, so even if the program stops, the data can still be consumed after a restart.)
So my question is: for a relatively lightweight Python program or script, how should these exceptions be handled, and where should the data that needs to be saved go in the meantime?
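When ES fails, the Python script can catch the exception and use cPickle or pickle to serialize the processed data and save it to a file. (Think of it as writing to a file; the difference is that cPickle and pickle can write objects such as dicts or lists directly.) Then, once ES has recovered, restart the script. On startup, the script should first check whether this temporary file exists; if it does and is not empty, load the data from it and write it to ES.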
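The approach above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: it assumes Python 3, where cPickle no longer exists as a separate module and plain pickle is used instead. The names `SPOOL_PATH`, `save_pending`, `load_pending`, `flush` and `replay_pending` are made up for this example, and `send_batch` is a hypothetical callable standing in for whatever function actually performs the batch insert into ES.

```python
import os
import pickle
import tempfile

# Hypothetical location for the temporary file; pick any path that suits you.
SPOOL_PATH = os.path.join(tempfile.gettempdir(), "es_pending.pkl")


def save_pending(batch, path=SPOOL_PATH):
    """Serialize the batch to a side file, then rename it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(batch, f)
    # os.replace avoids leaving a half-written file at `path`.
    os.replace(tmp_path, path)


def load_pending(path=SPOOL_PATH):
    """Return the saved batch, or an empty list if the file is missing or empty."""
    if not os.path.exists(path) or os.path.getsize(path) == 0:
        return []
    with open(path, "rb") as f:
        return pickle.load(f)


def flush(batch, send_batch, path=SPOOL_PATH):
    """Send the batch; on any failure, save it to disk and re-raise so the script stops."""
    try:
        send_batch(batch)  # assumption: raises an exception when ES is unreachable
    except Exception:
        save_pending(batch, path)
        raise
    # The batch was written successfully, so any leftover file is no longer needed.
    if os.path.exists(path):
        os.remove(path)


def replay_pending(send_batch, path=SPOOL_PATH):
    """Call once at startup: resend whatever a previous run left behind."""
    pending = load_pending(path)
    if pending:
        flush(pending, send_batch, path)
    return len(pending)
```

In the real script, you would call `replay_pending` once at startup, before consuming anything from Kafka, and call `flush` each time the list reaches the batch size. Because `flush` re-raises after saving, the script exits on an ES failure and the saved batch is picked up on the next start.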