The original intention of writing this script is that during the use of Alibaba Cloud RDS, an exception occurs in the database and requires quick recovery. There are many similar kill scripts on the Internet, all implemented through mysqladmin. However, the Ali-RDS environment has the following limitations:
Does not provide users with SUPER permissions, that is, users can only kill their own threads
When the number of connections increases sharply, external users cannot log in, including the console
In order to solve the problem of morning 2 Big problem, this python script uses multi-threading on the db instance to reserve a connection for each user, and reads the command configuration file mysqk.ini in real time. When it finds that there is a need to kill, it uses the existing connection of the corresponding user to find it. Threads that meet the conditions in information_schema.processlist are killed.
Note: This script was rewritten once in September. The version in July (branch old_0.5.0) is per instance per user, corresponding to one thread. There are too many threads for one db instance, so it is always difficult to see. Elegant, so it was changed to one db instance and one thread to maintain sessions of multiple users at the same time. At the same time, the new version also adds more functions, such as checking by time window, including or excluding specific connections, email notifications, and configuration item coverage.
Features
Always maintain a long connection through mysql ping, and have an automatic restart mechanism after disconnection to solve the embarrassing situation of no connection available
Each db instance has its own thread, avoiding the need to log in to individual users to kill them separately Complicated operations. If you have SUPER permissions, you can also simplify the configuration to make it compatible.
Be able to deal with scenarios that need to kill threads separately:
Transactions in
Sleep state that run for more than N seconds for a long time (generally not recommended, but sometimes kill it , can quickly release the connection for administrators to use)
Exclude some threads that cannot be killed, such as Binlog dump
Threads containing specific keywords need to be killed
When a qualified thread appears, the current processlist, engine status, lock_wait will be Make a snapshot and send it via email
There is a dry_run mode for trial operation, which executes all the checking processes but does not actually kill
Supports running only within the time window, considering that some long tasks are not checked at night
Password encryption
< ;!-- more -->
Quick use
requires pip to install the MySQL-python and pycrypto libraries. It is only tested on python 2.7.
Set the username and password information for the connection in settings.py. It is assumed here that the authentication information to be checked in the same batch of DBs is the same. The specified user is used for login authentication and to inform the script which users need to be checked.
The password must be encrypted through prpcryptec.py, and the encrypted key must be written into the KEY_DB_AUTH variable of the script itself. (If you are worried about leakage, compile mysqk.py into pyc to run)
In the
db_info section of the mysqk.ini main configuration file, set the database address that needs to be checked, such as db01=10.0.200.100:3306
can be db01 separately Wait for specifying the option that requires killing the thread. [id_db01] then reuses the option of [db_commkill] by default
db_comconfig section sets db_puser as a user with permissions who can view all processlists, and has been specified in DB_AUTH of settings.py
I only want to perform checks, and do not want to actually kill the exception Thread, confirm that dry_run is not equal to 0
Here we go!
Configuration item description
mysqk.ini:
mail_config
Email notification related settings, smtp service address and authentication information.
mail_receiver= Set to empty, which means no mail will be sent
db_info
Set which database instances to check and kill.
Format:
The db instances that appear here will be checked and can be commented with ;, but the script needs to be restarted.
db_comconfig
Check the public configuration and take effect in real time.
db_puser: Specify a user name for show processlist, required permissions: PROCESS, information_schema library viewing. It can be considered as a representative user, checking the abnormal thread, and providing the result to the user who has the permission to kill the thread.
run_max_count: The number of times to perform checks, which is a global control switch. Every time this value is modified, the check will be restarted, that is, a clean start, so that the newly modified configuration will take effect.
is 0, which means the script does not perform any checks and simply maintains the connection to the database alive. The survival check frequency is determined by CHECK_CONFIG_INTERVAL × CHECK_PING_MULTI in settings.py.
is 999, which means that the connection thread will be checked consistently in the background (but there may not be any that meet the kill conditions). The frequency of the check is specified in CHECK_CONFIG_INTERVAL in settings.py.
is other values. , it means that the check will be stopped after the number of checks is reached.
dry_run: whether to enable trial run mode, 0 means real kill, 1 or other value means trial run. Trial run mode can be used to monitor and alert for slow queries. Note that the same session thread ID only alerts once
run_time_window: The time window of the running check, the format is 08:00-22:00, no check will be executed outside this time, leaving blank means no limit. The main consideration is that "abnormal" threads may occur in some statistical tasks at night.
db_commkill
kill uses public configuration, takes effect in real time, and will be overridden by the options in the id_
k_user: A very critical option, indicating which database users you want to check and kill. Multiple ones are separated by commas (do not include quotes). When
is all , it means to check all users specified by DB_AUTH in settings.py
is none , it means not to kill any abnormal threads. The effect is equivalent to setting dry_run mode
k_longtime: If the execution of SQL exceeding the set value is considered abnormal. Generally greater than CHECK_CONFIG_INTERVAL
k_sleep: If the SQL statement of Sleep exceeds the set seconds, it is considered abnormal. If it is 0, it means not to kill the thread in the sleep state.
k_exclude: Exclude those threads with specific keywords, such as copy threads, administrator connections, etc.
k_include: Threads containing these specific keywords need to be killed. Note that it works on the premise that k_user and k_exclude are met.
The values of k_exclude and k_include are in the regular format that supports the python re module. Do not use quotation marks.
id_dbid
The configuration items in this area are the same as db_commconfig and are used for kill options for individual dbs.
Usage suggestions
Two combination modes:
Set dry_run=0, default k_user=none, when an exception occurs in the database, actively modify the k_user value of the corresponding db, dynamic kill
Set dry_run=1, default k_user=all , which is equivalent to running in daemon mode. If there is a slow query, you will be notified by email, and the information at that time will be recorded. Of course, you can also dry_run=0, k_user=all, let the program run in the background and kill it, but it is highly not recommended in a production environment. .
There are logs and snapshot files to view.
Configuration file example
mysqlk.ini:
[mail_config]mail_host=smtp.exmail.qq.commail_user=xxx@ecqun.commail_pass=xxxxxxmail_receiver=[db_info]crm0=192.168.1.125:3306crm1=192.168.1.1 26:3306crm2 =192.168.1.127:3306crm3=192.168.1.128:3306base=10.0.200.142:3306[db_commconfig]db_puser=ecuser; how many kill times once this config file changed; 0: DISABLE all kill; 999: always kill threads that meet kill conditions ; default: 1; can not be inheritrun_max_count=999dry_run=1
[mail_config] mail_host=smtp.exmail.qq.com mail_user=xxx@ecqun.com mail_pass=xxxxxx mail_receiver= [db_info] crm0=192.168.1.125:3306 crm1=192.168.1.126:3306 crm2=192.168.1.127:3306 crm3=192.168.1.128:3306 base=10.0.200.142:3306 [db_commconfig] db_puser=ecuser ; how many kill times once this config file changed ; 0: DISABLE all kill ; 999: always kill threads that meet kill conditions ; default: 1 ; can not be inherit run_max_count=999 dry_run=1 run_time_window=08:00-22:00 [db_commkill] k_user=all k_longtime=10 k_lock=1 k_sleep=0 k_exclude=Binlog|ecdba|Daemon k_include=select sleep\(17\) [id_crm0] ; k_user: who's threads to be killed. use comma to separate ; none: do not kill anyone's threads ; all: kill all user's threads (with other where conditions) ; default: none k_user=all ; k_longtime: filter the threads who's running time is longer than this ; 0: ignore the time > x condition ; default: 10 k_longtime=10 ; k_sleep: whether kill sleepd threads or not ; 0: do not kill command='Sleep' threads from processlist ; when it set to 1, usually it's subset of k_longtime condition ; default: 0 k_sleep=0 [id_crm1] k_user=ecuser k_longtime=10 k_sleep=0 [id_crm2] k_user=all k_longtime=10 k_sleep=0 [id_crm3]