Words written before: It is always after a disaster occurs that one remembers the importance of disaster recovery; it is always after one has suffered a loss that one remembers someone’s reminder
This article is mainly for mysql database
Design principles
1. Don’t do calculations in the database: CPU calculations must be moved to the business layer 2. Control the amount of data in a single table: the records in a single table are controlled at 10 million 3. Control the number of columns: the number of fields is controlled within 20 4. Balance paradigm and redundancy Yu: Sacrifice paradigm design and redundant data to improve efficiency 5. Reject 3B: Reject large SQL, large transactions, large batches 6. Use UTF8 for table character set 7. Use INNODB storage engine
Data table design
1. Use the most effective (smallest) data type as much as possible
tinyint(1Byte)
smallint(2Byte)
mediumint(3Byte)
int(4Byte)
bigint(8Byte)
bad case:int(1)/int(11)
2. Don’t store numbers as strings, convert characters into numbers, use int to store ip instead of char(15) 3. Use enum or set first, sex enum ('F', 'M') 4, avoid Use NULL fields
NULL字段很难查询优化
NULL字段的索引需要额外空间
NULL字段的复合索引无效
bad case:`name` char(32) default null`age` int not null
good case:`age` int not null default 0
5. Use less text/blob. The performance of varchar will be much higher than that of text. If blob is really unavoidable, please split the table. 6. Do not store images in the database. 7. For MyISAM tables, if there are no variable-length columns (VARCHAR, TEXT or BLOB column), using a fixed-size record format. This is faster but unfortunately may waste some space. Even if you have used the CREATE option to make the VARCHAR column ROW_FORMAT=fixed, you can still be prompted to use fixed-length rows 8. Use sample character set, such as latin1. Try to use utf-8 as little as possible, because utf-8 takes up 3 times the space of latin1. You can use latin1 on fields that do not need to use utf-8, such as mail, url, etc. 9. Accuracy and space conversion. Floating point types generally use less space than DECIMAL types when storing data in the same numerical range. The FLOAT field uses 4 bytes to store data. The DOUBLE type requires 8 bytes and has higher accuracy and a larger numerical range. DECIMAL type data will be converted to DOUBLE type 10. The library name, table name, and field name must have a fixed naming length, within 12 characters. ;Library names, table names, and field names are not allowed to exceed 32 characters. The meaning of the name must be understood; MySQL reserved words are prohibited in library names, table names, and field names; temporary library and table names must be prefixed with tmp and suffixed with date; backup libraries and tables must be bak prefix, and suffixed with date 11. The physical length of the InnoDB table row record does not exceed 8KB. The default data page of InnoDB is 16KB. Based on the characteristics of B+Tree, a data page needs to store at least 2 records. Therefore, when the actual storage length of a large column exceeds 8KB (especially TEXT/BLOB columns), it will cause "page-overflow storage", similar to "row migration" in ORACLE. Therefore, if large columns must be used ( Especially TEXT/BLOB type) and if they are read and written frequently, it is best to split these columns into sub-tables and do not store them together with the main table. If they are not too frequent, you can consider keeping them in the main table. If Change the innodb page size option to 8KB, then the physical length of the row record is recommended not to exceed 4KBIndex class
2. Character fields must build prefix indexes 3. Do not perform column operations in the index, bad case: select id where age +1 = 10; 4. It is recommended to use auto-increment columns for innodb primary keys
5. Do not use foreign keys, please ensure the constraints by the program. 6. Avoid creating indexes on prefixes that already have indexes. For example: If index (a, b) exists, remove index (a) 7. Control the length of a single index. Use key (name (8)) to create an index on the first few characters of the data. 8. Use indexes selectively. It is not very good to use indexes on columns that change rarely, such as gender columns. 9. Optimize table can compress and sort the index. Be careful not to run it frequently. 10. Analyze table can update data. 11. Index selectivity is not repeated. The index value is also called the ratio of the number of data rows in the cardinality table. Index selectivity = cardinality/data rows. count(distinct(username))/count(*) is the index selectivity. The benefit of high index selectivity is mysql When searching for matches, you can filter more rows. The unique index has the best selectivity, with a value of 1 12. Do not use duplicate or redundant indexes. For the index of the INNODB engine, the primary key index must be indexed every time the data is modified. The corresponding index value in the auxiliary index is modified, which may cause a large amount of data migration, paging, and fragmentation. 13. For string columns with a length of more than 20, it is best to create a prefix index rather than a full column index (for example: ALTER TABLE t1 ADD INDEX(user(20))), which can effectively improve index utilization, but its disadvantage is that the prefix index is not used when sorting this column. The length of the prefix index can be determined based on the statistics of the field. Generally, it is slightly larger than the average length. 14. Regularly use the pt-duplicate-key-checker tool to check and delete duplicate indexes. For example, if the index idx1(a, b) already covers index idx2(a), you can delete the idx2 index
sql statement design class
1. The sql statement is as simple as possible. One sql can only be operated on one CPU. Split large statements into small statements to reduce lock time. One large sql can block the entire library (make full use of QUERY CACHE and make full use of multi-core CPU)
2. Simple transactions, the transaction time should be as short as possible, bad case: upload image transaction
3. Avoid using trig/func, triggers and functions are not used, replace them with client programs
4. No need for select *, which consumes cpu, io, memory, and bandwidth , this kind of program is not scalable
5. OR is rewritten as IN()
or的效率是n级别
in的消息时log(n)级别
in的个数建议控制在200以内
select id from t where phone=’159′ or phone=’136′ =>select id from t where phone in (’159′, ’136′);
6. OR is rewritten as UNION
MySQL’s index merging is very stupid
select id from t where phone = '159' or name = 'john';
=>
select id from t where phone='159' union select id from t where name='jonh';
7. Avoid negative %, such as not in/like
8. Use count(*) with caution
9. Limit efficient paging
The larger the limit, the lower the efficiency
select id from t limit 10000, 10;
=>
select id from t where id > 10000 limit 10;
10. Use union all instead of union. Union has the overhead of deduplication 11. Use less joins
12. Use group by, grouping, and automatic sorting
13. Please use same type comparison
14. Use load data to import data, load data is about 20 times faster than insert; 15. Updates to data should be broken up and updated in batches. Do not update too much data at one time 16. Use performance analysis tools
Sql explain / showprofile / mysqlsla
17. Use --log-slow-queries --long-query-time=2 to view slow query statements. Then use explain to analyze the query and make optimizations
show profile;
mysqlsla;
mysqldumpslow;
explain;
show slow log;
show processlist;
show query_response_time(percona)
optimize When data is inserted, updated, and deleted, some data migration and paging are inevitable, and then some fragments will appear. Over time, fragments accumulate and affect performance. This requires the DBA to regularly optimize the database to reduce fragments, which is done through the optimize command. For example, operating on MyISAM table: optimize table table name
18. It is forbidden to run large queries in the database 19. Use precompiled statements and only pass parameters, which is more efficient than passing SQL statements; parse once and use multiple times; reduce the probability of SQL injection 20. It is forbidden to use order by rand () 21. A single SQL statement is prohibited from updating multiple tables at the same time 22. Avoid mathematical operations in the database (MySQL is not good at mathematical operations and logical judgment) 23. SQL statements require all R&D, and all SQL keywords are Capital letters, only one space is allowed for each word 24. If you can use NOT IN, you can use NOTIN. There are too many pitfalls. . Will check out empty and NULL
Attention
1. Even if it is conditional filtering based on indexes, if the optimizer realizes that the total amount of data that needs to be scanned exceeds 30% (it seems to be 20% in ORACLE, MySQL is currently 30%, it may be adjusted in the future), it will change directly. The execution plan is a full table scan, and indexes are no longer used. 2. When joining multiple tables, select the table with the greatest filterability (not necessarily the smallest amount of data, but the one with the greatest filterability after adding the WHERE condition). for the driver table. In addition, if there is sorting after JOIN, the sorting field must belong to the driver table, so that the index on the driver table can be used to complete the sorting. 3. In most cases, the cost of sorting is usually higher, so if you see the execution plan There is Using filesort, give priority to creating a sorting index 4. Use pt-query-digest to regularly analyze slow query log, and combine it with Box Anemometer to build a slow query log analysis and optimization system
Words written before: It is always after a disaster occurs that one remembers the importance of disaster recovery; it is always after one has suffered a loss that one remembers someone’s reminder
This article is mainly for mysql database
Design principles
1. Don’t do calculations in the database: CPU calculations must be moved to the business layer
2. Control the amount of data in a single table: the records in a single table are controlled at 10 million
3. Control the number of columns: the number of fields is controlled within 20
4. Balance paradigm and redundancy Yu: Sacrifice paradigm design and redundant data to improve efficiency
5. Reject 3B: Reject large SQL, large transactions, large batches
6. Use UTF8 for table character set
7. Use INNODB storage engine
Data table design
1. Use the most effective (smallest) data type as much as possible
2. Don’t store numbers as strings, convert characters into numbers, use int to store ip instead of char(15)
3. Use enum or set first,
sex
enum ('F', 'M')4, avoid Use NULL fields
5. Use less text/blob. The performance of varchar will be much higher than that of text. If blob is really unavoidable, please split the table.
6. Do not store images in the database. 7. For MyISAM tables, if there are no variable-length columns (VARCHAR, TEXT or BLOB column), using a fixed-size record format. This is faster but unfortunately may waste some space. Even if you have used the CREATE option to make the VARCHAR column ROW_FORMAT=fixed, you can still be prompted to use fixed-length rows
8. Use sample character set, such as latin1. Try to use utf-8 as little as possible, because utf-8 takes up 3 times the space of latin1. You can use latin1 on fields that do not need to use utf-8, such as mail, url, etc. 9. Accuracy and space conversion. Floating point types generally use less space than DECIMAL types when storing data in the same numerical range. The FLOAT field uses 4 bytes to store data. The DOUBLE type requires 8 bytes and has higher accuracy and a larger numerical range. DECIMAL type data will be converted to DOUBLE type
10. The library name, table name, and field name must have a fixed naming length, within 12 characters. ;Library names, table names, and field names are not allowed to exceed 32 characters. The meaning of the name must be understood; MySQL reserved words are prohibited in library names, table names, and field names; temporary library and table names must be prefixed with tmp and suffixed with date; backup libraries and tables must be bak prefix, and suffixed with date
11. The physical length of the InnoDB table row record does not exceed 8KB. The default data page of InnoDB is 16KB. Based on the characteristics of B+Tree, a data page needs to store at least 2 records. Therefore, when the actual storage length of a large column exceeds 8KB (especially TEXT/BLOB columns), it will cause "page-overflow storage", similar to "row migration" in ORACLE. Therefore, if large columns must be used ( Especially TEXT/BLOB type) and if they are read and written frequently, it is best to split these columns into sub-tables and do not store them together with the main table. If they are not too frequent, you can consider keeping them in the main table. If Change the innodb
page
size option to 8KB, then the physical length of the row record is recommended not to exceed 4KB Index class
1. Use indexes carefully and rationally
2. Character fields must build prefix indexes
3. Do not perform column operations in the index, bad case: select id where age +1 = 10;
4. It is recommended to use auto-increment columns for innodb primary keys
5. Do not use foreign keys, please ensure the constraints by the program.
2. Simple transactions, the transaction time should be as short as possible, bad case: upload image transaction6. Avoid creating indexes on prefixes that already have indexes. For example: If index (a, b) exists, remove index (a)
7. Control the length of a single index. Use key (name (8)) to create an index on the first few characters of the data. 8. Use indexes selectively. It is not very good to use indexes on columns that change rarely, such as gender columns.
9. Optimize table can compress and sort the index. Be careful not to run it frequently.
10. Analyze table can update data. 11. Index selectivity is not repeated. The index value is also called the ratio of the number of data rows in the cardinality table. Index selectivity = cardinality/data rows. count(distinct(username))/count(*) is the index selectivity. The benefit of high index selectivity is mysql When searching for matches, you can filter more rows. The unique index has the best selectivity, with a value of 1
12. Do not use duplicate or redundant indexes. For the index of the INNODB engine, the primary key index must be indexed every time the data is modified. The corresponding index value in the auxiliary index is modified, which may cause a large amount of data migration, paging, and fragmentation. 13. For string columns with a length of more than 20, it is best to create a prefix index rather than a full column index (for example: ALTER TABLE t1 ADD INDEX(user(20))), which can effectively improve index utilization, but its disadvantage is that the prefix index is not used when sorting this column. The length of the prefix index can be determined based on the statistics of the field. Generally, it is slightly larger than the average length. 14. Regularly use the pt-duplicate-key-checker tool to check and delete duplicate indexes. For example, if the index idx1(a, b) already covers index idx2(a), you can delete the idx2 index
sql statement design class
1. The sql statement is as simple as possible. One sql can only be operated on one CPU. Split large statements into small statements to reduce lock time. One large sql can block the entire library (make full use of QUERY CACHE and make full use of multi-core CPU)
3. Avoid using trig/func, triggers and functions are not used, replace them with client programs
4. No need for select *, which consumes cpu, io, memory, and bandwidth , this kind of program is not scalable5. OR is rewritten as IN()
6. OR is rewritten as UNION
MySQL’s index merging is very stupid
7. Avoid negative %, such as not in/like
8. Use count(*) with caution9. Limit efficient paging
The larger the limit, the lower the efficiency
10. Use union all instead of union. Union has the overhead of deduplication 11. Use less joins
12. Use group by, grouping, and automatic sorting
13. Please use same type comparison14. Use load data to import data, load data is about 20 times faster than insert;
15. Updates to data should be broken up and updated in batches. Do not update too much data at one time
16. Use performance analysis tools
17. Use --log-slow-queries --long-query-time=2 to view slow query statements. Then use explain to analyze the query and make optimizations
optimize When data is inserted, updated, and deleted, some data migration and paging are inevitable, and then some fragments will appear. Over time, fragments accumulate and affect performance. This requires the DBA to regularly optimize the database to reduce fragments, which is done through the optimize command. For example, operating on MyISAM table: optimize table table name
18. It is forbidden to run large queries in the database
19. Use precompiled statements and only pass parameters, which is more efficient than passing SQL statements; parse once and use multiple times; reduce the probability of SQL injection
20. It is forbidden to use order by rand ()
21. A single SQL statement is prohibited from updating multiple tables at the same time
22. Avoid mathematical operations in the database (MySQL is not good at mathematical operations and logical judgment)
23. SQL statements require all R&D, and all SQL keywords are Capital letters, only one space is allowed for each word
24. If you can use NOT IN, you can use NOTIN. There are too many pitfalls. . Will check out empty and NULL
Attention
1. Even if it is conditional filtering based on indexes, if the optimizer realizes that the total amount of data that needs to be scanned exceeds 30% (it seems to be 20% in ORACLE, MySQL is currently 30%, it may be adjusted in the future), it will change directly. The execution plan is a full table scan, and indexes are no longer used. 2. When joining multiple tables, select the table with the greatest filterability (not necessarily the smallest amount of data, but the one with the greatest filterability after adding the WHERE condition). for the driver table. In addition, if there is sorting after JOIN, the sorting field must belong to the driver table, so that the index on the driver table can be used to complete the sorting. 3. In most cases, the cost of sorting is usually higher, so if you see the execution plan There is Using filesort, give priority to creating a sorting index
4. Use pt-query-digest to regularly analyze slow query log, and combine it with Box Anemometer to build a slow query log analysis and optimization system
I saw something interesting today:
Which algorithms/data structures should I “recognize” and know by name?