Home >Database >Mysql Tutorial >How to write elegant SQL native statements

How to write elegant SQL native statements

步履不停
步履不停Original
2019-06-18 14:54:332654browse

How to write elegant SQL native statements

Foreword:

When the previous article talked about the basic architecture of Mysql, it started with "How are SQL query statements executed in the MySql architecture?" A comprehensive explanation was given. I know the specific execution process of sql query statements in the MySql architecture, but in order to write sql statements better and faster, I think it is very necessary to know the execution order of each clause in the sql statement. Friends who have read the previous article should know that the execution of each clause at the end of the SQL statement should be completed in the executor, and the storage engine provides a data reading and writing interface to the executor. Now let’s start our study of the complete execution sequence of each clause in the

statement (executed according to the sequence number)

  1. from (Note: This also includes the sub-statements in from )

  2. join

  3. on

  4. ##where

  5. group by (start using the alias in select, and you can use it in subsequent statements)

  6. avg,sum.... and other aggregate functions

  7. having

  8. select

  9. distinct

  10. order by

  11. limit

Execution order analysis of each clause

All query statements are executed starting from from. During the execution process, each clause Each step will generate a virtual table for the next step, and this virtual table will be used as the input of the next execution step.

1. from

form is the beginning of a query statement.

  • If it is a table, this table will be operated directly;

  • If this from is followed by a subquery, the subquery will be executed first. The content in the query and the result of the subquery are the first virtual table T1. (Note: The execution process in the subquery is also in the order described in this article).

  • If you need to associate tables, use join, please see 2, 3

2. join

If from is followed by Multiple tables, join association, will first perform a Cartesian product on the first two tables, then the first virtual table T1 will be generated (note: a relatively small table will be selected as the base table here);

3. on

Perform ON filtering on virtual table T1, and only those matching rows will be recorded in virtual table T2. (Note that if there is a third table associated with it here, the Cartesian product of T2 and the third table will be used to produce the T3 table. Continue to repeat the 3. on step to generate the T4 table, but the following sequence will not be explained for the time being. For T3 and T4 here, just continue from a table association query T2)

4. where

Perform WHERE condition filtering on virtual table T2. Only matching records will be inserted into virtual table T3.

5.group by

The group by clause combines the unique values ​​in into a group to obtain virtual table T4. If group by is applied, all subsequent steps can only operate on T4 columns or perform 6. Aggregation functions (count, sum, avg, etc.). (Note: The reason is that the final result set after grouping only contains one row from each group. Remember, otherwise there will be many problems here, and the following code misunderstandings will be specifically mentioned.)

6. avg,sum .... Waiting for aggregate functions

The aggregate function only performs some processing on the grouped results to obtain some desired aggregate values, such as sums, statistical quantities, etc., and does not generate virtual tables.

7. having

Apply the having filter to generate T5. The HAVING clause is mainly used in conjunction with the GROUP BY clause. The having filter is the first and only filter that is applied to grouped data.

8. select

Perform the select operation, select the specified column, and insert it into virtual table T6.

9. distinct

Deduplicate the records in T6. Remove the same rows to generate virtual table T7. (Note: In fact, if the group by clause is applied, distinct is redundant. The reason is also that when grouping, the unique values ​​in the column are grouped into one group, and only for Each group returns one row of records, so all the records will be different.)

10. order by

Apply the order by clause. Sort T7 according to order_by_condition. At this time, a cursor is returned instead of a virtual table. SQL is based on the theory of sets. A set does not pre-sort its rows. It is just a logical collection of members, and the order of the members is irrelevant. A query that sorts a table can return an object that contains a logical organization in a specific physical order. This object is called a cursor.
Some notes on oder by

  • Because the return value of order by is a cursor, the query using the order by clause cannot be applied to table expressions.

  • Order by sorting is very costly. Unless you must sort, it is best not to specify order by,

  • order by. Two parameters asc (ascending order) desc (descending order)

11. limit

Retrieve the records of the specified row, generate virtual table T9, and return the result.

The parameter after limit can be a limit m or a limit m n, which means from the mth piece to the nth piece of data.

(Note: Many developers like to use this statement to solve paging problems. For small data, there is no problem using the LIMIT clause. When the amount of data is very large, using LIMIT n, m is very inefficient. . Because the LIMIT mechanism starts scanning from the beginning every time. If you need to read 3 pieces of data starting from the 600,000th row, you need to scan and locate the 600,000th row first, and then read it. The scanning process It is a very inefficient process. Therefore, when processing big data, it is very necessary to establish a certain caching mechanism at the application layer)

Develop a sql query written for a certain requirement

SELECT `userspk`.`avatar` AS `user_avatar`, 
`a`.`user_id`, 
`a`.`answer_record`, 
 MAX(`score`) AS `score`FROM (select * from pkrecord  order by score desc) as a 
INNER JOIN `userspk` AS `userspk` ON `a`.`user_id` = `userspk`.`user_id`WHERE `a`.`status` = 1 AND `a`.`user_id` != 'm_6da5d9e0-4629-11e9-b5f7-694ced396953' GROUP BY `user_id`ORDER BY `a`.`score` DESC LIMIT 9;

Result:

How to write elegant SQL native statements

  • Let me briefly talk about what I want to query:

想要查询pk记录表中分数最高的9个用户记录和他们的头像。

  • 通过这段sql实际想一遍sql各字句的执行顺序

pk记录表的数据结构设计,每个用户每天每个馆下可能会有多条记录,所以需要进行分组,并且查询结果只想拿到每个分组内最高的那条记录

这段sql的一些说明:

  1. 可能有些同学会认为子查询没有必要 直接查询pk记录表就可以,但是并不能拿到预期的结果,因为分组后的每个组结果是不进行排序的,而且max拿到的最高分数肯定是对应的该分组下最高分数,但是其它记录可能就不是最高分数对应的那条记录。所以子查询非常有必要,它能够对原始的数据首先进行排序,分数最高的那条就是第一条对应的第一条记录。

看一下代码和执行结果与带有子查询的进行比较,就能理解我上面说的一段话:

//不使用子查询SELECT `userspk`.`avatar` AS `user_avatar`, 
`pkrecord`.`user_id`, 
`pkrecord`.`answer_record`, 
`pkrecord`.`id`, 
 MAX(`score`) AS `score`FROM pkrecordINNER JOIN `userspk` AS `userspk` ON `pkrecord`.`user_id` = `userspk`.`user_id`WHERE `pkrecord`.`status` = 1 AND `pkrecord`.`user_id` != 'm_6da5d9e0-4629-11e9-b5f7-694ced396953' GROUP BY `user_id`ORDER BY `pkrecord`.`score` DESC LIMIT 9;

查询结果

How to write elegant SQL native statements2. 在子查询中对数据已经进行排序后,外层排序方式如果和子查询排序分数相同,都是分数倒序,外层的排序可以去掉,没有必要写两遍。

sql语句中的别名

别名在哪些情况使用

在 SQL 语句中,可以为表名称及字段(列)名称指定别名

  • 表名称指定别名

同时查询两张表的数据的时候: 未设置别名前:

SELECT article.title,article.content,user.username FROM article, userWHERE article.aid=1 AND article.uid=user.uid

设置别名后:

SELECT a.title,a.content,u.username FROM article AS a, user AS u where a.aid=1 and a.uid=u.uid

好处:使用表别名查询,可以使 SQL 变得简洁而更易书写和阅读,尤其在 SQL 比较复杂的情况下

  • 查询字段指定别名

查询一张表,直接对查询字段设置别名

SELECT username AS name,email FROM user

查询两张表

好处:字段别名一个明显的效果是可以自定义查询数据返回的字段名;当两张表有相同的字段需要都被查询出,使用别名可以完美的进行区分,避免冲突

SELECT a.title AS atitle,u.username,u.title AS utitle FROM article AS a, user AS u where a.uid=u.uid
  • 关联查询时候,关联表自身的时候,一些分类表,必须使用别名。

  • 别名也可以在group         by与having的时候都可使用

  • 别名可以在order by排序的时候被使用

    查看上面一段sql

  • delete , update MySQL都可以使用别名,别名在多表(级联)删除尤为有用

delete t1,t2 from t_a t1 , t_b t2 where t1.id = t2.id
  • 子查询结果需要使用别名

    查看上面一段sql

别名使用注意事项

  • 虽然定义字段别名的 AS 关键字可以省略,但是在使用别名时候,建议不要省略 AS 关键字

书写sql语句的注意事项

书写规范上的注意

  • 字符串类型的要加单引号

  • select后面的每个字段要用逗号分隔,但是最后连着from的字段不要加逗号

  • 使用子查询创建临时表的时候要使用别名,否则会报错。

为了增强性能的注意

  • 不要使用“select * from ……”返回所有列,只检索需要的列,可避免后续因表结构变化导致的不必要的程序修改,还可降低额外消耗的资源

  • 不要检索已知的列

select  user_id,name from User where user_id = ‘10000050’
  • 使用可参数化的搜索条件,如=, >, >=, , !=, !>, !

  • 当需要验证是否有符合条件的记录时,使用exists,不要使用count(*),前者在第一个匹配记录处返回,后者需要遍历所有匹配记录

  • Where子句中列的顺序与需使用的索引顺序保持一致,不是所有数据库的优化器都能对此顺序进行优化,保持良好编程习惯(索引相关)

  • 不要在where子句中对字段进行运算或函数(索引相关)

  1. 如where  amount / 2 > 100,即使amount字段有索引,也无法使用,改成where amount > 100 * 2就可使用amount列上的索引

  2. 如where substring( Lastname, 1, 1) = ‘F’就无法使用Lastname列上的索引,而where Lastname like ‘F%’或者where Lastname >= ‘F’ and Lastname

  • 在有min、max、distinct、order by、group by操作的列上建索引,避免额外的排序开销(索引相关)

  • 小心使用or操作,and操作中任何一个子句可使用索引都会提高查询性能,但是or条件中任何一个不能使用索引,都将导致查询性能下降,如where member_no = 1 or provider_no = 1,在member_no或provider_no任何一个字段上没有索引,都将导致表扫描或聚簇索引扫描(索引相关)

  • Between一般比in/or高效得多,如果能在between和in/or条件中选择,那么始终选择between条件,并用>=和=和

  • Adjust the order of join operations to optimize performance. The join operation is top-down. Try to put the association of two tables with smaller result sets in front to improve performance. (join related) Note: I will take out two separate articles to explain indexing and association in detail. I will only briefly mention it in this note.

For more MySQL related technical articles, please visit the MySQL Tutorial column to learn!

The above is the detailed content of How to write elegant SQL native statements. For more information, please follow other related articles on the PHP Chinese website!

Statement:
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn