打造出色查询：如何优化SQL查询？

2013 年 8 月 2 日

本文转载自公众号“读芯术”(ID：AI_Discovery)。

我们致力于打造能够较好运行并延续较长一段时间的query(查询)。本文将给出关于优化SQL语句的几点建议，希望能够帮到你。

1. 尝试不去用select *来查询SQL，而是选择专用字段。

反例：

select * from employee;

正例：

select id，name fromemployee;

理由：

通过只用必要字段进行查询，能够节省资源并减少网络开销。
这样做可能不会使用覆盖索引，会导致一个查询返回到表中。

2. 如果已知只有一个查询结果，推荐使用limit 1

假设有一张员工表格，想在其中找到一名叫jay的员工。

CREATE TABLE employee ( 
id int(11) NOT NULL, 
name varchar(255) DEFAULT NULL, 
age int(11) DEFAULT NULL, 
date datetime DEFAULT NULL, 
sex int(1) DEFAULT NULL, 
PRIMARY KEY (`id`) );

反例：

select id，name from employeewhere name='jay';

正例：

select id，name from employeewhere name='jay' limit 1;

理由：添加limit 1后，查找到相应的记录时，便不会继续查找下去，效率会大大提高。

3. 尝试避免在 where 子句中使用or来连接条件

创建一个新的用户表格，其有一个常规索引userId，表格结构如下：

CREATE TABLE `user` ( 
  `id` int(11) NOT NULL AUTO_INCREMENT, 
  `userId` int(11) NOT NULL, 
  `age` int(11) NOT NULL, 
  `name` varchar(255) NOT NULL, 
  PRIMARY KEY (`id`), 
  KEY `idx_userId` (`userId`) )

现在假设需要查询userid为1或为18岁的用户，使用如下的SQL就会很简单。

反例：

select * from user where userid = 1 or age = 18;

正例：

//se union all 
select * from user where userid=1 
union all 
select * from user where age = 18;//Or write two separate SQL 
select * from user where userid=1;

理由：or 的使用可能会使索引无效，因此需要进行全表扫描。

在or 无索引的情况下，假设已采用userId索引，但是当涉及到 age(年龄)查询条件时，必须执行全表扫描，其过程分为三步：全表扫描+索引扫描+合并。

图源：unsplash

4. 尽可能避免在where子句中使用!=或<>运算符，否则，引擎将放弃使用索引并执行全表扫描。

反例：

select age,name from user where age<>18;

正例：

//You can consider separate two sql writeselect age,name from user where age <18; 
select age,name from user where age>18;

理由：使用!=和<>可能使索引无效。

5. 优化limit分页

通常用limits来实现日常分页，但当偏移量特别大时，查询效率便会降低。因为Mysql不会跳过偏移量，而是直接获取数据。

反例：

select id，name，age from employeelimit 10000，10;

正例：

//Solution 1: Return the largest record (offset) of the last query 
select id，name from employeewhere id>10000 limit 10;//Solution 2: order by + index 
select id，name from employeeorder by id limit 10000，10;

理由：

如果使用了优化方案1，则会返回最末的查询记录(偏移量)，因此可以跳过该偏移量，效率自然会大幅提高。
选项二：使用+索引排序，也可以提高查询效率。

6. 优化like语句

在日常开发中，如果使用模糊关键字查询，我们很容易想到like，但like可能会使索引无效。

反例：

select userId，name from user where userId like '%123';

正例：

select userId，name from user where userId like '123%';

理由：https://medium.com/@pawanjain.432/hey-thanks-dovid-for-pointing-out-a-typo-in-13-1000a4103fe6

7. 使用where条件限制将要查询的数据来避免返回额外行

假设要查询一名用户是否为会员，老式执行代码会这样做。

反例：

List userIds = sqlMap.queryList("select userId from userwhere isVip=1");boolean isVip = userIds.contains(userId);

正例：

Long userId = sqlMap.queryObject("select userId from user whereuserId='userId' and isVip='1' ")boolean isVip = userId！=null;

理由：能够检查需要的数据，避免返回非必要数据，并能节省费用和计算机开销。

图源：unsplash

8. 考虑在where子句中使用默认值而不是null

反例：

select * from user where age is not null;

正例：

select * from user where age>0; //Set 0 as default

理由：如果用默认值取代null值，则通常可以建立索引，与此同时，表达式将相对清晰。

9. 如果插入数据过多，可以考虑批量插入

反例：

for(User 
    u :list){ 
INSERT into user(name,age)values(#name#,#age#) 
}

正例：

//One batch of 500 inserts, carried out in batchesinsert intouser(name,age) values 
 
    (#{item.name},#{item.age})

理由：批量插入性能良好且省时。

打个比方，在有电梯的情况下，你需要将1万块砖移送到建筑物的顶部。电梯一次可以放置适当数量的砖块(最多500块)，你可以选择一次运送一块砖，也可以一次运送500块。哪种方案更好?

10. 谨慎使用distinct关键词

Distinct关键词通常用于过滤重复记录以返回唯一记录。当其被用于查询一个或几个字段时，Distinct关键词将为查询带来优化效果。然而，在字段过多的情况下，Distinct关键词将大大降低查询效率。

反例：

SELECT DISTINCT * from user;

正例：

select DISTINCT name from user;

理由：带有“distinct”语句的CPU时间和占用时间高于没有“ distinct”的语句。

如果在查询多字段时使用distinct，数据库引擎将比较数据，并滤除重复的数据。然而，该比较和滤除过程将消耗系统资源和CPU时间。

图源：unsplash

11. 删除多余和重复的索引

反例：

KEY `idx_userId` (`userId`) 
KEY `idx_userId_age` (`userId`,`age`)

正例：

//Delete the userId index, because the combined index (A, B) is equivalentto creating the (A) and (A, B) indexesKEY `idx_userId_age` (`userId`,`age`)

理由：若保留重复的索引，那么优化程序在优化查询时也需要对其进行一一考量，这会影响性能。

12. 如果数据量很大，优化 modify或delete语句

避免同时修改或删除过多数据，因其将导致CPU利用率过高，从而影响他人对数据库的访问。

反例：

//Delete 100,000 or 1 million+ at a time? 
delete from user where id <100000;//Or use single cycle operation, lowefficiency and long time 
for（User user：list）{ 
   delete from user；}

正例：

//Delete in batches, such as 500 each timedelete user where id<500; 
delete product where id>=500 and id<1000；

理由：一次删除过多数据，可能会导致lock wait timeout exceed error(锁定等待超时错误)，因此建议分批操作。

13. 使用explain分析SQL方案

在日常开发中编写SQL时，尝试养成习惯：使用explain来分析自己编写的SQL，尤其是索引。

explain select * from user where userid = 10086 or age =18;

14. 尝试用union all代替union

如果搜索结果里没有重复的记录，我推荐用union all代替union。

反例：

select * from user where userid=1 
union 
select * from user where age = 10

正例：

select * from user where userid=1 
union all 
select * from user where age = 10

理由：

如果使用union，则无论有没有重复的搜索结果，它都会尝试对其进行合并、排序，然后输出最终结果。
若已知搜索结果中没有重复记录，用union all代替union将提高效率。

15. 尽可能使用数字字段。如果字段仅包含数字信息，尝试不将其设置为字符类型。

反例：

king_id` varchar（20） NOT NULL;

正例：

king_id` int(11) NOT NULL;

理由：与数字字段相比，字符类型将降低查询和连接的性能，并会增加存储开销。

16. 尽可能用varchar或nvarchar代替char或nchar

反例：

deptName` char(100) DEFAULT NULL

正例：

deptName` varchar(100) DEFAULT NULL

理由：

首先，由于可变长度字段的存储空间很小，该方法可以节省存储空间。
其次，对于查询而言，在相对较小的字段中搜索会更有效率。

优化和加速SQL查询是门技术活，常常思考和尝试，你会打开新世界的大门。

【责任编辑：赵宁宁 TEL：（010）68476606】

About The Author

bjmayor

程序员，码农，php,python,ios,android,go，产品经理，创业。

M	T	W	T	F	S	S
« Jan
				1	2	3
4	5	6	7	8	9	10
11	12	13	14	15	16	17
18	19	20	21	22	23	24
25	26	27	28	29	30	31

1. 尝试不去用select *来查询SQL，而是选择专用字段。

2. 如果已知只有一个查询结果，推荐使用limit 1

3. 尝试避免在 where 子句中使用or来连接条件

4. 尽可能避免在where子句中使用!=或<>运算符，否则，引擎将放弃使用索引并执行全表扫描。

5. 优化limit分页

6. 优化like语句

7. 使用where条件限制将要查询的数据来避免返回额外行

8. 考虑在where子句中使用默认值而不是null

9. 如果插入数据过多，可以考虑批量插入

10. 谨慎使用distinct关键词

11. 删除多余和重复的索引

12. 如果数据量很大，优化 modify或delete语句

13. 使用explain分析SQL方案

14. 尝试用union all代替union

15. 尽可能使用数字字段。如果字段仅包含数字信息，尝试不将其设置为字符类型。

16. 尽可能用varchar或nvarchar代替char或nchar

Related Posts

About The Author

bjmayor