各位用户为了找寻关于简单谈谈MySQL的loose index scan的资料费劲了很多周折。这里教程网为您整理了关于简单谈谈MySQL的loose index scan的相关资料,仅供查阅,以下为您介绍关于简单谈谈MySQL的loose index scan的详细内容
众所周知,InnoDB采用IOT(index organization table)即所谓的索引组织表,而叶子节点也就存放了所有的数据,这就意味着,数据总是按照某种顺序存储的。所以问题来了,如果是这样一个语句,执行起来应该是怎么样的呢?语句如下:
? 1select
count
(
distinct
a)
from
table1;
列a上有一个索引,那么按照简单的想法来讲,如何扫描呢?很简单,一条一条的扫描,这样一来,其实做了一次索引全扫描,效率很差。这种扫描方式会扫描到很多很多的重复的索引,这样说的话优化的办法也是很容易想到的:跳过重复的索引就可以了。于是网上能搜到这样的一个优化的办法:
? 1select
count
(*)
from
(
select
distinct
a
from
table1) t;
从已经搜索到的资料看,这样的执行计划中的extra就从using index变成了using index for group-by。
但是,但是,但是,好在我们现在已经没有使用5.1的版本了,大家基本上都是5.5以上了,这些现代版本,已经实现了loose index scan:
很好很好,就不需要再用这种奇技淫巧去优化SQL了。
文档里关于group by这里写的有点意思,说是最大众化的办法就是进行全表扫描并且创建一个临时表,这样执行计划就会难看的要命了,肯定有ALL和using temporary table了。
5.0之后group by在特定条件下可能使用到loose index scan,
? 1 2 3 4 5 6CREATE
TABLE
log_table (
id
INT
NOT
NULL
PRIMARY
KEY
,
log_machine
VARCHAR
(20)
NOT
NULL
,
log_time DATETIME
NOT
NULL
) ENGINE=InnoDB
DEFAULT
CHARSET=utf8;
CREATE
INDEX
ix_log_machine_time
ON
log_table (log_machine, log_time);
1
? 1 2SELECT
MAX
(log_time)
FROM
log_table;
SELECT
MAX
(log_time)
FROM
log_table
WHERE
log_machine
IN
(
'Machine 1'
);
这两条sql都只需一次index seek便可返回,源于索引的有序排序,优化器意识到min/max位于最左/右块,从而避免范围扫描; extra显示Select tables optimized away ; 2
代码如下: SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1','Machine 2','Machine 3','Machine 4');
执行计划type 为range(extra显示using where; using index),即执行索引范围扫描,先读取所有满足log_machine约束的记录,然后对其遍历找出max value; 改进
代码如下: SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 1','Machine 2','Machine 3','Machine 4') group by log_machine order by 1 desc limit 1;这满足group by选择loose index scan的要求,执行计划的extra显示using index for group-by,执行效果等值于
? 1 2 3 4
SELECT
MAX
(log_time)
FROM
log_table
WHERE
log_machine
IN
(‘Machine 1
')
Union
SELECT MAX(log_time) FROM log_table WHERE log_machine IN (‘Machine 2'
)
…..
即对每个log_machine执行loose index scan,rows从原来的82636下降为16(该表总共1,000,000条记录)。
Group by何时使用loose index scan?
适用条件:
1 针对单表操作 2 Group by使用索引的最左前缀列 3 只支持聚集函数min()/max() 4 Where条件出现的列必须为=constant操作 , 没出现在group by中的索引列必须使用constant 5 不支持前缀索引,即部分列索引 ,如index(c1(10)) 执行计划的extra应该显示using index for group-by 假定表t1有个索引idx(c1,c2,c3)
? 1 2 3 4 5 6 7 8SELECT
c1, c2
FROM
t1
GROUP
BY
c1, c2;
SELECT
DISTINCT
c1, c2
FROM
t1;
SELECT
c1,
MIN
(c2)
FROM
t1
GROUP
BY
c1;
SELECT
c1, c2
FROM
t1
WHERE
c1 < const
GROUP
BY
c1, c2;
SELECT
MAX
(c3),
MIN
(c3), c1, c2
FROM
t1
WHERE
c2 > const
GROUP
BY
c1, c2;
SELECT
c2
FROM
t1
WHERE
c1 < const
GROUP
BY
c1, c2;
SELECT
c1, c2
FROM
t1
WHERE
c3 = const
GROUP
BY
c1, c2
SELECT
c1, c3
FROM
t1
GROUP
BY
c1, c2;
--无法使用松散索引
而SELECT c1, c3 FROM t1 where c3= const GROUP BY c1, c2;则可以
紧凑索引扫描tight index scan Group by在无法使用loose index scan,还可以选择tight,若两者都不可选,则只能借助临时表; 扫描索引时,须读取所有满足条件的索引键,要么是全索引扫描,要么是范围索引扫描; Group by的索引列不连续;或者不是从最左前缀开始,但是where条件里出现最左列;
? 1 2SELECT
c1, c2, c3
FROM
t1
WHERE
c2 =
'a'
GROUP
BY
c1, c3;
SELECT
c1, c2, c3
FROM
t1
WHERE
c1 =
'a'
GROUP
BY
c2, c3;
5.6的改进 事实上,5.6的index condition push down可以弥补loose index scan缺失带来的性能损失。 KEY(age,zip)
? 1 2 3 4 5 6 7mysql> explain SELECT name FROM people WHERE age BETWEEN 18 AND 20 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
|
id
| select_type | table |
type
| possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
| 1 | SIMPLE | people | range | age | age | 4 | NULL | 90556 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+-------+-------------+
1 row
in
set
(0.01 sec)
根据key_len=4可以推测出sql只用到索引的第一列,即先通过索引查出满足age (18,20)的行记录,然后从server层筛选出满足zip约束的行; pre-5.6,对于复合索引,只有当引导列使用"="时才有机会在索引扫描时使用到后面的索引列。
? 1 2 3 4 5 6 7mysql> explain SELECT name FROM people WHERE age=18 AND zip IN (12345,12346, 12347);
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
|
id
| select_type | table |
type
| possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
| 1 | SIMPLE | people | range | age | age | 8 | NULL | 3 | Using where |
+----+-------------+--------+-------+---------------+------+---------+------+------+-------------+
1 row
in
set
(0.00 sec)
对比一下查询效率
? 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15mysql> SELECT sql_no_cache name FROM people WHERE age=19 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
1 row
in
set
(0.06 sec)
mysql> SELECT SQL_NO_CACHE name FROM people WHERE age BETWEEN 18 AND 22 AND zip IN (12345,12346, 12347);
+----------------------------------+
| name |
+----------------------------------+
| ed4481336eb9adca222fd404fa15658e |
| 888ba838661aff00bbbce114a2a22423 |
+----------------------------------+
2 rows
in
set
(1 min 56.09 sec)
对于第二条sql,可以使用union改写,
? 1 2 3 4 5 6 7 8 9mysql> SELECT name FROM people WHERE age=18 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=19 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=20 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=21 AND zip IN (12345,12346, 12347)
-> UNION ALL
-> SELECT name FROM people WHERE age=22 AND zip IN (12345,12346, 12347);
而mysql5.6引入了index condition pushdown,从优化器层面解决了此类问题。