MySQL 故障诊断:一个 ALTER TALBE 执行了很久,你慌不慌?

先了解下 MySQL 数据字典

当我们对一张大表执行了一个 ALTER TABLE 操作,执行了很久,也不知道是否执行完成,进程在那挂着,此时的你,干瞪眼,进度看不到,进程不敢杀,就问你慌不慌?

在具体解决这个慌乱之前,我们先了解下 MySQL 的数据字典。

数据字典我们用一句话来概括,就是数据的数据,用于查询数据库中数据的信息内容。

MySQL 在 5.7 开始,对数据字典的使用有了很大的改进,使用上更加的方便,提供的能力也更高。performance_schema 可以查询事务信息、获取元数据锁、跟踪事件、统计内存使用情况等等。

到这里你是不是发现了什么?

performance_schema 这个是什么,是不是用它来让我们解除慌张?

有关 MySQL 数据字典,可以看我的另外一个 Chat:

MySQL 地基基础:数据字典

使用一些重要的重要功能

既然了解到 MySQL 数据字典可以帮助我们,那它是如何实现呢?

我们先看看官网是如何解释的

https://dev.mysql.com/doc/refman/5.7/en/monitor-alter-table-performance-schema.html

You can monitor ALTER TABLE progress for InnoDB tables using Performance Schema.

There are seven stage events that represent different phases of ALTER TABLE. Each stage event reports a running total of WOR_COMPLETED and WORK_ESTIMATED for the overall ALTER TABLE operation as it progresses through its different phases. WORK_ESTIMATED is calculated using a formula that takes into account all of the work that ALTER TABLE performs, and may be revised during ALTER TABLE processing. WORK_COMPLETED and WORK_ESTIMATED values are an abstract representation of all of the work performed by ALTER TABLE.

大意就是我们可以在 Performance Schema 看到 ALTER TABLE 的进度,包括执行的时间,大概剩余的时间。

想要使用这个能力,我们需要开启几个功能。

Enable the stage/innodb/alter% instruments

mysql> UPDATE performance_schema.setup_instruments
    ->        SET ENABLED = 'YES'
    ->        WHERE NAME LIKE 'stage/innodb/alter%';
Query OK, 0 rows affected (0.01 sec)
Rows matched: 7  Changed: 0  Warnings: 0

Enable the stage event consumer tables

mysql> UPDATE performance_schema.setup_consumers
    ->        SET ENABLED = 'YES'
    ->        WHERE NAME ='events_stages_current';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE performance_schema.setup_consumers
    ->        SET ENABLED = 'YES'
    ->        WHERE NAME ='events_stages_history';
Query OK, 1 row affected (0.01 sec)
Rows matched: 1  Changed: 1  Warnings: 0

mysql> UPDATE performance_schema.setup_consumers
    ->        SET ENABLED = 'YES'
    ->        WHERE NAME ='events_stages_history_long';
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

功能开启了,接下来我们进行直观的验证环节。

直观的观察事件执行进度

首先,我们有一张大表(你可以用 SysBench 建一个,或者其他各种途径都可以),这里我已经有一张大表 sbtest.sbtest1,表结构如下:

mysql> desc sbtest.sbtest1;
+-------+-----------+------+-----+---------+----------------+
| Field | Type      | Null | Key | Default | Extra          |
+-------+-----------+------+-----+---------+----------------+
| id    | int(11)   | NO   | PRI | NULL    | auto_increment |
| k     | int(11)   | NO   | MUL | 0       |                |
| c     | char(120) | NO   |     |         |                |
| pad   | char(60)  | NO   |     |         |                |
+-------+-----------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

数据量 500W:

mysql> select count(*) from  sbtest.sbtest1;
+----------+
| count(*) |
+----------+
|  5000000 |
+----------+
1 row in set (0.67 sec)

新增一个字段:

mysql> alter table sbtest.sbtest1 add d char(20);

重头戏来了,查看一下进度:

mysql> select * from performance_schema.events_stages_current\G;
*************************** 1. row ***************************
         THREAD_ID: 28
          EVENT_ID: 14
      END_EVENT_ID: NULL
        EVENT_NAME: stage/innodb/alter table (read PK and internal sort)
            SOURCE: 
       TIMER_START: 159726265417733000
         TIMER_END: 159819571346680000
        TIMER_WAIT: 93305928947000
    WORK_COMPLETED: 118256
    WORK_ESTIMATED: 302958
  NESTING_EVENT_ID: 13
NESTING_EVENT_TYPE: STATEMENT
1 row in set (0.00 sec)

......

mysql> select * from performance_schema.events_stages_current\G;
*************************** 1. row ***************************
         THREAD_ID: 28
          EVENT_ID: 14
      END_EVENT_ID: NULL
        EVENT_NAME: stage/innodb/alter table (read PK and internal sort)
            SOURCE: 
       TIMER_START: 159726265417733000
         TIMER_END: 159910492100061000
        TIMER_WAIT: 184226682328000
    WORK_COMPLETED: 230688
    WORK_ESTIMATED: 302958
  NESTING_EVENT_ID: 13
NESTING_EVENT_TYPE: STATEMENT
1 row in set (0.01 sec)

多执行几次,发现数据是有变化的,这些内容代表了什么呢?

  • THREAD_ID:线程 ID
  • EVENT_ID:事件 ID
  • END_EVENT_ID:结束事件 ID
  • EVENT_NAME:事件名称,说明了当前执行的事件
  • SOURCE:源码位置
  • TIMER_START:事件开始时间(皮秒)
  • TIMER_END:事件结束时间(皮秒,如果没有执行完成,时间就是当前之间)
  • TIMER_WAIT:事件等待事件(皮秒)
  • WORK_COMPLETED:任务完成情况
  • WORK_ESTIMATED:任务估算情况
  • NESTING_EVENT_ID:事件对应的父事件 ID
  • NESTING_EVENT_TYPE:父事件类型(STATEMENT、STAGE、WAIT)

收下这个常用的 SQL

  1. 查看事件任务完成情况:
mysql> SELECT pt.INFO, ec.THREAD_ID, ec.EVENT_NAME, ec.WORK_COMPLETED, ec.WORK_ESTIMATED, pt.STATE FROM performance_schema.events_stages_current ec left join performance_schema.threads th on ec.thread_id = th.thread_id left join information_schema.PROCESSLIST pt on th.PROCESSLIST_ID = pt.ID where pt.INFO like 'ALTER%';
+-------------------------------------------+-----------+------------------------------------------------------+----------------+----------------+----------------+
| INFO                                      | THREAD_ID | EVENT_NAME                                           | WORK_COMPLETED | WORK_ESTIMATED | STATE          |
+-------------------------------------------+-----------+------------------------------------------------------+----------------+----------------+----------------+
| alter table sbtest.sbtest1 add d char(20) |        28 | stage/innodb/alter table (read PK and internal sort) |         201496 |         308223 | altering table |
+-------------------------------------------+-----------+------------------------------------------------------+----------------+----------------+----------------+
1 row in set (0.25 sec)
  1. 查看任务完成事件:
mysql> select stmt.sql_text as sql_text, concat(work_completed, '/' , work_estimated) as progress, (stage.timer_end - stmt.timer_start) / 1e12 as current_seconds, (stage.timer_end - stmt.timer_start) / 1e12 * (work_estimated-work_completed) / work_completed as remaining_seconds from performance_schema.events_stages_current stage, performance_schema.events_statements_current stmt where stage.thread_id = stmt.thread_id and stage.nesting_event_id = stmt.event_id;
+-------------------------------------------+---------------+-----------------+--------------------+
| sql_text                                  | progress      | current_seconds | remaining_seconds  |
+-------------------------------------------+---------------+-----------------+--------------------+
| alter table sbtest.sbtest1 add d char(20) | 135192/308223 |   102.461512532 | 131.13954949201502 |
+-------------------------------------------+---------------+-----------------+--------------------+
1 row in set (0.00 sec)

总结

这样我们通过 MySQL 的这个数据字典就可以很直观地看到 ALTER 的执行情况了,当你看到这样的执行进度,是不是就不那么慌了。