The previous article described how to recover when the slave SQL thread of a MySQL replication setup stops. One of the methods used sql_slave_skip_counter to make the SQL thread skip the event that triggered the error. This article takes a brief look at how sql_slave_skip_counter is used and what it actually means.

set global sql_slave_skip_counter = N

This statement skips the next N events from the master.

(That is, it skips N events. The key point is to understand what an event is: in MySQL, the binary log is a series of events, and those events are organized into groups. For transactional tables a group is a transaction.)

Setting global sql_slave_skip_counter = N on the slave skips the next N events coming from the master, which is useful for resuming replication that was halted by a single SQL statement. The statement is valid only while the slave threads are stopped; otherwise it raises an error. Each skipped event decrements N by one, until N reaches 0.

When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.
For transactional tables, an event group corresponds to a transaction.
For nontransactional tables, an event group corresponds to a single SQL statement.
Note: A single transaction can contain changes to both transactional and nontransactional tables.
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.

### comment ###
Setting this variable isn't like setting other server variables: you can't read the variable back again as @@sql_slave_skip_counter, and it isn't really a "global variable." Rather, it's a variable that only the slave thread reads. When you restart the slave threads again with START SLAVE, the slave skips statements and decrements the variable until it reaches 0, at which point it begins executing statements again. You can watch this happening by executing SHOW SLAVE STATUS, where the variable's value appears in the Skip_Counter column. This is the only place you can see its value.

The effect is that the setting isn't persistent. If you set it to 1, start the slave, and the slave has an error in replication sometime later, the variable won't still be set to 1. It'll be 0. At that point, if you want the slave to skip the statement that caused the error, you'll have to set it to 1 again.
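The event-group rule quoted above is easier to see with a small model. The following Python sketch is not MySQL code: the function name skip_events and the event labels ("BEGIN", "INSERT id=1", "COMMIT") are made up for illustration, and a real binary log carries typed events rather than strings. It only models the documented behaviour that skipping N events never stops in the middle of a group.

```python
from typing import List, Tuple


def skip_events(groups: List[List[str]], n: int) -> Tuple[List[str], List[str]]:
    """Model sql_slave_skip_counter = n over a list of event groups.

    groups: event groups in binary-log order, each a list of event labels.
    Returns (skipped_events, events_still_to_execute).
    """
    # Flatten the log, remembering which group every event belongs to.
    flat = [(g, ev) for g, group in enumerate(groups) for ev in group]

    # Skip the first n events (or everything, if the log is shorter).
    pos = min(n, len(flat))

    # If we stopped in the middle of a group, keep skipping to its end.
    while 0 < pos < len(flat) and flat[pos][0] == flat[pos - 1][0]:
        pos += 1

    skipped = [ev for _, ev in flat[:pos]]
    remaining = [ev for _, ev in flat[pos:]]
    return skipped, remaining


# Two transactions on a transactional table: each one is a single event group.
log = [
    ["BEGIN", "INSERT id=1", "COMMIT"],
    ["BEGIN", "INSERT id=2", "COMMIT"],
]
skipped, remaining = skip_events(log, 1)
print(skipped)    # the whole first transaction, not just its first event
print(remaining)  # execution resumes with the second transaction
```

In this model, a counter of 1 against a transactional table drops the entire first transaction, whereas for a nontransactional table (one event per group) it drops exactly one statement.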
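For the syntax of "SET GLOBAL sql_slave_skip_counter", see the MySQL reference manual.
1 Using a command that includes stop slave
On the master, create a test table and insert data from the shell: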
mysql> create table tab_skip(id int);
Query OK, 0 rows affected (0.80 sec)
echo "insert into tab_skip(id) values($i)" | mysql -h127.0.0.1 test ;
On the slave, test with the command set global sql_slave_skip_counter=1;
echo "slave stop;set global sql_slave_skip_counter=1; slave start;show slave status\G" | mysql -h127.0.0.1 -P3306 test ;
[root@rac3 mysql]# mysql
mysql> use test;
Database changed
mysql> select count(1) from tab_1;
+----------+
| count(1) |
+----------+
| 100 |
+----------+
1 row in set (0.00 sec)
The slave is missing 10 rows! This is exactly because executing set global sql_slave_skip_counter=1; made the slave ignore events while replaying the SQL from the master.
[root@rac4 mysql]# mysql
mysql> use test;
Database changed
mysql> select count(1) from tab_1;
+----------+
| count(1) |
+----------+
| 90 |
+----------+
1 row in set (0.00 sec)
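The two counts show how many rows the slave lost, but not which ones. A minimal sketch of how the gap could be pinned down, assuming the id values from both servers have already been fetched into Python lists (the fetching itself is not shown). The function name missing_on_replica is hypothetical and the sample ids are invented for illustration; they are not the rows from the test above.

```python
from typing import Iterable, List


def missing_on_replica(master_ids: Iterable[int], replica_ids: Iterable[int]) -> List[int]:
    """Return the ids present on the master but absent on the replica, sorted."""
    return sorted(set(master_ids) - set(replica_ids))


# Illustrative data only: 100 ids on the master, 10 of them absent on the replica.
master = list(range(1, 101))
replica = [i for i in master if i % 10 != 0]

lost = missing_on_replica(master, replica)
print(len(lost), "rows missing:", lost)
```

Once the missing ids are known, they can be re-inserted on the slave by hand, which is usually the follow-up step after skipping events.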
Some users have tested running the command on the slave without the stop slave statement, but version 5.5.18 does not allow this!
[root@rac3 mysql]# for i in {1..100}; do echo $i; echo "insert into tab_2(id) values($i)" | mysql -h127.0.0.1 test ; sleep 2;done;
1
Run on the slave. Note: "set global sql_slave_skip_counter=1; slave start;show slave status\G" contains no stop slave statement, so it fails with an error!
[root@rac4 mysql]# for i in {1..10}; do echo $i; echo "set global sql_slave_skip_counter=1; slave start;show slave status\G" | mysql -h127.0.0.1 -P3306 test ; sleep 2;done;
1
ERROR 1198 (HY000) at line 1: This operation cannot be performed with a running slave; run STOP SLAVE first
2
ERROR 1198 (HY000) at line 1: This operation cannot be performed with a running slave; run STOP SLAVE first
3
ERROR 1198 (HY000) at line 1: This operation cannot be performed with a running slave; run STOP SLAVE first
4
ERROR 1198 (HY000) at line 1: This operation cannot be performed with a running slave; run STOP SLAVE first
5
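The behaviour in this transcript, together with the earlier remark that the counter is not persistent, can be summarised in a toy model. This Python sketch is only an illustration under simplifying assumptions: the class ReplicaModel and its methods are hypothetical, every statement is treated as its own single-event group, and the error text is copied from the output above.

```python
class ReplicaModel:
    """Toy model of a slave SQL thread and its skip counter (not MySQL code)."""

    def __init__(self) -> None:
        self.running = True      # the SQL thread starts out running
        self.skip_counter = 0
        self.applied = []        # statements that were actually executed

    def stop(self) -> None:
        self.running = False

    def start(self) -> None:
        self.running = True

    def set_skip_counter(self, n: int) -> None:
        # Mirrors ERROR 1198: the counter may only be set while stopped.
        if self.running:
            raise RuntimeError(
                "ERROR 1198 (HY000): This operation cannot be performed "
                "with a running slave; run STOP SLAVE first"
            )
        self.skip_counter = n

    def receive(self, statement: str) -> bool:
        """Skip the statement while the counter is above 0, else apply it."""
        if self.skip_counter > 0:
            self.skip_counter -= 1
            return False
        self.applied.append(statement)
        return True


replica = ReplicaModel()
try:
    replica.set_skip_counter(1)          # fails: the thread is still running
except RuntimeError as err:
    print(err)

replica.stop()
replica.set_skip_counter(1)              # allowed once stopped
replica.start()
replica.receive("insert 1")              # skipped, counter drops back to 0
replica.receive("insert 2")              # applied normally
print(replica.skip_counter, replica.applied)
```

After the first skipped statement the counter is back at 0, so a later replication error would require setting it again.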
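This parameter can get replication going again after the slave SQL thread has stopped and left the databases out of sync, but it carries some risk: the skipped events are never applied on the slave, so in a high-concurrency environment, for example, it can lead to data loss!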
See also another user's write-up (the results differ somewhat: in his case stop slave was not required).