Or how wsrep_on can bring you to have a cluster with usless data.

redflag

This is a WARNING article, and it comes out after I have being working on define internal blueprint on how to perform DDL operation using RSU safely.

The fun, if fun we want to call it, comes as usual by the fact that I am a curious guy and I often do things my way and not always following the official instructions.

Anyhow, lets us go straight to the point and describe what can happen on ANY MySQL/Galera installation.

The environment

The test environment, MySQL/Galera (Percona PXC 5.6.20 version).

The cluster was based on three nodes local no geographic distribution, no other replication in place then Galera.

Haproxy on one application node, simple application writing in this table:

Table: tbtest1

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
CREATE TABLE: CREATE TABLE 'tbtest1' (
'auAPP1nc' bigint(11) NOT NULL AUTO_INCREMENT,
'a' int(11) NOT NULL,
'uuid' char(36) COLLATE utf8_bin NOT NULL,
'b' varchar(100) COLLATE utf8_bin NOT NULL,
'c' char(200) COLLATE utf8_bin NOT NULL,
'counter' bigint(20) DEFAULT NULL,
'time' timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
'partitionid' int(11) NOT NULL DEFAULT '0',
'date' date NOT NULL,
'strrecordtype' char(3) COLLATE utf8_bin DEFAULT NULL,
PRIMARY KEY ('auAPP1nc','partitionid'),
KEY 'IDX_a' ('a'),
KEY 'IDX_uuid' ('uuid')
) ENGINE=InnoDB AUTO_INCREMENT=482 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
 

 

Small app

1
2
3
4
5
6
7
8
9
10
11
12
13
14
#! /bin/bash --
i=1
while :
do
	echo "$i "
	mysql -ustress -pxxx -h192.168.0.35 -P 3307 -e "SET @HH=@@HOSTNAME;
insert into test.tbtest1 (a,uuid,b,c,strrecordtype,date,partitionid) 
values($i,UUID(),@HH,'a','APP1'|'APP2',now(),RAND()*100)";
	i=$((i + 1))
	if [ $i -eq 100 ]
	then
		break
	fi
	sleep 0.5;
done
 

 

 

Server Information

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
(root@localhost:pm) [(none)]>\s SHOW global STATUS LIKE 'wsrep_provider_version';
--------------
/home/mysql/templates/PCX-56/bin/mysql Ver 14.14 Distrib 5.6.20-68.0, FOR Linux (i686) USING EditLine wrapper
Connection id:	90
Current DATABASE:
Current user:	root@localhost
SSL:	NOT IN USE
Current pager:	stdout
USING OUTFILE:	''
USING delimiter:	;
Server version:	5.6.20-68.0-25.7-log Percona XtraDB Cluster BINARY (GPL) 5.6.20-25.7, Revision 886, wsrep_25.7.r4126
Protocol version:	10
Connection:	Localhost via UNIX socket
Server characterset:	utf8
Db characterset:	utf8
Client characterset:	utf8
Conn. characterset:	utf8
UNIX socket:	/home/mysql/instances/galera1-56/mysql.sock
Uptime:	2 min 38 sec
Threads: 3 Questions: 282 Slow queries: 0 Opens: 78 FLUSH TABLES: 3 Open TABLES: 8 Queries per second avg: 1.784
--------------
+------------------------+---------------+
| Variable_name | Value |
+------------------------+---------------+
| wsrep_provider_version | 3.7(r7f44a18) |
+------------------------+---------------+
1 row IN SET (0.01 sec)
 

 

 

Facts

In MySQL/Galera there is variable that allow us to say to the server to do not replicate. This variable is wsrep_on and when we set it as OFF the server will not replicate any statement to the other node.

This is quite useful when in the need to perform actions on an single node, like when you need to perform DDL on RSU mode.

But this flexibility can bite you quite badly.

I had done a simple small change to the widely use command:

 

SET wsrep_on=OFF;

 

I just add GLOBAL:

SET GLOBAL wsrep_on=OFF;

 

 

To be honest I was expecting to have the command rejected, but no it was accept and this is what happened:

I had run the small loop (see above) on two application servers, one pointing to HAProxy and writing APP1 in the field strrecordtype, the other pointing directly to the node where I will issue the command with wsrep_on inserting APP2.

The results:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
(root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS APP2_COUNTER FROM tbtest1 
WHERE strrecordtype='APP2';
select count(*) AS APP1_COUNTER FROM tbtest1 WHERE strrecordtype='APP1';
+---------------+
| @@HOSTNAME    |
+---------------+
| tusacentral03 |
+---------------+
1 row IN SET (0.00 sec)
+-------------+
|APP2_COUNTER |
+-------------+
| 99          |
+-------------+
1 row IN SET (0.00 sec)
+-------------+
|APP1_COUNTER |
+-------------+
| 99          |
+-------------+
1 row IN SET (0.00 sec)
(root@localhost:pm) [test]>
(root@localhost:pm) [test]>SET GLOBAL wsrep_on=OFF; <------------- It should not be GLOBAL
(root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS APP2_COUNTER FROM tbtest1 
WHERE strrecordtype='APP2';
select count(*) AS APP1_COUNTER FROM tbtest1 WHERE strrecordtype='APP1';
+---------------+
| @@HOSTNAME    |
+---------------+
| tusacentral01 |
+---------------+
1 row IN SET (0.00 sec)
+-------------+
|APP2_COUNTER |
+-------------+
| 0 |
+-------------+
1 row IN SET (0.00 sec)
+-------------+
|APP1_COUNTER |
+-------------+
| 66              | <-------------------- 1/3 lost because HAProxy think that the node is ok ...
+-------------+
1 row IN SET (0.00 sec)

 

 

As you can see in the tusacentral03 (which is the one where I issue SET GLOBAL wsrep_ON=OFF), I have ALL the records inserted in the local node and ALL the records coming from the others node.

But on the node tusacentral01, I had NO records related to APP2, but more relevant I had lost 1/3 of my total inserts.

Why?

Well this is quite clear, and unfortunately is by design.

If I issue wsrep_ON=OFF with GLOBAL the server will apply the setting to ALL sessions, meaning all session on that will STOP to replicate.

In the source code the section relevant to this is quite clear:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
#wsrep_mysqld.cc
#line 1395
int wsrep_to_isolation_begin(THD *thd, char *db_, char *table_,
                             const TABLE_LIST* table_list)
{
 
  /*
    No isolation for applier or replaying threads.
   */
  if (thd->wsrep_exec_mode == REPL_RECV) return 0;
 
  int ret= 0;
  mysql_mutex_lock(&thd->LOCK_wsrep_thd);
 
  if (thd->wsrep_conflict_state == MUST_ABORT)
  {
    WSREP_INFO("thread: %lu, %s has been aborted due to multi-master conflict",
               thd->thread_id, thd->query());
    mysql_mutex_unlock(&thd->LOCK_wsrep_thd);
    return WSREP_TRX_FAIL;
  }
  mysql_mutex_unlock(&thd->LOCK_wsrep_thd);
 
  DBUG_ASSERT(thd->wsrep_exec_mode == LOCAL_STATE);
  DBUG_ASSERT(thd->wsrep_trx_meta.gtid.seqno == WSREP_SEQNO_UNDEFINED);
 
  if (thd->global_read_lock.can_acquire_protection())
  {
    WSREP_DEBUG("Aborting APP1: Global Read-Lock (FTWRL) in place: %s %lu",
                thd->query(), thd->thread_id);
    return -1;
  }
 
  if (wsrep_debug && thd->mdl_context.has_locks())
  {
    WSREP_DEBUG("thread holds MDL locks at TI begin: %s %lu",
                thd->query(), thd->thread_id);
  }
 
  /*
    It makes sense to set auto_increment_* to defaults in APP1 operations.
    Must be done before wsrep_APP1_begin() since Query_log_event encapsulating
    APP1 statement and auto inc variables for wsrep replication is constructed
    there. Variables are reset back in THD::reset_for_next_command() before
    processing of next command.
   */
  if (wsrep_auto_increment_control)
  {
    thd->variables.auto_increment_offset = 1;
    thd->variables.auto_increment_increment = 1;
  }
 
  if (thd->variables.wsrep_on && thd->wsrep_exec_mode==LOCAL_STATE) <------- Here we have a check for wsrep_on 
  {
    switch (wsrep_OSU_method_options) {
    case WSREP_OSU_APP1: ret =  wsrep_APP1_begin(thd, db_, table_,
                                               table_list); break;
    case WSREP_OSU_APP2: ret =  wsrep_APP2_begin(thd, db_, table_); break;
    }
    if (!ret)
    {
      thd->wsrep_exec_mode= TOTAL_ORDER;
    }
  }
  return ret;
}
enum wsrep_exec_mode {
    LOCAL_STATE,
    REPL_RECV,
    TOTAL_ORDER,
    LOCAL_COMMIT
};
 

 

 

So what happen is that the server check if the thd object has that variable ON and has LOCAL_STATE, if so it replicates, if not it does nothing.

But as said while this makes sense in the SESSION scope, it does not in the GLOBAL.

 

Not only, setting wsrep_on to OFF in global scope does NOT trigger any further action from MySQL/Galera, like for instance the possible FACT that the node could be desynchronize from the remaining cluster.

The interesting effect of this is that HAProxy has NO WAY to know that the node had stop to replicate, and as such the server can receive the requests but those will not replicate to the other node causing data diversion.

 

You can say, that a DBA SHOULD know what he is doing, and as such he/her should be MANUALLY desync the node and then issue the command.

My point instead is that I don't see ANY good reason to have wsrep_on as global variable; instead I see this as a very dangerous and conceptually wrong "feature".

 

Browsing the Codership manual, I noticed that the wsrep_on variable comes with the "L" flag, meaning that the variable is NOT suppose to be GLOBAL.

But it is ...

I also had dig in the code and:

1
2
3
4
5
6
7
8
9
10
11
12
wsrep_var.cc
#line58
 
bool wsrep_on_update (sys_var *self, THD* thd, enum_var_type var_type)
{
  if (var_type == OPT_GLOBAL) {
    // FIXME: this variable probably should be changed only per session
    thd->variables.wsrep_on = global_system_variables.wsrep_on;
  }
  return false;
}
 

 

That is interesting isn't it?

Wondering when this comment was inserted and why it was ignored.

 

Anyhow the source of all problems is here in the wsrep_on variable definition:

1
2
3
4
5
6
7
static Sys_var_mybool Sys_wsrep_on (
       "wsrep_on", "To enable wsrep replication ",
       SESSION_VAR(wsrep_on),                      <----------------------- This allow global 
       CMD_LINE(OPT_ARG), DEFAULT(TRUE), 
       NO_MUTEX_GUARD, NOT_IN_BINLOG, ON_CHECK(0),
       ON_UPDATE(wsrep_on_update));
 

 

The variable was defined as SESSION_VAR instead of SESSION_ONLY, and as such used also in global scope.

 

As already state, this is from my point of view a conceptual error not a bug, but something that should not exists at all, because in a cluster where I have data certify/replicate/synchronize there should NOT be any option for a DBA/user to bypass at GLOBAL level the data validation/replication process.

 

To note, and to make things worse, after I had done the test I can easily set wsrep_on back, and my node will continue to act as part of the cluster as if all the nodes are equal, while they are not.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
(root@localhost:pm) [test]>select @@HOSTNAME;select count(*) AS RSU_COUNTER FROM tbtest1 
WHERE strrecordtype='RSU';
select count(*) AS TOI_COUNTER FROM tbtest1 WHERE strrecordtype='TOI';
+---------------+
| @@HOSTNAME    |
+---------------+
| tusacentral03 |
+---------------+
1 row IN SET (0.00 sec)
 
+-------------+
| RSU_COUNTER |
+-------------+
|         181 |
+-------------+
1 row IN SET (0.00 sec)
 
+-------------+
| TOI_COUNTER |
+-------------+
|         177 |
+-------------+
1 row IN SET (0.00 sec)
 
+---------------+
| @@HOSTNAME    |
+---------------+
| tusacentral01 |
+---------------+
1 row IN SET (0.00 sec)
 
+-------------+
| RSU_COUNTER |
+-------------+
|          77 |
+-------------+
1 row IN SET (0.00 sec)
 
+-------------+
| TOI_COUNTER |
+-------------+
|         139 |
+-------------+
 

 

As you can see the cluster continue to insert data using HAProxy and all the node, but it has a data set that is inconsistent.

Conclusions

  • Never use SET GLOBAL with wsrep_on
  • IF you are so crazy to do so, be sure no one is writing on the node.
  • I am sure this is a mistake in the logic and as such this variable should be change from the source, in the code defining the variable SESSION_ONLY and not SESSION_VAR
    Or wsrep_on can damage you quite badly.

Latest conferences

We have 8656 guests and no members online