ProxySQL and MHA integration
MHA (Master High Availability Manager and tools for MySQL) integrates almost seamlessly with ProxySQL.
This means that you can count on the standard MHA features to manage the failover, and on ProxySQL to manage the traffic and shift it from one server to another.
This is one of the main differences between using MHA with a VIP and using MHA with ProxySQL.
There is no need to move IPs or re-define DNS.
The following is an example of an MHA configuration file for use with ProxySQL:
[server default]
user=mha
password=mha
ssh_user=root
repl_password=replica
manager_log=/tmp/mha.log
manager_workdir=/tmp
remote_workdir=/tmp
master_binlog_dir=/opt/mysql_instances/mha1/logs
client_bindir=/opt/mysql_templates/mysql-57/bin
client_libdir=/opt/mysql_templates/mysql-57/lib
master_ip_failover_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_failover
master_ip_online_change_script=/opt/tools/mha/mha4mysql-manager/samples/scripts/master_ip_online_change
log_level=debug

[server1]
hostname=mha1r
ip=192.168.1.104
candidate_master=1

[server2]
hostname=mha2r
ip=192.168.1.107
candidate_master=1

[server3]
hostname=mha3r
ip=192.168.1.111
candidate_master=1

[server4]
hostname=mha4r
ip=192.168.1.109
no_master=1
The only thing you need to be sure of is to comment out the "FIX ME" lines in the sample scripts.
In the MHA installation directory, look for:
mha4mysql-manager/samples/scripts/master_ip_failover
mha4mysql-manager/samples/scripts/master_ip_online_change
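In the stock samples those placeholders typically appear as bare FIXME_xxx calls (undefined subroutines that make the script abort). A quick, hedged way to comment them out is something like:

sed -i 's/FIXME_xxx/#FIXME_xxx/' mha4mysql-manager/samples/scripts/master_ip_failover
sed -i 's/FIXME_xxx/#FIXME_xxx/' mha4mysql-manager/samples/scripts/master_ip_online_change

Check the result by hand afterwards, since the exact placeholder text can differ between MHA versions.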
After that, just install MHA as usual.
In ProxySQL, be sure to have the mha users and the servers set.
One very important thing when using ProxySQL with standard replication is that the ProxySQL monitor user needs an additional privilege.
It must have REPLICATION CLIENT, or it will fail to check the slave lag.
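For example, assuming the ProxySQL monitor account is the default one (user 'monitor') and already exists on the MySQL servers, the extra privilege can be granted with something like:

GRANT USAGE, REPLICATION CLIENT ON *.* TO 'monitor'@'%';

Adjust the user and host to match your own monitor configuration.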
Also, the servers MUST have a defined value for the attribute max_replication_lag, or the check will be ignored.
As a reminder:
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',600,3306,1000,0);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.104',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.107',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.111',601,3306,1000,10);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_replication_lag) VALUES ('192.168.1.109',601,3306,1000,10);
INSERT INTO mysql_replication_hostgroups VALUES (600,601);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('mha_W',600,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active) VALUES('mha_R',601,1);
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES('mha_RW',600,1,3,'^SELECT.*FOR UPDATE');
INSERT INTO mysql_query_rules (username,destination_hostgroup,active,retries,match_digest) VALUES('mha_RW',601,1,3,'^SELECT');
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;

INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('mha_W','test',1,600,'test_mha',1);
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('mha_R','test',1,601,'test_mha',1);
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('mha_RW','test',1,600,'test_mha',1);
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
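A quick sanity check from the ProxySQL admin interface, just to confirm the servers and the writer/reader hostgroups made it to runtime (a sketch, using the 600/601 hostgroups defined above):

SELECT hostgroup_id, hostname, port, status, max_replication_lag FROM runtime_mysql_servers WHERE hostgroup_id IN (600,601);
SELECT * FROM mysql_replication_hostgroups;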
OK, all is ready, let us rock'n'roll.
Controlled fail-over

First of all, masterha_manager should not be running or you will get an error.
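A quick way to verify that, and stop it if needed, using the standard MHA tools:

masterha_check_status --conf=/etc/mha.cnf
masterha_stop --conf=/etc/mha.cnf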
Then let us start some traffic.
Write:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=off --oltp-dist-type=uniform --oltp-reconnect-mode=transaction --oltp-skip-trx=off --num-threads=10 --report-interval=10 --mysql-ignore-errors=all run

Read only:
sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --mysql-host=192.168.1.50 --mysql-port=3311 --mysql-user=mha_RW --mysql-password=test --mysql-db=mha_test --db-driver=mysql --oltp-tables-count=50 --oltp-table-size=5000 --max-requests=0 --max-time=900 --oltp-point-selects=5 --oltp-read-only=on --num-threads=10 --oltp-reconnect-mode=query --oltp-skip-trx=on --report-interval=10 --mysql-ignore-errors=all run
Let it run for a bit then check:
mysql> SELECT * FROM stats_mysql_connection_pool WHERE hostgroup BETWEEN 600 AND 601 ORDER BY hostgroup,srv_host DESC;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.104 | 3306     | ONLINE | 10       | 0        | 20     | 0       | 551256  | 44307633        | 0               | 285        | <--- current Master
| 601       | 192.168.1.111 | 3306     | ONLINE | 5        | 3        | 11     | 0       | 1053685 | 52798199        | 4245883580      | 1133       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 3        | 5        | 10     | 0       | 1006880 | 50473746        | 4052079567      | 369        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1040524 | 52102581        | 4178965796      | 604        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 7        | 1        | 16     | 0       | 987548  | 49458526        | 3954722258      | 285        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
And now let us try to perform the failover.
To do so, we will instruct MHA to do a switch, and to set the OLD master as a new slave:
masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0
Now let us check what happened:
[ 160s] threads: 10, tps: 354.50, reads: 3191.10, writes: 1418.50, response time: 48.96ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 10, tps: 322.50, reads: 2901.98, writes: 1289.89, response time: 55.45ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 10, tps: 304.60, reads: 2743.12, writes: 1219.91, response time: 58.09ms (95%), errors: 0.10, reconnects: 0.00 <--- moment of the switch
[ 190s] threads: 10, tps: 330.40, reads: 2973.40, writes: 1321.00, response time: 50.52ms (95%), errors: 0.00, reconnects: 0.00
[ 200s] threads: 10, tps: 304.20, reads: 2745.60, writes: 1217.60, response time: 58.40ms (95%), errors: 0.00, reconnects: 1.00
[ 210s] threads: 10, tps: 353.80, reads: 3183.80, writes: 1414.40, response time: 48.15ms (95%), errors: 0.00, reconnects: 0.00
Check ProxySQL:
mysql> SELECT * FROM stats_mysql_connection_pool WHERE hostgroup BETWEEN 600 AND 601 ORDER BY hostgroup,srv_host DESC;
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | Queries | Bytes_data_sent | Bytes_data_recv | Latency_ms |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
| 600       | 192.168.1.107 | 3306     | ONLINE | 10       | 0        | 10     | 0       | 123457  | 9922280         | 0               | 658        | <--- new master
| 601       | 192.168.1.111 | 3306     | ONLINE | 2        | 6        | 14     | 0       | 1848302 | 91513537        | 7590137770      | 1044       |
| 601       | 192.168.1.109 | 3306     | ONLINE | 5        | 3        | 12     | 0       | 1688789 | 83717258        | 6927354689      | 220        |
| 601       | 192.168.1.107 | 3306     | ONLINE | 3        | 5        | 13     | 0       | 1834415 | 90789405        | 7524861792      | 658        |
| 601       | 192.168.1.104 | 3306     | ONLINE | 6        | 2        | 24     | 0       | 1667252 | 82509124        | 6789724589      | 265        |
+-----------+---------------+----------+--------+----------+----------+--------+---------+---------+-----------------+-----------------+------------+
In this case the servers were not behind the master and the switch happened quite fast.
We can see that the WRITE operations, which normally are an issue given the need to move a VIP around or change name resolution, have only a very limited hiccup.
Read operations were not affected at all.
Nice eh?
Do you know how long it takes to do a switch in these conditions?

real 0m2.710s

Yes, 2.7 seconds.
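If you want to reproduce the measurement, just wrap the switch command seen above with time:

time masterha_master_switch --master_state=alive --conf=/etc/mha.cnf --orig_master_is_new_slave --interactive=0 --running_updates_limit=0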
This is further evidence that most of the time taken by a switch with MHA is caused by the need to redirect traffic from A to B over the network.
Crash fail-over
What happens if, instead of a nice controlled switch, we have to cover a real failover?
First of all let us start masterha_manager:
nohup masterha_manager --conf=/etc/mha.cnf --wait_on_monitor_error=60 --wait_on_failover_error=60 >> /tmp/mha.log 2>&1 &
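While the test runs, you can follow what the manager is doing in real time from the log file defined in the configuration above:

tail -f /tmp/mha.log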
Then let us start some load again.
Finally, go to the MySQL node that is the current master (xxx.xxx.xxx.107):
ps aux|grep mysql
mysql 18755  0.0  0.0  113248   1608 pts/0 S  Aug28  0:00 /bin/sh /opt/mysql_templates/mysql-57/bin/mysqld_safe --defaults-file=/opt/mysql_instances/mha1/my.cnf
mysql 21975  3.2 30.4 4398248 941748 pts/0 Sl Aug28 93:21 /opt/mysql_templates/mysql-57/bin/mysqld --defaults-file=/opt/mysql_instances/mha1/my.cnf --basedir=/opt/mysql_templates/mysql-57/ --datadir=/opt/mysql_instances/mha1/data --plugin-dir=/opt/mysql_templates/mysql-57//lib/plugin --log-error=/opt/mysql_instances/mha1/mysql-3306.err --open-files-limit=65536 --pid-file=/opt/mysql_instances/mha1/mysql.pid --socket=/opt/mysql_instances/mha1/mysql.sock --port=3306
And kill the MySQL process.
kill -9 21975 18755
As before, let us check what happened on the application side:
[ 80s] threads: 4, tps: 213.20, reads: 1919.10, writes: 853.20, response time: 28.74ms (95%), errors: 0.00, reconnects: 0.00
[ 90s] threads: 4, tps: 211.30, reads: 1901.80, writes: 844.70, response time: 28.63ms (95%), errors: 0.00, reconnects: 0.00
[ 100s] threads: 4, tps: 211.90, reads: 1906.40, writes: 847.90, response time: 28.60ms (95%), errors: 0.00, reconnects: 0.00
[ 110s] threads: 4, tps: 211.10, reads: 1903.10, writes: 845.30, response time: 29.27ms (95%), errors: 0.30, reconnects: 0.00 <-- issue starts
[ 120s] threads: 4, tps: 198.30, reads: 1785.10, writes: 792.40, response time: 28.43ms (95%), errors: 0.00, reconnects: 0.00
[ 130s] threads: 4, tps: 0.00, reads: 0.60, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.40 <-- total stop in write
[ 140s] threads: 4, tps: 173.80, reads: 1567.80, writes: 696.30, response time: 34.89ms (95%), errors: 0.40, reconnects: 0.00 <-- writes restart
[ 150s] threads: 4, tps: 195.20, reads: 1755.10, writes: 780.50, response time: 33.98ms (95%), errors: 0.00, reconnects: 0.00
[ 160s] threads: 4, tps: 196.90, reads: 1771.30, writes: 786.80, response time: 33.49ms (95%), errors: 0.00, reconnects: 0.00
[ 170s] threads: 4, tps: 193.70, reads: 1745.40, writes: 775.40, response time: 34.39ms (95%), errors: 0.00, reconnects: 0.00
[ 180s] threads: 4, tps: 191.60, reads: 1723.70, writes: 766.20, response time: 35.82ms (95%), errors: 0.00, reconnects: 0.00
So it takes ~10 seconds to perform the failover.
To better understand, let us see what happened in MHA-land:
Tue Aug 30 09:33:33 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Aug 30 09:33:33 2016 - [info] Reading application default configuration from /etc/mha.cnf..
... Read conf and start
Tue Aug 30 09:33:47 2016 - [debug] Trying to get advisory lock..
Tue Aug 30 09:33:47 2016 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
... Wait for errors
Tue Aug 30 09:34:47 2016 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away) <--- Error time
Tue Aug 30 09:34:56 2016 - [warning] Connection failed 4 time(s).. <--- Finally MHA decide to do something
Tue Aug 30 09:34:56 2016 - [warning] Master is not reachable from health checker!
Tue Aug 30 09:34:56 2016 - [warning] Master mha2r(192.168.1.107:3306) is not reachable!
Tue Aug 30 09:34:56 2016 - [warning] SSH is reachable.
Tue Aug 30 09:34:58 2016 - [info] Master failover to mha1r(192.168.1.104:3306) completed successfully. <--- end of the failover
MHA sees the server failing at xx:47, but because of the retries and check validation it only fully acknowledges the downtime at xx:56.
That is ~8 seconds later.
Performing the whole failover then takes only ~2 seconds (again).
No movable IP or DNS change is involved, and as such the operations were fast.
This is obviously true when the servers already have the needed binary log events; it may be a different story if MHA also has to manage and push data from the binary log to MySQL.
As you can see, ProxySQL can help a lot in reducing the time in this scenario too, by completely skipping the network-related operations which, as we know, are the ones causing the most trouble in these cases.