My MySQL tips

 

Missed Opportunity

It is "that" time of the year, when autumn is just around the corner and temperatures start to drop.
But it is also the time for many exciting conferences in the MySQL world.
We have Oracle Open World this week, with many interesting talks around MySQL 8.
Then in two weeks we will be at Percona Live Europe in Amsterdam, which is not a MySQL (only) conference anymore (https://www.cvent.com/events/percona-live-open-source-database-conference-europe-2019/agenda-6321c2468b1b43328f97212f3e53f4de.aspx).
Percona has moved to a more "polyglot" approach, not only in its services but also during its events.
This is obviously an interesting experiment that allows people from different technologies to meet and discuss. At the end of the day it is a quite unique situation and opportunity; the only negative effect is that it takes space away from the MySQL community, which is suffering a bit in terms of room, attendees and brainstorming focused on MySQL deep dives.
That said, there are a few interesting talks I am looking forward to attending:
• Security and GDPR, many sessions
• MySQL 8.0 Performance: Scalability & Benchmarks
• Percona will also present the Percona Cluster version 8, which is a must-attend session
Plus sessions on other technologies, in which I am only marginally interested.

After Percona Live in Amsterdam there will be a ProxySQL Technology Day in Ghent (https://proxysql.com/blog/proxysql-technology-day-ghent-oct3rd2019). Ghent is a very nice city and worth a visit; it is only a two-hour train ride from Amsterdam. Given that the event is on the 3rd of October, I will just move there immediately after PLEU.
The ProxySQL event is a half-day event starting at 5 PM, with 30-minute sessions focused on best practices for integrating the community-award-winning "ProxySQL" with the most common scenarios and solutions.
I like that format because I expect to see and discuss real cases and hands-on issues with the participants.

So, a lot of things, right?
But once more, I want to raise the red flag about the lack of a MySQL community event.
We do have many events, but most of them follow a company's focus, and they are scattered and not well synchronized. More than anything else, we miss A MySQL event: a place to gather around, one that not only attracts DBAs from companies who use it (and sometimes abuse it), but also a place for all of us to discuss and coordinate our efforts.

In the meantime, see you in Amsterdam, then Ghent, then Fosdem then …
Good MySQL to all

Setting correct system variable values is an essential step in getting the correct server behavior for your workload.

In MySQL we have many system variables that can be changed at runtime, and most of them can be set at session or global scope.

In the past, to change the value of a system variable at the global level, a user needed to have SUPER privileges. Once the system variable value is modified at the global level, the server changes its behavior for new sessions and, obviously, at the global scope. For instance, one of the most commonly adjusted variables is probably max_connections. If you have max_connections=100 in your my.cnf, or as the default value, and during the day you as DBA notice that this number is not enough, it is easy to add new connections on the fly; the command:

SET GLOBAL MAX_CONNECTIONS=500;

will do the work. But here is the issue: we have changed a GLOBAL value that applies to the whole server, but this change is ephemeral, and if the server restarts the setting is lost. In the past I have seen, many times, servers with different configurations between my.cnf and the current server settings. To prevent this, or at least keep it under control, good DBAs developed scripts to check if and where differences exist and to fix them. The main issue is that very often we forget to update the configuration file while making the change, or we skip it on purpose to do "fine tuning first" and then forget afterwards.
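A minimal sketch of such a check in MySQL 8 (my addition, relying on the VARIABLE_SOURCE column of performance_schema.variables_info, the same table queried later in this post) could list the variables whose current value was set dynamically at runtime and therefore may differ from my.cnf:

SELECT VARIABLE_NAME, SET_TIME, SET_USER, SET_HOST
    FROM performance_schema.variables_info
    WHERE VARIABLE_SOURCE = 'DYNAMIC';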

What's new in MySQL 8 in this area? Well, we have a couple of small changes. First of all, the privileges: as of MySQL 8 a user can have either SYSTEM_VARIABLES_ADMIN or SUPER to modify GLOBAL system variables. The other news is the ability to have GLOBAL changes to variables PERSIST on disk, and finally to know who made the change and when. The new option for the SET command is PERSIST. So, if I have:

(root@localhost) [(none)]>show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 1500  |
+-----------------+-------+

and I want to change the value of max_connections and be sure the value is reloaded at restart, I will do:

(root@localhost) [(none)]>set PERSIST max_connections=150;

(root@localhost) [(none)]>show global variables like 'max_connections';
+-----------------+-------+
| Variable_name   | Value |
+-----------------+-------+
| max_connections | 150   |
+-----------------+-------+
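A related option worth mentioning here (my addition; see the reference manual linked at the end of this post): SET PERSIST_ONLY writes the value to the persisted configuration without changing the running server, which is the way to persist read-only variables that can only take effect at the next restart. For example:

SET PERSIST_ONLY innodb_log_file_size = 2147483648;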

With the usage of PERSIST, MySQL will write the information related to:
  • key (variable name)
  • value
  • timestamp (including microseconds)
  • user
  • host
A new file in the data directory, mysqld-auto.cnf, contains this information. The file is in JSON format and will have the following:

{ "Version" : 1 , "mysql_server" : {
  "max_connections" : { "Value" : "150" , "Metadata" : { "Timestamp" : 1565363808318848 , "User" : "root" , "Host" : "localhost" } }
} }

The information is also in Performance Schema:

select a.VARIABLE_NAME,b.VARIABLE_value ,SET_TIME,SET_USER,SET_HOST  
    from performance_schema.variables_info a 
        join performance_schema.global_variables b 
        on a.VARIABLE_NAME=b.VARIABLE_NAME  
    where b.VARIABLE_NAME like 'max_connections'\G

*************************** 1. row ***************************
 VARIABLE_NAME: max_connections
VARIABLE_value: 150
      SET_TIME: 2019-08-09 11:16:48.318989
      SET_USER: root
      SET_HOST: localhost

As you can see, the information reports who made the change, from where, when, and the value. Unfortunately, there is no history here, but this can easily be implemented. To clear the PERSIST settings, run RESET PERSIST and all the persisted settings will be removed. To be clear, if you have:

{ "Version" : 1 , "mysql_server" : {
  "max_connections" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565367524946371 , "User" : "root" , "Host" : "localhost" } } , 
  "wait_timeout" : { "Value" : "151" , "Metadata" : { "Timestamp" : 1565368154036275 , "User" : "root" , "Host" : "localhost" } } 
} }

RESET PERSIST will do:

{ "Version" : 1 , "mysql_server" : {  } }

which removes ALL THE SETTINGS, not just one.
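A side note from me (not part of the original example): MySQL 8 also accepts a variable name with RESET PERSIST, which removes only that entry from mysqld-auto.cnf and leaves the others in place:

RESET PERSIST max_connections;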

Anyhow, why is this a good thing to have?

First, because we have no excuse now: when we change a variable, we have all the tools needed to make sure it will be there at startup, if that is the intention of the change.

Second, because storing the information in a file, and not only exposing it in Performance Schema, allows us to include it in any automation tool we have.
This is useful whether we decide to take action on it or just keep track of it, for example comparing it against my.cnf and fixing discrepancies automatically, also at service shutdown or when cloning.
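As a minimal sketch of what such automation could query (assuming the same performance_schema tables shown above), the persisted values can also be read back with plain SQL instead of parsing the JSON file:

SELECT a.VARIABLE_NAME, b.VARIABLE_VALUE, a.SET_TIME, a.SET_USER, a.SET_HOST
    FROM performance_schema.variables_info a
        JOIN performance_schema.global_variables b
        ON a.VARIABLE_NAME = b.VARIABLE_NAME
    WHERE a.VARIABLE_SOURCE = 'PERSISTED';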


On this, let me say that WHILE you can change a value directly in the mysqld-auto.cnf file and have the server use it as the valid value at restart,
this is not recommended; instead, please fix the my.cnf and remove the information related to PERSIST.
Touching that file is also dangerous, because if you do something silly like removing a double quote, or in any way break the JSON format, the server will not start, and there will be NO error in the log.

{ "Version" : 1 , "mysql_server" : { "wait_timeout": { "Value : "150" , "Metadata" : { "Timestamp" : 1565455891278414, "User" : "root" , "Host" : "localhost" } } } }
                                                           ^^^ missing double quote 
tusa@tusa-dev:/opt/mysql_templates/mysql-8.0.17futex$ ps aux|grep 8113
tusa      8119  0.0  0.0  14224   896 pts/1    S+   12:54   0:00 grep --color=auto 8113
[1]+  Exit 1                  bin/mysqld --defaults-file=./my.cnf

I have opened a bug for this (https://bugs.mysql.com/bug.php?id=96501).

A short deep dive into the code (you can skip it if you don't care)

The new feature is handled in the files <source>/sql/persisted_variable.(h/cc). The new structure dealing with the PERSIST actions is:

struct st_persist_var {
  std::string key;
  std::string value;
  ulonglong timestamp;
  std::string user;
  std::string host;
  bool is_null;
  st_persist_var();
  st_persist_var(THD *thd);
  st_persist_var(const std::string key, const std::string value,
                 const ulonglong timestamp, const std::string user,
                 const std::string host, const bool is_null);
};

The main steps are in the st_persist_var constructors. It should be noted that, when creating the timestamp, the code generates a value that is NOT fully compatible with the MySQL function FROM_UNIXTIME. The code assigning the timestamp also includes the microseconds, taking them from the timeval (tv) structure:

st_persist_var::st_persist_var(THD *thd) {
  timeval tv = thd->query_start_timeval_trunc(DATETIME_MAX_DECIMALS);
  timestamp = tv.tv_sec * 1000000ULL + tv.tv_usec;
  user = thd->security_context()->user().str;
  host = thd->security_context()->host().str;
  is_null = false;
}

Where:

    tv.tv_sec = 1565267482
    tv.tv_usec = 692276

This will generate timestamp = 1565267482692276. This TIMESTAMP is not valid in MySQL and cannot be read by the time functions, while the part related to tv.tv_sec = 1565267482 works perfectly.

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482);
+---------------------------+
| FROM_UNIXTIME(1565267482) |
+---------------------------+
| 2019-08-08 08:31:22       |
+---------------------------+

(root@localhost) [(none)]>select FROM_UNIXTIME(1565267482692276);
+---------------------------------+
| FROM_UNIXTIME(1565267482692276) |
+---------------------------------+
| NULL                            |
+---------------------------------+

This is because a timestamp with microseconds is formatted differently in MySQL:
PERSIST code = 1565267482692276
MySQL        = 1565267482.692276
If I run select FROM_UNIXTIME(1565267482.692276); I get the right result:

+----------------------------------+
| FROM_UNIXTIME(1565267482.692276) |
+----------------------------------+
| 2019-08-08 08:31:22.692276       |
+----------------------------------+

Of course, I can use the trick:

select FROM_UNIXTIME(1565267482692276/1000000);
+-----------------------------------------+
| FROM_UNIXTIME(1565267482692276/1000000) |
+-----------------------------------------+
| 2019-08-08 08:31:22.6922                |
+-----------------------------------------+

Well, that's all for the behind-the-scenes info; keep it in mind if you want to deal with the value coming from the JSON file.

Conclusion

Sometimes the small things can be better than the HUGE shiny things. I have seen many DBAs in trouble because they did not have this simple feature in MySQL, and many MySQL servers failing to start or not behaving as expected. Given that, I welcome PERSIST, and I am sure that the people who manage thousands of servers, with different workloads and automation in place, will see this as a good thing as well.

References:

https://dev.mysql.com/doc/refman/8.0/en/persisted-system-variables.html
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_timestamp
https://lefred.be/content/where-does-my-mysql-configuration-variable-value-come-from/
https://lefred.be/content/what-configuration-settings-did-i-change-on-my-mysql-server/

 

During the last conferences, I had the chance to discuss a lot with many colleagues. Some of them, like me, feel we have lost ourselves; not totally, but a bit.

I started to think about why, and this is my reasoning. I am not saying it is right, but I think it is not far from the truth.

Let us start by just mentioning a few events in the 2019 schedule: February FOSDEM, end of May Percona Live US, mid-June DataOps Barcelona, September Oracle Open World, end of September/October Percona Live Europe, plus an undefined number of secondary events or MariaDB roadshows.

Where is the MySQL (and related) conference?

Do not get me wrong, we talk about MySQL a lot during each one of the above (or MariaDB, tomato/tomahto). But it seems to me that, in a subtle way, the community has lost its conference.

There is nothing wrong with each company trying to get the most out of its investment, implicitly driving the show to get the best return for its own business.

But … as a community, are we getting what we need?

If we take a look at the past, we see that the MySQL conference in Santa Clara was a real milestone. It is true that MySQL was a less mature product, and as such a lot of innovation and discussion was presented every year. But it was not only because of that.

In my opinion, the singularity of the event made it a more precious moment, with a lot of focus and willingness to share.

Not only that: the fact that we did not have dozens of events brought more people together, as speakers/experts and attendees. This generated more discussions and exchanges, resulting in ideas that we, as a community, had the chance to develop in the following months.

Here is another very important factor: we, as part of the community, as engineers, respect each other, and there is no limit to sharing knowledge and helping each other. It does not matter if one of us makes a mistake; next time he will get it right, and we have always helped each other move in that direction.

Having the chance to discuss face to face during the event was great; having the chance to continue the discussion after the event, even better. In theory, having the chance to do that multiple times a year should be the best thing. But… there is a but: all these small (or smaller) events are company events, not community events; sorry to say it, but that is the reality. What it means is that the interactions are often limited, restricted, penalized by the competition between the companies.

Mark Callaghan, some time ago, mentioned in a short post that it was sad to see so many bright engineers spread across companies, and often unable to collaborate to make things better in a common effort, following a shared path (Mark, that was my interpretation; if that was not your intention, my apologies, but even so, I think it is true).

 

This is sad indeed, and a waste of energy. It is also very sad because we know each other, directly or indirectly. We bear no resentment toward each other, and we like to discuss, share, help and get help.

Damn, we love to work together! No matter the company we work for, I am sure Oracle developers would love to work with the MariaDB guys, as well as Percona's, and vice versa.

I am not naïve; I understand the need for healthy competition and the need to have "some" differences. But maybe we are not considering the long term: we are a small community, a very small fishbowl, and if we do not help each other we will lose in the long run. This is already happening, with Postgres growing day by day, evolving under our blind eyes.

Looking at our small bowl, what I see is fragmentation. I see companies trying to find new magic words, or magic trends, to package the same old shit, just to survive in the market.

That is not innovation, that is not evolution; that is giving up on what is the most important concept of an open source community.

Going back to the number of events and their relevance: this increasing number of events is the natural consequence of the lack of coordination and proper interaction. I do not care how many we have, as long as they bring us food for the brain.

If they are going to be useful to make MySQL (and related) better.

But if the scope is only to promote the company's business, that is okay… it just is not a community event, and in that case we need to recognize and accept that we no longer have an open conference. And we need one; well, two. One for the Americas, and another one for EurAsia (get used to the geographic definition please, that is where the market is evolving).

We need a strong, unified, focused community. We need all the actors to collaborate on a common strategy, not fight like chickens over crumbs. We need a steering committee, and the companies' commitment to adhere to the committee's indications.

We need that, or we will soon be forgotten as obsolete.

That is what I think… please let me know your ideas, we are a community... let us discuss how to do things better!

MariaDB 10.4 has been declared GA, and a few presentations on Galera4 were given at the latest conferences.

So, I thought, it is time to give it a try and see what is going on.

It is no secret that I consider the Codership guys my heroes, and that I have pushed for Galera as a solid and very flexible HA solution for many years.

Given that, my first comment is that it is a shame to have Galera4 available only in MariaDB. I would have preferred to test the vanilla MySQL version, for many reasons, but mainly because MySQL/Oracle is and remains the official and main line of the MySQL software, whether you like it or not, and as such I want to see how Galera4 behaves with it. Anyhow, Codership states that the other versions will be out AFTER the summer, and I hope this will be true.

To test the new version, given that I do not have it for vanilla MySQL, I decided to use the other distribution, coming from Percona. In the end the tests were done comparing MariaDB 10.4.x with PXC 5.7.x; in short, Galera4 vs Galera3.

I set up the two different software packages on the same machines and configured them as closely as possible. That said, I ran two main sets of tests: data ingest and OLTP, both running for 90 minutes, not pushing like hell, but gently simulating some traffic. Configuration files can be found here.

Galera4 streaming replication was disabled, following the Codership instructions (wsrep_trx_fragment_size=0).
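For reference, a minimal sketch of that setting (my rendering of the instruction above, not the exact configuration file used):

-- disable Galera4 streaming replication for these tests
SET GLOBAL wsrep_trx_fragment_size = 0;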

Test 1: Ingest

For the ingest test I used my stresstool application (here) with only 10 threads and 50 batch inserts per thread; the schema definition is in the windmills.json file.

As always, an image says more than many words:

[Image: ingest execution time by thread]

In general terms, PXC was able to execute the same load in less time than MariaDB.

[Image: ingest events per thread]

And PXC was able to deal with a higher number of events per thread as well.

In PXC, the average Galera latency was around 9ms on the writer and around 5ms on the receivers, with the same load, same machines, same configuration:

[Screenshot: PXC Galera replication latency]

The latency in MariaDB was significantly higher: around 19ms for the writer, and between 5 and 9ms for the receivers.

[Screenshot: MariaDB Galera replication latency]

In short, overall PXC 5.7 with Galera3 performed better than MariaDB 10.4 with Galera4.

The amount of data transmitted and received on PXC was higher (good) than on MariaDB:

PXC:

[Screenshot: PXC data transmitted/received]

MariaDB:

[Screenshot: MariaDB data transmitted/received]

OLTP

For the OLTP test I used sysbench with OLTP read/write tests: 180 threads (90 from each of two different application nodes), 200K rows per table, 40 tables, and a 90-minute run.

Let's see what happened:

[Image: OLTP writes and events per thread]

PXC performed better than MariaDB, executing more writes/s and more events per thread per second.

Checking the latency, we can see:

[Image: OLTP latency]

Also in this case, PXC had lower average latency than MariaDB during the execution.

What about Galera?

For PXC/Galera3, the average Galera latency was around 3.5ms on the writer and less on the receivers:

[Screenshot: PXC/Galera3 latency]

In this case, the latency in Galera4 was the same as or lower than in Galera3:

[Screenshot: MariaDB/Galera4 latency]

Also analyzing the MAX latency:

Galera3

[Screenshot: Galera3 max latency]

Galera4

[Screenshot: Galera4 max latency]

We can see that Galera4 dealt with it much better than version 3.

 

I have done many other checks, and it seems to me that in the OLTP case (and I do not exclude that this is valid for ingest as well) Galera4 is in some way penalized by the MariaDB platform.

I am just at the start of my investigation and I may be wrong, but I cannot confirm or deny this until Codership releases the code for MySQL.

Conclusions

Galera4 seems to come with some very good new features (please review Seppo's presentation), and one thing I noticed is that it comes with optimized node communication, reducing latency fluctuation.

Just this, for me, is great; plus we will have streaming replication, which can be very useful, but I have not tested it yet.

Nevertheless, I would not move to it just yet. I would wait for the other MySQL distributions to be supported, do some tests, and see where the performance problem is.

Because at the moment, even with a load that is not heavy, the current version of PXC 5.7/Galera3 runs better than MariaDB/Galera4, so why should I migrate to a platform that locks me in, like MariaDB, and does not give me any benefit (yet)? Especially considering that once Galera4 becomes available for the standard MySQL versions, we can have all the good things coming from Galera4 without being locked in by MariaDB.

A small note about MariaDB: while I was playing with it, I noticed that by default MariaDB comes with the plugin maturity level set to BETA, which means potentially running code that is still in beta stage in production. No comment!

References

https://github.com/Tusamarco/blogs/tree/master/Galera4

https://galeracluster.com/

Seppo presentation https://www.slideshare.net/SakariKeskitalo/galera-cluster-4-presentation-in-percona-live-austin-2019

https://mariadb.org/mariadb-10-4-4-now-available/

ProxySQL in its versions up to 1.x did not natively support Percona XtraDB Cluster (PXC). Instead, it relied on the flexibility offered by the scheduler. This approach allowed users to implement their own preferred way to manage the ProxySQL behaviour in relation to Galera events. From version 2.0 we can use native ProxySQL support for PXC. The mechanism to activate native support is very similar to the one already in place for group replication. In brief, it is based on the table [runtime_]mysql_galera_hostgroups and the information needed is mostly the same:

 

  • writer_hostgroup: the hostgroup ID that refers to the WRITER
  • backup_writer_hostgroup: the hostgroup ID referring to the hostgroup that will contain the candidate servers
  • reader_hostgroup: the reader hostgroup ID, containing the list of servers that need to be taken into consideration
  • offline_hostgroup: the hostgroup ID that will eventually contain the writer that will be put OFFLINE
  • active: True[1]/False[0], whether this configuration needs to be used or not
  • max_writers: the MAX number of writers you want to have at the same time. In a sane setup this should always be 1, but if you want to have multiple writers, you can define it up to the number of nodes.
  • writer_is_also_reader: if true [1] the writer will NOT be removed from the reader HG
  • max_transactions_behind: the value of wsrep_local_recv_queue after which the node will be set OFFLINE. This must be carefully set, observing the node behaviour.
  • comment: I suggest putting some meaningful notes here to identify what is what.

Given the above, let us see what we need to do in order to have a working Galera native solution. I will have three servers:

  192.168.1.205 (Node1)
  192.168.1.21  (Node2)
  192.168.1.231 (node3)

As the set of hostgroups, I will have:

Writer  HG-> 100
Reader  HG-> 101
BackupW HG-> 102
offHG   HG-> 9101

To set it up

Servers first:
 
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,1000);

Then the galera settings:
 
insert into mysql_galera_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup, offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) values (100,102,101,9101,0,1,1,16);

As usual, if we want to have R/W split we need to define the rules for it:
 
insert into mysql_query_rules (rule_id,proxy_port,schemaname,username,destination_hostgroup,active,retries,match_digest,apply) values(1040,6033,'windmills','app_test',100,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,schemaname,username,destination_hostgroup,active,retries,match_digest,apply) values(1041,6033,'windmills','app_test',101,1,3,'^SELECT.*@@',1);

save mysql query rules to disk;
load mysql query rules to run;

Then another important variable: the server version. Please do yourself a favor and NEVER use the default.
 
update global_variables set variable_value='5.7.0' where variable_name='mysql-server_version';
LOAD MYSQL VARIABLES TO RUNTIME;SAVE MYSQL VARIABLES TO DISK;

Finally activate the whole thing:
 
save mysql servers to disk;
load mysql servers to runtime;

 

 

 

One thing to note before we go ahead. In the list of servers I had:

  1. Filled only the READER HG
  2. Used the same weight

This is because of the election mechanism ProxySQL uses to identify the writer, and the (many) problems that may be attached to it.

For now let us go ahead and see what happens when I load this information to runtime. Before running the above commands:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
After:
 
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 501        |
| 1000   | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 501        |
| 1000   | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 546        |
| 1000   | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 467        |
| 1000   | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 546        |
| 1000   | 102       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0	 | 0	   | 0           | 0	   | 0                 | 0               | 0               | 467        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

 

mysql> SELECT * FROM runtime_mysql_galera_hostgroups \G
*************************** 1. row ***************************
       writer_hostgroup: 100
backup_writer_hostgroup: 102
       reader_hostgroup: 101
      offline_hostgroup: 9101
                active: 0  <----------- note this 
            max_writers: 1
  writer_is_also_reader: 1
max_transactions_behind: 16
                comment: NULL
1 row in set (0.01 sec)

 

As we can see, ProxySQL has taken the nodes from my READER group and distributed them, electing one as the writer and placing the others in the backup_writer hostgroup.
But – there is a but – wasn't my rule set with Active=0? Indeed it was, and I assume this is a bug (Issue #1902).
The other thing we should note is that ProxySQL elected node 3 (192.168.1.231) as the writer.
As I said before, what should we do IF we want a specific node as the preferred writer? We need to modify its weight.
So, say we want node 1 (192.168.1.205) as the writer; we will need something like this:

 
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,10000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,100);
 

Doing that will give us:

+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 10000  | 100       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 2209       |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 546        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 508        |
| 10000  | 101       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 2209       |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 546        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 508        |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

If you noticed, given we set the WEIGHT for node 1 higher, this node will also become the most utilized for reads.

We probably do not want that, so let us modify the reader weight.

UPDATE mysql_servers SET weight=10 WHERE hostgroup_id=101 AND hostname='192.168.1.205';
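As with the initial setup, changes to mysql_servers only take effect once they are loaded to runtime (and saved to disk if we want them to survive a ProxySQL restart); the same applies to the weight updates shown later:

LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;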

At this point, if we trigger a failover with set global wsrep_reject_queries=all; on node 1, ProxySQL will take action and elect another node as the writer:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 562        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 562        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 588        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 588        |
| 10000  | 9101      | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 468        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

 

Node 3 (192.168.1.231) is the new writer and node 1 is in the special OFFLINE group. Let's now see what happens IF we put node 1 back.

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 10000  | 100       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 449        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 532        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 569        |
| 10000  | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 449        |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 532        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 569        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

Oops! The READER entry has come back with the HIGHEST weight, and as such it will once more be the most used node. To fix it, we need to re-run the UPDATE as before.

But is there a way to avoid this? In short, the answer is NO! This, in my opinion, is BAD and is worth a feature request, because it can really bring a node to its knees.

Now, this is not the only problem. There is another point that is probably worth discussing, which is the fact that ProxySQL currently does FAILOVER/FAILBACK.

Failover is obviously something we want to have, but failback is another discussion.

The point is that, once the failover is complete and the cluster has redistributed the incoming requests, doing a failback is an impacting operation that can be a disruptive one too.

If all nodes are treated as equal, there is no real way to prevent it, while if YOU set a node to be the main writer, something can be done. Let us see what and how. Say we have:

INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,100);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,100);
 
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 100       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 470        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 558        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 613        |
| 10     | 101       | 192.168.1.205 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 470        |
| 100    | 102       | 192.168.1.231 | 3306     | ONLINE | 0        | 0        | 0      | 0	  | 0           | 0	  | 0                 | 0               | 0               | 558        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 613        |
+--------+-----------+---------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

Let us put the node down with set global wsrep_reject_queries=all; and check:

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 519        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 519        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	      | 0           | 0	      | 0                 | 0               | 0               | 506        |
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 506        |
| 1000   | 9101      | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 527        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

We can now manipulate the weight in the special OFFLINE group and see what happens:

UPDATE mysql_servers SET weight=10 WHERE hostgroup_id=9101 AND hostname='192.168.1.205';

Then I put the node up again: set global wsrep_reject_queries=none;

+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host      | srv_port | STATUS  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 100    | 100       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 537        |
| 100    | 101       | 192.168.1.231 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 537        |
| 100    | 101       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 573        |
| 10     | 101       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0	   | 0           | 0	   | 0                 | 0               | 0               | 458	|
| 100    | 102       | 192.168.1.21  | 3306     | ONLINE  | 0        | 0        | 0      | 0	   | 0           | 0	   | 0                 | 0               | 0               | 573	|
| 10     | 102       | 192.168.1.205 | 3306     | ONLINE  | 0        | 0        | 0      | 0       | 0           | 0       | 0                 | 0               | 0               | 458        |
+--------+-----------+---------------+----------+---------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+

 

That's it, the node is back but with no service interruption.

At this point we can decide whether to make this node a reader like the others, or wait and plan a proper time of day when we can put it back as the writer, while in the meantime it takes a bit of load to warm its buffer pool.
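A minimal sketch of that planned promotion, reusing the commands already shown above (the weight value is illustrative and follows the earlier scenario):

-- at the planned time, raise node 1's weight again so it can be elected as writer
UPDATE mysql_servers SET weight=1000 WHERE hostgroup_id=101 AND hostname='192.168.1.205';
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;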

The other point – and important information – is: what is ProxySQL currently checking on Galera? From reading the code, ProxySQL will trap the following:

  • read_only
  • wsrep_local_recv_queue
  • wsrep_desync
  • wsrep_reject_queries
  • wsrep_sst_donor_rejects_queries
  • primary_partition

Plus the standard sanity checks on the node. Finally, to monitor the whole situation we can use this:

mysql> SELECT * FROM mysql_server_galera_log ORDER BY time_start_us DESC LIMIT 10;
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| hostname      | port | time_start_us    | success_time_us | primary_partition | read_only | wsrep_local_recv_queue | wsrep_local_state | wsrep_desync | wsrep_reject_queries | wsrep_sst_donor_rejects_queries | error |
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+
| 192.168.1.231 | 3306 | 1549982591661779 | 2884            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982591659644 | 2778            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982591658728 | 2794            | YES               | NO        | 0                      | 4                 | NO           | YES                  | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982586669233 | 2827            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982586663458 | 5100            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982586658973 | 4132            | YES               | NO        | 0                      | 4                 | NO           | YES                  | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982581665317 | 3084            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.21  | 3306 | 1549982581661261 | 3129            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.205 | 3306 | 1549982581658242 | 2786            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
| 192.168.1.231 | 3306 | 1549982576661349 | 2982            | YES               | NO        | 0                      | 4                 | NO           | NO                   | NO                              | NULL  |
+---------------+------+------------------+-----------------+-------------------+-----------+------------------------+-------------------+--------------+----------------------+---------------------------------+-------+

As you can see above, the log table keeps track of what has changed. In this case, it reports that node 1 has wsrep_reject_queries activated, and it will continue to log this until we set wsrep_reject_queries=none.
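The same signals can of course be checked directly on a node with plain SQL; a minimal sketch, using the standard Galera status and variable names (primary_partition presumably maps to wsrep_cluster_status):

SHOW GLOBAL STATUS WHERE Variable_name IN ('wsrep_local_recv_queue','wsrep_local_state','wsrep_cluster_status');
SHOW GLOBAL VARIABLES WHERE Variable_name IN ('read_only','wsrep_desync','wsrep_reject_queries','wsrep_sst_donor_rejects_queries');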

Conclusions

ProxySQL native Galera integration is a useful feature for managing any Galera implementation, no matter whether it is Percona PXC, MariaDB Cluster or MySQL/Galera.

The generic approach is obviously a good thing; still, it may miss some specific extensions, like the performance_schema pxc_cluster_view table we have in PXC.

I've already objected about the failover/failback, and I am here again to remind you: whenever you do a controlled failover REMEMBER to change the weight to prevent an immediate failback.

This is obviously not possible in the case of a real failover, and, for instance, a simple temporary eviction will cause two downtimes instead of only one.

Some environments are fine with that, others not so much. Personally, I think there should be a FLAG in the configuration, such that we can decide whether failback should be executed or not.
