MySQL

My MySQL tips




MySQL 8: Load Fine Tuning With Resource Groups
Written by Marco Tusa   
Tuesday, 28 August 2018 00:00


MySQL Resource Groups, introduced in MySQL 8, provide the ability to manipulate the assignment of running threads to specific resources, thereby allowing the DBA to manage application priorities. Essentially, you can assign a thread to a specific virtual CPU. In this post, I'm going to take a look at how these might work in practice. Let us start with a disclaimer: what I am going to discuss here is NOT common practice. This is advanced load optimization, and you should approach or implement it ONLY if you are 100% sure of what you are doing and, more importantly, why you are doing it.

Overview

MySQL 8 introduced a feature that is explained on only a single documentation page, well hidden in the Optimization: Optimizing the MySQL Server chapter. This feature can help a lot if used correctly, and hopefully they will not deprecate or remove it after five minutes. I am talking about resource groups.

Resource groups permit assigning threads running within MySQL to particular groups, so that threads execute according to the resources available to their group. Group attributes enable control over those resources, to enable or restrict resource consumption by threads in the group. DBAs can modify these attributes as appropriate for different workloads. Currently, CPU affinity (i.e. assigning a thread to a specific CPU) is a manageable resource, represented by the concept of a "virtual CPU", a term that covers CPU cores, hyperthreads, hardware threads, and so forth. MySQL determines at startup how many virtual CPUs are available. Database administrators with appropriate privileges can associate virtual CPUs with resource groups and assign threads to these groups.

In short, you can define that a specific thread (ergo a connection, unless you use connection pooling or ProxySQL with multiplexing) will use a specific CPU and will have a given priority. Setting this by thread can be:

  1. Dangerous
  2. Not useful

Dangerous, because if you set this on a thread while using connection pooling or ProxySQL with multiplexing, you may end up applying the limitation to queries that you actually wanted to run at full speed. Not useful, because unless you spend your time looking at the (full) processlist, and/or have a script running all the time that catches what you need, 99% of the time you will not be able to assign the group efficiently. So? Another cool useless feature??? Nope… Resource groups can be referenced inside a single statement, which means I can have ONLY that query utilizing that resource group. Something like this will do the magic:

 

SELECT /*+ RESOURCE_GROUP(NAME OF THE RG) */ id, millid, date,active,kwatts_s FROM sbtest29 WHERE id=44

 

But if I run:

 

SELECT id, millid, date,active,kwatts_s FROM sbtest29 WHERE id=44

 

 

 

No resource group is used, even though I am running over the same connection. This is cool, isn't it?
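For completeness, this is what the per-thread assignment discussed earlier looks like. A minimal sketch (the group name My_rg and the thread id 99 are placeholders; the id to use is the THREAD_ID from performance_schema.threads, not the processlist id):

-- bind an already running thread to an existing resource group
SET RESOURCE GROUP My_rg FOR 99;

-- or bind the current connection's thread to it
SET RESOURCE GROUP My_rg;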

What is the possible usage?

In general, you can see this as a way to limit the negative impact of queries that you know will be problematic for others. Good examples are:

  • ETL processes for data archiving, reporting, data consolidation and so on
  • Applications that are not business critical and can wait, while your revenue generator application cannot
  • GUI Client applications, used by some staff of your company, that mainly create problems for you while they claim they are working.

"Marco, that could make sense … but what should I do to have it working? Rewrite my whole application to add this feature?" Good question! Thanks! We can split the task of having a good Resource Group implementation into 3 steps:

  1. You must perform an analysis of what you want to control. You need to identify the source (like the TCP/IP address, if it is fixed, or the username) and decide which settings you want for your resource groups. Identify whether you only want to reduce the CPU priority, isolate the queries on a specific CPU, or use a combination of the two.
  2. Implement the resource groups in MySQL.
  3. Implement a way to inject the string comment into the SQL.

About the last step, I will show you how to do this in a very simple way with ProxySQL, but hey … this is really up to you. I will show you the easy way but if you prefer a more difficult route, that's good for me too.

The Setup

In my scenario, I have a very noisy secondary application, written by a very, very bad developer, that accesses my servers mostly with read queries and occasionally with write updates. Reads and writes are relentless and impact the MAIN application. My task is to limit the impact of this secondary application without affecting the main one. To do that I will create two resource groups, one for WRITEs and another for READs. The first group, Write_app2, will have no CPU affinity, but will have the lowest priority (19):

 

CREATE RESOURCE GROUP Write_app2 TYPE=USER THREAD_PRIORITY=19;

 

The second group, Select_app2, will have CPU affinity AND the lowest priority:

 

CREATE RESOURCE GROUP Select_app2 TYPE=USER VCPU=5 THREAD_PRIORITY=19;

 

 

Finally, I have identified that the application is connecting from several sources BUT it uses a common username APP2. Given that, I will use the user name to inject the instructions into the SQL using ProxySQL (I could have also used the IP, or the schema name, or destination port, or something in the submitted SQL. In short, any possible filter in the query rules). To do that I will need four query rules:

 

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(80,6033,'app1',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(81,6033,'app1',81,1,3,'^SELECT.*',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(82,6033,'app2',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(83,6033,'app2',81,1,3,'^SELECT.*',1);
 

 

These rules identify and redirect the queries for the R/W split. Then, to inject the resource group hint into the SQL, two rewrite rules:

INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (32,1,'app2',"(^SELECT)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Select_app2) */ \2 ",0,"Lower prio and CPU bound on Reader");
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (33,1,'app2',"^(INSERT|UPDATE|DELETE)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Write_app2) */ \2 ",0,"Lower prio on Writer");

 

and a user definition like:
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app2','test',1,80,'mysql',1);
insert into mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) values ('app1','test',1,80,'mysql',1);

 


One important step that you need to perform on ALL the servers you want to include in the resource group utilization is to make sure the CAP_SYS_NICE capability is set. On Linux, resource group thread priorities are ignored unless the CAP_SYS_NICE capability is set. MySQL package installers for Linux systems should set this capability. For installations using a compressed tar file binary distribution or built from source, the CAP_SYS_NICE capability can be set manually using the setcap command, specifying the path name to the mysqld executable (this requires sudo access). You can check the capabilities using getcap. For example:

 

shell> sudo setcap cap_sys_nice+ep <Path to your mysqld executable>
shell> getcap ./bin/mysqld
./bin/mysqld = cap_sys_nice+ep

 

 

 

If manual setting of CAP_SYS_NICE is required, you will need to redo it every time you perform a new install. As a reference, here is the table showing how thread priority ranges map to the priority levels on Windows:

Priority Range Windows Priority Level
-20 to -10 THREAD_PRIORITY_HIGHEST
-9 to -1 THREAD_PRIORITY_ABOVE_NORMAL
0 THREAD_PRIORITY_NORMAL
1 to 10 THREAD_PRIORITY_BELOW_NORMAL
11 to 19 THREAD_PRIORITY_LOWEST

To summarize, here is the whole set of steps in my environment: 1) Check the CAP_SYS_NICE capability

 

getcap /opt/mysql_templates/mysql-8P/bin/mysqld
setcap cap_sys_nice+ep /opt/mysql_templates/mysql-8P/bin/mysqld

 

2) Create the user in MySQL and resource groups

 

create user app2@'%' identified by 'test';
GRANT ALL PRIVILEGES ON `windmills2`.* TO `app2`@`%`;
CREATE RESOURCE GROUP Select_app2 TYPE=USER VCPU=5 THREAD_PRIORITY=19;
CREATE RESOURCE GROUP Write_app2 TYPE=USER THREAD_PRIORITY=19;

 

 

 

To check:

 

SELECT * FROM INFORMATION_SCHEMA.RESOURCE_GROUPS;
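Once the application threads are connected, you can also verify which group each running thread actually landed in. A minimal sketch (assuming MySQL 8.0, where performance_schema.threads exposes a RESOURCE_GROUP column):

-- show the resource group assigned to each user connection
SELECT PROCESSLIST_ID, PROCESSLIST_USER, RESOURCE_GROUP
FROM performance_schema.threads
WHERE PROCESSLIST_USER IS NOT NULL;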

 

3) Create ProxySQL user and rules

INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('app2','test',1,80,'mysql',1);
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('app1','test',1,80,'mysql',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;

insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(83,6033,'app2',80,1,3,'^SELECT.*FOR UPDATE',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(84,6033,'app2',81,1,3,'^SELECT.*',1);
insert into mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) values(85,6033,'app2',80,0,3,'.',1);
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (32,0,'app2',"(^SELECT)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Select_app2) */ \2 ",0,"Lower prio and CPU bound on Reader");
INSERT INTO mysql_query_rules (rule_id,active,username,match_pattern,replace_pattern,apply,comment) VALUES (33,0,'app2',"^(INSERT|UPDATE|DELETE)\s*(.*$)","\1 /*+ RESOURCE_GROUP(Write_app2) */ \2 ",0,"Lower prio on Writer");
 
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;


For several reasons, I will add the resource group query rules as INACTIVE for now. Done…
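When the moment comes to turn them on, enabling them is just a matter of flipping the active flag and reloading the rules. A minimal sketch (rule ids 32 and 33 as defined above):

UPDATE mysql_query_rules SET active=1 WHERE rule_id IN (32,33);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;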

Testing

Will this work? We need to see the impact of the bad application on my production application, and then see IF implementing the tuning works or not. To do a basic check, I ran four tests:

  • test 1: run both apps with read/write and the RG rules disabled
  • test 2: run one application at a time, without RG
  • test 3: run only App2 with RG, to see the cost on its execution
  • test 4: run both, to see what happens with RG

Test 1

Master

master1_T1

Slave

master2_T1

The aim of this test is to get an idea, right away, of what happens when both applications are running without limits. As we can see, during the test all cores are utilized, some more consistently and some a bit less so, but nothing huge. What is interesting is the effect on the response time and on the number of events each application is able to execute:

execution_time1

The execution graph indicates very high times for Insert and Delete in App1, and the results confirm very poor performance: only 9 inserts, 1333 deletes and 165 selects.

events_by_crud1

But what is the application actually supposed to do? Test 2 will tell us, creating de facto our baseline.

Test 2

In this test I ran each application separately, so there was no interference.

Master App1

master1_T2app1

Master App2

master1_T2app2

Slave App1

master2_T2app1

Slave App2

master2_T2app2

Nothing significantly different in the CPU utilization when App1 was running, while we can see a bit less utilization in the case of App2.

The impact on the performance is, however, more significant: 

execution_time2

events_by_crud2

Execution times for insert and delete drop significantly for App1, and we can see that the application SHOULD be able to insert ~1320 events and perform a significantly higher number of operations. The same goes for App2, but here we care more about the OLTP application than the ETL one. So, what will happen to App2 (ETL) performance IF we activate the resource group flags? Let's see with test 3.

Test 3

Running only App2 with active resource groups.

Master App2

master1_T3app2

Slave App2

master2_T3app2

On the master, all the RG settings do is reduce the priority; given that no other process is running and no other application is connected, the impact is not high. On the slave, on the other hand, we can clearly see that App2 can now only use core 5, as indicated in our configuration. So far so good; what will the performance loss be? Let's see:

execution_time3

Comparing tests 2 and 3, we can see that applying the resource groups has a small but measurable impact on our ETL application. That is expected and desired, and it must be noted. The impact is not high in this test, but in the real world it can extend the running time.

events_by_crud3

It's time to combine all and see what is going on.

Test 4

Run our OLTP application while the ETL is running under Resource Group.

Master

master1_T4

Slave

master2_T4

Looking at the CPU utilization these graphs are very similar to the ones in test1, but the result is totally different:

 

execution_time4

The execution time for App1 (OLTP) has dropped significantly, while its throughput has increased almost as if nothing else were running. At the same time App2 has lost performance, and this must be taken into account, but it will not stop or prevent the ETL process from running.

events_by_crud4

It is possible to do more tuning if the ETL ends up too compromised, or to modify the server layout, such as adding a slave dedicated to ETL reads. The combinations and possibilities are many.
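For instance, loosening the limits on the read group is a single statement away. A minimal sketch (the VCPU list and priority here are illustrative values, not a recommendation):

-- give the read group two cores instead of one, and a less aggressive priority
ALTER RESOURCE GROUP Select_app2 VCPU = 4,5 THREAD_PRIORITY = 10;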

Conclusion

Just looking at the final graphs will help us reach our conclusions:

execution_time

events_by_crud

 

Comparing tests 1 and 4, we can see how using resource groups helps us correctly balance the workload and optimize performance in the case of unavoidable contention between different applications.

At the same time, using resource groups alone, as a blanket setting, is not optimal because it can defeat its own purpose: instead of providing some improvement, it can unpredictably affect all the traffic. It is also not desirable to modify the application code in order to implement it at query level, given the possible cost and time impact of doing that. The introduction of ProxySQL with query rewrite allows us to utilize the per-query option, without the need for any code modification, and lets us specify what we want with a very high level of granularity.

Once more, do not do this by yourself unless you are more than competent and know 100% what you are doing. In any case, remember that an ETL process may take longer, and that you need to plan your work/schedule accordingly. Good MySQL everyone.

 
PXC loves firewalls (and System Admins love iptables)
Written by Marco Tusa   
Monday, 18 June 2018 00:00


 

 

Let them stay together. Over the last few YEARS, I have quite often seen users who, when installing a product such as PXC, instead of spending five minutes to understand what to do, just run iptables -F and save. In short, they remove all the rules from their firewall.

With this post, I want to show you how easy it can be to do the right thing instead of putting your server at risk. I'll show you how a slightly more complex setup like PXC (compared to MySQL), can be easily achieved without risky shortcuts. iptables is the utility used to manage the chains of rules used by the Linux kernel firewall, which is your basic security tool. Linux comes with a wonderful firewall built into the kernel.

As an administrator, you can configure this firewall with interfaces like ipchains (which we are not going to cover) and iptables, which we shall talk about. iptables is stateful, which means that the firewall can make decisions based on received packets. This means that I can, for instance, DROP a packet if it's coming from bad-guy.com. I can also create a set of rules that will either allow or reject the packet, or redirect it to another rule. This can potentially create a very complex scenario.

 

 

However, for today and for this use case let's keep it simple… Looking at my own server:

iptables -v -L
Chain INPUT (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
 250K   29M ACCEPT     all  --  any    any     anywhere             anywhere             state RELATED,ESTABLISHED
    6   404 ACCEPT     icmp --  any    any     anywhere             anywhere
    0     0 ACCEPT     all  --  lo     any     anywhere             anywhere
    9   428 ACCEPT     tcp  --  any    any     anywhere             anywhere             state NEW tcp dpt:ssh
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             state NEW tcp dpt:mysql
    0     0 ACCEPT     tcp  --  any    any     anywhere             anywhere             
  210 13986 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited

Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 REJECT     all  --  any    any     anywhere             anywhere             reject-with icmp-host-prohibited

Chain OUTPUT (policy ACCEPT 241K packets, 29M bytes)
 pkts bytes target     prot opt in     out     source               destination

That's not too bad: my server is currently accepting only SSH and packets on port 3306. Please note that I used the -v option to see more information, like IN/OUT, which allows me to identify that row #3 is actually related to my loopback device, and as such it's good to have it open. The point is that if I try to run the PXC cluster with these settings it will fail, because the nodes will not be able to see each other. A quite simple example, when trying to start the second node of the cluster:

2018-05-21T17:56:14.383686Z 0 [Note] WSREP: (3cb4b3a6, 'tcp://10.0.0.21:4567') connection to peer 584762e6 with addr tcp://10.0.0.23:4567 timed out, no messages seen in PT3S

Starting a new node will fail, given that connectivity cannot be established correctly. In the Percona documentation there is a notes section in which we mention that these ports must be open for the cluster to work correctly:

  • 3306 For MySQL client connections and State Snapshot Transfer that use the mysqldump method.
  • 4567 For Galera Cluster replication traffic, multicast replication uses both UDP transport and TCP on this port.
  • 4568 For Incremental State Transfer.
  • 4444 For all other State Snapshot Transfer.

Of course, if you don’t know how to do it that could be a problem, but it is quite simple. Just use the following commands to add the needed rules:

iptables -I INPUT 2 --protocol tcp --match tcp --dport 3306 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 3 --protocol tcp --match tcp --dport 4567 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 4 --protocol tcp --match tcp --dport 4568 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 5 --protocol tcp --match tcp --dport 4444 --source 10.0.0.1/24 --jump ACCEPT
iptables -I INPUT 6 --protocol udp --match udp --dport 4567 --source 10.0.0.1/24 --jump ACCEPT

Once you have done this check the layout again and you should have something like this:

[root@galera1h1n5 gal571]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere anywhere state RELATED,ESTABLISHED
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:mysql
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:tram
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:bmc-reporting
ACCEPT tcp -- 10.0.0.0/24 anywhere tcp dpt:krb524
ACCEPT udp -- 10.0.0.0/24 anywhere udp dpt:tram
ACCEPT icmp -- anywhere anywhere
ACCEPT tcp -- anywhere anywhere tcp dpt:ssh
ACCEPT tcp -- anywhere anywhere tcp dpt:mysql
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable

Chain FORWARD (policy ACCEPT)
target prot opt source destination
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT)
target prot opt source destination

Try to start the secondary node and, tadaaa, the node will connect, provision itself, and finally start correctly. All good? Well, not really: you still need to perform a final step. We need to make our server accessible also to the PMM monitoring agents. You have PMM, right? If you don't, take a look here and you will want it. :D Anyhow, PMM will not work correctly with the rules I have, and the result will be an empty set of graphs when accessing the server statistics. Luckily, PMM has a very easy way to help you identify the issue:

[root@galera1h1n5 gal571]# pmm-admin check-network
PMM Network Status

Server Address | 192.168.1.52
Client Address | 192.168.1.205

* System Time
NTP Server (0.pool.ntp.org) | 2018-05-24 08:05:37 -0400 EDT
PMM Server | 2018-05-24 12:05:34 +0000 GMT
PMM Client | 2018-05-24 08:05:37 -0400 EDT
PMM Server Time Drift | OK
PMM Client Time Drift | OK
PMM Client to PMM Server Time Drift | OK

* Connection: Client --> Server
-------------------- -------
SERVER SERVICE STATUS
-------------------- -------
Consul API OK
Prometheus API OK
Query Analytics API OK

Connection duration | 1.051724ms
Request duration | 311.924µs
Full round trip | 1.363648ms

* Connection: Client <-- Server
-------------- ------------ -------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------ -------------------- ------- ---------- ---------
linux:metrics galera1h1n5 192.168.1.205:42000 DOWN NO NO
mysql:metrics gal571 192.168.1.205:42002 DOWN NO NO

When an endpoint is down it may indicate that the corresponding service is stopped (run 'pmm-admin list' to verify).
If it's running, check out the logs /var/log/pmm-*.log

When all endpoints are down but 'pmm-admin list' shows they are up and no errors in the logs,
check the firewall settings whether this system allows incoming connections from server to address:port in question.

Also you can check the endpoint status by the URL: http://192.168.1.52/prometheus/targets

What more do you want? You have all the information to debug and build your new rules. I just need to open ports 42000 and 42002 on my firewall:

iptables -I INPUT 7 --protocol tcp --match tcp --dport 42000 --source 192.168.1.1/24 --jump ACCEPT
iptables -I INPUT 8 --protocol tcp --match tcp --dport 42002 --source 192.168.1.1/24 --jump ACCEPT

Please note that we are handling the connectivity for PMM using a different range of IPs/subnet. This is because it is best practice to have the PXC nodes communicate over a dedicated network/subnet (physical and logical). Run the test again:

* Connection: Client <-- Server
-------------- ------------ -------------------- ------- ---------- ---------
SERVICE TYPE NAME REMOTE ENDPOINT STATUS HTTPS/TLS PASSWORD
-------------- ------------ -------------------- ------- ---------- ---------
linux:metrics galera1h1n5 192.168.1.205:42000 OK YES YES
mysql:metrics gal571 192.168.1.205:42002 OK YES YES

Done… I just repeat this on all my nodes and my firewall will be set up to handle the PXC-related security. Now that all my settings are working well, I can save my firewall's rules:

iptables-save > /etc/sysconfig/iptables

For Ubuntu you may need some additional steps, as described at https://help.ubuntu.com/community/IptablesHowTo#Using_iptables-save.2Frestore_to_test_rules. There are some nice tools to help you even more, if you are very lazy, like UFW and its graphical counterpart, GUFW. Developed to ease iptables firewall configuration, ufw provides a user-friendly way to create an IPv4 or IPv6 host-based firewall. By default, UFW is disabled in Ubuntu. Given that ultimately they use iptables, and that their use is widely covered in other resources such as the official Ubuntu documentation, I won't cover them here.

Conclusion

Please don't make the mistake of flushing/ignoring your firewall, when getting it right is just a matter of five commands. It's easy enough for everyone to do, and it's good enough to stop the basic security attacks. Happy MySQL (and PXC) to everyone.

 
No orange pants this year
Written by Marco Tusa   
Friday, 20 April 2018 09:08

Well, here we go: another MySQL conference / Percona Live.

Another huge and important event/milestone for the MySQL community. 

But this time I will not be there.

For the first time in many years I had to decline, drop my speech and say... "No, I am so sorry, this time I cannot come".


It was not an easy choice, not only because I am always excited to meet old colleagues, but also because PLSC is a great moment for brainstorming and for identifying what would be good to push on or to investigate further.

Especially this year where we have so many different interesting topics and so many different technologies as well. 

And of course the MySQL 8 GA will make a huge wave, but that was expected. To be honest, what I am more interested to see is... what its real adoption will mean. At Percona, we are working to make its adoption happen as smoothly as possible.

 

So I will really miss being there; my hope is to be able to watch the videos and at least get the presentation slides, to review the content offline... but I will miss all the interactions and Q&A.

Anyhow, I went through the schedule, and this is the list of talks I would like to attend and raise questions at. Knowing me... it is not fixed in stone at all, but you know, it is a start:

 

day 1

Make Your Database Dream of Electric Sheep: Designing for Autonomous Operation

MySQL at Twitter: No More Forkin' - Migrating to MySQL Community version

Containerizing Databases at New Relic: What We Learned

Tuning PostgreSQL for High-Write Workloads

Consistent Reads Using ProxySQL and GTID

Benchmark Noise Reduction: How to Configure Your Machines for Stable Results

Cassandra on RocksDB

ClickHouse in Real Life: Case Studies and Best Practices

 

day 2

Migrating to Vitess at (Slack) Scale

Stateful applications on Mesosphere DC/OS

Aurora PostgreSQL Deep Dive

Data Management in Kubernetes Using Kanister

A Seat At the Blockchain and Cryptocurrency Table for NoSQL Database Technologies

 

I wish all of you an enjoyable conference. Remember: ask, ask, ask. The more you interact with the speakers during the sessions, the better the presentations will be for everyone.

Hope to see you all at PL Europe in Germany and at PL 2019.

Have fun!!! And learn! 

 
Leveraging ProxySQL with AWS Aurora to Improve Performance
Written by Marco Tusa   
Thursday, 05 April 2018 00:00

Or How ProxySQL Out-performs Native Aurora Cluster Endpoints

In this blog post, I'll look at how you can use ProxySQL with AWS Aurora to further leverage database performance. My previous article described how easy it is to replace the native Aurora connector with ProxySQL. In this article, you will see WHY you should do that. It is important to understand that, aside from the basic optimization in connectivity and connection management, ProxySQL also provides you with a set of features that are currently not available in Aurora. Just think of:

  • Better caching
  • Query filtering
  • Sharding
  • Query substitution
  • Firewalling
  • ... and more

We will cover areas like scalability, security and performance. In short, I think it is more than worth it to spend some time and give ProxySQL with AWS Aurora a try.

The tests

I will show you the results from two different kinds of tests. One is sysbench-oriented, the other simulates a more complex application using Java, data object utilization and a Hikari connection pool in the middle as well. For the EC2 and Aurora platform I used:

  • Application/ProxySQL T2.xlarge eu-central-1a
  • 2 Aurora MySQL 5.7.12 db.t2.medium eu-central-1a
  • 1 Aurora MySQL 5.7.12 db.t2.medium eu-central-1b for AZ redundancy

The code for the application is available here, and for the sysbench tests here. All the data and configurations for the application are available here. I ran three tests using both bench apps, obviously with Aurora as it comes and with ProxySQL. For the ProxySQL configuration, see my previous article. The tests were read_only / write_only / read_write. For Aurora, I only increased the number of connections and otherwise kept the out-of-the-box configuration. Note that each test was run at least three times, at different moments of the day and on different days. The data reported as final is the BEST performing result for each.

The Results

For the impatient among us, here is a summary table of the tests: Sysbench:
summary_sysbench

Java App:
summary_for_java_app

Now, if this is enough for you, you can go to the conclusion and start to use ProxySQL with AWS Aurora. If you would like to know a bit more, continue reading. Aside from any discussion on the benchmark tool and settings, I really focused on identifying the differences between the two "connectors". Given that the layer below was exactly the same, any difference is due to the simple substitution of the endpoint.

Sysbench

Read Only

The first image reports the number of events achieved at the time of the test. It is quite clear that when using ProxySQL, sysbench ran more events.
In this graph, higher is better:
read_events
In this graph, lower is better:

reads_latency

 

As we can see, the latency when using an Aurora cluster entry point is higher. True, we are talking about milliseconds, but it is not just the value that matters, but also the distribution:

Latency distribution screenshots: Aurora cluster endpoint vs. ProxySQL

An image is worth a thousand words! We can see that the behavior stays constant when analyzing the READs executed, with ProxySQL performing better.

  In this graph, higher is better:
reads_reads

  In this graph, higher is better:
reads_sysb_queries

Closing with the total number of queries performed, where ProxySQL surpassed the cluster endpoint by ~4K queries.

Write Only

For writes, things go a bit differently. We see that all the lines intersect, and the values are very close to one another.

I will let the images speak for themselves:

In this graph, higher is better:

write_events_sysb

In this graph, lower is better:
write_latency_sysb

Latency spiked in each ProxySQL test, and it may require additional investigation and tuning.

 In this graph, higher is better:
write_write_sysb

While the rates of writes/sec intersect with each other frequently, in the end ProxySQL resulted in more writes than the native endpoint.

In this graph, higher is better:
write_sysb_queries

In the end, a difference exists and is consistent across the different test iterations, but is minimal. We are talking a range of 25 - 50 entries in total.

This result is not surprising, and it will be clear why later in the article.


Read and Write

As expected in the read and write test, we see a different situation.

ProxySQL is still performing better than the default entry point, but not by such a big margin as in read-only tests.

In this graph, higher is better:
rw_events_sysb

In this graph, lower is better
rw_latency_sysb

Latency and events follow the expected trend, where read operations are executed more efficiently with ProxySQL and writes are close, but NOT the same as in the write-only test.

rw_queies_sysb

As a result, the number of queries in ProxySQL is approximately 13% better than the default entry point.

Java Application Tests

What about the Java application? First of all, we need to remember that the application used a connection pool mechanism (Hikari), and the connection pool was present in all cases (both with the Aurora cluster endpoint and with ProxySQL). Given that a small delay in establishing the first connection was expected, you can easily see this in the MAX value of the connection latency. In this graph, lower is better.
app_con_latency_summary

The connection latency reported here is expressed in nanoseconds and is the measure of the time taken by the connection provider to return an active connection to the application from the moment the application requested it. In other words, how long HikariCP takes to choose/check/return an open connection. As you can see, the MAX value is drastically higher, and this was expected since it is the connection initialization. While not really interesting in terms of performance, this value is interesting because it gives us the dimension of the cost in the CP to open a new connection, which in the worst case is 25 milliseconds. As the graphs show, ProxySQL manages both cases (first call, reassignment) more efficiently.
In this graph, higher is better.
app_crud_summary

In the CRUD summary table, we can see the number of SQL commands executed per second for each CRUD action and for each test. Once more we can see that when using ProxySQL, the application performed much better and executed significantly more operations (especially in the R/W test).

  In this graph, higher is better.
app_evnts_summary

This graph represents the total number of events run at the time of the test. An event is a full application cycle, which sees the application generate the data needed to fill the SQL (no matter if it is for read/write), create the SQL, request the connection, push the SQL, get and read the resultset returned and give back the connection. Once more, ProxySQL shows better performance.
In this graph, lower is better.

app_exectime_summary

The execution time reported in this graph is the time taken by the application to run a whole event. That is, execution time is the time of a full cycle. The faster the cycle is executed, the better the application is performing. The time is expressed in milliseconds, and it goes from a very fast read, which probably hits the cache in Aurora, to almost two seconds taken to insert a batch of rows. Needless to say, the tests using ProxySQL performed better.

But Why?

Why do the tests using ProxySQL perform better? After all, it is just an additional step in the middle, which also has a cost in intercepting the queries and managing the connections.

So why the better performance? The answer is simple and can be found in the Aurora manual: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Aurora.Overview.html#Aurora.Overview.Endpoints.
The Cluster endpoint is an endpoint for an Aurora DB cluster that connects to the current primary instance for that DB cluster. Each Aurora DB cluster has a cluster endpoint and one primary instance.

That endpoint receives the read and write requests and sends them to the same instance. The main use for it is to perform failover if needed. At the same time, the Reader endpoint is an endpoint for an Aurora DB cluster that connects to one of the available Aurora Replicas for that DB cluster.

Each Aurora DB cluster has a reader endpoint. If there is more than one Aurora Replica, the reader endpoint directs each connection request to one of the Aurora Replicas. The reader endpoint only load balances connections to available Aurora Replicas in an Aurora DB cluster. It does not load balance specific queries.

If you want to load balance queries to distribute the read workload for a DB cluster, you need to manage that in your application and use instance endpoints to connect directly to Aurora Replicas to balance the load.
This means that to perform a Read/Write split, your application must manage two entry points and you will NOT have much control over how the queries are handled or to which replica instance they are directed.

This could lead to unexpected results and delays. Needless to say, ProxySQL does all of that by default (as described in my previous article). Now that we've clarified how the Aurora entry points behave, let's look at the performance difference.

cross-server-graphs

How do we read this graph? From left to right:

  • read_only test with an Aurora cluster endpoint
  • read_only test with ProxySQL
  • write_only with an Aurora cluster endpoint
  • write_only with ProxySQL
  • read and write with an Aurora cluster endpoint
  • read and write with ProxySQL

Here we go! As we can see, the tests with ProxySQL used the two configured instances, splitting R/W without the need to do anything on the application side. I purposely avoided the AZ replica because I previously identified it as having higher latency, so I can exclude it and use it ONLY in the case of an emergency.

The effects are clear in the next graph.
cpu_utilization

When using the cluster endpoint, given all the load was on a single instance, the CPU utilization is higher and that became a bottleneck. When using ProxySQL, the load is spread across the different instances, allowing real read scalability. This has immediate benefits in read and read/write operations, allowing better load distribution that results in better performance.

Conclusions

Aurora is a very interesting technology and can be a very good solution for read scaling.
But at the moment, the way AWS offers data connectivity with the Cluster endpoints and Reader endpoints can negatively affect performance.

The lack of configuration and the limitation of using different endpoints lead to confusion and less optimized utilization.

The introduction of ProxySQL, which now supports (from version 2) Aurora, allows an architect, SA or DBA to properly configure the environment. You can very granularly choose how to use each instance, without the need to have the application modify how it works. This helps keep the data layer solution separate from the application layer. Even better, this additional set of flexibility does not come with a cost.

On the contrary, it improves resource utilization and brings higher performance using less powerful instances. Given the cost of Aurora, this is not a secondary benefit.

  I suggest you try installing ProxySQL v2 (or higher) in front of your Aurora cluster. If you don't feel confident and prefer to have us help you, contact us and we will be more than happy to support you!

 
How to Implement ProxySQL with AWS Aurora
Written by Marco Tusa   
Wednesday, 04 April 2018 00:00

In this post, we'll look at how to implement ProxySQL with AWS Aurora. Recently, there have been a few discussions and customer requests focused on AWS Aurora and how to make the various architectures and solutions more flexible. Flexible how, you may ask? Well, there are the usual expectations:

  • How do you improve resource utilization?
  • How I can filter (or block) things?
  • Can I shard with Aurora?
  • What is the best way to implement query caching?
  • … and more.

The inclusion of ProxySQL solves many of the points above. We in Consulting design the solutions for our customers by applying the different functionalities to better match customer needs. Whenever we dealt with Aurora, we had to exclude ProxySQL because of some limitations in the software. Now, however, ProxySQL 2.0 supports Aurora, and it does it amazingly well. This article will show you how to implement ProxySQL with AWS Aurora. The next article, Leveraging AWS Aurora performance, will show you WHY.

The Problem

ProxySQL has two different ways to deal with backend servers. One is using replication mechanisms, like standard async replication and Group Replication. The other is to use the scheduler, as in the case of Percona XtraDB Cluster, MariaDB Cluster, etc. While we can use the scheduler as a solution for Aurora, it is not as immediate and well-integrated as the embedded support for replication, so we normally opted not to use it in this specific case (Aurora). But what WAS the problem with Aurora? An Aurora cluster bases its definition of Writer vs. Readers on the innodb_read_only variable. So, where is the problem? Well, actually no problem at all, just that ProxySQL, up to version 2, only supported the generic variable read_only for replication. As such, it was not able to correctly identify the Writer/Readers set.
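A quick way to see which role an instance currently holds, before wiring it into ProxySQL, is to check that variable directly on each instance endpoint (a small sketch; it returns 0 on the current writer and 1 on an Aurora Replica):

-- run against each instance endpoint, not the cluster endpoint
SELECT @@innodb_read_only;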

The Solution

In October 2017, this issue was opened (https://github.com/sysown/proxysql/issues/1195) and the result was, as usual, a quite simple and flexible solution.

Brainstorming, a possible solution could be to add another column in mysql_replication_hostgroups to specify what needs to be checked, either read_only or innodb_read_only, or even super_read_only

This lead to the ProxySQL team delivering (“commit fe2f16d6df15252f0107a6a224dad7b1efdb13f6”):

Added support for innodb_read_only and super_read_only

MYHGM_MYSQL_REPLICATION_HOSTGROUPS "CREATE TABLE mysql_replication_hostgroups 
(writer_hostgroup INT CHECK (writer_hostgroup>=0) NOT NULL PRIMARY KEY , 
reader_hostgroup INT NOT NULL CHECK (reader_hostgroup<>writer_hostgroup AND reader_hostgroup>=0) , 
check_type VARCHAR CHECK (LOWER(check_type) IN ('read_only','innodb_read_only','super_read_only')) NOT NULL DEFAULT 'read_only' , 
comment VARCHAR NOT NULL DEFAULT '' , UNIQUE (reader_hostgroup))"

Which, in short, means they added a new column to the mysql_replication_hostgroups table. ProxySQL continues to behave exactly the same and manages the servers and the replication groups as usual. No need for scripts or other crazy stuff.

Implementation

Here we are, the HOW TO part. The first thing to keep in mind is that when you implement a new Aurora cluster, you should always consider having at least two instances in the same AZ and another instance in a remote AZ. To implement ProxySQL, you should refer directly to the instances, NOT to the cluster entry-point. To be clear, you must take the instance endpoint of each instance:

The information is available in the web-admin interface, under the instance or using the command:

aws rds describe-db-instances

And filter the result for:

"Endpoint": {
                "Port": 3306,
                "Address": "proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com"
            },

To run ProxySQL with RDS in general, you need to install it on an intermediate server or on the application box. Once you decide which one fits your setup better, you must download or git clone ProxySQL v2.0+. DO NOT use v1.4.x, as it does not contain these new features and will not work as expected. Once you have all the Aurora instances up, it is time to configure ProxySQL.

Below is an example of all the commands used during the installation:


GRANT usage, replication client ON *.* TO monitor@'%' IDENTIFIED BY 'monitor';
 
DELETE FROM mysql_servers WHERE hostgroup_id IN (70,71);
DELETE FROM mysql_replication_hostgroups WHERE writer_hostgroup=70;
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',70,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb2.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1000,2000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections) VALUES ('proxysqltestdb-eu-central-1b.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com',71,3306,1,2000);
 
INSERT INTO mysql_replication_hostgroups(writer_hostgroup,reader_hostgroup,comment,check_type) VALUES (70,71,'aws-aurora','innodb_read_only');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
 
DELETE FROM mysql_query_rules WHERE rule_id IN (50,51,52);
INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(50,6033,'m8_test',70,0,3,'.',1);
INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(51,6033,'m8_test',70,1,3,'^SELECT.*FOR UPDATE',1);
INSERT INTO mysql_query_rules (rule_id,proxy_port,username,destination_hostgroup,active,retries,match_digest,apply) VALUES(52,6033,'m8_test',71,1,3,'^SELECT.*$',1);
LOAD MYSQL QUERY RULES TO RUNTIME;SAVE MYSQL QUERY RULES TO DISK;
 
DELETE FROM mysql_users WHERE username='m8_test';
INSERT INTO mysql_users (username,password,active,default_hostgroup,default_schema,transaction_persistent) VALUES ('m8_test','test',1,70,'mysql',1);
LOAD MYSQL USERS TO RUNTIME;SAVE MYSQL USERS TO DISK;
 
UPDATE global_variables SET variable_value="67108864" WHERE variable_name='mysql-max_allowed_packet';
UPDATE global_variables SET Variable_Value=0  WHERE Variable_name='mysql-hostgroup_manager_verbose'; 
LOAD mysql VARIABLES TO run;save mysql VARIABLES TO disk;

 

The above will give you a ready-to-go ProxySQL setup that supports the Aurora cluster, performing all the usual operations ProxySQL does, including a proper W/R split and more, for a user named 'm8_test'. The key is in passing the value 'innodb_read_only' for the column check_type in the table mysql_replication_hostgroups. To check the status of your ProxySQL, you can use this command (which gives you a snapshot of what is going to happen):


 watch -n 1 'mysql --defaults-file=~/.my.cnf -h 127.0.0.1 -P 6032 -t -e "select b.weight, c.* from stats_mysql_connection_pool c left JOIN runtime_mysql_servers b ON  c.hostgroup=b.hostgroup_id and c.srv_host=b.hostname and c.srv_port = b.port where hostgroup in( 50,52,70,71) order by hostgroup,srv_host desc;" -e " select srv_host,command,avg(time_ms), count(ThreadID) from stats_mysql_processlist group by srv_host,command;" -e "select * from stats_mysql_users;";mysql  --defaults-file=~/.my.cnf -h 127.0.0.1 -P 6032  -t -e "select * from stats_mysql_global "|egrep -i  "(mirror|memory|stmt|processor)"' 
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| weight | hostgroup | srv_host                                                                 | srv_port | STATUS | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Queries_GTID_sync | Bytes_data_sent | Bytes_data_recv | Latency_us |
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
| 1000   | 70        | proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com               | 3306     | ONLINE | 0        | 0        | 0         | 0       | 0           | 0       | 0                 | 0               | 0               | 5491       |
| 1000   | 71        | proxysqltestdb2.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com              | 3306     | ONLINE | 0        | 5        | 5         | 0       | 5           | 73      | 0                 | 5483            | 28442           | 881        | 
| 1000   | 71        | proxysqltestdb.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com               | 3306     | ONLINE | 0        | 5        | 5         | 0       | 5           | 82      | 0                 | 6203            | 32217           | 5491       | 
| 1     | 71        | proxysqltestdb-eu-central-1b.c7wzm8xxmrze.eu-central-1.rds.amazonaws.com | 3306     | ONLINE | 0        | 0        | 0         | 0       | 0           | 0       | 0                 | 0               | 0               | 1593       |
+--------+-----------+--------------------------------------------------------------------------+----------+--------+----------+----------+--------+---------+-------------+---------+-------------------+-----------------+-----------------+------------+
+----------+----------------------+--------------------------+
| username | frontend_connections | frontend_max_connections |
+----------+----------------------+--------------------------+
| m8_test  | 0                    | 10000                    |
+----------+----------------------+--------------------------+
| Query_Processor_time_nsec    | 0              |
| Com_backend_stmt_prepare     | 0              |
| Com_backend_stmt_execute     | 0              |
| Com_backend_stmt_close       | 0              |
| Com_frontend_stmt_prepare    | 0              |
| Com_frontend_stmt_execute    | 0              |
| Com_frontend_stmt_close      | 0              |
| Mirror_concurrency           | 0              |
| Mirror_queue_length          | 0              |
| SQLite3_memory_bytes         | 2652288        |
| ConnPool_memory_bytes        | 712720         |
| Stmt_Client_Active_Total     | 0              |
| Stmt_Client_Active_Unique    | 0              |
| Stmt_Server_Active_Total     | 0              |
| Stmt_Server_Active_Unique    | 0              |
| Stmt_Max_Stmt_id             | 1              |
| Stmt_Cached                  | 0              |
| Query_Cache_Memory_bytes     | 0              |

 

At this point, you can connect your application and see how ProxySQL allows you to perform much better than the native cluster entry point. This will be expanded in the next article: Leverage AWS Aurora performance.
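If you want to double-check that ProxySQL classified the instances as expected (the writer in hostgroup 70, the readers in 71), a quick look at the runtime table from the ProxySQL admin interface is enough. A minimal sketch:

SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;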

Conclusions

I had my first issue with the native Aurora connector a long time ago, but I had nothing to replace it with. ProxySQL is a very good alternative to standard cluster access, with more options/controls, and it also allows us to perform close-to-application caching, which is much more efficient than the remote MySQL one (http://www.proxysql.com/blog/scaling-with-proxysql-query-cache). In the next article I will illustrate how, in a simple setup, ProxySQL can help achieve better results than using the default Aurora cluster endpoint.

 