My MySQL tips valid-rss-rogers


This is not a technical article; it starts from my own thinking related to the work leaded in many years in international environments and with people coming from all around the world. Many though, about me, about the interaction with colleagues, about interactions with customers, may be few dolphins here in terms of bit, but not in terms of spirit, human respect and the research of being good.


My thinking and I

I am a lucky guy; I had the chance to travel a lot from when I was really young, so young that I cannot remember my first time. I have done mine the natural fact that we are different and we have different approaches in human relations.

Once become adult I was even luckier, because I started to work in international environment, so I had the chance to learn many different “cultural” approaches from the source on the top of my travels.

What have always amazed me, are the many subtle differences existing between people also leaving close each other, and very tight similarities also if staying from the other side of the world.

There are so many that anthropologically is very difficult to classify them all. But I am not an anthropologist so I can do mistake there and in my mind I have built up a differentiation, that refining to the extreme reduce the whole to two categories, not the good and the bad or we and the others, given we sometime shift from one category to the other during our day.

But let us say that it is a common behaviour during our day to define if “that” day we were here or there.

What I am talking about is the different approach that we can call the “One man hero” & “family man”.


The One man/woman Hero

He is he, now way to be better.  He save the world alone, he is firmly convinced to be (almost) always right. Whatever comes out from his mouth is holy true, what is in his mind no human can even imagine. If he stops doing the entire world stops as well.

When talking with others no need to take care of any debate, just listen to him.  Teams should follow him try to remain in his shadow.

He makes no mistakes, but if he does, he fell.


The Family Man/Woman

We must do it together, he has his own ideas and he is sure he is right, but you can be right as well. He proposes ideas not impose, want to hear from you, your concern, or thinking, especially if not in line. He learns from you, you learn from him.  He tries to keep his space/territory in order respecting his own attitude; if you do the same the world will be ok.

He can fell, but the “family” will stay up, and help him to rise again.


What and how this fits in consulting/support?

Well in many ways, starting from the interactions in the working place. It is funny to note that, any or almost any, company would like to have “the family man” on board, given the attitude to convince people not to impose, the capacity of sharing and so on. Also when he is sick or if he goes away the family survives.

On the other hands customers, often, want to have the hero with the magic wand. He comes save the world, takes the money and goes. What a big relief for them.

If the above is true, at the end the situation will not be so dramatic. We can act as family man inside our company and as the one man hero when dealing with the external.

Given life is fun, the reality is much more complicated, and the infinite combination starts putting together different types in the same room.  So you can have a lot of fun if behave as the omh (one man hero) and in the same room on the other side there is another one, who will wins, who will be the one having the right answer? On the other hands having a room filled with fm(family man) will bring to very interesting long discussions, but it is possible that for the time a solution is found the humans are extinct.


So as usual the right measure and the right combination of the two elements is the right path to go.

There are different points that must always take in consideration from my point of view:

1.       No one is better than someone else

2.       Stating to be good is not enough we must be good

3.       Shit happens

4.       Hiding shit bring more shit

5.       Learning from mistakes makes you stronger

6.       If you do not make mistakes you are lying

7.       Explain your ideas allow you to have better ideas

8.       Propose not impose

9.       If you know you right, be ready to accept that also someone else can be right as well

10.   If you know and can prove he is wrong explain not denigrate

The funny thing is considering the same points from the customer point of view:

1.       Better for you to be better than me

2.       You must be good, period

3.       Shit don’t happens if it does your fault

4.       If shit happens and it is my fault, then it is your fault because you did not tell me the right thing

5.       Your mistakes should never happen on my environment

6.       Better you know what you do so train yourself out of my “house”

7.       You must have good ideas always

8.       You must be sure of what you say

9.       If I know I am right I am right

10.   If you can prove I am wrong,  say it such that no one will get offended


In any case the golden rules for me are:

1.       Say always the true to the customer or colleague

2.       Base your suggestion and recommendation on observation and real facts

About 2, you can have inspiration that is good, but ten you must prove it.


The list can/should be longer, but what I mean here is that there are different approaches when we are acting as Consultant/Support or as customer, and we shift from one point to the other without even thinking.

If you say no, think when YOU are calling for assistance because your cable TV is not working and you are going to lose your favorite match, or because the Internet connection drops.


Back to the point please

But return back to the point, dealing with International consulting/support.

As said we must be good, and I am firmly convinced that if we act “Always” as a family man with our colleagues, be good will be a shared efforts and to achieve good results will be a lot easier.


The International teams first

Family’s teams

Let us start from here;  there are culture that see the family approach as must, you cannot see a good result as your result is a team result, as well the failure. We have this in Japan, Asia in general, but it is becoming more and more frequent in Europe as well. We have some “strange” behavior in country like Italy and Spain, but normally it works.

When I say strange I mean the “passing the buck” approach that unfortunately is very common in latin countries.

In the same context also the physical behavior and how the voice is used is different, avoiding to being rude or too direct; instead being more descriptive and verbose.


One Man Hero’s team

A different approach is used in some companies, where the leadership is still considering the OMH behavior a benefit. It is easier to find such approach in North America and European companies where the leadership is still bounded to old Top->Down models.  There the people or resource (I hate that term but it is a good indicator of the company approach) composing the teams, are often referring to someone above them in the pyramid, not interacting directly.  The results are shared bottom up, then top down. Not following an internal exchange. This is very important because the awareness re of the internal problem is not equally shared. Being part of group as such expose you to a lot of possible mistakes because bad communication.


My preference and thoughts

It is quite obvious that from the “internal” operation I put my vote on the FM and not on the other, period.

The difficulty exists in keeping all the people align and trusting each other. There will always be, especially when setting up a new team, someone that feels himself as the ONE. Not because he is bad but because the culture, the background and the lack of trust in the others.

This is probably the most important and crucial point, trusting each other in a team is not easy, requires more courage than acting alone.

Building up relation between different people is easier when you can have them in the same space at the same time, but it becomes a heavy task when happening with people distributed around the globe.  The differences start from the working hours, the holidays they have, the need to talk they have, and so on.

Again I was lucky, I had my first international team, when I was able to handle it directly, so at least the Time zone issue was reduced to minimum. But the respect you must have for their differences is the same, so different holidays, different working time, different culture that require to be taken in account.  When happens to me to deal with remote team, the first time, wow that was fun, just to organize the kick-off meeting I take at least 48 hours.  No was not easy to deal with it in a “family mode” and this is also why very often some companies goes for the OMH, choosing to impose instead propose.

Here I would like to remember to all a fantastic statement: “People do not leave their jobs, they run away from their boss”. Keep an eye on places where people are leaving, there you will probably find a bad manager, and I have seen that recently happening.


Now about Customer

It is now time to see how we can combine an international team with all his internal interactions with customers, sometime international, sometime no.

Here must distinguish, Consulting and Support are two different categories, or to be more precise, onsite Consulting and remote work, two different way to interact with customer.


Dealing with customer onsite

When you go on site at customer, also if you are a family man, be careful to interact in the correct way, do not be too flexible, given that could be misunderstood and read as weakness or worse. You are alone, no family with you, you must trust yourself fully. Better to switch to the One Man Hero mode, the lions will come and they want your body. This is true for the first 5 minutes, where the first 10’’ gives you the chance to evaluate how much you can be flexible or not at all.

What happens if you have in front of you a team headed by an OMH?  I have seen a constant behavior here; OMH meets you alone, before introducing you to the rest of the team. The first aspect that you should consider is if he is interacting with you at your working level or if he is there to coordinate. If he is a manager, be careful it doesn’t matter what nationality he will be, he will not understand you if you talk about technical aspects (few cases do, very few), and he will not understand the explanations, skip them, go straight to the Executive Summary.   If he is a working level guy, then culture, nationality and background could make the difference.  In some cases you must be dramatically direct close to be arrogant, and say what you have to say, like I have seen working in many cases with US people. But if you have in front of you a Latin guy or someone from Arabic countries, then do not do it. You will have an enemy before even open your mouth. Let him be opening all the conversation and discussions, try to keep all the conversation following indirect patterns; only when you are sure he start to trust you, start to gently propose him that there could be a “different” way of doing things. Then remember if he will reduce the talks with you, means you have a problem, try to fix it talking in 1:1 never with others.

In those context if the customer has a mixed team, follow the internal dynamics will helps a lot.

In short dealing with a customer face by face is primarily a matter of human experience, and then comes how good you are, but you know that you must be good, period.


Dealing with customer remote

Two level of contact here, using any system or by voice.

The golden rule here is … “Scripta manent verba volant” (Google … if you don’t know Latin).


System or email

Systems, including emails are done to reduce the overhead created by the human interaction, and to optimize relevant information exchange. When you have to exchange information with the customer by email, or by ticketing system, nationality, culture and background are not so relevant.  What is relevant is to be clear, and descriptive, period.

Good customer/managers know that, and tempt to push you to use that way, when communicating with them.

Never the less the use of system could hide some pitfalls and you must be careful trying to at least be aware of the possible issues.  There is not a real way to identify what kind of team the customer has (base on the FM or OMH model), but few elements could help: when you see only one contact point, or many different one for different topic, when you don’t see a common account used, then be careful, behind the scene is lurking a team managed by an OMH focus on lack of communication. On the other hands good sign is when they gives you one contact (email, ticket system, tracing tool) that is used in parallel by multiple team member, very likely they communicate each other, and so you can as well.



Huston … we have a problem.  You can only guess the first time with who you are dealing with. This is why you should never ever, talk directly with the customer. The sales guys or whoever talks with customer as main contact, must first brief you.  He should be able to say to you (if he do not, suggest to fire him) how to interact, and you should have a place where take note on customer information, and share that with the other members of the teams.

Voice and conference calls, are the preferred tools of customer that have very basic understanding of distributed work and international teams. Normally they have no idea how confusing a conference call could be when coordinating real work in case of problem.

When dealing with customer around the world we have to keep in mind that not all of them talk good English, or English at all, as well teams members, from both sides. This is one of the main issues during conference call, after line quality, for misunderstanding between us and customer.

For geographic and language reasons the North America is the one mostly using it. The problems start when they forgot there are others 201 countries in the world, 6000 spoken languages, where the most used (excluding Chinese) is Spanish with 329 million vs English 328 million, and last but not least there are other 20Hrs in the Time Zones and people work there too, and sometime sleep.

In short, keep an eye on the world clock and whenever there is the need of a call show some respect and at least talk slowly and clear, avoiding slang or too many jokes.

That’s it, multi-cultural, multi-language teams should not have conference calls with customer too often or at all, but the coordination level should. Those calls should be focus on planning, and customer review, nothing more all other communication should be written.

Doing that will help preventing any friction between the different ways oh behave.


Lesson learn

There is always something to learn. In this case I think is that the power, flexibility, creativity and much more obtained having a team compose by multi-cultural elements, is so more than the difficulties to manage it that also companies not dealing with international customer should go for it.

Language and cultural differences could be an issue but is possible to handle them; the only require ingredient is showing a little bit of respect and ask for it as well.

The problem is keeping the beast under control specking about OMH, but someone more expert then me had already spoke about it in the MUST READING book: “Good Boss, Bad Boss” by Robert I. Sutton.


What conclusion? There is not a conclusion; I have just put in write a week-end/Easter thought.

The only thing I would like to stress a bit is that whenever there are teams based on multi-cultural members, from Service provider side or customer side, the standard way of acting are not working anymore.

The key there is respect, once more respect, no one is better, we are different and we must respect that. There is no need to do huge things, small simple ones are good enough. Talk slowly when talking, respect other people traditions: don’t ask for consultancy on 26 of December, don’t plan to send your Guru to Cairo on Friday and so on.

Our world is a common net from many points of view, but still lack in understanding how different is dealing with company, teams and customer limited to a specific territory and/or country; comparing with the company, team and customer distributed around the globe.

Excellent managers that work great and achieve  great results in UK (or Italy, US, France, Japan and so on)  for company working within the country with people comeing only for that country, miserable fails if require to lead the same role in a distributed, international environment.


Dealing with international teams and customer, require specific background. Managers, team leader, Coordinators that are not use to work in an international environment , should not be place in a leading position given they total lack of experience in handling correctly the PEOPLE, not resources, but PEOPLE.

One clarification, a company that has 100 offices in 100 countries and use a vertical structure, is not an international distributed company; it only has many boxes, period.




MySQL Search facility is gone …


Well it happens, after a long period of time without one second of relax The MySQL Dolphin Search, nick name Fetch for the friends, decide to go on vacation.



He was already not working as expected, and I remember times when together with other colleagues we were pushing on him to be more precise, and less restrictive, more “concept” oriented than “word” oriented. Not really accepting comments he was his way, and in any case we were loving him.

What happen recently I don't know, from when I left I didn't had the chance to talk with him too much, but just recently I was in the need to have his support, but he was not there... instead there was a message, saying that a search for “InnoDB” :

Skip navigation links

Oracle Secure Enterprise Search


* Help

* Search


Oracle Secure Enterprise Search Error

A problem has been encountered while processing your request.

Based on the information we have, we suspect the following:


The exception did not contain any error message. Report this problem to your administrator, or contact your Oracle representative.


Of course I can use his friend G, but you know I miss him, and also if I was not too nice with him sometime, I have just realize the great work he was doing, and how much added value he was providing to the documentation I use every day.


So, please whoever knows where he went for vacation, tell him to comes back.




The Binary Log


The binary log in MySQL has two main declared purpose, replication and PTR (point in time recovery), as declared in the MySQL manual. In the MySQL binary log are stored all that statements that cause a change in the database. In short statements like DDL such as ALTER, CREATE, and so on, and DML like INSERT, UPDATE and DELETE.

As said binary log transform multi thread concurrent data/structure modification in to a serialize steps of activity. To handle this, a lock for the binary log—the LOCK_log mutex—is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common for several session threads to block on this lock.

The other side effect of this is on SLAVE side. There the binary log becomes the relay-log, and will be processed SINGLE threaded.

In the MASTER it looks like:


While on SLAVE:




It is quite clear how the SLAVE cannot work efficiently in the case we will have a high level of  data modification on the MASTER.


And now the story

The other day I was working on a client site, I was performing some simple benchmarking to identify the correct thresholds for monitoring tools and so on.

While doing that I start to stress a little his server a 24CPU 48GB RAM SAN attached storage running Linux kernel 2.6.

The client had decided, before I or (we) can say anything to use MMM for HA, given that to have an up to date replication is a crucial factor for their HA architecture.

The test I was running execute inserts and selects, of 650 Bytes per insert using scalable number of thread, modifiable complexity in the SELECT, can do single insert or batch inserts and finally can do also BLOB/TEXT inserts (in that case the size per insert will jump up).

I was executing my standard low traffic exercises, when I realized that with only 6 threads doing 318 inserts per second (204Kb/s) the slaves were lagging behind the master for an indecent number of seconds.

I was expecting some lag but not so much. I stop any activity on the Master and wait for slave to recover, it takes 25 minutes to the slaves to recover/apply 20 seconds of inserts on the master.

I was shocked. That was too much considering I normally reach the 36 threads running at the same time on 5.1 and 144 on 5.5  before starting to see scaling issue and performance degradations (not in the scope of this article to publish that results).

Anyhow the issue was on replication not on Master. I was in the need to find a solution for the client or at least try before suggesting an architecture review.

Given re-inventing the wheel it is not my “motto” I started to look around for possible solutions, and parallelization seems the most interesting one.


My first though goes to Tungsten from Continuent, this also thanks to the very nice bogs coming from Giuseppe Maxia :


Giuseppe was providing a superb introduction on one of the most interesting (actually the only really usable) solution currently available, the “Tungsten Replicator”.


I was reading the articles and get excited more and more.
That was exactly what I was looking for, until I read “A Shard, in this context, is the criteria used to split the transactions across channels. By default, it happens by database”.


Ooops, it will not work for my customer and me, given that his instance is mainly single schema, or at least the major amount of traffic is directed to only one schema.

I pinged Giuseppe, and we had a very interesting talk, he confirms the approach by object/schema, suggesting alternatives for my issue. Unfortunately given the special case no sharding could be done or table distribution, so for the moment and for the specific case this solution will not fit.

Looking around I had seen that almost all use the same approach, which is parallel replication by OBJECT where object is again and unfortunately, mainly the SCHEMA.

Sharding replication by schema is an approach that could work in some cases but not all.

If you have many schemas, with a limited number of well-distributed Questions per second, I agree this solution is a very good one.

If you have only one schema with very high level of Questions per seconds, this will not work anymore, for sure we can distribute the tables in different schemas, but what about transactions doing inserts/modifications in multiple tables?

We should be sure that each action is running on isolate state, so no relation with others. In short it will not work.

Having the possibility to dream, I would like to have parallel replication by MySQL User/Connections.

I know it could looks as a crazy idea, but trying to replicate the behavior of the MASTER on the SLAVES starts from the user(s) behavior, not from the Object inside the server instance.

Trying to move objects from A to B (see from MASTER to SLAVE) is really complicate because it needs to take in to account how people->application behave on a specific OBJECT or set of them. The status of the OBJECTS is subject to constant changes and most of them at the same time, so concurrency and versioning needs to be preserve.

In short what I was thinking was to realize a conceptual shift from REPLICATION to MIMIC approach.

I started my dream looking on what we already have and how it could be used.

We have a flexible binary log format (version4), we have a known set of SCHEMAS, USERS, and connections going from users to schemas. We have (at least for Innodb) the possibility to have good concurrent and versioning mechanism. The MAX_USER_CONNECTIONS can be used to tune the number of threads the specific user can use, and assign also the permissions.

Setting MAX_USER_CONNECTIONS=30 to USER1@ will help us in correctly dimensioning the effort that replication by user will have to do in checking for consistency.

I am aware of the issue given the interaction and relation between user/transactions, and I see the need of having a coordinator that will eventually manage the correct flush of the transaction by epoch.

What I was thinking is that the coordinator on the slave could open as much as MAX_USER_CONNECTIONS on the master, and use them as flushing threads.

As said before, I was thinking to trust the internal versioning, so not supporting transaction engines are excluded.

The epoch and/or Transaction ID would then be used to flush them in correct order, where the flush will be done to the pages in memory and then use internal mechanism for flushing to disk(s), talking of InnoDB.

It will not be real parallelization but also not a serialization given we will trust the internal versioning, so we will respect the EPOCH for the launch order not waiting for the completion.

On MASTER the replicator agent will create an X number of Replication channel for each authorized/configured user per SCHEMA, users and permission must be the same also on SLAVE.

The binary log write could looks like:


On the SLAVE, the logs will be copied and then processed opening 1 to MAX_USER_CONNECTIONS Thread(s). 
Coordination between insert will be taken by replication MUTEX per User based on the EPOCH.


I know quite difficult to realize given the fact that we will have more users writing on different schemas. To have it working correctly, a MUTEX should check the correct process execution at Relay Log Coordinator, ensuring also a correct execution order base on the EPOCH.

Sounds like another serialization, also if we will not wait for the transaction to complete, but only to start.

It seems that a mixed approach will be a better solution, such that replication by SHARD will be used for DB and then per USER inside only by SHARD.  This means archiving action per schema and per user transaction. Replicating the work on the SLAVE ensuring to have the same start order of the transaction for the modifications. This is not so crazy as most can think and not so far from sharding by schema, except the multi-threading during the writes.

Main point will be archiving correctly the information in the binary log:



And execute them in the right order as well, keeping also into account the User/Transaction order:


Relay logs will contain information divide by SCHEMA. Processed using multi thread per User/Connection and organize with reference to the epoch of the starting execution order on the MASTER. Different grades of parallel or serialization execution could be set on the how the single Schema relay coordinator will work.

I know it will not be full parallelization but is not serializing as well, given the possible parallel execution of N transaction per user.


I think that there could be some risk here so in the case we should be able to:

-       Configure the number of executing threads per user

-       Execute only one thread per user

-       Choose between EPOCH execution and on Lock relief execution


Fast/short comment on the new binary log.

From binary log V4 we have the possibility to extend what the binary log store in a flexible way.

The extra headers (from byte 19 to X) are there but not used, while that sounds to me the most interesting place where to store additional information regarding what and how happen on the MASTER.

Not only, an HASH value representing the data on the master after the operation, would be of incredible help in checking if the replicated data on the SLAVE is consistent with the MASTER or not, avoiding to do that with silly methods like we are force to do today.

Below the binary log V4 format description:

v4 format description event (size >= 91 bytes; the size is 76 + the number of event types):

| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | >= 91
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
| event  | binlog_version   19 : 2    | = 4
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
|        +----------------------------+
|        | header_length    75 : 1    |
|        +----------------------------+
|        | post-header      76 : n    | = array of n bytes, one byte per event
|        | lengths for all            |   type that the server knows about
|        | event types                |
v4 event header:
| timestamp         0 : 4    |
| type_code         4 : 1    |
| server_id         5 : 4    |
| event_length      9 : 4    |
| next_position    13 : 4    |
| flags            17 : 2    |
| extra_headers    19 : x-19 |


My favorite mottos are:
 “only the ones that do nothing do not make mistakes”


“don’t shoot on the pianist”.

As I say from the beginning mine is a dream, and I am just proposing an idea, specially because I see that almost all were following the same mind path focused on OBJECTS, probably for a good reason, but never the less I am following my dream.

It could happen that someone would like to see/try to work on that or that will use part of the idea revisiting modifying and making it possible to implement.

For the moment it is too complex and taking too much time, for me to do it alone, but it could be a good idea for a plug-in in my next life.



In addition to Giuseppe blog, which for now it seems to me the only one proposing something that could be implement, I did some research, but at the end the actors are always the same:


MySQL forge




I already received a comment fromGiuseppe about how to handle the Locks and prevent dead lock... working on that (I have some ideas and details will provide soon).





The Binary Log

The binary log in MySQL has two main declared purpose, replication and PTR (point in time recovery), as declared in the MySQL manual. In the MySQL binary log are stored all that statements that cause a change in the database. In short statements like DDL such as ALTER, CREATE, and so on, and DML like INSERT, UPDATE and DELETE.

As said binary log transform multi thread concurrent data/structure modification in to a serialize steps of activity. To handle this, a lock for the binary log—the LOCK_log mutex—is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common

for several session threads to block on this lock.

It is not in the scope of this discussion to provide a detailed description of the binary log, which could be found at

What it is relevant is that MySQL write "a" log and read "a" log, serializing what is happening in the MASTER in a multi thread scenario. Also relevant is that MySQL has to deal with different behavior in relation to TRANSACTIONAL and NON-TRANSACTIONAL storage engine, trying to behave consistently.

In order to do that MySQL had changes his behavior while evolving, so it changes the way he was and is using the TRANSACTION CACHE also creating not few confusion.


Transaction Cache


What is the Transaction Cache or TC? It is a "per thread" cache used by MySQL to store the statements that will be needed to be written in the binary log.

This cache is active ONLY when the Binary log is active, otherwise it will not be used, and, as it declare, it is a "Transaction" cache so used to store statements executed in a transaction, and that will be consistently flushed in the binary log serializing the multi thread execution.

But what if we also have non transactional storage engine involved in our operations? Well here is where the fun begins, and also where the difference in how MySQL acts makes our life quite complicate.



Actors involved (who control what)


The TC (transaction Cache) is currently controlled by few parameters, and it is important to have them in mind:


o   this variable is to enable the binary log, if not present all the structure related to it is not active;


o   The size of the cache to hold changes to the binary log during a transaction


o   this variable determines the size of the cache for the binary log to hold nontransactional statements issued during a transaction.


o   This variable when enabled, causes updates to non-transactional tables to be written directly to the binary log, rather than to the transaction cache.

      Binlog_cache_use vs Binlog_cache_disk_use

o   Status Variables to use for checking the efficiency of the cache.

      Binlog_stmt_cache_use vs Binlog_stmt_cache_disk_use

o   Status Variables to use for checking the efficiency of the cache.



MySQL Transaction cache and MySQL versions


Ok let start trying to define some milestones, like what is what, in which version, and what works and what doesn't.

binlog_cache_size is present from the start but behave different, in:

5.1 controls the whole transaction cache

5.5.3 up to 5.5.8 controls both size of the statement and none cache

5.5.9 and newer controls only the transaction cache demanded to transactions



5.1 not present

5.5.9 it controls the dimension of the transaction cache demanded to store the NON transactional statements present in a transaction.



5.1.44 Introduced. it is DISABLE by default. If enable it allows all statements executed, also AFTER a Transaction (we will see later what this means), to be directly flushed to the binary log without going to the TC. It has effect only if the binlog_format is set to STATEMENT or MIXED.

5.5.2 Introduced.

5.5.5 It was modify to be active ONLY if binlog_format is STATEMENT in any other case it is ignored.

This feature, regardless the MySQL version, create more problems then solutions, and honestly I think it should be removed, please see the bug ( to understand what I mean. In any case it has now a very limited scope of action, and better to consider it as not present.



How the whole works


The first point to take in mind, is that the TC is used only during a transaction, also the new TCStatement, is suppose to store only statement that are issued during a transaction, and it will be flush at the moment of the COMMIT, actually it go the queue for acquiring the LOG_LOCK mutex.


All other statements will directly go to the LOG_LOCK mutex queue.

Said that let us see how the TC works and what it does and how.


In general terms when you mix the Transactional and NON-transactional statements these rules apply, unless the infamous

binlog_direct_non_transactional_updates is enable (only 5.1):

1. If the statement is marked as transactional, it is written to the transaction cache.

2. If the statement is not marked as transactional and there are no statements in the

transaction cache, the statement is written directly to the binary log.

3. If the statement is not marked as transactional, but there are statements in the

transaction cache, the statement is written to the transaction cache.


Interesting point here is what and how is define as NON Transactional or as TRANSACTIONAL, from reading the manual and the many reference we have that:


After MySQL 5.1.31

a statement is considered non-transactional if it changes only non-transactional tables;

a statement is transactional if it changes only transactional tables.

Finally a statement that changes both non-transactional and transactional tables is considered “mixed”. Mixed statements, like transactional statements, are cached and logged when the transaction commits.


Before MySQL 5.1.31

a statement was considered non-transactional if the first changes it makes change non-transactional tables;

a statement was considered transactional if the first changes it makes change transactional tables.


In term of code:

Before MySQL 5.1.31



INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;


NON-Transactional (the first command also if inside a declared transaction):


INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;


After MySQL 5.1.31



INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO innodb_tbl SELECT * FROM innodb_tbl;



INSERT INTO myisam_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;



INSERT INTO innodb_tbl VALUES (a),(b);

INSERT INTO myisam_tbl SELECT * FROM innodb_tbl;



A graph could help as well to understand what happens between threads and flush of the TC in the binary log.

In the graphs we have three treads running in parallel, remember that inside mysql the main entity is the THD "The THD class defines a thread descriptor. It contains the information pertinent to the thread that is handling the given request. Each client connection is handled by a thread. Each thread has a descriptor object."




The first thread will do all the operations using TRANSACTION storage engine (InnoDB);


the second thread will use NON-TRANSACTIONAL (MyISAM) in the first insert, then TRANSACTIONAL (InnoDB);


the third statement will do first a TRANSACTIONAL insert and then will use it for a NON-TRANSACTIONAL.


In the first one I have assume the 5.1.31 and over behavior.



The three statements start almost at the same time, and none of them has data in any cache or buffer. Said that it is probably true that statement 1 in Thread2 will be executed (fully) faster, given it will take less as overhead, and giving that is the first statement in the THD2 transaction, it will be treated as NON-TRANSACTIONAL and directly sent to the binary-log's queue.


THD1 is fully TRANSACTIONAL, it will do his internal action then flush to binary-log.


Then THD2 had done also the second part of the operation TRANSACTIONAL and it is flushing it as well.


Last the THD3 which had first insert a record from TRANSACTIONAL table and then use it to populate the NON-TRANSACTIONAL.




It looks more or less fine, isen't it? Well but assume that all the three THD share the InnoDB table and that as per default the ISOLATION LEVEL is Repeatable Read... Do you get it??


Actions and value on the MASTER will do and set values, that are potentially different from the ones in the SLAVE given the different order in the execution.


Immagine to have updates, instead insert and/or WHERE condition by values. Results could be different given the position, and remember, we are still talking about the STATEMENT replication given it is still the default ALSO in 5.5.


Finally remember that the issue it is NOT that THD1/2/3 could potentially change each other value, but the fact that they do something on the MASTER which could potentially different in the SLAVE.


That's it, the changes introduced in MySQL were done to reduce this wrong behavior, or to in some way try to keep consistency.


Let see what and how in the others graphs.



What about using binlog_direct_non_transactional_updates, will it helps?


I admit that I have exaggerated a little bit, in this example but my intention was to show how dangerous this option could be.



THD2 statement 1, as usual will be the first one, then THD1 consistently apply the whole set, but then it could happens that given the option set binlog_direct_non_transactional_updates THD3 will be faster in flushing the NON-TRANSACTIONAL statement, writing to the binary log before THD2, adding fragmentation and breaking the order in the binary log.


As said I have exaggerate a little, but theoretically this is very likely to happen in a standard context.

So in addition to the already found bug(s) (see the reference section), this option for me it is not really an additional help, but an issue generator.



Transaction Cache in MySQL 5.5


From MySQL 5.5.4 the TC is now divide in two parts, one control/store the TRANSACTIONAL statements, another store ONLY the NON_TRANSACTIONAL statements.


What does it means? In theory it means that any statement which was suppose to be part of a TRANSACTION but related to a NON-TRANSACTIONAL table (like statement at the start of a transaction), instead going directly to the binary log should be moved to the TC-statement, and then flushed when the TC is flushed.


The involvement of the binlog_direct_non_transactional_updates complicate  a little all the scenario and in addition to that, it also create some major issue in the correct flush, so I am ignoring it here.


Let us see what happens using the new TC mechanism:


THD1 is as usual consistent in the way it flush to binary log. Arbitrarily I have decide that this time THD2 is taking a little bit more time like a millisecond, and that is enough for THD1 to acquire the LOG_LOCK before it.


The scope as you have probably guess was to make more ORDERED the graph, given that this time having one or the other flushing before or after will not really make any difference this time.


This because the TCStatement will be flushed (at least this is what I have seen and understood) when the TC is flushed.

Result is that the flush will take place using same the order on master and on the slave.


Finally THD3 also if will have the possibility to use the TCstatements it will not giving the fact that the statement using a NON-TRANSACTIONAL storage engine is NOT the first statement, so the mixed transaction will be placed in the TC as expected.





There was not so much to say from the beginning, to be honest, the TC and TCstatements are useful only for that case we do mix-transactions. The only significant changes are between after 5.1.31 and before and the introduction of the TCstatement. All the other tentative to add flexibility to this area, had in my vision increased bad behaviors.


It is easy for Master and Slave to store or modify data in a unexpected way, issue that any DBA face probably too often in MySQL.

I am not saying that all that cases should bring back to this specific issue, but for sure it is one of the less known and difficult to track, because many things do influence the how MySQL will flush to binary log.


The golden rule is, do not mix NON-TRANSACTIONAL and TRANSACTIONAL, but if you really have to do that, be sure of doing it in using 5.5.9 and above, which at least gives you more insurance of flushing to binary log the commands consistently.


Last notes


Remember that any statement NON-TRANSACTIONAL in the TC cache will be flushed on COMMIT or ROLLBACK, and that in the latest case a WARNING message should be written in the MySQL error log.


Note that I will post later (when I will have the time) some code and related raw data for documentation purpose.




Overview on what a binlog is

The binary log contains “events” that describe database changes such as table creation operations or changes to table data.
It also contains events for statements that potentially could have made changes (for example, a DELETE which matched no rows),
unless row-based logging is used.
The binary log also contains information about how long each statement took that updated data.

The binary log has two important purposes:
  • For replication, the binary log is used on master replication servers as a record of the statements to be sent to slave servers.
    The master server sends the events contained in its binary log to its slaves, which execute those events to make the same data
    changes that were made on the master. See Section 16.2, “Replication Implementation”.
  • Certain data recovery operations require use of the binary log. After a backup has been restored, the events in the binary log
    that were recorded after the backup was made are re-executed. These events bring databases up to date from the
    point of the backup. See Section 6.5, “Point-in-Time (Incremental) Recovery Using the Binary Log”.

Running a server with binary logging enabled makes performance slightly slower.
However, the benefits of the binary log in enabling you to set up replication and for
restore operations generally outweigh this minor performance decrement.


Binlog main settings

The binlog is controlled by few settings in the my.cnf file, let quickly see them (credits MySQL manual):

Enable binary logging. The option value, if given, is the basename for the log sequence.
The server creates binary log files in sequence by adding a numeric suffix to the basename.
It is recommended that you specify a basename, otherwise, MySQL uses host_name-bin as the basename.

Parameter to do or ignore databases


If the value of this variable is greater than 0, the MySQL server synchronizes its binary log to disk (using fdatasync()) after every sync_binlog writes to the binary log.
There is one write to the binary log per statement if autocommit is enabled, and one write per transaction otherwise.
The default value of sync_binlog is 0, which does no synchronizing to disk—in this case, the server relies
on the operating system to flush the binary log's contents from to time as for any other file.
A value of 1 is the safest choice because in the event of a crash you lose at most one statement or transaction from the binary log.
However, it is also the slowest choice (unless the disk has a battery-backed cache, which makes synchronization very fast).

The size of the cache to hold the SQL statements for the binary log during a transaction.
A binary log cache is allocated for each client if the server supports any transactional storage engines and if the server has the binary log enabled (--log-bin option).
If you often use large, multiple-statement transactions, you can increase this cache size to get more performance.

If a multiple-statement transaction requires more than this many bytes of memory, the server generates a Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage error.
The minimum value is 4096.

If a write to the binary log causes the current log file size to exceed the value of this variable, the server rotates the binary logs(closes the current file and opens the next one).
The minimum value is 4096 bytes. The maximum and default value is 1GB.

The number of days for automatic binary log file removal. The default is 0, which means “no automatic removal.”
Possible removals happen at startup and when the binary log is flushed.

An example of usage of all of them:

log_bin = /var/mysql/log/binlog51/mysql-bin
relay-log = /var/mysql/log/relaylog/relay

expire_logs_days = 30
max_binlog_size = 500M
log-slave-updates = 1
max_binlog_cache_size = 4294967295
sync_binlog = 1


In this case what we have define is an automatic purge of the Binary logs after 30 days.
30 days is a safe window for the replication to happen, so there should not be any issue here in removing the binary logs.
Not always we are so lucky, and some time we have to clean up our disks to save space.

In this case is a good idea to set-up a procedure to perform checks against the slaves before performing a purge, you can do it by yourself
or trust one of the many script on the web, good starting point is Maatkit.

Now let us assume that for any reason you have purged a binary log from the master, and that the slave was not already there.

What will happen is that you will receive a very nice message from MySQL replication:
"Last_Error: Could not parse relay log event entry. The possible reasons are: the master's binary log is corrupted
(you can check this by running 'mysqlbinlog' on the binary log), the slave's relay log is corrupted (you can check this by running 'mysqlbinlog' on the relay log),
a network problem, or a bug in the master's or slave's MySQL code. If you want to check the master's binary log or slave's relay log, you will be able to
know their names by issuing 'SHOW SLAVE STATUS' on this slave."

Nothing unexpected ... you have deleted it with the PURGE so log not there anymore.
Assuming you have the backup of the purged binlog (as it should ALWAYS be), copy back the file(s), and edit the <binlog>.index file, adding the missed entry previously PURGED.

Then check if it sounds fine:

mysql> show binary logs;

It will show the added entry.

Well Nice ALL done !!!

Assuming from that we could restart the slave... So let us try.
What we will get is something frustrating:

100929 11:21:47 [Note] Slave SQL thread initialized, starting replication
in log 'mysql-bin.002671' at position 1073741674, relay log './xxxxxxxxxxxx-relay-bin.000001' position: 4
100929 11:21:47 [Note] Slave I/O thread: connected to master This email address is being protected from spambots. You need JavaScript enabled to view it.:3306',
replication started in log 'mysql-bin.002671' at position 1073741674
100929 11:21:47 [ERROR] Error reading packet from server: Could not find first log
file name in binary log index file ( server_errno=1236)
100929 11:21:47 [ERROR] Got fatal error 1236: 'Could not find first log file name
in binary log index file' from master when reading data from binary log
100929 11:21:47 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.002671', position 1073741674
100929 11:22:08 [Note] Error reading relay log event: slave SQL thread was killed



Why this happens?

The situation seems a little bit confusing, right?
The MySQL server say that it CAN see them, but files are not valid for the replication process, so what is going on?

Well first of HOW I DO FIX it this will help us to understand.
Here a comment, I got comment form some MySQL DBA saying that the way is to RESTART MySQL, true if you do that it will fix it,
but you DO NOT NEED TO restart MySQL in this case, instead do the right thing:

mysql> FLUSH LOGS;


Then restart the slave process on the SLAVE:


mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_User: xxxx
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.xxxx
Read_Master_Log_Pos: xxxxx
Relay_Log_File: xxxxxxxxxxxx-relay-bin.xxxx
Relay_Log_Pos: xxxxxx
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Skip_Counter: 0
Exec_Master_Log_Pos: 131915378
Relay_Log_Space: 563985741
Until_Condition: None
Until_Log_Pos: 0
Master_SSL_Allowed: No
Seconds_Behind_Master: 946273
1 row in set (0.00 sec)


WOW it works ..., well this is expected, but I want to spend one second on WHY it works.

What we have to do in this cases is to read the source code and once more see what is happening there.

Reading the code we will see that FLUSH LOGS, will also call this function, which will instanciate a new MYSQL_LOG object.

The point is all here.

 if (log_type == LOG_BIN) 
if (!no_auto_events)
We log the whole file name for log file as the user may decide
to change base names at some point.
THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */
Rotate_log_event r(thd,new_name+dirname_length(new_name),
bytes_written += r.data_written;
Update needs to be signalled even if there is no rotate event
log rotation should give the waiting thread a signal to
discover EOF and move on to the next log.
name=0; // Don't free name

The function will close the current BINLOG and also the the <binlog>.index associated, then will ROTATE
the binarylog, and create new one with a new name, finally reopen a new <binlog>.index.

/* reopen index binlog file, BUG#34582 */
if (!open_index_file(index_file_name, 0))
open(old_name, save_log_type, new_name_ptr,
io_cache_type, no_auto_events, max_size, 1);


Lesson Learn

Also if MySQL allow us to do nice changes on the fly, editing and modifying the text files like binlog.index,
internally these changes are not considered valid unless the descriptors for these objects are refresh.
Refresh that is done in different way one of them is using FLUSH LOGS.

Final Consideration

So far so good and nothing really new except that we have done another dig in MySQL code, to understand what and how it works.
There will be much more to say about, and I admit that I am doing it simpler then it is, but it was not my intention to write an article
on code digging, but only showing how also simple and well know actions like the one here reported, could be misleading, if we do not
have a better understanding.
This also put in evidence that it would have being much better:

  1. refresh the binlog.index automatically when it is modify
  2. do not accept/show the binlog.index modified IF there is not an internal refresh, this is confusing, misleading and not consistent.

Hope this (simple) issue will be fix by Oracle in on of the many code review to go.



void MYSQL_LOG::new_file(bool need_lock)
char new_name[FN_REFLEN], *new_name_ptr, *old_name;
enum_log_type save_log_type;

if (!is_open())
DBUG_PRINT("info",("log is closed"));

if (need_lock)


if binlog is used as tc log, be sure all xids are "unlogged",
so that on recover we only need to scan one - latest - binlog file
for prepared xids. As this is expected to be a rare event,
simple wait strategy is enough. We're locking LOCK_log to be sure no
new Xid_log_event's are added to the log (and prepared_xids is not
increased), and waiting on COND_prep_xids for late threads to
catch up.
if (prepared_xids)
while (prepared_xids)
pthread_cond_wait(&COND_prep_xids, &LOCK_prep_xids);

/* Reuse old name if not binlog and not update log */
new_name_ptr= name;

If user hasn't specified an extension, generate a new log name
We have to do this here and not in open as we want to store the
new file name in the current binary log file.
if (generate_new_name(new_name, name))
goto end;

if (log_type == LOG_BIN)
if (!no_auto_events)
We log the whole file name for log file as the user may decide
to change base names at some point.
THD *thd = current_thd; /* may be 0 if we are reacting to SIGHUP */
Rotate_log_event r(thd,new_name+dirname_length(new_name),
bytes_written += r.data_written;
Update needs to be signalled even if there is no rotate event
log rotation should give the waiting thread a signal to
discover EOF and move on to the next log.
name=0; // Don't free name

Note that at this point, log_type != LOG_CLOSED (important for is_open()).

new_file() is only used for rotation (in FLUSH LOGS or because size >
max_binlog_size or max_relay_log_size).
If this is a binary log, the Format_description_log_event at the beginning of
the new file should have created=0 (to distinguish with the
Format_description_log_event written at server startup, which should
trigger temp tables deletion on slaves.

/* reopen index binlog file, BUG#34582 */
if (!open_index_file(index_file_name, 0))
open(old_name, save_log_type, new_name_ptr,
io_cache_type, no_auto_events, max_size, 1);

if (need_lock)




Latest conferences

We have 59 guests and no members online