Overview

Over the last few months, there has been more and more discussion about using Galera for MySQL clustering.

I have been one of those who heavily tested and implemented Galera, with quite good results, and I have also presented SOME of them at Oracle Connect.

On the other hand, I have been working with MySQL NDB for years (since at least 2007) at many customer sites, from simple to complex setups.

So even if I cannot consider myself a mega expert, I think I have good experience with, and insight into, both platforms.

The point is that I was not happy reading some articles comparing the two, and not because of the kind of tests or the results.

Nor because I prefer one or the other, but simply because, from my point of view, it does not make any sense to compare the two.

We could spend pages and pages discussing this, but I want to try to give a simple, general idea of WHY it makes no sense, in a few lines.

NDB brief list

  • NDB is not a simple storage engine and can work independently; MySQL is “just” a client.
  • NDB is mainly an in-memory database, and even though it supports tables on disk, their cost does not always make sense.
  • NDB is fully synchronous: nothing is returned to the client until the transaction has really been accepted on all nodes.
  • NDB uses horizontal partitioning to distribute data equally across nodes, and none of them holds the whole dataset (unless you use only one node group, which happens ONLY when you don’t know how to use it).
  • NDB replicates data by a specific factor, the number of replicas, and that replication factor does not change as the number of nodes increases.
  • Clients retrieve data from NDB as a whole, but internally data is retrieved by node, often using parallel execution. (I am not going into the details here of the differences between select methods such as match by key, range, IN and so on.)
  • NDB scales by node group, which means it really scales both in the size of the dataset it can manage and in the operations it can execute, and it really scales!
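
To make the partitioning and replication-factor points concrete, here is a minimal Python sketch. It is a toy model under stated assumptions, not NDB's real algorithm: the function names are invented for this example, CRC32 merely stands in for whatever hash NDB applies to the key internally, and grouping nodes in ID order is an assumption made for simplicity.

```python
import zlib


def build_node_groups(num_data_nodes, replicas):
    """Split data nodes into node groups of `replicas` nodes each (toy model)."""
    if replicas < 1 or num_data_nodes % replicas != 0:
        raise ValueError("data nodes must be a multiple of the replica count")
    node_ids = list(range(1, num_data_nodes + 1))
    # Assumption: consecutive node IDs end up in the same node group.
    return [node_ids[i:i + replicas] for i in range(0, num_data_nodes, replicas)]


def place_row(primary_key, node_groups):
    """Return the nodes that would hold a row: every node of ONE node group."""
    # CRC32 is only a stand-in hash for this illustration.
    bucket = zlib.crc32(str(primary_key).encode("utf-8")) % len(node_groups)
    return node_groups[bucket]


def fraction_of_data_per_node(num_data_nodes, replicas):
    """Share of the whole dataset stored on one node, assuming an even spread."""
    return replicas / num_data_nodes


if __name__ == "__main__":
    groups = build_node_groups(4, 2)
    print(groups)                               # [[1, 2], [3, 4]]
    print(place_row("customer:42", groups))     # one of the two groups
    print(fraction_of_data_per_node(4, 2))      # 0.5
    print(fraction_of_data_per_node(8, 2))      # 0.25
```

Note how doubling the data nodes from 4 to 8 leaves the replica count at 2 but halves the share of the dataset each node has to hold; that is the scaling property the list above describes.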

Galera brief list

  • Galera is an additional layer working inside the MySQL context.
  • Galera requires InnoDB to work.
  • Galera offers “virtually synchronous” replication.
  • Galera replicates the full dataset across ALL nodes.
  • Galera's data replication overhead increases with the number of nodes present in the cluster.
  • Galera replicates data from one node to the cluster on commit, but applies it on each node through a FIFO queue (multi-threaded).
  • Galera does not offer any parallelism between the nodes when retrieving data; clients rely on the single node they access.
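
The same kind of toy model can illustrate the Galera side. This is only a sketch: `ToyGaleraCluster` is a hypothetical class written for this post, and it deliberately ignores certification, write sets, flow control and everything else the real replication layer does. It only shows that every node stores every row, that one write costs one apply per node, and that a read is served by the single node the client talks to.

```python
class ToyGaleraCluster:
    """Toy model of full-dataset replication (not the real Galera protocol)."""

    def __init__(self, num_nodes):
        if num_nodes < 1:
            raise ValueError("a cluster needs at least one node")
        # Each node keeps its own complete copy of the data.
        self.nodes = [{} for _ in range(num_nodes)]
        self.apply_operations = 0

    def write(self, node_index, key, value):
        """Commit on one node; the change is then applied on ALL nodes."""
        if not 0 <= node_index < len(self.nodes):
            raise IndexError("no such node")
        for store in self.nodes:
            store[key] = value
            self.apply_operations += 1

    def read(self, node_index, key):
        """A read touches only the node the client is connected to."""
        return self.nodes[node_index].get(key)


if __name__ == "__main__":
    cluster = ToyGaleraCluster(3)
    cluster.write(0, "customer:42", "Alice")
    print([cluster.read(i, "customer:42") for i in range(3)])
    print(cluster.apply_operations)  # 3: one apply per node for a single write
```

Adding a fourth node to this model would not make any node's copy smaller; it would only add one more apply per write.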

So why can they not be compared?

It should be quite clear that the two are very different, starting from the basic concept: NDB is a cluster of many node groups with a distributed dataset, while Galera is a very efficient (highly efficient) replication layer.

But just to avoid confusion:

  1. NDB does data partitioning and data distribution with a redundancy factor.
  2. Galera just replicates the data everywhere.
  3. NDB applies parallel execution to incoming requests, involving several node groups in the data fetch.
  4. Galera is not involved in the data fetch at all; clients have to connect to one or more nodes by themselves, which means the application has to manage parallel requests itself if it needs them.
  5. In NDB, the more node groups you add, the more you gain in possible operations per second and in data stored/retrieved.
  6. In Galera, the more nodes you add, the more replication overhead you generate, so more data has to be committed “locally” by the replication layer, up to the point where the number of nodes and the operations executed on them compromise the performance of each node.
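
Points 5 and 6 can be put into back-of-the-envelope numbers. The sketch below is a deliberately idealized model, not a benchmark: the function names and the sample figures are made up for illustration, keys are assumed to be spread evenly, and network, locking and certification costs are ignored entirely.

```python
def ndb_profile(data_nodes, replicas, capacity_per_node_gb, total_writes):
    """Idealized NDB: capacity and write load are split across node groups."""
    if replicas < 1 or data_nodes % replicas != 0:
        raise ValueError("data nodes must be a multiple of the replica count")
    node_groups = data_nodes // replicas
    return {
        "dataset_capacity_gb": node_groups * capacity_per_node_gb,
        # Each write lands on one node group only (even key spread assumed).
        "writes_applied_per_node": total_writes / node_groups,
    }


def galera_profile(nodes, capacity_per_node_gb, total_writes):
    """Idealized Galera: every node holds everything and applies every write."""
    return {
        "dataset_capacity_gb": capacity_per_node_gb,
        "writes_applied_per_node": total_writes,
        "writes_applied_cluster_wide": total_writes * nodes,
    }


if __name__ == "__main__":
    # Hypothetical figures: 64 GB usable per node, 1000 writes in total.
    print(ndb_profile(4, 2, 64, 1000))
    print(ndb_profile(8, 2, 64, 1000))
    print(galera_profile(3, 64, 1000))
    print(galera_profile(6, 64, 1000))
```

In this model, going from 4 to 8 NDB data nodes doubles the dataset capacity and halves the writes each node applies, while going from 3 to 6 Galera nodes changes neither per-node figure and doubles the total apply work in the cluster.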

Conclusion

NDB Cluster is a real cluster solution, designed to scale internally and to perform internally all the operations required to guarantee high availability and synchronous data distribution.

Galera is a very efficient solution for bypassing the inefficient replication mechanism MySQL currently has.

Galera allows you to create a cluster of MySQL nodes in virtually synchronous replication, with almost zero complexity added to standard MySQL management.

Nevertheless, the resulting platform is composed of separate nodes, which for good or bad is not a distributed data system.

Given that, the scenarios where we can use Galera or NDB are dramatically different; trying to compare them is like comparing a surfboard with a snowboard.

I love them both, and honestly I expect Galera deployments to increase dramatically in 2013, but I still stick to my motto: “use the right tool for the job”.

Let us try to make our lives easier and avoid confusion.

Happy MySQL to all!!

Ho-ho-ho