Home MySQL Blogs

Move to Italy

Days
Hours
Minutes
Seconds

Blogs

Support Wikipedia

Login Form



MySQL

My MySQL tipsvalid-rss-rogers




A Missed Opportunity? PDF Print E-mail
Written by Marco Tusa   
Wednesday, 10 August 2011 17:00

I am reading the announcement about the Percona MySQL event in place of the O'Reilly.

sakila_speaker_offWhat I see and what I feel  is that the whole MySQL community, including Oracle, Percona, SkySQL,MariaDB and any other company doing business with MySQL have lost a good opportunity (so far).

For many years we have seen and hear comments about having this event more focus on users, and less bound to a single company. Now what? We will have a single company promoting the event. I don't care if it is done by Percona instead Oracle or SkySQL, actually I like all of them and I knew personally almost all of them as well.

What I care/like is to have this event NOT limited to a single company.

There was an attempt to have it organized in a different way (see Giuseppe  and Sheeri  comments), but then what?

I remember a long time ago (1983) when I was in Delhi and we have to run from the waiting room to the aeroplane in order to take a sit, that because the lack of coordination at the airport.
Are we in the same situation today?
The one that will move faster will get the place and the other will need to find a place after?

As a user I don't see that as a good thing, lacking in coordination, not be able to find a way to agree on how to do things, is never good.
As someone coming from MySQL I don't like it at all, where is the spirit of cooperation we had in MySQL is it completely gone?

After the MySQL diaspora, I have seen many time personal interest (and company interest) take over the common benefit, but at the end what happened is that we have now a potentially stronger MySQL community.
This because we need to see all of us (Oracle, Percona, SkySQL, MariaDB and any other related like Pythian where I am currently working) as competitor but also as part of the same “community”, we can compete for crumbs or we can compete for excellence, to be better. Wondering how many others community have such huge opportunity.

So I am really wonder what happened here? Are all of us so weak that we are not be able to organize an event as the MySQL conference?

I think that, as already happened in the past, all the actors would like to see this event happening as an OPEN and Independent event, focus on users and ... on business as well.

Said that, is Percona acting bad? No I don't think so, they are moving ahead, period. It is in their DNA, acting, do things.

Are them not following their own previous statements? Is too early to say that, but what I see here is also an incitation to all others to move on, and act.

What I really hope is to do not have in the next feature a SkyMySQL MySQL event, a MariaDB MySQL event and so on.

What I hope is to see all of us converging and helping each other in having this event as a common event and not as a single company.
Percona did the first move, but this doesn't prevent all others to join and have it to become a MySQL Community event. It will be great to recognize to Percona the role of initiators, but it would also be great to see Percona able to modify his position in respect to the event, from single company event to community open event.

As always I am a dreamer, and my dream here is that $$$ are less relevant than the great work we can do all together.
Despite the fact that doing it as single company will awaken less interest (also in $$$) then doing it as community event.

Remain one big question mark ... who will be in? Who is really ready to start putting effort and money to have a successful  MySQL global event?

Will be nice to see statements of real commitment coming in from all actors.


Great MySQL to every one.

 

{joscommentenable}

Last Updated on Sunday, 18 August 2013 17:37
 
Status update on how MySQL handle the partition(s) for maintenance PDF Print E-mail
Written by Marco Tusa   
Saturday, 04 June 2011 11:37

Overview

 

We know that having the chance to split a table in different and smaller chunk helps.
It helps because performance searches, insert, index handling and data management as well.
All fine we are all happy, and very often we push on our customer to go for partition in order to solve part of their problems.
What happens quite often is that customer (and sometime MySQL dba as well) see partitions as separate tables.
Unfortunately is not like that, there are some operations that are (still?) creating unexpected results.
One of this is the OPTIMIZATION of a partition.


The case

Some time ago, almost 16 months I was at customer site and we were working with MySQL 5.5 rc.
We were really happy with the new version of MySQL, and being working for MySQL/SUN I was quite proud of it as well.
Then we run the ALTER TABLE X OPTIMIZE PARTITION Y;

Suddenly we realize something was not working as expected because all partitions had temporary tables.
We know and were expecting the LOCK while doing the exercise, but this was not expected at all.

I was attending several internal presentations, and I was in many internal calls,
all stating that the operation should not affect all partitions but only one.
It was not nice, and it was a bug (42822) marked non critical, referring to another one 46158,
which was close because referring to the previous as a duplicate.

Funny thing at the end is  ... we still have it in 5.5.12.

There is a work around as usual, but CUSTOMERS need to be aware and manual modified according,
unfortunately this is not the case.

So after all this time, it could sounds silly but let us talk how OPTIMIZE partitions doesn't work and what to do.

 

What I have done to replicate the error

I have set up a simple test instance with two tables, and use my stresstool (java),

to fill the tables and play with the inserting threads.

 

Here the tables:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
DATABASE changed
root@localhost [test]> SHOW TABLES;
+----------------+
| Tables_in_test |
+----------------+
| tbtest1 |
| tbtest2 |
+----------------+
4 rows IN SET (0.00 sec)
 
root@localhost [test]> DESCRIBE tbtest1;
+---------------+--------------+------+-----+-------------------+-----------------------------+
| FIELD | Type | NULL | KEY | DEFAULT | Extra |
+---------------+--------------+------+-----+-------------------+-----------------------------+
| a | int(11) | NO | MUL | NULL | |
| uuid | char(36) | NO | PRI | NULL | |
| b | varchar(100) | NO | | NULL | |
| c | char(200) | NO | | NULL | |
| counter | bigint(20) | YES | | NULL | |
| time | timestamp | NO | | CURRENT_TIMESTAMP | ON UPDATE CURRENT_TIMESTAMP |
| partitonid | int(11) | NO | | 0 | |
| strrecordtype | char(3) | YES | | NULL | |
+---------------+--------------+------+-----+-------------------+-----------------------------+
 

 

How the data looks like:

1
2
3
4
5
6
7
8
9
10
11
12
root@localhost [test]> SELECT * FROM tbtest1 LIMIT 10;
+------------+---------------+----------------+---------------+-------------+---------------------+------------+---------------+
| a | uuid | b | c | counter | time | partitonid | strrecordtype |
+------------+---------------+----------------+---------------+-------------+---------------------+------------+---------------+
| 793902056 | 00ea7db2babeb | ac hpzenqafpml | ac hpzr
| 366195670 | 07c620fdfab8d | umlfzae zdzbes | umlfzapgnhlvb | 3359819254 | 2011-06-03 12:27:26 | 18 | fai |
| 19501151 | 08271816a1256 | odpp gporogqmg | odpp glpgfgnf | 3010735465 | 2011-06-03 12:27:26 | 5 | odo |
| 1925082207 | 0bf7d042da285 | toammdmqgttlto | toammdeioidzs | 18048741375 | 2011-06-03 12:27:26 | 17 | gvm |
| 631797159 | 0c9d4857f2f27 | lisn vqclglnmm | lisn vggpirza | 40556949738 | 2011-06-03 12:27:26 | 18 | ldc |
+------------+---------------+----------------+---------------+-------------+---------------------+------------+---------------+
10 rows IN SET (0.01 sec
 

 

 

How the Table looks like before partition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
 
root@localhost [test]>
CREATE TABLE `tbtest1` (
`a` int(11) NOT NULL,
`uuid` char(36) COLLATE utf8_unicode_ci NOT NULL,
`b` varchar(100) COLLATE utf8_unicode_ci NOT NULL,
`c` char(200) COLLATE utf8_unicode_ci NOT NULL,
`counter` bigint(20) DEFAULT NULL,
`time` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`partitonid` int(11) NOT NULL DEFAULT '0',
`strrecordtype` char(3) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`uuid`),
KEY `IDX_a` (`a`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
1 row IN SET (0.02 sec)
 

 

Then Insert the partition definition:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
ALTER TABLE tbtest1 DROP PRIMARY KEY, ADD PRIMARY KEY (`uuid`,`partitonid`)
PARTITION BY RANGE (partitonid) (
PARTITION p0 VALUES LESS THAN (2),
PARTITION p1 VALUES LESS THAN (4),
PARTITION p2 VALUES LESS THAN (6),
PARTITION p3 VALUES LESS THAN (8),
PARTITION p4 VALUES LESS THAN (10),
PARTITION p5 VALUES LESS THAN (12),
PARTITION p6 VALUES LESS THAN (14),
PARTITION p7 VALUES LESS THAN (16),
PARTITION p8 VALUES LESS THAN (18),
PARTITION p9 VALUES LESS THAN (20),
PARTITION p10 VALUES LESS THAN MAXVALUE
);
 
SELECT TABLE_SCHEMA,TABLE_NAME,partition_name,TABLE_ROWS AS 'ROWS',
TRUNCATE(DATA_LENGTH/pow(1024,2),2) AS 'DATA (M)',
TRUNCATE(INDEX_LENGTH/pow(1024,2),2) AS 'INDEX (M)',
TRUNCATE((DATA_LENGTH+INDEX_LENGTH)/pow(1024,2),2) AS 'TOTAL(M)'
FROM information_schema.partitions
WHERE TABLE_SCHEMA <> 'information_schema'
AND TABLE_SCHEMA <> 'mysql'
AND TABLE_NAME = 'tbtest1'
ORDER BY `TOTAL(M)` DESC,`DATA (M)` DESC;
+--------------+------------+----------------+------+----------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | partition_name | ROWS | DATA (M) | INDEX (M) | TOTAL(M) |
+--------------+------------+----------------+------+----------+-----------+----------+
| test | tbtest1 | p0 | 244 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p8 | 227 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p7 | 211 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p5 | 187 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p4 | 227 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p3 | 212 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p1 | 204 | 0.12 | 0.01 | 0.14 |
| test | tbtest1 | p6 | 175 | 0.10 | 0.01 | 0.12 |
| test | tbtest1 | p2 | 235 | 0.10 | 0.01 | 0.12 |
| test | tbtest1 | p9 | 229 | 0.10 | 0.01 | 0.12 |
| test | tbtest1 | p10 | 0 | 0.01 | 0.01 | 0.03 |
+--------------+------------+----------------+------+----------+-----------+----------+

 

 

And fill it with some data with my stresstool, but keeping the partition P10 empty, loading values up to PARTITIONID <=20

then filling P10 with an insert:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
INSERT INTO tbtest1 (a, uuid,b,c,counter,time,partitonid,strrecordtype) 
SELECT a, uuid,b,c,counter,time,30,strrecordtype FROM tbtest1 WHERE partitonid <=20;
 
+--------------+------------+----------------+--------+----------+-----------+----------+
| TABLE_SCHEMA | TABLE_NAME | partition_name | ROWS | DATA (M) | INDEX (M) | TOTAL(M) |
+--------------+------------+----------------+--------+----------+-----------+----------+
| test | tbtest1 | p10 | 85104 | 106.95 | 10.54 | 117.50 |
| test | tbtest1 | p5 | 95647 | 57.76 | 8.54 | 66.31 |
| test | tbtest1 | p4 | 99459 | 57.76 | 8.54 | 66.31 |
| test | tbtest1 | p3 | 92256 | 57.76 | 8.54 | 66.31 |
| test | tbtest1 | p2 | 77794 | 57.76 | 8.54 | 66.31 |
| test | tbtest1 | p9 | 79871 | 57.71 | 8.54 | 66.26 |
| test | tbtest1 | p7 | 100681 | 57.76 | 7.54 | 65.31 |
| test | tbtest1 | p1 | 102414 | 56.76 | 8.54 | 65.31 |
| test | tbtest1 | p6 | 92961 | 56.76 | 7.54 | 64.31 |
| test | tbtest1 | p8 | 93872 | 56.76 | 7.54 | 64.31 |
| test | tbtest1 | p0 | 80510 | 56.76 | 7.54 | 64.31 |
+--------------+------------+----------------+--------+----------+-----------+----------+
 
 
 
root@localhost [(none)]> SHOW processlist;
+------+------+---------------------------+------+---------+------+--------+-------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+---------------------------+------+---------+------+--------+-------------------------------------------------------------------+
| 144 | root | localhost | test | Sleep | 285 | | NULL |
| 195 | root | tusacentral03:56633 | test | Sleep | 55 | | NULL |
| 197 | root | tusacentral07.LOCAL:53368 | test | Sleep | 1 | | NULL |
| 199 | root | tusacentral07.LOCAL:53370 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,...becordtype)VALUES("2f...95-bc |
| 200 | root | tusacentral07.local:53371 | test | Query | 1 | update | insert INTO test.tbtest1 (uuid,a,...becordtype)VALUES("0f...d5-a9 |
| 201 | root | tusacentral07.LOCAL:53372 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,...becordtype)VALUES("ce...94-9d |
| 202 | root | tusacentral07.local:53373 | test | Query | 1 | update | insert INTO test.tbtest1 (uuid,a,...becordtype)VALUES("32...ec-98 |
| 203 | root | tusacentral07.LOCAL:53374 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,...becordtype)VALUES("c3...7d-86 |
| 204 | root | tusacentral07.local:53375 | test | Query | 1 | update | insert INTO test.tbtest1 (uuid,a,...becordtype)VALUES("63...02-ba |
| 1031 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |
+------+------+---------------------------+------+---------+------+--------+-------------------------------------------------------------------+
10 rows IN SET (0.00 sec)
 

 

Time to run the optimize, and discover that nothing was changed in all this time.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
mysql> ALTER TABLE tbtest1 OPTIMIZE  PARTITION p10;
 
root@localhost [(none)]> SHOW processlist;
+------+------+---------------------------+------+---------+------+---------------------------------+-------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+---------------------------+------+---------+------+---------------------------------+-------------------------------------------------------------+
| 144 | root | localhost | test | Sleep | 844 | | NULL |
| 195 | root | tusacentral03:56633 | test | Query | 92 | copy TO tmp TABLE | ALTER TABLE tbtest1 OPTIMIZE PARTITION p10 |
| 197 | root | tusacentral07.LOCAL:53368 | test | Sleep | 0 | | NULL |
| 199 | root | tusacentral07.LOCAL:53370 | test | Query | 90 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-4cdd-a8 |
| 200 | root | tusacentral07.LOCAL:53371 | test | Query | 90 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-427f-be |
| 201 | root | tusacentral07.LOCAL:53372 | test | Query | 90 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-4694-9d |
| 202 | root | tusacentral07.LOCAL:53373 | test | Query | 90 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-4a5a-a0 |
| 203 | root | tusacentral07.LOCAL:53374 | test | Query | 91 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-48de-b8 |
| 204 | root | tusacentral07.LOCAL:53375 | test | Query | 90 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit-4a0f-9c |
| 1031 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |
+------+------+---------------------------+------+---------+------+---------------------------------+-------------------------------------------------------------+
10 rows IN SET (0.00 sec)

 

 

MySQL, when using InnoDB, still do the process on all partitions and not only on the selected one.
This is probably not critical for the developer but a customer running it on a huge table could strongly disagree.

Immagine running it against a TABLE with 1.2TB and each partiton of 20-25GB.

It is not funny...

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
 
total 1031896
drwx------ 2 mysql mysql 4096 2011-06-03 13:50 ./
drwxr-xr-x 11 mysql mysql 4096 2011-06-03 13:21 ../
-rw-rw---- 1 mysql mysql 8578 2011-03-31 22:44 bbb.frm
-rw-rw---- 1 mysql mysql 163840 2011-03-31 22:44 bbb.MYD
-rw-rw---- 1 mysql mysql 1024 2011-03-31 22:44 bbb.MYI
-rw-rw---- 1 mysql mysql 8578 2011-03-31 22:34 ccc.frm
-rw-rw---- 1 mysql mysql 425984 2011-03-31 22:39 ccc.ibd
-rw-rw---- 1 mysql mysql 61 2011-03-04 09:47 db.opt
-rw-rw---- 1 mysql mysql 8788 2011-06-03 13:50 #sql-3205_c3.frm
-rw-rw---- 1 mysql mysql 64 2011-06-03 13:50 #sql-3205_c3.par
-rw-rw---- 1 mysql mysql 58720256 2011-06-03 13:51 #sql-3205_c3#P#p0.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p10.ibd
-rw-rw---- 1 mysql mysql 54525952 2011-06-03 13:51 #sql-3205_c3#P#p1.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p2.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p3.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p4.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p5.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p6.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p7.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p8.ibd
-rw-rw---- 1 mysql mysql 114688 2011-06-03 13:50 #sql-3205_c3#P#p9.ibd
-rw-rw---- 1 mysql mysql 8788 2011-06-03 13:47 tbtest1.frm
-rw-rw---- 1 mysql mysql 64 2011-06-03 13:47 tbtest1.par
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p0.ibd
-rw-rw---- 1 mysql mysql 75497472 2011-06-03 13:49 tbtest1#P#p10.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p1.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p2.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p3.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p4.ibd
-rw-rw---- 1 mysql mysql 88080384 2011-06-03 13:51 tbtest1#P#p5.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p6.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p7.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:51 tbtest1#P#p8.ibd
-rw-rw---- 1 mysql mysql 96468992 2011-06-03 13:51 tbtest1#P#p9.ibd
-rw-rw---- 1 mysql mysql 8630 2011-06-03 12:36 tbtest2.frm
-rw-rw---- 1 mysql mysql 9437184 2011-06-03 13:51 tbtest2.ibd
root@tusacentral03:/home/mysql/instances/master/data/test#
 
 
 

 

So what to do?


Use REBUILD instead.

This drives me crazy, why the mapping could not be done inside the code?
If InnoDB then OPTIMIZE = REBUILD ... and go ahead.

To me this sounds not user friendly at all.

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
mysql> ALTER TABLE tbtest1 REBUILD PARTITION p10;
Query OK, 126994 rows affected (16.50 sec)
Records: 126994 Duplicates: 0 Warnings: 0
 
 
root@localhost [(none)]> SHOW processlist;
+------+------+---------------------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+---------------------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+
| 144 | root | localhost | test | Sleep | 357 | | NULL |
| 195 | root | tusacentral03:56633 | test | Query | 3 | setup | ALTER TABLE tbtest1 REBUILD PARTITION p10 |
| 197 | root | tusacentral07.LOCAL:53368 | test | Sleep | 0 | | NULL |
| 199 | root | tusacentral07.LOCAL:53370 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit22da-4c7d-94 |
| 200 | root | tusacentral07.LOCAL:53371 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit8459-477b-94 |
| 201 | root | tusacentral07.LOCAL:53372 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit93b7-413f-a8 |
| 202 | root | tusacentral07.LOCAL:53373 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit6cd0-4e4b-85 |
| 203 | root | tusacentral07.LOCAL:53374 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partit79c1-4da5-8c |
| 204 | root | tusacentral07.LOCAL:53375 | test | Query | 2 | Waiting FOR TABLE metadata LOCK | INSERT INTO test.tbtest1 (uuid,a,b,c,counter,partitff17-4be7-b1 |
| 1031 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |
+------+------+---------------------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+
10 rows IN SET (0.00 sec)

 

Anyhow checking the file system, this time all works as expected and ONLY the partiton p10 was REBUILD.
Please take mental note of the STATE, is saying "Waiting for table metadata lock", this sounds interesting.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
drwx------  2 mysql mysql     4096 2011-06-03 13:47 ./
drwxr-xr-x 11 mysql mysql 4096 2011-06-03 13:21 ../
-rw-rw---- 1 mysql mysql 8578 2011-03-31 22:44 bbb.frm
-rw-rw---- 1 mysql mysql 163840 2011-03-31 22:44 bbb.MYD
-rw-rw---- 1 mysql mysql 1024 2011-03-31 22:44 bbb.MYI
-rw-rw---- 1 mysql mysql 8578 2011-03-31 22:34 ccc.frm
-rw-rw---- 1 mysql mysql 425984 2011-03-31 22:39 ccc.ibd
-rw-rw---- 1 mysql mysql 61 2011-03-04 09:47 db.opt
-rw-rw---- 1 mysql mysql 8788 2011-06-03 13:47 #sql-tbtest1.frm
-rw-rw---- 1 mysql mysql 64 2011-06-03 13:47 #sql-tbtest1.par
-rw-rw---- 1 mysql mysql 8788 2011-06-03 13:44 tbtest1.frm
-rw-rw---- 1 mysql mysql 64 2011-06-03 13:44 tbtest1.par
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p0.ibd
-rw-rw---- 1 mysql mysql 75497472 2011-06-03 13:46 tbtest1#P#p10.ibd
-rw-rw---- 1 mysql mysql 606208 2011-06-03 13:47 tbtest1#P#p10#TMP#.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p1.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p2.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p3.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p4.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p5.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p6.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p7.ibd
-rw-rw---- 1 mysql mysql 83886080 2011-06-03 13:47 tbtest1#P#p8.ibd
-rw-rw---- 1 mysql mysql 92274688 2011-06-03 13:47 tbtest1#P#p9.ibd
-rw-rw---- 1 mysql mysql 8630 2011-06-03 12:36 tbtest2.frm
-rw-rw---- 1 mysql mysql 9437184 2011-06-03 13:46 tbtest2.ibd
root@tusacentral03:/home/mysql/instances/master/data/test#

 

 

While there and to avoid doubt I also checked what about CHECKING a partition,

Who knows may be some surprise as well

 

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
mysql> ALTER TABLE tbtest1 CHECK  PARTITION p10;
+--------------+-------+----------+----------+
| TABLE | Op | Msg_type | Msg_text |
+--------------+-------+----------+----------+
| test.tbtest1 | CHECK | STATUS | OK |
+--------------+-------+----------+----------+
1 row IN SET (36.64 sec)
 
 
root@localhost [(none)]> SHOW processlist;
+------+------+---------------------------+------+---------+------+--------+----------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+------+------+---------------------------+------+---------+------+--------+----------------------------------------------------------+
| 144 | root | localhost | test | Sleep | 1114 | | NULL |
| 195 | root | tusacentral03:56633 | test | Query | 15 | NULL | ALTER TABLE tbtest1 CHECK PARTITION p10 |
| 197 | root | tusacentral07.LOCAL:53368 | test | Sleep | 0 | | NULL |
| 199 | root | tusacentral07.LOCAL:53370 | test | Query | 0 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-39e0-4296-97 |
| 200 | root | tusacentral07.LOCAL:53371 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-f508-4ec8-87 |
| 201 | root | tusacentral07.LOCAL:53372 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-3a5f-4444-81 |
| 202 | root | tusacentral07.LOCAL:53373 | test | Query | 1 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-970e-48bb-a2 |
| 203 | root | tusacentral07.LOCAL:53374 | test | Query | 0 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-62ea-4c1f-81 |
| 204 | root | tusacentral07.LOCAL:53375 | test | Query | 0 | UPDATE | INSERT INTO test.tbtest1 (uuid,a,b,c,counte-6277-4dd5-89 |
| 1215 | root | localhost | NULL | Query | 0 | NULL | SHOW processlist |
+------+------+---------------------------+------+---------+------+--------+----------------------------------------------------------+
10 rows IN SET (0.00 sec)
 
 

 

But luckly no it works fine  and no lock on the table.

 

Conclusions

1) We still have to deal with this non critical bug, and we must remember
 DO NOT USE: 

 ALTER TABLE <table_name> OPTIMIZE PARTITION A,B,C 

 BUT

 ALTER TABLE <table_name> REBUILD PARTITION A,B,C

 

Not yet clear to me why a simple mapping between REBUILD and OPTIMIZATION is not possible. 

 
2) The level of lock is related to METADATA because the ALTER command, looking around for a better explanation I have seen  that there is not too much material about it, but the presentation from Konstantin Osipov for sure helps to better understand.
http://en.oreilly.com/mysql2011/public/schedule/detail/17340

 

 

 

My dream

The OPTIMIZATION of a partition is a special case, where data, is not modified but eventually physically re-organized.
Having the status of the lock declared as METADATA lock, brings me to think we can safely insert/modify all data not related to that partition.
What I am feeling from all this, is the possibility to have in the near future this lock release, I see issues with the Index merge and others so some digging in the code will require..

 

It would be nice ...to have it.

 


Reference

17.3.3. Maintenance of Partitions - http://dev.mysql.com/doc/refman/5.1/en/partitioning-maintenance.html
Metadata locking and deadlock detection in MySQL 5.5  - http://en.oreilly.com/mysql2011/public/schedule/detail/17340
optimize partition recreates full table instead just partition - http://bugs.mysql.com/bug.php?id=42822
Make partition maintenance commands more efficient - http://bugs.mysql.com/bug.php?id=46158

Last Updated on Sunday, 18 August 2013 17:39
 
My afterthoughts on teams compose by multi-cultural, multi-language elements, in consulting and support. PDF Print E-mail
Written by Marco Tusa   
Saturday, 23 April 2011 12:57


This is not a technical article; it starts from my own thinking related to the work leaded in many years in international environments and with people coming from all around the world. Many though, about me, about the interaction with colleagues, about interactions with customers, may be few dolphins here in terms of bit, but not in terms of spirit, human respect and the research of being good.


teammulti3

My thinking and I

I am a lucky guy; I had the chance to travel a lot from when I was really young, so young that I cannot remember my first time. I have done mine the natural fact that we are different and we have different approaches in human relations.

Once become adult I was even luckier, because I started to work in international environment, so I had the chance to learn many different “cultural” approaches from the source on the top of my travels.

What have always amazed me, are the many subtle differences existing between people also leaving close each other, and very tight similarities also if staying from the other side of the world.

There are so many that anthropologically is very difficult to classify them all. But I am not an anthropologist so I can do mistake there and in my mind I have built up a differentiation, that refining to the extreme reduce the whole to two categories, not the good and the bad or we and the others, given we sometime shift from one category to the other during our day.

But let us say that it is a common behaviour during our day to define if “that” day we were here or there.

What I am talking about is the different approach that we can call the “One man hero” & “family man”.

 

The One man/woman Hero

He is he, now way to be better.  He save the world alone, he is firmly convinced to be (almost) always right. Whatever comes out from his mouth is holy true, what is in his mind no human can even imagine. If he stops doing the entire world stops as well.

When talking with others no need to take care of any debate, just listen to him.  Teams should follow him try to remain in his shadow.

He makes no mistakes, but if he does, he fell.

 

The Family Man/Woman

We must do it together, he has his own ideas and he is sure he is right, but you can be right as well. He proposes ideas not impose, want to hear from you, your concern, or thinking, especially if not in line. He learns from you, you learn from him.  He tries to keep his space/territory in order respecting his own attitude; if you do the same the world will be ok.

He can fell, but the “family” will stay up, and help him to rise again.

 

What and how this fits in consulting/support?

Well in many ways, starting from the interactions in the working place. It is funny to note that, any or almost any, company would like to have “the family man” on board, given the attitude to convince people not to impose, the capacity of sharing and so on. Also when he is sick or if he goes away the family survives.

On the other hands customers, often, want to have the hero with the magic wand. He comes save the world, takes the money and goes. What a big relief for them.

If the above is true, at the end the situation will not be so dramatic. We can act as family man inside our company and as the one man hero when dealing with the external.

Given life is fun, the reality is much more complicated, and the infinite combination starts putting together different types in the same room.  So you can have a lot of fun if behave as the omh (one man hero) and in the same room on the other side there is another one, who will wins, who will be the one having the right answer? On the other hands having a room filled with fm(family man) will bring to very interesting long discussions, but it is possible that for the time a solution is found the humans are extinct.

 

So as usual the right measure and the right combination of the two elements is the right path to go.

There are different points that must always take in consideration from my point of view:

1.       No one is better than someone else

2.       Stating to be good is not enough we must be good

3.       Shit happens

4.       Hiding shit bring more shit

5.       Learning from mistakes makes you stronger

6.       If you do not make mistakes you are lying

7.       Explain your ideas allow you to have better ideas

8.       Propose not impose

9.       If you know you right, be ready to accept that also someone else can be right as well

10.   If you know and can prove he is wrong explain not denigrate

The funny thing is considering the same points from the customer point of view:

1.       Better for you to be better than me

2.       You must be good, period

3.       Shit don’t happens if it does your fault

4.       If shit happens and it is my fault, then it is your fault because you did not tell me the right thing

5.       Your mistakes should never happen on my environment

6.       Better you know what you do so train yourself out of my “house”

7.       You must have good ideas always

8.       You must be sure of what you say

9.       If I know I am right I am right

10.   If you can prove I am wrong,  say it such that no one will get offended

 

In any case the golden rules for me are:

1.       Say always the true to the customer or colleague

2.       Base your suggestion and recommendation on observation and real facts

About 2, you can have inspiration that is good, but ten you must prove it.

 

The list can/should be longer, but what I mean here is that there are different approaches when we are acting as Consultant/Support or as customer, and we shift from one point to the other without even thinking.

If you say no, think when YOU are calling for assistance because your cable TV is not working and you are going to lose your favorite match, or because the Internet connection drops.

 

Back to the point please

But return back to the point, dealing with International consulting/support.

As said we must be good, and I am firmly convinced that if we act “Always” as a family man with our colleagues, be good will be a shared efforts and to achieve good results will be a lot easier.

 

The International teams first

Family’s teams

Let us start from here;  there are culture that see the family approach as must, you cannot see a good result as your result is a team result, as well the failure. We have this in Japan, Asia in general, but it is becoming more and more frequent in Europe as well. We have some “strange” behavior in country like Italy and Spain, but normally it works.

When I say strange I mean the “passing the buck” approach that unfortunately is very common in latin countries.

In the same context also the physical behavior and how the voice is used is different, avoiding to being rude or too direct; instead being more descriptive and verbose.

 

One Man Hero’s team

A different approach is used in some companies, where the leadership is still considering the OMH behavior a benefit. It is easier to find such approach in North America and European companies where the leadership is still bounded to old Top->Down models.  There the people or resource (I hate that term but it is a good indicator of the company approach) composing the teams, are often referring to someone above them in the pyramid, not interacting directly.  The results are shared bottom up, then top down. Not following an internal exchange. This is very important because the awareness re of the internal problem is not equally shared. Being part of group as such expose you to a lot of possible mistakes because bad communication.

 

My preference and thoughts

It is quite obvious that from the “internal” operation I put my vote on the FM and not on the other, period.

The difficulty exists in keeping all the people align and trusting each other. There will always be, especially when setting up a new team, someone that feels himself as the ONE. Not because he is bad but because the culture, the background and the lack of trust in the others.

This is probably the most important and crucial point, trusting each other in a team is not easy, requires more courage than acting alone.

Building up relation between different people is easier when you can have them in the same space at the same time, but it becomes a heavy task when happening with people distributed around the globe.  The differences start from the working hours, the holidays they have, the need to talk they have, and so on.

Again I was lucky, I had my first international team, when I was able to handle it directly, so at least the Time zone issue was reduced to minimum. But the respect you must have for their differences is the same, so different holidays, different working time, different culture that require to be taken in account.  When happens to me to deal with remote team, the first time, wow that was fun, just to organize the kick-off meeting I take at least 48 hours.  No was not easy to deal with it in a “family mode” and this is also why very often some companies goes for the OMH, choosing to impose instead propose.

Here I would like to remember to all a fantastic statement: “People do not leave their jobs, they run away from their boss”. Keep an eye on places where people are leaving, there you will probably find a bad manager, and I have seen that recently happening.

 

Now about Customer

It is now time to see how we can combine an international team with all his internal interactions with customers, sometime international, sometime no.

Here must distinguish, Consulting and Support are two different categories, or to be more precise, onsite Consulting and remote work, two different way to interact with customer.

 

Dealing with customer onsite

When you go on site at customer, also if you are a family man, be careful to interact in the correct way, do not be too flexible, given that could be misunderstood and read as weakness or worse. You are alone, no family with you, you must trust yourself fully. Better to switch to the One Man Hero mode, the lions will come and they want your body. This is true for the first 5 minutes, where the first 10’’ gives you the chance to evaluate how much you can be flexible or not at all.

What happens if you have in front of you a team headed by an OMH?  I have seen a constant behavior here; OMH meets you alone, before introducing you to the rest of the team. The first aspect that you should consider is if he is interacting with you at your working level or if he is there to coordinate. If he is a manager, be careful it doesn’t matter what nationality he will be, he will not understand you if you talk about technical aspects (few cases do, very few), and he will not understand the explanations, skip them, go straight to the Executive Summary.   If he is a working level guy, then culture, nationality and background could make the difference.  In some cases you must be dramatically direct close to be arrogant, and say what you have to say, like I have seen working in many cases with US people. But if you have in front of you a Latin guy or someone from Arabic countries, then do not do it. You will have an enemy before even open your mouth. Let him be opening all the conversation and discussions, try to keep all the conversation following indirect patterns; only when you are sure he start to trust you, start to gently propose him that there could be a “different” way of doing things. Then remember if he will reduce the talks with you, means you have a problem, try to fix it talking in 1:1 never with others.

In those context if the customer has a mixed team, follow the internal dynamics will helps a lot.

In short dealing with a customer face by face is primarily a matter of human experience, and then comes how good you are, but you know that you must be good, period.

 

Dealing with customer remote

Two level of contact here, using any system or by voice.

The golden rule here is … “Scripta manent verba volant” (Google … if you don’t know Latin).

 

System or email

Systems, including emails are done to reduce the overhead created by the human interaction, and to optimize relevant information exchange. When you have to exchange information with the customer by email, or by ticketing system, nationality, culture and background are not so relevant.  What is relevant is to be clear, and descriptive, period.

Good customer/managers know that, and tempt to push you to use that way, when communicating with them.

Never the less the use of system could hide some pitfalls and you must be careful trying to at least be aware of the possible issues.  There is not a real way to identify what kind of team the customer has (base on the FM or OMH model), but few elements could help: when you see only one contact point, or many different one for different topic, when you don’t see a common account used, then be careful, behind the scene is lurking a team managed by an OMH focus on lack of communication. On the other hands good sign is when they gives you one contact (email, ticket system, tracing tool) that is used in parallel by multiple team member, very likely they communicate each other, and so you can as well.

 

Voice

Huston … we have a problem.  You can only guess the first time with who you are dealing with. This is why you should never ever, talk directly with the customer. The sales guys or whoever talks with customer as main contact, must first brief you.  He should be able to say to you (if he do not, suggest to fire him) how to interact, and you should have a place where take note on customer information, and share that with the other members of the teams.

Voice and conference calls, are the preferred tools of customer that have very basic understanding of distributed work and international teams. Normally they have no idea how confusing a conference call could be when coordinating real work in case of problem.

When dealing with customer around the world we have to keep in mind that not all of them talk good English, or English at all, as well teams members, from both sides. This is one of the main issues during conference call, after line quality, for misunderstanding between us and customer.

For geographic and language reasons the North America is the one mostly using it. The problems start when they forgot there are others 201 countries in the world, 6000 spoken languages, where the most used (excluding Chinese) is Spanish with 329 million vs English 328 million, and last but not least there are other 20Hrs in the Time Zones and people work there too, and sometime sleep.

In short, keep an eye on the world clock and whenever there is the need of a call show some respect and at least talk slowly and clear, avoiding slang or too many jokes.

That’s it, multi-cultural, multi-language teams should not have conference calls with customer too often or at all, but the coordination level should. Those calls should be focus on planning, and customer review, nothing more all other communication should be written.

Doing that will help preventing any friction between the different ways oh behave.

 

Lesson learn

There is always something to learn. In this case I think is that the power, flexibility, creativity and much more obtained having a team compose by multi-cultural elements, is so more than the difficulties to manage it that also companies not dealing with international customer should go for it.

Language and cultural differences could be an issue but is possible to handle them; the only require ingredient is showing a little bit of respect and ask for it as well.

The problem is keeping the beast under control specking about OMH, but someone more expert then me had already spoke about it in the MUST READING book: “Good Boss, Bad Boss” by Robert I. Sutton.


Conclusion

What conclusion? There is not a conclusion; I have just put in write a week-end/Easter thought.

The only thing I would like to stress a bit is that whenever there are teams based on multi-cultural members, from Service provider side or customer side, the standard way of acting are not working anymore.

The key there is respect, once more respect, no one is better, we are different and we must respect that. There is no need to do huge things, small simple ones are good enough. Talk slowly when talking, respect other people traditions: don’t ask for consultancy on 26 of December, don’t plan to send your Guru to Cairo on Friday and so on.

Our world is a common net from many points of view, but still lack in understanding how different is dealing with company, teams and customer limited to a specific territory and/or country; comparing with the company, team and customer distributed around the globe.

Excellent managers that work great and achieve  great results in UK (or Italy, US, France, Japan and so on)  for company working within the country with people comeing only for that country, miserable fails if require to lead the same role in a distributed, international environment.

 

Dealing with international teams and customer, require specific background. Managers, team leader, Coordinators that are not use to work in an international environment , should not be place in a leading position given they total lack of experience in handling correctly the PEOPLE, not resources, but PEOPLE.

One clarification, a company that has 100 offices in 100 countries and use a vertical structure, is not an international distributed company; it only has many boxes, period.

 

{joscommentenable}

Last Updated on Sunday, 18 August 2013 17:40
 
MySQL Search facility is gone PDF Print E-mail
Written by Marco Tusa   
Thursday, 31 March 2011 19:44

 

MySQL Search facility is gone …

 

Well it happens, after a long period of time without one second of relax The MySQL Dolphin Search, nick name Fetch for the friends, decide to go on vacation.

 

sakila_beach_256x213

He was already not working as expected, and I remember times when together with other colleagues we were pushing on him to be more precise, and less restrictive, more “concept” oriented than “word” oriented. Not really accepting comments he was his way, and in any case we were loving him.

What happen recently I don't know, from when I left I didn't had the chance to talk with him too much, but just recently I was in the need to have his support, but he was not there... instead there was a message, saying that a search for “InnoDB” :

Skip navigation links

Oracle Secure Enterprise Search

 

* Help

* Search

 

Oracle Secure Enterprise Search Error

A problem has been encountered while processing your request.

Based on the information we have, we suspect the following:

 

The exception did not contain any error message. Report this problem to your administrator, or contact your Oracle representative.

 

Of course I can use his friend G, but you know I miss him, and also if I was not too nice with him sometime, I have just realize the great work he was doing, and how much added value he was providing to the documentation I use every day.

 

So, please whoever knows where he went for vacation, tell him to comes back.

 

{joscommentenable}

Last Updated on Sunday, 18 August 2013 17:41
 
A dream on MySQL parallel replication PDF Print E-mail
Written by Marco Tusa   
Saturday, 12 March 2011 20:02

 

The Binary Log

 

The binary log in MySQL has two main declared purpose, replication and PTR (point in time recovery), as declared in the MySQL manual. In the MySQL binary log are stored all that statements that cause a change in the database. In short statements like DDL such as ALTER, CREATE, and so on, and DML like INSERT, UPDATE and DELETE.

As said binary log transform multi thread concurrent data/structure modification in to a serialize steps of activity. To handle this, a lock for the binary log—the LOCK_log mutex—is acquired just before the event is written to the binary log and released just after the event has been written. Because all session threads for the server log statements to the binary log, it is quite common for several session threads to block on this lock.

The other side effect of this is on SLAVE side. There the binary log becomes the relay-log, and will be processed SINGLE threaded.

In the MASTER it looks like:

binlog_today

While on SLAVE:

relaylogtoday

 

 

It is quite clear how the SLAVE cannot work efficiently in the case we will have a high level of  data modification on the MASTER.

 

And now the story

The other day I was working on a client site, I was performing some simple benchmarking to identify the correct thresholds for monitoring tools and so on.

While doing that I start to stress a little his server a 24CPU 48GB RAM SAN attached storage running Linux kernel 2.6.

The client had decided, before I or (we) can say anything to use MMM for HA, given that to have an up to date replication is a crucial factor for their HA architecture.

The test I was running execute inserts and selects, of 650 Bytes per insert using scalable number of thread, modifiable complexity in the SELECT, can do single insert or batch inserts and finally can do also BLOB/TEXT inserts (in that case the size per insert will jump up).

I was executing my standard low traffic exercises, when I realized that with only 6 threads doing 318 inserts per second (204Kb/s) the slaves were lagging behind the master for an indecent number of seconds.

I was expecting some lag but not so much. I stop any activity on the Master and wait for slave to recover, it takes 25 minutes to the slaves to recover/apply 20 seconds of inserts on the master.

I was shocked. That was too much considering I normally reach the 36 threads running at the same time on 5.1 and 144 on 5.5  before starting to see scaling issue and performance degradations (not in the scope of this article to publish that results).

Anyhow the issue was on replication not on Master. I was in the need to find a solution for the client or at least try before suggesting an architecture review.

Given re-inventing the wheel it is not my “motto” I started to look around for possible solutions, and parallelization seems the most interesting one.

 

My first though goes to Tungsten from Continuent, this also thanks to the very nice bogs coming from Giuseppe Maxia :

http://datacharmer.blogspot.com/2011/02/advanced-replication-for-masses-part-i.html

http://datacharmer.blogspot.com/2011/02/advanced-replication-for-masses-part-ii.html

http://datacharmer.blogspot.com/2011/03/advanced-replication-for-masses-part.html

 

Giuseppe was providing a superb introduction on one of the most interesting (actually the only really usable) solution currently available, the “Tungsten Replicator”.

 

I was reading the articles and get excited more and more.
That was exactly what I was looking for, until I read “A Shard, in this context, is the criteria used to split the transactions across channels. By default, it happens by database”.

 

Ooops, it will not work for my customer and me, given that his instance is mainly single schema, or at least the major amount of traffic is directed to only one schema.

I pinged Giuseppe, and we had a very interesting talk, he confirms the approach by object/schema, suggesting alternatives for my issue. Unfortunately given the special case no sharding could be done or table distribution, so for the moment and for the specific case this solution will not fit.

Looking around I had seen that almost all use the same approach, which is parallel replication by OBJECT where object is again and unfortunately, mainly the SCHEMA.

Sharding replication by schema is an approach that could work in some cases but not all.

If you have many schemas, with a limited number of well-distributed Questions per second, I agree this solution is a very good one.

If you have only one schema with very high level of Questions per seconds, this will not work anymore, for sure we can distribute the tables in different schemas, but what about transactions doing inserts/modifications in multiple tables?

We should be sure that each action is running on isolate state, so no relation with others. In short it will not work.


Having the possibility to dream, I would like to have parallel replication by MySQL User/Connections.

I know it could looks as a crazy idea, but trying to replicate the behavior of the MASTER on the SLAVES starts from the user(s) behavior, not from the Object inside the server instance.

Trying to move objects from A to B (see from MASTER to SLAVE) is really complicate because it needs to take in to account how people->application behave on a specific OBJECT or set of them. The status of the OBJECTS is subject to constant changes and most of them at the same time, so concurrency and versioning needs to be preserve.

In short what I was thinking was to realize a conceptual shift from REPLICATION to MIMIC approach.


I started my dream looking on what we already have and how it could be used.

We have a flexible binary log format (version4), we have a known set of SCHEMAS, USERS, and connections going from users to schemas. We have (at least for Innodb) the possibility to have good concurrent and versioning mechanism. The MAX_USER_CONNECTIONS can be used to tune the number of threads the specific user can use, and assign also the permissions.

Setting MAX_USER_CONNECTIONS=30 to USER1@ will help us in correctly dimensioning the effort that replication by user will have to do in checking for consistency.

I am aware of the issue given the interaction and relation between user/transactions, and I see the need of having a coordinator that will eventually manage the correct flush of the transaction by epoch.

What I was thinking is that the coordinator on the slave could open as much as MAX_USER_CONNECTIONS on the master, and use them as flushing threads.

As said before, I was thinking to trust the internal versioning, so not supporting transaction engines are excluded.

The epoch and/or Transaction ID would then be used to flush them in correct order, where the flush will be done to the pages in memory and then use internal mechanism for flushing to disk(s), talking of InnoDB.

It will not be real parallelization but also not a serialization given we will trust the internal versioning, so we will respect the EPOCH for the launch order not waiting for the completion.

On MASTER the replicator agent will create an X number of Replication channel for each authorized/configured user per SCHEMA, users and permission must be the same also on SLAVE.

The binary log write could looks like:

binlog_onlyuser 

On the SLAVE, the logs will be copied and then processed opening 1 to MAX_USER_CONNECTIONS Thread(s). 
Coordination between insert will be taken by replication MUTEX per User based on the EPOCH.

 relayloguser

I know quite difficult to realize given the fact that we will have more users writing on different schemas. To have it working correctly, a MUTEX should check the correct process execution at Relay Log Coordinator, ensuring also a correct execution order base on the EPOCH.

Sounds like another serialization, also if we will not wait for the transaction to complete, but only to start.

It seems that a mixed approach will be a better solution, such that replication by SHARD will be used for DB and then per USER inside only by SHARD.  This means archiving action per schema and per user transaction. Replicating the work on the SLAVE ensuring to have the same start order of the transaction for the modifications. This is not so crazy as most can think and not so far from sharding by schema, except the multi-threading during the writes.

Main point will be archiving correctly the information in the binary log:

 

binlog_schema_user

And execute them in the right order as well, keeping also into account the User/Transaction order:

relaylog_per_schema_user

Relay logs will contain information divide by SCHEMA. Processed using multi thread per User/Connection and organize with reference to the epoch of the starting execution order on the MASTER. Different grades of parallel or serialization execution could be set on the how the single Schema relay coordinator will work.

I know it will not be full parallelization but is not serializing as well, given the possible parallel execution of N transaction per user.

 

I think that there could be some risk here so in the case we should be able to:

-       Configure the number of executing threads per user

-       Execute only one thread per user

-       Choose between EPOCH execution and on Lock relief execution

 

Fast/short comment on the new binary log.

From binary log V4 we have the possibility to extend what the binary log store in a flexible way.

The extra headers (from byte 19 to X) are there but not used, while that sounds to me the most interesting place where to store additional information regarding what and how happen on the MASTER.

Not only, an HASH value representing the data on the master after the operation, would be of incredible help in checking if the replicated data on the SLAVE is consistent with the MASTER or not, avoiding to do that with silly methods like we are force to do today.

Below the binary log V4 format description:

v4 format description event (size >= 91 bytes; the size is 76 + the number of event types):

+=====================================+
| event  | timestamp         0 : 4    |
| header +----------------------------+
|        | type_code         4 : 1    | = FORMAT_DESCRIPTION_EVENT = 15
|        +----------------------------+
|        | server_id         5 : 4    |
|        +----------------------------+
|        | event_length      9 : 4    | >= 91
|        +----------------------------+
|        | next_position    13 : 4    |
|        +----------------------------+
|        | flags            17 : 2    |
+=====================================+
| event  | binlog_version   19 : 2    | = 4
| data   +----------------------------+
|        | server_version   21 : 50   |
|        +----------------------------+
|        | create_timestamp 71 : 4    |
|        +----------------------------+
|        | header_length    75 : 1    |
|        +----------------------------+
|        | post-header      76 : n    | = array of n bytes, one byte per event
|        | lengths for all            |   type that the server knows about
|        | event types                |
+=====================================+
v4 event header:
+============================+
| timestamp         0 : 4    |
+----------------------------+
| type_code         4 : 1    |
+----------------------------+
| server_id         5 : 4    |
+----------------------------+
| event_length      9 : 4    |
+----------------------------+
| next_position    13 : 4    |
+----------------------------+
| flags            17 : 2    |
+----------------------------+
| extra_headers    19 : x-19 |
+============================+


Conclusion

My favorite mottos are:
 “only the ones that do nothing do not make mistakes”

and

“don’t shoot on the pianist”.

As I say from the beginning mine is a dream, and I am just proposing an idea, specially because I see that almost all were following the same mind path focused on OBJECTS, probably for a good reason, but never the less I am following my dream.

It could happen that someone would like to see/try to work on that or that will use part of the idea revisiting modifying and making it possible to implement.

For the moment it is too complex and taking too much time, for me to do it alone, but it could be a good idea for a plug-in in my next life.

 

References:

In addition to Giuseppe blog, which for now it seems to me the only one proposing something that could be implement, I did some research, but at the end the actors are always the same:

Percona

http://www.percona.com/docs/wiki/percona-server:blueprints:parallel-replication

http://www.mysqlperformanceblog.com/2007/01/30/making-mysql-replication-parallel/

MySQL forge

http://forge.mysql.com/worklog/task.php?id=5569

Other

http://feedblog.org/2007/01/30/design-proposal-for-multithreaded-replication-in-mysql/

http://scale-out-blog.blogspot.com/2010/10/parallel-replication-on-mysql-report.html

 

Note

I already received a comment fromGiuseppe about how to handle the Locks and prevent dead lock... working on that (I have some ideas and details will provide soon).

 

{joscommentenable}

Last Updated on Sunday, 18 August 2013 18:36
 
«StartPrev12345678910NextEnd»

Page 10 of 12
 

Connecting from

Your IP: 54.162.168.187

Location:

Who's Online

We have 82 guests online