Thursday, February 26, 2009

Managing Auto Increments with Multi Masters

MySQL has system variables auto_increment_increment and auto_increment_offset for managing auto increment 'sequences' in multi master environment. Using these variables, it is possible to set up a multi master replication, where auto increment sequences in each master node interleave, and no conflicts should happen in the cluster. No matter which master(s) get the INSERTs.

Logically auto increment sequence is a shared resource, which would require distributed locking to deal with. However, auto increment sequence interleaving circumvents the need to lock, it sort of splits the auto increment sequence to several node specific sequences, making it "not a shared resource" anymore.

auto_increment_increment and auto_increment_offset have been implemented as session variables, as opposed to being global. We felt at first a bit uncomfortable with this, as there is obvious risk of misconfiguration resulting in conflicts. Apparently, the idea is to let user make separation with tables which are shared in the cluster and tables which are local to each node. If some dedicated session(s) touch only local tables, they can have session specific increment and offset values set to 1. And for sessions needing access to cluster wide shared tables, proper cluster-aware auto increment settings should be used.

These auto increment controlling variables are suitable for our Galera replication model as well. We however, wanted to go one step further and prevent any possibility of auto increment conflicts in the cluster. Galera runs on top of group communication system, which has real time view of cluster memberships. It is therefore possible to adjust increment and offset variables on the fly, triggered by any changes in cluster configuration. We implemented cluster view handler, which the group communication calls whenever somebody joins or leaves the group. The handler code is passed the number of members in the group and group ID of the processing node. These translate nicely to increment and offset values, cluster size will be the auto_increment_increment and auto_increment_offset can be calculated from the node ID.

We wanted to play even more safe to avoid any possibilities for conflicts, and therefore decided to restrict user's access to auto increment variables to read only. And, in order to be transparent, we also added one more variable: wsrep_auto_increment_control to define if the automatic auto increment controlling is enabled or not. With auto increment controlling enabled, cluster will take fully care of setting the increment and offset variables and this will guarantee no conflicts. If auto increment controlling is disabled, system will behave as default MySQL, and user must specify increment and offset globally or by session to suitable values.

Our implementation is based to MySQL 5.1.30, which happens to suffer from a known problem with slave side applying: This bug is so severe, that we backported the fix for this issue. So current wsrep integration code is actually 5.1.30 + 41986 patch + wsrep related changes.

The auto increment controlling among other new features are present in next Galera Demo-2 Release. The release is under testing and will be released as soon as our paranoid QA manager is done with all his remaining manoeuvres.

1 comment:

  1. Has the QA manager concluded its manoeuvres already? :-)