Wednesday, January 28, 2009

Experimenting with Write Set Replication

During the past year, we have been developing a write set replication system for the MySQL/InnoDB engine, called Galera. Our project has now reached a milestone where we can run benchmarks to get performance metrics and publish releases for public evaluation. In this post, I'll give a short introduction to Galera and related projects.

Some Technology
Galera is a generic multi-master synchronous replication system that replicates transaction write sets at commit time and resolves conflicts by comparing the keys in the write sets. Replication happens through a group communication system, which (among other tasks) defines the order in which transactions commit in the cluster. The write sets can carry the original SQL statements or, for best performance, row-based replication events, available in MySQL 5.1 and onwards.
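To make the idea of "resolving conflicts by comparing keys" more concrete, here is a minimal sketch of a certification test in Python. It is not Galera's actual code: the names WriteSet, Certifier, last_seen_seqno and last_writer are hypothetical, and the rule used (reject a write set if any of its keys was committed after the point the transaction last observed) is a simplifying assumption about how such a check could work. The important property is that every node sees the write sets in the same order and therefore reaches the same verdict without extra coordination.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    """Hypothetical write set: the keys one transaction modified."""
    trx_id: str
    keys: frozenset          # e.g. {("table", primary_key), ...}
    last_seen_seqno: int     # last committed seqno the transaction had observed


@dataclass
class Certifier:
    """Deterministic check; each node feeds it write sets in the same total order."""
    seqno: int = 0
    last_writer: dict = field(default_factory=dict)  # key -> seqno of last commit

    def certify(self, ws: WriteSet) -> bool:
        # Every delivered write set takes the next position in the global order.
        self.seqno += 1
        # Conflict: some key was committed after this transaction's snapshot.
        for key in ws.keys:
            if self.last_writer.get(key, 0) > ws.last_seen_seqno:
                return False  # the transaction must roll back
        # No conflict: record this write set as the latest writer of its keys.
        for key in ws.keys:
            self.last_writer[key] = self.seqno
        return True


if __name__ == "__main__":
    certifier = Certifier()
    a = WriteSet("a", frozenset({("stock", 1)}), last_seen_seqno=0)
    b = WriteSet("b", frozenset({("stock", 1)}), last_seen_seqno=0)
    print(certifier.certify(a))  # first writer of the row passes
    print(certifier.certify(b))  # concurrent writer of the same row fails
```

In this sketch the group communication system is reduced to "the order in which certify() is called"; in a real cluster that order is what the group communication layer provides.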

The Galera replication method leaves the actual SQL statement processing to run uninterrupted, quite close to the native MySQL way. This makes client interaction with the cluster fast, and to the application a Galera cluster looks just like any native MySQL server. The only difference is commit processing, where synchronization with the cluster causes a certain delay.

Galera replication has been integrated with MySQL/InnoDB 5.1.30, providing a full-fledged multi-master MySQL database cluster. We call this first version the "demo release", and it is available for download on our website.

Some Benchmarking
We have run several benchmarks against Galera (sysbench, dbt2, DOTS, osdb, sqlgen) with different load profiles to find the limits within which Galera replication is feasible. Our observation is that a Galera cluster provides good performance and scalability even with write-intensive workloads.

Here is one summary obtained with the dbt2 benchmark (which resembles TPC-C), run in the Amazon EC2 environment. The graph shows how a 1-4 node Galera cluster compares against a pristine MySQL 5.1.30 server.
The dbt2 load contains hot spots and is not favorable for clustering. You can see the deadlock rate growing as more cluster nodes are added. However, total performance still improves, even with 4 nodes.
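As a rough intuition for why hot spots hurt more as nodes are added, consider a toy model (not derived from the dbt2 measurements): each node has one transaction in flight, and each transaction updates one row chosen uniformly at random from a small set of hot rows. The chance that at least two of them pick the same row is then the classic birthday-problem probability. The function name and both parameters below are illustrative assumptions.

```python
def collision_probability(concurrent_writers: int, hot_rows: int) -> float:
    """Probability that at least two writers pick the same hot row.

    Toy assumption: every writer picks one row uniformly and independently.
    """
    if concurrent_writers > hot_rows:
        return 1.0  # pigeonhole: some row must be picked twice
    p_no_collision = 1.0
    for i in range(concurrent_writers):
        # The (i+1)-th writer must avoid the i rows already taken.
        p_no_collision *= (hot_rows - i) / hot_rows
    return 1.0 - p_no_collision


if __name__ == "__main__":
    # One writer per node, 10 hot rows (both numbers are made up for illustration).
    for nodes in range(1, 5):
        print(nodes, round(collision_probability(nodes, 10), 3))
```

The probability grows faster than linearly with the number of concurrent writers, which matches the general shape of a rising deadlock rate, although real workloads are of course far less uniform than this model.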

One Roadmap
We have just released the MySQL/Galera demo release. It should be stable enough for evaluation with real applications. You can download the demo release from here: Galera demo.

Our next task is to implement the missing features we plan to have in the beta release. The major task there is providing a way to bring a new node into the cluster. In essence, this means implementing DB snapshot transfer for joining nodes. We expect that a feature-complete version is possible during Q2 this year.

And All Those Projects
Galera communicates with the DBMS engine through an API, which we call the wsrep API (wsrep stands for "write set replication"). We started one open source project just for defining this API and another project for implementing the API integration in the MySQL/InnoDB engine.

Here's our current project list:
  • wsrep API defines the wsrep API only.
  • mysql patches by Codership is the open source wsrep integration in the MySQL code base.
  • openrep will be an open source implementation of a wsrep API replication system. We have just started working on this, so there are no deliverables yet.
  • galera is a wsrep API implementation, optimized for best performance.
We have investigated the Postgres source code quite a bit, and would like to start a "wsrep integration patches for postgres" project as well. However, we don't have enough hands and heads to go ahead with this plan in the near future. Technically, though, Postgres integration should be within easy reach.

This is the state of Galera development in a nutshell. Feel free to visit our website, where plenty more information is available for the interested reader.