Performance of MySQL Semi-Synchronous Replication Over High Latency Connections

I have seen a few posts on DBA.SE (where I answer a lot of questions) recommending the use of semi-synchronous replication in MySQL 5.5 over a WAN as a way to improve the reliability of replication. My gut reaction was that this is a very bad idea with even a tiny write load, but I wanted to test it out to confirm. Please note that I do not mean to disparage the author of those posts, a user whom I have great respect for.

What is semi-synchronous replication?

The short version is that one slave has to acknowledge receipt of the binary log event before the query returns. The slave doesn’t have to execute it before returning control so it’s still an asynchronous commit. The net effect is that the slave should only miss a maximum of 1 event if the master has a catastrophic failure. It does not improve the reliability of replication itself or prevent data drift.
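A toy model may make the difference concrete. This is only a sketch of the client-visible wait, not how MySQL implements semi-sync. The helper name `commit_wait_ms`, the 0.2ms local commit time, and the 0.5ms and 85ms round-trip times are all illustrative assumptions.

```ruby
# Toy model of how long a client waits for one commit.
# local_ms:      time the master needs to commit locally (assumed figure)
# slave_rtts_ms: round-trip time to each semi-sync slave, in milliseconds
def commit_wait_ms(local_ms, slave_rtts_ms, semi_sync)
  # Asynchronous replication never waits on a slave.
  return local_ms unless semi_sync && !slave_rtts_ms.empty?

  # Semi-sync waits for the first acknowledgement only, so the
  # nearest slave determines the extra delay.
  local_ms + slave_rtts_ms.min
end

puts commit_wait_ms(0.2, [85.0], false)      # async, remote slave
puts commit_wait_ms(0.2, [85.0], true)       # semi-sync, one remote slave
puts commit_wait_ms(0.2, [0.5, 85.0], true)  # semi-sync, a nearby slave acks first
```

In this model the nearest slave sets the delay, because only one acknowledgement is required.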

What about performance, though? Semi-synchronous replication causes the client to block until a slave has acknowledged that it has received the event. On a LAN with sub-millisecond latencies, this should not present much of a problem. But what if there is 85ms of latency between the master and the slave, as is the case between Virginia and California? My hypothesis is that, with 85ms of round-trip latency, it is impossible to get better than 11 write queries (INSERT/UPDATE/DELETE) per second from a single connection: 1000ms / 85ms ≈ 11.8.
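The arithmetic behind that hypothesis can be sketched in a few lines. The helper name `max_serial_writes_per_sec` is made up for illustration, and the 0.5ms LAN figure is an assumed value for comparison. The model ignores local commit time, so it gives an upper bound for one connection issuing writes back to back.

```ruby
# Upper bound on serial writes per second when every commit blocks
# for one network round trip. rtt_ms is the round trip in milliseconds.
def max_serial_writes_per_sec(rtt_ms)
  raise ArgumentError, 'rtt_ms must be positive' unless rtt_ms > 0
  1000.0 / rtt_ms
end

# 0.5ms is an assumed LAN round trip; 85ms is the WAN figure from the text.
[0.5, 85.0].each do |rtt|
  printf("%5.1f ms RTT -> at most %.1f writes/sec per connection\n",
         rtt, max_serial_writes_per_sec(rtt))
end
```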

Let’s test that out.

I spun up identical m1.small instances in EC2’s us-east-1 and us-west-1 regions using the latest Ubuntu 12.04 LTS AMI from Alestic and installed the latest Percona Server 5.5 from their apt repository.

gpg --keyserver hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | apt-key add -
echo deb http://repo.percona.com/apt precise main > /etc/apt/sources.list.d/percona.list
apt-get update
apt-get install percona-server-server-5.5 libmysqlclient-dev

Although mostly irrelevant to the benchmarks, my.cnf is configured thusly (basically identical to support-files/my-huge.cnf shipped with the distribution):

[mysqld]
port          = 3306
socket          = /var/run/mysql/mysql.sock
skip-external-locking
key_buffer_size = 384M
max_allowed_packet = 1M
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 8M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size = 32M
thread_concurrency = 8

server-id     = 1
log-bin=mysql-bin
binlog_format=mixed

innodb_data_home_dir = /var/lib/mysql
innodb_data_file_path = ibdata1:2000M;ibdata2:10M:autoextend
innodb_log_group_home_dir = /var/lib/mysql
innodb_buffer_pool_size = 384M
innodb_additional_mem_pool_size = 20M
innodb_log_file_size = 100M
innodb_log_buffer_size = 8M
innodb_flush_log_at_trx_commit = 0
innodb_lock_wait_timeout = 50

Then, I wrote a very simple Ruby script to perform 10k inserts into a table, using the sequel gem, which I love.

apt-get install rubygems
gem install sequel mysql2 --no-rdoc --no-ri

#!/usr/bin/env ruby
# insertperf.rb

require 'logger'
require 'rubygems'
require 'sequel'

logger = Logger.new(STDOUT)
localdb = "inserttest"

db = Sequel.connect( :database => localdb,
                     :adapter  => 'mysql2',
                     :user     => 'root',
                     :logger   => logger )

# DDL statements return no rows, so run them directly instead of fetching a dataset.
db.run "DROP DATABASE IF EXISTS #{localdb}"
db.run "CREATE DATABASE #{localdb}"
db.run "CREATE TABLE IF NOT EXISTS #{localdb}.foo (
  id int unsigned AUTO_INCREMENT PRIMARY KEY,
  text VARCHAR(8)
) ENGINE=InnoDB"

n = 10000
t1 = Time.now
n.times do
  # Build a random 8-character string of uppercase letters (A-Z).
  value = (0...8).map { (65 + rand(26)).chr }.join
  db["INSERT INTO #{localdb}.foo (text) VALUES (?)", value].insert
end
t2 = Time.now
elapsed = t2 - t1
logger.info "Elapsed: #{elapsed} seconds. #{n / elapsed} qps"

With MySQL configured, let’s knock out a few INSERTs into the us-east-1 database, which has no slaves:

# w/ no slaves
...
INFO -- : (0.000179s) INSERT INTO test.foo (text) VALUES ('FKGDLOWD')
...
INFO -- : Elapsed: 9.37364 seconds. 1066.82142689499 qps

My control is roughly 1,000 inserts/sec, with each query taking less than 0.2ms.

Then, I set up a traditional, asynchronous MySQL slave on the server in us-west-1 and ran the test again:

# w/ traditional replication
...
INFO -- : (0.000237s) INSERT INTO test.foo (text) VALUES ('CVGAMLXA')
...
INFO -- : Elapsed: 10.601943 seconds. 943.223331798709 qps

Somewhat inexplicably, the performance was slightly worse with the slave attached, but not by much: about 940 inserts/sec.

Next is semi-synchronous replication. First, I tested the latency between us-east-1 and us-west-1.

# ping -c 1 184.72.189.235
PING 184.72.189.235 (184.72.189.235) 56(84) bytes of data.
64 bytes from 184.72.189.235: icmp_req=1 ttl=52 time=85.5 ms

Latency between us-east-1 and us-west-1 is 85ms, so I still predict 11 inserts/sec at most, which means my script will take about 15 minutes instead of 10 seconds.

I set up semi-synchronous replication like this:

master> INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';
master> SET GLOBAL rpl_semi_sync_master_enabled = 1;
slave> INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';
slave> SET GLOBAL rpl_semi_sync_slave_enabled = 1;
slave> STOP SLAVE; START SLAVE;
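It can be worth confirming that the master is really in semi-sync mode before timing anything, because it reverts to asynchronous replication if no slave acknowledges within rpl_semi_sync_master_timeout. The sketch below interprets rows from SHOW GLOBAL STATUS. The helper name `semi_sync_summary` and the sample rows are hypothetical, and the `:Variable_name` / `:Value` keys are my assumption about how Sequel names those columns.

```ruby
# Summarises rows shaped like the output of
#   SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%'
# Each row is assumed to be a hash with :Variable_name and :Value keys.
def semi_sync_summary(rows)
  status = rows.each_with_object({}) { |r, h| h[r[:Variable_name]] = r[:Value] }
  {
    :active  => status['Rpl_semi_sync_master_status'] == 'ON',
    :clients => status['Rpl_semi_sync_master_clients'].to_i,
    :acked   => status['Rpl_semi_sync_master_yes_tx'].to_i,
    :unacked => status['Rpl_semi_sync_master_no_tx'].to_i
  }
end

# Hypothetical rows for illustration only. Against a live master, something like
#   rows = db.fetch("SHOW GLOBAL STATUS LIKE 'Rpl_semi_sync_master%'").all
# would supply the real values.
sample = [
  { :Variable_name => 'Rpl_semi_sync_master_status',  :Value => 'ON' },
  { :Variable_name => 'Rpl_semi_sync_master_clients', :Value => '1' },
  { :Variable_name => 'Rpl_semi_sync_master_yes_tx',  :Value => '1000' },
  { :Variable_name => 'Rpl_semi_sync_master_no_tx',   :Value => '0' }
]
p semi_sync_summary(sample)
```

A non-zero `:unacked` count, or `:active` being false, would suggest that some commits were not acknowledged by a slave and the timings reflect asynchronous behaviour.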

I started the script and, as predicted, each insert was taking approximately 85ms. There was a screaming 2-month-old in the next room so, in the interest of brevity, I reduced the count from 10k to 1k. That should take about 90s:

# w/ semi-sync replication
...
INFO -- : (0.086301s) INSERT INTO test.foo (text) VALUES ('JKIJTUDO')
...
INFO -- : Elapsed: 86.889529 seconds. 11.5088666207409 qps

Just as I suspected – 11 inserts/sec.
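To put the three runs side by side, the elapsed times reported above can be converted into an average cost per insert. This is just arithmetic on the logged figures; the helper name `per_insert_ms` is made up for illustration.

```ruby
# Figures copied from the three runs above: [label, inserts, elapsed seconds]
runs = [
  ['no slaves',   10_000,  9.37364],
  ['async slave', 10_000, 10.601943],
  ['semi-sync',    1_000, 86.889529]
]

# Average wall-clock time per insert, in milliseconds.
def per_insert_ms(count, elapsed_s)
  elapsed_s * 1000.0 / count
end

runs.each do |label, count, elapsed_s|
  printf("%-12s %8.3f ms/insert %9.1f inserts/sec\n",
         label, per_insert_ms(count, elapsed_s), count / elapsed_s)
end
```

The semi-sync run averages about 86.9ms per insert against roughly 0.9ms for the control, a difference of about 86ms, which is very close to the 85.5ms ping time.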

In conclusion, the speed of light is a bitch, so don’t enable semi-synchronous replication over wide area or high latency networks.

12 Comments

  • Rolando Edwards says:

    I read this and learned something profound. Semisync Replication should not be used geographically. Network latency between data centers would deteriorate Semisync back to Async. Nevertheless, nothing beats proving it beyond a reasonable doubt. Thank you.

  • Justin Swanhart says:

    This might be acceptable for some workloads, such as data warehouses, which feature bulk loading. They don’t have high transactional write rates.

    This hopefully will also be mitigated somewhat by MySQL 5.6 group commit, at least for parallel workloads.

    Also, you can get true synchronous replication that is WAN friendly using XtraDB cluster, because Galera replication happens in parallel.

    • Aaron says:

      Hi Justin,

      I agree that for certain OLAP applications, this might be acceptable. I’ll have to read up on group commit to see how it might help this – I haven’t done any testing w/ 5.6 yet.

      With regards to PXC, can you elaborate on how this is mitigated? While some applies could happen in parallel, I expect that each query will still block the client for at least 85ms, which would have a negative impact to most web/OLTP applications. Perhaps this is a good excuse for me to do some testing with PXC. :)

      Another scenario where I think this might be ok is if you have multiple semi-sync slaves, with some of them colocated with the master. Since the protocol only requires one ack, as long as the local slaves are healthy, they would tend to respond first. I’m also curious how semi-sync works with tiered replicas – is the protocol supported in the case where there is a relay slave?

  • James Attard says:

    I think if you use this with NBD or Sharding or any other write scale application, it would make sense for geographically dispersed nodes.

    • Aaron says:

      James, can you elaborate? I don’t understand your NBD comment (did you mean NDB?).

      How does sharding change the latency issue? If 11 INSERTs a second is acceptable throughput, you would not have reached the scale where you need to shard anything.

  • Kenneth says:

    Many thanks for sharing this information. It helps a lot with our ongoing decision about MySQL replication.

  • I think this is what you would expect through single connection. However what if you do multiple connections at the time ? There would be a potential optimization opportunity of “group commit” for semi sync replication. This is also where parallel application by solutions like Percona XtraDB Cluster help.

  • zhang says:

    I agree with Peter. I also tested semi-sync three months ago, using sysbench with several threads. I was excited to find that the performance of semi-sync DIDN’T drop that significantly. The result was NOT what I had expected and understood before. Finally, wow… here is the key: “network group commit”. That is a cool idea, well done! Baron’s opinion is: “semi-synchronous replication actually forces the client to be synchronized, not the replicas.” That also made things much clearer to me.

  • Hi Aaron, this is a nice article. I referenced it in my own blog article concerning synchronous replication (http://scale-out-blog.blogspot.com/2012/08/is-synchronous-data-replication-over.html). Thanks for doing the experiment.

    Meanwhile, I don’t know if you have gotten email from others, but comments on this blog appear as separate posts on Planet MySQL. There might be a problem with your RSS feed settings.

    • Aaron says:

      Thanks Robert. I saw your blog post as well and enjoyed it. I may be testing Tungsten out in the near future.

      I, too noticed the comments on Planet MySQL and I am not sure where they are coming from or who to contact to get the comments feed removed.

  • Artyom says:

    Hi Aaron, really nice article. We were just discussing this kind of replication and production performance. I made a Russian translation of your article on my blog ( http://sorok7.blogspot.com/2012/08/mysql_7.html ). Thanks for the article!

  • Rubén says:

    Hi all

    I was at the last London Percona meeting, and it was a great surprise to discover this blog with lots of Percona people here :)

    Thanks for sharing this information. Let me ask something very simple: if we set up semi-synchronous replication between MySQL boxes in the same datacenter, for example, should we expect normal performance between them? At least the same performance as normal replication?

    I tested normal replication in the past and we had lots of errors, and it was a little annoying to fix the replication every time, so I’m excited to test semi-sync replication!

    Thanks!
