I have seen a few posts on DBA.SE (where I answer a lot of questions) recommending the use of semi-synchronous replication in MySQL 5.5 over a WAN as a way to improve the reliability of replication. My gut reaction was that this is a very bad idea with even a tiny write load, but I wanted to test it out to confirm. Please note that I do not mean to disparage the author of those posts, a user whom I have great respect for.
The short version is that one slave has to acknowledge receipt of the binary log event before the query returns. The slave doesn’t have to execute it before returning control so it’s still an asynchronous commit. The net effect is that the slave should only miss a maximum of 1 event if the master has a catastrophic failure. It does not improve the reliability of replication itself or prevent data drift.
What about performance, though? Semi-synchronous replication causes the client to block until a slave has acknowledged that it has received the event. On a LAN with sub-millisecond latencies, this should not present much of a problem. But what if there is 85ms of latency between the master and the slave, as is the case between Virginia and California? My hypothesis is that, with 85ms of latency, it is impossible to get better than 11 write queries (INSERT/UPDATE/DELETE) per second – 1000ms / 85ms = 11.7.
Let’s test that out.
I spun up identical m1.small instances in EC2’s us-east-1 and us-west-1 regions using the latest Ubuntu 12.04 LTS AMI from Alestic and installed the latest Percona Server 5.5 from their apt repository.
Although mostly irrelevant to the benchmarks, my.cnf is configured thusly (basically identical to support-files/my-huge.cnf shipped with the distribution):
Then, I wrote a very simple Ruby script to perform 10k inserts into a table, using the sequel gem, which I love.
With MySQL configured, let’s knock out a few INSERTs into the us-east-1 database, which has no slaves:
# w/ no slaves
...
INFO -- : (0.000179s) INSERT INTO test.foo (text) VALUES ('FKGDLOWD')
...
INFO -- : Elapsed: 9.37364 seconds. 1066.82142689499 qps
My control is roughly 1000 inserts/sec with each query taking less than .2ms.
Then, I set up a traditional, asynchronous MySQL slave on the server in us-west-1 and ran the test again:
# w/ traditional replication
...
INFO -- : (0.000237s) INSERT INTO test.foo (text) VALUES ('CVGAMLXA')
...
INFO -- : Elapsed: 10.601943 seconds. 943.223331798709 qps
Somewhat inexplicably, the performance was slightly worse with the slave attached, but not by much. ~950 inserts/sec
Next is semi-synchronous replication. First, I tested the latency between us-east-1 and us-west-1.
# ping -c 1 184.72.189.235
PING 184.72.189.235 (184.72.189.235) 56(84) bytes of data.
64 bytes from 184.72.189.235: icmp_req=1 ttl=52 time=85.5 ms
Latency between us-east-1 and us-west-1 is 85ms, so I still predict 11 inserts/sec at most, which means my script will take 15 minutes instead of 10 seconds:
I set up semi-synchronous replication like this:
I started the script and, as predicted, each insert was taking approximately 85ms. There was a screaming 2 month old in the next room so in the interest of brevity, I reduced the count from 10k to 1k. That should take about 90s:
# w/ semi-sync replication
...
INFO -- : (0.086301s) INSERT INTO test.foo (text) VALUES ('JKIJTUDO')
...
INFO -- : Elapsed: 86.889529 seconds. 11.5088666207409 qps
Just as I suspected – 11 inserts/sec.
In conclusion, the speed of light is a bitch, so don’t enable semi-synchronous replication over wide area or high latency networks.