abci: Flush socket requests and responses immediately. (#6997)
The main effect of this change is to flush the socket client and server message
encoding buffers immediately once the message is fully and correctly encoded.
This allows us to remove the timer and some other special cases, without
changing the observed behaviour of the system.
-- Background
The socket protocol client and server each use a buffered writer to encode
request and response messages onto the underlying connection. This reduces the
possibility of a single message being split across multiple writes, but has the
side-effect that a request may remain buffered for some time.
The implementation worked around this by keeping a ticker that occasionally
triggers a flush, and by flushing the writer in response to an explicit request
baked into the client/server protocol (see also #6994).
These workarounds are both unnecessary: Once a message has been dequeued for
sending and fully encoded in wire format, there is no real use keeping all or
part of it buffered locally. Moreover, using an asynchronous process to flush
the buffer makes the round-trip performance of the request unpredictable.
-- Benchmarks
Code: https://play.golang.org/p/0ChUOxJOiHt
I found no pre-existing performance benchmarks to justify the flush pattern,
but a natural question is whether this will significantly harm client/server
performance. To test this, I implemented a simple benchmark that transfers
randomly-sized byte buffers from a no-op "client" to a no-op "server" over a
Unix-domain socket, using a buffered writer, both with and without explicit
flushes after each write.
As the following data show, flushing every time (FLUSH=true) does reduce raw
throughput, but not by a significant amount except for very small request
sizes, where the transfer time is already trivial (1.9μs). Given that the
client is calibrated for 1MiB transactions, the overhead is not meaningful.
The percentage in each section is the speedup for flushing only when the buffer
is full, relative to flushing every block. The benchmark uses the default
buffer size (4096 bytes), which is the same value used by the socket client and
server implementation:
FLUSH NBLOCKS MAX AVG TOTAL ELAPSED TIME/BLOCK
false 3957471 512 255 1011165416 2.00018873s 505ns
true 1068568 512 255 273064368 2.000217051s 1.871µs
(73%)
false 536096 4096 2048 1098066401 2.000229108s 3.731µs
true 477911 4096 2047 978746731 2.000177825s 4.185µs
(10.8%)
false 124595 16384 8181 1019340160 2.000235086s 16.053µs
true 120995 16384 8179 989703064 2.000329349s 16.532µs
(2.9%)
false 2114 1048576 525693 1111316541 2.000479928s 946.3µs
true 2083 1048576 526379 1096449173 2.001817137s 961.025µs
(1.5%)
Note also that the FLUSH=false baseline is actually faster than the production
code, which flushes more often than is required by the buffer filling up.
Moreover, the timer slows down the overall transaction rate of the client and
server, indepenedent of how fast the socket transfer is, so the loss on a real
workload is probably much less.