This continues the push of plumbing contexts through tendermint. I
attempted to find all goroutines in the production code (non-test) and
made sure that these threads would exit when their contexts were
canceled, and I believe this PR does that.
Prior to this change, shutting down the pubsub server could cause
any laggard publishers to race with the shutdown plumbing.
Fix that race condition, and plumb in the service context to the
runner so that it will respect the external signal directly.
Remove now-redundant local shutdown plumbing.
Rework the implementation of event query parsing and execution to
improve performance and reduce memory usage.
Previous memory and CPU profiles of the pubsub service showed query
processing as a significant hotspot. While we don't have evidence that
this is visibly hurting users, fixing it is fairly easy and self-contained.
Updates #6439.
Typical benchmark results comparing the original implementation (PEG) with the reworked implementation (Custom):
```
TEST TIME/OP BYTES/OP ALLOCS/OP SPEEDUP MEM SAVING
BenchmarkParsePEG-12 51716 ns 526832 27
BenchmarkParseCustom-12 2167 ns 4616 17 23.8x 99.1%
BenchmarkMatchPEG-12 3086 ns 1097 22
BenchmarkMatchCustom-12 294.2 ns 64 3 10.5x 94.1%
```
Components:
* Add a basic parsing benchmark.
* Move the original query implementation to a subdirectory.
* Add lexical scanner for Query expressions.
* Add a parser for Query expressions.
* Implement query compiler.
* Add test cases based on OpenAPI examples.
* Add MustCompile to replace the original MustParse, and update usage.
This is a very small change, but removes a method from the
`service.Service` interface (a win!) and forces callers to explicitly
pass loggers in to objects during construction rather than (later)
injecting them. There's not a real need for this kind of lazy
construction of loggers, and I think a decent potential for confusion
for mutable loggers.
The main concern I have is that this changes the constructor API for
ABCI clients. I think this is fine, and I suspect that as we plumb
contexts through, and make changes to the RPC services there'll be a
number of similar sorts of changes to various (quasi) public
interfaces, which I think we should welcome.
I think calling os.Exit at arbitrary points is _bad_ and is good to
delete. I think panics in the case of data courruption have a chance
of providing useful information.
This is part of the work described by #7156.
Remove "unbuffered subscriptions" from the pubsub service.
Replace them with a dedicated blocking "observer" mechanism.
Use the observer mechanism for indexing.
Add a SubscribeWithArgs method and deprecate the old Subscribe
method. Remove SubscribeUnbuffered entirely (breaking).
Rework the Subscription interface to eliminate exposed channels.
Subscriptions now use a context to manage lifecycle notifications.
Internalize the eventbus package.
Prior to #7177, these benchmarks did not provide any useful data about the
performance of the pubsub system (in fact, prior to #7178, half of them did not
work at all).
Specifically, they create a bunch of subscribers with 1 buffer slot on a
default publisher and copy messages to them. But because the publisher is
single-threaded, and doesn't block for delivery, all this tested is how long it
takes to receive a single message from a channel and deliver it to another
channel. The resulting stat does not even vary meaningfully with batch size,
since it's testing a serial workload.
Since #7177 the benchmarks do correctly reflect delivery time (good), but still
do not tell us anything useful: The latencies that matter for pubsub are not
internal queuing, but the effects of backpressure on the publisher via the
subscribers. That's an integration problem, and simulating a fake workload does
not provide meaningful results.
On that basis, remove these benchmarks.
Updates #7156, and a follow-up to #7070.
Event subscriptions in Tendermint currently use a fixed-length Go
channel as a queue. When the channel fills up, the publisher
immediately terminates the subscription. This prevents slow
subscribers from creating memory pressure on the node by not
servicing their queue fast enough.
Replace the buffered channel used to deliver events to buffered
subscribers with an explicit queue. The queue provides a soft
quota and burst credit mechanism: Clients that usually keep up
can survive occasional bursts, without allowing truly slow
clients to hog resources indefinitely.
Fixes#7176. Some of the benchmarks create a bunch of different subscriptions all sharing the same query. These were all using the same client ID, which violates one of the subscriber rules. Ensure each subscriber gets a unique ID.
This has been broken as long as this library has been in the repo—I tracked it back to bb9aa85d and it was already failing there, so I think this never really worked. I'm not sure these test anything useful, but at least now they run.
Rework the internal plumbing of the server. This change does not modify the
exported interfaces or semantics of the package, and all the existing tests
still pass.
The main changes here are to:
- Simplify the interface for subscription indexing with a typed index rather
than a single nested map.
- Ensure orderly shutdown of channels, so that there is no longer a dynamic
race with concurrent publishers & subscribers at shutdown.
- Remove a layer of indirection between publishers and subscribers. This mainly
helps legibility.
- Remove order dependencies between registration and delivery.
- Add documentation comments where they seemed helpful, and clarified the
existing comments where it was practical.
Although performance was not a primary goal of this change, the simplifications
did very slightly reduce memory use and increase throughput on the existing
benchmarks, though the delta is not statistically significant.
BENCHMARK BEFORE AFTER SPEEDUP (%) B/op (B) B/op (A)
Benchmark10Clients-12 5947 5566 6.4 2017 1942
Benchmark100Clients-12 6111 5762 5.7 1992 1910
Benchmark1000Clients-12 6983 6344 9.2 2046 1959
This is a very minor change, but I was looking through the code, and
this seems like it shouldn't be exported or used more broadly, so I've
moved it out.
As written, the encoding step unnecessarily made and moved multiple copies of
the encoded representation. Reduce this to a single allocation and encode the
data in-place so that a shift is no longer required.
Also: Add a test to ensure letter digits are capitalized, which was previously not
verified but was expected downstream.
No functional changes.
## Description
Internalize some libs. This reduces the amount ot public API tendermint is supporting. The moved libraries are mainly ones that are used within Tendermint-core.
To make sure finalizers run, we use channel for synchronization, and a
separate goroutine for trigger runtime.GC every 1 second. In practice,
just two consecutive runtime.GC calls can make all finalizers will run,
but using a separate goroutine make the code more robust and not depend
on garbage collector internal implementation.
Fixes#6452
This change fixes a potential exploitable vulnerability
that can cause the WAL to be consistently truncated by falsely
supplying the WAL path which would be any arbitrary dirrectory.
Fixes#6427
## Description
Fixes marshaling error in sdk
closes https://github.com/cosmos/cosmos-sdk/issues/8578
the output stays the same, we are avoiding the passing of the callback because sdk uses typed logging.
#5852 fixed an issue with error propagation in `os.EnsureDir()`. However, this function is basically identical to `os.MkdirAll()`, and can be replaced entirely with a call to it. We keep the function for backwards compatibility.