tendermint

3902 Commits

183 Branches

291 MiB

Tree: fe4123684d

Commit Graph

Author	SHA1	Message	Date
Emmanuel Odeke	15ef57c6d0	types: TxEventBuffer.Flush now uses capacity preserving slice clearing idiom Fixes https://github.com/tendermint/tendermint/issues/1189 For every TxEventBuffer.Flush() invoking, we were invoking a: b.events = make([]EventDataTx, 0, b.capacity) whose intention is to innocently clear the events slice but maintain the underlying capacity. However, unfortunately this is memory and garbage collection intensive which is linear in the number of events added. If an attack had access to our code somehow, invoking .Flush() in tight loops would be a sure way to cause huge GC pressure, and say if they added about 1e9 events maliciously, every Flush() would take at least 3.2seconds which is enough to now control our application. The new using of the capacity preserving slice clearing idiom takes a constant time regardless of the number of elements with zero allocations so we are killing many birds with one stone i.e b.events = b.events[:0] For benchmarking results, please see https://gist.github.com/odeke-em/532c14ab67d71c9c0b95518a7a526058 for a reference on how things can get out of hand easily.	7 years ago
Ethan Buchman	d71aed309f	some minor changes	7 years ago
Anton Kaliaev	f6539737de	new pubsub package comment out failing consensus tests for now rewrite rpc httpclient to use new pubsub package import pubsub as tmpubsub, query as tmquery make event IDs constants EventKey -> EventTypeKey rename EventsPubsub to PubSub mempool does not use pubsub rename eventsSub to pubsub new subscribe API fix channel size issues and consensus tests bugs refactor rpc client add missing discardFromChan method add mutex rename pubsub to eventBus remove IsRunning from WSRPCConnection interface (not needed) add a comment in broadcastNewRoundStepsAndVotes rename registerEventCallbacks to broadcastNewRoundStepsAndVotes See https://dave.cheney.net/2014/03/19/channel-axioms stop eventBuses after reactor tests remove unnecessary Unsubscribe return subscribe helper function move discardFromChan to where it is used subscribe now returns an err this gives us ability to refuse to subscribe if pubsub is at its max capacity. use context for control overflow cache queries handle err when subscribing in replay_test rename testClientID to testSubscriber extract var set channel buffer capacity to 1 in replay_file fix byzantine_test unsubscribe from single event, not all events refactor httpclient to return events to appropriate channels return failing testReplayCrashBeforeWriteVote test fix TestValidatorSetChanges refactor code a bit fix testReplayCrashBeforeWriteVote add comment fix TestValidatorSetChanges fixes from Bucky's review update comment [ci skip] test TxEventBuffer update changelog fix TestValidatorSetChanges (2nd attempt) only do wg.Done when no errors benchmark event bus create pubsub server inside NewEventBus only expose config params (later if needed) set buffer capacity to 0 so we are not testing cache new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ} This should allow to subscribe to all transactions! or a specific one using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'" use TimeoutCommit instead of afterPublishEventNewBlockTimeout TimeoutCommit is the time a node waits after committing a block, before it goes into the next height. So it will finish everything from the last block, but then wait a bit. The idea is this gives it time to hear more votes from other validators, to strengthen the commit it includes in the next block. But it also gives it time to hear about new transactions. waitForBlockWithUpdatedVals rewrite WAL crash tests Task: test that we can recover from any WAL crash. Solution: the old tests were relying on event hub being run in the same thread (we were injecting the private validator's last signature). when considering a rewrite, we considered two possible solutions: write a "fuzzy" testing system where WAL is crashing upon receiving a new message, or inject failures and trigger them in tests using something like https://github.com/coreos/gofail. remove sleep no cs.Lock around wal.Save test different cases (empty block, non-empty block, ...) comments add comments test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks fixes as per Bucky's last review reset subscriptions on UnsubscribeAll use a simple counter to track message for which we panicked also, set a smaller part size for all test cases	7 years ago

Author

SHA1

Message

Date

Emmanuel Odeke

15ef57c6d0

types: TxEventBuffer.Flush now uses capacity preserving slice clearing idiom

Fixes https://github.com/tendermint/tendermint/issues/1189

For every TxEventBuffer.Flush() invoking, we were invoking
a:
  b.events = make([]EventDataTx, 0, b.capacity)
whose intention is to innocently clear the events slice but
maintain the underlying capacity.

However, unfortunately this is memory and garbage collection intensive
which is linear in the number of events added. If an attack had access
to our code somehow, invoking .Flush() in tight loops would be a sure
way to cause huge GC pressure, and say if they added about 1e9
events maliciously, every Flush() would take at least 3.2seconds
which is enough to now control our application.

The new using of the capacity preserving slice clearing idiom
takes a constant time regardless of the number of elements with zero
allocations so we are killing many birds with one stone i.e
  b.events = b.events[:0]

For benchmarking results, please see
https://gist.github.com/odeke-em/532c14ab67d71c9c0b95518a7a526058
for a reference on how things can get out of hand easily.

7 years ago

Ethan Buchman

d71aed309f

some minor changes

7 years ago

Anton Kaliaev

f6539737de

new pubsub package

comment out failing consensus tests for now

rewrite rpc httpclient to use new pubsub package

import pubsub as tmpubsub, query as tmquery

make event IDs constants
EventKey -> EventTypeKey

rename EventsPubsub to PubSub

mempool does not use pubsub

rename eventsSub to pubsub

new subscribe API

fix channel size issues and consensus tests bugs

refactor rpc client

add missing discardFromChan method

add mutex

rename pubsub to eventBus

remove IsRunning from WSRPCConnection interface (not needed)

add a comment in broadcastNewRoundStepsAndVotes

rename registerEventCallbacks to broadcastNewRoundStepsAndVotes

See https://dave.cheney.net/2014/03/19/channel-axioms

stop eventBuses after reactor tests

remove unnecessary Unsubscribe

return subscribe helper function

move discardFromChan to where it is used

subscribe now returns an err

this gives us ability to refuse to subscribe if pubsub is at its max
capacity.

use context for control overflow

cache queries

handle err when subscribing in replay_test

rename testClientID to testSubscriber

extract var

set channel buffer capacity to 1 in replay_file

fix byzantine_test

unsubscribe from single event, not all events

refactor httpclient to return events to appropriate channels

return failing testReplayCrashBeforeWriteVote test

fix TestValidatorSetChanges

refactor code a bit

fix testReplayCrashBeforeWriteVote

add comment

fix TestValidatorSetChanges

fixes from Bucky's review

update comment [ci skip]

test TxEventBuffer

update changelog

fix TestValidatorSetChanges (2nd attempt)

only do wg.Done when no errors

benchmark event bus

create pubsub server inside NewEventBus

only expose config params (later if needed)

set buffer capacity to 0 so we are not testing cache

new tx event format: key = "Tx" plus a tag {"tx.hash": XYZ}

This should allow to subscribe to all transactions! or a specific one
using a query: "tm.events.type = Tx and tx.hash = '013ABF99434...'"

use TimeoutCommit instead of afterPublishEventNewBlockTimeout

TimeoutCommit is the time a node waits after committing a block, before
it goes into the next height. So it will finish everything from the last
block, but then wait a bit. The idea is this gives it time to hear more
votes from other validators, to strengthen the commit it includes in the
next block. But it also gives it time to hear about new transactions.

waitForBlockWithUpdatedVals

rewrite WAL crash tests

Task:
test that we can recover from any WAL crash.

Solution:
the old tests were relying on event hub being run in the same thread (we
were injecting the private validator's last signature).

when considering a rewrite, we considered two possible solutions: write
a "fuzzy" testing system where WAL is crashing upon receiving a new
message, or inject failures and trigger them in tests using something
like https://github.com/coreos/gofail.

remove sleep

no cs.Lock around wal.Save

test different cases (empty block, non-empty block, ...)

comments

add comments

test 4 cases: empty block, non-empty block, non-empty block with smaller part size, many blocks

fixes as per Bucky's last review

reset subscriptions on UnsubscribeAll

use a simple counter to track message for which we panicked

also, set a smaller part size for all test cases

7 years ago

3 Commits (fe4123684d4a02cbc53eacf93458724ccc8c6d91)