You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

90 lines
2.0 KiB

internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
  1. package protoio_test
  2. import (
  3. "testing"
  4. "time"
  5. "github.com/gogo/protobuf/proto"
  6. "github.com/stretchr/testify/require"
  7. "github.com/tendermint/tendermint/crypto"
  8. "github.com/tendermint/tendermint/crypto/tmhash"
  9. "github.com/tendermint/tendermint/internal/libs/protoio"
  10. tmproto "github.com/tendermint/tendermint/proto/tendermint/types"
  11. "github.com/tendermint/tendermint/types"
  12. )
  13. func aVote(t testing.TB) *types.Vote {
  14. t.Helper()
  15. var stamp, err = time.Parse(types.TimeFormat, "2017-12-25T03:00:01.234Z")
  16. require.NoError(t, err)
  17. return &types.Vote{
  18. Type: tmproto.SignedMsgType(byte(tmproto.PrevoteType)),
  19. Height: 12345,
  20. Round: 2,
  21. Timestamp: stamp,
  22. BlockID: types.BlockID{
  23. Hash: tmhash.Sum([]byte("blockID_hash")),
  24. PartSetHeader: types.PartSetHeader{
  25. Total: 1000000,
  26. Hash: tmhash.Sum([]byte("blockID_part_set_header_hash")),
  27. },
  28. },
  29. ValidatorAddress: crypto.AddressHash([]byte("validator_address")),
  30. ValidatorIndex: 56789,
  31. }
  32. }
  33. type excludedMarshalTo struct {
  34. msg proto.Message
  35. }
  36. func (emt *excludedMarshalTo) ProtoMessage() {}
  37. func (emt *excludedMarshalTo) String() string {
  38. return emt.msg.String()
  39. }
  40. func (emt *excludedMarshalTo) Reset() {
  41. emt.msg.Reset()
  42. }
  43. func (emt *excludedMarshalTo) Marshal() ([]byte, error) {
  44. return proto.Marshal(emt.msg)
  45. }
  46. var _ proto.Message = (*excludedMarshalTo)(nil)
  47. var sink interface{}
  48. func BenchmarkMarshalDelimitedWithMarshalTo(b *testing.B) {
  49. msgs := []proto.Message{
  50. aVote(b).ToProto(),
  51. }
  52. benchmarkMarshalDelimited(b, msgs)
  53. }
  54. func BenchmarkMarshalDelimitedNoMarshalTo(b *testing.B) {
  55. msgs := []proto.Message{
  56. &excludedMarshalTo{aVote(b).ToProto()},
  57. }
  58. benchmarkMarshalDelimited(b, msgs)
  59. }
  60. func benchmarkMarshalDelimited(b *testing.B, msgs []proto.Message) {
  61. b.ResetTimer()
  62. b.ReportAllocs()
  63. for i := 0; i < b.N; i++ {
  64. for _, msg := range msgs {
  65. blob, err := protoio.MarshalDelimited(msg)
  66. require.Nil(b, err)
  67. sink = blob
  68. }
  69. }
  70. if sink == nil {
  71. b.Fatal("Benchmark did not run")
  72. }
  73. // Reset the sink.
  74. sink = (interface{})(nil)
  75. }