You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

134 lines
3.9 KiB

internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
internal/libs/protoio: optimize MarshalDelimited by plain byteslice allocations+sync.Pool (#7325) Noticed in profiles that invoking *VoteSignBytes always created a bytes.Buffer, then discarded it inside protoio.MarshalDelimited. I dug further and examined the call paths and noticed that we unconditionally create the bytes.Buffer, even though we might have proto messages (in the common case) that implement MarshalTo([]byte), and invoked varintWriter. Instead by inlining this case, we skip a bunch of allocations and CPU cycles, which then reflects properly on all calling functions. Here are the benchmark results: ```shell $ benchstat before.txt after.txt name old time/op new time/op delta types.VoteSignBytes-8 705ns ± 3% 573ns ± 6% -18.74% (p=0.000 n=18+20) types.CommitVoteSignBytes-8 8.15µs ± 9% 6.81µs ± 4% -16.51% (p=0.000 n=20+19) protoio.MarshalDelimitedWithMarshalTo-8 788ns ± 8% 772ns ± 3% -2.01% (p=0.050 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 989ns ± 4% 845ns ± 2% -14.51% (p=0.000 n=20+18) name old alloc/op new alloc/op delta types.VoteSignBytes-8 792B ± 0% 600B ± 0% -24.24% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 9.52kB ± 0% 7.60kB ± 0% -20.17% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 808B ± 0% 440B ± 0% -45.54% (p=0.000 n=20+20) name old allocs/op new allocs/op delta types.VoteSignBytes-8 13.0 ± 0% 10.0 ± 0% -23.08% (p=0.000 n=20+20) types.CommitVoteSignBytes-8 140 ± 0% 110 ± 0% -21.43% (p=0.000 n=20+20) protoio.MarshalDelimitedNoMarshalTo-8 10.0 ± 0% 7.0 ± 0% -30.00% (p=0.000 n=20+20) ``` Thanks to Tharsis who tasked me to help them increase TPS and who are keen on improving Tendermint and efficiency.
3 years ago
  1. // Protocol Buffers for Go with Gadgets
  2. //
  3. // Copyright (c) 2013, The GoGo Authors. All rights reserved.
  4. // http://github.com/gogo/protobuf
  5. //
  6. // Redistribution and use in source and binary forms, with or without
  7. // modification, are permitted provided that the following conditions are
  8. // met:
  9. //
  10. // * Redistributions of source code must retain the above copyright
  11. // notice, this list of conditions and the following disclaimer.
  12. // * Redistributions in binary form must reproduce the above
  13. // copyright notice, this list of conditions and the following disclaimer
  14. // in the documentation and/or other materials provided with the
  15. // distribution.
  16. //
  17. // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
  18. // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
  19. // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
  20. // A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  21. // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  22. // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
  23. // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
  24. // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
  25. // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
  26. // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  27. // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  28. //
  29. // Modified from original GoGo Protobuf to return number of bytes written.
  30. package protoio
  31. import (
  32. "bytes"
  33. "encoding/binary"
  34. "io"
  35. "sync"
  36. "github.com/gogo/protobuf/proto"
  37. )
  38. // NewDelimitedWriter writes a varint-delimited Protobuf message to a writer. It is
  39. // equivalent to the gogoproto NewDelimitedWriter, except WriteMsg() also returns the
  40. // number of bytes written, which is necessary in the p2p package.
  41. func NewDelimitedWriter(w io.Writer) WriteCloser {
  42. return &varintWriter{w, make([]byte, binary.MaxVarintLen64), nil}
  43. }
  44. type varintWriter struct {
  45. w io.Writer
  46. lenBuf []byte
  47. buffer []byte
  48. }
  49. func (w *varintWriter) WriteMsg(msg proto.Message) (int, error) {
  50. if m, ok := msg.(marshaler); ok {
  51. n, ok := getSize(m)
  52. if ok {
  53. if n+binary.MaxVarintLen64 >= len(w.buffer) {
  54. w.buffer = make([]byte, n+binary.MaxVarintLen64)
  55. }
  56. lenOff := binary.PutUvarint(w.buffer, uint64(n))
  57. _, err := m.MarshalTo(w.buffer[lenOff:])
  58. if err != nil {
  59. return 0, err
  60. }
  61. _, err = w.w.Write(w.buffer[:lenOff+n])
  62. return lenOff + n, err
  63. }
  64. }
  65. // fallback
  66. data, err := proto.Marshal(msg)
  67. if err != nil {
  68. return 0, err
  69. }
  70. length := uint64(len(data))
  71. n := binary.PutUvarint(w.lenBuf, length)
  72. _, err = w.w.Write(w.lenBuf[:n])
  73. if err != nil {
  74. return 0, err
  75. }
  76. _, err = w.w.Write(data)
  77. return len(data) + n, err
  78. }
  79. func (w *varintWriter) Close() error {
  80. if closer, ok := w.w.(io.Closer); ok {
  81. return closer.Close()
  82. }
  83. return nil
  84. }
  85. func varintWrittenBytes(m marshaler, size int) ([]byte, error) {
  86. buf := make([]byte, size+binary.MaxVarintLen64)
  87. n := binary.PutUvarint(buf, uint64(size))
  88. nw, err := m.MarshalTo(buf[n:])
  89. if err != nil {
  90. return nil, err
  91. }
  92. return buf[:n+nw], nil
  93. }
  94. var bufPool = &sync.Pool{
  95. New: func() interface{} {
  96. return new(bytes.Buffer)
  97. },
  98. }
  99. func MarshalDelimited(msg proto.Message) ([]byte, error) {
  100. // The goal here is to write proto message as is knowning already if
  101. // the exact size can be retrieved and if so just use that.
  102. if m, ok := msg.(marshaler); ok {
  103. size, ok := getSize(msg)
  104. if ok {
  105. return varintWrittenBytes(m, size)
  106. }
  107. }
  108. // Otherwise, go down the route of using proto.Marshal,
  109. // and use the buffer pool to retrieve a writer.
  110. buf := bufPool.Get().(*bytes.Buffer)
  111. defer bufPool.Put(buf)
  112. buf.Reset()
  113. _, err := NewDelimitedWriter(buf).WriteMsg(msg)
  114. if err != nil {
  115. return nil, err
  116. }
  117. // Given that we are reusing buffers, we should
  118. // make a copy of the returned bytes.
  119. bytesCopy := make([]byte, buf.Len())
  120. copy(bytesCopy, buf.Bytes())
  121. return bytesCopy, nil
  122. }