package main

import (
	"context"
	"crypto/tls"
	"crypto/x509"
	"flag"
	"fmt"
	"net"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"

	grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"

	"github.com/tendermint/tendermint/libs/log"
	tmnet "github.com/tendermint/tendermint/libs/net"
	"github.com/tendermint/tendermint/privval"
	grpcprivval "github.com/tendermint/tendermint/privval/grpc"
	privvalproto "github.com/tendermint/tendermint/proto/tendermint/privval"
)

var (
	// Create a metrics registry.
	reg = prometheus.NewRegistry()

	// Create some standard server metrics.
	grpcMetrics = grpc_prometheus.NewServerMetrics()
)

func main() {
	var (
		addr             = flag.String("addr", "127.0.0.1:26659", "Address to listen on (host:port)")
		chainID          = flag.String("chain-id", "mychain", "chain id")
		privValKeyPath   = flag.String("priv-key", "", "priv val key file path")
		privValStatePath = flag.String("priv-state", "", "priv val state file path")
		insecure         = flag.Bool("insecure", false, "allow server to run insecurely (no TLS)")
		certFile         = flag.String("certfile", "", "absolute path to server certificate")
		keyFile          = flag.String("keyfile", "", "absolute path to server key")
		rootCA           = flag.String("rootcafile", "", "absolute path to root CA")
		prometheusAddr   = flag.String("prometheus-addr", "", "address for prometheus endpoint (host:port)")
	)
	flag.Parse()

	logger, err := log.NewDefaultLogger(log.LogFormatPlain, log.LogLevelInfo)
	if err != nil {
		fmt.Fprintf(os.Stderr, "failed to construct logger: %v", err)
		os.Exit(1)
	}
	logger = logger.With("module", "priv_val")

	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	logger.Info(
		"Starting private validator",
		"addr", *addr,
		"chainID", *chainID,
		"privKeyPath", *privValKeyPath,
		"privStatePath", *privValStatePath,
		"insecure", *insecure,
		"certFile", *certFile,
		"keyFile", *keyFile,
		"rootCA", *rootCA,
	)

	pv, err := privval.LoadFilePV(*privValKeyPath, *privValStatePath)
	if err != nil {
		fmt.Fprint(os.Stderr, err)
		os.Exit(1)
	}

	opts := []grpc.ServerOption{}
	if !*insecure {
		certificate, err := tls.LoadX509KeyPair(*certFile, *keyFile)
		if err != nil {
			fmt.Fprintf(os.Stderr, "failed to load X509 key pair: %v", err)
			os.Exit(1)
		}

		certPool := x509.NewCertPool()
		bs, err := os.ReadFile(*rootCA)
		if err != nil {
			fmt.Fprintf(os.Stderr, "failed to read client ca cert: %s", err)
			os.Exit(1)
		}

		if ok := certPool.AppendCertsFromPEM(bs); !ok {
			fmt.Fprintf(os.Stderr, "failed to append client certs")
			os.Exit(1)
		}

		tlsConfig := &tls.Config{
			ClientAuth:   tls.RequireAndVerifyClientCert,
			Certificates: []tls.Certificate{certificate},
			ClientCAs:    certPool,
			MinVersion:   tls.VersionTLS13,
		}

		creds := grpc.Creds(credentials.NewTLS(tlsConfig))
		opts = append(opts, creds)
		logger.Info("SignerServer: Creating security credentials")
	} else {
		logger.Info("SignerServer: You are using an insecure gRPC connection!")
	}

	// Add prometheus metrics for unary RPC calls. The interceptor comes from
	// grpcMetrics so that calls are recorded on the collector that
	// registerPrometheus registers and serves.
	opts = append(opts, grpc.UnaryInterceptor(grpcMetrics.UnaryServerInterceptor()))

	ss := grpcprivval.NewSignerServer(*chainID, pv, logger)

	protocol, address := tmnet.ProtocolAndAddress(*addr)

	lis, err := net.Listen(protocol, address)
	if err != nil {
		fmt.Fprintf(os.Stderr, "SignerServer: Failed to listen %v", err)
		os.Exit(1)
	}

	s := grpc.NewServer(opts...)

	privvalproto.RegisterPrivValidatorAPIServer(s, ss)

	var httpSrv *http.Server
	if *prometheusAddr != "" {
		httpSrv = registerPrometheus(*prometheusAddr, s)
	}

	// Install the shutdown handler before calling Serve: Serve blocks until
	// the server stops, so a handler set up after it would never run.
	opctx, opcancel := signal.NotifyContext(ctx, os.Interrupt, syscall.SIGTERM)
	defer opcancel()
	go func() {
		<-opctx.Done()
		if httpSrv != nil {
			ctx, cancel := context.WithTimeout(context.Background(), 1*time.Second)
			defer cancel()
			if err := httpSrv.Shutdown(ctx); err != nil {
				fmt.Fprintf(os.Stderr, "Unable to stop http server: %v", err)
				os.Exit(1)
			}
		}
		s.GracefulStop()
	}()

	logger.Info("SignerServer: Starting grpc server")
	// Serve returns nil once GracefulStop has been called, which ends main.
	if err := s.Serve(lis); err != nil {
		fmt.Fprintf(os.Stderr, "Unable to serve on %s: %v", *addr, err)
		os.Exit(1)
	}
}

func registerPrometheus(addr string, s *grpc.Server) *http.Server {
	// Register the gRPC collector with the registry served below.
	reg.MustRegister(grpcMetrics)
	// Initialize all metrics.
	grpcMetrics.InitializeMetrics(s)
	// create http server to serve prometheus
	httpServer := &http.Server{Handler: promhttp.HandlerFor(reg, promhttp.HandlerOpts{}), Addr: addr}

	go func() {
		// http.ErrServerClosed is the expected result of Shutdown, not a failure.
		if err := httpServer.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			fmt.Fprintf(os.Stderr, "Unable to start a http server: %v", err)
			os.Exit(1)
		}
	}()

	return httpServer
}