-
Notifications
You must be signed in to change notification settings - Fork 694
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
node: add amazon kms and benchmark signers #4168
base: main
Are you sure you want to change the base?
node: add amazon kms and benchmark signers #4168
Conversation
} | ||
} | ||
|
||
func (b *BenchmarkSigner) Sign(ctx context.Context, hash []byte) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will errors skew the data? Should you add a metric for errors and bucket the latency separately or exclude them from the latency numbers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! I've added error counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following on from Bruce's comment, you're now tracking error counts but the latency figures aren't grouped by success/fail cases. Should we only observe the latency for a successful operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been involved in reviewing this before this PR was open so I want to signal I think this is in a good shape to be seriously considered for merge by other more seasoned maintainers.
That said, @andreclaro and xLabs Nodes team will be soaking this in testnet and report back 🙏🏻
} | ||
} | ||
|
||
func (b *BenchmarkSigner) Sign(ctx context.Context, hash []byte) ([]byte, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Following on from Bruce's comment, you're now tracking error counts but the latency figures aren't grouped by success/fail cases. Should we only observe the latency for a successful operation?
// an error. This is why the region is first extracted from the keyPath. | ||
cfg, err := config.LoadDefaultConfig(timeoutCtx, config.WithDefaultRegion(amazonKmsSigner.region)) | ||
if err != nil { | ||
return nil, errors.New("Failed to load default config") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this error be more descriptive re being related to KMS? i.e. should every error string in this signer include the string "KMS"?
|
||
ecdsaPubkey := ecdsa.PublicKey{ | ||
X: new(big.Int).SetBytes(asn1Pubkey.PublicKey.Bytes[1 : 1+32]), | ||
Y: new(big.Int).SetBytes(asn1Pubkey.PublicKey.Bytes[1+32:]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment to explain the byte array indexes?
} | ||
|
||
func (a *AmazonKms) Verify(ctx context.Context, sig []byte, hash []byte) (bool, error) { | ||
timeoutCtx, cancel := context.WithTimeout(ctx, time.Second*15) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use KMS_TIMEOUT
constant here
} | ||
|
||
// adjustBufferSize takes an input buffer and | ||
// a) trims it down to 32 bytes, if the input length is greater than 32, or |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I read "trim it down to 32 bytes" I would guess the least significant bytes are trimmed, but this trims from the front of the array. Maybe worth making this clearer in the comment?
@@ -236,13 +236,15 @@ func getPublicRPCServiceClient(ctx context.Context, addr string) (*grpc.ClientCo | |||
} | |||
|
|||
func runSignWormchainValidatorAddress(cmd *cobra.Command, args []string) error { | |||
ctx := context.Background() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here and elsewhere in some other source files (e.g. message.go
) you create new contexts without managing their lifecycle...what's the reasoning for this? Are we concerned about resource leaks?
Amazon KMS Guardian Signer
This PR adds an Amazon AWS KMS Guardian signer, allowing observations to be signed using KMS! The new signer can be used by specifying the ARN of the KMS key to use, through the
--guardianSignerUri
commandline argument, as follows:--guardianSignerUri=amazonkms://<ARN>
NOTE For the best possible performance, it is recommended that the Guardian be run from an EC2 instance that is in the same region as the KMS key.
The KMS key's spec should be
ECC_SECG_P256K1
, and should be enabled for signing. In order for the Guardian to authenticate against the KMS service, one of two options are available:~/.aws/credentials
file. (example here).Benchmark Signer
The PR also includes a benchmark signer, which wraps any other signer, logging signing and verification latency to prometheus histograms. External signing services might at times introduce unwanted latency, and if an event occurs where observation processing is particularly slow, the histograms would provide insight into whether or not the signing service is to blame.
NOTE
This is a redo of a previous pull request, which Pires spent time looking at. Below are key points following that review that informs the current state of the code:
Context
s should be supplied to signing, verification and public key retrieval, as these functions potentially interact with 3rd party services that could timeout or block indefinitely.GuardianSigner
constructor (NewGuardianSignerFromUri
) accepts aContext
, as the new signer might need to interact with the 3rd party service to validate configurations (such as theAmazonKms
signer). A different approach could be to have the constructor define its own context, to avoid the necessity of passing a context to it.