note: to be rebased after #180

This PR aims to replace the current readiness logic, which only checks that third-party storage such as AWS DynamoDB and S3 is able to respond. That is fine for a low-traffic service, but not good enough for the intense traffic of the `bitswap-peer` service.

The proposed readiness logic will consider:

- active connections
- pending request blocks
- event loop utilization (ELU)
Active connections and pending request blocks are hard limits derived from production metrics, while ELU is an index of how busy the Node.js process is internally. When any of them passes its limit, we consider that single instance "busy" and stop sending it more requests until it works through the pending load. Note that ELU is more significant than memory and CPU usage; we may add those to the readiness logic in the future, if needed.
Open question: should we consider DynamoDB and S3 access for readiness?
What we really want to avoid is this situation, where the service is busy responding (for ~10 minutes around 15:00) but still receives requests.

More info: https://www.nearform.com/blog/event-loop-utilization-with-hpa/
Also note, the next step for bitswap scalability will be to configure the k8s load balancer to scale services based on custom parameters; these will be the same ones (active connections, pending blocks, ELU) exposed on the `/load` endpoint.