We have had an issue with `java-driver` reaching the end of the execution plan and throwing `NoNodeAvailableException`.
The problem is that when a user gets this error, it carries no information beyond the fact that the end of the execution plan has been reached.
Most PROD environments have log rate reduction techniques in place, such as log sampling, filtering, deduplication, and suppression.
As a result, it is often impossible to figure out exactly why this exception was thrown just by looking at the error message and/or the logs.
This causes extra load on the customer's engineering team as well as on our support and engineering teams.
To mitigate this issue in all the drivers, we propose enriching the error/exception with the following information (pick only what is relevant for the given error):
- List of the nodes in the cluster (including their status, dc, rack)
- List of connections to the replicas (including host, rack, dc, shard)
- List of prior errors (if the query was attempted on one host and then switched to another because of an error, show all of these errors when the end of the execution plan is reached)
- History of topology changes (nodes going UP/DOWN, with timestamps)
- Replica set information source (tablet/vnode/other)
- Node/connection overload status (the status itself, if present, or the number of queries in flight)
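To make the proposal more concrete, here is a minimal, language-neutral sketch (written in Python for brevity) of an error that carries some of the fields listed above. Every name in it (`NodeInfo`, `AttemptError`, `QueryDiagnostics`, `EnrichedNoNodeAvailableError`) is hypothetical and does not correspond to any existing driver API; only a subset of the proposed fields is shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NodeInfo:
    """Hypothetical snapshot of one node at the time of the error."""
    host: str
    dc: str
    rack: str
    status: str  # "UP" or "DOWN"


@dataclass
class AttemptError:
    """Hypothetical record of one failed attempt within the execution plan."""
    host: str
    error: str


@dataclass
class QueryDiagnostics:
    """Hypothetical container for the context attached to an error."""
    nodes: List[NodeInfo] = field(default_factory=list)
    prior_errors: List[AttemptError] = field(default_factory=list)
    replica_source: Optional[str] = None  # e.g. "tablet" or "vnode"

    def render(self) -> str:
        """Format only the fields that are actually populated."""
        lines = []
        if self.replica_source:
            lines.append(f"replica set source: {self.replica_source}")
        for n in self.nodes:
            lines.append(
                f"node {n.host} dc={n.dc} rack={n.rack} status={n.status}"
            )
        for i, e in enumerate(self.prior_errors, 1):
            lines.append(f"attempt {i} on {e.host}: {e.error}")
        return "\n".join(lines)


class EnrichedNoNodeAvailableError(Exception):
    """Hypothetical error whose message embeds the diagnostics."""

    def __init__(self, message: str, diagnostics: QueryDiagnostics):
        self.diagnostics = diagnostics
        details = diagnostics.render()
        super().__init__(f"{message}\n{details}" if details else message)
```

Embedding the details in the message itself, rather than only in a separate log line, means they survive log sampling and deduplication as long as the exception is reported at all.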
We can include this information in any query error, or only in specific errors, such as timeouts, empty execution plan errors, end of execution plan errors, or no connections available errors.
While doing that, we should keep in mind that clusters can potentially have many nodes (>60), so node status information should be reduced using the following logic:
- Add status for the nodes that are relevant to the query (based on replica set, dc, rack)
- For the rest of the cluster, group status by dc/rack/node status (UP/DOWN)
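The reduction described above could look roughly like the sketch below. It assumes the driver has already decided which hosts are relevant to the query and passes them in as a set; the function name `summarize_nodes` and the tuple layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

# Hypothetical node representation: (host, dc, rack, status)
Node = Tuple[str, str, str, str]


def summarize_nodes(nodes: Iterable[Node], relevant_hosts: Set[str]) -> List[str]:
    """Return per-node lines for relevant hosts and grouped counts for the rest."""
    detailed = []
    grouped = Counter()
    for host, dc, rack, status in nodes:
        if host in relevant_hosts:
            # Nodes relevant to the query are reported individually.
            detailed.append(f"{host} dc={dc} rack={rack} status={status}")
        else:
            # Everything else is collapsed into a dc/rack/status count.
            grouped[(dc, rack, status)] += 1
    summary = [
        f"dc={dc} rack={rack} status={status}: {count} node(s)"
        for (dc, rack, status), count in sorted(grouped.items())
    ]
    return detailed + summary
```

With this shape, the size of the output is bounded by the number of relevant nodes plus the number of distinct dc/rack/status combinations, not by the total cluster size.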
To avoid excessive load, we might want to add rate limiting, say, including this information only once a minute.
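One possible way to do this is a small time-based gate that the error path consults before collecting diagnostics. This is only a sketch: the class name `DiagnosticsThrottle`, the `should_attach` method, and the injectable `clock` parameter are all assumptions made for illustration.

```python
import threading
import time
from typing import Callable, Optional


class DiagnosticsThrottle:
    """Hypothetical gate that allows full diagnostics at most once per interval."""

    def __init__(
        self,
        interval_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._interval = interval_seconds
        self._clock = clock  # injectable so the gate can be tested without sleeping
        self._last_emit: Optional[float] = None
        self._lock = threading.Lock()

    def should_attach(self) -> bool:
        """Return True if enough time has passed since the last full report."""
        with self._lock:
            now = self._clock()
            if self._last_emit is None or now - self._last_emit >= self._interval:
                self._last_emit = now
                return True
            return False
```

Errors raised while the gate is closed would still be thrown, just without the expensive cluster snapshot attached.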