sql: disable gossip-based physical planning by default #135034
Conversation
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Force-pushed from e0e55c6 to 7c81ac8
Force-pushed from 7c81ac8 to 9a0aaa9
Now that we have improved handling of the draining procedure in virtual clusters in the 24.3 timeframe, I believe it's time to begin fully deprecating the gossip-based physical planning. As the first step, this commit introduces a cluster setting to control which method is used in single-tenant deployments (instance-based planning being the default). I plan to keep the setting as an escape hatch in case we find problems when rolling out this change and to remove the setting altogether after 25.2.

One of the differences between the two planning methods is slightly adjusted in this commit. In particular, in the instance-based planning we stitch together consecutive spans whenever `EnsureSafeSplitKey` for the keys in the spans returns the same prefix. The idea is that spans corresponding to different column families of the same SQL row must be placed on the same instance. However, the condition on the length of the keys returned by `EnsureSafeSplitKey` is too broad and captures more cases than needed (i.e. two keys that cannot be part of the same SQL row end up in the same `SpanPartition`). To partially alleviate this difference, this commit introduces a "boundary granularity" knob which indicates whether consecutive spans _might_ be part of the same SQL row, and we now use the stitching logic only if so. All callers of `PartitionSpans` have been audited, and we need the stitching logic only in two call sites that correspond to planning of table readers.

Longer term, if we change the encoding so that column family IDs are encoded in a special way, we'll be able to tell definitively whether stitching is needed simply based on the key itself, and this additional logic could be removed.

Epic: None

Release note: None
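As an editorial aside, here is a minimal, self-contained Go sketch of the stitching decision described above. `PartitionSpansBoundDefault` appears in the diff discussed below; the second enum value, the `shouldStitch` helper, the fake `ensureSafeSplitKey`, and the key strings are all illustrative assumptions rather than the actual CockroachDB code.

```go
package main

import (
	"bytes"
	"fmt"
)

// SpanBound mirrors the "boundary granularity" knob: callers declare whether
// consecutive spans might split a single SQL row at column-family boundaries.
type SpanBound int

const (
	// PartitionSpansBoundDefault: the caller guarantees spans never split a row.
	PartitionSpansBoundDefault SpanBound = iota
	// PartitionSpansBoundCFWithinRow (name assumed for illustration): spans may
	// split a row, so stitching must be considered.
	PartitionSpansBoundCFWithinRow
)

// ensureSafeSplitKey stands in for keys.EnsureSafeSplitKey: the real function
// strips the column-family suffix so all keys of one SQL row share a prefix.
// Here we fake it by trimming the last path element of a readable key string.
func ensureSafeSplitKey(key []byte) []byte {
	if i := bytes.LastIndexByte(key, '/'); i > 0 {
		return key[:i]
	}
	return key
}

// shouldStitch reports whether the span starting at key must go into the same
// partition as the previous span.
func shouldStitch(bound SpanBound, lastSafeKey, key []byte) bool {
	if bound != PartitionSpansBoundCFWithinRow {
		// Callers that use the default bound skip the stitching logic entirely.
		return false
	}
	safeKey := ensureSafeSplitKey(key)
	return len(safeKey) > 0 && bytes.Equal(safeKey, lastSafeKey)
}

func main() {
	last := ensureSafeSplitKey([]byte("/Table/104/1/42/0")) // CF 0 of row 42
	next := []byte("/Table/104/1/42/1")                     // CF 1 of the same row
	fmt.Println(shouldStitch(PartitionSpansBoundCFWithinRow, last, next)) // true
	fmt.Println(shouldStitch(PartitionSpansBoundDefault, last, next))     // false
}
```

Under these assumptions, the two table-reader call sites mentioned above would pass the row-aware bound, while the other audited callers keep the default bound and skip stitching.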
Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)
pkg/sql/fingerprint_span.go
line 121 at r1 (raw file):
}
spanPartitions, err := dsp.PartitionSpans(ctx, planCtx, []roachpb.Span{span}, PartitionSpansBoundDefault)
Note: we can reach here through the builtin function crdb_internal.fingerprint()
pkg/sql/distsql_physical_planner.go
line 1567 at r1 (raw file):
// address #112702, we'll be able to make the check here precise.
if safeKey, err := keys.EnsureSafeSplitKey(span.Key); err == nil && len(safeKey) > 0 {
	if safeKey.Equal(lastKey) {
I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?
pkg/sql/distsql_physical_planner.go
line 1569 at r1 (raw file):
if safeKey.Equal(lastKey) {
	if log.V(1) {
		log.Infof(ctx, "stitchin span %s into the previous span partition", span)
[nit] stitchin -> stitching
Force-pushed from 9a0aaa9 to 5507067
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)
pkg/sql/distsql_physical_planner.go
line 1567 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?
Not sure I understand your question and proposal, could you please provide more context?
The current code works in the following way:
- if the start keys of two consecutive spans have the same "safe split key", then we think the two spans might be from the same SQL row, so we must put them into the same spanPartition;
- the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows. In other words, the code assumes that something like (roachpb.Span[row1_start, row2_middle], roachpb.Span[row2_middle, row2_end]) will never be passed as an argument to PartitionSpans. This is the case since we only operate on column-family-level spans in the span.Builder, where we break a single-row span into multiple spans when some CFs are not needed. (A sketch of this invariant is included after this comment.)
IIUC you're concerned about the case I described above, right?
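As an editorial aside, here is a minimal sketch of the invariant described in the comment above, with invented key formatting: the perCFSpans helper, the key strings, and the "/next" end-key marker are assumptions, not real span.Builder output.

```go
package main

import (
	"fmt"
	"strings"
)

// perCFSpans mimics breaking a single-row span into one span per needed column
// family, which is the only place a row ever gets split across multiple spans.
// The "/next" suffix is a stand-in for the exclusive end key of each CF span.
func perCFSpans(rowPrefix string, neededCFs []int) [][2]string {
	spans := make([][2]string, 0, len(neededCFs))
	for _, cf := range neededCFs {
		start := fmt.Sprintf("%s/%d", rowPrefix, cf)
		spans = append(spans, [2]string{start, start + "/next"})
	}
	return spans
}

func main() {
	// Only column families 0 and 2 of row /Table/104/1/42 are needed.
	spans := perCFSpans("/Table/104/1/42", []int{0, 2})
	// Every start key shares the row prefix, so comparing the "safe split keys"
	// of consecutive start keys is enough; a span shaped like
	// [row1_start, row2_middle) never reaches PartitionSpans.
	for _, sp := range spans {
		rowPrefix := sp[0][:strings.LastIndex(sp[0], "/")]
		fmt.Printf("span [%s, %s) belongs to row %s\n", sp[0], sp[1], rowPrefix)
	}
}
```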
pkg/sql/distsql_physical_planner.go
line 1569 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
[nit] stitchin -> stitching
Done.
pkg/sql/fingerprint_span.go
line 121 at r1 (raw file):
Previously, DrewKimball (Drew Kimball) wrote…
Note: we can reach here through the builtin function crdb_internal.fingerprint()
Good point. This builtin was added in #91124, which suggests that its main usage is internal testing of cluster-to-cluster replication. I looked over all the call sites, and that still remains the case in the code base: we use this builtin for ensuring correctness of tenant replication, so well above the column-family boundaries within a single SQL row. It's possible that a user will issue a query with a span argument that falls within a row, but it seems ok to me to ignore this case given that this is a crdb_internal.* builtin that is not documented.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @yuzefovich)
pkg/sql/distsql_physical_planner.go
line 1567 at r1 (raw file):
the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows.
This is what I was missing - in that case, ignore my comment.
pkg/sql/fingerprint_span.go
line 121 at r1 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
Good point. This builtin was added in #91124, which suggests that its main usage is internal testing of cluster-to-cluster replication. I looked over all the call sites, and that still remains the case in the code base: we use this builtin for ensuring correctness of tenant replication, so well above the column-family boundaries within a single SQL row. It's possible that a user will issue a query with a span argument that falls within a row, but it seems ok to me to ignore this case given that this is a crdb_internal.* builtin that is not documented.
SGTM
TFTR! bors r+
Build failed (retrying...)