
sql: disable gossip-based physical planning by default #135034

Merged — 1 commit merged into cockroachdb:master on Nov 25, 2024

Conversation

@yuzefovich (Member) commented Nov 12, 2024

Now that we have improved the handling of the draining procedure in virtual clusters in the 24.3 timeframe, I believe it's time to begin fully deprecating gossip-based physical planning. As a first step, this commit introduces a cluster setting to control which method is used in single-tenant deployments (instance-based planning being the default). I plan to keep the setting as an escape hatch in case we find problems when rolling out this change, and to remove it altogether after 25.2.

One of the differences between the two planning methods is slightly adjusted in this commit. In particular, in instance-based planning we stitch together consecutive spans whenever `EnsureSafeSplitKey` for the keys in the spans returns the same prefix. The idea is that spans corresponding to different column families of the same SQL row must be placed on the same instance. However, the condition on the length of the keys returned by `EnsureSafeSplitKey` is too broad and captures more cases than needed (i.e. two keys that cannot be part of the same SQL row end up in the same `SpanPartition`). To partially alleviate this difference, this commit introduces a "boundary granularity" knob which indicates whether consecutive spans _might_ be part of the same SQL row, and we now use the stitching logic only if so. All callers of `PartitionSpans` have been audited, and we need the stitching logic in only two call sites that correspond to the planning of table readers. Longer term, if we change the encoding so that column family IDs are encoded in a special way, we'll be able to tell definitively whether stitching is needed based on the key itself, and this additional logic could be removed.
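To make the stitching condition concrete, here is a minimal, self-contained sketch. It is not the actual CockroachDB code: keys are simplified to readable strings, `safeSplitKey` is a toy stand-in for `keys.EnsureSafeSplitKey`, and the instance assignment that `PartitionSpans` actually performs is elided.

```go
package main

import (
	"fmt"
	"strings"
)

// safeSplitKey is a toy stand-in for keys.EnsureSafeSplitKey: it strips the
// column-family suffix so that all keys of one SQL row map to the same
// prefix. Real keys are binary-encoded roachpb.Keys; here we just cut at
// the last '/'.
func safeSplitKey(key string) string {
	if i := strings.LastIndex(key, "/"); i >= 0 {
		return key[:i]
	}
	return key
}

// partition stitches a span into the previous group when consecutive start
// keys share the same safe split key, i.e. when the spans might belong to
// the same SQL row. The stitch flag models the "boundary granularity" knob:
// callers that never pass within-row spans skip the stitching logic.
func partition(startKeys []string, stitch bool) [][]string {
	var groups [][]string
	var lastKey string
	for _, k := range startKeys {
		safe := safeSplitKey(k)
		if stitch && len(groups) > 0 && safe == lastKey {
			// Same SQL row as the previous span: same partition.
			groups[len(groups)-1] = append(groups[len(groups)-1], k)
		} else {
			groups = append(groups, []string{k})
		}
		lastKey = safe
	}
	return groups
}

func main() {
	keys := []string{
		"/t/1/row1/cf0", // two column families of row1 ...
		"/t/1/row1/cf2", // ... must end up in the same partition
		"/t/1/row2/cf0",
	}
	fmt.Println(partition(keys, true))  // [[/t/1/row1/cf0 /t/1/row1/cf2] [/t/1/row2/cf0]]
	fmt.Println(partition(keys, false)) // [[/t/1/row1/cf0] [/t/1/row1/cf2] [/t/1/row2/cf0]]
}
```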

Epic: None

Release note: None

blathers-crl bot commented Nov 12, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member) commented: This change is Reviewable
@yuzefovich marked this pull request as ready for review November 23, 2024 02:58
@yuzefovich requested review from a team as code owners November 23, 2024 02:58
@yuzefovich requested review from jeffswenson and wenyihu6 and removed request for a team, jeffswenson and wenyihu6 November 23, 2024 02:58
@DrewKimball (Collaborator) left a comment

:lgtm: Nice!

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

	}

	spanPartitions, err := dsp.PartitionSpans(ctx, planCtx, []roachpb.Span{span}, PartitionSpansBoundDefault)

Note: we can reach here through the builtin function crdb_internal.fingerprint()


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

			// address #112702, we'll be able to make the check here precise.
			if safeKey, err := keys.EnsureSafeSplitKey(span.Key); err == nil && len(safeKey) > 0 {
				if safeKey.Equal(lastKey) {

I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?


pkg/sql/distsql_physical_planner.go line 1569 at r1 (raw file):

				if safeKey.Equal(lastKey) {
					if log.V(1) {
						log.Infof(ctx, "stitchin span %s into the previous span partition", span)

[nit] stitchin -> stitching

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?

Not sure I understand your question and proposal, could you please provide more context?

The current code works in the following way:

  • if the start keys of two consecutive spans have the same "safe split key", then we think the two spans might be from the same SQL row, so we must put them into the same spanPartition.
  • the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows. In other words, the code assumes that something like (roachpb.Span[row1_start, row2_middle], roachpb.Span[row2_middle, row2_end]) will never be passed as an argument to PartitionSpans. This is the case since we only operate on column-family-level spans in the span.Builder, which breaks a single-row span into multiple spans when some CFs are not needed (see the illustration below).
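To illustrate with hypothetical pretty-printed keys (the table, row, and column-family values below are made up for this example):

```
span 1: [/Table/104/1/k1/cf0, /Table/104/1/k1/cf1)  safe split key = /Table/104/1/k1
span 2: [/Table/104/1/k1/cf2, /Table/104/1/k1/cf3)  safe split key = /Table/104/1/k1  -> same row, stitched
span 3: [/Table/104/1/k2/cf0, /Table/104/1/k2/cf1)  safe split key = /Table/104/1/k2  -> different row, new partition
```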

IIUC you're concerned about the case I described above, right?


pkg/sql/distsql_physical_planner.go line 1569 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] stitchin -> stitching

Done.


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Note: we can reach here through the builtin function crdb_internal.fingerprint()

Good point. This builtin was added in #91124, which suggests that its main usage is internal testing of cluster-to-cluster replication. I looked over all the call sites, and that remains the case in the code base: we use this builtin to ensure correctness of tenant replication, so well above the column-family boundary within a single SQL row. It's possible that a user will issue a query with a span argument that falls within a row, but it seems ok to me to ignore this case given that this is a crdb_internal.* builtin that is not documented.

@DrewKimball (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @yuzefovich)


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows.

This is what I was missing - in that case, ignore my comment.


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Good point. This builtin was added in #91124 and suggests that the main usage for it is for internal testing of cluster-to-cluster replication. I looked over all the call sites, and it still remains the case in the code base, and we use this builtin for ensuring correctness of tenant replication, so way above the column families boundary within a single SQL row. It's possible that a user will issue a query with such a span argument that falls within a row, but it seems ok to me to ignore this case given that this is crdb_internal.* builtin that is not documented.

SGTM

@yuzefovich (Member, Author) commented:

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Nov 25, 2024
135034: sql: disable gossip-based physical planning by default r=yuzefovich a=yuzefovich


135852: sql: avoid unnecessary singleflight requests for role membership cache r=rafiss a=rafiss

See individual commits for refactors that make the enhancement easier.

---

Since we use a query that scans the whole role_members table to
populate the membership cache, it is wasteful to launch multiple
singleflight requests for different users. This patch changes
the cache refresh logic so that the singleflight always populates
the entire cache, which means there will only ever be one in-flight
query per node (a sketch of the pattern follows the benchmark results below).

```
$ benchdiff --old 98b0521 --new 26daef2 . -r BenchmarkUseManyRoles

pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/bench/rttanalysis \

name                          old time/op     new time/op     delta
UseManyRoles/use_50_roles-10      225ms ± 6%      114ms ± 2%  -49.43%  (p=0.000 n=10+10)
UseManyRoles/use_2_roles-10      10.7ms ± 4%      7.4ms ± 3%  -30.74%  (p=0.000 n=10+8)

name                          old roundtrips  new roundtrips  delta
UseManyRoles/use_2_roles-10        9.00 ± 0%       5.00 ± 0%  -44.44%  (p=0.000 n=10+10)
UseManyRoles/use_50_roles-10       3.00 ± 0%       3.00 ± 0%     ~     (all equal)

name                          old alloc/op    new alloc/op    delta
UseManyRoles/use_50_roles-10      438MB ± 0%      377MB ± 1%  -13.88%  (p=0.000 n=9+10)
UseManyRoles/use_2_roles-10      20.7MB ± 2%     19.1MB ± 1%   -7.46%  (p=0.000 n=10+10)

name                          old allocs/op   new allocs/op   delta
UseManyRoles/use_50_roles-10       881k ± 0%       398k ± 4%  -54.81%  (p=0.000 n=8+10)
UseManyRoles/use_2_roles-10       54.4k ± 8%      36.0k ± 5%  -33.85%  (p=0.000 n=10+10)
```

fixes #135931
Release note (performance improvement): Improved the internal caching
logic for role membership information. This reduces the latency impact
of commands such as `DROP ROLE`, `CREATE ROLE`, and `GRANT role TO user`,
which cause the role membership cache to be invalidated.
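For readers unfamiliar with the technique, here is a minimal sketch of the pattern described above. It is illustrative only: `Cache`, `fetchAllRoleMemberships`, and the surrounding types are hypothetical, not CockroachDB's actual code; only `golang.org/x/sync/singleflight` is a real library.

```go
package rolecache

import (
	"context"
	"sync"

	"golang.org/x/sync/singleflight"
)

// Cache coalesces concurrent refreshes of the role membership map. All
// cache misses share one singleflight key, so at most one "scan the
// role_members table" query is in flight per node at a time.
type Cache struct {
	group singleflight.Group

	mu      sync.RWMutex
	members map[string][]string // user -> roles; populated wholesale
}

// fetchAllRoleMemberships stands in for the query that scans the entire
// role_members table and folds it into a map (hypothetical helper).
func fetchAllRoleMemberships(ctx context.Context) (map[string][]string, error) {
	// ... run the scan and build the user -> roles map ...
	return map[string][]string{}, nil
}

// GetRoles returns the cached roles for user, refreshing the whole cache
// on a miss. The fixed singleflight key ("all") is the crux: concurrent
// misses for different users coalesce into one query.
func (c *Cache) GetRoles(ctx context.Context, user string) ([]string, error) {
	c.mu.RLock()
	roles, ok := c.members[user]
	c.mu.RUnlock()
	if ok {
		return roles, nil
	}
	v, err, _ := c.group.Do("all", func() (interface{}, error) {
		m, err := fetchAllRoleMemberships(ctx)
		if err != nil {
			return nil, err
		}
		c.mu.Lock()
		c.members = m
		c.mu.Unlock()
		return m, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(map[string][]string)[user], nil
}
```

The fixed key is the design point: every concurrent cache miss, no matter which user triggered it, joins the same in-flight refresh rather than launching its own query.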

136115: roachtest: rm vmod logging in rebalance/by-load/* tests r=arulajmani a=kvoli

Historically, the `rebalance/by-load/*` roachtests were difficult to track failures down in. To aid investigation, we bumped the vmodule verbosity to the maximum level in the relevant test files.

The default logging and metrics have since improved. We have also noticed that, while vmodule is enabled, the logging itself can skew resource usage in a way that is not observed by the replica.

Disable the verbose logging; we will rely on the defaults for future failures.

Note that not all failures linked below were due to vmodule logging; however, this change will remove some degree of noise.

Part of: #133054
Part of: #132633
Part of: #135055
Part of: #135869
Part of: #135811
Part of: #133004
Part of: #135791
Part of: #132019

Release note: None

136116: roachtest: rm redundant cpu profiling in rebalance/by-load tests r=arulajmani a=kvoli

After #97699 was resolved, the cpu profiling parameter declaration in the `rebalance/by-load/*` roachtests became redundant.

Remove the declaration and use the default settings instead.

Informs: #133054
Informs: #132633
Informs: #135055
Informs: #135869
Informs: #135811
Informs: #133004
Informs: #135791
Informs: #132019

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>

craig bot commented Nov 25, 2024

Build failed (retrying...):

craig bot merged commit cc31172 into cockroachdb:master Nov 25, 2024
22 of 23 checks passed
@yuzefovich deleted the gossip-planning branch November 25, 2024 21:31