
sql: disable gossip-based physical planning by default #135034

Merged — 1 commit merged into cockroachdb:master on Nov 25, 2024

Conversation

@yuzefovich (Member) commented Nov 12, 2024

Now that we have improved the handling of the draining procedure in virtual clusters in the 24.3 timeframe, I believe it's time to begin fully deprecating gossip-based physical planning. As a first step, this commit introduces a cluster setting to control which method is used in single-tenant deployments (instance-based planning being the default). I plan to keep the setting as an escape hatch in case we find problems when rolling out this change, and to remove it altogether after 25.2.

One of the differences between the two planning methods is slightly adjusted in this commit. In particular, in instance-based planning we stitch together consecutive spans whenever `EnsureSafeSplitKey` for the keys in the spans returns the same prefix. The idea is that spans corresponding to different column families of the same SQL row must be placed on the same instance. However, the condition on the length of the keys returned by `EnsureSafeSplitKey` is too broad and captures more cases than needed (i.e. two keys that cannot be part of the same SQL row end up in the same `SpanPartition`). To partially alleviate this difference, this commit introduces a "boundary granularity" knob which indicates whether consecutive spans _might_ be part of the same SQL row, and we now use the stitching logic only if so. All callers of `PartitionSpans` have been audited, and we need the stitching logic in only two call sites that correspond to the planning of table readers. Longer term, if we change the encoding so that column family IDs are encoded in a special way, we'll be able to tell definitively whether stitching is needed based on the key itself, and this additional logic could be removed.
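To make the stitching condition concrete, here is a minimal, self-contained sketch. It is not the actual CockroachDB code: keys are simplified to readable strings, `safeSplitKey` is a toy stand-in for `keys.EnsureSafeSplitKey`, and the instance assignment that `PartitionSpans` actually performs is elided.

```go
package main

import (
	"fmt"
	"strings"
)

// safeSplitKey is a toy stand-in for keys.EnsureSafeSplitKey: it strips the
// column-family suffix so that all keys of one SQL row map to the same
// prefix. Real keys are binary-encoded roachpb.Keys; here we just cut at
// the last '/'.
func safeSplitKey(key string) string {
	if i := strings.LastIndex(key, "/"); i >= 0 {
		return key[:i]
	}
	return key
}

// partition stitches a span into the previous group when consecutive start
// keys share the same safe split key, i.e. when the spans might belong to
// the same SQL row. The stitch flag models the "boundary granularity" knob:
// callers that never pass within-row spans skip the stitching logic.
func partition(startKeys []string, stitch bool) [][]string {
	var groups [][]string
	var lastKey string
	for _, k := range startKeys {
		safe := safeSplitKey(k)
		if stitch && len(groups) > 0 && safe == lastKey {
			// Same SQL row as the previous span: same partition.
			groups[len(groups)-1] = append(groups[len(groups)-1], k)
		} else {
			groups = append(groups, []string{k})
		}
		lastKey = safe
	}
	return groups
}

func main() {
	keys := []string{
		"/t/1/row1/cf0", // two column families of row1 ...
		"/t/1/row1/cf2", // ... must end up in the same partition
		"/t/1/row2/cf0",
	}
	fmt.Println(partition(keys, true))  // [[/t/1/row1/cf0 /t/1/row1/cf2] [/t/1/row2/cf0]]
	fmt.Println(partition(keys, false)) // [[/t/1/row1/cf0] [/t/1/row1/cf2] [/t/1/row2/cf0]]
}
```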

Epic: None

Release note: None

blathers-crl bot commented Nov 12, 2024

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@cockroach-teamcity (Member) commented: This change is Reviewable
@yuzefovich marked this pull request as ready for review November 23, 2024 02:58
@yuzefovich requested review from a team as code owners November 23, 2024 02:58
@yuzefovich requested review from jeffswenson and wenyihu6 and removed request for a team, jeffswenson and wenyihu6 November 23, 2024 02:58
@DrewKimball (Collaborator) left a comment

:lgtm: Nice!

Reviewed 9 of 9 files at r1, all commit messages.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @michae2 and @yuzefovich)


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

	}

	spanPartitions, err := dsp.PartitionSpans(ctx, planCtx, []roachpb.Span{span}, PartitionSpansBoundDefault)

Note: we can reach here through the builtin function crdb_internal.fingerprint()


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

			// address #112702, we'll be able to make the check here precise.
			if safeKey, err := keys.EnsureSafeSplitKey(span.Key); err == nil && len(safeKey) > 0 {
				if safeKey.Equal(lastKey) {

I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?


pkg/sql/distsql_physical_planner.go line 1569 at r1 (raw file):

				if safeKey.Equal(lastKey) {
					if log.V(1) {
						log.Infof(ctx, "stitchin span %s into the previous span partition", span)

[nit] stitchin -> stitching

@yuzefovich (Member, Author) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @DrewKimball and @michae2)


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

I know this is how it was already, but why don't we compare the key against the last end key (if it's set) instead? Won't this only catch the case when the preceding span covers a single row?

Not sure I understand your question and proposal, could you please provide more context?

The current code works in the following way:

  • if the start keys of two consecutive spans have the same "safe split key", then we think the two spans might be from the same SQL row, so we must put them into the same spanPartition.
  • the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows. In other words, the code assumes that something like (roachpb.Span[row1_start, row2_middle], roachpb.Span[row2_middle, row2_end]) will never be passed as an argument to PartitionSpans. This is the case since we only operate on column-family-level spans in the span.Builder, which breaks a single-row span into multiple spans when some CFs are not needed (see the illustration below).
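To illustrate with hypothetical pretty-printed keys (the table, row, and column-family values below are made up for this example):

```
span 1: [/Table/104/1/k1/cf0, /Table/104/1/k1/cf1)  safe split key = /Table/104/1/k1
span 2: [/Table/104/1/k1/cf2, /Table/104/1/k1/cf3)  safe split key = /Table/104/1/k1  -> same row, stitched
span 3: [/Table/104/1/k2/cf0, /Table/104/1/k2/cf1)  safe split key = /Table/104/1/k2  -> different row, new partition
```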

IIUC you're concerned about the case I described above, right?


pkg/sql/distsql_physical_planner.go line 1569 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

[nit] stitchin -> stitching

Done.


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

Previously, DrewKimball (Drew Kimball) wrote…

Note: we can reach here through the builtin function crdb_internal.fingerprint()

Good point. This builtin was added in #91124, which suggests that its main usage is internal testing of cluster-to-cluster replication. I looked over all the call sites, and that remains the case in the code base: we use this builtin to ensure correctness of tenant replication, so well above the column-family boundary within a single SQL row. It's possible that a user will issue a query with a span argument that falls within a row, but it seems ok to me to ignore this case given that this is a crdb_internal.* builtin that is not documented.

@DrewKimball (Collaborator) left a comment

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @michae2 and @yuzefovich)


pkg/sql/distsql_physical_planner.go line 1567 at r1 (raw file):

the code assumes that it's only possible to break a single SQL row into multiple spans IF the original row-covering span didn't include any other rows.

This is what I was missing - in that case, ignore my comment.


pkg/sql/fingerprint_span.go line 121 at r1 (raw file):

Previously, yuzefovich (Yahor Yuzefovich) wrote…

Good point. This builtin was added in #91124 and suggests that the main usage for it is for internal testing of cluster-to-cluster replication. I looked over all the call sites, and it still remains the case in the code base, and we use this builtin for ensuring correctness of tenant replication, so way above the column families boundary within a single SQL row. It's possible that a user will issue a query with such a span argument that falls within a row, but it seems ok to me to ignore this case given that this is crdb_internal.* builtin that is not documented.

SGTM

@yuzefovich (Member, Author) commented:

TFTR!

bors r+

craig bot pushed a commit that referenced this pull request Nov 25, 2024
135034: sql: disable gossip-based physical planning by default r=yuzefovich a=yuzefovich


135852: sql: avoid unnecessary singleflight requests for role membership cache r=rafiss a=rafiss

See individual commits for refactors that make the enhancement easier.

---

Since we use a query that scans the whole role_members table to
populate the membership cache, it is wasteful to launch multiple
singleflight requests for different users. This patch changes
the cache refresh logic so that the singleflight always populates
the entire cache, which means there will only ever be one in-flight
query per node (a sketch of the pattern follows the benchmark results below).

```
$ benchdiff --old 98b0521 --new 26daef2 . -r BenchmarkUseManyRoles

pkg=1/1 iter=10/10 cockroachdb/cockroach/pkg/bench/rttanalysis \

name                          old time/op     new time/op     delta
UseManyRoles/use_50_roles-10      225ms ± 6%      114ms ± 2%  -49.43%  (p=0.000 n=10+10)
UseManyRoles/use_2_roles-10      10.7ms ± 4%      7.4ms ± 3%  -30.74%  (p=0.000 n=10+8)

name                          old roundtrips  new roundtrips  delta
UseManyRoles/use_2_roles-10        9.00 ± 0%       5.00 ± 0%  -44.44%  (p=0.000 n=10+10)
UseManyRoles/use_50_roles-10       3.00 ± 0%       3.00 ± 0%     ~     (all equal)

name                          old alloc/op    new alloc/op    delta
UseManyRoles/use_50_roles-10      438MB ± 0%      377MB ± 1%  -13.88%  (p=0.000 n=9+10)
UseManyRoles/use_2_roles-10      20.7MB ± 2%     19.1MB ± 1%   -7.46%  (p=0.000 n=10+10)

name                          old allocs/op   new allocs/op   delta
UseManyRoles/use_50_roles-10       881k ± 0%       398k ± 4%  -54.81%  (p=0.000 n=8+10)
UseManyRoles/use_2_roles-10       54.4k ± 8%      36.0k ± 5%  -33.85%  (p=0.000 n=10+10)
```

fixes #135931
Release note (performance improvement): Improved the internal caching
logic for role membership information. This reduces the latency impact
of commands such as `DROP ROLE`, `CREATE ROLE`, and `GRANT role TO user`,
which cause the role membership cache to be invalidated.
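For readers unfamiliar with the technique, here is a minimal sketch of the pattern described above. It is illustrative only: `Cache`, `fetchAllRoleMemberships`, and the surrounding types are hypothetical, not CockroachDB's actual code; only `golang.org/x/sync/singleflight` is a real library.

```go
package rolecache

import (
	"context"
	"sync"

	"golang.org/x/sync/singleflight"
)

// Cache coalesces concurrent refreshes of the role membership map. All
// cache misses share one singleflight key, so at most one "scan the
// role_members table" query is in flight per node at a time.
type Cache struct {
	group singleflight.Group

	mu      sync.RWMutex
	members map[string][]string // user -> roles; populated wholesale
}

// fetchAllRoleMemberships stands in for the query that scans the entire
// role_members table and folds it into a map (hypothetical helper).
func fetchAllRoleMemberships(ctx context.Context) (map[string][]string, error) {
	// ... run the scan and build the user -> roles map ...
	return map[string][]string{}, nil
}

// GetRoles returns the cached roles for user, refreshing the whole cache
// on a miss. The fixed singleflight key ("all") is the crux: concurrent
// misses for different users coalesce into one query.
func (c *Cache) GetRoles(ctx context.Context, user string) ([]string, error) {
	c.mu.RLock()
	roles, ok := c.members[user]
	c.mu.RUnlock()
	if ok {
		return roles, nil
	}
	v, err, _ := c.group.Do("all", func() (interface{}, error) {
		m, err := fetchAllRoleMemberships(ctx)
		if err != nil {
			return nil, err
		}
		c.mu.Lock()
		c.members = m
		c.mu.Unlock()
		return m, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(map[string][]string)[user], nil
}
```

The fixed key is the design point: every concurrent cache miss, no matter which user triggered it, joins the same in-flight refresh rather than launching its own query.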

136115: roachtest: rm vmod logging in rebalance/by-load/* tests r=arulajmani a=kvoli

Historically, the `rebalance/by-load/*` roachtests were difficult to track failures down in. To aid investigation, we bumped the vmodule verbosity to the maximum level in the relevant test files.

The default logging and metrics have since improved. We have also noticed that, while vmodule is enabled, the logging itself can skew resource usage in a way that is not observed by the replica.

Disable the verbose logging; we will rely on the defaults for future failures.

Note that not all failures linked below were due to vmodule logging; however, this change will remove some degree of noise.

Part of: #133054
Part of: #132633
Part of: #135055
Part of: #135869
Part of: #135811
Part of: #133004
Part of: #135791
Part of: #132019

Release note: None

136116: roachtest: rm redundant cpu profiling in rebalance/by-load tests r=arulajmani a=kvoli

After #97699 was resolved, the cpu profiling parameter declaration in the `rebalance/by-load/*` roachtests became redundant.

Remove the declaration and use the default settings instead.

Informs: #133054
Informs: #132633
Informs: #135055
Informs: #135869
Informs: #135811
Informs: #133004
Informs: #135791
Informs: #132019

Release note: None

Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Austen McClernon <[email protected]>

craig bot commented Nov 25, 2024

Build failed (retrying...):

craig bot merged commit cc31172 into cockroachdb:master Nov 25, 2024
22 of 23 checks passed
@yuzefovich deleted the gossip-planning branch November 25, 2024 21:31