Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sstable: shard categorized iterator stats #4095

Merged
merged 2 commits into from
Oct 24, 2024

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Oct 22, 2024

Shard the categorized iterator stats to avoid mutex contention in high-read workloads that are frequently closing sstable iterators.

@jbowens jbowens requested a review from a team as a code owner October 22, 2024 16:32
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde)


sstable/category_stats.go line 209 at r1 (raw file):

func (accum *iterStatsAccumulator) close() {
	if accum.collector != nil {
		accum.collector.reportStats(uintptr(unsafe.Pointer(accum)), accum.Category, accum.QoSLevel, accum.stats)

I'm realizing the low bits are probably the same for all iterStatsAccumulator because the singleLevelIterator is going to be aligned... maybe I should just use a prng to pick a shard

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did this improve the performance?

Reviewed 1 of 1 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens and @RaduBerinde)


sstable/category_stats.go line 209 at r1 (raw file):

Previously, jbowens (Jackson Owens) wrote…

I'm realizing the low bits are probably the same for all iterStatsAccumulator because the singleLevelIterator is going to be aligned... maybe I should just use a prng to pick a shard

You could use the Fibonacci hash trick we use in other places:

// Fibonacci hash https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/  
simpleHash := (11400714819323198485 * uint64(ptr)) % numShards

@RaduBerinde
Copy link
Member

sstable/category_stats.go line 209 at r1 (raw file):

Previously, sumeerbhola wrote…

You could use the Fibonacci hash trick we use in other places:

// Fibonacci hash https://probablydance.com/2018/06/16/fibonacci-hashing-the-optimization-that-the-world-forgot-or-a-better-alternative-to-integer-modulo/  
simpleHash := (11400714819323198485 * uint64(ptr)) % numShards

Heh, I was going to suggest ((uint64(x) * 25214903917) >> 32) & 7 (multiplier taken from this table: https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use). Either should work just fine

@RaduBerinde
Copy link
Member

sstable/category_stats.go line 209 at r1 (raw file):

Previously, RaduBerinde wrote…

Heh, I was going to suggest ((uint64(x) * 25214903917) >> 32) & 7 (multiplier taken from this table: https://en.wikipedia.org/wiki/Linear_congruential_generator#Parameters_in_common_use). Either should work just fine

Note that Fibonacci hash as described in the article uses the high bits of the result, not the low ones (we would do >> (64-3) instead of % 8)

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Running a benchmark in the background right now. With other recent improvements, reportStats is now ~8% of cpu in the ycsb/C/values=64 benchmark

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @RaduBerinde and @sumeerbhola)

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need something like golang/go#18802 :/

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @jbowens and @sumeerbhola)

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the benchmark, cpu in reportStats reduced to 3.70%:

ROUTINE ======================== github.com/cockroachdb/pebble/sstable.(*CategoryStatsCollector).reportStats in /Users/jackson/go/src/github.com/cockroachdb/pebble/sstable/category_stats.go
  2950.28s    4.24hrs (flat, cum)  3.70% of Total
   110.55s    110.56s    144:func (c *CategoryStatsCollector) reportStats(
         .          .    145:	p uint64, category Category, qosLevel QoSLevel, stats CategoryStats,
         .          .    146:) {
   129.36s    1.30hrs    147:	v, ok := c.statsMap.Load(category)
    91.69s     91.72s    148:	if !ok {
         .          .    149:		c.mu.Lock()
         .          .    150:		v, _ = c.statsMap.LoadOrStore(category, &shardedCategoryStats{
         .          .    151:			Category: category,
         .          .    152:			QoSLevel: qosLevel,
         .          .    153:		})
         .          .    154:		c.mu.Unlock()
         .          .    155:	}
         .          .    156:
    93.55s     93.56s    157:	shardedStats := v.(*shardedCategoryStats)
  2338.74s   2339.56s    158:	s := ((p * 25214903917) >> 32) & (numCategoryStatsShards - 1)
    40.75s    1.70hrs    159:	shardedStats.shards[s].mu.Lock()
    52.07s    295.26s    160:	shardedStats.shards[s].stats.aggregate(stats)
    45.31s   1465.73s    161:	shardedStats.shards[s].mu.Unlock()
    48.26s     48.27s    162:}

I'm going to try doubling the number of shards.

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'll also try performing the statsMap.Load(category) once per pebble.Iterator, threading the *shardedCategoryStats down the iterator stack in the internalIterOptions

Reviewable status: 0 of 1 files reviewed, 1 unresolved discussion (waiting on @sumeerbhola)

Copy link
Collaborator

@sumeerbhola sumeerbhola left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 1 files at r2.
Reviewable status: all files reviewed (commit messages unreviewed), 2 unresolved discussions (waiting on @jbowens)


sstable/category_stats.go line 158 at r2 (raw file):

	shardedStats := v.(*shardedCategoryStats)
	s := ((p * 25214903917) >> 32) & (numCategoryStatsShards - 1)

Can you add a code comment pointing to the source of this equation.

Shard the categorized iterator stats to avoid mutex contention in high-read
workloads that are frequently closing sstable iterators.
When creating a top-level *pebble.Iterator, look up the appropriate categorized
stats struct once and propagate it to all sstable iterators constructed by the
Iterator, rather than looking up on every sstable iterator Close.

Recent experiments showed significant time spent in the statsMap Load:

```
  2950.28s    4.24hrs (flat, cum)  3.70% of Total
   110.55s    110.56s    144:func (c *CategoryStatsCollector) reportStats(
         .          .    145:	p uint64, category Category, qosLevel QoSLevel, stats CategoryStats,
         .          .    146:) {
   129.36s    1.30hrs    147:	v, ok := c.statsMap.Load(category)
    91.69s     91.72s    148:	if !ok {
         .          .    149:		c.mu.Lock()
         .          .    150:		v, _ = c.statsMap.LoadOrStore(category, &shardedCategoryStats{
         .          .    151:			Category: category,
         .          .    152:			QoSLevel: qosLevel,
         .          .    153:		})
         .          .    154:		c.mu.Unlock()
         .          .    155:	}
         .          .    156:
    93.55s     93.56s    157:	shardedStats := v.(*shardedCategoryStats)
  2338.74s   2339.56s    158:	s := ((p * 25214903917) >> 32) & (numCategoryStatsShards - 1)
    40.75s    1.70hrs    159:	shardedStats.shards[s].mu.Lock()
    52.07s    295.26s    160:	shardedStats.shards[s].stats.aggregate(stats)
    45.31s   1465.73s    161:	shardedStats.shards[s].mu.Unlock()
    48.26s     48.27s    162:}
```

With this change, it's limited to 2.44% across the per-Iterator acquistion of
an Accumulator and the per-sstable iterator reporting of the iterator stats:
```
Active filters:
   focus=Accumulat
Showing nodes accounting for 2.78hrs, 2.44% of 114.22hrs total
Dropped 103 nodes (cum <= 0.57hrs)
Showing top 20 nodes out of 48
      flat  flat%   sum%        cum   cum%
   0.81hrs  0.71%  0.71%    1.31hrs  1.15%  github.com/cockroachdb/pebble/sstable.(*categoryStatsWithMu).Accumulate
   0.39hrs  0.34%  1.05%    1.46hrs  1.28%  github.com/cockroachdb/pebble/sstable.(*CategoryStatsCollector).Accumulator
   0.37hrs  0.32%  1.37%    0.72hrs  0.63%  runtime.mapaccess2
   0.34hrs   0.3%  1.67%    0.36hrs  0.31%  sync.(*Mutex).Unlock (inline)
   0.20hrs  0.18%  1.84%    0.20hrs  0.18%  sync.(*entry).load (inline)
   0.10hrs 0.088%  1.93%    1.08hrs  0.94%  sync.(*Map).Load
   0.08hrs 0.069%  2.00%    0.10hrs 0.089%  sync.(*Mutex).lockSlow
   0.06hrs 0.056%  2.06%    0.07hrs 0.063%  runtime.strequal
   0.06hrs 0.056%  2.11%    0.11hrs 0.092%  runtime.typehash
   0.06hrs 0.049%  2.16%    0.17hrs  0.15%  runtime.nilinterhash
   0.05hrs 0.047%  2.21%    0.05hrs 0.048%  sync.(*Map).loadReadOnly (inline)
   0.04hrs 0.037%  2.25%    0.11hrs   0.1%  runtime.efaceeq
   0.04hrs 0.034%  2.28%    0.15hrs  0.13%  runtime.nilinterequal
   0.03hrs 0.027%  2.31%    1.35hrs  1.18%  github.com/cockroachdb/pebble/sstable.(*iterStatsAccumulator).close
   0.03hrs 0.027%  2.33%    0.03hrs 0.027%  github.com/cockroachdb/pebble/sstable.(*iterStatsAccumulator).init
   0.03hrs 0.025%  2.36%    0.03hrs 0.025%  aeshashbody
   0.02hrs 0.022%  2.38%    0.13hrs  0.11%  sync.(*Mutex).Lock (inline)
   0.02hrs  0.02%  2.40%    0.02hrs  0.02%  github.com/cockroachdb/pebble/sstable.(*iterStatsAccumulator).reportStats
   0.02hrs  0.02%  2.42%    0.02hrs  0.02%  github.com/cockroachdb/pebble/sstable.(*CategoryStats).aggregate (inline)
   0.02hrs 0.014%  2.44%    0.02hrs 0.014%  runtime.add (inline)
```
Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added another commit that avoids performing the sync.Map lookup for every sstable iterator (performing it once per *pebble.Iterator instead).

Reviewable status: 0 of 19 files reviewed, 2 unresolved discussions (waiting on @RaduBerinde and @sumeerbhola)


sstable/category_stats.go line 158 at r2 (raw file):

Previously, sumeerbhola wrote…

Can you add a code comment pointing to the source of this equation.

Done.

Copy link
Member

@RaduBerinde RaduBerinde left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 19 of 19 files at r4, all commit messages.
Reviewable status: all files reviewed, 2 unresolved discussions (waiting on @jbowens and @sumeerbhola)

@jbowens
Copy link
Collaborator Author

jbowens commented Oct 24, 2024

TFTRs!

@jbowens jbowens merged commit d2480d7 into cockroachdb:master Oct 24, 2024
23 checks passed
@jbowens jbowens deleted the shard-catstats branch October 24, 2024 00:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants