Skip to content

[SPARK-55751][SS] Add metrics on state store loads from cloud storage#54567

Open
gnanda wants to merge 2 commits intoapache:masterfrom
gnanda:stack/SPARK-55751
Open

[SPARK-55751][SS] Add metrics on state store loads from cloud storage#54567
gnanda wants to merge 2 commits intoapache:masterfrom
gnanda:stack/SPARK-55751

Conversation

@gnanda
Copy link

@gnanda gnanda commented Mar 2, 2026

What changes were proposed in this pull request?

This PR adds a new metric, rocksdbLoadedFromCloud, that tracks how many times the RocksDB state store fetched a snapshot from remote (cloud/DFS) storage during a single load operation.

The implementation introduces a fetchCheckpointFromDfs helper in RocksDB that centralizes all fileManager.loadCheckpointFromDfs(...) call sites. Each call through this helper increments a per-load numCloudLoads counter, which is then emitted
via loadMetrics (under both the load and loadFromSnapshot paths). The counter is reset at the start of each load.

The metric is surfaced as a StateStoreCustomSumMetric named rocksdbLoadedFromCloud and included in the full list of custom metrics reported by RocksDBStateStoreProvider.

Why are the changes needed?

Fetching state from cloud storage is significantly more expensive than a local cache hit. Without this metric, there is no way to distinguish a load that was fully served from local disk from one that required one or more round-trips to cloud
storage. This metric gives operators and developers visibility into cloud load frequency, which is useful for diagnosing performance regressions, tuning snapshot and changelog checkpointing configuration, and understanding cost implications of
state store operations.

Does this PR introduce any user-facing change?

Yes. A new custom metric rocksdbLoadedFromCloud ("RocksDB: load - number of times state was loaded from cloud storage") is now reported in the Structured Streaming progress reporter under stateOperatorMetrics.customMetrics.

How was this patch tested?

  • RocksDBSuite: added assertions verifying loadMetrics("numCloudLoads") === 1 after a load and after a loadFromSnapshot that each fetch from DFS.
  • RocksDBStateStoreIntegrationSuite: updated the exhaustive custom-metrics presence check to include rocksdbLoadedFromCloud.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code 2.1.58

@gnanda gnanda force-pushed the stack/SPARK-55751 branch from 761843b to b36d880 Compare March 2, 2026 04:35
@gnanda gnanda force-pushed the stack/SPARK-55751 branch from b36d880 to 910b000 Compare March 2, 2026 04:42
@gnanda gnanda requested a review from ericm-db March 2, 2026 22:54
@gnanda gnanda changed the title [SPARK-55751)][SS] Add metrics on state store loads from cloud storage [SPARK-55751][SS] Add metrics on state store loads from cloud storage Mar 2, 2026
@gnanda gnanda force-pushed the stack/SPARK-55751 branch from 65f059f to 22cd93d Compare March 2, 2026 23:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants