
concurrency: add metrics for virtual intent resolution #164641

Open
stevendanna wants to merge 3 commits into cockroachdb:master from stevendanna:ssd/vir-metrics

@stevendanna
Collaborator

Add four metrics to provide observability into VIR behavior:

  • kv.concurrency.virtual_resolve.condense: number of times point intent
    resolutions were condensed into range resolutions during scanning.
  • kv.concurrency.virtual_resolve.disabled: number of times VIR was
    disabled for a request due to excessive range resolve accumulation.
  • kv.concurrency.virtual_resolve.intent: number of point intents
    resolved virtually during read evaluation.
  • kv.concurrency.virtual_resolve.intent_range: number of range intent
    resolutions resolved virtually during read evaluation.
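
As a rough illustration of what the four counters track, here is a minimal sketch. It assumes plain `sync/atomic` counters rather than CockroachDB's own metric types, and the `virMetrics` struct and its `Snapshot` method are hypothetical names used only for this example:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// virMetrics is a hypothetical stand-in for the four counters described
// above. It does not use CockroachDB's metric package.
type virMetrics struct {
	Condense    atomic.Int64 // point resolutions condensed into range resolutions
	Disabled    atomic.Int64 // requests for which VIR was disabled
	Intent      atomic.Int64 // point intents resolved virtually
	IntentRange atomic.Int64 // range resolutions resolved virtually
}

// Snapshot returns the current counter values keyed by metric name.
func (m *virMetrics) Snapshot() map[string]int64 {
	return map[string]int64{
		"kv.concurrency.virtual_resolve.condense":     m.Condense.Load(),
		"kv.concurrency.virtual_resolve.disabled":     m.Disabled.Load(),
		"kv.concurrency.virtual_resolve.intent":       m.Intent.Load(),
		"kv.concurrency.virtual_resolve.intent_range": m.IntentRange.Load(),
	}
}

func main() {
	var m virMetrics
	m.Intent.Add(3)      // three point intents resolved virtually
	m.Condense.Add(1)    // one condensing step during scanning
	m.IntentRange.Add(1) // one range resolution applied virtually
	for name, v := range m.Snapshot() {
		fmt.Println(name, v)
	}
}
```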

Epic: CRDB-42978

Release note: None

…quests

If we encounter a LockConflictError after our first attempt to virtually
resolve all of our intents, it must mean that we hit the maximum lock
conflicts on our first scan. In this case, we want to avoid a situation
where we end up with either

(1) a huge number of lock updates in memory, or

(2) a livelock in which the lock table memory limit continues to
evict previously found locks, resulting in a virtual resolve set that is
never complete.

We solve this by condensing all virtually resolved locks to
per-transaction resolve range requests after the first retry.

If we hit too many resolve range requests, we then give up on virtual
intent resolution.
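
The condensing step described above might look roughly like the following sketch. The `lockUpdate` type, the `condense` function, and the `maxRanges` parameter are hypothetical simplifications, not the concurrency manager's actual types. The sketch also assumes that resolving a span for a transaction only affects that transaction's own intents, so a covering span that includes other keys is safe:

```go
package main

import "fmt"

// lockUpdate is a simplified, hypothetical lock update. A point update has
// an empty EndKey; a ranged update covers [Key, EndKey).
type lockUpdate struct {
	TxnID  string
	Key    string
	EndKey string
}

// condense collapses all updates into a single covering range per
// transaction. It returns ok=false when more than maxRanges ranges would be
// needed, which models giving up on virtual intent resolution.
func condense(updates []lockUpdate, maxRanges int) (out []lockUpdate, ok bool) {
	idx := make(map[string]int) // txn ID -> position in out
	for _, u := range updates {
		end := u.EndKey
		if end == "" {
			// Smallest key strictly greater than u.Key.
			end = u.Key + "\x00"
		}
		i, seen := idx[u.TxnID]
		if !seen {
			idx[u.TxnID] = len(out)
			out = append(out, lockUpdate{TxnID: u.TxnID, Key: u.Key, EndKey: end})
			continue
		}
		// Widen the existing range so that it covers this update too.
		if u.Key < out[i].Key {
			out[i].Key = u.Key
		}
		if end > out[i].EndKey {
			out[i].EndKey = end
		}
	}
	if len(out) > maxRanges {
		return nil, false
	}
	return out, true
}

func main() {
	updates := []lockUpdate{
		{TxnID: "txn1", Key: "a"},
		{TxnID: "txn2", Key: "b"},
		{TxnID: "txn1", Key: "c"},
	}
	ranges, ok := condense(updates, 2)
	fmt.Printf("%q %v\n", ranges, ok)
}
```

The memory held per request is then bounded by the number of distinct transactions rather than by the number of locks encountered.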

Fixes cockroachdb#163924
Release note: None
We recently added PrepareForLockConflictRetry to the concurrency
manager which, for VIR-enabled requests, condenses the previous set of
LockUpdates into a smaller set of ranged LockUpdates. This ensured
that, upon re-entering ScanAndEnqueue after a LockConflictError, we no
longer depended on those same locks still being in the lock table.

This, however, overlooked another, perhaps even more common, case:

- Request enters ScanAndEnqueue

- Request encounters some number of locks and pushes their holders

- Other requests evict at least one of our updated transaction statuses
  from txnStatusCache.

- Request re-enters ScanAndEnqueue, encounters the same locks, but
  now must re-push their holders.

This commit introduces a request-scoped map of transaction statuses to
ensure we never have to re-push a transaction we previously pushed.
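
The idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `sharedCache` is a hypothetical bounded FIFO standing in for txnStatusCache, and `request`, `statusOf`, and the `push` callback are invented names for this example.

```go
package main

import "fmt"

type txnStatus int

const (
	pending txnStatus = iota
	committed
	aborted
)

// sharedCache is a hypothetical bounded FIFO cache standing in for the
// shared txnStatusCache. Its capacity must be at least 1.
type sharedCache struct {
	capacity int
	order    []string
	m        map[string]txnStatus
}

func newSharedCache(capacity int) *sharedCache {
	return &sharedCache{capacity: capacity, m: make(map[string]txnStatus)}
}

// add records a status, evicting the oldest entry when the cache is full.
func (c *sharedCache) add(id string, s txnStatus) {
	if _, ok := c.m[id]; !ok {
		if len(c.order) == c.capacity {
			delete(c.m, c.order[0])
			c.order = c.order[1:]
		}
		c.order = append(c.order, id)
	}
	c.m[id] = s
}

// request keeps its own map of statuses, so eviction from the shared cache
// cannot force a second push of the same transaction.
type request struct {
	pushed map[string]txnStatus
	pushes int // number of times the push callback was invoked
}

// statusOf consults the request-scoped map first, then the shared cache,
// and only pushes when neither has an entry.
func (r *request) statusOf(id string, shared *sharedCache, push func(string) txnStatus) txnStatus {
	if s, ok := r.pushed[id]; ok {
		return s
	}
	if s, ok := shared.m[id]; ok {
		r.pushed[id] = s
		return s
	}
	s := push(id)
	r.pushes++
	r.pushed[id] = s
	shared.add(id, s)
	return s
}

func main() {
	shared := newSharedCache(1)
	req := &request{pushed: make(map[string]txnStatus)}
	push := func(string) txnStatus { return committed }

	req.statusOf("txnA", shared, push) // first scan: pushes txnA
	shared.add("txnB", aborted)        // another request evicts txnA
	req.statusOf("txnA", shared, push) // second scan: answered locally
	fmt.Println("pushes:", req.pushes)
}
```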

Along the way, it refactors our handling of toResolve condensing.

We are also considering other alternatives:

- Using a much larger txnStatusCache to avoid this issue in practice
  most of the time.

- Preserving toResolve across calls of ScanAndEnqueue for VIR-enabled
  requests. This is a bit tricky to do efficiently since without care
  it would result in duplicates in toResolve or excessive allocations
  to maintain our set.

Epic: CRDB-49120

Release note: none
@stevendanna stevendanna requested a review from a team as a code owner March 2, 2026 09:50
@stevendanna stevendanna requested a review from miraradeva March 2, 2026 09:50
@stevendanna stevendanna requested a review from a team as a code owner March 2, 2026 10:18
@stevendanna stevendanna requested review from herkolategan and nameisbhaskar and removed request for a team March 2, 2026 10:18