GH-49438: [C++][Gandiva] Optimize LPAD/RPAD functions #49439

Open
dmitry-chirkov-dremio wants to merge 1 commit into apache:main from dmitry-chirkov-dremio:gandiva-lpad-optimization

@dmitry-chirkov-dremio dmitry-chirkov-dremio commented Mar 3, 2026

Rationale for this change

The lpad_utf8_int32_utf8 and rpad_utf8_int32_utf8 functions have a performance inefficiency and a potential memory safety issue:

  1. Performance: Single-byte fills iterate character by character when a single memset would suffice. Multi-byte fills take O(n) loop iterations where a doubling strategy needs only O(log n) copy calls.
  2. Memory safety: When the fill string is longer than the padding space needed, the code could write more bytes than were allocated. This PR fixes that preventively.
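
To make the O(n) versus O(log n) claim concrete, here is a minimal sketch that only counts copy calls. The function names are hypothetical and are not taken from the Gandiva sources. It assumes `fill_len > 0`. Note that the total number of bytes written is the same in both cases; only the number of loop iterations (and memcpy calls) shrinks.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper: copy calls needed when each loop iteration
// appends one repetition of the fill text (ceiling division).
std::size_t LinearCopyCount(std::size_t total_bytes, std::size_t fill_len) {
  return (total_bytes + fill_len - 1) / fill_len;
}

// Hypothetical helper: copy calls needed when the already-filled prefix
// is copied onto the end, doubling the filled region on each pass.
std::size_t DoublingCopyCount(std::size_t total_bytes, std::size_t fill_len) {
  if (total_bytes == 0) return 0;
  // Seed copy, clamped so it never exceeds the padding region.
  std::size_t written = std::min(fill_len, total_bytes);
  std::size_t copies = 1;
  while (written < total_bytes) {
    written += std::min(written, total_bytes - written);
    ++copies;
  }
  return copies;
}
```

For 65536 padding characters of a 4-byte fill (262144 bytes), the linear count is 65536 copies while the doubling count is 17: one seed copy plus 16 doublings.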

What changes are included in this PR?

  1. Memory safety fix: Use std::min(fill_text_len, total_fill_bytes) for the initial copy to prevent overflow
  2. Fast path: Add single-byte fill optimization using memset
  3. General path: Replace character-by-character loop with doubling strategy for multi-byte fills
  4. Tests: Add comprehensive tests for the new code paths

Are these changes tested?

Yes. Added tests covering:

  • Large UTF-8 fill characters (4-byte emoji, 3-byte Chinese)
  • Single-byte fill boundaries (1-character and 65536-character padding)
  • Content verification for fill patterns
  • Doubling strategy boundaries
  • Partial fill scenarios (fill text longer than padding needed)
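
One way to verify content in scenarios like these is to compare the optimized output against a deliberately slow reference. The sketch below is such an oracle, not the tests added in this PR. `SplitUtf8` and `NaiveLpad` are hypothetical names, the input is assumed to be valid UTF-8, and the semantics (truncate when the text is already long enough, return the text unchanged when the fill is empty) are assumed to follow conventional SQL LPAD behavior.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: split a valid UTF-8 string into one byte
// sequence per character, using the lead byte to find the length.
std::vector<std::string> SplitUtf8(const std::string& s) {
  std::vector<std::string> chars;
  for (std::size_t i = 0; i < s.size();) {
    unsigned char lead = static_cast<unsigned char>(s[i]);
    std::size_t len = lead < 0x80 ? 1 : lead < 0xE0 ? 2 : lead < 0xF0 ? 3 : 4;
    chars.push_back(s.substr(i, len));
    i += len;
  }
  return chars;
}

// Hypothetical reference oracle: left-pad text to target_chars
// characters, one fill character at a time.
std::string NaiveLpad(const std::string& text, std::size_t target_chars,
                      const std::string& fill) {
  std::vector<std::string> t = SplitUtf8(text);
  std::vector<std::string> f = SplitUtf8(fill);
  std::string out;
  if (t.size() >= target_chars) {
    // Text is already long enough: keep only the first target_chars.
    for (std::size_t i = 0; i < target_chars; ++i) out += t[i];
    return out;
  }
  if (f.empty()) return text;
  std::size_t pad = target_chars - t.size();
  // Cycle through the fill characters; a fill longer than the padding
  // is cut off at a character boundary.
  for (std::size_t i = 0; i < pad; ++i) out += f[i % f.size()];
  return out + text;
}
```

Because the oracle works on whole characters, it covers the multi-byte and partial-fill cases without any byte arithmetic, which makes it a useful cross-check for a byte-oriented fast path.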

Are there any user-facing changes?

No.

github-actions bot commented Mar 3, 2026

⚠️ GitHub issue #49438 has no components, please add labels for components.

@dmitry-chirkov-dremio
Author

Local Benchmark Results

Platform: Apple M3, macOS
Benchmark: cpp/src/gandiva/tests/micro_benchmarks.cc, 10 repetitions, 1 million rows per test

The original RPAD was pathologically slow compared to LPAD because the two functions used different fill algorithms. For the ~65K-character padding case, LPAD took ~29 ms while RPAD took ~992 ms (34x slower for an equivalent operation). The optimization applies the same efficient algorithm to both functions.

LPAD (mean time in μs)

| Benchmark | Original | Optimized | Speedup |
|---|---|---|---|
| Minimal (9 padding chars) | 147 | 148 | - |
| Small (99 padding chars) | 312 | 167 | 1.9x |
| Medium (100 padding chars) | 368 | 219 | 1.7x |
| Large (1000 padding chars) | 16,242 | 16,273 | - |
| XLarge (65436 padding chars) | 29,115 | 27,987 | 1.04x |

RPAD (mean time in μs)

| Benchmark | Original | Optimized | Speedup |
|---|---|---|---|
| Minimal (9 padding chars) | 247 | 148 | 1.7x |
| Small (99 padding chars) | 1,704 | 165 | 10x |
| Medium (100 padding chars) | 1,773 | 216 | 8x |
| Large (1000 padding chars) | 30,813 | 16,082 | 1.9x |
| XLarge (65436 padding chars) | 992,334 | 27,724 | 36x |

dmitry-chirkov-dremio changed the title from "GH-49438: [C++][Gandiva] Optimize lpad/rpad UTF-8 functions" to "GH-49438: [C++][Gandiva] Optimize LPAD/RPAD functions" on Mar 3, 2026

Member

kou commented Mar 4, 2026

@lriggs @akravchukdremio @xxlaykxx You may want to review this.
