
[NVIDIA] Update NVIDIA GPT-OSS vLLM image from v0.15.1 to v0.16.0#800

Open
cquil11 wants to merge 3 commits into main from claude/issue-798-20260226-0534

Conversation

@cquil11 (Collaborator) commented Feb 26, 2026

Bump the vllm/vllm-openai image tag for all 3 NVIDIA GPT-OSS configs (B200, H100, H200). All existing BKC flags are preserved; there are no config changes beyond the image tag.
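For illustration, the per-config change amounts to a one-line image-tag bump. This is a hypothetical sketch; the actual config file layout and key names in this repo may differ:

```yaml
# Hypothetical config fragment -- only the image reference changes.
# Before:
#   image: vllm/vllm-openai:v0.15.1
# After:
image: vllm/vllm-openai:v0.16.0
# All existing BKC flags (parallelism, scheduler, cache settings, etc.)
# are left untouched.
```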

v0.16.0 notable changes for GPT-OSS/MXFP4:

  • Async scheduling + pipeline parallelism (30.8% throughput improvement)
  • New MXFP4 backends: SM90 FlashInfer BF16, SM100 CUTLASS
  • MoE cold start optimization
  • Triton backend now default non-FlashInfer fallback on SM90/SM100
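As a rough illustration of how the headline v0.16.0 features above are exercised at launch time. The flag names are assumptions based on upstream vLLM (`--async-scheduling`, `--pipeline-parallel-size`) and this is not the repo's actual BKC command:

```shell
# Hypothetical launch sketch; model name and flag values are examples only.
docker run --rm --gpus all vllm/vllm-openai:v0.16.0 \
    --model openai/gpt-oss-120b \
    --async-scheduling \
    --pipeline-parallel-size 2
```

With both flags set, the scheduler overlaps CPU-side scheduling with GPU execution while the model is split across two pipeline stages, which is the combination the 30.8% throughput figure refers to.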

Closes #798

Co-authored-by: Cameron Quilici <cquil11@users.noreply.github.com>
Removed outdated configuration entries and added vLLM image update details for NVIDIA GPT-OSS; updated pull request links.
@cquil11 (Collaborator, Author) commented Feb 26, 2026

@functionstackx (Contributor) left a review comment

LGTM

@functionstackx (Contributor)

Going to merge this soon.

@kedarpotdar-nv (Collaborator)

Looks like a small perf regression on B200 at 1k/1k; @ankursingh-nv is investigating.

@functionstackx (Contributor) commented Mar 1, 2026

v0.17 is coming out Wednesday. We'll probably merge this v0.16 bump before then, since we're doing best-effort support on GPT-OSS.

@jgangani (Collaborator) commented Mar 2, 2026

@functionstackx @ankursingh-nv, should we just wait for v0.17 to land and update this PR before merging?

@ankursingh-nv (Collaborator)

In general, we should ship the version that gives the best performance today. We are still investigating the regression, but in the meantime, if v0.17 is released and its out-of-the-box performance is good, we can skip v0.16.



Development

Successfully merging this pull request may close these issues.

[NVIDIA] update H100, H200, B200 GPT OSS vLLM image to latest 0.16.0
