[WIP] [NV] add kimi fp4 configs#862

Open
ankursingh-nv wants to merge 5 commits into main from kimik2.5-fp4-b200-vllm

Conversation

@ankursingh-nv
Collaborator

@ankursingh-nv ankursingh-nv commented Mar 3, 2026

Summary

Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.

Changes

  • New config kimik2.5-fp4-b200-vllm in nvidia-master.yaml
    • Model: nvidia/Kimi-K2.5-NVFP4
    • Image: vllm/vllm-openai:v0.16.0
  • Parallelism: TP=8/EP=1 (concurrency 4–128) and TP=4/EP=4 (concurrency 4–64)
    • Sequence lengths: 1k1k, 1k8k, 8k1k
  • New benchmark script benchmarks/single_node/kimik2.5_fp4_b200.sh
    • Uses --reasoning-parser kimi_k2 and --tool-call-parser kimi_k2
    • Enables --compilation_config.pass_config.fuse_allreduce_rms true
    • Sets TORCH_CUDA_ARCH_LIST="10.0" for B200
    • GPU memory utilization: 0.90
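
For orientation, the flags listed above would combine into a serve invocation roughly like the following. This is a sketch only: the actual script lives in benchmarks/single_node/kimik2.5_fp4_b200.sh, and anything not named in the bullets above (e.g. the --tensor-parallel-size spelling, --port) is an assumption here.

```shell
# Sketch of the B200 launch, assuming standard vLLM flag names.
# Source of truth: benchmarks/single_node/kimik2.5_fp4_b200.sh

# Restrict kernel builds to Blackwell (B200 is SM 10.0), per the script.
export TORCH_CUDA_ARCH_LIST="10.0"

vllm serve nvidia/Kimi-K2.5-NVFP4 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.90 \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true
```

The TP=4/EP=4 variant mentioned above would swap the parallelism flags accordingly.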

@ankursingh-nv
Collaborator Author

@claude update PR description and also add an entry to perf-changelog file.

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @ankursingh-nv's task in 1m 27s.


Completed

  • Gather context on PR changes
  • Update PR description
  • Add entry to perf-changelog.yaml
  • Commit and push changes

Done:

  • Updated PR description with a summary of config and script changes (model, image, parallelism, sequence lengths, key flags)
  • Added kimik2.5-fp4-b200-vllm entry to perf-changelog.yaml referencing this PR

@functionstackx
Contributor

@claude update the PR description to match the actual vllm image being used

@Klaud-Cold
Collaborator

Klaud-Cold commented Mar 3, 2026

Claude finished @functionstackx's task in 44s.


Completed

  • Gather context on actual vLLM image used
  • Update PR description to match

Done: Updated the PR description to reflect the actual vLLM image vllm/vllm-openai:v0.16.0 (was incorrectly listed as v0.15.1).

@ankursingh-nv force-pushed the kimik2.5-fp4-b200-vllm branch from 2101d26 to 065f123 on March 4, 2026 at 19:03

kimik2.5-fp4-b200-vllm:
  image: vllm/vllm-openai:v0.16.0
  model: nvidia/Kimi-K2.5-NVFP4
Contributor


thanks for the PR! overview LGTM!

@ankursingh-nv @kedarpotdar-nv one small request: could you add documentation for the NVFP4 version of Kimi K2.5 (nvidia/Kimi-K2.5-NVFP4) to the vLLM recipes at https://github.com/vllm-project/recipes/blob/main/moonshotai/Kimi-K2.5.md? Let's ensure the documentation is first class so that the entire ML community can benefit from your hard work!

+viz @faradawn

Collaborator


started a recipes PR here: vllm-project/recipes#267
