Conversation
@claude update PR description and also add an entry to perf-changelog file. |
Claude finished @ankursingh-nv's task in 1m 27s
Done:
@claude update the PR description to match the actual vLLM image being used
|
Claude finished @functionstackx's task in 44s
Done: Updated the PR description to reflect the actual vLLM image
Co-authored-by: Ankur Singh <ankursingh-nv@users.noreply.github.com>
Force-pushed from 2101d26 to 065f123
```yaml
kimik2.5-fp4-b200-vllm:
  image: vllm/vllm-openai:v0.16.0
  model: nvidia/Kimi-K2.5-NVFP4
```
Thanks for the PR! Overview LGTM!
@ankursingh-nv @kedarpotdar-nv One small thing: could you add documentation for the NVFP4 version of Kimi K2.5 (nvidia/Kimi-K2.5-NVFP4) to the vLLM recipes at https://github.com/vllm-project/recipes/blob/main/moonshotai/Kimi-K2.5.md? Let's ensure the documentation is first class so the entire ML community can benefit from your hard work!
+viz @faradawn
Summary
Add Kimi K2.5 FP4 benchmark configuration for B200 using vLLM.
Changes
- Added `kimik2.5-fp4-b200-vllm` entry in `nvidia-master.yaml` with model `nvidia/Kimi-K2.5-NVFP4` and image `vllm/vllm-openai:v0.16.0`
- Added launch script `benchmarks/single_node/kimik2.5_fp4_b200.sh`
- Set `--reasoning-parser kimi_k2` and `--tool-call-parser kimi_k2`
- Enabled `--compilation_config.pass_config.fuse_allreduce_rms true`
- Set `TORCH_CUDA_ARCH_LIST="10.0"` for B200
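As a rough sketch, the pieces listed above might come together in `kimik2.5_fp4_b200.sh` roughly as follows. The actual script contents are not shown in this thread, so the structure, variable names, and flag ordering here are assumptions; only the flags themselves come from the change list.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of benchmarks/single_node/kimik2.5_fp4_b200.sh;
# the real script is not shown in this PR thread.
set -euo pipefail

# Limit CUDA compilation to Blackwell (B200, compute capability 10.0).
export TORCH_CUDA_ARCH_LIST="10.0"

MODEL="nvidia/Kimi-K2.5-NVFP4"

# Serve the NVFP4 checkpoint with the Kimi K2 reasoning/tool-call parsers
# and the allreduce+RMSNorm fusion pass enabled, as described in Changes.
vllm serve "$MODEL" \
  --reasoning-parser kimi_k2 \
  --tool-call-parser kimi_k2 \
  --compilation_config.pass_config.fuse_allreduce_rms true
```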