feat: add GLM-5 FP8 SGLang benchmark for MI355X by functionstackx · Pull Request #762 · SemiAnalysisAI/InferenceX

functionstackx · 2026-02-19T21:39:29Z

Add single-node benchmark configuration for GLM-5 FP8 on MI355X:

- Config key: glm5-fp8-mi355x-sglang
- Model: zai-org/GLM-5-FP8 with NSA tilelang backends
- Image: rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219
- TP=8, concurrency 4-64 for 1k1k, 1k8k, and 8k1k
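The sweep above can be pictured as a grid of runs. The sketch below is a hypothetical bash helper, not the repository's config format. It assumes that the concurrency steps are powers of two between 4 and 64, and that labels such as 1k8k mean 1024 input tokens and 8192 output tokens.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the GLM-5 FP8 MI355X sweep grid.
# Assumptions (not taken from the repo): concurrency steps are powers of two
# from 4 to 64, and "1k8k" means 1024 input tokens / 8192 output tokens.

# Map a sequence-length label to "ISL OSL"; fail on unknown labels.
seq_lens() {
  case "$1" in
    1k1k) echo "1024 1024" ;;
    1k8k) echo "1024 8192" ;;
    8k1k) echo "8192 1024" ;;
    *) return 1 ;;
  esac
}

# Print one line per benchmark run: label ISL OSL TP concurrency.
sweep() {
  local tp=8 label isl osl conc
  for label in 1k1k 1k8k 8k1k; do
    read -r isl osl <<< "$(seq_lens "$label")"
    for conc in 4 8 16 32 64; do
      echo "$label $isl $osl $tp $conc"
    done
  done
}

sweep
```

Running it prints 15 lines, the first being `1k1k 1024 1024 8 4`. Under these assumptions, each sequence-length pair contributes five runs at TP=8.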

Closes #761

Generated with Claude Code

e2e Tests: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22581147775

functionstackx · 2026-02-19T22:08:06Z

@claude there is a transformers dependency error

can you add this to the glm5 benchmark script

pip install git+https://github.com/huggingface/transformers.git

following Anush's gist recipe

Traceback (most recent call last):
  File "/home/cameronamd@semianalysis.com/.local/bin/hf", line 3, in <module>
    from huggingface_hub.cli.hf import main
ModuleNotFoundError: No module named 'huggingface_hub'
[aiter] import [module_aiter_enum] under /sgl-workspace/aiter/aiter/jit/module_aiter_enum.so
[2026-02-19 21:52:22] INFO core.py:501: import [module_aiter_enum] under /sgl-workspace/aiter/aiter/jit/module_aiter_enum.so
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1360, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1048, in __getitem__
    raise KeyError(key)
KeyError: 'glm_moe_dsa'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/sgl-workspace/sglang/python/sglang/launch_server.py", line 32, in <module>
    server_args = prepare_server_args(sys.argv[1:])
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5591, in prepare_server_args
    return ServerArgs.from_cli_args(raw_args)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5077, in from_cli_args
    return cls(**{attr: getattr(args, attr) for attr in attrs})
  File "<string>", line 331, in __init__
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 733, in __post_init__
    self._handle_gpu_memory_settings(gpu_mem)
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 1010, in _handle_gpu_memory_settings
    if not self.use_mla_backend():
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5110, in use_mla_backend
    model_config = self.get_model_config()
  File "/sgl-workspace/sglang/python/sglang/srt/server_args.py", line 5091, in get_model_config
    self.model_config = ModelConfig.from_server_args(self)
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 250, in from_server_args
    return ModelConfig(
  File "/sgl-workspace/sglang/python/sglang/srt/configs/model_config.py", line 127, in __init__
    self.hf_config = get_config(
  File "/sgl-workspace/sglang/python/sglang/srt/utils/common.py", line 3475, in wrapper
    result = func(*args, **kwargs)
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers_utils.py", line 320, in get_config
    raise e
  File "/sgl-workspace/sglang/python/sglang/srt/utils/hf_transformers_utils.py", line 315, in get_config
    config = AutoConfig.from_pretrained(
  File "/opt/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1362, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `glm_moe_dsa` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`
Server died before becoming healthy. Exiting.
srun: error: mia1-p01-g12: task 0: Exited with exit code 1
+ scancel 3420
+ ls 'gpucore.*'
Waiting for result file... (attempt 1)
Waiting for result file... (attempt 2)
Waiting for result file... (attempt 3)
Waiting for result file... (attempt 4)
Waiting for result file... (attempt 5)
Waiting for result file... (attempt 6)
Waiting for result file... (attempt 7)
Waiting for result file... (attempt 8)
Waiting for result file... (attempt 9)

Klaud-Cold · 2026-02-19T22:08:25Z

Claude finished @functionstackx's task in 39s

Adding transformers dependency fix to GLM-5 benchmark script

- Read the current benchmark script
- Add pip install git+https://github.com/huggingface/transformers.git to the script
- Commit and push the change

Added pip install git+https://github.com/huggingface/transformers.git before the hf download step in benchmarks/glm5_fp8_mi355x.sh. The container's bundled transformers version doesn't recognize the glm_moe_dsa model type that GLM-5 uses, so installing from source provides the latest model type support.
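The incompatibility only surfaced after the server died, and the job then kept polling for a result file. A cheap preflight check could catch it earlier. The following is a hypothetical helper (`has_model_type` is not part of the benchmark script). It asks the installed transformers whether a model type is registered in `CONFIG_MAPPING`, the same mapping that raised the `KeyError` in the traceback.

```shell
#!/usr/bin/env bash
# Hypothetical preflight check: fail fast if the installed transformers
# does not register a model type, instead of waiting for the server to die.

# Return 0 if transformers knows the given model_type, non-zero otherwise
# (also non-zero when python3 or transformers is missing).
has_model_type() {
  python3 - "$1" <<'PY' 2>/dev/null
import sys
from transformers import CONFIG_MAPPING
sys.exit(0 if sys.argv[1] in CONFIG_MAPPING else 1)
PY
}

if has_model_type glm_moe_dsa; then
  echo "transformers supports glm_moe_dsa"
else
  echo "transformers lacks glm_moe_dsa; install a newer build first" >&2
fi
```

A benchmark script could exit early in the `else` branch instead of only warning.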


Add single-node benchmark configuration for GLM-5 FP8 on MI355X:

- Config key: glm5-fp8-mi355x-sglang
- Model: zai-org/GLM-5-FP8 with NSA tilelang backends
- Image: rocm/sgl-dev:v0.5.8.post1-rocm720-mi35x-20260219
- TP=8, concurrency 4-64 for 1k1k, 1k8k, and 8k1k

Closes #761

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

…port

The container's bundled transformers version doesn't recognize the glm_moe_dsa model type. Install from source to get the latest support.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

functionstackx · 2026-02-19T22:15:15Z

@claude move glm5_fp8_mi355x.sh under benchmarks/single_node/, look at the refactor to see how to fix it, and follow the other SGLang single-node MI355X scripts in benchmarks/single_node/ to see how it should be done after the refactor

Klaud-Cold · 2026-02-19T22:15:36Z

Claude finished @functionstackx's task in 1m 57s

Moving GLM-5 benchmark to single_node directory

- Read current benchmarks/glm5_fp8_mi355x.sh
- Explore benchmarks/single_node/ for other SGLang MI355X scripts
- Understand the refactored pattern (runner resolves path as benchmarks/single_node/)
- Move script to benchmarks/single_node/ with corrected source path
- Commit and push changes

Moved benchmarks/glm5_fp8_mi355x.sh → benchmarks/single_node/glm5_fp8_mi355x.sh and updated the source path from ./benchmark_lib.sh to ../benchmark_lib.sh to match the directory depth used by all other single-node scripts. The runner (launch_mi355x-amds.sh:184) constructs the script path as benchmarks/single_node/{prefix}_{precision}_mi355x.sh, so this is required for the benchmark to actually execute.
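The path convention can be sketched as follows. Both functions are hypothetical illustrations of the behaviour described above, and the actual logic in launch_mi355x-amds.sh may differ. The second function shows a variant that resolves benchmark_lib.sh relative to the script's own location instead of the current working directory.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the single-node naming convention; the real logic
# lives in launch_mi355x-amds.sh and may differ.

# Build the single-node script path from a config prefix and precision.
single_node_script() {
  local prefix="$1" precision="$2"
  echo "benchmarks/single_node/${prefix}_${precision}_mi355x.sh"
}

# Resolve benchmark_lib.sh one directory above the given script file,
# independent of the caller's working directory (the directory must exist).
lib_path_for() {
  local script="$1"
  echo "$(cd "$(dirname "$script")/.." && pwd)/benchmark_lib.sh"
}

single_node_script glm5 fp8
```

Inside a script, the equivalent one-liner would be `source "$(dirname "${BASH_SOURCE[0]}")/../benchmark_lib.sh"`. A plain `source ../benchmark_lib.sh` resolves against the working directory, so it works only if the runner starts the script from benchmarks/single_node/. The PR does not say which behaviour the runner relies on.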


Move GLM-5 FP8 benchmark script to benchmarks/single_node/ to match the refactored directory structure used by all other single-node benchmarks. The runner (launch_mi355x-amds.sh) resolves scripts from benchmarks/single_node/, so this is required for the benchmark to run. Updated the source path from ./benchmark_lib.sh to ../benchmark_lib.sh to match the new directory depth.

Co-authored-by: functionstackx <functionstackx@users.noreply.github.com>

- Pin transformers to a tested commit (6ed9ee36) instead of unpinned HEAD to ensure reproducible CI runs. The bundled transformers 4.57.1 in the sgl-dev image lacks glm_moe_dsa model type support.
- Add --tool-call-parser glm47 and --reasoning-parser glm45 to the server launch, matching the validated manual deployment and the existing experimental/glm-5 serving config.
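Here is a minimal sketch of how the pinned install and the extra parser flags could fit together, assuming a bash script that builds its server arguments in an array. The variable names and the DRY_RUN switch are hypothetical. The NSA tilelang backend flags from the original config are left out because their exact names are not shown in this PR.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: pinned transformers install plus GLM-5 parser flags.
# TRANSFORMERS_REV mirrors the commit named in the PR; the variable names and
# the DRY_RUN switch are illustrative, not taken from the actual script.

TRANSFORMERS_REV="6ed9ee36"
MODEL="zai-org/GLM-5-FP8"

# pip's VCS requirement syntax: git+<repo-url>@<revision>
transformers_spec() {
  echo "git+https://github.com/huggingface/transformers.git@${TRANSFORMERS_REV}"
}

# Server arguments mentioned in the PR: TP=8 plus the GLM tool-call and
# reasoning parsers, printed one argument per line.
server_args() {
  local -a args=(
    --model-path "$MODEL"
    --tp-size 8
    --tool-call-parser glm47
    --reasoning-parser glm45
  )
  printf '%s\n' "${args[@]}"
}

if [[ "${DRY_RUN:-1}" == "1" ]]; then
  # Default: only print what would be executed.
  echo "pip install $(transformers_spec)"
  echo "python3 -m sglang.launch_server $(server_args | tr '\n' ' ')"
else
  pip install "$(transformers_spec)"
  mapfile -t args < <(server_args)
  python3 -m sglang.launch_server "${args[@]}"
fi
```

pip's `git+<url>@<rev>` form accepts an abbreviated commit such as 6ed9ee36, though it may log that no matching branch or tag was found before checking out the revision.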

- Remove the apt-get block (git, build-essential): it is unnecessary because the Docker image already provides these packages, and dpkg fails without superuser privileges in CI.
- Remove SGLANG_USE_AITER and HSA_NO_SCRATCH_RECLAIM (already defaults in the image), PYTORCH_HIP_ALLOC_CONF (a no-op on MI355X), and --log-level info (the SGLang default).
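If an image ever lacks these tools, one option is to check for them and skip apt-get when they are already present or when the job is not running as root. This is a hypothetical sketch. Mapping build-essential to make and gcc is an assumption, and the script only prints the apt-get command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to an unconditional apt-get block: report which
# tools are missing and only consider installing when running as root.

# Print each named command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Assumption: git plus make/gcc stand in for "git, build-essential".
missing="$(missing_tools git make gcc | tr '\n' ' ')"
missing="${missing% }"

if [[ -z "$missing" ]]; then
  echo "all build tools present; skipping apt-get"
elif [[ "$(id -u)" -eq 0 ]]; then
  echo "would run: apt-get install -y $missing"
else
  echo "missing: $missing (not root, cannot apt-get)" >&2
fi
```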

chunfangamd

Looks good to me

github-project-automation bot added this to InferenceMAX Board Feb 19, 2026

functionstackx added AMD sweep-enabled and removed sweep-enabled labels Feb 19, 2026

functionstackx added sweep-enabled and removed sweep-enabled labels Feb 19, 2026

github-actions bot and others added 2 commits February 19, 2026 17:14

functionstackx force-pushed the claude/issue-761-20260219-2050 branch from 460d361 to cd6b187 February 19, 2026 22:14

functionstackx added sweep-enabled and removed sweep-enabled labels Feb 19, 2026

chunfangamd requested a review from a team February 25, 2026 07:17

Merge branch 'main' into claude/issue-761-20260219-2050

51f9593

functionstackx added the sweep-enabled label Feb 26, 2026

functionstackx and others added 4 commits February 25, 2026 21:07

Merge branch 'main' into claude/issue-761-20260219-2050

e948925

Update perf-changelog.yaml with new benchmarks

8108f92

Merge branch 'main' into claude/issue-761-20260219-2050

7773230

Fix GLM-5 benchmark: install system deps and add ROCm perf tuning

e8caa0c

chunfangamd requested review from billishyahao and chunfangamd as code owners March 2, 2026 12:01

functionstackx added sweep-enabled and removed sweep-enabled labels Mar 2, 2026

functionstackx added the sweep-enabled label Mar 2, 2026

Merge branch 'main' into claude/issue-761-20260219-2050

fbc7dc8

chunfangamd approved these changes Mar 4, 2026


chunfangamd added 2 commits March 4, 2026 14:10

Merge branch 'main' into claude/issue-761-20260219-2050

5787463

Update the PR number in perf-changelog for glm5-fp8-mi355x-sglang

70e2ed6

chunfangamd enabled auto-merge (squash) March 4, 2026 12:19

functionstackx wants to merge 13 commits into main from claude/issue-761-20260219-2050

functionstackx commented Feb 19, 2026 • edited by chunfangamd

Loading

Uh oh!

functionstackx commented Feb 19, 2026

Uh oh!

Klaud-Cold commented Feb 19, 2026 •

edited

Loading

Uh oh!

functionstackx commented Feb 19, 2026

Uh oh!

Klaud-Cold commented Feb 19, 2026 •

edited

Loading

Uh oh!

chunfangamd left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants
