GLM-5.1 adaptation issues

Members 9 posts

April 14, 2026, 10:13

GLM-5.1 deployment issues on MetaX C500

[System environment]
GPU: MetaX C500 × 16 (2 nodes, 8 cards each, 64 GB per card)
OS: Ubuntu 22.04, x86_64
MACA SDK: 3.5.3.102
Docker image: pub-registry1.metax-tech.com/ai-opentest/dev/vllm-metax:0.14.0-maca.ai3.5.3.102-torch2.8-py310-ubuntu22.04-amd64_glm_w4a8_full (49.3 GB)
vLLM version: vLLM-MetaX 0.14.0 (v1 engine)
Ray version: 2.53.0
Deployment: 2 nodes, PP=2, TP=8; Ray cluster already set up
Model storage: GPFS shared storage, accessible from all nodes
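As a rough sanity check on this layout, the sketch below estimates how much weight data each card would hold. It assumes the weights are split evenly across all TP × PP = 16 ranks, uses the checkpoint sizes quoted in this post (705 GB and ~783 GB), treats GB on disk and GB of device memory as the same unit, and ignores KV cache, activations, and runtime overhead. The function name `weights_per_gpu_gb` is made up for illustration.

```python
def weights_per_gpu_gb(model_size_gb: float, tp: int = 8, pp: int = 2) -> float:
    """Estimate the weight footprint per GPU, assuming an even split over tp * pp ranks."""
    return model_size_gb / (tp * pp)


GPU_MEMORY_GB = 64  # per-card memory stated in the post

# Checkpoint sizes as quoted in the post; the W4A8 figure is approximate.
checkpoints = {"GLM-5.1-FP8": 705, "Eco-Tech/GLM-5.1-w4a8": 783}

for name, size_gb in checkpoints.items():
    share = weights_per_gpu_gb(size_gb)
    headroom = GPU_MEMORY_GB - share
    print(f"{name}: ~{share:.1f} GB of weights per GPU, ~{headroom:.1f} GB left for everything else")
```

This only shows that the weights alone could plausibly fit in 16 × 64 GB under the stated assumptions; it says nothing about whether a given format can actually be loaded.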

[Problem 1: FP8 model incompatible]
Model: GLM-5.1-FP8 (705 GB)
Error: fp8 quantization is currently not supported in maca
MACA does not currently support FP8 quantization.

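[Problem 2: Eco-Tech W4A8 model incompatible]
Model: Eco-Tech/GLM-5.1-w4a8 (~783 GB)
This model uses the msmodelslim format (designed for Ascend): the weight files are named quant_model_weights rather than the standard model, and the quantization metadata is in quant_model_description.json rather than a standard quantization_config, so vLLM-MetaX cannot recognize or load it.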
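A quick way to tell the two layouts apart before attempting a load is to inspect the checkpoint directory. The sketch below is only an illustration of that check: `detect_quant_format` is a hypothetical helper, not part of vLLM or vLLM-MetaX. It assumes a Hugging Face-style checkpoint stores its settings under `quantization_config` in `config.json`, with the method name in `quant_method`.

```python
import json
from pathlib import Path


def detect_quant_format(model_dir: str) -> str:
    """Guess which quantization metadata layout a checkpoint directory uses."""
    root = Path(model_dir)

    # msmodelslim-style layout described in the post: a separate description file.
    if (root / "quant_model_description.json").is_file():
        return "msmodelslim"

    # Hugging Face-style layout: a quantization_config section inside config.json.
    config_path = root / "config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        quant_cfg = config.get("quantization_config")
        if isinstance(quant_cfg, dict):
            # quant_method usually names the scheme, e.g. "compressed-tensors" or "fp8".
            return str(quant_cfg.get("quant_method", "unknown"))

    return "unquantized-or-unrecognized"
```

Calling `detect_quant_format("/path/to/model")` on a shared-storage path returns `"msmodelslim"` for a directory laid out like the Eco-Tech checkpoint, and the `quant_method` string for a checkpoint that carries a standard `quantization_config`.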

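[Problem 3: multi-node Pipeline Parallelism with the v1 engine]
Error: local_rank 10 is out of bounds / device id 2 not exist
The v1 engine's multiproc_executor treats all workers as local processes and cannot map GPUs on remote nodes correctly.
The v0 engine has been removed in this version (VLLM_USE_V1=0 has no effect).
--distributed-executor-backend ray hits the same device-mapping error.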
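For reference, the mapping a multi-node launcher needs for this topology can be sketched as below. This is an interpretation of the error text, not vLLM's actual code: `local_device` is a hypothetical helper, and it assumes ranks are numbered node by node with 8 GPUs per node. Under that assumption, global rank 10 belongs to the second node as local device 2, while using 10 directly as a local index on an 8-card node would be out of bounds.

```python
def local_device(global_rank: int, gpus_per_node: int = 8, num_nodes: int = 2) -> tuple[int, int]:
    """Map a global worker rank to (node index, local device index).

    Assumes ranks are assigned contiguously, node by node.
    """
    world_size = gpus_per_node * num_nodes
    if not 0 <= global_rank < world_size:
        raise ValueError(f"rank {global_rank} is outside world size {world_size}")
    # Quotient selects the node, remainder selects the device on that node.
    return divmod(global_rank, gpus_per_node)


node, device = local_device(10)
print(f"global rank 10 -> node {node}, local device {device}")
```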

[Requests]
1. Is there a MACA-compatible GLM-5.1 W4A8 model (compressed-tensors format)?
2. What is the correct multi-node deployment method for this image?

shuai_chen

Members 384 posts

April 14, 2026, 11:10

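Dear developer, our replies are as follows.
A1: The C500 does not support FP8.
A2: The C500 does not support the msmodelslim format.
A3: Has MCCL communication between the nodes been tested and confirmed working (two machines, 16 cards)?
Replies to your requests:
1. Please quantize the model following the quantization documentation from the vLLM community. GLM-5.1 W8A8 model weights are being uploaded; please watch modelscope.cn/organization/metax-tech
2. Has MCCL communication between the nodes been tested and confirmed working (two machines, 16 cards)?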

jiaqian

Members 9 posts

April 14, 2026, 13:28

May I ask whether there is currently an amd image that can run GLM5 inference?

shuai_chen

Members 384 posts

April 14, 2026, 13:59

Dear developer, here is the image:

docker pull pub-registry1.metax-tech.com/ai-opentest/dev/vllm-metax:0.14.0-maca.ai3.5.3.102-torch2.8-py310-ubuntu22.04-amd64_glm_w4a8_full

jiaqian

Members 9 posts

April 14, 2026, 18:03

Thank you for your reply. Will you consider uploading a W4A8 quantized version later?