Docker image: vllm-metax:0.15.0-maca.ai3.5.3.203-torch2.8-py312-ubuntu22.04-amd64
mx-smi:
mx-smi version: 2.2.12
=================== MetaX System Management Interface Log ===================
Timestamp : Tue Apr 7 13:23:44 2026
Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.12 Kernel Mode Driver Version: 3.3.12 |
| MACA Version: 3.5.3.20 BIOS Version: 1.22.3.0 |
|------------------+-----------------+---------------------+----------------------|
| Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
| Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
|==================+=================+=====================+======================|
| 0 MetaX C550 | 0 N/A | 0000:2a:00.0 | 0% Disabled |
| NA / NA | 34C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 1 MetaX C550 | 1 N/A | 0000:3a:00.0 | 0% Disabled |
| NA / NA | 38C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 2 MetaX C550 | 2 N/A | 0000:4c:00.0 | 0% Disabled |
| NA / NA | 39C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 3 MetaX C550 | 3 N/A | 0000:5c:00.0 | 0% Disabled |
| NA / NA | 37C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 4 MetaX C550 | 4 N/A | 0000:aa:00.0 | 0% Disabled |
| NA / NA | 37C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 5 MetaX C550 | 5 N/A | 0000:ba:00.0 | 0% Disabled |
| NA / NA | 39C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 6 MetaX C550 | 6 N/A | 0000:ca:00.0 | 0% Disabled |
| NA / NA | 39C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 7 MetaX C550 | 7 N/A | 0000:da:00.0 | 0% Disabled |
| NA / NA | 35C N/A | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
End of Log
vLLM launch command:
nohup vllm serve /data/metax-tech/Qwen3.5-397B-A17B-W8A8 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.9 \
--max-model-len 262144 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--served-model-name Qwen3.5-W8A8 \
--speculative-config '{"method":"qwen3_next_mtp","num_speculative_tokens":2}' &
Error message:
root@h20gpu-20:/data/metax-tech/Qwen3.5-397B-A17B-W8A8# tail -f nohup.out
(APIServer pid=32) File "/opt/conda/lib/python3.12/site-packages/vllm/engine/arg_utils.py", line 12
(APIServer pid=32) return ModelConfig(
(APIServer pid=32) ^^^^^^^^^^^^
(APIServer pid=32) File "/opt/conda/lib/python3.12/site-packages/pydantic/_internal/_dataclasses.py
(APIServer pid=32) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instan
(APIServer pid=32) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=32) Value error, The checkpoint you are trying to load has model type `qwen3_5_moe` int, or because your version of Transformers is out of date.
(APIServer pid=32)
(APIServer pid=32) You can update Transformers with the command `pip install --upgrade transformers`.orts this model yet. In this case, you can get the most up-to-date code by installing Transformers frerror, input_value=ArgsKwargs((), {'model': ...rocessor_plugin': None}), input_type=ArgsKwargs]
(APIServer pid=32) For further information visit https://errors.pydantic.dev/2.12/v/value_error
Do I need to upgrade transformers?
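One way to narrow this down before upgrading anything is to compare the `model_type` in the checkpoint's `config.json` against the model types registered by the Transformers version installed in the container. The script below is a minimal sketch of that check, not a confirmed fix. The helper names (`read_model_type`, `installed_version`, `known_model_types`, `is_supported`, `diagnose`) are made up for illustration, and it assumes the checkpoint directory contains a standard `config.json`. It only inspects the Transformers registry; whether the vLLM build in this image supports the model is a separate question.

```python
import json
from importlib import metadata
from pathlib import Path


def read_model_type(model_dir):
    """Return the `model_type` field from <model_dir>/config.json, or None if absent."""
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f).get("model_type")


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def known_model_types():
    """Return the model types registered by the installed Transformers (empty set if unavailable)."""
    try:
        # CONFIG_MAPPING_NAMES maps model_type strings to config class names.
        from transformers.models.auto.configuration_auto import CONFIG_MAPPING_NAMES
    except ImportError:
        return set()
    return set(CONFIG_MAPPING_NAMES)


def is_supported(model_type, known):
    """True if `model_type` is set and present in the `known` collection."""
    return model_type is not None and model_type in known


def diagnose(model_dir, known=None):
    """Build a one-line summary; `known` can be passed in to skip importing Transformers."""
    model_type = read_model_type(model_dir)
    if known is None:
        known = known_model_types()
    version = installed_version("transformers") or "not installed"
    verdict = "recognized" if is_supported(model_type, known) else "not recognized"
    return f"model_type={model_type!r} is {verdict} by transformers ({version})"
```

Running `print(diagnose("/data/metax-tech/Qwen3.5-397B-A17B-W8A8"))` inside the container should show whether `qwen3_5_moe` is known to the installed Transformers. If it reports "not recognized", that would be consistent with the error above.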