Hardware environment
mx-smi version: 2.2.12
=================== MetaX System Management Interface Log ===================
Timestamp : Fri Apr 17 13:50:37 2026
Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.12 Kernel Mode Driver Version: 3.3.12 |
| MACA Version: 3.5.3.20 BIOS Version: 1.22.3.0 |
|------------------+-----------------+---------------------+----------------------|
| Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
| Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
|==================+=================+=====================+======================|
| 0 MetaX C550 | 0 N/A | 0000:2a:00.0 | 0% Disabled |
| NA / NA | 36C N/A | 60773/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 1 MetaX C550 | 1 N/A | 0000:3a:00.0 | 0% Disabled |
| NA / NA | 41C N/A | 60773/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 2 MetaX C550 | 2 N/A | 0000:4c:00.0 | 0% Disabled |
| NA / NA | 43C N/A | 60773/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 3 MetaX C550 | 3 N/A | 0000:5c:00.0 | 0% Disabled |
| NA / NA | 38C N/A | 60771/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 4 MetaX C550 | 4 N/A | 0000:aa:00.0 | 0% Disabled |
| NA / NA | 39C N/A | 60773/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 5 MetaX C550 | 5 N/A | 0000:ba:00.0 | 0% Disabled |
| NA / NA | 43C N/A | 60771/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 6 MetaX C550 | 6 N/A | 0000:ca:00.0 | 0% Disabled |
| NA / NA | 43C N/A | 60771/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 7 MetaX C550 | 7 N/A | 0000:da:00.0 | 0% Disabled |
| NA / NA | 37C N/A | 60771/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| 0 315 VLLM::Worker_TP 59902 |
| 1 316 VLLM::Worker_TP 59902 |
| 2 317 VLLM::Worker_TP 59902 |
| 3 318 VLLM::Worker_TP 59900 |
| 4 319 VLLM::Worker_TP 59902 |
| 5 320 VLLM::Worker_TP 59900 |
| 6 321 VLLM::Worker_TP 59900 |
| 7 322 VLLM::Worker_TP 59900 |
+---------------------------------------------------------------------------------+
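The Memory-Usage column above can be summarized per card with a short shell function. This is a minimal sketch, assuming the `<used>/<total> MiB` format that mx-smi 2.2.12 prints in the log above (other versions may differ); the function name `mem_usage_summary` is hypothetical. On this log it reports every card at about 92.7% (for example 60773/65536 MiB), which is above the 0.88 passed to --gpu-memory-utilization.

```shell
# Hypothetical helper: summarize per-GPU memory usage from mx-smi output.
# Assumption: memory is printed as "<used>/<total> MiB", as in the log above.
mem_usage_summary() {
  # Keep only the "<used>/<total> MiB" fields from stdin, then print one line
  # per GPU in order of appearance: index, used MiB, total MiB, percent used.
  grep -oE '[0-9]+/[0-9]+ MiB' \
    | awk -F'[/ ]' '{ printf "gpu%d %d %d %.1f%%\n", NR-1, $1, $2, 100*$1/$2 }'
}
```

Usage: `mx-smi | mem_usage_summary`.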
Docker image used
vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64
Model weights
Qwen3.5-397B-A17B-W8A8
Because of compatibility issues, CUDA Graph capture is disabled; VLLM_USE_V1=0 is set.
transformers was upgraded to 5.2.0.
Launch command:
vllm serve /data/metax-tech/Qwen3.5-397B-A17B-W8A8 \
--host 0.0.0.0 \
--port 8000 \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.88 \
--max-model-len 262144 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--served-model-name Qwen3.5-W8A8 \
--trust-remote-code \
--enforce-eager
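A rough per-GPU memory budget for this command can be worked out from --gpu-memory-utilization and --tensor-parallel-size. The sketch below is back-of-the-envelope arithmetic under loudly stated assumptions: 397B total parameters (read from the model name), 1 byte per parameter for the W8A8 weights, an even split across the 8 TP ranks, and no allowance for quantization scales, unquantized layers, activations or runtime overhead. `mem_budget_mib` is a hypothetical helper, not part of vLLM.

```shell
# Hypothetical helper: rough per-GPU memory budget in MiB.
# Assumptions (not from vLLM docs): weights take BYTES_PER_PARAM bytes per
# parameter and are split evenly across TP ranks; quantization scales,
# unquantized layers, activations and runtime overhead are ignored.
mem_budget_mib() {
  local total_mib=$1 util=$2 params_billion=$3 tp=$4 bytes_per_param=$5
  awk -v t="$total_mib" -v u="$util" -v p="$params_billion" \
      -v tp="$tp" -v b="$bytes_per_param" 'BEGIN {
    budget  = t * u                         # MiB vLLM may claim per GPU
    weights = p * 1e9 * b / tp / 1048576    # MiB of weights per GPU
    printf "budget=%.0f weights=%.0f remaining=%.0f\n", budget, weights, budget - weights
  }'
}
```

With 65536 MiB cards, 0.88 utilization, 397B parameters, TP=8 and 1 byte per parameter, `mem_budget_mib 65536 0.88 397 8 1` prints `budget=57672 weights=47326 remaining=10346`, i.e. roughly 10 GiB per GPU left for the KV cache and everything else, under these assumptions only.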
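Overall generation speed is currently very low, about 7.9 tokens/s. Are there any parameters that could speed this up or otherwise optimize it?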
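To compare different settings fairly, it helps to measure tokens/s the same way each time. A minimal sketch, assuming curl and python3 are available in the container and that the server's OpenAI-compatible /v1/completions response includes usage.completion_tokens; the helper names `tokens_per_second` and `measure_once` are hypothetical, and the post does not say how the 7.9 tokens/s figure was obtained. The elapsed time includes prefill, so this is an end-to-end rate rather than pure decode speed.

```shell
# Hypothetical helper: tokens/s = completion tokens / wall-clock seconds.
tokens_per_second() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n / s }'
}

# Sketch: time one non-streaming request against the server launched above.
# Assumes curl, python3 and GNU date; the elapsed time includes prefill.
measure_once() {
  local url=${1:-http://localhost:8000/v1/completions}
  local start end body tokens
  start=$(date +%s.%N)
  body=$(curl -s "$url" -H 'Content-Type: application/json' \
    -d '{"model": "Qwen3.5-W8A8", "prompt": "Hello", "max_tokens": 256}')
  end=$(date +%s.%N)
  # Pull usage.completion_tokens out of the JSON response.
  tokens=$(printf '%s' "$body" | python3 -c 'import json,sys; print(json.load(sys.stdin)["usage"]["completion_tokens"])')
  tokens_per_second "$tokens" "$(awk -v a="$start" -v b="$end" 'BEGIN { print b - a }')"
}
```

Usage: run `measure_once` a few times before and after changing a flag and compare the printed rates.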