Environment: a Job submitted in k8s, using the vllm-metax:0.15.0 image (full tag below).
containers:
  - name: master
    image: registry.sa-ryd.sensetime.com.sa/ccr-alg/vllm-metax:0.15.0-maca.ai3.5.3.203-torch2.8-py312-ubuntu22.04-amd64
    command: ["bash", "-lc", "sleep infinity"]
    resources:
      requests:
        cpu: '32'
        memory: '64Gi'
        metax-tech.com/gpu: '1'
      limits:
        cpu: '32'
        memory: '64Gi'
        metax-tech.com/gpu: '1'
mx-smi output:
mx-smi version: 2.2.9
=================== MetaX System Management Interface Log ===================
Timestamp : Wed Apr 8 19:20:41 2026
Attached GPUs : 1
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9 Kernel Mode Driver Version: 3.4.4 |
| MACA Version: 3.5.3.20 BIOS Version: 1.30.0.0 |
|------------------+-----------------+---------------------+----------------------|
| Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
| Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
|==================+=================+=====================+======================|
| 0 MetaX C500 | 0 Off | 0000:0c:00.0 | 0% Disabled |
| 57W / 350W | 36C P0 | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
Launch command:
Following the docs at vllm-metax.readthedocs.io/en/latest/getting_started/installation/maca.html#build-vllm,
I installed from git clone --branch v0.18.0-dev github.com/MetaX-MACA/vLLM-metax together with vllm 0.18.0. The install succeeded.
pip list | grep vllm
vllm 0.18.1.dev0+gbcf2be961.d20260408.empty
vllm_metax 0.18.0+gea0600.d20260408.maca3.5.3.20.torch2.8
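One detail worth calling out: the startup log below shows the image's bundled op library reporting `Vllm Op Version = 0.15.0`, while pip shows a 0.18.x vllm wheel. This kind of wheel-vs-op-library skew is exactly the situation where Python-side code can reference an op the compiled library never registered. A quick grep/awk check makes the skew visible (the sample output is hard-coded from this post so the snippet is self-contained; on the real box, feed it live `pip list` output and the startup log instead):

```shell
# Sample data copied from this post; replace with live output on the real box.
pip_out="vllm 0.18.1.dev0+gbcf2be961.d20260408.empty
vllm_metax 0.18.0+gea0600.d20260408.maca3.5.3.20.torch2.8"
oplib_line="Vllm Op Version = 0.15.0"

# Minor version of the installed vllm wheel (matches only the "vllm " line).
wheel_minor=$(printf '%s\n' "$pip_out" | awk '/^vllm /{split($2, v, "."); print v[2]}')
# Minor version the compiled op library was built against.
oplib_minor=$(printf '%s\n' "$oplib_line" | awk -F'= ' '{split($2, v, "."); print v[2]}')

if [ "$wheel_minor" != "$oplib_minor" ]; then
  echo "mismatch: vllm wheel 0.$wheel_minor vs op library 0.$oplib_minor"
fi
```
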
Then ran vllm:
vllm serve /data/ASR/qwen3-asr-hf/hub/models--Qwen--Qwen3-ASR-1.7B/snapshots/7278e1e70fe206f11671096ffdd38061171dd6e5 --served-model-name "Qwen3-ASR-1.7B" --host 0.0.0.0 --port 12212
Error output:
INFO 04-08 19:15:51 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 04-08 19:15:51 [__init__.py:46] - metax -> vllm_metax:register
INFO 04-08 19:15:51 [__init__.py:49] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
INFO 04-08 19:15:51 [__init__.py:239] Platform plugin metax is activated
INFO 04-08 19:15:51 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 04-08 19:15:51 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO Print the version information of mcoplib during compilation.
Version info:Mcoplib_Version = '0.4.0'
Build_Maca_Version = '3.5.3.18'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'fe3a7e2'
Vllm Op Version = 0.15.0
SGlang Op Version = 0.5.7 && 0.5.8
INFO Staring Check the current MACA version of the operating environment.
INFO: Release major.minor matching, successful:3.5.
Traceback (most recent call last):
File "/opt/conda/bin/vllm", line 5, in <module>
from vllm.entrypoints.cli.main import main
File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/__init__.py", line 4, in <module>
from vllm.entrypoints.cli.benchmark.mm_processor import (
File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/benchmark/mm_processor.py", line 5, in <module>
from vllm.benchmarks.mm_processor import add_cli_args, main
File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/mm_processor.py", line 26, in <module>
from vllm.benchmarks.datasets import (
File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/datasets.py", line 39, in <module>
from vllm.lora.utils import get_adapter_absolute_path
File "/opt/conda/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
from vllm.lora.layers import (
File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 4, in <module>
from vllm.lora.layers.column_parallel_linear import (
File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/column_parallel_linear.py", line 12, in <module>
from vllm.model_executor.layers.linear import (
File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 28, in <module>
from vllm.model_executor.layers.utils import (
File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 9, in <module>
from vllm import _custom_ops as ops
File "/opt/conda/lib/python3.12/site-packages/vllm/_custom_ops.py", line 95, in <module>
@register_fake("_C::scaled_fp4_quant.out")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 1069, in register
use_lib._register_fake(
File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 219, in _register_fake
handle = entry.fake_impl.register(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 50, in register
if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: operator _C::scaled_fp4_quant.out does not exist
Question: where in the chain does this operator error originate? The model never even starts loading — it looks like the process dies during initialization, while verifying that the custom operators are in place.
If I want to try out new models ahead of the official images and hit errors like this, is that a dead end where nothing can be done?
And if the driver on the k8s host is not upgraded, will simply running the newer vllm-metax:0.15.0-maca.ai3.5.3.203 image on top of it cause problems?