vLLM version issue

Members 2 posts

October 29, 2025, 12:03

I. Hardware and software information
1. Server vendor: Inspur
2. MetaX GPU model: METAX_C500_64G * 4
3. OS kernel version: 4.19.90-89.11.v2401.ky10.x86_64
4. CPU virtualization enabled: No
5. mx-smi output:
mx-smi
mx-smi version: 2.2.9

=================== MetaX System Management Interface Log ===================
Timestamp : Thu Oct 30 10:02:21 2025

Attached GPUs : 4
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9 Kernel Mode Driver Version: 3.3.12 |
| MACA Version: unknown BIOS Version: 1.29.1.0 |
|------------------+-----------------+---------------------+----------------------|
| Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
| Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
|==================+=================+=====================+======================|
| 0 MetaX C500 | 0 Off | 0000:43:00.0 | 0% Disabled |
| 54W / 350W | 31C P0 | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 1 MetaX C500 | 1 Off | 0000:44:00.0 | 0% Disabled |
| 55W / 350W | 31C P0 | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 2 MetaX C500 | 2 Off | 0000:45:00.0 | 0% Disabled |
| 60W / 350W | 33C P0 | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 3 MetaX C500 | 3 Off | 0000:47:00.0 | 0% Disabled |
| 57W / 350W | 33C P0 | 858/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+

End of Log

6. docker info output:
docker info
Client:
Version: 28.3.3
Context: default
Debug Mode: false

Server:
Containers: 2
Running: 2
Paused: 0
Stopped: 0
Images: 6
Server Version: 28.3.3
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
CDI spec directories:
/etc/cdi
/var/run/cdi
Swarm: inactive
Runtimes: io.containerd.runc.v2 runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
runc version: v1.2.6-0-ge89a299
init version: de40ad0
Security Options:
seccomp
Profile: builtin
Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
Operating System: Kylin Linux Advanced Server V10 (Halberd)
OSType: linux
Architecture: x86_64
CPUs: 128
Total Memory: 994.7GiB
Name: localhost.localdomain
ID: ca3e0563-e7fc-4f52-ad88-9655d1100756
Docker Root Dir: /data/docker
7. Image versions:
cr.metax-tech.com/public-library/maca-pytorch:3.2.1.4-torch2.6-py310-ubuntu24.04-amd64
cr.metax-tech.com/public-ai-release/maca/vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64
cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:maca.ai2.33.1.12-torch2.6-py310-ubuntu22.04-amd64
cr.metax-tech.com/public-ai-release/maca/vllm:maca.ai2.33.1.12-torch2.6-py310-ubuntu22.04-amd64
8. Container start command:
docker run -it --device=/dev/dri --device=/dev/mxcd --group-add video --name images --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v /data/models/:/data/models/ ce3f69501a52 /bin/bash
9. Command run inside the container:
vllm serve /data/models/Qwen/Qwen3-VL-30B-A3B-Instruct --served-model-name Qwen3-VL-30B --tensor-parallel-size 4 --swap-space 16 --trust-remote-code --dtype bfloat16 --gpu-memory-utilization 0.9 --max-model-len 30720 --port 18091
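II. Problem description
The server has four 64 GB MetaX C500 GPUs. Deploying MiniCPM-V-4_5 and Qwen3-VL-30B-A3B-Instruct both failed, because vLLM 0.10.0 does not support these two models. When will MetaX's official vLLM 0.11 image be released? Qwen3-Image did not work either.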
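Since the question hinges on which vLLM release is inside the image, a quick check from within the container can confirm it before any model is loaded. The following is a minimal sketch, not MetaX tooling: the helper names are hypothetical, and the minimum of 0.11.0 is an assumption taken from the post rather than a verified requirement for each model.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True when `installed` is at least `minimum`, comparing numeric parts only."""
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    # Pad the shorter tuple with zeros so "0.11" and "0.11.0" compare as equal.
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def installed_vllm_version():
    """Return the installed vLLM version string, or None if vLLM is absent."""
    try:
        return metadata.version("vllm")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption from the post: 0.11.0 is the first release able to serve these models.
    required = "0.11.0"
    found = installed_vllm_version()
    if found is None:
        print("vLLM is not installed in this environment")
    elif meets_minimum(found, required):
        print(f"vLLM {found} meets the assumed minimum {required}")
    else:
        print(f"vLLM {found} is older than the assumed minimum {required}")
```

Only the leading numeric part of the version is compared, so a vendor suffix on the version string (if the MetaX build carries one) is ignored.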


shuai_chen

Members 314 posts

October 30, 2025, 18:23


Dear developer, please provide the detailed error log for this problem.


shuai_chen

Members 314 posts

October 30, 2025, 18:25


Dear developer, for the vLLM 0.11 image, please watch the developer image download center for updates.


rootcj

Members 2 posts

October 30, 2025, 18:26


RuntimeError: Worker failed with error 'CUDA out of memory. Tried to allocate 288.00 MiB. GPU 0 has a total capacity of 63.59 GiB of which 0 bytes is free. Of the allocated memory 57.90 GiB is allocated by PyTorch, and 45.49 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (pytorch.org/docs/stable/notes/cuda.html#environment-variables)', please check the stack trace above for the root cause
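The numbers in this message are easier to reason about once they are in one unit. The following is a rough sketch that parses the message text quoted above and converts every figure to MiB; the function name and the "untracked" field are illustrative, not part of PyTorch or vLLM.

```python
import re

# Conversion factors to MiB for the units PyTorch prints in this message.
TO_MIB = {"bytes": 1 / 2**20, "KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}

SIZE = r"([\d.]+) (bytes|KiB|MiB|GiB)"
OOM_PATTERN = re.compile(
    rf"Tried to allocate {SIZE}\. "
    rf"GPU (\d+) has a total capacity of {SIZE} of which {SIZE} is free\. "
    rf"Of the allocated memory {SIZE} is allocated by PyTorch, "
    rf"and {SIZE} is reserved by PyTorch but unallocated"
)

# Text copied from the error posted in this thread.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 288.00 MiB. "
    "GPU 0 has a total capacity of 63.59 GiB of which 0 bytes is free. "
    "Of the allocated memory 57.90 GiB is allocated by PyTorch, "
    "and 45.49 MiB is reserved by PyTorch but unallocated."
)


def parse_oom(message):
    """Extract the sizes from a PyTorch OOM message, all expressed in MiB."""
    match = OOM_PATTERN.search(message)
    if match is None:
        return None
    g = match.groups()

    def mib(value, unit):
        return float(value) * TO_MIB[unit]

    report = {
        "gpu": int(g[2]),
        "requested_mib": mib(g[0], g[1]),
        "total_mib": mib(g[3], g[4]),
        "free_mib": mib(g[5], g[6]),
        "allocated_mib": mib(g[7], g[8]),
        "reserved_unallocated_mib": mib(g[9], g[10]),
    }
    # Memory that is neither free nor tracked by PyTorch's allocator on this device.
    report["untracked_mib"] = (
        report["total_mib"]
        - report["free_mib"]
        - report["allocated_mib"]
        - report["reserved_unallocated_mib"]
    )
    return report


if __name__ == "__main__":
    for key, value in parse_oom(MESSAGE).items():
        print(f"{key}: {value}")
```

Two things stand out from the figures themselves. The reserved-but-unallocated amount is only 45.49 MiB, so the fragmentation hint in the message (`expandable_segments:True`) looks unlikely to be the main lever here. The remainder that PyTorch does not track could belong to other processes or to allocations made outside PyTorch's caching allocator; the post does not say which.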


shuai_chen

Members 314 posts

October 30, 2025, 18:27


Dear developer, the error you posted is an out-of-memory error: the GPU does not have enough free memory.
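The reply stops at the diagnosis. One common next step, offered here as an assumption rather than MetaX guidance, is to relaunch with a smaller memory footprint. The sketch below only assembles the command line; it starts from the arguments in the original post and applies overrides. The override values (0.85, 16384, 8) are purely illustrative, and whether they resolve the error on this hardware and image is untested.

```python
import shlex

# Arguments copied from the vllm serve command in the original post.
MODEL_PATH = "/data/models/Qwen/Qwen3-VL-30B-A3B-Instruct"
BASE_ARGS = {
    "--served-model-name": "Qwen3-VL-30B",
    "--tensor-parallel-size": "4",
    "--swap-space": "16",
    "--dtype": "bfloat16",
    "--gpu-memory-utilization": "0.9",
    "--max-model-len": "30720",
    "--port": "18091",
}


def build_serve_command(overrides=None, flags=("--trust-remote-code",)):
    """Return a shell-quoted vllm serve command with `overrides` applied."""
    args = dict(BASE_ARGS)
    args.update(overrides or {})
    command = ["vllm", "serve", MODEL_PATH]
    for name, value in args.items():
        command += [name, str(value)]
    command += list(flags)
    return shlex.join(command)


if __name__ == "__main__":
    # Illustrative values only: a lower memory fraction, a shorter context,
    # and fewer concurrent sequences, each of which reduces memory pressure.
    print(
        build_serve_command(
            {
                "--gpu-memory-utilization": 0.85,
                "--max-model-len": 16384,
                "--max-num-seqs": 8,
            }
        )
    )
```

Changing one value at a time makes it easier to see which setting, if any, gets the server past startup.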