Running inference, embedding, and reranking models at the same time gets stuck on GPU memory allocation

hjq
Members 2 posts

2026-02-03 09:58

I'm hitting exactly the same problem: with three models deployed on two cards, one of them always fails to deploy.

Members 18 posts

2026-02-03 11:59

Hi, I've also tried the qwen2.5 7B model, and it still hangs:

(EngineCore_DP0 pid=34195) INFO 02-03 11:57:19 [parallel_state.py:1208] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.196.210.3:35845 backend=nccl
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=34195) INFO 02-03 11:57:19 [parallel_state.py:1394] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0

dzdwd@dzdwd-server:~$ mx-smi
mx-smi  version: 2.2.9

=================== MetaX System Management Interface Log ===================
Timestamp                                         : Tue Feb  3 11:59:24 2026

Attached GPUs                                     : 2
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9                       Kernel Mode Driver Version: 3.4.4            |
| MACA Version: 3.3.0.15             BIOS Version: 1.29.1.0                       |
|------------------+-----------------+---------------------+----------------------|
| Board       Name | GPU   Persist-M | Bus-id              | GPU-Util      sGPU-M |
| Pwr:Usage/Cap    | Temp       Perf | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0     MetaX N260 | 0           Off | 0000:41:00.0        | 0%          Disabled |
| 51W / 225W       | 43C          P9 | 22079/65536 MiB     | Available            |
+------------------+-----------------+---------------------+----------------------+
| 1     MetaX N260 | 1           Off | 0000:c1:00.0        | 0%          Disabled |
| 47W / 225W       | 40C          P9 | 22063/65536 MiB     | Available            |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process:                                                                        |
|  GPU                    PID         Process Name                 GPU Memory     |
|                                                                  Usage(MiB)     |
|=================================================================================|
|  0                  1384499         VLLM::Worker_TP              21394          |
|  0                  1400062         VLLM::EngineCor              16             |
|  1                  1384500         VLLM::Worker_TP              21394          |
+---------------------------------------------------------------------------------+

xiaoo
Members 18 posts

2026-02-03 12:00

Which card are you using, and which models are you deploying? I can only run one at a time.

huxia
Members 1 post

2026-02-03 14:40

C500 64G here, also running three models on two cards. No two of the models can share the same card.

hjq
Members 2 posts

2026-02-03 15:15

qwen3-embedding, qwen3-reranker, qwen3-30b-a3b
Thread has been moved from 解决中 (In Progress).
- By shuai_chen on 2026-02-11 14:38.

zhangtaoshan
Members 5 posts

2026-03-17 15:16

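Any progress on this? I see the same problem with a multimodal model when the ViT and the LLM are placed on the same card. Could it be the same kind of issue?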

@huxia has written:

C500 64G here, also running three models on two cards. No two of the models can share the same card.

zhangtaoshan
Members 5 posts

2026-03-17 17:16

Has anyone found out roughly what is causing this?

Xcvbb
Members 2 posts

2026-03-17 22:49

On a single C550 (64G) under K8s, running rerank and embedding together also hangs. How did you eventually solve it?

Xcvbb
Members 2 posts

2026-03-17 22:51

Have you found a solution? I tried both the sGPU and the shared GPU approaches from the docs, and neither worked.

xiaoo
Members 18 posts

2026-03-18 10:28

Solved: you need to use GPU partitioning (sGPU).

zhangtaoshan
Members 5 posts

2026-03-18 14:19

From the docs, it looks like the GPU memory has to be partitioned manually first?

xiaoo
Members 18 posts

2026-03-18 18:55

Exactly.

Start the model with a command like the following:

docker run -itd \
--restart always \
--device=/dev/mxcd \
--device=/dev/sgpu000 \
--device=/dev/sgpu001 \
--device=/dev/dri/renderD128 \
--device=/dev/dri/renderD129 \
--group-add video \
--network=host \
--name llm \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--shm-size 110gb \
--ulimit memlock=-1 \
-v /models:/models \
cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.13.0-maca.ai3.3.0.303-torch2.8-py310-ubuntu22.04-amd64 /bin/bash
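docker run should refuse a --device path that does not exist on the host, so the command above only works once the sGPU partitions (here /dev/sgpu000 and /dev/sgpu001) have actually been created. A quick check beforehand gives a clearer hint about what is missing. The sketch below is a hypothetical helper, not part of the MetaX tooling: it only tests that each listed node path exists and reports the ones that don't.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not a MetaX tool): verify that every device
# node passed to docker run exists on the host before starting the container.
# Prints each missing path to stderr; returns 0 if all exist, 1 otherwise.
check_device_nodes() {
  local node
  local status=0
  for node in "$@"; do
    if [ ! -e "$node" ]; then
      echo "missing device node: $node" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage, with the node list copied from the command above: check_device_nodes /dev/mxcd /dev/sgpu000 /dev/sgpu001 /dev/dri/renderD128 /dev/dri/renderD129 && docker run ... . A missing /dev/sgpuNNN entry most likely means the partitions have not been created yet.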

zhangtaoshan
Members 5 posts

2026-03-19 10:02

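So, going by the launch command, after partitioning you still need to pass the sGPU nodes with --device when starting the container, right?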

xiaoo
Members 18 posts

2026-03-19 12:15

Yes, that's right.
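When several models each get their own partitions, the --device lists can be generated rather than typed out for every container. A minimal sketch, assuming the three-digit, zero-padded /dev/sgpuNNN naming seen in the command above; the function name sgpu_device_flags is hypothetical, and which indices belong to which model depends entirely on how the card was partitioned.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn sGPU indices into docker --device flags.
# Assumes the /dev/sgpuNNN naming seen in this thread (sgpu000, sgpu001, ...).
sgpu_device_flags() {
  local idx
  local -a flags=()
  for idx in "$@"; do
    # Zero-pad the index to three digits: 0 -> 000, 12 -> 012
    flags+=("--device=/dev/sgpu$(printf '%03d' "$idx")")
  done
  # Join with single spaces so the result can be spliced into a docker run line
  printf '%s\n' "${flags[*]}"
}
```

For example, docker run -itd ... $(sgpu_device_flags 0 1) ... expands to the same two sGPU flags as the command above. The substitution is deliberately left unquoted there so the shell splits it into separate arguments.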

zhangtaoshan
Members 5 posts

2026-03-19 12:36

OK. (padding to meet the minimum post length)

Irony
Members 3 posts

2026-04-07 17:33

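Hi, how did you solve it? I have a single C500 with 64G of GPU memory. I used sGPU to split it into two partitions and ran qwen3_8B on one of them. It starts successfully, but when I test it, the model output is either empty or just a few Chinese characters. Any help would be greatly appreciated.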