I'm hitting exactly the same problem: three models deployed across two cards, and one of them always fails to come up.
Hi, I've tried the Qwen2.5 7B model and it still hangs:
(EngineCore_DP0 pid=34195) INFO 02-03 11:57:19 [parallel_state.py:1208] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.196.210.3:35845 backend=nccl
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=34195) INFO 02-03 11:57:19 [parallel_state.py:1394] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
dzdwd@dzdwd-server:~$ mx-smi
mx-smi version: 2.2.9
=================== MetaX System Management Interface Log ===================
Timestamp : Tue Feb 3 11:59:24 2026
Attached GPUs : 2
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9 Kernel Mode Driver Version: 3.4.4 |
| MACA Version: 3.3.0.15 BIOS Version: 1.29.1.0 |
|------------------+-----------------+---------------------+----------------------|
| Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
| Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
|==================+=================+=====================+======================|
| 0 MetaX N260 | 0 Off | 0000:41:00.0 | 0% Disabled |
| 51W / 225W | 43C P9 | 22079/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
| 1 MetaX N260 | 1 Off | 0000:c1:00.0 | 0% Disabled |
| 47W / 225W | 40C P9 | 22063/65536 MiB | Available |
+------------------+-----------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| 0 1384499 VLLM::Worker_TP 21394 |
| 0 1400062 VLLM::EngineCor 16 |
| 1 1384500 VLLM::Worker_TP 21394 |
+---------------------------------------------------------------------------------+
What card are you on, and which models are you deploying? I can only run one at a time.
C500 64G here, also running three models on two cards; no two of the models can be placed on the same card:
qwen3-embedding,qwen3-reranker,qwen3-30b-a3b
Any progress on this? I'm seeing the same thing with a multimodal model: putting the ViT and the LLM on the same card hangs in the same way. Feels like it could be the same underlying issue?
Does anyone know what's actually going wrong here? Any findings?
On a single C550 64G under K8s, running rerank and embedding together hangs the same way. How did you end up solving it?
Did you ever find a fix? I've tried both the sGPU and shared GPU options from the docs, and neither works.
Solved: you need to use the GPU partitioning feature, sGPU.
From the docs it looks like you have to manually partition the GPU memory first?
That's right.
Launch the model with a command along these lines:
docker run -itd \
--restart always \
--device=/dev/mxcd \
--device=/dev/sgpu000 \
--device=/dev/sgpu001 \
--device=/dev/dri/renderD128 \
--device=/dev/dri/renderD129 \
--group-add video \
--network=host \
--name llm \
--security-opt seccomp=unconfined \
--security-opt apparmor=unconfined \
--shm-size 110gb \
--ulimit memlock=-1 \
-v /models:/models \
cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.13.0-maca.ai3.3.0.303-torch2.8-py310-ubuntu22.04-amd64 /bin/bash
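Once the container is up, each model instance can be pinned to its own sGPU slice. Below is a minimal sketch of what the launch might look like, assuming the MACA runtime maps the two slices to device indices 0 and 1 and honors CUDA_VISIBLE_DEVICES; the ports and model paths are placeholders, not from this thread:

# Hypothetical sketch, run inside the container (docker exec -it llm bash):
# one vLLM instance per sGPU slice. The device-index mapping and
# CUDA_VISIBLE_DEVICES support on MACA are assumptions; check the MetaX
# vLLM docs if the slices are exposed differently.
CUDA_VISIBLE_DEVICES=0 vllm serve /models/qwen3-embedding \
    --port 8000 --gpu-memory-utilization 0.9 &
CUDA_VISIBLE_DEVICES=1 vllm serve /models/qwen3-reranker \
    --port 8001 --gpu-memory-utilization 0.9 &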
Hmm, so judging by the launch command, after partitioning you still need to pass the sGPU devices with --device at startup, right?
Yep, exactly.
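You can also sanity-check that the partition actually produced the device nodes before launching. A quick check, assuming the node names follow the /dev/sgpu000 pattern used in the command above:

# The sGPU slices should appear as device nodes after partitioning
ls -l /dev/sgpu*    # expect /dev/sgpu000, /dev/sgpu001, ...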
Got it, thanks.
Hi, how did you solve it in the end? I'm on a single C500 with 64G of VRAM. I split it into two sGPUs and ran qwen3_8B on one of them; it starts up fine, but at test time the model output is either empty or just a few Chinese characters. Any pointers would be much appreciated.