Server:
H3C server
Chip:
C550
Operating system:
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="www.ubuntu.com/"
SUPPORT_URL="help.ubuntu.com/"
BUG_REPORT_URL="bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Model being launched: qwen3-vl-235b
Container launch command: docker run \
--network=host \
--device /dev/dri:/dev/dri \
--device /dev/mxcd:/dev/mxcd \
--group-add video \
--runtime=runc \
--detach=true \
--shm-size 100gb \
--ulimit memlock=-1 \
-it \
cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.12.0-maca.ai3.3.0.204-torch2.8-py312-ubuntu22.04-amd64
Model launch command: vllm serve qwen3-vl-235b-a22 --tensor-parallel-size 8 --max-model-len 1024 --enable-chunked-prefill --max-num-batched-tokens 2048 --trust-remote-code --gpu-memory-utilization 0.80 --mm-processor-cache-gb 0
Startup appears to hang at the multi-GPU communication stage. How can this be resolved?