MetaX-Tech Developer Forum

rootcj

  • Members
  • Joined July 22, 2025

rootcj has started 1 thread.

vllm version issue (in progress) 2025-10-29 12:03

I. Hardware and Software Information
1. Server manufacturer: 浪潮 (Inspur)
2. MetaX GPU model: METAX_C500_64G *4
3. OS kernel version: 4.19.90-89.11.v2401.ky10.x86_64
4. CPU virtualization enabled: No
5. mx-smi output:
    mx-smi
    mx-smi version: 2.2.9

    =================== MetaX System Management Interface Log ===================
    Timestamp : Thu Oct 30 10:02:21 2025

    Attached GPUs : 4
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9                Kernel Mode Driver Version: 3.3.12                  |
| MACA Version: unknown       BIOS Version: 1.29.1.0                              |
|------------------+-----------------+---------------------+----------------------|
| Board Name       | GPU   Persist-M | Bus-id              | GPU-Util    sGPU-M   |
| Pwr:Usage/Cap    | Temp      Perf  | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0  MetaX C500    | 0         Off   | 0000:43:00.0        | 0%        Disabled   |
| 54W / 350W       | 31C       P0    | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 1  MetaX C500    | 1         Off   | 0000:44:00.0        | 0%        Disabled   |
| 55W / 350W       | 31C       P0    | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 2  MetaX C500    | 2         Off   | 0000:45:00.0        | 0%        Disabled   |
| 60W / 350W       | 33C       P0    | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 3  MetaX C500    | 3         Off   | 0000:47:00.0        | 0%        Disabled   |
| 57W / 350W       | 33C       P0    | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process:                                                                        |
| GPU    PID    Process Name                                        GPU Memory    |
|                                                                   Usage(MiB)    |
|=================================================================================|
| no process found                                                                |
+---------------------------------------------------------------------------------+

    End of Log

6. docker info output:
docker info
Client:
 Version:    28.3.3
 Context:    default
 Debug Mode: false

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 6
 Server Version: 28.3.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 05044ec0a9a75232cad458027ca83437aae3f4da
 runc version: v1.2.6-0-ge89a299
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
 Operating System: Kylin Linux Advanced Server V10 (Halberd)
 OSType: linux
 Architecture: x86_64
 CPUs: 128
 Total Memory: 994.7GiB
 Name: localhost.localdomain
 ID: ca3e0563-e7fc-4f52-ad88-9655d1100756
 Docker Root Dir: /data/docker
7. Image versions:
    cr.metax-tech.com/public-library/maca-pytorch:3.2.1.4-torch2.6-py310-ubuntu24.04-amd64
    cr.metax-tech.com/public-ai-release/maca/vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64
    cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:maca.ai2.33.1.12-torch2.6-py310-ubuntu22.04-amd64
    cr.metax-tech.com/public-ai-release/maca/vllm:maca.ai2.33.1.12-torch2.6-py310-ubuntu22.04-amd64
8. Container start command:
    docker run -it --device=/dev/dri --device=/dev/mxcd --group-add video --name images --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v /data/models/:/data/models/ ce3f69501a52 /bin/bash
9. Command executed inside the container:
    vllm serve /data/models/Qwen/Qwen3-VL-30B-A3B-Instruct --served-model-name Qwen3-VL-30B --tensor-parallel-size 4 --swap-space 16 --trust-remote-code --dtype bfloat16 --gpu-memory-utilization 0.9 --max-model-len 30720 --port 18091
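The docker run command in step 8 maps several device nodes (/dev/dri, /dev/mxcd, /dev/mem) into the container; if any of them are missing on the host, the container cannot see the GPUs at all, which is worth ruling out before blaming vLLM. A minimal pre-launch sanity check, as a hypothetical stdlib-only helper (not part of the MetaX tooling), might look like:

```python
import os

# Device nodes the docker run command in step 8 passes via --device.
REQUIRED_DEVICES = ["/dev/dri", "/dev/mxcd", "/dev/mem"]

def missing_devices(paths):
    """Return the subset of device paths that do not exist on this host."""
    return [p for p in paths if not os.path.exists(p)]

if __name__ == "__main__":
    missing = missing_devices(REQUIRED_DEVICES)
    if missing:
        print("Missing device nodes:", ", ".join(missing))
    else:
        print("All required device nodes are present.")
```

If this reports missing nodes, the GPU kernel driver (version 3.3.12 in the mx-smi output above) is the thing to investigate, not the container image.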
II. Problem Description
The server has four 64 GB MetaX C500 GPUs. Deploying MiniCPM-V-4_5 and Qwen3-VL-30B-A3B-Instruct both failed: vLLM 0.10.0 does not support these two models. When will the official MetaX vLLM be upgraded to 0.11? Qwen3-Image was also unsuccessful.
