MetaX-Tech Developer Forum 论坛首页
  • 沐曦开发者
search
Sign in

lgp001

  • Members
  • Joined 2026年2月24日
  • message 帖子
  • forum 主题
  • favorite 关注者
  • favorite_border Follows
  • person_outline 详细信息

lgp001 has posted 7 messages.

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 15:32

    好的,我看错了

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 15:31

    请您查看我的回复,已经选择mx-smi -u /lib/firmware/metax/$chip_type/mxc500/mxvbios-1.31.1.0-1078-C550.bin -t 600不是吗?

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 15:24

    mx-smi -u /lib/firmware/metax/$chip_type/mxc500/mxvbios-1.31.1.0-1078-C550.bin -t 600
    参照教程,这个可以吗?

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 15:09

    root@chaoxun:/lib/firmware/metax/mxc500# mx-smi -u /lib/firmware/metax/$chip_type/mxc500/mxvbios-1.31.1.0-1078-C550.bin -t 600
    mx-smi version: 2.2.12
    Hint: -u only support upgrading vbios for all devices.
    GPU#0 vbios-upgrade Ioctl failed: Chip info mismatch

    root@chaoxun:/lib/firmware/metax/mxc500# mx-smi
    mx-smi version: 2.2.12

    =================== MetaX System Management Interface Log ===================
    Timestamp : Thu Feb 26 15:05:22 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.12 Kernel Mode Driver Version: 3.4.4 |
    | MACA Version: 3.5.3.17 BIOS Version: 1.26.1.0 |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
    | Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
    |==================+=================+=====================+======================|
    | 0 MetaX C500 | 0 Off | 0000:08:00.0 | 0% Disabled |
    | 60W / 350W | 47C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 1 MetaX C500 | 1 Off | 0000:09:00.0 | 0% Disabled |
    | 63W / 350W | 48C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 2 MetaX C500 | 2 Off | 0000:0e:00.0 | 0% Disabled |
    | 58W / 350W | 45C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 3 MetaX C500 | 3 Off | 0000:11:00.0 | 0% Disabled |
    | 61W / 350W | 46C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 4 MetaX C500 | 4 Off | 0000:32:00.0 | 0% Disabled |
    | 58W / 350W | 45C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 5 MetaX C500 | 5 Off | 0000:38:00.0 | 0% Disabled |
    | 59W / 350W | 46C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 6 MetaX C500 | 6 Off | 0000:3b:00.0 | 0% Disabled |
    | 61W / 350W | 47C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 7 MetaX C500 | 7 Off | 0000:3c:00.0 | 0% Disabled |
    | 63W / 350W | 49C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process: |
    | GPU PID Process Name GPU Memory |
    | Usage(MiB) |
    |=================================================================================|
    | no process found |
    +---------------------------------------------------------------------------------+

    你好,我更新了驱动跟sdk后,再更新固件报错了,是哪里操作错了吗

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 11:53

    请问固件在哪下载安装,我只找到了驱动跟sdk

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月26日 11:29

    一、软硬件信息
    1.服务器厂家:
    New H3C Technologies Co., Ltd.
    2.沐曦GPU型号:MetaX C500
    3.操作系统内核版本:5.19.0-46-generic
    4.是否开启CPU虚拟化:
    5.mx-smi回显:
    root@chaoxun:/home/sts# mx-smi
    mx-smi version: 2.2.9

    =================== MetaX System Management Interface Log ===================
    Timestamp : Thu Feb 26 11:24:27 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.9 Kernel Mode Driver Version: 3.4.4 |
    | MACA Version: 3.0.0.8 BIOS Version: 1.26.1.0 |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
    | Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
    |==================+=================+=====================+======================|
    | 0 MetaX C500 | 0 Off | 0000:08:00.0 | 0% Disabled |
    | 72W / 350W | 50C P9 | 59360/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 1 MetaX C500 | 1 Off | 0000:09:00.0 | 0% Disabled |
    | 76W / 350W | 52C P9 | 59360/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 2 MetaX C500 | 2 Off | 0000:0e:00.0 | 0% Disabled |
    | 72W / 350W | 49C P9 | 59360/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 3 MetaX C500 | 3 Off | 0000:11:00.0 | 0% Disabled |
    | 75W / 350W | 50C P9 | 59360/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 4 MetaX C500 | 4 Off | 0000:32:00.0 | 0% Disabled |
    | 58W / 350W | 45C P0 | 860/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 5 MetaX C500 | 5 Off | 0000:38:00.0 | 0% Disabled |
    | 59W / 350W | 46C P0 | 860/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 6 MetaX C500 | 6 Off | 0000:3b:00.0 | 0% Disabled |
    | 61W / 350W | 48C P0 | 860/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 7 MetaX C500 | 7 Off | 0000:3c:00.0 | 0% Disabled |
    | 63W / 350W | 50C P0 | 860/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process: |
    | GPU PID Process Name GPU Memory |
    | Usage(MiB) |
    |=================================================================================|
    | 0 54852 VLLM::Worker_TP 58496 |
    | 1 54853 VLLM::Worker_TP 58496 |
    | 2 54854 VLLM::Worker_TP 58496 |
    | 3 54855 VLLM::Worker_TP 58496 |
    +---------------------------------------------------------------------------------+

    6.docker info回显:
    Client:
    Version: 24.0.9
    Context: default
    Debug Mode: false

    Server:
    Containers: 123
    Running: 63
    Paused: 0
    Stopped: 60
    Images: 137
    Server Version: 24.0.9
    Storage Driver: overlay2
    Backing Filesystem: extfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: false
    Logging Driver: json-file
    Cgroup Driver: systemd
    Cgroup Version: 2
    Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
    Swarm: inactive
    Runtimes: metax runc
    Default Runtime: metax
    Init Binary: docker-init
    containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8
    runc version: v1.1.12-0-g51d5e94
    init version: de40ad0
    Security Options:
    apparmor
    seccomp
    Profile: builtin
    cgroupns
    Kernel Version: 5.19.0-46-generic
    Operating System: Ubuntu 22.04.5 LTS
    OSType: linux
    Architecture: x86_64
    CPUs: 160
    Total Memory: 1.968TiB
    Name: chaoxun
    ID: d55e8de3-5871-4a10-814a-87f1c884bf87
    Docker Root Dir: /mnt/disk0/sagesuite/data/docker
    Debug Mode: false
    Experimental: false
    Insecure Registries:
    127.0.0.0/8
    Registry Mirrors:
    docker.mirrors.sjtug.sjtu.edu.cn/
    docker.m.daocloud.io/
    docker.1panel.live/
    Live Restore Enabled: false
    Product License: Community Engine

    7.镜像版本:cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:1.0.0-maca.ai3.3.0.11-torch2.6-py312-ubuntu22.04-amd64
    8.启动容器命令:
    command:
    - /opt/conda/bin/vllm
    - serve
    # 3. 模型路径/名称:改为你要运行的Qwen3-VL-30B-A3B模型
    - /models # 假设模型文件存放在宿主机的这个子目录下
    - --port
    - "8000"
    # 4. 张量并行:根据你的显卡数量和模型需求调整。MoE模型建议开启专家并行。
    - --tensor-parallel-size
    - "4" # 假设仍然使用4张卡
    # 5. 【关键】MoE模型必须设置:如果tensor-parallel-size无法被专家数整除,必须启用此选项
    - --enable-expert-parallel
    # 6. 数据类型:官方推荐使用bfloat16以获得最佳性能
    - --dtype
    - "bfloat16" # 从float16改为bfloat16
    # 7. 模型长度:可根据需求调整,例如设为32768或更高
    - --max-model-len
    - "16384" # 提升上下文长度以发挥模型潜力 [citation:10]
    # 8. 显存利用率:可以保持0.8-0.9,具体看显存大小
    - --gpu-memory-utilization
    - "0.9"
    - --use-v1-engine
    - "false"
    # 9. 服务名称:可自定义
    - --served-model-name
    - qwen3-vl-30b-a3b
    # 10. 多模态相关参数:根据实际硬件情况决定是否保留 --enforce-eager
    # 如果使用FlashAttention,可以尝试去掉 --enforce-eager
    - --enforce-eager # 如果不支持flash-attn或遇到问题,保留此项
    - --no-enable-chunked-prefill
    - --no-enable-prefix-caching
    - --block-size
    - "16"
    - --disable-log-requests
    - --disable-log-stats
    - --disable-custom-all-reduce
    - --disable-custom-all-reduce
    - --max-num-seqs
    - "1" # 强制串行处理
    9.容器内执行命令:
    二、问题现象
    沐曦C500部署qwen3-vl-30b-a3b,第一次可以识别,第二次识别图片时报错。

  • See post chevron_right
    lgp001
    Members
    沐曦C500部署qwen3-vl-30b-a3b,第二次识别图片时报错 已解决 2026年2月24日 14:39

    附件一为相关日志
    附件二为部署的k8s的yaml文件

  • 沐曦开发者论坛
powered by misago