MetaX-Tech Developer Forum

Tao_01

  • Members
  • Joined April 22, 2026

Tao_01 has posted 8 messages.

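  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 24, 2026 19:01

    I. Hardware and software information
    1. Server vendor: AI all-in-one appliance
    2. MetaX GPU model: N260
    3. OS and kernel version: Kylin Linux Advanced Server V10 (Halberd) 4.19.90-89.11.v2401.ky10.x86_64
    4. CPU virtualization enabled: No
    5. mx-smi output: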
    mx-smi version: 2.2.12

    =================== MetaX System Management Interface Log ===================
    Timestamp : Fri Apr 24 18:19:35 2026

    Attached GPUs : 1
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.12                        Kernel Mode Driver Version: 3.7.11         |
    | MACA Version: unknown                BIOS Version: 1.25.0.0                     |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name       | GPU Persist-M   | Bus-id              | GPU-Util      sGPU-M |
    | Pwr:Usage/Cap    | Temp       Perf | Memory-Usage        | GPU-State            |
    |==================+=================+=====================+======================|
    | 0 MetaX N260     | 0   N/A         | 0000:06:00.0        | 0%          Disabled |
    | NA / NA          | 66C         N/A | 666/65536 MiB       | Available            |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU      PID          Process Name                         GPU Memory           |
    |                                                            Usage(MiB)           |
    |=================================================================================|
    | no process found                                                                |
    +---------------------------------------------------------------------------------+

    6. docker info output:
    [root@localhost ~]# docker info
    Client:
    Version: 27.0.3
    Context: default
    Debug Mode: false
    Plugins:
    buildx: Docker Buildx (Docker Inc.)
    Version: v0.33.0
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
    compose: Docker Compose (Docker Inc.)
    Version: v5.1.3
    Path: /usr/libexec/docker/cli-plugins/docker-compose

    Server:
    Containers: 2
    Running: 2
    Paused: 0
    Stopped: 0
    Images: 2
    Server Version: 27.0.3
    Storage Driver: overlay2
    Backing Filesystem: xfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: false
    Logging Driver: json-file
    Cgroup Driver: cgroupfs
    Cgroup Version: 1
    Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
    Swarm: inactive
    Runtimes: io.containerd.runc.v2 runc
    Default Runtime: runc
    Init Binary: docker-init
    containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
    runc version:
    init version: de40ad0
    Security Options:
    seccomp
    Profile: builtin
    Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
    Operating System: Kylin Linux Advanced Server V10 (Halberd)
    OSType: linux
    Architecture: x86_64
    CPUs: 16
    Total Memory: 124.6GiB
    Name: localhost.localdomain
    ID: 60d2904c-d86a-4d04-8844-a309991ee526
    Docker Root Dir: /data/docker
    Debug Mode: false
    Experimental: false
    Insecure Registries:
    127.0.0.0/8
    Registry Mirrors:
    docker.m.daocloud.io/
    Live Restore Enabled: false
    Product License: Community Engine

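    7. Image version: cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64
    8. Container launch command: see attachment qwen3.5-9b.yaml
    9. Command run inside the container: see attachment qwen3.5-9b.yaml
    II. Problem description
    I tested on a different machine and still get only 2 tokens/s.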

  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 24, 2026 09:22

    No noticeable change.

  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 23, 2026 17:53

    [root@localhost ~]# dmesg -T | grep -i err
    [Sat Apr 18 13:36:39 2026] ACPI: IRQ0 used by override.
    [Sat Apr 18 13:36:39 2026] ACPI: IRQ9 used by override.
    [Sat Apr 18 13:36:39 2026] ACPI: Using IOAPIC for interrupt routing
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) 0
    [Sat Apr 18 13:36:39 2026] ACPI: IRQ 10 override to edge, high
    [Sat Apr 18 13:36:39 2026] ACPI: IRQ 3 override to edge, high
    [Sat Apr 18 13:36:39 2026] ACPI: IRQ 4 override to edge, high
    [Sat Apr 18 13:36:40 2026] AMD-Vi: Interrupt remapping enabled
    [Sat Apr 18 13:36:40 2026] RAS: Correctable Errors collector initialized.
    [Sat Apr 18 13:36:41 2026] igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
    [Sat Apr 18 13:36:41 2026] igb 0000:04:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
    Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
    (Note that use of the override may cause unknown side effects.)
    [Sat Apr 18 15:36:16 2026] hrtimer: interrupt took 601932 ns
    [Sat Apr 18 22:31:15 2026] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
    [Wed Apr 22 14:30:36 2026] METAX.MC.ERROR failed to get user pages, -512
    [Wed Apr 22 14:30:36 2026] METAX.B500.D0.MC.ERROR init_user_pages failed, -512
    [Wed Apr 22 14:30:36 2026] MXCD.IOCTL.ERROR alloc memory failed, -512
    [Wed Apr 22 15:27:56 2026] METAX.MC.ERROR failed to get user pages, -512
    [Wed Apr 22 15:27:56 2026] METAX.B500.D0.MC.ERROR init_user_pages failed, -512
    [Wed Apr 22 15:27:56 2026] MXCD.IOCTL.ERROR alloc memory failed, -512
    [root@localhost ~]#
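
A note on the filter used above: `grep -i err` matches any line containing the substring "err", so words such as "interrupt" and "override" pull in unrelated ACPI and NIC lines. The sketch below is one possible way to narrow the output to the driver's own error lines. It assumes, based only on the three message shapes visible in this output, that such lines carry a dotted tag starting with `METAX` or `MXCD` and ending in `.ERROR`. The function name `driver_errors` and the pattern are hypothetical and are not part of any MetaX tool.

```python
import re

# Assumption: driver error lines carry a dotted tag such as
# "METAX.MC.ERROR" or "MXCD.IOCTL.ERROR" (inferred from the output above).
DRIVER_ERROR = re.compile(r"\b(?:METAX|MXCD)(?:\.\w+)*\.ERROR\b")


def driver_errors(lines):
    """Return only the lines that carry a METAX/MXCD error tag."""
    return [line for line in lines if DRIVER_ERROR.search(line)]


# Message text copied from the dmesg output above (timestamps omitted).
sample = [
    "ACPI: IRQ0 used by override.",
    "AMD-Vi: Interrupt remapping enabled",
    "igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)",
    "METAX.MC.ERROR failed to get user pages, -512",
    "METAX.B500.D0.MC.ERROR init_user_pages failed, -512",
    "MXCD.IOCTL.ERROR alloc memory failed, -512",
]

# Every sample line contains "err", which is why grep -i err keeps them all.
matched_by_grep = [line for line in sample if "err" in line.lower()]

# The tag-based filter keeps only the three driver lines.
hits = driver_errors(sample)

if __name__ == "__main__":
    for line in hits:
        print(line)
```

In a shell, `dmesg -T | grep -E 'METAX|MXCD'` would be a rough equivalent. The value -512 in these lines matches the Linux kernel's internal ERESTARTSYS code (a system call interrupted by a signal); whether the driver uses it in that sense here is an assumption that the thread does not confirm.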

  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 23, 2026 16:57

    XCORE stress test, MetaXLink stress test, and compute performance test.

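  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 23, 2026 16:42

    Which stress test do you need: the XCORE stress test, the ETH stress test, or results from some other test?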

  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 23, 2026 16:32

    Setting num-prompts to 100 would take about 13 hours, so I ran a test with num-prompts=10 first.
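
The run-time estimate can be sanity-checked with simple arithmetic. The sketch below assumes requests run one at a time (`--max-concurrency 1`) and that each takes about as long as the single 1024-in/1024-out request in the benchmark posted earlier the same day in this thread (429.24 s). The helper name `estimated_hours` is hypothetical.

```python
def estimated_hours(num_prompts, seconds_per_request):
    """Wall-clock estimate for sequential requests of equal length."""
    return num_prompts * seconds_per_request / 3600.0


# Benchmark duration (s) for one request, from the earlier post in this thread.
SECONDS_PER_REQUEST = 429.24

hours_for_100 = estimated_hours(100, SECONDS_PER_REQUEST)
hours_for_10 = estimated_hours(10, SECONDS_PER_REQUEST)

if __name__ == "__main__":
    print(f"num-prompts=100: about {hours_for_100:.1f} h")
    print(f"num-prompts=10:  about {hours_for_10:.1f} h")
```

Under these assumptions the estimate is roughly 12 hours for 100 prompts, the same order as the 13 hours quoted above, and a little over an hour for 10 prompts.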

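  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 23, 2026 10:49

    I. Hardware and software information
    1. Server vendor: AI all-in-one appliance
    2. MetaX GPU model: N260
    3. OS and kernel version: Kylin Linux Advanced Server V10 (Halberd) 4.19.90-89.11.v2401.ky10.x86_64
    4. CPU virtualization enabled: No
    5. mx-smi output: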
    mx-smi version: 2.2.12

    =================== MetaX System Management Interface Log ===================
    Timestamp : Thu Apr 23 10:31:59 2026

    Attached GPUs : 2
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.12                        Kernel Mode Driver Version: 3.6.11         |
    | MACA Version: 3.5.3.18               BIOS Version: 1.25.0.0                     |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name       | GPU Persist-M   | Bus-id              | GPU-Util      sGPU-M |
    | Pwr:Usage/Cap    | Temp       Perf | Memory-Usage        | GPU-State            |
    |==================+=================+=====================+======================|
    | 0 MetaX N260     | 0   N/A         | 0000:05:00.0        | 0%          Disabled |
    | NA / NA          | 73C         N/A | 46106/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 1 MetaX N260     | 1   N/A         | 0000:06:00.0        | 0%          Disabled |
    | NA / NA          | 67C         N/A | 666/65536 MiB       | Available            |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU      PID          Process Name                         GPU Memory           |
    |                                                            Usage(MiB)           |
    |=================================================================================|
    | 0        3909052      VLLM::EngineCor                      45438                |
    +---------------------------------------------------------------------------------+

    6. docker info output:
    [root@localhost ~]# docker info
    Client:
    Version: 27.0.3
    Context: default
    Debug Mode: false
    Plugins:
    compose: Docker Compose (Docker Inc.)
    Version: v2.27.0
    Path: /usr/local/lib/docker/cli-plugins/docker-compose

    Server:
    Containers: 7
    Running: 5
    Paused: 0
    Stopped: 2
    Images: 14
    Server Version: 27.0.3
    Storage Driver: overlay2
    Backing Filesystem: xfs
    Supports d_type: true
    Using metacopy: false
    Native Overlay Diff: true
    userxattr: false
    Logging Driver: json-file
    Cgroup Driver: systemd
    Cgroup Version: 1
    Plugins:
    Volume: local
    Network: bridge host ipvlan macvlan null overlay
    Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
    Swarm: inactive
    Runtimes: io.containerd.runc.v2 runc vastai
    Default Runtime: vastai
    Init Binary: docker-init
    containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
    runc version: v1.1.13-0-g58aa920
    init version: de40ad0
    Security Options:
    seccomp
    Profile: builtin
    Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
    Operating System: Kylin Linux Advanced Server V10 (Halberd)
    OSType: linux
    Architecture: x86_64
    CPUs: 16
    Total Memory: 61.57GiB
    Name: localhost.localdomain
    ID: 57335424-e00c-4403-acb0-05c37ae8358c
    Docker Root Dir: /var/lib/docker
    Debug Mode: false
    Experimental: false
    Insecure Registries:
    127.0.0.0/8
    Registry Mirrors:
    docker.mirrors.ustc.edu.cn/
    docker.m.daocloud.io/
    https://hub-mirror.c.163.com/
    docker.1panel.live/
    Live Restore Enabled: false
    Product License: Community Engine

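    7. Image version: cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64
    8. Container launch command: see attachment qwen3.5-9b.yaml
    9. Command run inside the container: see attachment qwen3.5-9b.yaml
    II. Problem description
    Benchmarking with vllm-benchmarks gives only 2 tokens/s, while the same configuration reaches 60 tokens/s on a C500. I cannot work out where the problem is.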
    vllm bench serve --host 172.16.20.22 --port 8001 --dataset-name random --num-prompts 1 --random-input-len 1024 --seed 1 --random-output-len 1024 --max-concurrency 1 --served-model-name Qwen3.5-9B --save-result --result-filename ./result.json --model /data/models/Qwen3.5-9B --ignore-eos
    tip: install termplotlib and gnuplot to plot the metrics
    ============ Serving Benchmark Result ============
    Successful requests: 1
    Failed requests: 0
    Maximum request concurrency: 1
    Benchmark duration (s): 429.24
    Total input tokens: 1024
    Total generated tokens: 1024
    Request throughput (req/s): 0.00
    Output token throughput (tok/s): 2.39
    Peak output token throughput (tok/s): 3.00
    Peak concurrent requests: 1.00
    Total token throughput (tok/s): 4.77
    ---------------Time to First Token----------------
    Mean TTFT (ms): 229.92
    Median TTFT (ms): 229.92
    P99 TTFT (ms): 229.92
    -----Time per Output Token (excl. 1st token)------
    Mean TPOT (ms): 419.36
    Median TPOT (ms): 419.36
    P99 TPOT (ms): 419.36
    ---------------Inter-token Latency----------------
    Mean ITL (ms): 419.36
    Median ITL (ms): 419.36
    P99 ITL (ms): 420.21
    ==================================================
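
The reported figures are internally consistent, which can be checked with a few lines of arithmetic. The sketch below recomputes throughput and per-token latency from the numbers in the result above. It assumes that, for a single request, the benchmark duration is close to that request's end-to-end latency, so the derived per-token time is only approximate. The variable names are chosen for illustration.

```python
# Figures copied from the Serving Benchmark Result above.
duration_s = 429.24        # Benchmark duration (s)
input_tokens = 1024        # Total input tokens
generated_tokens = 1024    # Total generated tokens
mean_ttft_ms = 229.92      # Mean TTFT (ms)
mean_tpot_ms = 419.36      # Mean TPOT (ms)
c500_tok_s = 60.0          # throughput the poster reports on a C500

# Output and total throughput: tokens divided by wall-clock time.
output_tok_s = generated_tokens / duration_s
total_tok_s = (input_tokens + generated_tokens) / duration_s

# Time per output token excluding the first one:
# (latency - TTFT) spread over the remaining tokens.
derived_tpot_ms = (duration_s * 1000.0 - mean_ttft_ms) / (generated_tokens - 1)

# Steady-state decode rate implied by the reported TPOT.
decode_tok_s = 1000.0 / mean_tpot_ms

# Gap relative to the C500 figure quoted in the post.
slowdown = c500_tok_s / output_tok_s

if __name__ == "__main__":
    print(f"output throughput: {output_tok_s:.2f} tok/s")
    print(f"total throughput:  {total_tok_s:.2f} tok/s")
    print(f"derived TPOT:      {derived_tpot_ms:.2f} ms")
    print(f"decode rate:       {decode_tok_s:.2f} tok/s")
    print(f"C500 / N260:       {slowdown:.1f}x")
```

Since TTFT is only about 0.23 s of a 429 s run, nearly all of the time is spent in decoding at roughly 419 ms per token, and the gap to the C500 figure works out to about 25x.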

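  • See post
    Tao_01
    Members
    N260 single-card Qwen3.5-9B inference performance issue (Solved) April 22, 2026 16:20

    Running the Qwen3.5-9B model with the image cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64, the performance test reaches only 2 tokens/s, while the same configuration reaches 60 tokens/s on a C500. I cannot work out where the problem is.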
