Posts | Tao_01 | MetaX Developer Forum

Tao_01
Members

N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 24, 2026 19:01

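I. Hardware and software information
1. Server vendor: AI all-in-one machine
2. MetaX GPU model: N260
3. OS and kernel version: Kylin Linux Advanced Server V10 (Halberd) 4.19.90-89.11.v2401.ky10.x86_64
4. CPU virtualization enabled: No
5. mx-smi output: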
mx-smi version: 2.2.12

=================== MetaX System Management Interface Log ===================
Timestamp : Fri Apr 24 18:19:35 2026

Attached GPUs : 1
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.12                          Kernel Mode Driver Version: 3.7.11       |
| MACA Version: unknown                  BIOS Version: 1.25.0.0                   |
|------------------+-----------------+---------------------+----------------------|
| Board Name       | GPU   Persist-M | Bus-id              | GPU-Util  sGPU-M     |
| Pwr:Usage/Cap    | Temp  Perf      | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0     MetaX N260 | 0     N/A       | 0000:06:00.0        | 0%        Disabled   |
| NA / NA          | 66C   N/A       | 666/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process:                                                                        |
| GPU      PID         Process Name                          GPU Memory           |
|                                                            Usage(MiB)           |
|=================================================================================|
| no process found                                                                |
+---------------------------------------------------------------------------------+

6. docker info output:
[root@localhost ~]# docker info
Client:
 Version: 27.0.3
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version: v0.33.0
    Path: /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version: v5.1.3
    Path: /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 2
  Paused: 0
  Stopped: 0
 Images: 2
 Server Version: 27.0.3
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version:
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
 Operating System: Kylin Linux Advanced Server V10 (Halberd)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 124.6GiB
 Name: localhost.localdomain
 ID: 60d2904c-d86a-4d04-8844-a309991ee526
 Docker Root Dir: /data/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  docker.m.daocloud.io/
 Live Restore Enabled: false
 Product License: Community Engin

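7. Image version: cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64
8. Container start command: see attachment qwen3.5-9b.yaml
9. Command run inside the container: see attachment qwen3.5-9b.yaml
II. Problem description
I tested on a different machine and the result is still 2 token/s.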

N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 24, 2026 09:22

No noticeable change.

N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 23, 2026 17:53

[root@localhost ~]# dmesg -T | grep -i err
[Sat Apr 18 13:36:39 2026] ACPI: IRQ0 used by override.
[Sat Apr 18 13:36:39 2026] ACPI: IRQ9 used by override.
[Sat Apr 18 13:36:39 2026] ACPI: Using IOAPIC for interrupt routing
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKA] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKB] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKC] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKD] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKE] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKF] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKG] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: PCI Interrupt Link [LNKH] (IRQs 4 5 7 10 11 14 15) 0
[Sat Apr 18 13:36:39 2026] ACPI: IRQ 10 override to edge, high
[Sat Apr 18 13:36:39 2026] ACPI: IRQ 3 override to edge, high
[Sat Apr 18 13:36:39 2026] ACPI: IRQ 4 override to edge, high
[Sat Apr 18 13:36:40 2026] AMD-Vi: Interrupt remapping enabled
[Sat Apr 18 13:36:40 2026] RAS: Correctable Errors collector initialized.
[Sat Apr 18 13:36:41 2026] igb 0000:03:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
[Sat Apr 18 13:36:41 2026] igb 0000:04:00.0: Using MSI-X interrupts. 4 rx queue(s), 4 tx queue(s)
Either enable ECC checking or force module loading by setting 'ecc_enable_override'.
(Note that use of the override may cause unknown side effects.)
[Sat Apr 18 15:36:16 2026] hrtimer: interrupt took 601932 ns
[Sat Apr 18 22:31:15 2026] perf: interrupt took too long (2503 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[Wed Apr 22 14:30:36 2026] METAX.MC.ERROR failed to get user pages, -512
[Wed Apr 22 14:30:36 2026] METAX.B500.D0.MC.ERROR init_user_pages failed, -512
[Wed Apr 22 14:30:36 2026] MXCD.IOCTL.ERROR alloc memory failed, -512
[Wed Apr 22 15:27:56 2026] METAX.MC.ERROR failed to get user pages, -512
[Wed Apr 22 15:27:56 2026] METAX.B500.D0.MC.ERROR init_user_pages failed, -512
[Wed Apr 22 15:27:56 2026] MXCD.IOCTL.ERROR alloc memory failed, -512
[root@localhost ~]#
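
Most of the lines in this output are not errors: `grep -i err` matches any line that contains the substring "err", which includes "override" and "Interrupt". Below is a minimal sketch of a stricter filter, assuming the goal is to keep only lines where "err", "error" or "errors" appears as a whole word. The name `error_lines` and the sample list are made up for this illustration.

```python
import re

# Match "err", "error" or "errors" only as a whole word, case-insensitively.
# "\b" is a word boundary, so "override" and "Interrupt" no longer match.
ERROR_WORD = re.compile(r"\berr(?:ors?)?\b", re.IGNORECASE)


def error_lines(lines):
    """Return the log lines that contain a whole-word error marker."""
    return [line for line in lines if ERROR_WORD.search(line)]


# A few lines copied from the dmesg output in this post.
sample = [
    "ACPI: IRQ0 used by override.",
    "AMD-Vi: Interrupt remapping enabled",
    "METAX.MC.ERROR failed to get user pages, -512",
    "MXCD.IOCTL.ERROR alloc memory failed, -512",
]

for line in error_lines(sample):
    print(line)
```

On this sample only the two MetaX driver lines are printed. The "RAS: Correctable Errors collector initialized." line would still pass the filter, because "Errors" is a whole word.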

N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 23, 2026 16:57

XCORE stress test, MetaXLink stress test, and compute performance test

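N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 23, 2026 16:42

Which stress test results do you need: the XCORE stress test, the ETH stress test, or something else?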

N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 23, 2026 16:32

Setting num-prompts to 100 would take about 13 hours, so I first ran a test with num-prompts=10.
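
As a rough cross-check of that estimate, the sketch below multiplies the 429.24 s benchmark duration of the single-request run in the original report by the number of prompts. It assumes requests run strictly one at a time (`--max-concurrency 1`) and that every request takes about as long as that one. `estimate_hours` is a name chosen for this illustration.

```python
def estimate_hours(num_prompts, seconds_per_request):
    """Estimate wall-clock hours for sequential requests of equal duration."""
    return num_prompts * seconds_per_request / 3600.0


# Benchmark duration of the 1-prompt run from the original report.
SECONDS_PER_REQUEST = 429.24

for n in (10, 100):
    hours = estimate_hours(n, SECONDS_PER_REQUEST)
    print(f"num-prompts={n}: about {hours:.1f} h")
```

Under these assumptions 10 prompts take roughly 1.2 hours and 100 prompts roughly 11.9 hours, which is of the same order as the 13 hours quoted above.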

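N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 23, 2026 10:49

I. Hardware and software information
1. Server vendor: AI all-in-one machine
2. MetaX GPU model: N260
3. OS and kernel version: Kylin Linux Advanced Server V10 (Halberd) 4.19.90-89.11.v2401.ky10.x86_64
4. CPU virtualization enabled: No
5. mx-smi output: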
mx-smi version: 2.2.12

=================== MetaX System Management Interface Log ===================
Timestamp : Thu Apr 23 10:31:59 2026

Attached GPUs : 2
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.12                          Kernel Mode Driver Version: 3.6.11       |
| MACA Version: 3.5.3.18                 BIOS Version: 1.25.0.0                   |
|------------------+-----------------+---------------------+----------------------|
| Board Name       | GPU   Persist-M | Bus-id              | GPU-Util  sGPU-M     |
| Pwr:Usage/Cap    | Temp  Perf      | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0     MetaX N260 | 0     N/A       | 0000:05:00.0        | 0%        Disabled   |
| NA / NA          | 73C   N/A       | 46106/65536 MiB     | Available            |
+------------------+-----------------+---------------------+----------------------+
| 1     MetaX N260 | 1     N/A       | 0000:06:00.0        | 0%        Disabled   |
| NA / NA          | 67C   N/A       | 666/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process:                                                                        |
| GPU      PID         Process Name                          GPU Memory           |
|                                                            Usage(MiB)           |
|=================================================================================|
| 0        3909052     VLLM::EngineCor                       45438                |
+---------------------------------------------------------------------------------+

6. docker info output:
[root@localhost ~]# docker info
Client:
 Version: 27.0.3
 Context: default
 Debug Mode: false
 Plugins:
  compose: Docker Compose (Docker Inc.)
    Version: v2.27.0
    Path: /usr/local/lib/docker/cli-plugins/docker-compose

Server:
 Containers: 7
  Running: 5
  Paused: 0
  Stopped: 2
 Images: 14
 Server Version: 27.0.3
 Storage Driver: overlay2
  Backing Filesystem: xfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc vastai
 Default Runtime: vastai
 Init Binary: docker-init
 containerd version: ae71819c4f5e67bb4d5ae76a6b735f29cc25774e
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 4.19.90-89.11.v2401.ky10.x86_64
 Operating System: Kylin Linux Advanced Server V10 (Halberd)
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 61.57GiB
 Name: localhost.localdomain
 ID: 57335424-e00c-4403-acb0-05c37ae8358c
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Registry Mirrors:
  docker.mirrors.ustc.edu.cn/
  docker.m.daocloud.io/
  https://hub-mirror.c.163.com/
  docker.1panel.live/
 Live Restore Enabled: false
 Product License: Community Engine

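7. Image version: cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64
8. Container start command: see attachment qwen3.5-9b.yaml
9. Command run inside the container: see attachment qwen3.5-9b.yaml
II. Problem description
Benchmarking with vllm-benchmarks shows only 2 token/s, whereas the same configuration reaches 60 token/s on a C500. I cannot figure out where the problem lies.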
vllm bench serve --host 172.16.20.22 --port 8001 --dataset-name random --num-prompts 1 --random-input-len 1024 --seed 1 --random-output-len 1024 --max-concurrency 1 --served-model-name Qwen3.5-9B --save-result --result-filename ./result.json --model /data/models/Qwen3.5-9B --ignore-eos
tip: install termplotlib and gnuplot to plot the metrics
============ Serving Benchmark Result ============
Successful requests: 1
Failed requests: 0
Maximum request concurrency: 1
Benchmark duration (s): 429.24
Total input tokens: 1024
Total generated tokens: 1024
Request throughput (req/s): 0.00
Output token throughput (tok/s): 2.39
Peak output token throughput (tok/s): 3.00
Peak concurrent requests: 1.00
Total token throughput (tok/s): 4.77
---------------Time to First Token----------------
Mean TTFT (ms): 229.92
Median TTFT (ms): 229.92
P99 TTFT (ms): 229.92
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 419.36
Median TPOT (ms): 419.36
P99 TPOT (ms): 419.36
---------------Inter-token Latency----------------
Mean ITL (ms): 419.36
Median ITL (ms): 419.36
P99 ITL (ms): 420.21
==================================================
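
The reported figures are consistent with each other: TTFT is only about 0.23 s, so nearly all of the 429 s is spent generating tokens at roughly 419 ms each. Below is a minimal sketch that recomputes them from the raw numbers above. It assumes throughput is tokens divided by benchmark duration, and that with a single request the benchmark duration is approximately the request latency, so TPOT is the time after the first token divided by the remaining output tokens. The function names are illustrative.

```python
def output_throughput(tokens, duration_s):
    """Tokens per second over the whole benchmark run."""
    return tokens / duration_s


def time_per_output_token_ms(duration_s, ttft_ms, generated_tokens):
    """Mean time per output token in ms, excluding the first token."""
    return (duration_s * 1000.0 - ttft_ms) / (generated_tokens - 1)


# Values taken from the benchmark result above.
duration_s = 429.24
ttft_ms = 229.92
input_tokens = 1024
generated_tokens = 1024

out_tps = output_throughput(generated_tokens, duration_s)
total_tps = output_throughput(input_tokens + generated_tokens, duration_s)
tpot_ms = time_per_output_token_ms(duration_s, ttft_ms, generated_tokens)

print(f"output tok/s: {out_tps:.2f}")
print(f"total tok/s:  {total_tps:.2f}")
print(f"TPOT (ms):    {tpot_ms:.2f}")
```

For comparison, at the 60 token/s reported for C500 the same 1024 output tokens would take about 17 s.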

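N260 single-card Qwen3.5-9B inference performance issue (Resolved), April 22, 2026 16:20

Running the Qwen3.5-9B model with the image cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.17.0-maca.ai3.5.3.307-torch2.8-py312-ubuntu22.04-amd64, the performance test shows only 2 token/s, whereas the same configuration reaches 60 token/s on a C500. I cannot figure out where the problem lies.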