I. Hardware and Software Information
1. Server vendor: H3C UniServer R5300 G6
2. MetaX GPU model: C500
3. Operating system / kernel version:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="www.ubuntu.com/"
SUPPORT_URL="help.ubuntu.com/"
BUG_REPORT_URL="bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
distribution_version=v0.3.205
firmware_version=v0.3.132
driver_version=v0.3.165
4. CPU virtualization enabled:
5. mx-smi output:
mx-smi version: 2.3.1
=================== MetaX System Management Interface Log ===================
Timestamp : Wed May 13 10:06:54 2026
Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.3.1                         Kernel Mode Driver Version: 3.8.23         |
| MACA Version: 3.7.0.38               BIOS Version: 1.33.4.0                     |
|------------------+-----------------+---------------------+----------------------|
| Board Name       | GPU Persist-M   | Bus-id              | GPU-Util sGPU-M      |
| Pwr:Usage/Cap    | Temp Perf       | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0     MetaX C500 | 0   Off         | 0000:08:00.0        | 0%       Disabled    |
| 36W / 350W       | 36C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 1     MetaX C500 | 1   Off         | 0000:09:00.0        | 0%       Disabled    |
| 39W / 350W       | 38C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 2     MetaX C500 | 2   Off         | 0000:0e:00.0        | 0%       Disabled    |
| 44W / 350W       | 38C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 3     MetaX C500 | 3   Off         | 0000:11:00.0        | 0%       Disabled    |
| 42W / 350W       | 38C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 4     MetaX C500 | 4   Off         | 0000:32:00.0        | 0%       Disabled    |
| 38W / 350W       | 37C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 5     MetaX C500 | 5   Off         | 0000:38:00.0        | 0%       Disabled    |
| 38W / 350W       | 37C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 6     MetaX C500 | 6   Off         | 0000:3b:00.0        | 0%       Disabled    |
| 41W / 350W       | 39C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
| 7     MetaX C500 | 7   Off         | 0000:3c:00.0        | 0%       Disabled    |
| 41W / 350W       | 38C  P0         | 858/65536 MiB       | Available            |
+------------------+-----------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process:                                                                        |
| GPU    PID         Process Name                            GPU Memory           |
|                                                            Usage(MiB)           |
|=================================================================================|
| no process found                                                                |
+---------------------------------------------------------------------------------+
End of Log
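Before rerunning the two-node reduce_perf test, it can help to confirm that every GPU on both nodes is as idle as this log shows for this node (all 8 C500 cards Available, 858 MiB used, no processes). The Python sketch below is a hypothetical helper, not a MetaX tool: summarize_mx_smi, its regular expressions, and its field names are our own, and it assumes the two-rows-per-GPU text layout printed by mx-smi 2.3.1 above. Other mx-smi versions may format the table differently.

```python
import re

# Hypothetical helper (not part of mx-smi): parses the plain-text table
# layout shown above, where each GPU occupies two rows.
# Row 1: "| <index> MetaX C500 | <gpu> <persist> | <bus-id> | <util> <sGPU-M> |"
INDEX_ROW = re.compile(r"^\|\s*(\d+)\s+MetaX\s+C500\s*\|")
# Row 2: "| <pwr>W / <cap>W | <temp>C <perf> | <used>/<total> MiB | <state> |"
STATUS_ROW = re.compile(
    r"^\|\s*(\d+)W\s*/\s*(\d+)W\s*\|"
    r"\s*(\d+)C\s+(P\d+)\s*\|"
    r"\s*(\d+)/(\d+)\s*MiB\s*\|"
    r"\s*(\S+)\s*\|"
)


def summarize_mx_smi(log: str) -> list[dict]:
    """Return one dict per GPU found in an mx-smi text log."""
    gpus = []
    current = None
    for line in log.splitlines():
        line = line.strip()
        index_match = INDEX_ROW.match(line)
        if index_match:
            # Start a new GPU record; its status row should follow.
            current = {"index": int(index_match.group(1))}
            continue
        status_match = STATUS_ROW.match(line)
        if status_match and current is not None:
            power, cap, temp, perf, used, total, state = status_match.groups()
            current.update(
                power_w=int(power),
                power_cap_w=int(cap),
                temp_c=int(temp),
                perf=perf,
                mem_used_mib=int(used),
                mem_total_mib=int(total),
                state=state,
            )
            gpus.append(current)
            current = None
    return gpus


# Usage on an excerpt of the log above (first two GPUs).
sample = """\
| 0 MetaX C500 | 0 Off | 0000:08:00.0 | 0% Disabled |
| 36W / 350W | 36C P0 | 858/65536 MiB | Available |
| 1 MetaX C500 | 1 Off | 0000:09:00.0 | 0% Disabled |
| 39W / 350W | 38C P0 | 858/65536 MiB | Available |
"""
gpus = summarize_mx_smi(sample)
all_idle = all(g["state"] == "Available" for g in gpus)
print(len(gpus), all_idle)  # prints: 2 True
```

Run against the full saved mx-smi output from each node, it should report 8 GPUs, all Available, if both nodes match the log above.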
II. Problem Description
Running bash cluster.sh "10.5.1.44:1,10.5.1.45:1" 2 reduce_perf produces the following output. The script is also included in the attachment.