Is there any reference documentation that explains what the performance metrics collected by mcProfiler mean?
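The official website offers several versions of win-perf-kit. Is there a compatibility table that maps each version of the tool to the MX driver and firmware versions it supports?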
I. Hardware and Software Information
1. Server vendor: Inspur
2. MetaX GPU model: C500
3. OS kernel version: 6.6.71-3.0.7.kos5.x86_64
4. CPU virtualization enabled: Yes
5. mx-smi output:
mx-smi version: 2.2.1
=================== MetaX System Management Interface Log ===================
Timestamp : Fri Mar 13 15:41:01 2026
Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.1 Kernel Mode Driver Version: 2.12.0 |
| MACA Version: 2.33.0.12 BIOS Version: 1.30.0.0 |
|------------------------------------+---------------------+----------------------+
| GPU NAME | Bus-id | GPU-Util |
| Temp Pwr:Usage/Cap | Memory-Usage | |
|====================================+=====================+======================|
| 0 MetaX C500 | 0000:0d:00.0 | 0% |
| 36C 29W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 1 MetaX C500 | 0000:37:00.0 | 0% |
| 34C 28W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 2 MetaX C500 | 0000:4c:00.0 | 0% |
| 35C 30W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 3 MetaX C500 | 0000:61:00.0 | 0% |
| 36C 31W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 4 MetaX C500 | 0000:85:00.0 | 0% |
| 34C 29W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 5 MetaX C500 | 0000:b1:00.0 | 0% |
| 35C 31W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 6 MetaX C500 | 0000:c7:00.0 | 0% |
| 34C 30W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 7 MetaX C500 | 0000:dd:00.0 | 0% |
| 35C 30W / 350W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
End of Log
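II. Problem Description
As shown in the figure, when I run exec perf the program collects only some of the attributes. The file /root/mxlog/umd/umd/xxx cannot be found on the server.
mcProfiler requires the target program being profiled to link against libmcToolsExt.so. Does the profiled code also need to be instrumented?
After running exec perf, the error message reads: "execute failed:Several errors occurred please examine the associated log files /root/mxlog/umd/umd.1906638.*log to identify the root cause." However, no matching file exists in the /root/mxlog/umd directory on the Linux host. "Execute Loop 0" was also printed. How can I locate the cause of the error?
Also, what exactly is the intended use case for __threadfence_system()? It does not seem to take effect in this scenario.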
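For reference, the CUDA Programming Guide describes __threadfence_system() as an ordering guarantee: writes the calling thread makes before the call are observed by all threads on the device, by host threads, and by threads on peer devices as occurring before the writes it makes after the call. Its typical use is publishing data to the host or a peer GPU (write the payload, fence, then write a flag that the other side polls). It does not invalidate the reader's caches, so the reader still has to fetch the flag from memory on every poll. Assuming MACA follows the same semantics (not confirmed in this thread), the protocol maps onto the C++ memory model as in the CPU-only sketch below. Two std::threads stand in for the writing and polling GPUs, and Mailbox, produce, consume and run_publish_demo are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// CPU-side analogue, not MACA code: `payload` plays the peer buffer and
// `ready` plays a flag word that the polling side watches.
struct Mailbox {
    float payload = 0.0f;        // ordinary (non-atomic) data
    std::atomic<int> ready{0};   // flag the consumer polls
};

// Producer: write the data, fence, then raise the flag.
// The release fence is the (assumed) analogue of __threadfence_system():
// it orders the payload write before the flag write.
void produce(Mailbox& m, float value) {
    m.payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    m.ready.store(1, std::memory_order_relaxed);
}

// Consumer: re-read the flag from memory on every iteration (atomic load,
// never a cached register copy), then fence before reading the payload.
float consume(Mailbox& m) {
    while (m.ready.load(std::memory_order_relaxed) == 0) {
        std::this_thread::yield();
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return m.payload;
}

// Runs one producer and one consumer and returns what the consumer saw.
float run_publish_demo() {
    Mailbox m;
    float seen = 0.0f;
    std::thread consumer([&] { seen = consume(m); });
    std::thread producer([&] { produce(m, 3.3f); });
    producer.join();
    consumer.join();
    return seen;
}
```

run_publish_demo() returns 3.3f: the acquire fence after the flag load pairs with the release fence before the flag store, so once the consumer sees the flag, the payload write is guaranteed to be visible. Note that the fence on the writer side alone is not enough; the reader's loads must also go to memory.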
Hello, the problem persists after upgrading to the new driver version.
mx-smi version: 2.1.10
=================== MetaX System Management Interface Log ===================
Timestamp : Fri Mar 6 13:40:47 2026
Attached GPUs : 8
+---------------------------------------------------------------------------------+
| MX-SMI 2.1.10 Kernel Mode Driver Version: 3.3.12 |
| MACA Version: 3.2.1.10 BIOS Version: 1.20.3.0 |
|------------------------------------+---------------------+----------------------+
| GPU NAME | Bus-id | GPU-Util |
| Temp Pwr:Usage/Cap | Memory-Usage | |
|====================================+=====================+======================|
| 0 MetaX C550 | 0000:0f:00.0 | 0% |
| 35C 96W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 1 MetaX C550 | 0000:34:00.0 | 0% |
| 38C 95W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 2 MetaX C550 | 0000:48:00.0 | 0% |
| 38C 96W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 3 MetaX C550 | 0000:5a:00.0 | 0% |
| 37C 97W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 4 MetaX C550 | 0000:87:00.0 | 0% |
| 35C 93W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 5 MetaX C550 | 0000:ae:00.0 | 0% |
| 39C 96W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 6 MetaX C550 | 0000:c2:00.0 | 0% |
| 39C 95W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
| 7 MetaX C550 | 0000:d7:00.0 | 0% |
| 38C 99W / 450W | 858/65536 MiB | |
+------------------------------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
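Pseudocode for the kernels running on the two GPUs:
// store_with_flush<T>() and load_uncached<T>() are the poster's own helpers (definitions not shown).
// wb_l2 and arrive are MetaX-specific instructions issued through inline asm.
__global__ void setV(float *ptr)       // writer side: stores val through a pointer to peer GPU memory
{
    int idx = threadIdx.x;             // assumed: idx is not defined in the original pseudocode
    float val = 3.3f;
    int r = 10;
    float *peer_ptr = ptr;
    if (idx == 0) {
        store_with_flush<float>(peer_ptr, val);   // write the value into peer memory
    }
    asm volatile("wb_l2\n");
    asm volatile("arrive 0\n");
    __threadfence_system();            // order the write above for host and peer-device observers
    if (idx == 0)
        printf("in GPU setV %.3f %.3f\n", val, load_uncached<float>(peer_ptr));
    while (r-- > 0 && idx == 0) {
        __nanosleep(1000000000);       // keep the kernel alive while the other GPU polls
    }
}

__global__ void printfV(float *ptr)    // reader side: polls the buffer the writer targets
{
    int idx = threadIdx.x;             // assumed, as above
    int r = 10;
    while (idx == 0 && r-- > 0) {      // only thread 0 polls, at most 10 times
        asm volatile("wb_l2\n");
        asm volatile("arrive 0\n");
        __threadfence_system();
        printf("in GPU printf %.3f\n", load_uncached<float>(ptr));   // expected to print 3.300 once the write lands
        __nanosleep(1000000000);
    }
    printf("current threadIdx %d\n", idx);
}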
I. Hardware and Software Information
1. Server vendor: Inspur
2. MetaX GPU model: C500
3. OS kernel version: 6.6.71
4. CPU virtualization enabled: Yes
5. mx-smi output:
mx-smi version: 2.2.4
=================== MetaX System Management Interface Log ===================
Timestamp : Fri Feb 27 08:59:20 2026
Attached GPUs : 4
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.4 Kernel Mode Driver Version: 2.12.0 |
| MACA Version: 2.33.0.12 BIOS Version: 1.18.2.0* |
|------------------------------------+---------------------+----------------------+
| GPU NAME | Bus-id | GPU-Util |
| Temp Pwr:Usage/Cap | Memory-Usage | GPU-State |
|====================================+=====================+======================|
| 0 MetaX C500 | 0000:85:00.0 | 0% |
| 35C 30W / 350W | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 1 MetaX C500 | 0000:b1:00.0 | 0% |
| 35C 31W / NA | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 2 MetaX C500 | 0000:c7:00.0 | 0% |
| 36C 30W / 350W | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 3 MetaX C500 | 0000:dd:00.0 | 0% |
| 33C 28W / NA | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
End of Log
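II. Problem Description
In a P2P scenario, a kernel on GPU 0 writes through a pointer into GPU 1's HBM, while GPU 1 polls that address to determine whether the data has arrived. Testing revealed two problems: (1) the data written by GPU 0 stays in GPU 0's L2 cache and is never pushed out to GPU 1's HBM; (2) GPU 1 keeps reading a stale value held in its own L2 cache, so it never observes the update in its HBM. Is there any way to invalidate the cache?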
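A common way for the polling side to tell new data from stale data is to poll a separate sequence counter instead of the payload itself, re-reading it from memory on every iteration and giving up after a bounded number of attempts, much like the r-- > 0 loops in the kernels above. The sketch below shows that reader-side protocol as a CPU-only C++ analogue: std::atomic loads stand in for uncached GPU loads, and Channel, publish and poll_new are hypothetical names. It illustrates the protocol only; it does not show any MetaX L2 invalidation mechanism.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// CPU-side analogue, not MACA code. One payload slot plus a counter that
// the writer bumps once per published message.
struct Channel {
    float data = 0.0f;
    std::atomic<unsigned> seq{0};
};

// Writer: store the payload, then publish it by bumping the counter with
// release ordering, so the data write is ordered before the new seq value.
void publish(Channel& c, float value) {
    c.data = value;
    c.seq.fetch_add(1, std::memory_order_release);
}

// Reader: returns true and writes the payload to *out if a message newer
// than last_seen arrives within max_polls attempts; false on timeout.
bool poll_new(Channel& c, unsigned last_seen, int max_polls, float* out) {
    for (int r = max_polls; r > 0; --r) {
        // Fresh acquire load on every iteration; never reuses an old value.
        unsigned s = c.seq.load(std::memory_order_acquire);
        if (s != last_seen) {
            *out = c.data;  // safe: acquire pairs with the writer's release
            return true;
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return false;
}
```

This assumes a single writer that does not overwrite data before the reader has consumed it; streaming several messages through one slot would need a handshake or a ring buffer. The timeout return lets the reader distinguish "nothing new yet" from "new value received", which polling the payload directly cannot do.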