I. Hardware and Software Information
1. Server vendor: Inspur
2. MetaX GPU model: C500
3. OS kernel version: 6.6.71
4. CPU virtualization enabled: Yes
5. mx-smi output:
mx-smi version: 2.2.4
=================== MetaX System Management Interface Log ===================
Timestamp : Fri Feb 27 08:59:20 2026
Attached GPUs : 4
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.4 Kernel Mode Driver Version: 2.12.0 |
| MACA Version: 2.33.0.12 BIOS Version: 1.18.2.0* |
|------------------------------------+---------------------+----------------------+
| GPU NAME | Bus-id | GPU-Util |
| Temp Pwr:Usage/Cap | Memory-Usage | GPU-State |
|====================================+=====================+======================|
| 0 MetaX C500 | 0000:85:00.0 | 0% |
| 35C 30W / 350W | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 1 MetaX C500 | 0000:b1:00.0 | 0% |
| 35C 31W / NA | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 2 MetaX C500 | 0000:c7:00.0 | 0% |
| 36C 30W / 350W | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
| 3 MetaX C500 | 0000:dd:00.0 | 0% |
| 33C 28W / NA | 858/65536 MiB | Available |
+------------------------------------+---------------------+----------------------+
+---------------------------------------------------------------------------------+
| Process: |
| GPU PID Process Name GPU Memory |
| Usage(MiB) |
|=================================================================================|
| no process found |
+---------------------------------------------------------------------------------+
End of Log
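II. Problem Description
In a P2P scenario, a kernel on GPU 0 writes through a pointer into HBM on GPU 1, and GPU 1 polls that address to determine whether the data has arrived. Testing revealed two problems: (1) the data written by GPU 0 stays in GPU 0's L2 cache and is not pushed to GPU 1's HBM; (2) GPU 1's polling keeps reading stale data held in its own L2 cache, so it cannot observe updates to its HBM. Is there a cache invalidation mechanism for this?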
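
For reference, the pattern described above can be sketched in standard CUDA as follows. This is a minimal illustration, not a confirmed fix for the C500: it assumes the MACA toolchain provides equivalents of `__threadfence_system()`, `volatile` device accesses, and the peer-access runtime calls, and that they have the same visibility semantics as on CUDA hardware. That assumption would need to be confirmed with MetaX. The kernel names `producer` and `consumer` and the `CHECK` macro are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the error and exit main on any failed runtime call.
#define CHECK(call)                                                     \
    do {                                                                \
        cudaError_t e_ = (call);                                        \
        if (e_ != cudaSuccess) {                                        \
            fprintf(stderr, "%s:%d %s\n", __FILE__, __LINE__,           \
                    cudaGetErrorString(e_));                            \
            return 1;                                                   \
        }                                                               \
    } while (0)

// Runs on GPU 0; both pointers refer to memory allocated on GPU 1.
__global__ void producer(volatile int* remote_data, volatile int* remote_flag,
                         int value) {
    *remote_data = value;      // payload write into peer memory
    __threadfence_system();    // order the payload before the flag, system-wide
    *remote_flag = 1;          // publish
}

// Runs on GPU 1; polls its own memory.
__global__ void consumer(const volatile int* data, const volatile int* flag,
                         int* out) {
    while (*flag == 0) { }     // volatile: the load is re-issued every iteration
    __threadfence_system();    // order the flag read before the payload read
    *out = *data;
}

int main() {
    int can_access = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_access, 0, 1));
    if (!can_access) {
        printf("GPU 0 cannot access GPU 1 via P2P\n");
        return 0;
    }

    // Enable peer access before any kernel is running.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));

    // Allocate the payload, flag, and result on GPU 1.
    int *data = nullptr, *flag = nullptr, *out = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&data, sizeof(int)));
    CHECK(cudaMalloc(&flag, sizeof(int)));
    CHECK(cudaMalloc(&out, sizeof(int)));
    CHECK(cudaMemset(data, 0, sizeof(int)));
    CHECK(cudaMemset(flag, 0, sizeof(int)));
    CHECK(cudaDeviceSynchronize());

    // Start the poller on GPU 1, then the writer on GPU 0.
    consumer<<<1, 1>>>(data, flag, out);
    CHECK(cudaGetLastError());

    CHECK(cudaSetDevice(0));
    producer<<<1, 1>>>(data, flag, 42);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());

    int result = 0;
    CHECK(cudaMemcpy(&result, out, sizeof(int), cudaMemcpyDeviceToHost));
    printf("consumer observed %d\n", result);

    CHECK(cudaFree(data));
    CHECK(cudaFree(flag));
    CHECK(cudaFree(out));
    return 0;
}
```

If the consumer never leaves its polling loop with this pattern, that would suggest the fence or the volatile load does not reach past L2 on this platform, which is the behavior reported above and would be useful detail to include in the ticket.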