MetaX-Tech Developer Forum: Forum Home

aaron

  • Members
  • Joined February 4, 2026

aaron has started 2 threads.

  • aaron
    Members
    mcProfiler usage question [Solved] March 13, 2026 15:33

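    mcProfiler requires the target program being profiled to link against libmcToolsExt.so. Does the profiled code itself also need to be instrumented?
    After I run exec perf, the error messages show "execute failed:Several errors occurred please examine the associated log files /root/mxlog/umd/umd.1906638.*log to identify the root cause." However, I cannot find any matching file under /root/mxlog/umd on the Linux host. "Execute Loop 0" was also printed. How can I locate the cause of the error?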

  • aaron
    Members
    How to invalidate the L2 cache / force reads from HBM [Solved] February 27, 2026 09:03

    I. Hardware and software information
    1. Server vendor: Inspur
    2. MetaX GPU model: C500
    3. OS kernel version: 6.6.71
    4. CPU virtualization enabled: Yes
    5. mx-smi output:
    mx-smi version: 2.2.4

    =================== MetaX System Management Interface Log ===================
    Timestamp : Fri Feb 27 08:59:20 2026

    Attached GPUs : 4
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.4                        Kernel Mode Driver Version: 2.12.0          |
    | MACA Version: 2.33.0.12             BIOS Version: 1.18.2.0*                     |
    |------------------------------------+---------------------+----------------------+
    | GPU         NAME                   | Bus-id              | GPU-Util             |
    | Temp        Pwr:Usage/Cap          | Memory-Usage        | GPU-State            |
    |====================================+=====================+======================|
    | 0           MetaX C500             | 0000:85:00.0        | 0%                   |
    | 35C         30W / 350W             | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 1           MetaX C500             | 0000:b1:00.0        | 0%                   |
    | 35C         31W / NA               | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 2           MetaX C500             | 0000:c7:00.0        | 0%                   |
    | 36C         30W / 350W             | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 3           MetaX C500             | 0000:dd:00.0        | 0%                   |
    | 33C         28W / NA               | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    |  GPU                    PID         Process Name                 GPU Memory     |
    |                                                                  Usage(MiB)     |
    |=================================================================================|
    |  no process found                                                               |
    +---------------------------------------------------------------------------------+

    End of Log
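    II. Problem description
    In a P2P scenario, a kernel on GPU 0 writes through a pointer into HBM memory on GPU 1, and GPU 1 polls that address to determine whether the data has arrived. Testing reveals two problems: (1) the data written by GPU 0 stays in GPU 0's L2 cache and is not pushed to GPU 1's HBM; (2) GPU 1's polling reads stale data held in its own L2 cache, so it cannot observe the update in its HBM. Is there a way to invalidate the cache?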
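
    For readers who want to see the write-and-poll pattern described above in code, here is a minimal sketch. It is written in CUDA only as a stand-in: the thread does not show the original MACA code, and MACA's runtime has its own API names. The kernel names `producer` and `consumer`, the `CHECK` macro, and the payload value 42 are hypothetical. The sketch uses `volatile` accesses plus `__threadfence_system()`, which is the usual way to request system-wide visibility in CUDA. Whether the equivalent constructs bypass or flush the L2 cache on a C500 is an assumption, and is exactly what this thread asks.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort on any runtime API error.
#define CHECK(call)                                                        \
    do {                                                                   \
        cudaError_t err_ = (call);                                         \
        if (err_ != cudaSuccess) {                                         \
            fprintf(stderr, "%s failed: %s\n", #call,                      \
                    cudaGetErrorString(err_));                             \
            exit(1);                                                       \
        }                                                                  \
    } while (0)

// Runs on GPU 0: writes the payload into GPU 1's memory, then raises a flag.
__global__ void producer(volatile int* remote_data, volatile int* remote_flag,
                         int value) {
    *remote_data = value;       // volatile store, not held in a register
    __threadfence_system();     // order the payload before the flag, system-wide
    *remote_flag = 1;
}

// Runs on GPU 1: polls the flag in its own memory, then reads the payload.
__global__ void consumer(volatile int* data, volatile int* flag, int* out) {
    while (*flag == 0) { }      // volatile load, re-issued on every iteration
    __threadfence_system();     // order the flag read before the payload read
    *out = *data;
}

int main() {
    int device_count = 0, can_access = 0;
    CHECK(cudaGetDeviceCount(&device_count));
    if (device_count >= 2) CHECK(cudaDeviceCanAccessPeer(&can_access, 0, 1));
    if (!can_access) {
        printf("This sketch needs two GPUs with P2P access.\n");
        return 0;
    }

    // Let GPU 0 dereference pointers that live on GPU 1.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));

    // Payload, flag, and result all live in GPU 1's memory.
    int *data = nullptr, *flag = nullptr, *out = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&data, sizeof(int)));
    CHECK(cudaMalloc(&flag, sizeof(int)));
    CHECK(cudaMalloc(&out, sizeof(int)));
    CHECK(cudaMemset(data, 0, sizeof(int)));
    CHECK(cudaMemset(flag, 0, sizeof(int)));
    CHECK(cudaDeviceSynchronize());
    consumer<<<1, 1>>>(data, flag, out);    // GPU 1 starts polling

    CHECK(cudaSetDevice(0));
    producer<<<1, 1>>>(data, flag, 42);     // GPU 0 writes across the P2P link
    CHECK(cudaDeviceSynchronize());

    // If the write never becomes visible to GPU 1, this call hangs.
    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());

    int host_out = 0;
    CHECK(cudaMemcpy(&host_out, out, sizeof(int), cudaMemcpyDeviceToHost));
    printf("consumer observed %d\n", host_out);

    CHECK(cudaFree(data));
    CHECK(cudaFree(flag));
    CHECK(cudaFree(out));
    return 0;
}
```

    Running it needs two GPUs that report P2P access to each other. A hang in the final synchronize on GPU 1 would correspond to the symptom described above, where the poller never sees the remote write.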
