MetaX-Tech Developer Forum

aaron

  • Members
  • Joined February 4, 2026

aaron has posted 9 messages.

  • aaron
    Members
    mcProfiler usage questions (Solved) March 17, 2026, 14:31

    Is there reference documentation that explains the meaning of the performance metrics captured by mcProfiler?

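  • aaron
    Members
    mcProfiler usage questions (Solved) March 16, 2026, 09:59

    The official website offers several versions of win-perf-kit. Is there a compatibility table that maps each tool version to the corresponding MX driver, firmware, and related components?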

  • aaron
    Members
    mcProfiler usage questions (Solved) March 13, 2026, 15:48

    Does the code being profiled need to be instrumented?

  • aaron
    Members
    mcProfiler usage questions (Solved) March 13, 2026, 15:45

    I. Hardware and software information
    1. Server vendor: Inspur
    2. MetaX GPU model: C500
    3. OS kernel version: 6.6.71-3.0.7.kos5.x86_64
    4. CPU virtualization enabled: Yes
    5. mx-smi output:
    mx-smi version: 2.2.1

    =================== MetaX System Management Interface Log ===================
    Timestamp : Fri Mar 13 15:41:01 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.1                        Kernel Mode Driver Version: 2.12.0          |
    | MACA Version: 2.33.0.12             BIOS Version: 1.30.0.0                      |
    |------------------------------------+---------------------+----------------------+
    | GPU         NAME                   | Bus-id              | GPU-Util             |
    | Temp        Pwr:Usage/Cap          | Memory-Usage        |                      |
    |====================================+=====================+======================|
    | 0           MetaX C500             | 0000:0d:00.0        | 0%                   |
    | 36C         29W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 1           MetaX C500             | 0000:37:00.0        | 0%                   |
    | 34C         28W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 2           MetaX C500             | 0000:4c:00.0        | 0%                   |
    | 35C         30W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 3           MetaX C500             | 0000:61:00.0        | 0%                   |
    | 36C         31W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 4           MetaX C500             | 0000:85:00.0        | 0%                   |
    | 34C         29W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 5           MetaX C500             | 0000:b1:00.0        | 0%                   |
    | 35C         31W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 6           MetaX C500             | 0000:c7:00.0        | 0%                   |
    | 34C         30W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 7           MetaX C500             | 0000:dd:00.0        | 0%                   |
    | 35C         30W / 350W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU                    PID              Process Name                 GPU Memory |
    |                                                                      Usage(MiB) |
    |=================================================================================|
    | no process found                                                                |
    +---------------------------------------------------------------------------------+

    End of Log

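    II. Symptoms
    As shown in the screenshot, when I run exec perf the program collects only some of the attributes. The file /root/mxlog/umd/umd/xxx cannot be found on the server.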

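  • aaron
    Members
    mcProfiler usage questions (Solved) March 13, 2026, 15:33

    mcProfiler requires the target program being profiled to be linked against libmcToolsExt.so. Does the profiled code itself also need to be instrumented?
    After I run exec perf, the error messages show "execute failed:Several errors occurred please examine the associated log files /root/mxlog/umd/umd.1906638.*log to identify the root cause." However, no matching file exists under /root/mxlog/umd on the Linux host. "Execute Loop 0" was also printed. How can I locate the cause of the error?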

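  • aaron
    Members
    How to invalidate the L2 cache / force reads from HBM (Solved) March 6, 2026, 13:46

    One more question: what exactly is the intended use case for __threadfence_system()? It does not seem to take effect in this scenario.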
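
For context, in standard CUDA `__threadfence_system()` is an ordering fence rather than a cache-flush command: writes issued by the calling thread before the fence are observed by other device threads, host threads, and peer devices before any writes it issues after the fence. The typical use case is publishing a payload and then a ready flag. The sketch below shows that pattern on a single GPU with two streams. It is written against CUDA semantics; whether MACA on C500/C550 behaves the same way is an assumption, and `producer`, `consumer`, `payload`, and `flag` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Producer: write the payload, fence, then raise the flag.
// The fence keeps any observer from seeing flag == 1 before the payload.
__global__ void producer(volatile float *payload, volatile int *flag)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *payload = 3.3f;          // 1) write the data
        __threadfence_system();   // 2) order the data write before the flag write
        *flag = 1;                // 3) signal that the data is ready
    }
}

// Consumer: spin on the flag (bounded), fence, then read the payload.
// volatile forces a fresh load on every iteration instead of a cached register.
__global__ void consumer(volatile float *payload, volatile int *flag, float *out)
{
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        long long spins = 0;
        while (*flag == 0 && ++spins < (1LL << 28)) { }
        __threadfence_system();   // order the flag read before the payload read
        *out = (*flag != 0) ? *payload : -1.0f;  // -1 marks a timeout
    }
}

int main()
{
    float *payload, *out;
    int *flag;
    cudaMalloc(&payload, sizeof(float));
    cudaMalloc(&out, sizeof(float));
    cudaMalloc(&flag, sizeof(int));
    cudaMemset(payload, 0, sizeof(float));
    cudaMemset(flag, 0, sizeof(int));
    cudaDeviceSynchronize();

    // Two streams so that both kernels can be resident at the same time.
    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);
    consumer<<<1, 1, 0, s0>>>(payload, flag, out);
    producer<<<1, 1, 0, s1>>>(payload, flag);
    cudaDeviceSynchronize();

    float h_out = 0.0f;
    cudaMemcpy(&h_out, out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("consumer read %.3f\n", h_out);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(payload);
    cudaFree(out);
    cudaFree(flag);
    return 0;
}
```

The fence only constrains ordering. It does not by itself force a polling thread to re-read memory, which is why the flag and payload are accessed through `volatile` pointers here. The bounded spin avoids a hang if the two kernels are not scheduled concurrently.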

  • aaron
    Members
    How to invalidate the L2 cache / force reads from HBM (Solved) March 6, 2026, 13:42

    Hello, the problem persists after switching to the newer driver version.
    mx-smi version: 2.1.10

    =================== MetaX System Management Interface Log ===================
    Timestamp : Fri Mar 6 13:40:47 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.1.10                       Kernel Mode Driver Version: 3.3.12          |
    | MACA Version: 3.2.1.10              BIOS Version: 1.20.3.0                      |
    |------------------------------------+---------------------+----------------------+
    | GPU         NAME                   | Bus-id              | GPU-Util             |
    | Temp        Pwr:Usage/Cap          | Memory-Usage        |                      |
    |====================================+=====================+======================|
    | 0           MetaX C550             | 0000:0f:00.0        | 0%                   |
    | 35C         96W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 1           MetaX C550             | 0000:34:00.0        | 0%                   |
    | 38C         95W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 2           MetaX C550             | 0000:48:00.0        | 0%                   |
    | 38C         96W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 3           MetaX C550             | 0000:5a:00.0        | 0%                   |
    | 37C         97W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 4           MetaX C550             | 0000:87:00.0        | 0%                   |
    | 35C         93W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 5           MetaX C550             | 0000:ae:00.0        | 0%                   |
    | 39C         96W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 6           MetaX C550             | 0000:c2:00.0        | 0%                   |
    | 39C         95W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+
    | 7           MetaX C550             | 0000:d7:00.0        | 0%                   |
    | 38C         99W / 450W             | 858/65536 MiB       |                      |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU                    PID              Process Name                 GPU Memory |
    |                                                                      Usage(MiB) |
    |=================================================================================|
    | no process found                                                                |
    +---------------------------------------------------------------------------------+
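
    Pseudocode for the kernels running on the two GPUs:

    // store_with_flush<T>() and load_uncached<T>() are the poster's own helpers;
    // their definitions were not included in the post.
    __global__ void setV(float *ptr)
    {
        int idx = threadIdx.x;  // assumed: the post leaves idx undefined
        float val = 3.3f;
        int r = 10;

        float *peer_ptr = ptr;  // address in the peer GPU's memory

        if (idx == 0) {
            store_with_flush<float>(peer_ptr, val);
        }
        // MetaX-specific inline assembly, kept as posted.
        asm volatile("wb_l2\n");
        asm volatile("arrive 0\n");
        __threadfence_system();
        if (idx == 0)
            printf("in GPU setV %.3f %.3f\n", val, load_uncached<float>(peer_ptr));
        // Keep thread 0 alive for up to 10 sleep intervals.
        while (r-- > 0 && idx == 0) {
            __nanosleep(1000000000);
        }
    }

    __global__ void printfV(float *ptr)
    {
        int idx = threadIdx.x;  // assumed: the post leaves idx undefined
        int r = 10;
        // Thread 0 polls the address up to 10 times.
        while (idx == 0 && r-- > 0) {
            asm volatile("wb_l2\n");
            asm volatile("arrive 0\n");
            __threadfence_system();
            printf("in GPU printf %.3f\n", load_uncached<float>(ptr));
            __nanosleep(1000000000);
        }
        printf("current threadIdx %d\n", idx);
    }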

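  • aaron
    Members
    How to invalidate the L2 cache / force reads from HBM (Solved) February 27, 2026, 09:03

    I. Hardware and software information
    1. Server vendor: Inspur
    2. MetaX GPU model: C500
    3. OS kernel version: 6.6.71
    4. CPU virtualization enabled: Yes
    5. mx-smi output: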
    mx-smi version: 2.2.4

    =================== MetaX System Management Interface Log ===================
    Timestamp : Fri Feb 27 08:59:20 2026

    Attached GPUs : 4
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.4                        Kernel Mode Driver Version: 2.12.0          |
    | MACA Version: 2.33.0.12             BIOS Version: 1.18.2.0*                     |
    |------------------------------------+---------------------+----------------------+
    | GPU         NAME                   | Bus-id              | GPU-Util             |
    | Temp        Pwr:Usage/Cap          | Memory-Usage        | GPU-State            |
    |====================================+=====================+======================|
    | 0           MetaX C500             | 0000:85:00.0        | 0%                   |
    | 35C         30W / 350W             | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 1           MetaX C500             | 0000:b1:00.0        | 0%                   |
    | 35C         31W / NA               | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 2           MetaX C500             | 0000:c7:00.0        | 0%                   |
    | 36C         30W / 350W             | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+
    | 3           MetaX C500             | 0000:dd:00.0        | 0%                   |
    | 33C         28W / NA               | 858/65536 MiB       | Available            |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU                    PID              Process Name                 GPU Memory |
    |                                                                      Usage(MiB) |
    |=================================================================================|
    | no process found                                                                |
    +---------------------------------------------------------------------------------+

    End of Log
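
    II. Symptoms
    In a P2P scenario, a kernel on GPU 0 writes through a pointer into HBM on GPU 1, and GPU 1 polls that address to determine whether the data has arrived. Testing revealed two problems: (1) the data written by GPU 0 stays in GPU 0's L2 cache and is not pushed to GPU 1's HBM; (2) GPU 1's polling reads stale data held in its own L2 cache, so it never observes the update in its HBM. Is there a way to invalidate the caches?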
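
For reference, the setup described in this post can be sketched with the standard CUDA runtime as follows. This is a minimal sketch and not a confirmed fix for MetaX hardware: it assumes two peer-capable GPUs and that volatile accesses plus `__threadfence_system()` are enough for cross-GPU visibility, which matches the CUDA memory model but is unverified for MACA on C500. The kernel names `writer` and `poller` and the `CHECK` macro are hypothetical, and the CUDA API names stand in for whatever the MACA runtime calls its equivalents.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: report the failing call and exit main().
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t e = (call);                                       \
        if (e != cudaSuccess) {                                       \
            printf("%s failed: %s\n", #call, cudaGetErrorString(e));  \
            return 1;                                                 \
        }                                                             \
    } while (0)

// Runs on GPU 0: store into GPU 1's memory through a peer pointer.
__global__ void writer(volatile float *peer_ptr, float val)
{
    if (threadIdx.x == 0) {
        *peer_ptr = val;          // volatile store, not held in a register
        __threadfence_system();   // order the store with respect to the whole system
    }
}

// Runs on GPU 1: poll its own memory until the value changes (bounded).
__global__ void poller(volatile float *ptr, int *seen)
{
    if (threadIdx.x == 0) {
        for (long long i = 0; i < (1LL << 28); ++i) {
            if (*ptr != 0.0f) { *seen = 1; return; }  // fresh load each pass
        }
        *seen = 0;  // timed out without observing the write
    }
}

int main()
{
    int can_access = 0;
    CHECK(cudaDeviceCanAccessPeer(&can_access, 0, 1));
    if (!can_access) { printf("GPU 0 cannot access GPU 1 memory\n"); return 1; }

    // Let GPU 0 map GPU 1's memory before any kernel is launched.
    CHECK(cudaSetDevice(0));
    CHECK(cudaDeviceEnablePeerAccess(1, 0));

    // The polled buffer and the result flag live in GPU 1's memory.
    float *buf = nullptr;
    int *seen = nullptr;
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&buf, sizeof(float)));
    CHECK(cudaMalloc(&seen, sizeof(int)));
    CHECK(cudaMemset(buf, 0, sizeof(float)));
    poller<<<1, 1>>>(buf, seen);      // asynchronous launch: GPU 1 starts polling

    CHECK(cudaSetDevice(0));
    writer<<<1, 1>>>(buf, 3.3f);      // GPU 0 writes into GPU 1's buffer
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(1));
    CHECK(cudaDeviceSynchronize());
    int h_seen = 0;
    CHECK(cudaMemcpy(&h_seen, seen, sizeof(int), cudaMemcpyDeviceToHost));
    printf("poller observed the peer write: %s\n", h_seen ? "yes" : "no");

    CHECK(cudaFree(buf));
    CHECK(cudaFree(seen));
    return 0;
}
```

If the poller times out on a given platform, that points to the caching behavior described in the post, and a vendor-specific uncached load or cache-invalidate primitive would be needed instead of plain volatile accesses.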

  • aaron
    Members
    Seeking an alternative to inline PTX in CUDA (Solved) February 4, 2026, 14:13

    I ran into the same problem. Has it been resolved?
