• Members 27 posts
    June 17, 2026, 18:14

    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$ bash mccl.sh 8
    The test is all_reduce_perf, the maca version is /opt/maca-3.7.1
    main_process = 7324
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7325: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7326: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7327: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7328: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7329: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7330: Test failure common.cu:1271
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7331: Test failure common.cu:1271
    ===============================

    nThread 1 nGpus 1 minBytes 1024 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 10 agg iters: 1 validation: 1 graph: 0

    Using devices

    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7324: Test failure common.cu:1271


    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.



    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

    Process name: [[25858,1],4]
    Exit code: 2


    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$ bash mccl.sh 2
    The test is all_reduce_perf, the maca version is /opt/maca-3.7.1
    main_process = 7378
    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7379: Test failure common.cu:1271
    ===============================

    nThread 1 nGpus 1 minBytes 1024 maxBytes 1073741824 step: 2(factor) warmup iters: 5 iters: 10 agg iters: 1 validation: 1 graph: 0

    Using devices

    metax-host-104: Test CUDA failure common.cu:1349 'initialization error'
    .. metax-host-104 pid 7378: Test failure common.cu:1271


    Primary job terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.



    mpirun detected that one or more processes exited with non-zero status, thus causing
    the job to be terminated. The first process to do so was:

    Process name: [[25940,1],0]
    Exit code: 2


    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$

    Why does the single-node MCCL test fail?

    The mx-smi output looks normal, as shown below:
    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$ mx-smi
    mx-smi version: 2.3.1

    =================== MetaX System Management Interface Log ===================
    Timestamp : Wed Jun 17 10:13:47 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.3.1 Kernel Mode Driver Version: 3.8.1 |
    | MACA Version: 3.7.1.5 BIOS Version: 1.29.1.0 |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
    | Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
    |==================+=================+=====================+======================|
    | 0 MetaX C550 | 0 Off | 0000:2b:00.0 | 0% Disabled |
    | 54W / 450W | 32C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 1 MetaX C550 | 1 Off | 0000:3a:00.0 | 0% Disabled |
    | 56W / 450W | 33C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 2 MetaX C550 | 2 Off | 0000:4d:00.0 | 0% Disabled |
    | 52W / 450W | 33C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 3 MetaX C550 | 3 Off | 0000:5c:00.0 | 0% Disabled |
    | 56W / 450W | 33C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 4 MetaX C550 | 4 Off | 0000:aa:00.0 | 0% Disabled |
    | 53W / 450W | 32C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 5 MetaX C550 | 5 Off | 0000:ba:00.0 | 0% Disabled |
    | 52W / 450W | 33C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 6 MetaX C550 | 6 Off | 0000:ca:00.0 | 0% Disabled |
    | 54W / 450W | 34C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+
    | 7 MetaX C550 | 7 Off | 0000:da:00.0 | 0% Disabled |
    | 53W / 450W | 33C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process: |
    | GPU PID Process Name GPU Memory |
    | Usage(MiB) |
    |=================================================================================|
    | no process found |
    +---------------------------------------------------------------------------------+

    End of Log
    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$
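
A note on reading the two runs above: in the `bash mccl.sh 8` run, eight distinct PIDs (7324 through 7331) report the same `initialization error` at `common.cu:1349`, and in the `bash mccl.sh 2` run both PIDs (7378 and 7379) do. Every rank fails at the same call site before any bandwidth results are printed, which suggests a host-wide setup problem rather than one faulty card, although the logs alone do not prove that. As a minimal sketch, assuming the test output has been saved to a file, the helper below lists the distinct failing PIDs. `failed_pids` is a hypothetical name and is not part of mccl_tests.

```shell
# Print the distinct PIDs that reported "Test failure", one per line.
# Reads all_reduce_perf output on stdin.
# Hypothetical helper for illustration; not shipped with mccl_tests.
failed_pids() {
  # Keep only lines like ".. <host> pid 7325: Test failure common.cu:1271"
  # and reduce each one to its PID, then de-duplicate numerically.
  sed -n -E 's/.* pid ([0-9]+): Test failure.*/\1/p' | sort -un
}
```

Feeding the first run's output, as pasted above, through `failed_pids | wc -l` gives 8. A run could be captured with `bash mccl.sh 8 2>&1 | tee mccl_8.log` (the log file name is arbitrary).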

  • Members 571 posts
    June 17, 2026, 18:21

    Hello, please run the following on the bare-metal host:

    dmesg -T | grep -i err
    
  • Members 27 posts
    June 17, 2026, 18:33

    metax@metax-host-104:/opt/maca/samples/mccl_tests/perf$ sudo dmesg -T | grep -i err
    [Wed Jun 17 09:41:06 2026] pmd_set_huge: Cannot satisfy [mem 0x80000000-0x80200000] with a huge-page mapping due to MTRR override.
    [Wed Jun 17 09:41:07 2026] ACPI: Using IOAPIC for interrupt routing
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKA configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKA disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKB configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKB disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKC configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKC disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKD configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKD disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKE configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKE disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKF configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKF disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKG configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKG disabled
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKH configured for IRQ 15
    [Wed Jun 17 09:41:08 2026] ACPI: PCI: Interrupt link LNKH disabled
    [Wed Jun 17 09:41:08 2026] pcieport 0000:15:01.0: pciehp: Slot #20 AttnBtn+ PwrCtrl+ MRL- AttnInd+ PwrInd+ HotPlug+ Surprise- Interlock- NoCompl- IbPresDis- LLActRep+ (with Cmd Compl erratum)
    [Wed Jun 17 09:41:08 2026] ERST: Error Record Serialization Table (ERST) support is initialized.
    [Wed Jun 17 09:41:10 2026] RAS: Correctable Errors collector initialized.
    [Wed Jun 17 09:41:11 2026] i801_smbus 0000:00:1f.4: SMBus using PCI interrupt
    [Wed Jun 17 09:41:56 2026] EDAC MC0: Giving out device to module i10nm_edac controller Intel_10nm Socket#0 IMC#0: DEV 0000:7e:0c.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC1: Giving out device to module i10nm_edac controller Intel_10nm Socket#0 IMC#1: DEV 0000:7e:0d.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC2: Giving out device to module i10nm_edac controller Intel_10nm Socket#0 IMC#2: DEV 0000:7e:0e.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC3: Giving out device to module i10nm_edac controller Intel_10nm Socket#0 IMC#3: DEV 0000:7e:0f.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC4: Giving out device to module i10nm_edac controller Intel_10nm Socket#1 IMC#0: DEV 0000:fe:0c.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC5: Giving out device to module i10nm_edac controller Intel_10nm Socket#1 IMC#1: DEV 0000:fe:0d.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC6: Giving out device to module i10nm_edac controller Intel_10nm Socket#1 IMC#2: DEV 0000:fe:0e.0 (INTERRUPT)
    [Wed Jun 17 09:41:56 2026] EDAC MC7: Giving out device to module i10nm_edac controller Intel_10nm Socket#1 IMC#3: DEV 0000:fe:0f.0 (INTERRUPT)
    [Wed Jun 17 09:41:58 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:41:58 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:41:59 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:41:59 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:01 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:01 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:02 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:02 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:08 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0xc1} len:0
    [Wed Jun 17 09:42:13 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0xc2} len:0
    [Wed Jun 17 09:42:22 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x27 0x199} len:0
    [Wed Jun 17 09:42:28 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x27 0x19a} len:0
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:109c:7fff:fe12:f56d error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:4cbb:66ff:feb7:d4be error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1f02 error=-5
    [Wed Jun 17 09:42:30 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1f02 error=-5
    [Wed Jun 17 09:42:37 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1b7} len:0
    [Wed Jun 17 09:42:42 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x16 0x1b8} len:0
    [Wed Jun 17 09:42:51 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1c3} len:0
    [Wed Jun 17 09:42:56 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x1c4} len:0
    [Wed Jun 17 09:43:06 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1d1} len:0
    [Wed Jun 17 09:43:12 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x1d2} len:0
    [Wed Jun 17 09:43:21 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1da} len:0
    [Wed Jun 17 09:43:26 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x1db} len:0
    [Wed Jun 17 09:43:36 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1e9} len:0
    [Wed Jun 17 09:43:42 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x1ea} len:0
    [Wed Jun 17 09:43:51 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x1f7} len:0
    [Wed Jun 17 09:43:56 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x1f8} len:0
    [Wed Jun 17 09:44:05 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x200} len:0
    [Wed Jun 17 09:44:10 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x201} len:0
    [Wed Jun 17 09:44:20 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x20e} len:0
    [Wed Jun 17 09:44:26 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x20f} len:0
    [Wed Jun 17 09:44:35 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x217} len:0
    [Wed Jun 17 09:44:40 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x53 0x218} len:0
    [Wed Jun 17 09:44:50 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x227} len:0
    [Wed Jun 17 09:44:55 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x228} len:0
    [Wed Jun 17 09:45:05 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x5e3} len:0
    [Wed Jun 17 09:45:10 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x5e4} len:0
    [Wed Jun 17 09:45:19 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x5ec} len:0
    [Wed Jun 17 09:45:24 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x5ed} len:0
    [Wed Jun 17 09:45:34 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x5fa} len:0
    [Wed Jun 17 09:45:40 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x5fb} len:0
    [Wed Jun 17 09:45:49 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x608} len:0
    [Wed Jun 17 09:45:54 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x609} len:0
    [Wed Jun 17 09:46:03 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x612} len:0
    [Wed Jun 17 09:46:08 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x613} len:0
    [Wed Jun 17 09:46:18 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x620} len:0
    [Wed Jun 17 09:46:24 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x621} len:0
    [Wed Jun 17 09:46:33 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x629} len:0
    [Wed Jun 17 09:46:38 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x62a} len:0
    [Wed Jun 17 09:46:48 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x637} len:0
    [Wed Jun 17 09:46:54 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x638} len:0
    [Wed Jun 17 09:47:02 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x640} len:0
    [Wed Jun 17 09:47:08 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x53 0x641} len:0
    [Wed Jun 17 09:47:18 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x650} len:0
    [Wed Jun 17 09:47:23 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x651} len:0
    [Wed Jun 17 09:47:32 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x65e} len:0
    [Wed Jun 17 09:47:38 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x65f} len:0
    [Wed Jun 17 09:47:46 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x667} len:0
    [Wed Jun 17 09:47:52 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x668} len:0
    [Wed Jun 17 09:48:02 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x675} len:0
    [Wed Jun 17 09:48:07 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0xb4 0x676} len:0
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1d02 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1d02 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:10 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:7ce6:d3ff:febc:45c9 error=-5
    [Wed Jun 17 09:48:11 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:11 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1c02 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid 0000:0000:0000:0000:0000:ffff:0a03:1c02 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:12 2026] __ib_cache_gid_add: unable to add gid fe80:0000:0000:0000:200d:68ff:fe64:0385 error=-5
    [Wed Jun 17 09:48:16 2026] bnxt_en 0000:4b:00.1 ens13f1np1: Error (timeout: 3000015) msg {0x23 0x688} len:0
    [Wed Jun 17 09:48:17 2026] bnxt_en 0000:4b:00.0 ens13f0np0: Error (timeout: 3000015) msg {0x24 0x271} len:0
    [Wed Jun 17 09:48:18 2026] bnxt_en 0000:29:00.1 ens11f1np1: Error (timeout: 3000015) msg {0x23 0x2b5} len:0
    [Wed Jun 17 09:48:18 2026] bnxt_en 0000:29:00.0 ens11f0np0: Error (timeout: 3000015) msg {0x24 0x2ff} len:0
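
Because `grep -i err` matches the substring anywhere in a line, the output above also contains lines that are not errors, such as those with `Interrupt`, `erratum`, and `MTRR override`. A minimal sketch for grouping the hits by the component that logged them follows. `tally_dmesg_sources` is a hypothetical helper name, and treating the first token after the timestamp as the component is an assumption that fits the lines above but may not fit every kernel message.

```shell
# Tally `dmesg -T | grep -i err` hits by the component that logged them.
# Reads log lines on stdin; prints "<count> <component>", highest count first.
# Hypothetical helper for illustration only.
tally_dmesg_sources() {
  # 1) Strip the leading "[Wed Jun 17 09:41:06 2026] " timestamp.
  # 2) Take the first remaining token as the component, dropping a trailing ":".
  # 3) Count per component and sort by count (descending), then by name.
  sed -E 's/^[[:space:]]*\[[^]]*\][[:space:]]*//' \
    | awk 'NF { sub(/:$/, "", $1); count[$1]++ }
           END { for (k in count) print count[k], k }' \
    | sort -k1,1nr -k2,2
}
```

It could be used as `sudo dmesg -T | grep -i err | tally_dmesg_sources`. In the output pasted above, the lines that report an actual failure come from `bnxt_en` (timeouts on the network interfaces) and `__ib_cache_gid_add` (`error=-5`), and none of the matched lines refers to the GPU bus IDs listed by mx-smi.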

  • Members 571 posts
    June 18, 2026, 10:26

    Hello, please run the following on the bare-metal host:

    cd /opt/maca/bin
    ./mxvs ops
    ./mxvs pcie benchmark unidirection --src-devices all --dst-devices all
    ./mxvs pcie benchmark bidirection --devices all
    ./mxvs memory benchmark
    ./mxvs metaxlink benchmark
    ./mxvs p2p --src-devices all --dst-devices all
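
To make the results easier to post back, the commands above could be wrapped so that each one writes its output to its own log file and the run continues even if one check fails. This is only a sketch: the `mxvs` invocations are copied from the list above, while `run_and_log`, `run_mxvs_suite`, the labels, and the log directory are hypothetical names introduced for illustration and are not part of mxvs.

```shell
# Run one diagnostic command, save its combined stdout/stderr to
# <log_dir>/<label>.log, and print the command's exit status.
# Usage: run_and_log <log_dir> <label> <command> [args...]
run_and_log() {
  log_dir=$1
  label=$2
  shift 2
  mkdir -p "$log_dir" || return 1
  "$@" >"$log_dir/$label.log" 2>&1
  status=$?
  echo "$label exit=$status"
  return 0   # keep going so later checks still run
}

# Run the checks from the support reply in order.
# The body runs in a subshell so the caller's working directory is unchanged.
# Usage: run_mxvs_suite <log_dir>
run_mxvs_suite() (
  # Resolve the log directory to an absolute path before changing directory.
  log_dir=$(mkdir -p "$1" && cd "$1" && pwd) || exit 1
  cd /opt/maca/bin || exit 1
  run_and_log "$log_dir" ops       ./mxvs ops
  run_and_log "$log_dir" pcie_uni  ./mxvs pcie benchmark unidirection --src-devices all --dst-devices all
  run_and_log "$log_dir" pcie_bidi ./mxvs pcie benchmark bidirection --devices all
  run_and_log "$log_dir" memory    ./mxvs memory benchmark
  run_and_log "$log_dir" metaxlink ./mxvs metaxlink benchmark
  run_and_log "$log_dir" p2p       ./mxvs p2p --src-devices all --dst-devices all
)
```

Calling `run_mxvs_suite ~/mxvs_logs` would then leave one `.log` file per check in that directory, with a one-line status summary on the terminal.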