MetaX-Tech Developer Forum — Forum Home
  • MetaX Developers

wanyang

  • Members
  • Joined April 11, 2026

wanyang has posted 5 messages.

  • wanyang
    Members
    fla Triton kernel compilation error [Resolved] · April 13, 2026 17:27

    Could you take a look and let me know if any further information is needed?

  • wanyang
    Members
    fla Triton kernel compilation error [Resolved] · April 13, 2026 16:44

    The commands are in the file.

  • wanyang
    Members
    fla Triton kernel compilation error [Resolved] · April 12, 2026 00:39

    I. Hardware and Software Information
    1. Server vendor: IEIT SYSTEMS
    2. MetaX GPU model: MetaX C500, 8 cards in total, 65536 MiB of memory per card
    3. OS kernel version: 5.14.0-284.25.1.el9_2.x86_64
    4. CPU virtualization enabled: lscpu reports VT-x; however, from inside the container it is not possible to confirm whether the host itself runs in a virtualized environment
    5. Key output of mx-smi:
    - mx-smi version: 2.2.8
    - Kernel Mode Driver Version: 3.0.11
    - MACA Version: 3.5.3.18
    - BIOS Version: 1.27.5.0
    - Attached GPUs: 8
    - GPU Name: MetaX C500
    6. docker info output: the Docker daemon is not reachable from inside the container; the error is

    Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
    
    7. Image version: cannot be determined from inside the container; needs to be supplied from the host side
    8. Container launch command: the original docker run / K8s launch command is not accessible from inside the container; needs to be supplied from the host side
    9. Commands executed inside the container:
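    Items 7 and 8 can usually be recovered from the host side with standard Docker commands (run on the host, not inside the container; `<container>` is a placeholder for the container name or ID):

    ```shell
    # List running containers together with the image each one was started from:
    docker ps --format '{{.ID}}  {{.Image}}  {{.Names}}'
    # Show the exact image and the bind mounts of a specific container:
    docker inspect <container> --format '{{.Config.Image}}'
    docker inspect <container> --format '{{json .HostConfig.Binds}}'
    ```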
    
    II. Problem Symptoms
    1. Description: when running OpenVLA's GatedDelta/FLA path on the `MetaX C500`, some Triton kernels fail at the compilation stage. The failure occurs in `fla`'s GatedDelta chunk kernels; it is not a data error, an OOM, an NCCL communication error, or a corrupted teacher model.
    2. Key error log:
    ```text
    loc(".../fla/ops/common/chunk_scaled_dot_kkt.py":62:22): error:
    failed to legalize operation 'triton_gpu.async_commit_group'
    that was explicitly marked illegal

    RuntimeError: PassManager::run failed
    ```
    3. Assessment:
    - The failure happens in the Triton/MLIR lowering stage.
    - The instruction that fails to legalize is triton_gpu.async_commit_group.
    - This suggests the problem is closer to "the MetaX Triton backend's async/N-buffer lowering for this kernel is unsupported or unstable" rather than a Python-level logic error.
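    To pin down the failing pass, the compiler's IR dumps can be inspected. A diagnostic sketch, assuming the MetaX Triton fork honors upstream Triton's debug environment variables (`repro.py` is a placeholder for any script that triggers the kernel):

    ```shell
    # Assumption: MLIR_ENABLE_DUMP is honored by the MetaX backend (it is an
    # upstream Triton debug variable that prints the IR after each pass).
    MLIR_ENABLE_DUMP=1 python repro.py 2> mlir_dump.log
    # Find where the op that failed to legalize appears:
    grep -n "async_commit_group" mlir_dump.log | head
    ```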

    III. Questions for Expert Confirmation
    1. Do MACA 3.5.3.18 / KMD 3.0.11 officially support lowering triton_gpu.async_commit_group in the MetaX Triton backend?
    2. For the following fla kernels:
    - chunk_scaled_dot_kkt.py
    - chunk_delta_h.py
    - chunk_o.py
    - wy_fast.py
    are there recommended MetaX-specific compilation options, patches, or known unsupported features?
    3. If async_commit_group is a known unsupported instruction in the current backend, is there an officially recommended workaround, for example:
    - disabling the async pipeline
    - using conservative num_warps/num_stages
    - upgrading to a specific torch + triton + MACA version combination
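While waiting for an official answer, the conservative num_warps/num_stages route can be prototyped by pruning the autotuner's candidate configurations before launch, so the compiler is never asked to build a multi-stage async pipeline. A minimal, framework-agnostic sketch (the helper name and the plain-dict representation of a config are hypothetical; fla/triton actually store these as `triton.Config` objects):

```python
# Hypothetical helper: keep only conservative autotune candidates.
# num_stages=1 means no software pipelining, hence no async copy groups.
def conservative_configs(configs, max_stages=1, max_warps=4):
    """Filter candidate kernel configs (dicts with num_warps/num_stages)."""
    picked = [
        c for c in configs
        if c.get("num_stages", 1) <= max_stages
        and c.get("num_warps", 4) <= max_warps
    ]
    # Fall back to a single safe config if everything was pruned.
    return picked or [{"num_warps": max_warps, "num_stages": max_stages}]

candidates = [
    {"num_warps": 8, "num_stages": 3},
    {"num_warps": 4, "num_stages": 2},
    {"num_warps": 4, "num_stages": 1},
]
print(conservative_configs(candidates))  # only the num_stages=1 entry survives
```

Applied to real code, the same idea would mean filtering the `configs` list passed to `@triton.autotune` in the affected fla kernels, or launching the kernel directly with `num_stages=1`.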

  • wanyang
    Members
    fla Triton kernel compilation error [Resolved] · April 11, 2026 18:28

    It seems that some advanced instructions, such as asynchronous memory-copy instructions, are not supported.

  • wanyang
    Members
    fla Triton kernel compilation error [Resolved] · April 11, 2026 17:38

    Hi, when using the C500 I tested several fla versions (all MetaX builds) and ran into the errors shown in the attached image.
    The mx-smi output is also shown in the image; the full traceback is:
    ```text
    [rank0]: Traceback (most recent call last):
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/openvla/vla-scripts/distill_train_stage2.py", line 1453, in <module>
    [rank0]:     distill_train()
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/draccus/argparsing.py", line 228, in wrapper_inner
    [rank0]:     response = fn(cfg, *args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/openvla/vla-scripts/distill_train_stage2.py", line 1235, in distill_train
    [rank0]:     student_output = student_vla(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    [rank0]:     return forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/deepspeed/utils/nvtx.py", line 20, in wrapped_fn
    [rank0]:     ret_val = func(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/deepspeed/runtime/engine.py", line 2030, in forward
    [rank0]:     loss = self.module(*inputs, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1845, in _call_impl
    [rank0]:     return inner()
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1793, in inner
    [rank0]:     result = forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/openvla/prismatic/extern/hf/modeling_prismatic.py", line 404, in forward
    [rank0]:     language_model_output = self.language_model(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    [rank0]:     return forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1189, in forward
    [rank0]:     outputs = self.model(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    [rank0]:     return forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 988, in forward
    [rank0]:     layer_outputs = self._gradient_checkpointing_func(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/_compile.py", line 32, in inner
    [rank0]:     return disable_fn(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
    [rank0]:     return fn(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 496, in checkpoint
    [rank0]:     ret = function(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    [rank0]:     return forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/openvla/prismatic/models/backbones/llm/llama_gated_delta.py", line 220, in forward
    [rank0]:     attn_output, present_key_value = self.self_attn(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
    [rank0]:     return self._call_impl(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
    [rank0]:     return forward_call(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/openvla/prismatic/models/backbones/llm/gated_delta/gated_delta_net.py", line 289, in forward
    [rank0]:     o, new_recurrent_state = chunk_gated_delta_rule(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 745, in _fn
    [rank0]:     return fn(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/fla/ops/gated_delta_rule/chunk.py", line 313, in chunk_gated_delta_rule
    [rank0]:     o, final_state = ChunkGatedDeltaRuleFunction.apply(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/autograd/function.py", line 575, in apply
    [rank0]:     return super().apply(*args, **kwargs)  # type: ignore[misc]
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/fla/utils.py", line 164, in wrapper
    [rank0]:     return fn(*contiguous_args, **contiguous_kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 503, in decorate_fwd
    [rank0]:     return fwd(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/fla/ops/gated_delta_rule/chunk.py", line 174, in forward
    [rank0]:     g, o, A, final_state = chunk_gated_delta_rule_fwd(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/fla/ops/gated_delta_rule/chunk.py", line 31, in chunk_gated_delta_rule_fwd
    [rank0]:     A = chunk_scaled_dot_kkt_fwd(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/fla/ops/common/chunk_scaled_dot_kkt.py", line 114, in chunk_scaled_dot_kkt_fwd
    [rank0]:     chunk_scaled_dot_kkt_fwd_kernel[(NT, B * H)](
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/jit.py", line 345, in <lambda>
    [rank0]:     return lambda *args, **kwargs: self.run(grid=grid, warmup=False, *args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 396, in run
    [rank0]:     return self.fn.run(*args, **kwargs)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 212, in run
    [rank0]:     timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 212, in <dictcomp>
    [rank0]:     timings = {config: self._bench(*args, config=config, **kwargs) for config in pruned_configs}
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 137, in _bench
    [rank0]:     return do_bench(kernel_call, warmup=self.num_warmups, rep=self.num_reps, quantiles=(0.5, 0.2, 0.8))
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/testing.py", line 152, in do_bench
    [rank0]:     fn()
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/autotuner.py", line 118, in kernel_call
    [rank0]:     self.fn.run(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/runtime/jit.py", line 662, in run
    [rank0]:     kernel = self.compile(
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/compiler/compiler.py", line 283, in compile
    [rank0]:     next_module = compile_ir(module, metadata)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/backends/metax/compiler.py", line 419, in <lambda>
    [rank0]:     stages["mlir"] = lambda src, metadata: self.make_mlir(src, metadata, options, self.capability)
    [rank0]:   File "/mnt/afs/lixiaoou/intern/wanyang/envs/openvla-triton-test-fla040/lib/python3.10/site-packages/triton/backends/metax/compiler.py", line 328, in make_mlir
    [rank0]:     pm.run(mod)
    [rank0]: RuntimeError: PassManager::run failed
    ```
