In a container started from the image cr.metax-tech.com/public-ai-release/maca/vllm:maca.ai3.0.0.5-torch2.6-py310-ubuntu22.04-amd64, specifying GPUs with the METAX_VISIBLE_DEVICES environment variable fails, while CUDA_VISIBLE_DEVICES works and the job runs on the intended GPU. So far we have not seen this problem with any other image. Is this a deliberate change in the new vLLM image?
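A minimal sketch of one way to check which visibility variable the container honors, assuming a PyTorch environment inside the image (the script name check_visible.py is hypothetical):

# check_visible.py -- run with either variable set, e.g.
#   METAX_VISIBLE_DEVICES=1 python check_visible.py
#   CUDA_VISIBLE_DEVICES=1 python check_visible.py
# then compare how many devices PyTorch actually reports.
import os
import torch

for var in ("METAX_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES"):
    print(f"{var} = {os.environ.get(var)}")

print("visible device count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"device {i}: {torch.cuda.get_device_name(i)}")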
OK, thanks.
Does MetaX provide an environment variable that limits how much GPU memory a process may use?
What other commonly used environment variables are there? (A PyTorch-level fallback is sketched below.)
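For what it's worth, a per-process cap can also be set from PyTorch itself rather than through an environment variable; a minimal sketch using torch.cuda.set_per_process_memory_fraction follows. Whether the MACA stack enforces this the same way as CUDA is an assumption.

import torch

torch.cuda.init()
# Cap this process at roughly half of device 0's total memory; allocations
# beyond the cap raise an out-of-memory error from the caching allocator.
torch.cuda.set_per_process_memory_fraction(0.5, device=0)

total = torch.cuda.get_device_properties(0).total_memory
print(f"total: {total / 1024**3:.1f} GiB, cap: ~{0.5 * total / 1024**3:.1f} GiB")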
The customer also ran the same test on NVIDIA cards, and this error did not occur there.
Below is the output with the try/except removed. The attachments contain the code that was run and a screenshot of the error. The test container is the sglang image.
python test4.py
Device in use: cuda
query_states shape: torch.Size([8, 16, 1, 24]), device: cuda:0
key_states shape: torch.Size([8, 16, 1, 24]), device: cuda:0
value_states shape: torch.Size([8, 16, 1, 48]), device: cuda:0
/opt/conda/lib/python3.10/contextlib.py:103: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.
  self.gen = func(*args, **kwds)
Traceback (most recent call last):
File "/data/lhz/BD/test1.py", line 51, in <module>
attn_output = flash_attn_func(
File "/opt/conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 1054, in flash_attn_func
return FlashAttnFunc.apply(
File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 574, in apply
return super().apply(args, **kwargs) # type: ignore[misc]
File "/opt/conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 704, in forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state, attn_mask = _flash_attn_forward(
File "/opt/conda/lib/python3.10/site-packages/flash_attn/flash_attn_interface.py", line 110, in _flash_attn_forward
out, q, k, v, out_padded, softmax_lse, S_dmask, rng_state, attn_mask = flash_attn_cuda.fwd(
RuntimeError: Head dimension of query/key must greater or equal to head dimension in query
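For completeness, a minimal reproducer sketch built only from the shapes printed above; it assumes the printed tensors are (batch, nheads, seqlen, headdim) and transposes them to the (batch, seqlen, nheads, headdim) layout that flash_attn_func expects. The head dims are 24 for Q/K and 48 for V, which is the combination that triggers the RuntimeError on this MACA flash_attn build:

import torch
from flash_attn import flash_attn_func

device = "cuda"
# Shapes taken from the log: Q/K head dim 24, V head dim 48.
q = torch.randn(8, 1, 16, 24, dtype=torch.float16, device=device)
k = torch.randn(8, 1, 16, 24, dtype=torch.float16, device=device)
v = torch.randn(8, 1, 16, 48, dtype=torch.float16, device=device)

# On this MACA build the call below raises:
# RuntimeError: Head dimension of query/key must greater or equal to head dimension in query
out = flash_attn_func(q, k, v, causal=False)
print(out.shape)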