MetaX-Tech Developer Forum

nuuuuuuke

  • Members
  • Joined April 8, 2026

nuuuuuuke has posted 12 messages.

  • nuuuuuuke (Members)
    PD disaggregation transfer backend (resolved), April 30, 2026, 14:16

    python -m sglang.launch_server --help 2>&1 | grep -A8 -B2 "disaggregation-transfer-backend"

                        [--debug-tensor-dump-inject DEBUG_TENSOR_DUMP_INJECT]
                        [--disaggregation-mode {null,prefill,decode}]
                        [--disaggregation-transfer-backend {mooncake,nixl,ascend,fake,mori}]
                        [--disaggregation-bootstrap-port DISAGGREGATION_BOOTSTRAP_PORT]
                        [--disaggregation-decode-tp DISAGGREGATION_DECODE_TP]
                        [--disaggregation-decode-dp DISAGGREGATION_DECODE_DP]
                        [--disaggregation-prefill-pp DISAGGREGATION_PREFILL_PP]
                        [--disaggregation-ib-device DISAGGREGATION_IB_DEVICE]
                        [--disaggregation-decode-enable-offload-kvcache]
                        [--num-reserved-decode-tokens NUM_RESERVED_DECODE_TOKENS]
                        [--disaggregation-decode-polling-interval DISAGGREGATION_DECODE_POLLING_INTERVAL]
    

    With the sglang 0.5.9 image and a PD-disaggregated deployment, what should I pass to --disaggregation-transfer-backend? Do I just pip install mooncake, or is there a MetaX-specific build?
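
    The accepted values can be read straight from the usage text quoted above. A minimal sketch, assuming the help output keeps the `{a,b,c}` form shown in this post; `flag_choices` is a hypothetical helper, not part of sglang, and it only lists what the CLI accepts, not which backend actually works on MetaX hardware.

```python
import re

# Usage fragment copied from the help output in this post.
HELP_TEXT = """
[--disaggregation-mode {null,prefill,decode}]
[--disaggregation-transfer-backend {mooncake,nixl,ascend,fake,mori}]
[--disaggregation-bootstrap-port DISAGGREGATION_BOOTSTRAP_PORT]
"""


def flag_choices(help_text: str, flag: str) -> list[str]:
    """Return the values listed in braces after `flag`, or [] if there are none."""
    match = re.search(re.escape(flag) + r"\s+\{([^}]*)\}", help_text)
    return match.group(1).split(",") if match else []


print(flag_choices(HELP_TEXT, "--disaggregation-transfer-backend"))
# -> ['mooncake', 'nixl', 'ascend', 'fake', 'mori']
```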

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 17, 2026, 10:23

    I haven't converted a W8A8 model for vllm. Also, they have a bunch of mysterious environment variables, like these:
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
    export MACA_DIRECT_DISPATCH=1
    export MACA_GRAPH_LAUNCH_MODE=5
    export MACA_SMALL_PAGESIZE_ENABLE=1
    export MACA_TORCH_COMPILE_CONF=triton.multi_kernel:1

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 17:05

    I don't know.

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 16:43

    System Info:
    Machine ID: 9d52c7d699ca42f0ae1f8b918d2a3eb1
    System UUID: b1a64fb0-1ed5-01e1-d311-debf52dba16c
    Boot ID: bb311989-725f-4a20-baa7-960a7a0087c9
    Kernel Version: 6.8.0-49-generic
    OS Image: Ubuntu 24.04.3 LTS
    Operating System: linux
    Architecture: amd64
    Container Runtime Version: containerd://1.7.23
    Kubelet Version: v1.31.3-8+52431524cc27b6-sc
    Kube-Proxy Version: v1.31.3-8+52431524cc27b6-sc

    Single node with 16 C500 cards.

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 16:34

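    Are there recommended launch parameters for sglang, including the various mysterious environment-variable switches? This is for a single node with 16 C500 cards, or more.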

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 16:22

    OK then...

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 16:18

    modelscope.cn/models/metax-tech/MiniMax-M2.7-W8A8

  • nuuuuuuke (Members)
    Minimax m2.7 adaptation (resolved), April 16, 2026, 16:10

    Model: MiniMax-M2.7-W8A8, downloaded from ModelScope.
    Image: :0.14.0-maca.ai3.5.3.102-torch2.8-py310-ubuntu22.04-amd64
    Hardware: C500, single node, 16 cards.
    Launch command:
    export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
    export MACA_DIRECT_DISPATCH=1
    export MACA_GRAPH_LAUNCH_MODE=5
    export MACA_SMALL_PAGESIZE_ENABLE=1
    export MACA_TORCH_COMPILE_CONF=triton.multi_kernel:1
    MODELPATH=/data/opensource-models/MiniMax-M2.7-W8A8-official/
    MODEL_NAME=MiniMax-M2.7-W8A8

    port=${1:-12001}

    currenttime=$(date "+%Y%m%d%H%M%S")

    vllm serve ${MODELPATH} \
    --host 0.0.0.0 \
    --port ${port} \
    --served-model-name ${MODEL_NAME} \
    --tensor-parallel-size 16 \
    --pipeline-parallel-size 1 \
    --dtype half \
    --gpu-memory-utilization 0.9 \
    --max-num-batched-tokens 8192 \
    --max-model-len 8192 \
    --swap-space 64 \
    --mm-encoder-tp-mode data \
    --trust-remote-code \
    --max-num-seqs=64 \
    --no-enable-prefix-caching --enable-auto-tool-choice --tool-call-parser minimax_m2 --reasoning-parser minimax_m2_append_think \
    2>&1 | tee ./${currenttime}.log

    It failed with an error...

  • nuuuuuuke (Members)
    Testing Qwen3-ASR-1.7B on C500 (resolved), April 10, 2026, 08:36

    ......

  • nuuuuuuke (Members)
    Testing Qwen3-ASR-1.7B on C500 (resolved), April 9, 2026, 14:58

    Traceback (most recent call last):
    File "/opt/conda/bin/vllm", line 5, in <module>
    from vllm.entrypoints.cli.main import main
    File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/__init__.py", line 4, in <module>
    from vllm.entrypoints.cli.benchmark.mm_processor import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/benchmark/mm_processor.py", line 5, in <module>
    from vllm.benchmarks.mm_processor import add_cli_args, main
    File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/mm_processor.py", line 26, in <module>
    from vllm.benchmarks.datasets import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/datasets.py", line 39, in <module>
    from vllm.lora.utils import get_adapter_absolute_path
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 4, in <module>
    from vllm.lora.layers.column_parallel_linear import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/column_parallel_linear.py", line 12, in <module>
    from vllm.model_executor.layers.linear import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 28, in <module>
    from vllm.model_executor.layers.utils import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 9, in <module>
    from vllm import _custom_ops as ops
    File "/opt/conda/lib/python3.12/site-packages/vllm/_custom_ops.py", line 95, in <module>
    @register_fake("_C::scaled_fp4_quant.out")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 1069, in register
    use_lib._register_fake(
    File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 219, in _register_fake
    handle = entry.fake_impl.register(
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 50, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: operator _C::scaled_fp4_quant.out does not exist

  • nuuuuuuke (Members)
    Testing Qwen3-ASR-1.7B on C500 (resolved), April 8, 2026, 19:28

    Environment: I submitted a job in k8s, using the vllm-metax:0.15.0 image...
    containers:
      - name: master
        image: registry.sa-ryd.sensetime.com.sa/ccr-alg/vllm-metax:0.15.0-maca.ai3.5.3.203-torch2.8-py312-ubuntu22.04-amd64
        command: ["bash", "-lc", "sleep infinity"]
        resources:
          requests:
            cpu: '32'
            memory: '64Gi'
            metax-tech.com/gpu: '1'
          limits:
            cpu: '32'
            memory: '64Gi'
            metax-tech.com/gpu: '1'
    mx-smi output:
    mx-smi version: 2.2.9

    =================== MetaX System Management Interface Log ===================
    Timestamp : Wed Apr 8 19:20:41 2026

    Attached GPUs : 1
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.9 Kernel Mode Driver Version: 3.4.4 |
    | MACA Version: 3.5.3.20 BIOS Version: 1.30.0.0 |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name | GPU Persist-M | Bus-id | GPU-Util sGPU-M |
    | Pwr:Usage/Cap | Temp Perf | Memory-Usage | GPU-State |
    |==================+=================+=====================+======================|
    | 0 MetaX C500 | 0 Off | 0000:0c:00.0 | 0% Disabled |
    | 57W / 350W | 36C P0 | 858/65536 MiB | Available |
    +------------------+-----------------+---------------------+----------------------+

    Launch command:
    Following the docs at vllm-metax.readthedocs.io/en/latest/getting_started/installation/maca.html#build-vllm,
    I installed the plugin from git clone --branch v0.18.0-dev github.com/MetaX-MACA/vLLM-metax together with vllm 0.18.0. The installation succeeded.
    pip list | grep vllm
    vllm 0.18.1.dev0+gbcf2be961.d20260408.empty
    vllm_metax 0.18.0+gea0600.d20260408.maca3.5.3.20.torch2.8

    Then I ran vllm:
    vllm serve /data/ASR/qwen3-asr-hf/hub/models--Qwen--Qwen3-ASR-1.7B/snapshots/7278e1e70fe206f11671096ffdd38061171dd6e5 --served-model-name "Qwen3-ASR-1.7B" --host 0.0.0.0 --port 12212

    Error output:

    INFO 04-08 19:15:51 [__init__.py:44] Available plugins for group vllm.platform_plugins:
    INFO 04-08 19:15:51 [__init__.py:46] - metax -> vllm_metax:register
    INFO 04-08 19:15:51 [__init__.py:49] All plugins in this group will be loaded. Set VLLM_PLUGINS to control which plugins to load.
    INFO 04-08 19:15:51 [__init__.py:239] Platform plugin metax is activated
    INFO 04-08 19:15:51 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
    INFO 04-08 19:15:51 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
    INFO Print the version information of mcoplib during compilation.

    Version info:Mcoplib_Version = '0.4.0'
    Build_Maca_Version = '3.5.3.18'
    GIT_BRANCH = 'HEAD'
    GIT_COMMIT = 'fe3a7e2'
    Vllm Op Version = 0.15.0
    SGlang Op Version = 0.5.7 && 0.5.8

    INFO Staring Check the current MACA version of the operating environment.

    INFO: Release major.minor matching, successful:3.5.

    Traceback (most recent call last):
    File "/opt/conda/bin/vllm", line 5, in <module>
    from vllm.entrypoints.cli.main import main
    File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/__init__.py", line 4, in <module>
    from vllm.entrypoints.cli.benchmark.mm_processor import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/benchmark/mm_processor.py", line 5, in <module>
    from vllm.benchmarks.mm_processor import add_cli_args, main
    File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/mm_processor.py", line 26, in <module>
    from vllm.benchmarks.datasets import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/benchmarks/datasets.py", line 39, in <module>
    from vllm.lora.utils import get_adapter_absolute_path
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/utils.py", line 17, in <module>
    from vllm.lora.layers import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/__init__.py", line 4, in <module>
    from vllm.lora.layers.column_parallel_linear import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/lora/layers/column_parallel_linear.py", line 12, in <module>
    from vllm.model_executor.layers.linear import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/linear.py", line 28, in <module>
    from vllm.model_executor.layers.utils import (
    File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/utils.py", line 9, in <module>
    from vllm import _custom_ops as ops
    File "/opt/conda/lib/python3.12/site-packages/vllm/_custom_ops.py", line 95, in <module>
    @register_fake("_C::scaled_fp4_quant.out")
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 1069, in register
    use_lib._register_fake(
    File "/opt/conda/lib/python3.12/site-packages/torch/library.py", line 219, in _register_fake
    handle = entry.fake_impl.register(
    ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/opt/conda/lib/python3.12/site-packages/torch/_library/fake_impl.py", line 50, in register
    if torch._C._dispatch_has_kernel_for_dispatch_key(self.qualname, "Meta"):
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    RuntimeError: operator _C::scaled_fp4_quant.out does not exist

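    Questions: Where in the chain does this operator error come from? The model has not even started loading; it looks like the crash happens during initialization, while checking that the operators are registered.
    I want to try newer models early on my own. When I hit an error like this, is that the end of the road, with nothing I can do?
    If the k8s hosts do not upgrade their driver and I only run the new vllm-metax:0.15.0-maca.ai3.5.3.203 image on them, will there be problems?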
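
    One thing the pip list output and the mcoplib banner above make visible: mcoplib reports `Vllm Op Version = 0.15.0`, while the installed vllm is 0.18.1.dev0. Whether that gap is the actual cause of the missing `_C::scaled_fp4_quant.out` operator is an assumption; nothing in this post confirms it. A minimal sketch for spotting such a gap, where `major_minor` and `same_release` are hypothetical helpers and the version strings are copied from this post:

```python
import re


def major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.18.1.dev0+g...'."""
    match = re.match(r"\s*(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"no major.minor prefix in {version!r}")
    return int(match.group(1)), int(match.group(2))


def same_release(a: str, b: str) -> bool:
    """True when both version strings share the same major.minor pair."""
    return major_minor(a) == major_minor(b)


# Values copied from the pip list output and the mcoplib banner.
installed_vllm = "0.18.1.dev0+gbcf2be961.d20260408.empty"
installed_plugin = "0.18.0+gea0600.d20260408.maca3.5.3.20.torch2.8"
mcoplib_vllm_op = "0.15.0"

print(same_release(installed_vllm, installed_plugin))  # vllm vs vllm_metax -> True
print(same_release(installed_vllm, mcoplib_vllm_op))   # vllm vs mcoplib ops -> False
```

    A False result here only flags a place to look; it does not prove the two packages are incompatible.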

  • nuuuuuuke (Members)
    maca, metax-vllm, mcoplib, pytorch version issues (resolved), April 8, 2026, 16:29

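    I have MetaX C500 GPUs, a whole bunch of them, managed with k8s + volcano.
    Running a newer open-source model is really hard. We have to wait for your official vllm-metax image, which basically takes two months...
    Can someone explain clearly how the versions of maca, mcoplib, pytorch, and metax-vllm depend on each other?
    We can't get full performance, and we can't keep up with new models either.

    For example, you released vllm-metax:0.15.0-maca.ai3.5.3.203-torch2.8-py312-ubuntu22.04-amd64, but every node in the cluster is on MACA Version 3.3.x. Can it run?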

    mx-smi version: 2.2.9

    =================== MetaX System Management Interface Log ===================
    Timestamp : Wed Apr 8 16:22:30 2026

    Attached GPUs : 16
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.9 Kernel Mode Driver Version: 3.4.4 |
    | MACA Version: 3.3.0.15 BIOS Version: 1.30.0.0
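
    The two versions in question can be pulled out and compared mechanically. A minimal sketch, assuming the image tag encodes its MACA version after `maca.ai` and that mx-smi prints the host version after `MACA Version:`, as in the strings above; `find_version` is a hypothetical helper. Comparing major.minor mirrors the "Release major.minor matching" line mcoplib prints at startup, but whether a mismatch actually prevents the image from running is not established here.

```python
import re

# Strings copied from this post.
IMAGE_TAG = "vllm-metax:0.15.0-maca.ai3.5.3.203-torch2.8-py312-ubuntu22.04-amd64"
MX_SMI_LINE = "| MACA Version: 3.3.0.15 BIOS Version: 1.30.0.0"


def find_version(pattern: str, text: str) -> tuple[int, ...]:
    """Return the dotted version captured by group 1 of `pattern` as a tuple of ints."""
    match = re.search(pattern, text)
    if match is None:
        raise ValueError(f"pattern {pattern!r} not found")
    return tuple(int(part) for part in match.group(1).split("."))


image_maca = find_version(r"maca\.ai(\d+(?:\.\d+)*)", IMAGE_TAG)
host_maca = find_version(r"MACA Version:\s*(\d+(?:\.\d+)*)", MX_SMI_LINE)

# Compare only the major.minor prefix of each version.
print(image_maca[:2], host_maca[:2], image_maca[:2] == host_maca[:2])
# -> (3, 5) (3, 3) False
```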
