MetaX-Tech Developer Forum
  • Category: Products & Operations
  • Status: In progress

DeepSeek-V4-Flash quantization

sunjiawang
May 21, 2026
9 replies
  • sunjiawang
    Members 9 posts
    May 21, 2026, 09:44

    I saw that MetaX has officially published a pre-release build of vLLM 0.20.0 that is said to support V4. I downloaded it and tested it, but it does not work.
    This is my docker launch command:

    docker run -itd \
      --device=/dev/dri --device=/dev/mxcd --group-add video \
      --name dsv4-master --device=/dev/mem --network=host \
      --security-opt seccomp=unconfined --security-opt apparmor=unconfined \
      --shm-size '100gb' --ulimit memlock=-1 \
      -v /usr/local/:/usr/local/ -v /mnt/data/models:/mnt/data/models \
      -e GLOO_SOCKET_IFNAME=enp184s0f1np1 \
      -e NCCL_IB_HCA=rocep75s0f0 \
      -e RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 \
      -e MACA_SMALL_PAGESIZE_ENABLE=1 \
      -e MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE=1 \
      -e MACA_VLLM_ENABLE_MCTLASS_PYTHON_API=1 \
      cr.metax-tech.com/public-init/maca/vllm-metax:0.20.0-maca.ai3.7.0.101-torch2.8-py310-ubuntu22.04-amd64

    This is my model launch command. I plan to use two servers with the mp backend. It keeps failing with a duplicate operator registration error:

    vllm serve /mnt/data/models/DeepSeek-V4-Flash-AWQ-W8A8 \
      --trust-remote-code --kv-cache-dtype bfloat16 --block-size 256 \
      --gpu-memory-utilization 0.85 \
      --tokenizer-mode deepseek_v4 --tool-call-parser deepseek_v4 \
      --enable-auto-tool-choice --reasoning-parser deepseek_v4 \
      --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}' \
      --max-num-seq 128 -tp 8 \
      --speculative_config '{"method": "mtp", "num_speculative_tokens": 1}' \
      --pipeline-parallel-size 2 --nnodes 2 --node-rank 0 \
      --master-addr 192.168.100.234 \
      --mm-encoder-tp-mode data --distributed-executor-backend mp \
      --enable-chunked-prefill --enable-prefix-caching \
      --max-model-len 216244 \
      --served-model-name DeepSeek-V4-Flash-AWQ-W8A8

    (APIServer pid=3314) WARNING 05-21 09:40:38 [envs.py:1818] Unknown vLLM environment variable detected: VLLM_IN_PROCESS_MODEL_INSPECTION
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Error in inspecting model architecture 'DeepseekV4ForCausalLM'
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1336, in _run_in_subprocess
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] returned.check_returncode()
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] raise CalledProcessError(self.returncode, self.args, self.stdout,
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] The above exception was the direct cause of the following exception:
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 910, in _try_inspect_model_cls
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return model.inspect_model_cls()
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = func(*args, **kwargs)
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 871, in inspect_model_cls
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] mi = _run_in_subprocess(
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1339, in _run_in_subprocess
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] raise RuntimeError(
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] RuntimeError: Error raised in subprocess:
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] warn(RuntimeWarning(msg))
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] /opt/conda/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] We recommend installing via pip install torch-c-dlpack-ext
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] warnings.warn(
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return _run_code(code, main_globals, None,
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] exec(code, run_globals)
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1362, in <module>
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] _run()
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1355, in _run
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = fn()
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 872, in <lambda>
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] lambda: _ModelInfo.from_model_cls(self.load_model_cls())
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 885, in load_model_cls
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] mod = importlib.import_module(self.module_name)
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return _bootstrap._gcd_import(name[level:], package, level)
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap_external>", line 883, in exec_module
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm_metax/models/deepseek_v4.py", line 23, in <module>
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] from vllm_metax.customized.layers.deepseek_v4_attention import (
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm_metax/customized/layers/deepseek_v4_attention.py", line 486, in <module>
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] direct_register_custom_op(
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/utils/torch_utils.py", line 934, in direct_register_custom_op
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] my_lib.define(op_name + schema_str, tags=tags)
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/torch/library.py", line 177, in define
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = self.m.define(schema, alias_analysis, tuple(tags))
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] RuntimeError: Tried to register an operator (vllm::deepseek_v4_attention(Tensor hidden_states, Tensor qr, Tensor kv, Tensor positions, Tensor(a4!) out, str layer_name) -> ()) with the same name and overload name multiple times. Each overload's schema should only be registered with a single call to def(). Duplicate registration: registered at /dev/null:241. Original registration: registered at /dev/null:241
    (APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
    (APIServer pid=3314) Traceback (most recent call last):
    (APIServer pid=3314) File "/opt/conda/bin/vllm", line 8, in <module>
    (APIServer pid=3314) sys.exit(main())
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 92, in main
    (APIServer pid=3314) args.dispatch_function(args)
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
    (APIServer pid=3314) uvloop.run(run_server(args))
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
    (APIServer pid=3314) return loop.run_until_complete(wrapper())
    (APIServer pid=3314) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
    (APIServer pid=3314) return await main
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 678, in run_server
    (APIServer pid=3314) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 692, in run_server_worker
    (APIServer pid=3314) async with build_async_engine_client(
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    (APIServer pid=3314) return await anext(self.gen)
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
    (APIServer pid=3314) async with build_async_engine_client_from_engine_args(
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    (APIServer pid=3314) return await anext(self.gen)
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
    (APIServer pid=3314) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1627, in create_engine_config
    (APIServer pid=3314) model_config = self.create_model_config()
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1475, in create_model_config
    (APIServer pid=3314) return ModelConfig(
    (APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
    (APIServer pid=3314) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
    (APIServer pid=3314) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
    (APIServer pid=3314) Value error, Model architectures ['DeepseekV4ForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
    (APIServer pid=3314) For further information visit errors.pydantic.dev/2.13/v/value_error

    How can I resolve this?
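
When reading a log like the one in this post, it can help to strip the repeated `(APIServer pid=...) ERROR ... [registry.py:912]` prefix and list only the exception lines. The following is a minimal sketch and not part of vLLM. The names `strip_prefix` and `root_errors` and both regular expressions are assumptions derived from the log format shown above.

```python
import re

# Assumed prefix shape, taken from the log in this post:
# "(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] message"
PREFIX = re.compile(
    r"^\s*\(\w+ pid=\d+\)\s*"                        # process tag
    r"(?:(?:ERROR|WARNING|INFO|DEBUG) \d{2}-\d{2} "  # optional level and date
    r"\d{2}:\d{2}:\d{2} \[[^\]]+\]\s?)?"             # optional time and source
)

# Lines such as "RuntimeError: ..." or "pkg.mod.SomeError: ..."
EXC_LINE = re.compile(r"^(?:[\w.]+\.)?\w*(?:Error|Exception):")


def strip_prefix(line: str) -> str:
    """Remove the process/level/time prefix from one log line."""
    return PREFIX.sub("", line, count=1).strip()


def root_errors(log_text: str) -> list[str]:
    """Return the distinct exception lines in order of first appearance."""
    errors: list[str] = []
    for raw in log_text.splitlines():
        msg = strip_prefix(raw)
        if EXC_LINE.match(msg) and msg not in errors:
            errors.append(msg)
    return errors
```

Applied to the log above, the entry of interest is the `RuntimeError: Tried to register an operator (vllm::deepseek_v4_attention(...` line raised inside the inspection subprocess. The later `ValidationError` is only the outer wrapper around it.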

  • Thread has been moved from Public (公共).

    • By shuai_chen on May 21, 2026, 10:33.
  • shuai_chen
    Members 458 posts
    May 21, 2026, 10:38

    Hello, for the model weights please use modelscope.cn/models/metax-tech/DeepSeek-V4-Flash-FlexSMQ-AWQ-W8A8

  • sunjiawang
    Members 9 posts
    May 21, 2026, 10:39

    That is the one I downloaded: the V4 weights that MetaX published on ModelScope.

  • shuai_chen
    Members 458 posts
    May 21, 2026, 10:41

    Hello, please first verify whether a single node with eight GPUs starts correctly.

  • sunjiawang
    Members 9 posts
    May 21, 2026, 10:41

    A single node does not work either. The error is the same: the operator is registered more than once.
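
The class of error reported here can be illustrated without PyTorch. The sketch below uses a toy registry and is not the `torch.library` or vLLM implementation. `OpRegistry`, `define`, `define_once`, `register_ops`, and the operator name `demo::attention` are all hypothetical. It only shows why registration code that runs at import time fails if the same definition executes twice in one process, and what an idempotent guard could look like. It does not claim to identify the cause in this image.

```python
class OpRegistry:
    """Toy stand-in for an operator library (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self._schemas: dict[str, str] = {}

    def define(self, name: str, schema: str) -> None:
        # Strict behaviour: a second definition of the same name is an error.
        if name in self._schemas:
            raise RuntimeError(f"operator {name!r} registered multiple times")
        self._schemas[name] = schema

    def define_once(self, name: str, schema: str) -> bool:
        # Idempotent variant: returns True if newly defined, False if an
        # identical definition already exists.
        existing = self._schemas.get(name)
        if existing is None:
            self._schemas[name] = schema
            return True
        if existing != schema:
            raise RuntimeError(f"operator {name!r} redefined with a different schema")
        return False


def register_ops(registry: OpRegistry) -> None:
    # Stands in for module-level registration code that runs on import.
    registry.define("demo::attention", "(Tensor x) -> ()")
```

Calling `register_ops` twice on the same `OpRegistry` raises `RuntimeError`, which mirrors the message in the log, while `define_once` tolerates a repeated identical definition.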

  • shuai_chen
    Members 458 posts
    May 21, 2026, 10:43

    Hello, please refer to developer.metax-tech.com/forum/t/fa-tie-qian-bi-kan-jing-xiang-shi-yong-wen-ti-ti-wen-mo-ban/267/ and describe your steps in detail, including detailed logs.

  • sunjiawang
    Members 9 posts
    May 21, 2026, 10:44

    That page will not open. What exactly do you need? The docker launch command and the model launch command are above, and I have not changed anything else.

  • shuai_chen
    Members 458 posts
    May 21, 2026, 10:45

    Hello, please refer to developer.metax-tech.com/forum/t/fa-tie-qian-bi-kan-jing-xiang-shi-yong-wen-ti-ti-wen-mo-ban/267/ and describe your steps in detail, including detailed logs.

  • sunjiawang
    Members 9 posts
    May 21, 2026, 10:52

    Hello, I have attached the detailed information as requested. Please take a look.

    V4.docx

    DOCX, 24.5 KB, uploaded by sunjiawang on May 21, 2026.

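  • shuai_chen
    Members 458 posts
    May 21, 2026, 11:24

    Hello, the current image does not support DS V4 Flash. Please refer to developer.metax-tech.com/forum/t/deepseek-v4-flash-liang-hua-bu-shu/468/#post-1959
    Because the deployment image is only available for a limited time, please start a private thread to obtain it (third icon from the right in the top-right corner) and enter shuai_chen as the recipient.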
