DeepSeek-V4-Flash quantization


May 21, 2026 09:44

I noticed that MetaX (沐曦) has officially released a pre-release build of vLLM 0.20.0 that supports DeepSeek-V4, but when I downloaded it for testing it did not work. This is my docker launch command:
docker run -itd --device=/dev/dri --device=/dev/mxcd --group-add video --name dsv4-master --device=/dev/mem --network=host --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size '100gb' --ulimit memlock=-1 -v /usr/local/:/usr/local/ -v /mnt/data/models:/mnt/data/models -e GLOO_SOCKET_IFNAME=enp184s0f1np1 -e NCCL_IB_HCA=rocep75s0f0 -e RAY_EXPERIMENTAL_NOSET_CUDA_VISIBLE_DEVICES=1 -e MACA_SMALL_PAGESIZE_ENABLE=1 -e MACA_VLLM_ENABLE_MCTLASS_FUSED_MOE=1 -e MACA_VLLM_ENABLE_MCTLASS_PYTHON_API=1 cr.metax-tech.com/public-init/maca/vllm-metax:0.20.0-maca.ai3.7.0.101-torch2.8-py310-ubuntu22.04-amd64

This is my model launch command. I plan to run across two servers with the mp (multiprocessing) distributed executor backend, but startup keeps failing with a duplicate operator registration error:
vllm serve /mnt/data/models/DeepSeek-V4-Flash-AWQ-W8A8 --trust-remote-code --kv-cache-dtype bfloat16 --block-size 256 --gpu-memory-utilization 0.85 --tokenizer-mode deepseek_v4 --tool-call-parser deepseek_v4 --enable-auto-tool-choice --reasoning-parser deepseek_v4 --compilation-config '{"cudagraph_mode":"FULL_AND_PIECEWISE", "custom_ops":["all"]}' --max-num-seq 128 -tp 8 --speculative_config '{"method": "mtp", "num_speculative_tokens": 1}' --pipeline-parallel-size 2 --nnodes 2 --node-rank 0 --master-addr 192.168.100.234 --mm-encoder-tp-mode data --distributed-executor-backend mp --enable-chunked-prefill --enable-prefix-caching --max-model-len 216244 --served-model-name DeepSeek-V4-Flash-AWQ-W8A8
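For reference, the parallel layout this command requests can be checked with simple arithmetic. The sketch below assumes that vLLM's world size is tensor-parallel size × pipeline-parallel size (there is no data parallelism in this command) and that ranks are split evenly across `--nnodes`. The helper `ranks_per_node` is hypothetical.

```python
# Values copied from the vllm serve command above.
TENSOR_PARALLEL_SIZE = 8    # -tp 8
PIPELINE_PARALLEL_SIZE = 2  # --pipeline-parallel-size 2
NNODES = 2                  # --nnodes 2


def ranks_per_node(tp: int, pp: int, nnodes: int) -> tuple[int, int]:
    """Hypothetical helper: return (world_size, ranks_per_node).

    Assumes world_size = tp * pp and an even split of ranks across nodes.
    """
    world_size = tp * pp
    if world_size % nnodes != 0:
        raise ValueError(f"world size {world_size} is not divisible by {nnodes} nodes")
    return world_size, world_size // nnodes


if __name__ == "__main__":
    world, per_node = ranks_per_node(TENSOR_PARALLEL_SIZE, PIPELINE_PARALLEL_SIZE, NNODES)
    # prints: world size = 16, GPUs needed per node = 8
    print(f"world size = {world}, GPUs needed per node = {per_node}")
```

Under these assumptions, each of the two servers needs 8 GPUs visible inside its container.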

(APIServer pid=3314) WARNING 05-21 09:40:38 [envs.py:1818] Unknown vLLM environment variable detected: VLLM_IN_PROCESS_MODEL_INSPECTION
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Error in inspecting model architecture 'DeepseekV4ForCausalLM'
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1336, in _run_in_subprocess
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] returned.check_returncode()
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/subprocess.py", line 457, in check_returncode
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] raise CalledProcessError(self.returncode, self.args, self.stdout,
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'vllm.model_executor.models.registry']' returned non-zero exit status 1.
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] The above exception was the direct cause of the following exception:
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 910, in _try_inspect_model_cls
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return model.inspect_model_cls()
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/logging_utils/log_time.py", line 21, in _wrapper
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = func(*args, **kwargs)
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 871, in inspect_model_cls
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] mi = _run_in_subprocess(
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1339, in _run_in_subprocess
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] raise RuntimeError(
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] RuntimeError: Error raised in subprocess:
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] /opt/conda/lib/python3.10/runpy.py:126: RuntimeWarning: 'vllm.model_executor.models.registry' found in sys.modules after import of package 'vllm.model_executor.models', but prior to execution of 'vllm.model_executor.models.registry'; this may result in unpredictable behaviour
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] warn(RuntimeWarning(msg))
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] /opt/conda/lib/python3.10/site-packages/tvm_ffi/_optional_torch_c_dlpack.py:181: UserWarning: Failed to JIT torch c dlpack extension, EnvTensorAllocator will not be enabled.
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] We recommend installing via pip install torch-c-dlpack-ext
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] warnings.warn(
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] Traceback (most recent call last):
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return _run_code(code, main_globals, None,
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] exec(code, run_globals)
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1362, in <module>
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] _run()
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 1355, in _run
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = fn()
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 872, in <lambda>
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] lambda: _ModelInfo.from_model_cls(self.load_model_cls())
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/registry.py", line 885, in load_model_cls
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] mod = importlib.import_module(self.module_name)
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/importlib/__init__.py", line 126, in import_module
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] return _bootstrap._gcd_import(name[level:], package, level)
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap_external>", line 883, in exec_module
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm_metax/models/deepseek_v4.py", line 23, in <module>
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] from vllm_metax.customized.layers.deepseek_v4_attention import (
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm_metax/customized/layers/deepseek_v4_attention.py", line 486, in <module>
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] direct_register_custom_op(
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/vllm/utils/torch_utils.py", line 934, in direct_register_custom_op
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] my_lib.define(op_name + schema_str, tags=tags)
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] File "/opt/conda/lib/python3.10/site-packages/torch/library.py", line 177, in define
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] result = self.m.define(schema, alias_analysis, tuple(tags))
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912] RuntimeError: Tried to register an operator (vllm::deepseek_v4_attention(Tensor hidden_states, Tensor qr, Tensor kv, Tensor positions, Tensor(a4!) out, str layer_name) -> ()) with the same name and overload name multiple times. Each overload's schema should only be registered with a single call to def(). Duplicate registration: registered at /dev/null:241. Original registration: registered at /dev/null:241
(APIServer pid=3314) ERROR 05-21 09:41:30 [registry.py:912]
(APIServer pid=3314) Traceback (most recent call last):
(APIServer pid=3314) File "/opt/conda/bin/vllm", line 8, in <module>
(APIServer pid=3314) sys.exit(main())
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 92, in main
(APIServer pid=3314) args.dispatch_function(args)
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 122, in cmd
(APIServer pid=3314) uvloop.run(run_server(args))
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
(APIServer pid=3314) return loop.run_until_complete(wrapper())
(APIServer pid=3314) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=3314) return await main
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 678, in run_server
(APIServer pid=3314) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 692, in run_server_worker
(APIServer pid=3314) async with build_async_engine_client(
(APIServer pid=3314) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=3314) return await anext(self.gen)
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 100, in build_async_engine_client
(APIServer pid=3314) async with build_async_engine_client_from_engine_args(
(APIServer pid=3314) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=3314) return await anext(self.gen)
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 124, in build_async_engine_client_from_engine_args
(APIServer pid=3314) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1627, in create_engine_config
(APIServer pid=3314) model_config = self.create_model_config()
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1475, in create_model_config
(APIServer pid=3314) return ModelConfig(
(APIServer pid=3314) File "/opt/conda/lib/python3.10/site-packages/pydantic/_internal/_dataclasses.py", line 121, in __init__
(APIServer pid=3314) s.__pydantic_validator__.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
(APIServer pid=3314) pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
(APIServer pid=3314) Value error, Model architectures ['DeepseekV4ForCausalLM'] failed to be inspected. Please check the logs for more details. [type=value_error, input_value=ArgsKwargs((), {'model': ...nderer_num_workers': 1}), input_type=ArgsKwargs]
(APIServer pid=3314) For further information visit errors.pydantic.dev/2.13/v/value_error
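The final RuntimeError reports that the op `vllm::deepseek_v4_attention` was defined twice. The traceback shows that one definition comes from `vllm_metax/customized/layers/deepseek_v4_attention.py`. One way to look for the other definition is to search the installed `vllm` and `vllm_metax` packages for the op name. The sketch below is only a heuristic. It assumes the op name is passed to `direct_register_custom_op` as a quoted string literal. `find_op_name_occurrences` is a hypothetical helper, and the site-packages path is the one that appears in the log.

```python
import re
from pathlib import Path

# Site-packages path taken from the traceback above.
SITE_PACKAGES = Path("/opt/conda/lib/python3.10/site-packages")
OP_NAME = "deepseek_v4_attention"


def find_op_name_occurrences(root, op_name):
    """Hypothetical helper: list (relative_path, line_no, line) for every line in
    a .py file under `root` that contains `op_name` as a quoted string literal.

    Heuristic only: assumes registrations pass the op name as a string literal.
    """
    root = Path(root)
    # Match 'op_name' or "op_name", but not module paths such as pkg.op_name.
    pattern = re.compile(r"['\"]" + re.escape(op_name) + r"['\"]")
    hits = []
    for path in sorted(root.rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file; skip it
        for line_no, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((path.relative_to(root).as_posix(), line_no, line.strip()))
    return hits


if __name__ == "__main__":
    for package in ("vllm", "vllm_metax"):
        package_dir = SITE_PACKAGES / package
        if not package_dir.is_dir():
            print(f"{package}: not found under {SITE_PACKAGES}")
            continue
        for rel_path, line_no, line in find_op_name_occurrences(package_dir, OP_NAME):
            print(f"{package}/{rel_path}:{line_no}: {line}")
```

If hits appear in both packages, upstream vLLM and the MetaX plugin may both be registering the same op in this image. That is worth checking, but this search does not confirm it.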

How can I fix this?