• Members 5 posts
    January 12, 2026, 14:28

    I cannot deploy the quantized DeepSeek V3.2 with vLLM; both 0.11.0 and 0.11.2 fail. Does vLLM currently not support the latest DeepSeek release?
    A second issue: after adding the three environment variables export MACA_SMALL_PAGESIZE_ENABLE=1, export MACA_GRAPH_LAUNCH_MODE=1, and export VLLM_USE_V1=0 inside the container, the following error appears:
    Failed to check the "should dump" flag on TCPStore, (maybe TCPStore server has shut down too early), with error: Failed to recv, got 0 bytes. Connection was likely closed. Did the remote server shutdown or crash?
    (EngineCore_DP0 pid=3532) ERROR 01-12 14:21:44 [multiproc_executor.py:230] Worker proc VllmWorker-4 died unexpectedly, shutting down executor.
    (Worker_TP5 pid=3675) INFO 01-12 14:21:44 [multiproc_executor.py:702] Parent process exited, terminating worker
    (Worker_TP5 pid=3675) INFO 01-12 14:21:44 [multiproc_executor.py:745] WorkerProc shutting down.
    (EngineCore_DP0 pid=3532) Process EngineCore_DP0:
    (EngineCore_DP0 pid=3532) Traceback (most recent call last):
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    (EngineCore_DP0 pid=3532) self.run()
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run
    (EngineCore_DP0 pid=3532) self._target(*self._args, **self._kwargs)
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 846, in run_engine_core
    (EngineCore_DP0 pid=3532) raise e
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 833, in run_engine_core
    (EngineCore_DP0 pid=3532) engine_core = EngineCoreProc(*args, **kwargs)
    (EngineCore_DP0 pid=3532) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 606, in __init__
    (EngineCore_DP0 pid=3532) super().__init__(
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 109, in __init__
    (EngineCore_DP0 pid=3532) num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
    (EngineCore_DP0 pid=3532) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 231, in _initialize_kv_caches
    (EngineCore_DP0 pid=3532) available_gpu_memory = self.model_executor.determine_available_memory()
    (EngineCore_DP0 pid=3532) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 126, in determine_available_memory
    (EngineCore_DP0 pid=3532) return self.collective_rpc("determine_available_memory")
    (EngineCore_DP0 pid=3532) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 358, in collective_rpc
    (EngineCore_DP0 pid=3532) return aggregate(get_response())
    (EngineCore_DP0 pid=3532) ^^^^^^^^^^^^^^
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/multiproc_executor.py", line 341, in get_response
    (EngineCore_DP0 pid=3532) raise RuntimeError(
    (EngineCore_DP0 pid=3532) RuntimeError: Worker failed with error 'name 'q_scale' is not defined'
    (EngineCore_DP0 pid=3532)
    (EngineCore_DP0 pid=3532) from user code:
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/models/deepseek_v2.py", line 1258, in forward
    (EngineCore_DP0 pid=3532) hidden_states, residual = layer(positions, hidden_states, residual)
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/models/deepseek_v2.py", line 1153, in forward
    (EngineCore_DP0 pid=3532) hidden_states = self.self_attn(
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/models/deepseek_v2.py", line 1052, in forward
    (EngineCore_DP0 pid=3532) return self.mla_attn(positions, hidden_states)
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/custom_op.py", line 46, in forward
    (EngineCore_DP0 pid=3532) return self._forward_method(*args, **kwargs)
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm/model_executor/layers/mla.py", line 156, in forward_native
    (EngineCore_DP0 pid=3532) _topk_indices = self.indexer(hidden_states, q_c, positions, self.rotary_emb)
    (EngineCore_DP0 pid=3532) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/models/deepseek_v2.py", line 860, in forward
    (EngineCore_DP0 pid=3532) weights.unsqueeze(-1) * q_scale * self.softmax_scale * self.n_head**-0.5
    (EngineCore_DP0 pid=3532)
    (EngineCore_DP0 pid=3532) Set TORCHDYNAMO_VERBOSE=1 for the internal stack trace (please do this especially if you're reporting a bug to PyTorch). For even more developer context, set TORCH_LOGS="+dynamo"
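    For anyone trying to reproduce this setup, the three variables from the post would be exported in the container before launching the server. A minimal sketch; the per-variable comments are assumptions inferred from the names (check the MetaX MACA documentation for actual semantics), and the serve command and model path are illustrative, not from the post:

    ```shell
    # Toggles mentioned in the post (MACA is the MetaX GPU runtime).
    export MACA_SMALL_PAGESIZE_ENABLE=1   # presumably enables small-page memory allocation
    export MACA_GRAPH_LAUNCH_MODE=1       # presumably enables graph-mode kernel launch
    export VLLM_USE_V1=0                  # requests fallback to the legacy (V0) vLLM engine

    # Illustrative launch; adjust the model path and parallelism to your hardware.
    # vllm serve /models/DeepSeek-V3.2-AWQ --tensor-parallel-size 8
    ```

    Note that the traceback above still shows V1 components (EngineCore_DP0, v1/executor/multiproc_executor.py), which suggests VLLM_USE_V1=0 had no effect on this vLLM version.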

  • Members 221 posts
    January 12, 2026, 14:31

    Dear developer, please obtain the DeepSeek V3.2 AWQ quantized-deployment guide through the business channel.

  • Thread has been moved from 产品&运维.

  • Members 5 posts
    January 12, 2026, 14:41

    Do you mean the official MetaX (沐曦) quantized build obtained through the ModelScope (魔搭) community?

  • Members 221 posts
    January 12, 2026, 14:43

    Dear developer, please contact the designated liaison to obtain it.

  • Thread has been moved from 解决中.