MetaX-Tech Developer Forum
    pengfeng
    Members
    Qwen3-Next-80B-A3B-Thinking-AWQ-8bit model fails to start (status: in progress), 2025-11-13 20:07

    1. Model: cpatonn-mirror/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit
    2. Image: cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:1.0.0-maca.ai3.2.1.8-torch2.6-py310-ubuntu22.04-amd64
    3. GPU: C500 × 8
    4. Operating system: Ubuntu 22.04
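
    To report the checkpoint's quantization settings (bit width, group size) alongside the environment details above, a minimal sketch like the following could read them from the model's config.json. It assumes the compressed-tensors layout (quantization_config -> config_groups -> <group> -> weights); the helper names and the example values are hypothetical and were not read from the actual checkpoint.

```python
import json
from pathlib import Path


def weight_quant_summary(config: dict) -> list[dict]:
    """Collect per-group weight quantization settings from a parsed config.json.

    Assumption: the checkpoint uses the compressed-tensors layout, i.e.
    quantization_config -> config_groups -> <group name> -> weights.
    Missing keys are reported as None rather than raising.
    """
    quant = config.get("quantization_config", {})
    summary = []
    for name, group in quant.get("config_groups", {}).items():
        weights = group.get("weights") or {}
        summary.append({
            "group": name,
            "num_bits": weights.get("num_bits"),
            "group_size": weights.get("group_size"),
            "symmetric": weights.get("symmetric"),
        })
    return summary


def load_config(model_dir: str) -> dict:
    """Read config.json from a local model directory (hypothetical helper)."""
    return json.loads((Path(model_dir) / "config.json").read_text())


# Illustrative values only; NOT taken from the real checkpoint.
example_config = {
    "quantization_config": {
        "quant_method": "compressed-tensors",
        "config_groups": {
            "group_0": {
                "weights": {"num_bits": 8, "group_size": 32, "symmetric": True},
            }
        },
    }
}

for entry in weight_quant_summary(example_config):
    print(entry)
```

    Inside the container the real file would be loaded with load_config("/data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit"), the mount path used in the command below.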

    Command executed:
    docker run --device=/dev/dri --device=/dev/mxcd \
      --name Qwen3-Next-80B-A3B-Thinking-AWQ-8bit \
      -v /8T/xxxx/model/cpatonn-mirror/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit:/data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit \
      -e CUDA_VISIBLE_DEVICES=4,5,6,7, \
      -e TRITON_ENABLE_MACA_OPT_MOVE_DOT_OPERANDS_OUT_LOOP=1 \
      -e TRITON_DISABLE_MACA_OPT_MMA_PREFETCH=1 \
      -e TRITON_ENABLE_MACA_CHAIN_DOT_OPT=1 \
      -e TRITON_ENABLE_MACA_COMPILER_INT8_OPT=True \
      -e MACA_SMALL_PAGESIZE_ENABLE=1 \
      -p 2031:30889 \
      --security-opt seccomp=unconfined --security-opt apparmor=unconfined \
      --shm-size 100gb --ulimit memlock=-1 --group-add video \
      af4bbc08aa93 \
      /opt/conda/bin/python -m vllm.entrypoints.openai.api_server \
      --model /data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit \
      --api-key c01b24fc-4bf1-4871-a1c3-8663e151555b \
      --served-model-name Qwen3-Next-80B-A3B-Thinking-AWQ-8bit \
      --max-model-len 8192 --gpu-memory-utilization 0.95 \
      --port 30889 --tensor-parallel-size 4 \
      --disable-log-stats --disable-log-requests --max-num-seqs 50

    The error output is as follows:
    INFO 11-13 20:06:15 [shm_broadcast.py:289] vLLM message queue communication handle: Handle(local_reader_ranks=[1, 2, 3], buffer_handle=(3, 4194304, 6, 'psm_482a0da8'), local_subscribe_addr='ipc:///tmp/fafe91bf-0c90-4f70-81ca-b406e9c4f98c', remote_subscribe_addr=None, remote_addr_ipv6=False)
    INFO 11-13 20:06:15 [parallel_state.py:1165] rank 0 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
    INFO 11-13 20:06:15 [parallel_state.py:1165] rank 2 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 2, EP rank 2
    INFO 11-13 20:06:15 [parallel_state.py:1165] rank 3 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 3, EP rank 3
    INFO 11-13 20:06:15 [parallel_state.py:1165] rank 1 in world size 4 is assigned as DP rank 0, PP rank 0, TP rank 1, EP rank 1
    (Worker_TP2 pid=435) INFO 11-13 20:06:15 [gpu_model_runner.py:2338] Starting to load model /data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit...
    (Worker_TP3 pid=449) INFO 11-13 20:06:15 [gpu_model_runner.py:2338] Starting to load model /data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit...
    (Worker_TP1 pid=427) INFO 11-13 20:06:15 [gpu_model_runner.py:2338] Starting to load model /data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit...
    (Worker_TP0 pid=424) INFO 11-13 20:06:15 [gpu_model_runner.py:2338] Starting to load model /data/Qwen3-Next-80B-A3B-Thinking-AWQ-8bit...
    (Worker_TP2 pid=435) INFO 11-13 20:06:15 [gpu_model_runner.py:2370] Loading model from scratch...
    (Worker_TP1 pid=427) INFO 11-13 20:06:15 [gpu_model_runner.py:2370] Loading model from scratch...
    (Worker_TP3 pid=449) INFO 11-13 20:06:15 [gpu_model_runner.py:2370] Loading model from scratch...
    (Worker_TP2 pid=435) torch_dtype is deprecated! Use dtype instead!
    (Worker_TP0 pid=424) INFO 11-13 20:06:15 [gpu_model_runner.py:2370] Loading model from scratch...
    (Worker_TP2 pid=435) INFO 11-13 20:06:15 [compressed_tensors.py:122] Using CompressedTensorsWNA16MoEMethod
    (Worker_TP0 pid=424) torch_dtype is deprecated! Use dtype instead!
    (Worker_TP0 pid=424) INFO 11-13 20:06:15 [compressed_tensors.py:122] Using CompressedTensorsWNA16MoEMethod
    (Worker_TP3 pid=449) torch_dtype is deprecated! Use dtype instead!
    (Worker_TP1 pid=427) torch_dtype is deprecated! Use dtype instead!
    (Worker_TP3 pid=449) INFO 11-13 20:06:15 [compressed_tensors.py:122] Using CompressedTensorsWNA16MoEMethod
    (Worker_TP1 pid=427) INFO 11-13 20:06:15 [compressed_tensors.py:122] Using CompressedTensorsWNA16MoEMethod
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] WorkerProc failed to start.
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] Traceback (most recent call last):
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 559, in worker_main
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] worker = WorkerProc(*args, **kwargs)
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 427, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.worker.load_model()
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 213, in load_model
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.model_runner.load_model(eep_scale_up=eep_scale_up)
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 2371, in load_model
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.model = model_loader.load_model(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/base_loader.py", line 45, in load_model
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] model = initialize_model(vllm_config=vllm_config,
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/model_loader/utils.py", line 64, in initialize_model
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] return model_class(vllm_config=vllm_config, prefix=prefix)
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_next.py", line 1079, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.model = Qwen3NextModel(vllm_config=vllm_config,
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/compilation/decorators.py", line 199, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] old_init(self, vllm_config=vllm_config, prefix=prefix, **kwargs)
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_next.py", line 915, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.start_layer, self.end_layer, self.layers = make_layers(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 642, in make_layers
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] [PPMissingLayer() for _ in range(start_layer)] + [
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/utils.py", line 643, in <listcomp>
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] maybe_offload_to_cpu(layer_fn(prefix=f"{prefix}.{idx}"))
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_next.py", line 904, in get_layer
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] return Qwen3NextDecoderLayer(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_next.py", line 782, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.mlp = Qwen3NextSparseMoeBlock(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_next.py", line 134, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.shared_expert = Qwen3NextMLP(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/qwen2_moe.py", line 77, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.gate_up_proj = MergedColumnParallelLinear(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 588, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] super().__init__(input_size=input_size,
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 442, in __init__
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] self.quant_method.create_weights(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 729, in create_weights
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] layer.scheme.create_weights(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_wNa16.py", line 92, in create_weights
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] kernel_type = choose_mp_linear_kernel(mp_linear_kernel_config)
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/kernels/mixed_precision/__init__.py", line 90, in choose_mp_linear_kernel
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] raise ValueError(
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] ValueError: Failed to find a kernel that can implement the WNA16 linear layer. Reasons:
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] CutlassW4A8LinearKernel cannot implement due to: CUTLASS only supported on CUDA
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] MacheteLinearKernel cannot implement due to: Machete only supported on CUDA
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] AllSparkLinearKernel cannot implement due to: AllSpark currently does not support device_capability = 90.
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] MarlinLinearKernel cannot implement due to: Marlin only supported on CUDA
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] Dynamic4bitLinearKernel cannot implement due to: Only CPU is supported
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] BitBLASLinearKernel cannot implement due to: bitblas is not installed. Please install bitblas by running pip install bitblas>=0.1.0
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] ConchLinearKernel cannot implement due to: Group size (32) not supported by ConchLinearKernel, supported group sizes are: [-1, 128]
    (Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] ExllamaLinearKernel cannot implement due to: Exllama only supports float16 activations
    (Worker_TP2 pid=435) INFO 11-13 20:06:15 [multiproc_executor.py:546] Parent process exited, terminating worker
    (Worker_TP0 pid=424) INFO 11-13 20:06:15 [multiproc_executor.py:546] Parent process exited, terminating worker
    (Worker_TP1 pid=427) INFO 11-13 20:06:15 [multiproc_executor.py:546] Parent process exited, terminating worker
    (Worker_TP3 pid=449) INFO 11-13 20:06:16 [multiproc_executor.py:546] Parent process exited, terminating worker
    [rank0]:[W1113 20:06:16.555908797 ProcessGroupNCCL.cpp:1502] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see pytorch.org/docs/stable/distributed.html#shutdown (function operator())
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] EngineCore failed to start.
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] Traceback (most recent call last):
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] self.model_executor = executor_class(vllm_config)
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] self._init_executor()
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 99, in _init_executor
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] self.workers = WorkerProc.wait_for_ready(unready_workers)
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 497, in wait_for_ready
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] raise e from None
    (EngineCore_DP0 pid=286) ERROR 11-13 20:06:19 [core.py:718] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
    (EngineCore_DP0 pid=286) Process EngineCore_DP0:
    (EngineCore_DP0 pid=286) Traceback (most recent call last):
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    (EngineCore_DP0 pid=286) self.run()
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/multiprocessing/process.py", line 108, in run
    (EngineCore_DP0 pid=286) self._target(*self._args, **self._kwargs)
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 722, in run_engine_core
    (EngineCore_DP0 pid=286) raise e
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
    (EngineCore_DP0 pid=286) engine_core = EngineCoreProc(*args, **kwargs)
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
    (EngineCore_DP0 pid=286) super().__init__(vllm_config, executor_class, log_stats,
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 82, in __init__
    (EngineCore_DP0 pid=286) self.model_executor = executor_class(vllm_config)
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 54, in __init__
    (EngineCore_DP0 pid=286) self._init_executor()
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 99, in _init_executor
    (EngineCore_DP0 pid=286) self.workers = WorkerProc.wait_for_ready(unready_workers)
    (EngineCore_DP0 pid=286) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/multiproc_executor.py", line 497, in wait_for_ready
    (EngineCore_DP0 pid=286) raise e from None
    (EngineCore_DP0 pid=286) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
    (APIServer pid=1) Traceback (most recent call last):
    (APIServer pid=1) File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    (APIServer pid=1) return _run_code(code, main_globals, None,
    (APIServer pid=1) File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
    (APIServer pid=1) exec(code, run_globals)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 2011, in <module>
    (APIServer pid=1) uvloop.run(run_server(args))
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 69, in run
    (APIServer pid=1) return loop.run_until_complete(wrapper())
    (APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/uvloop/__init__.py", line 48, in wrapper
    (APIServer pid=1) return await main
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1941, in run_server
    (APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1961, in run_server_worker
    (APIServer pid=1) async with build_async_engine_client(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    (APIServer pid=1) return await anext(self.gen)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 179, in build_async_engine_client
    (APIServer pid=1) async with build_async_engine_client_from_engine_args(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/contextlib.py", line 199, in __aenter__
    (APIServer pid=1) return await anext(self.gen)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
    (APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/utils/__init__.py", line 1589, in inner
    (APIServer pid=1) return fn(*args, **kwargs)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 212, in from_vllm_config
    (APIServer pid=1) return cls(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 136, in __init__
    (APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 102, in make_async_mp_client
    (APIServer pid=1) return AsyncMPClient(*client_args)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 769, in __init__
    (APIServer pid=1) super().__init__(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 448, in __init__
    (APIServer pid=1) with launch_core_engines(vllm_config, executor_class,
    (APIServer pid=1) File "/opt/conda/lib/python3.10/contextlib.py", line 142, in __exit__
    (APIServer pid=1) next(self.gen)
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 729, in launch_core_engines
    (APIServer pid=1) wait_for_engine_startup(
    (APIServer pid=1) File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 782, in wait_for_engine_startup
    (APIServer pid=1) raise RuntimeError("Engine core initialization failed. "
    (APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
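
    The root cause in the log above is the ValueError raised by choose_mp_linear_kernel, which lists one rejection reason per candidate kernel. As a reading aid, the following minimal sketch pulls those "cannot implement due to" lines out of a saved log into a kernel-to-reason mapping. The function name and the regular expression are illustrative assumptions based only on the message format seen in this log; they are not part of vLLM.

```python
import re

# Matches lines such as:
#   "... MarlinLinearKernel cannot implement due to: Marlin only supported on CUDA"
# The pattern is inferred from the log in this post, not from a vLLM API.
REJECTION = re.compile(r"(\w+LinearKernel) cannot implement due to: (.+)$")


def kernel_rejections(log_text: str) -> dict[str, str]:
    """Map each rejected kernel name to the reason logged for it."""
    reasons: dict[str, str] = {}
    for line in log_text.splitlines():
        match = REJECTION.search(line)
        if match:
            # Keep the first reason seen, in case several workers log the same list.
            reasons.setdefault(match.group(1), match.group(2).strip())
    return reasons


# Three lines copied from the log above.
sample = """\
(Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] MarlinLinearKernel cannot implement due to: Marlin only supported on CUDA
(Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] ConchLinearKernel cannot implement due to: Group size (32) not supported by ConchLinearKernel, supported group sizes are: [-1, 128]
(Worker_TP2 pid=435) ERROR 11-13 20:06:15 [multiproc_executor.py:585] ExllamaLinearKernel cannot implement due to: Exllama only supports float16 activations
"""

for kernel, reason in kernel_rejections(sample).items():
    print(f"{kernel}: {reason}")
```

    Run against the full log, this would give one entry for each of the eight kernels named in the ValueError, which makes it easier to separate platform restrictions (CUDA-only or CPU-only kernels) from configuration-dependent ones (group size, activation dtype, missing bitblas package).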
