If it is not turned off, an error is reported, as follows:
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] EngineCore failed to start.
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] Traceback (most recent call last):
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] super().__init__(
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 120, in __init__
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] num_gpu_blocks, num_cpu_blocks, kv_cache_config = self._initialize_kv_caches(
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] return func(*args, **kwargs)
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 279, in _initialize_kv_caches
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] self.model_executor.initialize_from_config(kv_cache_configs)
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 118, in initialize_from_config
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] compilation_times: list[float] = self.collective_rpc("compile_or_warm_up_model")
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 389, in collective_rpc
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] return aggregate(get_response())
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ^^^^^^^^^^^^^^
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 372, in get_response
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] raise RuntimeError(
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] RuntimeError: Worker failed with error 'CUDA error: operation not permitted when stream is capturing
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ', please check the stack trace above for the root cause
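Every line of the log above repeats the same `(EngineCore_DP0 pid=...) ERROR ... [core.py:1100]` prefix, which makes the traceback hard to scan. Below is a minimal sketch for stripping that prefix and pulling out the line that names the exception. The helper names `strip_log_prefix` and `find_error_lines` are hypothetical and not part of vLLM, and the regular expression assumes the prefix layout seen in this excerpt; other vLLM versions or log configurations may format it differently.

```python
import re

# Prefix assumed from the excerpt above, e.g.
# "(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] ".
LOG_PREFIX = re.compile(
    r"^\(\S+ pid=\d+\) "                      # process name and pid
    r"[A-Z]+ \d{2}-\d{2} \d{2}:\d{2}:\d{2} "  # level, date, time
    r"\[[^\]]+\] ?"                           # [file:line]
)

# A de-prefixed line that starts with an exception name, e.g. "RuntimeError: ...".
EXCEPTION_LINE = re.compile(r"^\w+(Error|Exception): ")


def strip_log_prefix(line: str) -> str:
    """Remove the logger prefix from one line; other lines pass through unchanged."""
    return LOG_PREFIX.sub("", line, count=1)


def find_error_lines(log_text: str) -> list[str]:
    """Return the de-prefixed lines that name an exception."""
    stripped = (strip_log_prefix(line) for line in log_text.splitlines())
    return [line for line in stripped if EXCEPTION_LINE.match(line)]


if __name__ == "__main__":
    # Three lines copied from the log above.
    sample = "\n".join([
        "(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] EngineCore failed to start.",
        "(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] raise RuntimeError(",
        "(EngineCore_DP0 pid=3266) ERROR 04-17 11:46:53 [core.py:1100] RuntimeError: Worker failed with error "
        "'CUDA error: operation not permitted when stream is capturing",
    ])
    for line in find_error_lines(sample):
        print(line)
```

Run on the full log, this leaves only the `RuntimeError: Worker failed with error 'CUDA error: operation not permitted when stream is capturing` line. That line is the message relayed by the executor; the worker-side stack trace it refers to would be further up in the complete log, outside this excerpt.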