Problem description: see the attachment for full details. In short, vLLM fails to start while serving /data/Qwen3-VL-8B-Instruct: FlashAttention 2 cannot be loaded (libcudart.so.12 is missing), the model falls back to the Transformers backend, and the startup profiling run then hits a CUDA out-of-memory error inside the vision encoder's scaled_dot_product_attention (it tries to allocate 256 GiB on a ~64 GiB GPU). Relevant log excerpt:
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) ERROR 11-21 17:05:53 [fa_utils.py:57] Cannot use FA version 2 is not supported due to FA2 is unavaible due to: libcudart.so.12: cannot open shared object file: No such file or directory
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:05:55 [parallel_state.py:1165] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) WARNING 11-21 17:05:55 [utils.py:181] TransformersForMultimodalLM has no vLLM implementation, falling back to Transformers implementation. Some features may not be supported and performance may not be optimal.
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:05:57 [gpu_model_runner.py:2338] Starting to load model /data/Qwen3-VL-8B-Instruct...
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) `torch_dtype` is deprecated! Use `dtype` instead!
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:05:57 [gpu_model_runner.py:2370] Loading model from scratch...
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:05:57 [transformers.py:439] Using Transformers backend.
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:05:58 [platform.py:298] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/4 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 25% Completed | 1/4 [00:05<00:16, 5.52s/it]
Loading safetensors checkpoint shards: 50% Completed | 2/4 [00:11<00:11, 5.79s/it]
Loading safetensors checkpoint shards: 75% Completed | 3/4 [00:13<00:03, 3.97s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:16<00:00, 3.57s/it]
Loading safetensors checkpoint shards: 100% Completed | 4/4 [00:16<00:00, 4.06s/it]
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791)
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:06:14 [default_loader.py:268] Loading weights took 16.46 seconds
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:06:15 [gpu_model_runner.py:2392] Model loading took 16.3341 GiB and 16.796520 seconds
(EngineCore_DP0 pid=296) (RayWorkerWrapper pid=791) INFO 11-21 17:06:15 [gpu_model_runner.py:3000] Encoder cache will be initialized with a budget of 16384 tokens, and profiled with 1 image items of the maximum feature size.
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] EngineCore failed to start.
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] Traceback (most recent call last):
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 709, in run_engine_core
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 505, in __init__
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] super().__init__(vllm_config, executor_class, log_stats,
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 91, in __init__
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] self._initialize_kv_caches(vllm_config)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core.py", line 183, in _initialize_kv_caches
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] self.model_executor.determine_available_memory())
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/executor/abstract.py", line 84, in determine_available_memory
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return self.collective_rpc("determine_available_memory")
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/executor/executor_base.py", line 309, in collective_rpc
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return self._run_workers(method, *args, **(kwargs or {}))
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/executor/ray_distributed_executor.py", line 505, in _run_workers
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] ray_worker_outputs = ray.get(ray_worker_outputs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 22, in auto_init_wrapper
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return fn(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/ray/_private/client_mode_hook.py", line 104, in wrapper
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 2858, in get
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/ray/_private/worker.py", line 958, in get_objects
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] raise value.as_instanceof_cause()
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] ray.exceptions.RayTaskError(OutOfMemoryError): ray::RayWorkerWrapper.execute_method() (pid=791, ip=172.17.0.4, actor_id=2b9d7f7d597adf4159ecbb8101000000, repr=<vllm.executor.ray_utils.RayWorkerWrapper object at 0x7f5d455714e0>)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 628, in execute_method
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] raise e
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 619, in execute_method
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return run_method(self, method, args, kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/utils/__init__.py", line 3060, in run_method
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return func(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_worker.py", line 263, in determine_available_memory
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] self.model_runner.profile_run()
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/v1/worker/gpu_model_runner.py", line 3017, in profile_run
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] self.model.get_multimodal_embeddings(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/vllm/model_executor/models/transformers.py", line 844, in get_multimodal_embeddings
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] vision_embeddings = self.model.get_image_features(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 1061, in get_image_features
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] image_embeds, deepstack_image_embeds = self.visual(pixel_values, grid_thw=image_grid_thw)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 739, in forward
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] hidden_states = blk(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_layers.py", line 94, in __call__
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return super().__call__(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 267, in forward
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] hidden_states = hidden_states + self.attn(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1739, in _wrapped_call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return self._call_impl(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1750, in _call_impl
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return forward_call(*args, **kwargs)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 230, in forward
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] attn_outputs = [
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/models/qwen3_vl/modeling_qwen3_vl.py", line 231, in <listcomp>
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] attention_interface(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/transformers/integrations/sdpa_attention.py", line 96, in sdpa_attention_forward
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] attn_output = torch.nn.functional.scaled_dot_product_attention(
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] File "/opt/conda/lib/python3.10/site-packages/torch/nn/functional.py", line 5912, in scaled_dot_product_attention
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] return _scaled_dot_product_attention(query, key, value, attn_mask, dropout_p, is_causal, scale = scale, enable_gqa = enable_gqa)
(EngineCore_DP0 pid=296) ERROR 11-21 17:06:26 [core.py:718] torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 GiB. GPU 0 has a total capacity of 63.59 GiB of which 41.68 GiB is free. Of the allocated memory 19.19 GiB is allocated by PyTorch, and 442.72 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
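For context, the OOM happens while vLLM profiles the encoder cache with one image of the maximum feature size (gpu_model_runner.py:3000), and because FlashAttention 2 failed to load, the Qwen3-VL vision tower runs plain SDPA, which is where the 256 GiB allocation is attempted. A possible workaround is to reduce the worst-case image feature size seen by the profiling run. The following is a minimal sketch using the offline Python API; the concrete values and the mm_processor_kwargs/max_pixels knob are assumptions (documented for Qwen2-VL processors, not verified for Qwen3-VL on the Transformers backend), and the same settings are exposed as vllm serve flags (--max-model-len, --mm-processor-kwargs).

from vllm import LLM

# Hedged mitigation sketch: shrink the worst-case image feature size used by the
# profiling run so the SDPA fallback in the vision tower does not need a huge
# attention buffer. The numeric values below are hypothetical examples only.
llm = LLM(
    model="/data/Qwen3-VL-8B-Instruct",
    max_model_len=8192,  # hypothetical: also trims the text-side profiling budget
    # Assumed knob: caps visual tokens per image; known for Qwen2-VL processors,
    # assumed (unverified) to be honored for Qwen3-VL via the Transformers backend.
    mm_processor_kwargs={"max_pixels": 1280 * 28 * 28},
)

Fixing the missing libcudart.so.12 so that FlashAttention 2 loads again may also be enough on its own, since FA2 avoids materializing the full attention matrix that the SDPA fallback needed here.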