WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] █ █ █▄ ▄█
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] ▄▄ ▄█ █ █ █ ▀▄▀ █ version 0.17.0
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] █▄█▀ █ █ █ █ model /models/MiniMax-M2.7
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] ▀▀ ▀▀▀▀▀ ▀▀▀▀▀ ▀ ▀
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:238] non-default args: {'model_tag': '/models/MiniMax-M2.7', 'enable_auto_tool_choice': True, 'tool_call_parser': 'minimax_m2', 'port': 2025, 'model': '/models/MiniMax-M2.7', 'trust_remote_code': True, 'reasoning_parser': 'minimax_m2_append_think', 'tensor_parallel_size': 8, 'swap_space': 16.0, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'async_scheduling': True}
(APIServer pid=17) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=17) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=17) INFO 05-07 15:28:09 [model.py:531] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=17) INFO 05-07 15:28:09 [model.py:1554] Using max model len 196608
(APIServer pid=17) INFO 05-07 15:28:09 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=17) /opt/conda/lib/python3.12/site-packages/compressed_tensors/quantization/quant_args.py:362: UserWarning: No observer is used for dynamic quant., setting to None
(APIServer pid=17) warnings.warn(
(APIServer pid=17) INFO 05-07 15:28:09 [vllm.py:747] Asynchronous scheduling is enabled.
(APIServer pid=17) INFO 05-07 15:28:09 [envs.py:104] Plugin sets VLLM_TUNED_CONFIG_FOLDER to /opt/conda/lib/python3.12/site-packages/vllm_metax/model_executor/layers/fused_moe/configs/H=3072. Reason: set FusedMoE tuned config dir by hidden_size=3072
(APIServer pid=17) The tokenizer you are loading from '/models/MiniMax-M2.7' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 05-07 15:28:12 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:12 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:12 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:12 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO Print the version information of mcoplib during compilation.
Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9
INFO Staring Check the current MACA version of the operating environment.
INFO: Release major.minor matching, successful:3.5.
WARNING 05-07 15:28:15 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:28:15 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config .
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP. (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM. (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration. (EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM. 
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP. (EngineCore_DP0 pid=302) INFO 05-07 15:28:21 [core.py:101] Initializing a V1 LLM engine (v0.17.0) with config: model='/models/MiniMax-M2.7', speculative_config=None, tokenizer='/models/MiniMax-M2.7', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=196608, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=compressed-tensors, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='minimax_m2_append_think', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/models/MiniMax-M2.7, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': , 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 
'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::mx_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': , 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': , 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []} INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. 
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. 
Reason: no used on maca INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. 
Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins: INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load. INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. 
Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. 
Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. INFO Print the version information of mcoplib during compilation. Version info:Mcoplib_Version = '0.4.2' Build_Maca_Version = '3.5.3.20' GIT_BRANCH = 'HEAD' GIT_COMMIT = 'e482051' Vllm Op Version = 0.17.0 SGlang Op Version = 0.5.8 && 0.5.9 INFO Staring Check the current MACA version of the operating environment. INFO: Release major.minor matching, successful:3.5. WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . 
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . 
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . 
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config . WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config . 
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=445) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
(Worker pid=451) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=7 local_rank=7 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
[15:28:33.097][MXKW][E]queues.c :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.098][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.098][MCR][E]mx_device.cpp :3913: Mxc copy from host to device failed with code 4104
[15:28:33.101][MXKW][E]queues.c :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.101][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.101][MCR][E]mx_device.cpp :4006: Mxc copy from device to device failed with code 4104
[15:28:33.104][MXKW][E]queues.c :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.104][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.104][MCR][E]mx_device.cpp :3913: Mxc copy from host to device failed with code 4104
[15:28:33.108][MXKW][E]queues.c :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.108][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.108][MCR][E]mx_device.cpp :3913: Mxc copy from host to device failed with code 4104
[15:28:33.112][MXKW][E]queues.c :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.113][MCR][E]mx_device.cpp :1379: Device::acquireQueue: mxc_queue_acquire failed!
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=447) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration. WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP. (Worker pid=450) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=6 local_rank=6 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP. 
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration. WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. 
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration. WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP. (Worker pid=446) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl (Worker pid=449) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=5 local_rank=5 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM. 
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration. WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM. WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP. (Worker pid=444) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] EngineCore failed to start. (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] Traceback (most recent call last): (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 834, in __init__ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] super().__init__( (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File 
"/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 110, in __init__ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] self.model_executor = executor_class(vllm_config) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 100, in __init__ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] super().__init__(vllm_config) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] return func(*args, **kwargs) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] self._init_executor() (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 180, in _init_executor (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 701, in wait_for_ready (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] raise e from None (EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. 
(EngineCore_DP0 pid=302) Process EngineCore_DP0: (EngineCore_DP0 pid=302) Traceback (most recent call last): (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap (EngineCore_DP0 pid=302) self.run() (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run (EngineCore_DP0 pid=302) self._target(*self._args, **self._kwargs) (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core (EngineCore_DP0 pid=302) raise e (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core (EngineCore_DP0 pid=302) engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs) (EngineCore_DP0 pid=302) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=302) return func(*args, **kwargs) (EngineCore_DP0 pid=302) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 834, in __init__ (EngineCore_DP0 pid=302) super().__init__( (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 110, in __init__ (EngineCore_DP0 pid=302) self.model_executor = executor_class(vllm_config) (EngineCore_DP0 pid=302) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 100, in __init__ (EngineCore_DP0 pid=302) super().__init__(vllm_config) (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (EngineCore_DP0 pid=302) return func(*args, **kwargs) (EngineCore_DP0 pid=302) ^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) File 
"/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__ (EngineCore_DP0 pid=302) self._init_executor() (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 180, in _init_executor (EngineCore_DP0 pid=302) self.workers = WorkerProc.wait_for_ready(unready_workers) (EngineCore_DP0 pid=302) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (EngineCore_DP0 pid=302) File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 701, in wait_for_ready (EngineCore_DP0 pid=302) raise e from None (EngineCore_DP0 pid=302) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause. (APIServer pid=17) Traceback (most recent call last): (APIServer pid=17) File "/opt/conda/bin/vllm", line 8, in <module> (APIServer pid=17) sys.exit(main()) (APIServer pid=17) ^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main (APIServer pid=17) args.dispatch_function(args) (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd (APIServer pid=17) uvloop.run(run_server(args)) (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run (APIServer pid=17) return __asyncio.run( (APIServer pid=17) ^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/asyncio/runners.py", line 195, in run (APIServer pid=17) return runner.run(main) (APIServer pid=17) ^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/asyncio/runners.py", line 118, in run (APIServer pid=17) return self._loop.run_until_complete(task) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", 
line 48, in wrapper (APIServer pid=17) return await main (APIServer pid=17) ^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server (APIServer pid=17) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs) (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker (APIServer pid=17) async with build_async_engine_client( (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=17) return await anext(self.gen) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client (APIServer pid=17) async with build_async_engine_client_from_engine_args( (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__ (APIServer pid=17) return await anext(self.gen) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args (APIServer pid=17) async_llm = AsyncLLM.from_vllm_config( (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config (APIServer pid=17) return cls( (APIServer pid=17) ^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 154, in __init__ (APIServer pid=17) self.engine_core = EngineCoreClient.make_async_mp_client( (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper 
(APIServer pid=17) return func(*args, **kwargs) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client (APIServer pid=17) return AsyncMPClient(*client_args) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper (APIServer pid=17) return func(*args, **kwargs) (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 911, in __init__ (APIServer pid=17) super().__init__( (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 569, in __init__ (APIServer pid=17) with launch_core_engines( (APIServer pid=17) ^^^^^^^^^^^^^^^^^^^^ (APIServer pid=17) File "/opt/conda/lib/python3.12/contextlib.py", line 144, in __exit__ (APIServer pid=17) next(self.gen) (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines (APIServer pid=17) wait_for_engine_startup( (APIServer pid=17) File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup (APIServer pid=17) raise RuntimeError( (APIServer pid=17) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}