WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:27:48 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:27:48 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] 
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]        █     █     █▄   ▄█
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]  ▄▄ ▄█ █     █     █ ▀▄▀ █  version 0.17.0
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]   █▄█▀ █     █     █     █  model   /models/MiniMax-M2.7
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302]    ▀▀  ▀▀▀▀▀ ▀▀▀▀▀ ▀     ▀
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:302] 
(APIServer pid=17) INFO 05-07 15:27:49 [utils.py:238] non-default args: {'model_tag': '/models/MiniMax-M2.7', 'enable_auto_tool_choice': True, 'tool_call_parser': 'minimax_m2', 'port': 2025, 'model': '/models/MiniMax-M2.7', 'trust_remote_code': True, 'reasoning_parser': 'minimax_m2_append_think', 'tensor_parallel_size': 8, 'swap_space': 16.0, 'enable_prefix_caching': False, 'max_num_batched_tokens': 32768, 'async_scheduling': True}
(APIServer pid=17) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=17) The argument `trust_remote_code` is to be used with Auto classes. It has no effect here and is ignored.
(APIServer pid=17) INFO 05-07 15:28:09 [model.py:531] Resolved architecture: MiniMaxM2ForCausalLM
(APIServer pid=17) INFO 05-07 15:28:09 [model.py:1554] Using max model len 196608
(APIServer pid=17) INFO 05-07 15:28:09 [scheduler.py:231] Chunked prefill is enabled with max_num_batched_tokens=32768.
(APIServer pid=17) /opt/conda/lib/python3.12/site-packages/compressed_tensors/quantization/quant_args.py:362: UserWarning: No observer is used for dynamic quant., setting to None
(APIServer pid=17)   warnings.warn(
(APIServer pid=17) INFO 05-07 15:28:09 [vllm.py:747] Asynchronous scheduling is enabled.
(APIServer pid=17) INFO 05-07 15:28:09 [envs.py:104] Plugin sets VLLM_TUNED_CONFIG_FOLDER to /opt/conda/lib/python3.12/site-packages/vllm_metax/model_executor/layers/fused_moe/configs/H=3072. Reason: set FusedMoE tuned config dir by hidden_size=3072
(APIServer pid=17) The tokenizer you are loading from '/models/MiniMax-M2.7' with an incorrect regex pattern: https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503/discussions/84#69121093e8b480e709447d5e. This will lead to incorrect tokenization. You should set the `fix_mistral_regex=True` flag when loading this tokenizer to fix this issue.
INFO 05-07 15:28:12 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:12 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:12 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:12 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:12 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

WARNING 05-07 15:28:15 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:15 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:16 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
(EngineCore_DP0 pid=302) WARNING 05-07 15:28:21 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(EngineCore_DP0 pid=302) INFO 05-07 15:28:21 [core.py:101] Initializing a V1 LLM engine (v0.17.0) with config: model='/models/MiniMax-M2.7', speculative_config=None, tokenizer='/models/MiniMax-M2.7', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=196608, download_dir=None, load_format=auto, tensor_parallel_size=8, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=compressed-tensors, enforce_eager=False, enable_return_routed_experts=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='minimax_m2_append_think', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_metrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False), seed=0, served_model_name=/models/MiniMax-M2.7, enable_prefix_caching=False, enable_chunked_prefill=True, pooler_config=None, compilation_config={'level': None, 'mode': <CompilationMode.VLLM_COMPILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_attention_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_mixer', 'vllm::gdn_attention_core', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::mx_sparse_attn_indexer', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'compile_sizes': [], 'compile_ranges_split_points': [32768], 'inductor_compile_config': {'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.FULL_AND_PIECEWISE: (2, 1)>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48, 56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, 'enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': False, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:23 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO 05-07 15:28:23 [__init__.py:44] Available plugins for group vllm.platform_plugins:
INFO 05-07 15:28:23 [__init__.py:46] - metax -> vllm_metax:register
INFO 05-07 15:28:23 [__init__.py:49] All plugins in this group will be loaded. Set `VLLM_PLUGINS` to control which plugins to load.
INFO 05-07 15:28:23 [__init__.py:212] Platform plugin metax is activated
INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_USE_FLASHINFER_SAMPLER to False. Reason: flashinfer sampler are not supported on maca
INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_ENGINE_READY_TIMEOUT_S to 3600. Reason: set timeout to 3600s for model loading
INFO 05-07 15:28:24 [envs.py:104] Plugin sets VLLM_DISABLE_SHARED_EXPERTS_STREAM to True. Reason: no used on maca
INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

INFO Print the version information of mcoplib during compilation.

Version info:Mcoplib_Version = '0.4.2'
Build_Maca_Version = '3.5.3.20'
GIT_BRANCH = 'HEAD'
GIT_COMMIT = 'e482051'
Vllm Op Version = 0.17.0
SGlang Op Version  = 0.5.8 && 0.5.9 

INFO Staring Check the current MACA version of the operating environment.

INFO: Release major.minor matching,  successful:3.5. 

WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq.MacaAWQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'awq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.awq_marlin.MacaAWQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'compressed-tensors' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.compressed_tensors.MacaCompressedTensorsConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq.MacaGPTQConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'gptq_marlin' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.gptq_marlin.MacaGPTQMarlinConfig'>.
WARNING 05-07 15:28:27 [__init__.py:80] The quantization method 'moe_wna16' already exists and will be overwritten by the quantization config <class 'vllm_metax.quant_config.moe_wna16.MacaMoeWNA16Config'>.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=445) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=1 local_rank=1 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
(Worker pid=451) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=7 local_rank=7 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
[15:28:33.097][MXKW][E]queues.c                :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.098][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.098][MCR][E]mx_device.cpp            :3913: Mxc copy from host to device failed with code 4104
[15:28:33.101][MXKW][E]queues.c                :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.101][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.101][MCR][E]mx_device.cpp            :4006: Mxc copy from device to device failed with code 4104
[15:28:33.104][MXKW][E]queues.c                :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.104][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.104][MCR][E]mx_device.cpp            :3913: Mxc copy from host to device failed with code 4104
[15:28:33.108][MXKW][E]queues.c                :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.108][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
[15:28:33.108][MCR][E]mx_device.cpp            :3913: Mxc copy from host to device failed with code 4104
[15:28:33.112][MXKW][E]queues.c                :832 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
[15:28:33.113][MCR][E]mx_device.cpp            :1379: Device::acquireQueue: mxc_queue_acquire failed!
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=447) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=3 local_rank=3 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=450) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=6 local_rank=6 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=446) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=2 local_rank=2 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
(Worker pid=449) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=5 local_rank=5 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_mtp:DeepSeekMTP.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV2ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture DeepseekV32ForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:DeepseekV3ForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture KimiK25ForConditionalGeneration is already registered, and will be overwritten by the new model class vllm_metax.models.kimi_k25:KimiK25ForConditionalGeneration.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
WARNING 05-07 15:28:33 [registry.py:886] Model architecture Step3p5MTP is already registered, and will be overwritten by the new model class vllm_metax.models.step3p5_mtp:Step3p5MTP.
(Worker pid=444) INFO 05-07 15:28:33 [parallel_state.py:1393] world_size=8 rank=0 local_rank=0 distributed_init_method=tcp://127.0.0.1:41001 backend=nccl
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] EngineCore failed to start.
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] Traceback (most recent call last):
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     return func(*args, **kwargs)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     super().__init__(
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 110, in __init__
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 100, in __init__
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     super().__init__(vllm_config)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     return func(*args, **kwargs)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     self._init_executor()
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 180, in _init_executor
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 701, in wait_for_ready
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100]     raise e from None
(EngineCore_DP0 pid=302) ERROR 05-07 15:28:41 [core.py:1100] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(EngineCore_DP0 pid=302) Process EngineCore_DP0:
(EngineCore_DP0 pid=302) Traceback (most recent call last):
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
(EngineCore_DP0 pid=302)     self.run()
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/multiprocessing/process.py", line 108, in run
(EngineCore_DP0 pid=302)     self._target(*self._args, **self._kwargs)
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1104, in run_engine_core
(EngineCore_DP0 pid=302)     raise e
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 1090, in run_engine_core
(EngineCore_DP0 pid=302)     engine_core = EngineCoreProc(*args, engine_index=dp_rank, **kwargs)
(EngineCore_DP0 pid=302)                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=302)     return func(*args, **kwargs)
(EngineCore_DP0 pid=302)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 834, in __init__
(EngineCore_DP0 pid=302)     super().__init__(
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core.py", line 110, in __init__
(EngineCore_DP0 pid=302)     self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=302)                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 100, in __init__
(EngineCore_DP0 pid=302)     super().__init__(vllm_config)
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(EngineCore_DP0 pid=302)     return func(*args, **kwargs)
(EngineCore_DP0 pid=302)            ^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/executor/abstract.py", line 103, in __init__
(EngineCore_DP0 pid=302)     self._init_executor()
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 180, in _init_executor
(EngineCore_DP0 pid=302)     self.workers = WorkerProc.wait_for_ready(unready_workers)
(EngineCore_DP0 pid=302)                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=302)   File "/opt/conda/lib/python3.12/site-packages/vllm_metax/v1/executor/multiproc_executor.py", line 701, in wait_for_ready
(EngineCore_DP0 pid=302)     raise e from None
(EngineCore_DP0 pid=302) Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
(APIServer pid=17) Traceback (most recent call last):
(APIServer pid=17)   File "/opt/conda/bin/vllm", line 8, in <module>
(APIServer pid=17)     sys.exit(main())
(APIServer pid=17)              ^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=17)     args.dispatch_function(args)
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 112, in cmd
(APIServer pid=17)     uvloop.run(run_server(args))
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=17)     return __asyncio.run(
(APIServer pid=17)            ^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=17)     return runner.run(main)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=17)     return self._loop.run_until_complete(task)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=17)     return await main
(APIServer pid=17)            ^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 471, in run_server
(APIServer pid=17)     await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 490, in run_server_worker
(APIServer pid=17)     async with build_async_engine_client(
(APIServer pid=17)                ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=17)     return await anext(self.gen)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 96, in build_async_engine_client
(APIServer pid=17)     async with build_async_engine_client_from_engine_args(
(APIServer pid=17)                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=17)     return await anext(self.gen)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 137, in build_async_engine_client_from_engine_args
(APIServer pid=17)     async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=17)                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 225, in from_vllm_config
(APIServer pid=17)     return cls(
(APIServer pid=17)            ^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/async_llm.py", line 154, in __init__
(APIServer pid=17)     self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=17)                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=17)     return func(*args, **kwargs)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 127, in make_async_mp_client
(APIServer pid=17)     return AsyncMPClient(*client_args)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/tracing/otel.py", line 178, in sync_wrapper
(APIServer pid=17)     return func(*args, **kwargs)
(APIServer pid=17)            ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 911, in __init__
(APIServer pid=17)     super().__init__(
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 569, in __init__
(APIServer pid=17)     with launch_core_engines(
(APIServer pid=17)          ^^^^^^^^^^^^^^^^^^^^
(APIServer pid=17)   File "/opt/conda/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=17)     next(self.gen)
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 951, in launch_core_engines
(APIServer pid=17)     wait_for_engine_startup(
(APIServer pid=17)   File "/opt/conda/lib/python3.12/site-packages/vllm/v1/engine/utils.py", line 1010, in wait_for_engine_startup
(APIServer pid=17)     raise RuntimeError(
(APIServer pid=17) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}