问题已解决,更换了启动命令:
vllm serve /models/Qwen3.5-122B-A10B -pp 1 -tp 8 \
--trust-remote-code --dtype bfloat16 --distributed-executor-backend mp --swap-space 16 \
--gpu-memory-utilization 0.85 --max-model-len 131072 --max-num-batched-tokens 131072 --no-async-scheduling --mm-encoder-tp-mode data --mm-processor-cache-type shm --limit-mm-per-prompt '{"image": 5, "video": 1}' --skip-mm-profiling --enable-prefix-caching \
--served-model-name Qwen3.5-122B-A10B \
--enable-auto-tool-choice --tool-call-parser qwen3_coder --reasoning-parser qwen3 \
--port 8000