The MiniMax-M2.7-W8A8 model was downloaded from ModelScope;
Docker image used: :0.14.0-maca.ai3.5.3.102-torch2.8-py310-ubuntu22.04-amd64
C500, single node, 16 GPUs.
Launch command:
export VLLM_ALLOW_RUNTIME_LORA_UPDATING=True
export MACA_DIRECT_DISPATCH=1
export MACA_GRAPH_LAUNCH_MODE=5
export MACA_SMALL_PAGESIZE_ENABLE=1
export MACA_TORCH_COMPILE_CONF=triton.multi_kernel:1
MODELPATH=/data/opensource-models/MiniMax-M2.7-W8A8-official/
MODEL_NAME=MiniMax-M2.7-W8A8
port=${1:-12001}
currenttime=$(date "+%Y%m%d%H%M%S")
vllm serve ${MODELPATH} \
--host 0.0.0.0 \
--port ${port} \
--served-model-name ${MODEL_NAME} \
--tensor-parallel-size 16 \
--pipeline-parallel-size 1 \
--dtype half \
--gpu-memory-utilization 0.9 \
--max-num-batched-tokens 8192 \
--max-model-len 8192 \
--swap-space 64 \
--mm-encoder-tp-mode data \
--trust-remote-code \
--max-num-seqs 64 \
--no-enable-prefix-caching \
--enable-auto-tool-choice \
--tool-call-parser minimax_m2 \
--reasoning-parser minimax_m2_append_think \
2>&1 | tee ./${currenttime}.log
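One detail worth double-checking in the script above: the timestamped log name relies on shell command substitution around `date`, i.e. `$( )`, not a bare assignment. A quick standalone check of the pattern:

```shell
# Command substitution runs `date` and captures its output into the variable;
# a bare `currenttime=date` would store the literal word "date" instead.
currenttime=$(date "+%Y%m%d%H%M%S")
echo "log file: ./${currenttime}.log"
```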
It fails with an error on startup…
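Since `tee` captures the startup output into the timestamped log, a small sketch for pulling the first error lines out of the newest log (assumes the `./*.log` naming from the script above; adjust the glob if logs land elsewhere):

```shell
# Find the newest log produced by the launch script and show the
# first few lines that mention an error or traceback.
log=$(ls -t ./*.log 2>/dev/null | head -n 1)
if [ -n "$log" ]; then
  grep -n -i -m 5 -E "error|traceback" "$log"
else
  echo "no log file found"
fi
```

The first traceback in that log usually points at the failing component.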