xiaoo | MetaX Developer Forum

Configuration:
Lenovo SR658H, 512 GB RAM, GPUs: 2 × MetaX N260

Problem: Either model runs fine on its own, but when I start the second one it cannot allocate GPU memory and hangs.

Image version:
cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.11.2-maca.ai3.3.0.103-torch2.8-py312-ubuntu22.04-amd64

Docker command:

docker run -itd \
  --restart always \
  --privileged \
  --device=/dev/dri \
  --device=/dev/mxcd \
  --group-add video \
  --network=host \
  --name Qwen3-Next-80B-A3B-Instruct.w8a8 \
  --security-opt seccomp=unconfined \
  --security-opt apparmor=unconfined \
  --shm-size 100gb \
  --ulimit memlock=-1 \
  -v /models:/models \
  cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.11.2-maca.ai3.3.0.103-torch2.8-py312-ubuntu22.04-amd64 \
  /bin/bash

LLM launch command:

VLLM_USE_V1=0 nohup vllm serve /models/Qwen3-Next-80B-A3B-Instruct.w8a8 \
  --port 8889 \
  -tp 2 \
  --enforce-eager \
  --max-model-len 15000 \
  --gpu-memory-utilization 0.7 \
  --api-key Dzdwd@85416 \
  --max-num-seqs 35 \
  --served-model-name Qwen3-Next-80B-A3B-Instruct.w8a8 > vllm-80b.log 2>&1 &

Embedding model launch command:

nohup vllm serve /models/qwen3-Embedding-0.6B \
  --port 8890 \
  --enforce-eager \
  --served-model-name qwen3-Embedding-0.6B \
  --max-model-len 1024 \
  --gpu-memory-utilization 0.1 \
  --trust-remote-code \
  --task embed \
  --api-key Dzdwd@85416 > vllm-emb.log 2>&1 &
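For reference, the per-GPU budget implied by the two `--gpu-memory-utilization` values can be worked out as follows. This is a sketch that assumes the flag is a fraction of each device's total memory (65536 MiB per N260, as reported by mx-smi) and is applied separately to each vLLM instance, as upstream vLLM documents it; whether the MetaX build behaves the same way is an assumption. The helper `budget_mib` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: MiB one vLLM instance may claim on one GPU,
# ASSUMING --gpu-memory-utilization is a fraction of total device memory.
budget_mib() {
  local total_mib=$1 fraction=$2
  # awk does the floating-point multiply; int() truncates toward zero
  awk -v t="$total_mib" -v f="$fraction" 'BEGIN { print int(t * f) }'
}

TOTAL_MIB=65536                          # per N260, from mx-smi
LLM_MIB=$(budget_mib "$TOTAL_MIB" 0.7)   # Qwen3-Next-80B, per TP rank (-tp 2)
EMB_MIB=$(budget_mib "$TOTAL_MIB" 0.1)   # embedding model, single GPU
echo "LLM per GPU:       ${LLM_MIB} MiB"
echo "Embedding:         ${EMB_MIB} MiB"
echo "Sum on shared GPU: $((LLM_MIB + EMB_MIB)) of ${TOTAL_MIB} MiB"
```

Under that assumption, the GPU that hosts both instances would need about 52428 of its 65536 MiB.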

Log from the second instance (the embedding model), where it gets stuck:

(EngineCore_DP0 pid=20179) INFO 01-30 12:54:48 [core.py:93] Initializing a V1 LLM engine (v0.11.2) with config: model='/models/qwen3-Embedding-0.6B', speculative_config=None, tokenizer='/models/qwen3-Embedding-0.6B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=2048, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='', reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=qwen3-Embedding-0.6B, enable_prefix_caching=True, enable_chunked_prefill=True, pooler_config=PoolerConfig(pooling_type='LAST', normalize=True, dimensions=None, enable_chunked_processing=None, max_embed_len=None, softmax=None, activation=None, use_activation=None, logit_bias=None, step_tag_id=None, returned_token_ids=None), compilation_config={'level': None, 'mode': <CompilationMode.NONE: 0>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['all'], 'splitting_ops': None, 'compile_mm_encoder': False, 'use_inductor': None, 'compile_sizes': [], 'inductor_compile_config': {'enable_auto_functionalized_v2': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.NONE: 0>, 'cudagraph_num_of_warmups': 0, 'cudagraph_capture_sizes': [], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {}, 'max_cudagraph_capture_size': 0, 'local_cache_dir': None}
(EngineCore_DP0 pid=20179) INFO 01-30 12:54:48 [parallel_state.py:1208] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.196.210.3:40141 backend=nccl
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
[Gloo] Rank 0 is connected to 0 peer ranks. Expected number of connected peer ranks is : 0
(EngineCore_DP0 pid=20179) INFO 01-30 12:54:49 [parallel_state.py:1394] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
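The log stops after the rank assignment without printing an error. One way to tell a slow start from a real hang is to poll the server until it reports healthy. This is a minimal sketch that assumes the MetaX image exposes upstream vLLM's `GET /health` route on the serving port. The function `wait_for_health` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll <url>/health until it returns HTTP 200 or roughly
# <timeout_s> seconds of sleeping have passed (probe time is not counted).
# Returns 0 when healthy, 1 on timeout.
wait_for_health() {
  local url=$1 timeout_s=$2 waited=0
  while [ "$waited" -lt "$timeout_s" ]; do
    # -f: treat HTTP errors as failure; -s: silent; --max-time bounds one probe
    if curl -fs --max-time 2 "$url/health" >/dev/null 2>&1; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1
}

# Usage (port 8890 is the embedding instance started earlier):
# wait_for_health http://127.0.0.1:8890 600 || tail -n 50 vllm-emb.log
```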

mx-smi output:

mx-smi  version: 2.2.9

=================== MetaX System Management Interface Log ===================
Timestamp                                         : Fri Jan 30 12:59:17 2026

Attached GPUs                                     : 2
+---------------------------------------------------------------------------------+
| MX-SMI 2.2.9                       Kernel Mode Driver Version: 3.4.4            |
| MACA Version: 3.3.0.15             BIOS Version: 1.29.1.0                       |
|------------------+-----------------+---------------------+----------------------|
| Board       Name | GPU   Persist-M | Bus-id              | GPU-Util      sGPU-M |
| Pwr:Usage/Cap    | Temp       Perf | Memory-Usage        | GPU-State            |
|==================+=================+=====================+======================|
| 0     MetaX N260 | 0           Off | 0000:41:00.0        | 0%          Disabled |
| 52W / 225W       | 43C          P9 | 47883/65536 MiB     | Available            |
+------------------+-----------------+---------------------+----------------------+
| 1     MetaX N260 | 1           Off | 0000:c1:00.0        | 0%          Disabled |
| 47W / 225W       | 40C          P9 | 47867/65536 MiB     | Available            |
+------------------+-----------------+---------------------+----------------------+

+---------------------------------------------------------------------------------+
| Process:                                                                        |
|  GPU                    PID         Process Name                 GPU Memory     |
|                                                                  Usage(MiB)     |
|=================================================================================|
|  0                  2322349         VLLM::Worker_TP              47198          |
|  0                  2343541         VLLM::EngineCor              16             |
|  1                  2322350         VLLM::Worker_TP              47198          |
+---------------------------------------------------------------------------------+
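The `Memory-Usage` column above is printed as `used/total MiB`. The sketch below turns that field into free MiB per GPU; the helper name `free_mib` is hypothetical and only parses the text format shown here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert an mx-smi "used/total MiB" field to free MiB.
free_mib() {
  local field=${1%% *}                     # drop the trailing " MiB"
  local used=${field%/*} total=${field#*/} # split on "/"
  echo $((total - used))
}

free_mib "47883/65536 MiB"   # GPU 0 -> 17653
free_mib "47867/65536 MiB"   # GPU 1 -> 17669
```

So at the moment of this snapshot each N260 still reports roughly 17.6 GB free while the second instance is stuck.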

How should I set this up? Or is something wrong with my parameters?