• Members 12 posts
    May 22, 2026, 12:22

    I. Hardware and software information:
    1. Server vendor: Inspur

    2. MetaX GPU model: MetaX C500, 8 cards

    3. OS kernel version: 6.6.0-32.7.v2505.ky11.x86_64

    4. CPU virtualization enabled: Yes

    5. mx-smi output:
    mx-smi version: 2.2.12

    =================== MetaX System Management Interface Log ===================
    Timestamp : Wed May 20 18:14:56 2026

    Attached GPUs : 8
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.12                        Kernel Mode Driver Version: 3.6.11         |
    | MACA Version: unknown                BIOS Version: 1.31.1.0                     |
    |------------------+-----------------+---------------------+----------------------|
    | Board Name       | GPU  Persist-M  | Bus-id              | GPU-Util   sGPU-M    |
    | Pwr:Usage/Cap    | Temp Perf       | Memory-Usage        | GPU-State            |
    |==================+=================+=====================+======================|
    | 0     MetaX C500 | 0    Off        | 0000:04:00.0        | 0%         Disabled  |
    | 82W / 350W       | 61C  P9         | 40353/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 1     MetaX C500 | 1    Off        | 0000:05:00.0        | 0%         Disabled  |
    | 75W / 350W       | 58C  P9         | 40993/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 2     MetaX C500 | 2    Off        | 0000:63:00.0        | 0%         Disabled  |
    | 80W / 350W       | 56C  P9         | 40353/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 3     MetaX C500 | 3    Off        | 0000:64:00.0        | 0%         Disabled  |
    | 80W / 350W       | 59C  P9         | 40993/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 4     MetaX C500 | 4    Off        | 0000:83:00.0        | 0%         Disabled  |
    | 82W / 350W       | 56C  P9         | 40993/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 5     MetaX C500 | 5    Off        | 0000:84:00.0        | 0%         Disabled  |
    | 72W / 350W       | 53C  P9         | 40353/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 6     MetaX C500 | 6    Off        | 0000:e4:00.0        | 0%         Disabled  |
    | 81W / 350W       | 58C  P9         | 40993/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+
    | 7     MetaX C500 | 7    Off        | 0000:e5:00.0        | 0%         Disabled  |
    | 74W / 350W       | 54C  P9         | 40353/65536 MiB     | Available            |
    +------------------+-----------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    | GPU   PID         Process Name                                 GPU Memory       |
    |                                                                Usage(MiB)       |
    |=================================================================================|
    | 0     1025936     VLLM::Worker_TP                              39386            |
    | 1     1025937     VLLM::Worker_TP                              40026            |
    | 2     1025938     VLLM::Worker_TP                              39386            |
    | 3     1025939     VLLM::Worker_TP                              40026            |
    | 4     1025940     VLLM::Worker_TP                              40026            |
    | 5     1025941     VLLM::Worker_TP                              39386            |
    | 6     1025942     VLLM::Worker_TP                              40026            |
    | 7     1025943     VLLM::Worker_TP                              39386            |
    +---------------------------------------------------------------------------------+

    6. docker info output:
    [root@localhost ~]# docker info
    Client:
     Version: 24.0.9
     Context: default
     Debug Mode: false

    Server:
     Containers: 1
      Running: 1
      Paused: 0
      Stopped: 0
     Images: 1
     Server Version: 24.0.9
     Storage Driver: overlay2
      Backing Filesystem: xfs
      Supports d_type: true
      Using metacopy: false
      Native Overlay Diff: true
      userxattr: false
     Logging Driver: json-file
     Cgroup Driver: cgroupfs
     Cgroup Version: 1
     Plugins:
      Volume: local
      Network: bridge host ipvlan macvlan null overlay
      Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
     Swarm: inactive
     Runtimes: io.containerd.runc.v2 runc
     Default Runtime: runc
     Init Binary: docker-init
     containerd version: 9a04df1519ac2967eece6c6a5d13d3b846b574b2.m
     runc version:
     init version:
     Security Options:
      seccomp
       Profile: builtin
     Kernel Version: 6.6.0-32.7.v2505.ky11.x86_64
     Operating System: Kylin Linux Advanced Server V11 (Swan25)
     OSType: linux
     Architecture: x86_64
     CPUs: 256
     Total Memory: 1.472TiB
     Name: localhost.localdomain
     ID: ded90092-4000-426b-a3ca-08950e376242
     Docker Root Dir: /home/docker
     Debug Mode: false
     Experimental: false
     Insecure Registries:
      127.0.0.0/8
     Registry Mirrors:
      docker.1ms.run/
      dockerpull.com/
      registry.docker-cn.com/
     Live Restore Enabled: false

    II. Question
    How do I deploy the bge-m3 and bge-reranker-v2-m3 models on a MetaX C500?

  • Members 12 posts
    May 22, 2026, 14:47

    1. Image version:
    cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.19.0-maca.ai3.5.3.502-torch2.8-py312-kylinv11-amd64

    2. Container start command:
    docker run -itd \
    --name qwen3.6 \
    --network host \
    --shm-size 512G \
    --device=/dev/dri \
    --device=/dev/mxcd \
    --group-add video \
    --security-opt seccomp=unconfined \
    --security-opt apparmor=unconfined \
    --shm-size 100gb \
    --ulimit memlock=-1 \
    -v /home/modelscope:/root/vllm \
    -e TZ=Asia/Shanghai \
    -p 8000:8000 \
    -p 8001:8001 \
    -p 8002:8002 \
    cr.metax-tech.com/public-ai-release/maca/vllm-metax:0.19.0-maca.ai3.5.3.502-torch2.8-py312-kylinv11-amd64

    nohup vllm serve /root/vllm/bge-m3/ \
    --host 0.0.0.0 \
    --port 8001 \
    --served-model-name bge-m3 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.1 \
    --trust-remote-code \
    --dtype auto \
    > bge-m3.log 2>&1 &

    nohup vllm serve /root/vllm/bge-reranker-v2-m3/ \
    --host 0.0.0.0 \
    --port 8001 \
    --served-model-name bge-reranker-v2-m3 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.1 \
    --trust-remote-code \
    --dtype auto \
    > reranker.log 2>&1 &
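Once each model is listening on its own port, the two services could be exercised with a small client like the sketch below. It assumes the vllm-metax image exposes the same OpenAI-compatible routes as upstream vLLM (`/v1/embeddings` and `/v1/rerank`), which this thread does not confirm. The reranker port 8002 is hypothetical (both commands above use 8001), and the function names are illustrative only.

```python
import json
import urllib.request
from typing import List


def embeddings_request(base_url: str, model: str, texts: List[str]) -> urllib.request.Request:
    """Build a POST request for the OpenAI-style embeddings route."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank_request(base_url: str, model: str, query: str, documents: List[str]) -> urllib.request.Request:
    """Build a POST request for the rerank route (assumed to match upstream vLLM)."""
    body = json.dumps({"model": model, "query": query, "documents": documents}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/rerank",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Model names come from --served-model-name in the commands above.
# Port 8002 for the reranker is an assumption, not taken from the thread.
embed_req = embeddings_request("http://127.0.0.1:8001", "bge-m3", ["hello world"])
rerank_req = rerank_request(
    "http://127.0.0.1:8002",
    "bge-reranker-v2-m3",
    "what is a GPU?",
    ["A GPU is a graphics processor.", "Bananas are yellow."],
)
print(embed_req.full_url)
print(rerank_req.full_url)
# With both servers running: result = send(embed_req)
```

Building the requests separately from sending them makes it easy to inspect the URL and payload before either server is actually up.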

    II. Problem:
    Launching multiple services in the container fails with the following errors:
    itionalGeneration.
    WARNING 05-22 14:34:58 [registry.py:915] Model architecture GlmMoeDsaForCausalLM is already registered, and will be overwritten by the new model class vllm_metax.models.deepseek_v2:GlmMoeDsaForCausalLM.
    (Worker pid=2004) INFO 05-22 14:34:59 [parallel_state.py:1400] world_size=8 rank=5 local_rank=5 distributed_init_method=tcp://127.0.0.1:45863 backend=nccl
    (Worker pid=2000) [rank1]:W0522 14:34:59.640000 2000 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2000) [rank1]:W0522 14:34:59.640000 2000 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2002) [rank3]:W0522 14:34:59.640000 2002 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2002) [rank3]:W0522 14:34:59.640000 2002 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2001) [rank2]:W0522 14:34:59.640000 2001 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2001) [rank2]:W0522 14:34:59.640000 2001 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=1999) [rank0]:W0522 14:34:59.640000 1999 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=1999) [rank0]:W0522 14:34:59.640000 1999 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2005) [rank6]:W0522 14:34:59.641000 2005 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2006) [rank7]:W0522 14:34:59.641000 2006 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2005) [rank6]:W0522 14:34:59.641000 2005 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2006) [rank7]:W0522 14:34:59.641000 2006 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2004) [rank5]:W0522 14:34:59.641000 2004 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2004) [rank5]:W0522 14:34:59.641000 2004 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=2003) [rank4]:W0522 14:34:59.642000 2003 site-packages/torch/utils/cpp_extension.py:2527] TORCH_CUDA_ARCH_LIST is not set, all archs for visible cards are included for compilation.
    (Worker pid=2003) [rank4]:W0522 14:34:59.642000 2003 site-packages/torch/utils/cpp_extension.py:2527] If this is not desired, please set os.environ['TORCH_CUDA_ARCH_LIST'] to specific architectures.
    (Worker pid=1999) INFO 05-22 14:34:59 [mccl.py:27] Found mccl from library libmccl.so
    (Worker pid=1999) INFO 05-22 14:34:59 [pynccl.py:111] vLLM is using nccl==2.16.5
    [14:35:11.312][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:35:21.552][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:35:31.792][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:35:42.032][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:35:52.273][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:36:02.512][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [14:36:12.752][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
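To summarize the repeated `MXKW` lines when reporting the issue, the retry cadence can be extracted from their timestamps. The following is a minimal sketch that assumes every line keeps the exact format shown above and that the log does not cross midnight; the regex and function name are illustrative, not part of any MetaX tool.

```python
import re
from typing import List

# Matches lines such as:
# [14:35:11.312][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create
# queue block timeout, gpu_id:65475 type:21. Retrying.
MXKW_LINE = re.compile(
    r"^\[(\d{2}):(\d{2}):(\d{2})\.(\d{3})\]\[MXKW\]\[E\].*gpu_id:(\d+) type:(\d+)"
)


def retry_gaps_ms(log_lines: List[str]) -> List[int]:
    """Return the gap in milliseconds between consecutive MXKW error lines."""
    stamps = []
    for line in log_lines:
        match = MXKW_LINE.match(line.strip())
        if match:
            hours, minutes, seconds, millis = (int(g) for g in match.groups()[:4])
            stamps.append(((hours * 60 + minutes) * 60 + seconds) * 1000 + millis)
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


sample = [
    "[14:35:11.312][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.",
    "[14:35:21.552][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.",
    "[14:35:31.792][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.",
    "[14:35:42.032][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.",
]
print(retry_gaps_ms(sample))  # [10240, 10240, 10240]
```

On the lines quoted here the gap is a steady 10.24 s or so, which suggests a fixed timeout being retried rather than intermittent progress.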

  • Members 458 posts
    May 22, 2026, 14:49

    Dear developer, your two services are configured with the same port. Please change one of them and try again.
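A port clash like this can be caught before launch by scanning the commands for repeated `--port` values. This is a minimal sketch, not a vLLM feature: the helper names are hypothetical, and the commands are abbreviated single-line forms of the two `vllm serve` invocations from the earlier post.

```python
import shlex
from collections import defaultdict
from typing import Dict, List, Optional


def port_of(command: str) -> Optional[int]:
    """Return the value of --port in a single-line command, or None if absent."""
    tokens = shlex.split(command)
    for i, token in enumerate(tokens):
        if token == "--port" and i + 1 < len(tokens):
            return int(tokens[i + 1])
        if token.startswith("--port="):
            return int(token.split("=", 1)[1])
    return None


def port_conflicts(commands: Dict[str, str]) -> Dict[int, List[str]]:
    """Map each port claimed by more than one service to those service names."""
    by_port: Dict[int, List[str]] = defaultdict(list)
    for name, command in commands.items():
        port = port_of(command)
        if port is not None:
            by_port[port].append(name)
    return {port: names for port, names in by_port.items() if len(names) > 1}


# Abbreviated forms of the two commands; flags unrelated to the port are omitted.
commands = {
    "bge-m3": "vllm serve /root/vllm/bge-m3/ --host 0.0.0.0 --port 8001",
    "bge-reranker-v2-m3": "vllm serve /root/vllm/bge-reranker-v2-m3/ --host 0.0.0.0 --port 8001",
}
print(port_conflicts(commands))  # {8001: ['bge-m3', 'bge-reranker-v2-m3']}
```

An empty result means no two services request the same port. It does not check whether some other process already holds that port.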

  • Members 12 posts
    May 23, 2026, 16:38

    After moving the services to different ports, the same error still occurs.
    Launch commands:
    nohup vllm serve /root/vllm/Qwen/Qwen3.6-35B-A3B/ \
    --host 0.0.0.0 \
    --port 8000 \
    --served-model-name qwen3.6 \
    --dtype bfloat16 \
    --trust-remote-code \
    --tensor-parallel-size 4 \
    --distributed-executor-backend mp \
    --gpu-memory-utilization 0.8 \
    --max-model-len 32768 \
    --max-num-batched-tokens 131072 \
    --max-num-seqs 128 \
    --enable-chunked-prefill \
    --enable-prefix-caching \
    > qwen.log 2>&1 &

    nohup vllm serve /root/vllm/bge-m3/ \
    --host 0.0.0.0 \
    --port 8001 \
    --served-model-name bge-m3 \
    --tensor-parallel-size 1 \
    --gpu-memory-utilization 0.1 \
    --trust-remote-code \
    --dtype auto \
    > bge-m3.log 2>&1 &
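The two commands request `--gpu-memory-utilization` values of 0.8 and 0.1. In upstream vLLM this flag is a per-instance fraction of a card's total memory; whether the vllm-metax build treats it identically is an assumption here. The sketch below only works through the arithmetic for a 65536 MiB card, using the 40353 MiB "used" figure from the earlier mx-smi snapshot, which may not match the state at launch time. It does not explain the `MXKW` timeout.

```python
TOTAL_MIB = 65536  # per-card memory reported by mx-smi in the first post


def budget_mib(utilization: float, total_mib: int = TOTAL_MIB) -> float:
    """Memory one instance may claim, assuming a fraction-of-total interpretation."""
    return utilization * total_mib


def fits_in_free_memory(used_mib: int, utilization: float, total_mib: int = TOTAL_MIB) -> bool:
    """True if the requested budget is no larger than the memory currently free."""
    return (total_mib - used_mib) >= budget_mib(utilization, total_mib)


for utilization in (0.8, 0.1):
    print(utilization, round(budget_mib(utilization), 1))

# 40353 MiB used leaves 25183 MiB free on that card.
print(fits_in_free_memory(40353, 0.1))
print(fits_in_free_memory(40353, 0.8))
```

Under these assumptions a 0.1 budget (about 6554 MiB) fits beside a 40353 MiB workload and a 0.8 budget (about 52429 MiB) does not, so for the small embedding model memory headroom alone would not be expected to block startup.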

    Error log:
    (EngineCore pid=26596) INFO 05-23 16:26:48 [core.py:105] Initializing a V1 LLM engine (v0.19.0) with config: model='/root/vllm/bge-m3/', speculative_config=None, tokenizer='/root/vllm/bge-m3/', skip_toke
    nizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=8192, download_dir=None, load_format=auto, tensor_parallel_size=1,
    pipeline_parallel_size=1, data_parallel_size=1, decode_context_parallel_size=1, dcp_comm_backend=ag_rs, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, enable_return_routed_expert
    s=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser='',
    reasoning_parser_plugin='', enable_in_reasoning=False), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, kv_cache_m
    etrics=False, kv_cache_metrics_sample=0.01, cudagraph_metrics=False, enable_layerwise_nvtx_tracing=False, enable_mfu_metrics=False, enable_mm_processor_stats=False, enable_logging_iteration_details=False
    ), seed=0, served_model_name=bge-m3, enable_prefix_caching=False, enable_chunked_prefill=False, pooler_config=PoolerConfig(task=None, pooling_type=None, seq_pooling_type='CLS', tok_pooling_type='ALL', us
    e_activation=True, dimensions=None, enable_chunked_processing=False, max_embed_len=None, logit_bias=None, step_tag_id=None, returned_token_ids=None), compilation_config={'mode': <CompilationMode.VLLM_COM
    PILE: 3>, 'debug_dump_path': None, 'cache_dir': '', 'compile_cache_save_format': 'binary', 'backend': 'inductor', 'custom_ops': ['none'], 'splitting_ops': ['vllm::unified_attention', 'vllm::unified_atten
    tion_with_output', 'vllm::unified_mla_attention', 'vllm::unified_mla_attention_with_output', 'vllm::mamba_mixer2', 'vllm::mamba_mixer', 'vllm::short_conv', 'vllm::linear_attention', 'vllm::plamo2_mamba_m
    ixer', 'vllm::gdn_attention_core', 'vllm::olmo_hybrid_gdn_full_forward', 'vllm::kda_attention', 'vllm::sparse_attn_indexer', 'vllm::rocm_aiter_sparse_attn_indexer', 'vllm::mx_sparse_attn_indexer', 'vllm:
    :mx_sparse_attn_indexer_bf16', 'vllm::unified_kv_cache_update', 'vllm::unified_mla_kv_cache_update'], 'compile_mm_encoder': False, 'cudagraph_mm_encoder': False, 'encoder_cudagraph_token_budgets': [], 'e
    ncoder_cudagraph_max_images_per_batch': 0, 'compile_sizes': [], 'compile_ranges_endpoints': [8192], 'inductor_compile_config': {'enable_auto_functionalized_v2': False, 'size_asserts': False, 'alignment_a
    sserts': False, 'scalar_asserts': False}, 'inductor_passes': {}, 'cudagraph_mode': <CUDAGraphMode.PIECEWISE: 1>, 'cudagraph_num_of_warmups': 1, 'cudagraph_capture_sizes': [1, 2, 4, 8, 16, 24, 32, 40, 48,
    56, 64, 72, 80, 88, 96, 104, 112, 120, 128, 136, 144, 152, 160, 168, 176, 184, 192, 200, 208, 216, 224, 232, 240, 248, 256, 272, 288, 304, 320, 336, 352, 368, 384, 400, 416, 432, 448, 464, 480, 496, 512
    ], 'cudagraph_copy_inputs': False, 'cudagraph_specialize_lora': True, 'use_inductor_graph_partition': False, 'pass_config': {'fuse_norm_quant': False, 'fuse_act_quant': False, 'fuse_attn_quant': False, '
    enable_sp': False, 'fuse_gemm_comms': False, 'fuse_allreduce_rms': False}, 'max_cudagraph_capture_size': 512, 'dynamic_shapes_config': {'type': <DynamicShapesType.BACKED: 'backed'>, 'evaluate_guards': Fa
    lse, 'assume_32_bit_indexing': False}, 'local_cache_dir': None, 'fast_moe_cold_start': True, 'static_all_moe_layers': []}
    (EngineCore pid=26596) INFO 05-23 16:26:48 [parallel_state.py:1400] world_size=1 rank=0 local_rank=0 distributed_init_method=tcp://10.217.247.136:40835 backend=nccl
    (EngineCore pid=26596) INFO 05-23 16:26:48 [parallel_state.py:1716] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, PCP rank 0, TP rank 0, EP rank N/A, EPLB rank N/A
    [16:26:59.536][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:27:09.776][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:27:20.016][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:27:30.256][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:27:40.496][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:27:50.736][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.
    [16:28:00.977][MXKW][E]queues.c :826 : [mxkwCreateQueueBlock][Hint]ioctl create queue block timeout, gpu_id:65475 type:21. Retrying.