MetaX-Tech Developer Forum

Raining

  • Members
  • Joined October 21, 2025

Raining has started 2 threads.

  • Raining
    Members
    vllm inference problem [Solved] October 23, 2025, 19:46

    I previously deployed a model and inference worked fine, but after upgrading the GPU driver to 3.1.0.14 it suddenly stopped working. Startup is hit or miss: sometimes the server fails to come up, sometimes it starts but inference returns no data, and sometimes startup throws an error outright. The image is cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64. Could this be caused by the image version being older than the driver version? If so, where can I download the latest docker image? (A version-check sketch follows the mx-smi output below.) The error log is:
    [19:40:55.226][MXKW][E]queues.c :812 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
    [19:40:55.228][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
    [19:40:55.229][MCR][E]mx_device.cpp :3544: Mxc copy from host to device failed with code 4104
    [19:40:55.243][MXKW][E]queues.c :812 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
    [19:40:55.244][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
    [19:40:55.244][MCR][E]mx_device.cpp :3637: Mxc copy from device to device failed with code 4104
    [19:40:55.260][MXKW][E]queues.c :812 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
    [19:40:55.263][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
    [19:40:55.263][MCR][E]mx_device.cpp :3544: Mxc copy from host to device failed with code 4104
    [19:40:55.288][MXKW][E]queues.c :812 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
    [19:40:55.288][MXC][E]exception: DMAQueue create failed at mxkwCreateQueueBlock.
    [19:40:55.289][MCR][E]mx_device.cpp :3544: Mxc copy from host to device failed with code 4104
    [19:40:55.306][MXKW][E]queues.c :812 : [mxkwCreateQueueBlock]ioctl create queue block failed -1
    [19:40:55.306][MCR][E]mx_device.cpp :1219: Device::acquireQueue: mxc_queue_acquire failed!
    Traceback (most recent call last):
    File "/opt/conda/bin/vllm", line 8, in <module>
    sys.exit(main())
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 54, in main
    args.dispatch_function(args)
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 52, in cmd
    uvloop.run(run_server(args))
    File "/opt/conda/lib/python3.10/site-packages/uvloop/init.py", line 82, in run
    return loop.run_until_complete(wrapper())
    File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
    File "/opt/conda/lib/python3.10/site-packages/uvloop/init.py", line 61, in wrapper
    return await main
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1791, in run_server
    await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 1811, in run_server_worker
    async with build_async_engine_client(args, client_config) as engine_client:
    File "/opt/conda/lib/python3.10/contextlib.py", line 199, in aenter
    return await anext(self.gen)
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 158, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
    File "/opt/conda/lib/python3.10/contextlib.py", line 199, in aenter
    return await anext(self.gen)
    File "/opt/conda/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 194, in build_async_engine_client_from_engine_args
    async_llm = AsyncLLM.from_vllm_config(
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 163, in from_vllm_config
    return cls(
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/async_llm.py", line 117, in init
    self.engine_core = EngineCoreClient.make_async_mp_client(
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 98, in make_async_mp_client
    return AsyncMPClient(*client_args)
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 677, in init
    super().init(
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/core_client.py", line 408, in init
    with launch_core_engines(vllm_config, executor_class,
    File "/opt/conda/lib/python3.10/contextlib.py", line 142, in exit
    next(self.gen)
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 697, in launch_core_engines
    wait_for_engine_startup(
    File "/opt/conda/lib/python3.10/site-packages/vllm/v1/engine/utils.py", line 750, in wait_for_engine_startup
    raise RuntimeError("Engine core initialization failed. "
    RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {'EngineCore_0': -11}
    root@host:/workspace# mx-smi
    mx-smi version: 2.2.8

    =================== MetaX System Management Interface Log ===================
    Timestamp : Thu Oct 23 19:43:59 2025

    Attached GPUs : 1
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.2.8 Kernel Mode Driver Version: 3.0.11 |
    | MACA Version: 3.1.0.14 BIOS Version: unknown |
    |------------------------------------+---------------------+----------------------+
    | GPU NAME Persistence-M | Bus-id | GPU-Util sGPU-M |
    | Temp Pwr:Usage/Cap Perf | Memory-Usage | GPU-State |
    |====================================+=====================+======================|
    | 0 MetaX C500 N/A | 0000:0c:00.0 | N/A Native |
    | N/A NA / NA N/A | 858/65536 MiB | Not Available |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process: |
    | GPU PID Process Name GPU Memory |
    | Usage(MiB) |
    |=================================================================================|
    | no process found |
    +---------------------------------------------------------------------------------+
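
    A quick way to compare the driver-side MACA version with the image actually in use is sketched below. It only assumes mx-smi on the host (as in the output above) and that the image tag encodes its MACA userspace version (maca.ai3.1.0.7 here); whether a 3.1.0.7 userspace is supported on a 3.1.0.14 driver still has to be confirmed against MetaX's compatibility notes.

    # Host side: MACA version reported alongside the kernel-mode driver
    mx-smi | grep "MACA Version"
    # Container side: the MACA userspace version is encoded in the image tag
    docker images --format '{{.Repository}}:{{.Tag}}' | grep modelzoo.llm.vllm
    # If the tag (3.1.0.7) is older than the driver (3.1.0.14), pulling a newer
    # tag from cr.metax-tech.com/public-ai-release/maca is the usual next step.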

  • Raining
    Members
    vllm image deployment problem [Solved] October 21, 2025, 09:25

    I have one C500 with a single 64G GPU. I started 2 containers from the image and want each one to run its own vllm inference service. What I see is that once the first service finishes starting, the second one gets stuck partway through startup; but as soon as I stop the first service, the second one immediately continues and finishes starting. While the second one is starting, it sometimes reports an error pointing to a .c file, showing that the failure is on the GPU side. Is it not possible for two services to share one GPU, or am I doing this the wrong way? (A startup-sequencing sketch follows the two commands below.) Here are my 2 containers:
    docker run -itd --device=/dev/dri --device=/dev/mxcd --group-add video --network=host --name vllm --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size 100gb --ulimit memlock=-1 -v /data:/data cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64

    The serve command is: vllm serve /data/model/qwen2.5-14b-instruct-awq -tp 1 --trust-remote-code --dtype bfloat16 --max-model-len 8192 --gpu-memory-utilization 0.4

    docker run -itd --device=/dev/dri --device=/dev/mxcd --group-add video --network=host --name vllm-tm --security-opt seccomp=unconfined --security-opt apparmor=unconfined --shm-size 100gb --ulimit memlock=-1 -v /data:/data cr.metax-tech.com/public-ai-release/maca/modelzoo.llm.vllm:maca.ai3.1.0.7-torch2.6-py310-ubuntu22.04-amd64

    The serve command is: vllm serve /data/model/hunyuan-mt-chimera-7b -tp 1 --trust-remote-code --dtype bfloat16 --max-model-len 8192 --gpu-memory-utilization 0.4 --port 8001
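
    One way to check whether this is a startup race rather than a hard limit on sharing one GPU is to bring up the second service only after the first reports healthy. The sketch below is only an assumption-laden example: it presumes the first server listens on the default port 8000 and that the container name and model path are exactly those from the commands above.

    # Wait until the first vllm server answers its /health endpoint
    until curl -sf http://localhost:8000/health; do sleep 5; done
    # Then launch the second service inside the already-running vllm-tm container
    docker exec -d vllm-tm vllm serve /data/model/hunyuan-mt-chimera-7b -tp 1 \
        --trust-remote-code --dtype bfloat16 --max-model-len 8192 \
        --gpu-memory-utilization 0.4 --port 8001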
