Base image: mxc500-torch2.4-py310:mc2.33.0.6-ubuntu22.04-amd64
Registry login and pull: docker login --username=cr_temp_user --password=eyJpbnN0YW5jZUlkIjoiY3JpLXpxYTIzejI2YTU5M3R3M2QiLCJ0aW1lIjoiMTc1ODYxNjEyMDAwMCIsInR5cGUiOiJzdWIiLCJ1c2VySWQiOiIyMDcwOTQwMTA1NjYzNDE3OTIifQ:0a030f9722ca8c4b27f898903ab62e1bc43b6d30 cr.metax-tech.com && docker pull cr.metax-tech.com/public-library/maca-c500-pytorch:2.33.0.6-ubuntu22.04-amd64
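It is the image downloaded from the MetaX official website (amd64, Python 3.10).
This image ships CUDA 11.6 and PyTorch 2.4.
The prebuilt DGL packages require at least CUDA 11.8.
So DGL has to be built from source, and I downloaded the DGL 2.1.0 source code.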
The cmake command is:
cmake -DUSE_CUDA=ON -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} -DCUDA_NVCC_EXECUTABLE=${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc -DCUDA_CUDART_LIBRARY=/opt/maca/lib/libmcruntime.so -DCMAKE_PREFIX_PATH="/opt/maca;/opt/conda/lib/python3.10/site-packages/torch" ..
where:
echo ${CC}
/opt/maca/tools/cu-bridge/bin/cucc
echo ${CXX}
/opt/maca/tools/cu-bridge/bin/cucc
echo ${CUDA_TOOLKIT_ROOT_DIR}
/opt/maca/tools/cu-bridge
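Before rerunning cmake, it may help to confirm that every path passed in actually exists. The command above sets CUDA_NVCC_EXECUTABLE to ${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc, while the command that was actually run (in the log below) uses ${CUDA_TOOLKIT_ROOT_DIR}/cu-bridge/bin/cucc. With the value echoed above, that expands to /opt/maca/tools/cu-bridge/cu-bridge/bin/cucc. A minimal POSIX sh sketch that checks both variants plus the runtime library. check_path is a hypothetical helper, and the default root is simply the echo output above:

```shell
#!/bin/sh
# Hypothetical helper: report whether each path handed to cmake exists.
# Default root taken from the echo output above; override via the environment.
CUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR:-/opt/maca/tools/cu-bridge}

check_path() {
  if [ -e "$1" ]; then
    echo "OK      $1"
  else
    echo "MISSING $1"
    return 1
  fi
}

# 1st path: NVCC as in the first cmake command.
# 2nd path: NVCC as in the command that was actually run.
# 3rd path: the runtime library passed as CUDA_CUDART_LIBRARY.
missing=0
for p in \
  "${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc" \
  "${CUDA_TOOLKIT_ROOT_DIR}/cu-bridge/bin/cucc" \
  /opt/maca/lib/libmcruntime.so
do
  # Count failures instead of exiting so every path gets reported.
  check_path "$p" || missing=$((missing + 1))
done
echo "missing paths: ${missing}"
```

Run it inside the container. Any MISSING line marks a path that cmake cannot use as given.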
The final cmake run and its output:
root@73d3caf3ab8c:/home/root/dgl-2.1.0/build# cmake -DUSE_CUDA=ON -DCMAKE_C_COMPILER=${CC} -DCMAKE_CXX_COMPILER=${CXX} -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR} -DCUDA_NVCC_EXECUTABLE=${CUDA_TOOLKIT_ROOT_DIR}/cu-bridge/bin/cucc -DCUDA_CUDART_LIBRARY=/opt/maca/lib/libmcruntime.so -DCMAKE_PREFIX_PATH="/opt/maca;/opt/conda/lib/python3.10/site-packages/torch" ..
-- The C compiler identification is Clang 12.0.0
-- The CXX compiler identification is Clang 12.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/maca/tools/cu-bridge/bin/cucc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/maca/tools/cu-bridge/bin/cucc - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Start configuring project dgl
-- Build for dev
-- Build with CUDA support
-- Could NOT find CUDA (missing: CUDA_NVCC_EXECUTABLE CUDA_CUDART_LIBRARY)
-- Performing Test SUPPORT_CXX17
-- Performing Test SUPPORT_CXX17 - Success
-- Use external CCCL library for a consistent API and performance.
-- Found OpenMP_C: -fopenmp=libomp (found version "5.0")
-- Found OpenMP_CXX: -fopenmp=libomp (found version "5.0")
-- Found OpenMP: TRUE (found version "5.0")
-- Build with OpenMP.
-- Build with LIBXSMM optimization.
-- Looking for sys/epoll.h
-- Looking for sys/epoll.h - found
CMake Error at cmake/modules/CUDA.cmake:214 (message):
Cannot find CUDA.
Call Stack (most recent call first):
CMakeLists.txt:275 (dgl_config_cuda)
-- Configuring incomplete, errors occurred!
See also "/home/root/dgl-2.1.0/build/CMakeFiles/CMakeOutput.log".
Since CMake cannot find CUDA, I searched for libraries that might serve as the CUDA runtime:
find /opt/maca/ -name "*.so" | grep -E '(cuda|runtime)'
/opt/maca/lib/libmcruntime.so
/opt/maca/lib/libmxc-runtime64.so
/opt/maca/lib/libruntime_cu.so
/opt/maca/mxgpu_llvm/lib/libmlir_maca_runtime.so
/opt/maca/tools/cu-bridge/lib/libcuda.so
I tried each of these paths in the cmake command with -DCUDA_CUDART_LIBRARY=<path>, but none of them helped. It always fails with the same error:
CMake Error at cmake/modules/CUDA.cmake:214 (message):
Cannot find CUDA.
Call Stack (most recent call first):
CMakeLists.txt:275 (dgl_config_cuda)
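To repeat this experiment more systematically, each candidate can be tried in its own empty build directory. That way, values cached in CMakeCache.txt by an earlier failed run cannot carry over. Below is a minimal POSIX sh sketch, not a known fix. try_candidate, DGL_SRC and DRY_RUN are hypothetical names, and the flags are copied from the first cmake command above. The -S/-B options assume CMake 3.13 or newer. With DRY_RUN=1 (the default) it only prints the commands:

```shell
#!/bin/sh
# Hypothetical helper: configure DGL once per candidate runtime library,
# each in a fresh build directory so no CMakeCache.txt entries carry over.
DGL_SRC=${DGL_SRC:-/home/root/dgl-2.1.0}
CUDA_TOOLKIT_ROOT_DIR=${CUDA_TOOLKIT_ROOT_DIR:-/opt/maca/tools/cu-bridge}
CC=${CC:-${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc}
CXX=${CXX:-${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the cmake commands

try_candidate() {
  cand=$1
  dir="${DGL_SRC}/build-$(basename "$cand" .so)"
  # Reuse the function's positional parameters as the cmake argument list.
  set -- cmake -S "$DGL_SRC" -B "$dir" \
    -DUSE_CUDA=ON \
    -DCMAKE_C_COMPILER="$CC" \
    -DCMAKE_CXX_COMPILER="$CXX" \
    -DCUDA_TOOLKIT_ROOT_DIR="$CUDA_TOOLKIT_ROOT_DIR" \
    -DCUDA_NVCC_EXECUTABLE="${CUDA_TOOLKIT_ROOT_DIR}/bin/cucc" \
    -DCUDA_CUDART_LIBRARY="$cand" \
    -DCMAKE_PREFIX_PATH="/opt/maca;/opt/conda/lib/python3.10/site-packages/torch"
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$*"
  else
    rm -rf "$dir"
    if "$@" > "${dir}.log" 2>&1; then
      echo "OK   $cand"
    else
      echo "FAIL $cand (see ${dir}.log)"
    fi
  fi
}

# Candidates from the find output above.
for lib in \
  /opt/maca/lib/libmcruntime.so \
  /opt/maca/lib/libmxc-runtime64.so \
  /opt/maca/lib/libruntime_cu.so \
  /opt/maca/mxgpu_llvm/lib/libmlir_maca_runtime.so \
  /opt/maca/tools/cu-bridge/lib/libcuda.so
do
  try_candidate "$lib"
done
```

Set DRY_RUN=0 to actually configure. Each candidate's output then goes to a .log file next to its build directory, so the failures can be compared side by side.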
Is there any way to build DGL in this environment? Any help would be greatly appreciated.