3. Feature Support

mcPyTorch is designed to be compatible with native PyTorch usage. In most cases, the official PyTorch documentation also describes how to use mcPyTorch.

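Following the structure of the official PyTorch documentation, this chapter describes the compatibility of each mcPyTorch module and the points to note when using it. The following labels are used:

Supported: the native usage is supported and compatible.
Partially supported: only part of the native usage is supported, or the API behaves differently from native PyTorch.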

3.1. torch

Supported

3.2. torch.nn

Supported

3.2.1. Convolution Layers

Supported:

nn.Conv1d
nn.Conv2d
nn.Conv3d
nn.ConvTranspose1d
nn.ConvTranspose2d
nn.ConvTranspose3d
nn.LazyConv1d
nn.LazyConv2d
nn.LazyConv3d
nn.LazyConvTranspose1d
nn.LazyConvTranspose2d
nn.LazyConvTranspose3d
nn.Unfold
nn.Fold

Note

By default, convolutions on FP32 data are computed in FP32 rather than accelerated with TF32. Set torch.backends.cudnn.allow_tf32 = True to enable TF32.
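As a minimal sketch of the note above, the following enables TF32 for convolutions only inside a scoped block and restores the previous setting afterwards. The helper name conv_tf32 is hypothetical and not part of mcPyTorch; the snippet falls back to the CPU when no device is visible, in which case the flag is still toggled but has no effect on the computation.

```python
import contextlib

import torch


@contextlib.contextmanager
def conv_tf32(enabled=True):
    """Temporarily set torch.backends.cudnn.allow_tf32 (hypothetical helper)."""
    previous = torch.backends.cudnn.allow_tf32
    torch.backends.cudnn.allow_tf32 = enabled
    try:
        yield
    finally:
        # Restore whatever the caller had configured before.
        torch.backends.cudnn.allow_tf32 = previous


device = "cuda" if torch.cuda.is_available() else "cpu"
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
x = torch.randn(2, 3, 16, 16, device=device)

before = torch.backends.cudnn.allow_tf32
with conv_tf32(True):
    inside = torch.backends.cudnn.allow_tf32
    y = conv(x)  # FP32 convolution that is allowed to use TF32 on the device
after = torch.backends.cudnn.allow_tf32
```

Scoping the flag this way keeps the rest of the program on the FP32 default, which can be useful when only some layers tolerate the reduced precision.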

3.2.2. Recurrent Layers

Supported:

nn.RNNBase
nn.RNN
nn.LSTM
nn.GRU
nn.RNNCell
nn.LSTMCell
nn.GRUCell

3.2.3. Dropout Layers

Supported:

nn.Dropout
nn.Dropout1d
nn.Dropout2d
nn.Dropout3d
nn.AlphaDropout
nn.FeatureAlphaDropout

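Note

Dropout operations in PyTorch generate random numbers in a way that depends on hardware parameters, so by default the random results on MXMACA devices differ from those on CUDA devices.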
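One practical consequence is that tests should not compare dropout masks against reference values recorded on a different device type. The sketch below, which assumes nothing beyond standard PyTorch calls, checks two properties that should hold on any single device: the same seed reproduces the same mask, and the fraction of kept elements is close to 1 - p.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
dropout = torch.nn.Dropout(p=0.5)  # modules start in training mode
x = torch.ones(10_000, device=device)

torch.manual_seed(0)  # seeds the CPU generator and all device generators
first = dropout(x)
torch.manual_seed(0)
second = dropout(x)

# Same device, same seed: the masks should match element for element.
same_device_reproducible = torch.equal(first, second)

# Device-independent check: compare statistics rather than exact masks.
keep_ratio = (first != 0).float().mean().item()
```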

3.3. torch.nn.functional

Supported

3.4. torch.Tensor

Supported

3.5. Tensor Attributes

Supported

3.6. Tensor Views

Supported:

basic slicing and indexing
adjoint
as_strided
detach
diagonal
expand
expand_as
movedim
narrow
permute
select
squeeze
transpose
t
T
H
mT
mH
real
imag
view_as_real
unflatten
unfold
unsqueeze
view
view_as
unbind
split
hsplit
vsplit
tensor_split
split_with_sizes
swapaxes
swapdims
chunk
indices (sparse tensor only)
values (sparse tensor only)
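The following short sketch illustrates the view semantics that the operations above are expected to preserve: a view shares storage with its base tensor, so a write through the view is visible in the base, while contiguous() on a non-contiguous view produces a copy. It uses only standard PyTorch calls and runs on the CPU.

```python
import torch

base = torch.arange(6.0).reshape(2, 3)
v = base.view(3, 2)   # same storage, different shape
t = base.t()          # transposed view, not contiguous in memory

shares_storage = v.data_ptr() == base.data_ptr()

v[0, 0] = 100.0       # write through the view
seen_in_base = base[0, 0].item()
seen_in_transpose = t[0, 0].item()

compact = t.contiguous()  # materializes a new, contiguous tensor
is_copy = compact.data_ptr() != base.data_ptr()
```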

3.7. torch.amp

Supported

3.8. torch.autograd

Supported

3.9. torch.library

Supported

3.10. torch.cuda

3.10.1. cuda

Supported:

StreamContext
can_device_access_peer
current_blas_handle
current_device
current_stream
default_stream
device
device_count
device_of
get_sync_debug_mode
init
ipc_collect
is_available
is_initialized
set_device
set_stream
set_sync_debug_mode
stream
synchronize
OutOfMemoryError

Partially supported:

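get_arch_list

Returns ['sm_80'], emulating the behavior for CUDA capability 8.0.

get_device_capability

Returns (8, 0), emulating the behavior for CUDA capability 8.0.

get_device_name

Returns the name of the MXMACA hardware device rather than an NVIDIA device name such as NVIDIA A100-PCIE-40GB.

get_device_properties

Returns the MXMACA architecture information that corresponds to the CUDA architecture description.

get_gencode_flags

Returns "-gencode compute=compute_80,code=sm_80", emulating the behavior for CUDA capability 8.0.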
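Because get_device_name returns an MXMACA device name, code that branches on NVIDIA name strings will not match, whereas capability checks see the emulated (8, 0). The sketch below shows one way to gate features on capability instead; meets_capability and describe_device are hypothetical helper names, and describe_device returns None when no device is visible.

```python
import torch


def meets_capability(actual, required=(8, 0)):
    """Compare (major, minor) tuples, for example (8, 0) >= (7, 5)."""
    return tuple(actual) >= tuple(required)


def describe_device(index=0):
    """Collect the values discussed above for one device, or None without a device."""
    if not torch.cuda.is_available():
        return None
    return {
        "name": torch.cuda.get_device_name(index),
        "capability": torch.cuda.get_device_capability(index),
        "arch_list": torch.cuda.get_arch_list(),
    }


info = describe_device()
if info is not None:
    # Gate on the capability tuple, not on the device-name string.
    feature_enabled = meets_capability(info["capability"], (8, 0))
    print(info["name"], info["capability"], feature_enabled)
```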

3.10.2. Random Number Generator

Supported:

get_rng_state
get_rng_state_all
set_rng_state
set_rng_state_all
manual_seed
manual_seed_all
seed
seed_all
initial_seed

3.10.3. Communication collectives

Supported:

comm.broadcast
comm.broadcast_coalesced
comm.reduce_add
comm.scatter
comm.gather

3.10.4. Streams and events

Supported:

Stream
Event

3.10.5. Memory management

Supported:

empty_cache
list_gpu_processes
mem_get_info
memory_stats
memory_summary
memory_snapshot
memory_allocated
max_memory_allocated
reset_max_memory_allocated
memory_reserved
max_memory_reserved
set_per_process_memory_fraction
memory_cached
max_memory_cached
reset_max_memory_cached
reset_peak_memory_stats
caching_allocator_alloc
caching_allocator_delete
get_allocator_backend
change_current_allocator

3.10.6. Tools Extension

Supported:

nvtx.mark
nvtx.range_push
nvtx.range_pop

3.10.7. Jiterator (beta)

Not supported:

jiterator._create_jit_fn
jiterator._create_multi_output_jit_fn

3.11. torch.backends

3.11.1. torch.backends.cuda

Supported:

torch.backends.cuda.is_built

Returns True, indicating that mcPyTorch includes the MXMACA backend.
torch.backends.cuda.matmul.allow_tf32
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction
torch.backends.cuda.cufft_plan_cache
torch.backends.cuda.math_sdp_enabled
torch.backends.cuda.enable_math_sdp
torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True)

Currently only enable_math=True is supported.
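Given that restriction, one conservative pattern is to request only the math implementation explicitly when calling scaled dot product attention. The following is a sketch under that assumption, using small random tensors with an arbitrarily chosen shape; it falls back to the CPU when no device is visible.

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
# Shape is (batch, heads, sequence length, head dimension).
q = torch.randn(2, 4, 8, 16, device=device)
k = torch.randn(2, 4, 8, 16, device=device)
v = torch.randn(2, 4, 8, 16, device=device)

# Request the math implementation only, matching the restriction above.
with torch.backends.cuda.sdp_kernel(
    enable_flash=False, enable_math=True, enable_mem_efficient=False
):
    math_on = torch.backends.cuda.math_sdp_enabled()
    out = F.scaled_dot_product_attention(q, k, v)
```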

3.11.2. torch.backends.cudnn

Supported:

torch.backends.cudnn.is_available
torch.backends.cudnn.enabled
torch.backends.cudnn.allow_tf32
torch.backends.cudnn.deterministic
torch.backends.cudnn.benchmark
torch.backends.cudnn.benchmark_limit
torch.backends.cudnn.version

Note

By default, both torch.backends.cudnn.allow_tf32 and torch.backends.cuda.matmul.allow_tf32 are False.

3.11.3. torch.backends.mps

Supported:

torch.backends.mps.is_available
torch.backends.mps.is_built

Note

mcPyTorch does not include the MPS backend, so both interfaces above always return False.
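Portable device-selection code that probes MPS therefore keeps working unchanged. A minimal sketch of such a helper (pick_device is a hypothetical name) is shown below; on mcPyTorch the MPS branch would simply never be taken.

```python
import torch


def pick_device():
    """Prefer an accelerator exposed through torch.cuda, then MPS, then the CPU."""
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        # Per the note above, this is always False on mcPyTorch.
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()
x = torch.zeros(4, device=device)
```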

3.11.4. torch.backends.mkl

Supported:

torch.backends.mkl.is_available
torch.backends.mkl.verbose

3.11.5. torch.backends.mkldnn

Supported:

torch.backends.mkldnn.is_available
torch.backends.mkldnn.verbose

3.11.6. torch.backends.openmp

Supported:

torch.backends.openmp.is_available

3.11.7. torch.backends.opt_einsum

3.12. torch.distributed

Supported

3.13. torch.distributions

Supported

3.14. torch._dynamo

Supported

3.15. torch.fft

Supported

3.16. torch.fx

Supported

3.17. torch.jit

Supported

3.18. torch.linalg

Supported

3.19. torch.package

Supported

3.20. torch.profiler

Supported

3.21. torch.nn.init

Supported

3.22. torch.onnx

Supported

3.23. torch.optim

Supported

3.24. Complex Numbers

Supported

3.25. DDP Communication Hooks

Supported

3.26. torch.random

Supported

3.27. torch.masked

Supported

3.28. torch.nested

Supported

3.29. torch.sparse

Supported

3.30. torch.Storage

Supported

3.31. torch.testing

Supported

3.32. torch.utils.benchmark

Supported

3.33. torch.utils.bottleneck

Supported

3.34. torch.utils.checkpoint

Supported

3.35. torch.utils.cpp_extension

Supported

Note

When using cpp_extension in mcPyTorch 2.4.0 or later, the x86 build environment must provide GCC/G++ 9.0 or later, matching official PyTorch behavior; the Arm build environment must provide GCC/G++ 7.x.
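Before building an extension, it can help to check the compiler on PATH against the rule above. The following standard-library sketch is one way to do that; the helper names are hypothetical, the machine strings are the common values reported by platform.machine(), and reading "7.x" as "major version exactly 7" is an assumption.

```python
import platform
import re
import shutil
import subprocess


def parse_major(version_text):
    """Extract the major version from text such as '9.4.0' or '11'."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def gcc_major(compiler="gcc"):
    """Return the major version of the compiler on PATH, or None if unavailable."""
    if shutil.which(compiler) is None:
        return None
    result = subprocess.run(
        [compiler, "-dumpversion"], capture_output=True, text=True, check=False
    )
    return parse_major(result.stdout)


def requirement_met(major, machine):
    """Apply the note above: x86 needs GCC 9 or later, Arm needs GCC 7.x."""
    if major is None:
        return False
    machine = machine.lower()
    if machine in ("x86_64", "amd64"):
        return major >= 9
    if machine in ("aarch64", "arm64"):
        return major == 7  # assumption: "7.x" means major version 7
    return False  # architecture not covered by the note


ok = requirement_met(gcc_major(), platform.machine())
```

The same check can be repeated with compiler="g++" when C++ sources are compiled.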

3.36. torch.utils.data

Supported

3.37. torch.utils.jit

Supported

3.38. torch.utils.model_zoo

Supported

3.39. torch.utils.tensorboard

Supported

3.40. Type Info

Supported

3.41. torch.config

Supported