Training runs normally. Part of the training output is as follows:
50 loss=0.142824187874794 tgs (tokens/gpu/second)=1379.59 tgs/last_tgs_1=1379.6 tgs/tgs_all=1370.23 tgs/tgs_avg=1371.04 tgs/tgs_SMA=1374.72 tgs/last_tgs_10=1373.6 tgs/last_tgs_50=1370.04 lr=8.707123771204882e-05 loss_scale=65536.0 grad_norm={'0_default': 2.7461187543313974} micro_num=32 num_consumed_tokens=53477376 inf_nan_skip_batches=0 num_samples_in_batch=149 largest_length=2048 largest_batch=8 smallest_batch=3 adam_beta2=0.95 fwd_bwd_time=94.83 acc=0.9835 perplexity=1.1596 acc/en=0.0 acc/cn=0.0 acc/code=0.0 tokens/en=0 tokens/cn=0 tokens/code=0 loss_from_metric=0.1479 loss/en=nan loss/cn=nan loss/code=nan
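
To compare values like these across steps, the key=value pairs in such a line can be pulled into a dictionary. This is a minimal sketch that assumes the line format shown above. `parse_metrics`, `parse_value` and the regular expression are illustrative names, not part of the training framework, and the leading `50` is left unparsed because the log does not label it.

```python
import ast
import re

# Hypothetical helper, not part of any training framework.
# A key is a run of non-space characters, optionally followed by a
# parenthesised note such as "tgs (tokens/gpu/second)".
# A value is either a {...} dict literal or a run of non-space characters.
PAIR = re.compile(r"([^\s=]+(?: \([^)]*\))?)=(\{[^}]*\}|\S+)")


def parse_value(raw):
    """Convert one raw value string to a Python object where possible."""
    try:
        return ast.literal_eval(raw)  # ints, floats, dict literals
    except (ValueError, SyntaxError):
        pass
    try:
        return float(raw)  # handles "nan", which literal_eval rejects
    except ValueError:
        return raw  # fall back to the raw string


def parse_metrics(line):
    """Return a dict of the key=value pairs found in one log line."""
    return {key: parse_value(raw) for key, raw in PAIR.findall(line)}


# Shortened copy of the log line above.
sample = (
    "50 loss=0.142824187874794 tgs (tokens/gpu/second)=1379.59 "
    "grad_norm={'0_default': 2.7461187543313974} micro_num=32 "
    "acc=0.9835 loss/en=nan"
)
metrics = parse_metrics(sample)
print(metrics["loss"], metrics["grad_norm"]["0_default"], metrics["micro_num"])
```

Fields such as `loss/en=nan` come back as float NaN, so they should be filtered out before averaging or plotting.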