MetaX-Tech Developer Forum

JunHowie

  • Members
  • Joined July 7, 2025

JunHowie has posted 5 messages.

  • JunHowie (Members)
    TypeError: Input tensor data type is not supported for NCCL process group: BFloat16 · Products & Operations · July 9, 2025 11:36

    Commenting out `export NPROC_PER_NODE=1` solved the problem!
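
    For context, my reading of why this works (an assumption based on the logs below, not a verified explanation): with NPROC_PER_NODE set, swift re-launches itself through torch.distributed.run, which wraps the model in DistributedDataParallel and broadcasts the BFloat16 parameters through the NCCL process group; with it unset, swift runs as a plain single process and never initializes a process group. A sketch of the two launch paths, reusing the model and dataset paths from the script below:

    ```bash
    # Failing path: NPROC_PER_NODE makes swift re-exec via torch.distributed.run,
    # so the DDP setup broadcasts bf16 parameters over the NCCL process group.
    export NPROC_PER_NODE=1
    swift sft --model "/root/data/internlm2_5-1_8b-chat" --train_type lora \
        --dataset "/root/data/datasets/output.jsonl" --torch_dtype bfloat16

    # Working path: with the variable unset, swift stays in a single plain process;
    # no process group is created, so the bf16 broadcast never happens.
    unset NPROC_PER_NODE
    swift sft --model "/root/data/internlm2_5-1_8b-chat" --train_type lora \
        --dataset "/root/data/datasets/output.jsonl" --torch_dtype bfloat16
    ```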

  • JunHowie (Members)
    TypeError: Input tensor data type is not supported for NCCL process group: BFloat16 · Products & Operations · July 9, 2025 11:01

    Startup log from `bash paper_sft_ms-swift.sh`; see the attachment.

  • JunHowie (Members)
    TypeError: Input tensor data type is not supported for NCCL process group: BFloat16 · Products & Operations · July 7, 2025 09:59

    **Startup script** (`bash paper_sft_ms-swift.sh`):

    ```bash
    # Create the log directory
    LOG_DIR="logs"
    mkdir -p $LOG_DIR

    # Get the current timestamp
    TIMESTAMP=$(date +"%Y%m%d_%H%M%S")
    LOG_FILE="$LOG_DIR/internlm2.5-1.8b_lora_sft_${TIMESTAMP}.log"

    # Set the CUDA device
    export NPROC_PER_NODE=1
    export OMP_NUM_THREADS=1
    export CUDA_VISIBLE_DEVICES=0

    nohup swift sft \
        --model "/root/data/internlm2_5-1_8b-chat" \
        --train_type lora \
        --dataset "/root/data/datasets/output.jsonl" \
        --model_type internlm2 \
        --torch_dtype bfloat16 \
        --num_train_epochs 4 \
        --per_device_train_batch_size 4 \
        --learning_rate 5e-5 \
        --warmup_ratio 0.1 \
        --split_dataset_ratio 0 \
        --lora_rank 8 \
        --lora_alpha 32 \
        --target_modules all-linear \
        --gradient_accumulation_steps 2 \
        --save_steps 2000 \
        --save_total_limit 5 \
        --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
        --logging_steps 5 \
        --max_length 2048 \
        --output_dir ./swift_output/InternLM2.5-1.8B-Lora \
        --dataloader_num_workers 256 \
        --model_author JimmyMa99 \
        --model_name InternLM2.5-1.8B-Lora \
        > "$LOG_FILE" 2>&1 &

    # Print the process ID and the log file location
    echo "Training started with PID $!"
    echo "Log file: $LOG_FILE"

    # Show the command for viewing the log in real time
    echo "To view logs in real-time, use:"
    echo "tail -f $LOG_FILE"
    ```

  • JunHowie (Members)
    TypeError: Input tensor data type is not supported for NCCL process group: BFloat16 · Products & Operations · July 7, 2025 09:57

    The error log is as follows:

    ```
    run sh: `/root/data/envs/ms-swift/bin/python3.10 -m torch.distributed.run --nproc_per_node 1 /root/data/envs/ms-swift/lib/python3.10/site-packages/swift/cli/sft.py --model /root/data/internlm2_5-1_8b-chat --train_type lora --dataset /root/data/datasets/output.jsonl --model_type internlm2 --torch_dtype bfloat16 --num_train_epochs 4 --per_device_train_batch_size 4 --learning_rate 5e-5 --warmup_ratio 0.1 --split_dataset_ratio 0 --lora_rank 8 --lora_alpha 32 --target_modules all-linear --gradient_accumulation_steps 2 --save_steps 2000 --save_total_limit 5 --gradient_checkpointing_kwargs {"use_reentrant": false} --logging_steps 5 --max_length 2048 --output_dir ./swift_output/InternLM2.5-1.8B-Lora --dataloader_num_workers 256 --model_author JimmyMa99 --model_name InternLM2.5-1.8B-Lora`
    /root/data/envs/ms-swift/lib/python3.10/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension: 'libjpeg.so.62: cannot open shared object file: No such file or directory'If you don't plan on using image functionality from `torchvision.io`, you can ignore this warning. Otherwise, there might be something wrong with your environment. Did you have `libjpeg` or `libpng` installed before building `torchvision` from source?
      warn(
    /root/data/envs/ms-swift/lib/python3.10/site-packages/torchvision/datapoints/__init__.py:12: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
      warnings.warn(_BETA_TRANSFORMS_WARNING)
    /root/data/envs/ms-swift/lib/python3.10/site-packages/torchvision/transforms/v2/__init__.py:54: UserWarning: The torchvision.datapoints and torchvision.transforms.v2 namespaces are still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in this issue: https://github.com/pytorch/vision/issues/6753, and you can also check out https://github.com/pytorch/vision/issues/7319 to learn more about the APIs that we suspect might involve future changes. You can silence this warning by calling torchvision.disable_beta_transforms_warning().
      warnings.warn(_BETA_TRANSFORMS_WARNING)
    [INFO:swift] Successfully registered `/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/llm/dataset/data/dataset_info.json`.
    [INFO:swift] rank: 0, local_rank: 0, world_size: 1, local_world_size: 1
    [INFO:swift] Loading the model using model_dir: /root/data/internlm2_5-1_8b-chat
    [INFO:swift] Setting args.lazy_tokenize: False
    /root/data/envs/ms-swift/lib/python3.10/site-packages/transformers/training_args.py:1575: FutureWarning: `evaluation_strategy` is deprecated and will be removed in version 4.46 of 🤗 Transformers. Use `eval_strategy` instead
      warnings.warn(
    [INFO:swift] output_dir: /root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402
    [INFO:swift] Global seed set to 42
    [INFO:swift] args: TrainArguments(
    _n_gpu=-1,
    acc_steps=1,
    acc_strategy=token,
    accelerator_config={'dispatch_batches': False},
    adafactor=False,
    adalora_beta1=0.85,
    adalora_beta2=0.85,
    adalora_deltaT=1,
    adalora_init_r=12,
    adalora_orth_reg_weight=0.5,
    adalora_target_r=8,
    adalora_tfinal=0,
    adalora_tinit=0,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-08,
    adapter_act=gelu,
    adapter_length=128,
    adapters=[],
    add_version=True,
    agent_template=None,
    aligner_lr=None,
    attn_impl=None,
    auto_find_batch_size=False,
    average_tokens_across_devices=False,
    batch_eval_metrics=False,
    bf16=True,
    bf16_full_eval=False,
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=None,
    bnb_4bit_quant_type=nf4,
    bnb_4bit_use_double_quant=True,
    boft_block_num=0,
    boft_block_size=4,
    boft_dropout=0.0,
    boft_n_butterfly_factor=1,
    channels=None,
    check_model=True,
    ckpt_dir=None,
    columns={},
    create_checkpoint_symlink=False,
    custom_dataset_info=[],
    custom_register_path=[],
    data_seed=42,
    dataloader_drop_last=False,
    dataloader_num_workers=256,
    dataloader_persistent_workers=False,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=None,
    dataset=['/root/data/datasets/output.jsonl'],
    dataset_num_proc=1,
    dataset_shuffle=True,
    ddp_backend=None,
    ddp_broadcast_buffers=None,
    ddp_bucket_cap_mb=None,
    ddp_find_unused_parameters=None,
    ddp_timeout=18000000,
    debug=None,
    deepspeed=None,
    device_map=None,
    disable_tqdm=None,
    dispatch_batches=None,
    do_eval=False,
    do_predict=False,
    do_train=False,
    download_mode=reuse_dataset_if_exists,
    eval_accumulation_steps=None,
    eval_datasets=[],
    eval_datasets_args=None,
    eval_delay=0,
    eval_do_concat_batches=True,
    eval_generation_config=None,
    eval_limit=None,
    eval_on_start=False,
    eval_steps=2000.0,
    eval_strategy=steps,
    eval_use_evalscope=False,
    eval_use_gather_object=False,
    evaluation_strategy=steps,
    external_plugins=[],
    fourier_n_frequency=2000,
    fourier_scaling=300.0,
    fp16=False,
    fp16_backend=auto,
    fp16_full_eval=False,
    fp16_opt_level=O1,
    freeze_aligner=True,
    freeze_llm=False,
    freeze_parameters=[],
    freeze_parameters_ratio=0.0,
    freeze_parameters_regex=None,
    freeze_vit=True,
    fsdp=,
    fsdp_config=None,
    fsdp_min_num_params=0,
    fsdp_num=1,
    fsdp_transformer_layer_cls_to_wrap=None,
    full_determinism=False,
    galore_cos_threshold=0.4,
    galore_gamma_proj=2,
    galore_optim_per_parameter=False,
    galore_proj_bits=4,
    galore_proj_group_size=256,
    galore_proj_quant=False,
    galore_proj_type=std,
    galore_quantization=False,
    galore_queue_size=5,
    galore_rank=128,
    galore_scale=1.0,
    galore_target_modules=None,
    galore_update_proj_gap=50,
    galore_with_embedding=False,
    generation_config=None,
    generation_max_length=None,
    generation_num_beams=None,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": false},
    greater_is_better=False,
    group_by_length=False,
    half_precision_backend=auto,
    hqq_axis=None,
    hub_always_push=False,
    hub_model_id=None,
    hub_private_repo=None,
    hub_strategy=every_save,
    hub_token=<HUB_TOKEN>,
    ignore_args_error=False,
    ignore_data_skip=False,
    include_for_metrics=[],
    include_inputs_for_metrics=False,
    include_num_input_tokens_seen=False,
    include_tokens_per_second=False,
    init_strategy=None,
    init_weights=True,
    interleave_prob=None,
    jit_mode_eval=False,
    label_names=None,
    label_smoothing_factor=0.0,
    lazy_tokenize=False,
    learning_rate=5e-05,
    length_column_name=length,
    lisa_activated_layers=0,
    lisa_step_interval=20,
    llamapro_num_groups=None,
    llamapro_num_new_blocks=4,
    load_args=False,
    load_best_model_at_end=False,
    load_data_args=False,
    load_from_cache_file=True,
    local_rank=0,
    local_repo_path=None,
    log_level=passive,
    log_level_replica=warning,
    log_on_each_node=True,
    logging_dir=/root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402/runs,
    logging_first_step=True,
    logging_nan_inf_filter=True,
    logging_steps=5,
    logging_strategy=steps,
    logprobs=False,
    lora_alpha=32,
    lora_bias=none,
    lora_dropout=0.05,
    lora_dtype=None,
    lora_ga_batch_size=2,
    lora_ga_direction=ArB2r,
    lora_ga_iters=2,
    lora_ga_max_length=1024,
    lora_ga_scale=stable,
    lora_ga_stable_gamma=16,
    lora_modules=[],
    lora_rank=8,
    lorap_lr_ratio=None,
    loss_scale=default,
    loss_type=None,
    lr_scheduler_kwargs=None,
    lr_scheduler_type=cosine,
    max_epochs=None,
    max_grad_norm=1.0,
    max_length=2048,
    max_memory={},
    max_new_tokens=64,
    max_pixels=None,
    max_steps=-1,
    metric=None,
    metric_for_best_model=loss,
    metric_warmup_step=0,
    model=/root/data/internlm2_5-1_8b-chat,
    model_author=['JimmyMa99'],
    model_kwargs={},
    model_name=['InternLM2.5-1.8B-Lora'],
    model_revision=None,
    model_type=internlm2,
    modules_to_save=[],
    mp_parameters=,
    neftune_noise_alpha=None,
    no_cuda=False,
    norm_bbox=None,
    num_beams=1,
    num_labels=None,
    num_train_epochs=4.0,
    optim=adamw_torch,
    optim_args=None,
    optim_target_modules=None,
    optimizer=None,
    output_dir=/root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402,
    overwrite_output_dir=False,
    packing=False,
    packing_cache=None,
    padding_free=False,
    padding_side=right,
    past_index=-1,
    per_device_eval_batch_size=1,
    per_device_train_batch_size=4,
    predict_with_generate=False,
    prediction_loss_only=False,
    problem_type=None,
    push_to_hub=False,
    push_to_hub_model_id=None,
    push_to_hub_organization=None,
    push_to_hub_token=<PUSH_TO_HUB_TOKEN>,
    quant_bits=None,
    quant_method=None,
    ray_scope=last,
    reft_args=None,
    reft_intervention_type=LoreftIntervention,
    reft_layer_key=None,
    reft_layers=None,
    reft_rank=4,
    remove_unused_columns=True,
    repetition_penalty=None,
    report_to=['tensorboard'],
    response_prefix=None,
    restore_callback_states_from_checkpoint=False,
    resume_from_checkpoint=None,
    resume_only_model=False,
    rope_scaling=None,
    run_name=/root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402,
    save_on_each_node=False,
    save_only_model=False,
    save_safetensors=True,
    save_steps=2000.0,
    save_strategy=steps,
    save_total_limit=5,
    seed=42,
    sequence_parallel_size=1,
    shuffle_buffer_size=1000,
    skip_memory_metrics=True,
    sortish_sampler=False,
    split_batches=None,
    split_dataset_ratio=0.0,
    stop_words=[],
    stopping_strategy=first_exhausted,
    stream=False,
    streaming=False,
    strict=False,
    swanlab_exp_name=None,
    swanlab_mode=cloud,
    swanlab_project=None,
    swanlab_token=<SWANLAB_TOKEN>,
    swanlab_workspace=None,
    system=None,
    target_modules=['all-linear'],
    target_regex=None,
    task_type=causal_lm,
    temperature=0.0,
    template=internlm2,
    template_backend=swift,
    tf32=None,
    top_k=None,
    top_logprobs=None,
    top_p=None,
    torch_compile=False,
    torch_compile_backend=None,
    torch_compile_mode=None,
    torch_dtype=torch.bfloat16,
    torch_empty_cache_steps=None,
    torchdynamo=None,
    tpu_metrics_debug=False,
    tpu_num_cores=None,
    train_dataloader_shuffle=True,
    train_type=lora,
    trainable_parameters=[],
    trainable_parameters_regex=None,
    truncation_strategy=delete,
    tuner_backend=peft,
    use_chat_template=True,
    use_cpu=False,
    use_dora=False,
    use_galore=False,
    use_hf=False,
    use_ipex=False,
    use_legacy_prediction_loop=False,
    use_liger_kernel=False,
    use_logits_to_keep=None,
    use_mps_device=False,
    use_rslora=False,
    use_swift_lora=False,
    val_dataset=[],
    val_dataset_shuffle=False,
    vera_d_initial=0.1,
    vera_dropout=0.0,
    vera_projection_prng_key=0,
    vera_rank=256,
    vit_gradient_checkpointing=None,
    vit_lr=None,
    warmup_ratio=0.1,
    warmup_steps=0,
    weight_decay=0.1,
    zero_hpz_partition_size=None,
    )
    [INFO:swift] Loading the model using model_dir: /root/data/internlm2_5-1_8b-chat
    [INFO:swift] model_kwargs: {'device_map': 'cuda:0'}
    
    Loading checkpoint shards:   0%|                                                                                                                                                        | 0/2 [00:00<?, ?it/s]
    Loading checkpoint shards:  50%|████████████████████████████████████████████████████████████████████████                                                                        | 1/2 [00:00<00:00,  1.01it/s]
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.03it/s]
    Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.02it/s]
    [INFO:swift] model.hf_device_map: {'': device(type='cuda', index=0)}
    [INFO:swift] model_info: ModelInfo(model_type='internlm2', model_dir='/root/data/internlm2_5-1_8b-chat', torch_dtype=torch.bfloat16, max_model_len=32768, quant_method=None, quant_bits=None, rope_scaling={'type': 'dynamic', 'factor': 2.0}, config=InternLM2Config {
      "_attn_implementation_autoset": true,
      "_name_or_path": "/root/data/internlm2_5-1_8b-chat",
      "architectures": [
        "InternLM2ForCausalLM"
      ],
      "attn_implementation": "eager",
      "auto_map": {
        "AutoConfig": "configuration_internlm2.InternLM2Config",
        "AutoModel": "modeling_internlm2.InternLM2ForCausalLM",
        "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM",
        "AutoModelForSequenceClassification": "modeling_internlm2.InternLM2ForSequenceClassification"
      },
      "bias": false,
      "bos_token_id": 1,
      "eos_token_id": 2,
      "hidden_act": "silu",
      "hidden_size": 2048,
      "initializer_range": 0.02,
      "intermediate_size": 8192,
      "max_position_embeddings": 32768,
      "model_type": "internlm2",
      "num_attention_heads": 16,
      "num_hidden_layers": 24,
      "num_key_value_heads": 8,
      "pad_token_id": 2,
      "pretr智算ning_tp": 1,
      "rms_norm_eps": 1e-05,
      "rope_scaling": {
        "factor": 2.0,
        "type": "dynamic"
      },
      "rope_theta": 1000000,
      "tie_word_embeddings": false,
      "torch_dtype": "bfloat16",
      "transformers_version": "4.48.0",
      "use_cache": true,
      "vocab_size": 92544
    }
    , task_type='causal_lm', num_labels=None)
    [INFO:swift] model.generation_config: GenerationConfig {
      "bos_token_id": 1,
      "eos_token_id": [
        2,
        92542
      ],
      "max_new_tokens": 64,
      "pad_token_id": 2
    }
    
    [INFO:swift] default_system: 'You are an AI assistant whose name is InternLM (书生·浦语).\n- InternLM (书生·浦语) is a conversational language model that is developed by Shanghai AI Laboratory (上海人工智能实验室). It is designed to be helpful, honest, and harmless.\n- InternLM (书生·浦语) can understand and communicate fluently in the language chosen by the user such as English and 中文.'
    [INFO:swift] max_length: 2048
    [INFO:swift] response_prefix: ''
    [INFO:swift] agent_template: react_en
    [INFO:swift] norm_bbox: norm1000
    [INFO:swift] Start time of running main: 2025-07-06 23:14:06.123223
    [INFO:swift] swift.__version__: 3.5.3
    [INFO:swift] SelfCognitionPreprocessor has been successfully configured with name: ('InternLM2.5-1.8B-Lora', 'InternLM2.5-1.8B-Lora'), author: ('JimmyMa99', 'JimmyMa99').
    
    Generating train split: 0 examples [00:00, ? examples/s]
    Generating train split: 1000 examples [00:00, 54698.09 examples/s]
    
    Map:   0%|                                                                                                                                                                    | 0/1000 [00:00<?, ? examples/s]
    Map: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 46686.90 examples/s]
    [INFO:swift] train_dataset: Dataset({
        features: ['messages'],
        num_rows: 1000
    })
    [INFO:swift] val_dataset: None
    
    Map:   0%|                                                                                                                                                                    | 0/1000 [00:00<?, ? examples/s]
    Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 926.73 examples/s]
    Map: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:01<00:00, 913.66 examples/s]
    [INFO:swift] [INPUT_IDS] [1, 92543, 9081, 364, 60403, 69072, 71670, 69526, 69412, 60703, 92542, 364, 92543, 1008, 364, 29044, 519, 410, 2416, 495, 4038, 3455, 27708, 435, 459, 2185, 337, 650, 3622, 14337, 2032, 2066, 18882, 12166, 495, 4430, 708, 39065, 13370, 648, 454, 8250, 495, 1803, 4420, 410, 16869, 5882, 5579, 446, 395, 8082, 7190, 9260, 446, 410, 516, 1366, 307, 334, 410, 3622, 14337, 2032, 2066, 313, 17621, 281, 707, 17065, 446, 644, 548, 14149, 454, 1172, 8503, 3152, 3455, 29743, 1237, 11516, 435, 24751, 7118, 3266, 519, 21551, 7935, 410, 11915, 5061, 657, 22189, 281, 23140, 328, 713, 1620, 560, 7182, 446, 2117, 18897, 15675, 2881, 435, 11579, 1628, 11516, 777, 4165, 589, 7190, 3865, 442, 7002, 9894, 2145, 519, 1678, 27708, 25717, 15803, 4648, 8385, 410, 12495, 5737, 446, 550, 5722, 512, 290, 281, 10408, 285, 891, 364, 309, 281, 21551, 281, 2561, 285, 891, 364, 289, 281, 21551, 6998, 447, 285, 891, 364, 302, 281, 9890, 1594, 397, 49197, 364, 288, 281, 10672, 281, 1430, 364, 311, 281, 10672, 35866, 364, 316, 281, 10672, 26517, 364, 318, 281, 10672, 2601, 296, 364, 295, 281, 697, 276, 7704, 364, 338, 281, 697, 276, 285, 891, 364, 331, 281, 21551, 28458, 1357, 364, 303, 281, 10672, 1011, 295, 364, 307, 281, 10672, 860, 322, 364, 304, 281, 11059, 695, 7704, 364, 306, 281, 46134, 285, 891, 364, 299, 281, 7041, 1211, 296, 364, 343, 281, 10672, 55926, 364, 296, 281, 514, 564, 942, 299, 364, 287, 281, 7041, 281, 7752, 364, 291, 281, 7041, 1046, 287, 364, 315, 281, 7041, 1046, 316, 364, 322, 281, 7041, 1467, 299, 364, 325, 281, 10672, 52501, 364, 341, 281, 2957, 1467, 288, 364, 337, 281, 7041, 860, 306, 364, 349, 281, 10672, 2219, 288, 92542, 364, 92543, 525, 11353, 364, 338, 92542]
    [INFO:swift] [INPUT] <s><|im_start|>system
    你是个优秀的论文分类师<|im_end|>
    <|im_start|>user
    Based on the title 'Flavor Physics in SUSY at large tan(beta)', authors 'Paride Paradisi', and abstract 'We discuss the phenomenological impact of a particularly interesting corner of the MSSM: the large tan(beta) regime. The capabilities of leptonic and hadronic Flavor Violating processes in shedding light on physics beyond the Standard Model are reviewed. Moreover, we show that tests of Lepton Universality in charged current processes can represent an interesting handle to obtain relevant information on New Physics scenarios.', please determine the scientific category of this paper.
    
    A. quant-ph
    B. physics.chem-ph
    C. physics.atom-ph
    D. cond-mat.soft
    E. cs.RO
    F. cs.CL
    G. cs.SE
    H. cs.IR
    I. hep-th
    J. hep-ph
    K. physics.optics
    L. cs.AI
    M. cs.CV
    N. nucl-th
    O. astro-ph
    P. math.PR
    Q. cs.OS
    R. eess.SP
    S. math.OC
    T. math.DS
    U. math.DG
    V. math.MP
    W. cs.MM
    X. stat.ME
    Y. math.CO
    Z. cs.NE<|im_end|>
    <|im_start|>assistant
    J<|im_end|>
    [INFO:swift] [LABELS_IDS] [-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 338, 92542]
    [INFO:swift] [LABELS] [-100 * 297]J<|im_end|>
    
    Map:   0%|                                                                                                                                                                    | 0/1000 [00:00<?, ? examples/s]
    Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 2799.87 examples/s]
    Map: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 2714.95 examples/s]
    [INFO:swift] Dataset Token Length: 452.690000±131.581906, min=228.000000, max=1099.000000, size=1000
    [INFO:swift] The TrainArguments will be saved in: /root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402/args.json
    [INFO:swift] lora_config: LoraConfig(task_type='CAUSAL_LM', peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='/root/data/internlm2_5-1_8b-chat', revision=None, inference_mode=False, r=8, target_modules={'w3', 'w1', 'wo', 'wqkv', 'w2'}, exclude_modules=None, lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=[], init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', trainable_token_indices=None, loftq_config={}, eva_config=None, corda_config=None, use_dora=False, layer_replication=None, runtime_config=LoraRuntimeConfig(ephemeral_gpu_offload=False), lora_bias=False, lora_dtype=None, lorap_lr_ratio=None, lorap_emb_lr=1e-06)
    [INFO:swift] model: PeftModelForCausalLM(
      (base_model): LoraModel(
        (model): InternLM2ForCausalLM(
          (model): InternLM2Model(
            (tok_embeddings): Embedding(92544, 2048, padding_idx=2)
            (layers): ModuleList(
              (0-23): 24 x InternLM2DecoderLayer(
                (attention): InternLM2Attention(
                  (wqkv): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=4096, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=4096, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (lora_magnitude_vector): ModuleDict()
                  )
                  (wo): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (lora_magnitude_vector): ModuleDict()
                  )
                  (rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
                )
                (feed_forward): InternLM2MLP(
                  (w1): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (lora_magnitude_vector): ModuleDict()
                  )
                  (w3): lora.Linear(
                    (base_layer): Linear(in_features=2048, out_features=8192, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=2048, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=8192, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (lora_magnitude_vector): ModuleDict()
                  )
                  (w2): lora.Linear(
                    (base_layer): Linear(in_features=8192, out_features=2048, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Dropout(p=0.05, inplace=False)
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=8192, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=2048, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                    (lora_magnitude_vector): ModuleDict()
                  )
                  (act_fn): SiLU()
                )
                (attention_norm): InternLM2RMSNorm()
                (ffn_norm): InternLM2RMSNorm()
              )
            )
            (norm): InternLM2RMSNorm()
          )
          (output): Linear(in_features=2048, out_features=92544, bias=False)
        )
      )
    )
    [INFO:swift] model_parameter_info: PeftModelForCausalLM: 1896.9743M Params (7.8643M Trainable [0.4146%]), 0.0015M Buffers.
    /root/data/envs/ms-swift/lib/python3.10/site-packages/swift/trainers/mixin.py:89: FutureWarning: `tokenizer` is deprecated and will be removed in version 5.0.0 for `Seq2SeqTrainer.__init__`. Use `processing_class` instead.
      super().__init__(
    [INFO:swift] The logging file will be saved in: /root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402/logging.jsonl
    /root/data/envs/ms-swift/lib/python3.10/site-packages/torch/utils/data/dataloader.py:624: UserWarning: This DataLoader will create 256 worker processes in total. Our suggested max number of worker in current system is 160, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
      warnings.warn(
    [INFO:swift] last_model_checkpoint: None
    [INFO:swift] best_model_checkpoint: None
    [INFO:swift] images_dir: /root/data/swift_output/InternLM2.5-1.8B-Lora/v5-20250706-231402/images
    [rank0]: Traceback (most recent call last):
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/cli/sft.py", line 7, in <module>
    [rank0]:     sft_main()
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/llm/train/sft.py", line 269, in sft_main
    [rank0]:     return SwiftSft(args).main()
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/llm/base.py", line 49, in main
    [rank0]:     result = self.run()
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/llm/train/sft.py", line 123, in run
    [rank0]:     return self.train(trainer)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/llm/train/sft.py", line 184, in train
    [rank0]:     trainer.train(trainer.args.resume_from_checkpoint)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/swift/trainers/mixin.py", line 379, in train
    [rank0]:     res = super().train(*args, **kwargs)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/transformers/trainer.py", line 2171, in train
    [rank0]:     return inner_training_loop(
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/transformers/trainer.py", line 2330, in _inner_training_loop
    [rank0]:     model, self.optimizer = self.accelerator.prepare(self.model, self.optimizer)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1432, in prepare
    [rank0]:     result = tuple(
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1433, in <genexpr>
    [rank0]:     self._prepare_one(obj, first_pass=True, device_placement=d) for obj, d in zip(args, device_placement)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1281, in _prepare_one
    [rank0]:     return self.prepare_model(obj, device_placement=device_placement)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/accelerate/accelerator.py", line 1644, in prepare_model
    [rank0]:     model = torch.nn.parallel.DistributedDataParallel(
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 827, in __init__
    [rank0]:     _sync_module_states(
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/utils.py", line 323, in _sync_module_states
    [rank0]:     _sync_params_and_buffers(process_group, module_states, broadcast_bucket_size, src)
    [rank0]:   File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/utils.py", line 334, in _sync_params_and_buffers
    [rank0]:     dist._broadcast_coalesced(
    [rank0]: TypeError: Input tensor data type is not supported for NCCL process group: BFloat16
    [rank0]:[W706 23:14:10.110114050 ProcessGroupNCCL.cpp:1502] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
    E0706 23:14:14.672000 1171 envs/ms-swift/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/api.py:869] failed (exitcode: 1) local_rank: 0 (pid: 1175) of binary: /root/data/envs/ms-swift/bin/python3.10
    Traceback (most recent call last):
      File "/root/data/envs/ms-swift/lib/python3.10/runpy.py", line 196, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/root/data/envs/ms-swift/lib/python3.10/runpy.py", line 86, in _run_code
        exec(code, run_globals)
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/run.py", line 922, in <module>
        main()
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 355, in wrapper
        return f(*args, **kwargs)
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/run.py", line 918, in main
        run(args)
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/run.py", line 909, in run
        elastic_launch(
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 138, in __call__
        return launch_agent(self._config, self._entrypoint, list(args))
      File "/root/data/envs/ms-swift/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 269, in launch_agent
        raise ChildFailedError(
    torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
    ============================================================
    /root/data/envs/ms-swift/lib/python3.10/site-packages/swift/cli/sft.py FAILED
    ------------------------------------------------------------
    Failures:
      <NO_OTHER_FAILURES>
    ------------------------------------------------------------
    Root Cause (first observed failure):
    [0]:
      time      : 2025-07-06_23:14:14
      host      : ins-m7p8w-698894dd4d-5sbkl
      rank      : 0 (local_rank: 0)
      exitcode  : 1 (pid: 1175)
      error_file: <N/A>
      traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
    ============================================================
    

    ```
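
    To reproduce the failure outside of swift, a minimal sketch (my assumption: the same coalesced broadcast that `_sync_params_and_buffers` performs in the traceback above, run on a single rank):

    ```python
    # Hypothetical minimal repro: DDP's parameter sync is a coalesced broadcast
    # over the NCCL process group, so broadcasting a BFloat16 CUDA tensor through
    # a single-rank "nccl" group should hit the same TypeError on this stack.
    import os
    import torch
    import torch.distributed as dist

    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(backend="nccl", rank=0, world_size=1)

    t = torch.ones(4, dtype=torch.bfloat16, device="cuda:0")
    # Expected here:
    # TypeError: Input tensor data type is not supported for NCCL process group: BFloat16
    dist.broadcast(t, src=0)

    dist.destroy_process_group()
    ```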

  • JunHowie (Members)
    TypeError: Input tensor data type is not supported for NCCL process group: BFloat16 · Products & Operations · July 7, 2025 09:56

    Background: while fine-tuning on the d.run platform with an MXC500-32G, I hit `TypeError: Input tensor data type is not supported for NCCL process group: BFloat16`.
    System environment:

    ```
    OS version : Ubuntu 22.04.3 LTS
    Kernel     : 5.15.0-58-generic
    IP address : 10.233.81.148
    Hostname   : ins-m7p8w-698894dd4d-p4f5r

    CPU model   : Intel(R) Xeon(R) Platinum 8460Y+
    CPU threads : 8 C
    Memory      : 133 MB / 98304 MB (0.14% used)
    GPU         : NO GPU detected
    CUDA        : NO CUDA detected

    (ms-swift) root@ins-m7p8w-698894dd4d-p4f5r:~/data# mx-smi
    mx-smi  version: 2.1.9

    =================== MetaX System Management Interface Log ===================
    Timestamp                                         : Mon Jul  7 09:47:37 2025

    Attached GPUs                                     : 1
    +---------------------------------------------------------------------------------+
    | MX-SMI 2.1.9                        Kernel Mode Driver Version: 2.9.8           |
    | MACA Version: 2.31.0.6              BIOS Version: 1.20.3.0                      |
    |------------------------------------+---------------------+----------------------+
    | GPU         NAME                   | Bus-id              | GPU-Util             |
    | Temp        Power                  | Memory-Usage        |                      |
    |====================================+=====================+======================|
    | 0           MXC500 VF              | 0000:38:00.1        | 0%                   |
    | N/A         N/A                    | 618/32512 MiB       |                      |
    +------------------------------------+---------------------+----------------------+

    +---------------------------------------------------------------------------------+
    | Process:                                                                        |
    |  GPU                    PID         Process Name                 GPU Memory     |
    |                                                                  Usage(MiB)     |
    |=================================================================================|
    |  no process found                                                               |
    +---------------------------------------------------------------------------------+

    End of Log
    ```

    Environment setup:

    ```bash
    conda create --prefix=/root/data/envs/ms-swift python=3.10 -y
    conda activate /root/data/envs/ms-swift
    pip install 'ms-swift'
    ```

    **Then install the MetaX-compiled packages.** The contents of requirements.txt are as follows:

    ```
    apex==0.1+metax2.32.0.3
    torch==2.6.0+metax2.32.0.3
    torchaudio==2.4.1+metax2.32.0.3
    torchvision==0.15.1+metax2.32.0.3
    triton==3.0.0+metax2.32.0.3
    flash_attn

    # Install the MetaX-specific packages
    pip install -r requirements.txt -i https://repos.metax-tech.com/r/maca-pypi/simple --trusted-host repos.metax-tech.com --no-build-isolation
    ```
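
    A quick sanity check (my own suggestion, not part of the original steps) that the +metax builds actually replaced the stock wheels pulled in by ms-swift:

    ```bash
    # The versions printed should carry the +metax2.32.0.3 suffix
    pip list | grep -E 'torch|apex|triton|flash'
    # torch should import and see the device through the MACA compatibility layer
    python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
    ```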

    The fine-tuning script is as follows (see the startup script posted above).

    Problem description:

    The main error is `TypeError: Input tensor data type is not supported for NCCL process group: BFloat16`.

    Running the following directly in a new Jupyter Lab terminal:

    ```bash
    swift sft \
        --model "/root/data/internlm2_5-1_8b-chat" \
        --train_type lora \
        --dataset "/root/data/datasets/output.jsonl" \
        --model_type internlm2 \
        --torch_dtype bfloat16 \
        --num_train_epochs 4 \
        --per_device_train_batch_size 4 \
        --learning_rate 5e-5 \
        --warmup_ratio 0.1 \
        --split_dataset_ratio 0 \
        --lora_rank 8 \
        --lora_alpha 32 \
        --target_modules all-linear \
        --gradient_accumulation_steps 2 \
        --save_steps 2000 \
        --save_total_limit 5 \
        --gradient_checkpointing_kwargs '{"use_reentrant": false}' \
        --logging_steps 5 \
        --max_length 2048 \
        --output_dir ./swift_output/InternLM2.5-1.8B-Lora \
        --dataloader_num_workers 256 \
        --model_author JimmyMa99 \
        --model_name InternLM2.5-1.8B-Lora
    ```

    does not raise the error.

    After installing the MetaX-specific torch and the other packages, I tested whether bfloat16 is supported; the verification result is as follows:

    ```
    (ms-swift) root@ins-m7p8w-698894dd4d-p4f5r:~/data# python
    Python 3.10.18 | packaged by conda-forge | (main, Jun  4 2025, 14:45:41) [GCC 13.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import torch
    >>> print(torch.cuda.is_bf16_supported())
    True
    >>>
    ```
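
    Worth noting (my interpretation): `torch.cuda.is_bf16_supported()` only asserts compute support. Plain bf16 math does run on this device, which is consistent with the single-process run training fine; it is the process-group collective, sketched after the error log above, that rejects BFloat16:

    ```python
    # bf16 *compute* works on the MXC500; only the NCCL collective path fails.
    import torch

    a = torch.randn(64, 64, dtype=torch.bfloat16, device="cuda")
    b = torch.randn(64, 64, dtype=torch.bfloat16, device="cuda")
    print((a @ b).dtype)  # torch.bfloat16 -> bf16 matmul succeeds
    ```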
    