Sweep invalid float value

I am doing hyperparameter optimization with a wandb sweep. I defined epochs, learning_rate, and other values in the parameters section and then passed them to the command via ${...} references. However, when I run the sweep, it always fails with errors like "run_ner.py: error: argument --learning_rate: invalid float value: '${learning_rate}'", and likewise for '${epochs}' and '${batch_size}'. It seems every variable I referenced as ${VAR} in the command was passed through literally instead of being substituted.

Here is an example of my sweep.yaml

program: run_ner.py
method: random
metric:
  name: f1
  goal: maximize
parameters:
  batch_size:
    values: [16, 32, 64, 128]
  model_name_or_path:
    values: ["a_bert_model","b_bert_model"]
  weight_decay:
    distribution: "uniform"
    min: 0.0
    max: 0.1
  learning_rate:
    distribution: "uniform"
    min: 0.001
    max: 0.01
  epochs:
    values: [400]
  datadir:
    values: ["fullData","partialData"]
  max_seq_length:
    values: [128, 256, 512]
  do_lower_case:
    values: [True, False]
  pad_to_max_length:
    values: [True, False]
command:
  - ${env}
  - python
  - run_ner.py
  - --model_name_or_path=${model_name_or_path}
  - --train_file="temp"
  - --do_train
  - --validation_file="temp"
  - --test_file="temp"
  - --do_predict
  - --do_eval
  - --num_train_epochs=${epochs}
  - --text_column_name="words"
  - --label_column_name="labels"
  - --seed=42
  - --per_device_train_batch_size=${batch_size}
  - --per_device_eval_batch_size=${batch_size}
  - --learning_rate=${learning_rate}
  - --weight_decay=${weight_decay}
  - --label_all_tokens=True
  - --load_best_model_at_end=True
  - --save_strategy=steps
  - --evaluation_strategy=epoch
  - --logging_strategy=epoch
  - --max_seq_length=${max_seq_length}
  - --pad_to_max_length=${pad_to_max_length}
  - --output_dir="outputs/"

And a snippet of the execution output.

run_ner.py: error: argument --learning_rate: invalid float value: '${learning_rate}'
2024-03-04 13:22:56,058 - wandb.wandb_agent - INFO - Agent received command: run
2024-03-04 13:22:56,058 - wandb.wandb_agent - INFO - Agent starting run with config:
    datadir: fullData
    do_lower_case: True
    learning_rate: 0.0067381448491156066
    max_seq_length: 512
    model_name_or_path: a_bert_model
    pad_to_max_length: False
    weight_decay: 0.003210073163495664
2024-03-04 13:22:56,071 - wandb.wandb_agent - INFO - About to run command: /usr/bi
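
Just to double-check that the macro really reaches the script unexpanded, this tiny snippet reproduces the same error outside of my training code (using plain argparse as a stand-in for HfArgumentParser, which builds on it):

import argparse

parser = argparse.ArgumentParser(prog="run_ner.py")
parser.add_argument("--learning_rate", type=float)

# the agent passed the macro through literally, so argparse tries to float() the raw string
parser.parse_args(["--learning_rate=${learning_rate}"])
# -> run_ner.py: error: argument --learning_rate: invalid float value: '${learning_rate}'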

Hey @xuperx, thanks for raising this! Would you mind sharing a code snippet of how you’re parsing those args so I can test on my end?

Hi @xuperx, I wanted to follow up on this request. Would it be possible to share a code snippet of how you’re parsing those args so I can test on my end?

Hi Luis, I don’t know how I missed this message. Many thanks for being willing to help.

I fixed the invalid float value issue by changing the YAML file: instead of listing all the flags in the command section, I now use the ${args_no_hyphens} macro.

command:
  - ${env}
  - python
  - run_ner_sw2.py
  - ${args_no_hyphens}
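
If I’m reading the sweep docs right, only the built-in macros (${env}, ${program}, ${args}, ${args_no_hyphens}, ...) get expanded; arbitrary ${parameter} names are passed through literally, which would explain the original invalid float error. Roughly, the two args macros expand like this (illustrative values, not from a real run):

${args}             ->  --learning_rate=0.0047 --max_seq_length=128 ...
${args_no_hyphens}  ->  learning_rate=0.0047 max_seq_length=128 ...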

BUT I do have a similar issue with the argument "--output_dir".
My sweep.yaml states:

.......
parameters:
  ... ...
  output_dir:
    values: ["output"]
command:
  - ${env}
  - python
  - run_ner_sw2.py
  - ${args_no_hyphens}

The error message is:

2024-03-12 16:10:19,729 - wandb.wandb_agent - INFO - Agent starting run with config:
	datadir: fullData
	do_eval: True
	do_lower_case: False
	do_predict: True
	do_train: True
	evaluation_strategy: epoch
	label_all_tokens: True
	label_column_name: labels
	learning_rate: 0.004748553941502207
	load_best_model_at_end: True
	logging_strategy: epoch
	max_seq_length: 128
	model_name_or_path: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
	num_train_epochs: 400
	output_dir: output
	pad_to_max_length: False
	per_device_eval_batch_size: 32
	per_device_train_batch_size: 32
	save_strategy: steps
	seed: 42
	test_file: temp
	text_column_name: words
	train_file: temp
	validation_file: temp
	weight_decay: 0.013055535954381949
2024-03-12 16:10:19,745 - wandb.wandb_agent - INFO - About to run command: /usr/bin/env python run_ner_sw1.py datadir=fullData do_eval=True do_lower_case=False do_predict=True do_train=True evaluation_strategy=epoch label_all_tokens=True label_column_name=labels learning_rate=0.004748553941502207 load_best_model_at_end=True logging_strategy=epoch max_seq_length=128 model_name_or_path=microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext num_train_epochs=400 output_dir=output pad_to_max_length=False per_device_eval_batch_size=32 per_device_train_batch_size=32 save_strategy=steps seed=42 test_file=temp text_column_name=words train_file=temp validation_file=temp weight_decay=0.013055535954381949
run_ner_sw1.py: error: the following arguments are required: --output_dir
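
My best guess is that because ${args_no_hyphens} drops the leading dashes, the bare name=value tokens don't look like options to argparse, so the required --output_dir is never satisfied. A minimal repro with plain argparse (assuming HfArgumentParser, which is built on argparse, fails the same way):

import argparse

parser = argparse.ArgumentParser(prog="run_ner_sw1.py")
parser.add_argument("--output_dir", required=True)

# what the agent actually passes: bare name=value tokens, no leading dashes
parser.parse_args(["output_dir=output", "learning_rate=0.0047"])
# -> run_ner_sw1.py: error: the following arguments are required: --output_dir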

I tried two ways to parse the args:
Way 1: automatically update the parsed args from wandb.config:

# (snippet from inside main(); imports of sys/os/wandb and the HF argument classes are at the top of the script)
parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

# re-create TrainingArguments so that wandb is the logging backend
training_args = TrainingArguments(
    output_dir="output/",
    report_to="wandb")

# copy every key the sweep sampled onto the matching dataclass attribute
config = wandb.config
for arg in [model_args, data_args, training_args]:
    for key, value in vars(arg).items():
        if hasattr(config, key):
            setattr(arg, key, getattr(config, key))
config_updater = ConfigUpdater(model_args, data_args, training_args)
config_updater.update_from_wandb()
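
To make the copy loop concrete, here is a tiny self-contained version of the same idea, with made-up names and values standing in for the real HF dataclasses and wandb.config:

from dataclasses import dataclass
from types import SimpleNamespace

@dataclass
class ToyTrainingArguments:      # stand-in for transformers.TrainingArguments
    learning_rate: float = 5e-5
    weight_decay: float = 0.0

config = SimpleNamespace(learning_rate=0.0047, weight_decay=0.013)   # what wandb.config would hold
args = ToyTrainingArguments()

# same loop as above: copy any key the sweep sampled onto the dataclass
for key, value in vars(args).items():
    if hasattr(config, key):
        setattr(args, key, getattr(config, key))

print(args)   # ToyTrainingArguments(learning_rate=0.0047, weight_decay=0.013)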

Way 2: manually update. I didn’t manually specify the "output_dir" argument below, because it caused an error about conflicting output dirs.

parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
    model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
else:
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()

training_args = TrainingArguments(
    output_dir="output/",
    report_to="wandb")
config = wandb.config

# the automatic update from Way 1 is disabled here:
# for arg in [model_args, data_args, training_args]:
#     for key, value in vars(arg).items():
#         if hasattr(config, key):
#             setattr(arg, key, getattr(config, key))
# config_updater = ConfigUpdater(model_args, data_args, training_args)
# config_updater.update_from_wandb()

# point the data files at the sampled data directory
data_args.train_file = os.path.join(wandb.config.datadir, "train.csv")
data_args.validation_file = os.path.join(wandb.config.datadir, "dev.csv")
data_args.test_file = os.path.join(wandb.config.datadir, "test.csv")

# what is in TrainingArguments:
# https://github.com/huggingface/transformers/blob/main/src/transformers/training_args.py
wandb.config.update({
    "model_name_or_path": model_args.model_name_or_path,
    "weight_decay": training_args.weight_decay,
    "learning_rate": training_args.learning_rate,
    "datadir": data_args.datadir,
    "max_seq_length": data_args.max_seq_length,
    "do_lower_case": data_args.do_lower_case,
    "pad_to_max_length": data_args.pad_to_max_length,
    "per_device_train_batch_size": training_args.per_device_train_batch_size,
    "per_device_eval_batch_size": training_args.per_device_eval_batch_size,
    "label_all_tokens": training_args.label_all_tokens,
    "load_best_model_at_end": training_args.load_best_model_at_end,
    "save_strategy": training_args.save_strategy,
    "evaluation_strategy": training_args.evaluation_strategy,
    "logging_strategy": training_args.logging_strategy,
    "train_file": data_args.train_file,
    "validation_file": data_args.validation_file,
    "test_file": data_args.test_file,
    "do_train": training_args.do_train,
    "do_predict": training_args.do_predict,
    "do_eval": training_args.do_eval,
    "num_train_epochs": training_args.num_train_epochs,
    "text_column_name": data_args.text_column_name,
    "label_column_name": data_args.label_column_name,
    "seed": training_args.seed
})

training_args.output_dir = os.path.join(wandb.config.output_dir, wandb.run.id)
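
(That last line just nests each run's outputs under its wandb run id, e.g. output/<run_id>, so parallel sweep runs don't write into the same folder.)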

Hey @xuperx, thanks for sharing this and apologies for the delay! What if you try adding --output_dir=output to the command? Also, would you mind sharing your wandb --version?
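
To be concrete, this is roughly what I had in mind (untested sketch; keep ${args_no_hyphens} for everything your script reads from wandb.config, and pass the one flag the parser requires explicitly):

command:
  - ${env}
  - python
  - run_ner_sw2.py
  - --output_dir=output
  - ${args_no_hyphens}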

Hi @xuperx , I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

Hi @xuperx, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!