When I do the following:
import wandb

checkpoint_exists = False
wandb.init(
    project="somename",
    config=opt,  # opt: training options defined elsewhere
    resume="allow" if checkpoint_exists else None,
    id="someid",
)
and the run with id "someid" already exists, new data is appended to the existing run instead of the existing data being discarded, which contradicts the documentation for resume=None (init | Weights & Biases Documentation).
I want this so that I can continue training for more epochs: log to the same run id if the run is continued, and start over otherwise.
Hello @mordigm!
Looking at our docs, resume='allow' will do the following:
"allow": if id is set with init(id="UNIQUE_ID") or WANDB_RUN_ID="UNIQUE_ID" and it is identical to a previous run, wandb will automatically resume the run with that id. Otherwise, wandb will start a new run.
I believe I am a bit confused about the issue you are facing, as it seems that you are describing the behavior of resume='allow'. Since someid exists, it will continue the run and add more data to it. If the run id did not exist yet, it would create a new run.
Is this not what you are seeing?
Hi Maximilian, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
checkpoint_exists is False in this case, so resume=None. As I mentioned, resume=None does not have the desired effect that is stated in the docs.
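
One possible workaround, sketched under assumptions rather than confirmed W&B behavior: instead of relying on resume=None to discard the old run, pick a fresh run id whenever no checkpoint exists, and reuse a stored id only when training is continued. The helper name build_init_kwargs and the id file location are hypothetical and not part of wandb; the returned dict is only meant to be unpacked into wandb.init.

```python
import uuid
from pathlib import Path


def build_init_kwargs(checkpoint_exists: bool, id_file: Path) -> dict:
    """Hypothetical helper: choose the `id` and `resume` arguments for wandb.init.

    Assumption: the run id is persisted in a small text file kept next to the
    checkpoint, so a continued training job can find the id it used before.
    """
    if checkpoint_exists and id_file.exists():
        # Continuing training: reuse the stored id and ask wandb to resume it.
        run_id = id_file.read_text().strip()
        return {"id": run_id, "resume": "allow"}

    # Starting over: create a new id so nothing is appended to an older run.
    run_id = uuid.uuid4().hex[:8]
    id_file.parent.mkdir(parents=True, exist_ok=True)
    id_file.write_text(run_id)
    return {"id": run_id, "resume": None}


# Intended usage (needs wandb installed and `opt` defined elsewhere):
#   import wandb
#   kwargs = build_init_kwargs(checkpoint_exists, Path("checkpoints/wandb_run_id.txt"))
#   wandb.init(project="somename", config=opt, **kwargs)
```

With this approach a fresh start never reuses "someid", so the question of whether resume=None overwrites existing data does not arise. The old run stays in the project and would have to be deleted separately if it is no longer wanted.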