When I do the following:
checkpoint_exists = False
wandb.init(resume="allow" if checkpoint_exists else None, id="someid")
and a run with id "someid" already exists, wandb keeps adding new data to the existing run rather than discarding the existing data, which contradicts the documentation for resume=None (init | Weights & Biases Documentation).
What I want is to continue training for more epochs and log to the same run id when the run is resumed, and otherwise start the run over.
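For reference, the setup described above can be sketched as follows. This is only a minimal reconstruction, assuming the arguments are passed to wandb.init; the helper name build_init_kwargs and the project name "my-project" are hypothetical and not from the original post.

```python
def build_init_kwargs(checkpoint_exists, run_id="someid"):
    """Build the keyword arguments for wandb.init (hypothetical helper)."""
    return {
        "id": run_id,
        # Ask wandb to resume only when a checkpoint was found;
        # otherwise pass resume=None, as in the post.
        "resume": "allow" if checkpoint_exists else None,
    }


kwargs = build_init_kwargs(checkpoint_exists=False)
print(kwargs)

# Intended use (needs the wandb package and a logged-in account):
#   import wandb
#   run = wandb.init(project="my-project", **kwargs)
```

With checkpoint_exists set to False, the call receives resume=None together with an id that already exists, which is the case the question is about.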
Hello @mordigm !
Looking at our docs, resume='allow' will do the following:
"allow": if id is set with init(id="UNIQUE_ID") or WANDB_RUN_ID="UNIQUE_ID" and it is identical to a previous run, wandb will automatically resume the run with that id. Otherwise, wandb will start a new run.
I am a bit confused about the issue you are facing, as it seems that you are describing the behavior of resume='allow': if someid exists, it will continue the run and add more data to it. If the run id does not exist yet, it will make a new run.
Is this not what you are seeing?
Hi Maximilian, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!
checkpoint_exists is False in this case, so resume=None. As I mentioned, resume=None does not have the desired effect that is stated in the docs.