I have model checkpoints that I am trying to upload with log_artifact(). Even with .wait() removed, the upload takes a very long time; the checkpoint directory is over 120 GB. At one point the same upload finished in about 20 minutes, but now it has been running for more than an hour and is still stuck.
The debug.log and debug-internal.log files are over 300 GB, and my IDE hangs when I try to open them to see what is going on.
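
Since the logs are too large to open in an editor, one way to look at just the most recent entries is to read the file backwards from the end. Below is a minimal sketch, assuming the logs are plain text; `tail_lines` is a hypothetical helper written for illustration (it is not part of wandb), and the log path is passed on the command line because the exact location depends on the run directory.

```python
import os
import sys


def tail_lines(path, n=50, chunk_size=1 << 20):
    """Return the last n lines of a file without reading the whole file.

    Hypothetical helper: reads fixed-size chunks backwards from the end
    until more than n newlines have been collected or the start is reached.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b""
        while pos > 0 and data.count(b"\n") <= n:
            step = min(chunk_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # Undecodable bytes are replaced rather than raising an error.
    return [line.decode("utf-8", errors="replace") for line in data.splitlines()[-n:]]


if __name__ == "__main__":
    # Usage (path is an assumption): python tail_log.py path/to/debug-internal.log
    for line in tail_lines(sys.argv[1], n=100):
        print(line)
```

Only the last few megabytes are read, so this should return quickly even on a file of several hundred GB.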