First-time user here. My runs take forever to finish. A run looks like it’s stuck, but it just takes a really long time to finish after the message mentioned in the title of this post appears. It can take up to 20 minutes.
I’m performing several runs where the script looks something like the following. Because I’m still getting things working, I often start a run but interrupt or crash it halfway, so sometimes run.finish() isn’t called for a test run.
I’ve seen this problem reported loads of times, but haven’t really seen many answers, just devs closing the threads because of inactivity.
I hope the below information is enough to pinpoint the problem. I can’t upload all my code unfortunately.
wandb.init(project="mytest", name='test_run', entity='mycomp')
train(num_epochs)  # which calls wandb.log({"some_var": some_val})
wandb.finish()
My debug.log ends in:
2023-06-28 16:01:43,966 INFO MainThread:345959 [wandb_run.py:_config_callback():1283] config_cb ('_wandb', 'visualize', 'batch confusion matrix') {'panel_type': 'Vega2', 'panel_config': {'panelDefId': 'wandb/confusion_matrix/v1', 'fieldSettings': {'Actual': 'Actual', 'Predicted': 'Predicted', 'nPredictions': 'nPredictions'}, 'stringSettings': {'title': ''}, 'transform': {'name': 'tableWithLeafColNames'}, 'userQuery': {'queryFields': [{'name': 'runSets', 'args': [{'name': 'runSets', 'value': '${runSets}'}], 'fields': [{'name': 'id', 'fields': []}, {'name': 'name', 'fields': []}, {'name': '_defaultColorIndex', 'fields': []}, {'name': 'summaryTable', 'args': [{'name': 'tableKey', 'value': 'batch confusion matrix_table'}], 'fields': []}]}]}}} None
2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_run.py:_finish():1890] finishing run robin-radar/IrisTorch_test/o6snnjiv
2023-06-28 16:02:04,250 INFO MainThread:345959 [jupyter.py:save_history():445] not saving jupyter history
2023-06-28 16:02:04,250 INFO MainThread:345959 [jupyter.py:save_ipynb():373] not saving jupyter notebook
2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_init.py:_jupyter_teardown():435] cleaning up jupyter logic
2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_run.py:_atexit_cleanup():2124] got exitcode: 0
2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_run.py:_restore():2107] restore
2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_run.py:_restore():2113] restore done
2023-06-28 16:23:37,067 INFO MainThread:345959 [wandb_run.py:_footer_history_summary_info():3467] rendering history
2023-06-28 16:23:37,067 INFO MainThread:345959 [wandb_run.py:_footer_history_summary_info():3499] rendering summary
2023-06-28 16:23:37,069 INFO MainThread:345959 [wandb_run.py:_footer_sync_info():3426] logging synced files
(The first line you see above is printed a load of times, so I didn’t copy the whole log)
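To make the stall easier to spot, here is a minimal sketch that scans log lines for the largest pause between consecutive timestamps. It assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, as in the debug.log excerpt above. The helper names `parse_timestamp` and `largest_gap` are made up for this illustration and are not part of wandb, and the sample lines are copied from the excerpt above rather than read from the real file.

```python
from datetime import datetime


def parse_timestamp(line):
    """Return the leading 'YYYY-MM-DD HH:MM:SS,mmm' timestamp, or None."""
    try:
        return datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        # Lines without a leading timestamp (e.g. wrapped output) are skipped.
        return None


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause."""
    best = (0.0, None, None)
    prev_time, prev_line = None, None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if prev_time is not None:
            gap = (stamp - prev_time).total_seconds()
            if gap > best[0]:
                best = (gap, prev_line, line)
        prev_time, prev_line = stamp, line
    return best


# Three lines taken from the debug.log excerpt above.
sample = [
    "2023-06-28 16:02:04,250 INFO MainThread:345959 [wandb_run.py:_restore():2113] restore done",
    "2023-06-28 16:23:37,067 INFO MainThread:345959 [wandb_run.py:_footer_history_summary_info():3467] rendering history",
    "2023-06-28 16:23:37,069 INFO MainThread:345959 [wandb_run.py:_footer_sync_info():3426] logging synced files",
]

gap, before, after = largest_gap(sample)
print(f"{gap / 60:.1f} minutes between:\n  {before}\n  {after}")
```

On the excerpt, the longest pause falls between "restore done" and "rendering history", which matches the delay I’m seeing in the notebook. To run it on a full log, pass the open file object instead of `sample`.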
debug.cli has a bunch of messages like this in it:
2023-06-28 16:01:44 WARNING Connection pool is full, discarding connection: api.wandb.ai. Connection pool size: 10
2023-06-28 16:01:44 WARNING Connection pool is full, discarding connection: storage.googleapis.com. Connection pool size: 10
The tail of my debug-internal.log looks like:
2023-06-28 16:23:35,928 DEBUG SenderThread:349532 [sender.py:send_request():396] send_request: poll_exit
2023-06-28 16:23:35,928 DEBUG HandlerThread:349532 [handler.py:handle_request():144] handle_request: sampled_history
2023-06-28 16:23:35,929 DEBUG SenderThread:349532 [sender.py:send_request():396] send_request: server_info
2023-06-28 16:23:36,065 DEBUG HandlerThread:349532 [handler.py:handle_request():144] handle_request: shutdown
2023-06-28 16:23:36,065 INFO HandlerThread:349532 [handler.py:finish():854] shutting down handler
2023-06-28 16:23:36,928 INFO WriterThread:349532 [datastore.py:close():298] close: /home/tim.kuipers/dev/deeplearning/sandbox/tims_iris_drone/wandb/run-20230628_155311-o6snnjiv/run-o6snnjiv.wandb
2023-06-28 16:23:37,065 INFO SenderThread:349532 [sender.py:finish():1526] shutting down sender
2023-06-28 16:23:37,065 INFO SenderThread:349532 [file_pusher.py:finish():159] shutting down file pusher
2023-06-28 16:23:37,065 INFO SenderThread:349532 [file_pusher.py:join():164] waiting for file pusher
Please note that around 16:40 the jupyter cell was still running.
Ubuntu 20.04, jupyter notebook in VSCode, wandb version 0.15.4
Please advise.