wandb: Network error (TransientError), entering retry loop. Help

wandb keeps trying to upload the artifacts, but the upload never succeeds:
wandb: / 0.000 MB of 0.003 MB uploaded

debug.log ↓↓↓

2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Current SDK version is 0.16.2
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Configure stats pid to 142846
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Loading settings from /home/zanzhuheng/.config/wandb/settings
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Loading settings from /home/zanzhuheng/Desktop/Working/wandb/settings
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2024-01-23 09:35:35,080 INFO    MainThread:142846 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'NCOC/step3_train_hold_out.py', 'program_abspath': '/home/zanzhuheng/Desktop/Working/NCOC/step3_train_hold_out.py', 'program': '/home/zanzhuheng/Desktop/Working/NCOC/step3_train_hold_out.py'}
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:_log_setup():526] Logging user logs to /home/zanzhuheng/Desktop/Working/wandb/run-20240123_093535-3n65dtgd/logs/debug.log
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:_log_setup():527] Logging internal logs to /home/zanzhuheng/Desktop/Working/wandb/run-20240123_093535-3n65dtgd/logs/debug-internal.log
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:init():566] calling init triggers
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:init():573] wandb.init called with sweep_config: {}
config: {'seed': 42, 'version': 'v5', 'scheduler': 'CosineAnnealingLR', 'accumulate': True, 'label_names': ['HGSC', 'CCOC', 'LGSC', 'ECOC'], 'in_dim': 384, 'learning_rate': 0.0005, 'weight_decay': 0.0005, 'min_lr': 5e-05, 'patch_size': [224, 1.0], 'batch_size': 1, 'dataset': 'NCOC', 'epochs': 20, 'DEVICE': 'NVIDIA GeForce RTX 4090', 'model list': ['abmil', 'dsmil', 'transmil'], 'model_type': [8, 16], 'ratio': [0.5, 1.0], 'losses': ['CrossEntropy']}
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:init():616] starting backend
2024-01-23 09:35:35,081 INFO    MainThread:142846 [wandb_init.py:init():620] setting up manager
2024-01-23 09:35:35,082 INFO    MainThread:142846 [backend.py:_multiprocessing_setup():105] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2024-01-23 09:35:35,085 INFO    MainThread:142846 [wandb_init.py:init():628] backend started and connected
2024-01-23 09:35:35,087 INFO    MainThread:142846 [wandb_init.py:init():720] updated telemetry
2024-01-23 09:35:35,093 INFO    MainThread:142846 [wandb_init.py:init():753] communicating run to backend with 90.0 second timeout
2024-01-23 09:35:37,289 INFO    MainThread:142846 [wandb_run.py:_on_init():2254] communicating current version
2024-01-23 09:35:37,944 INFO    MainThread:142846 [wandb_run.py:_on_init():2263] got version response 
2024-01-23 09:35:37,944 INFO    MainThread:142846 [wandb_init.py:init():804] starting run threads in backend
2024-01-23 09:35:50,883 INFO    MainThread:142846 [wandb_run.py:_console_start():2233] atexit reg
2024-01-23 09:35:50,884 INFO    MainThread:142846 [wandb_run.py:_redirect():2088] redirect: wrap_raw
2024-01-23 09:35:50,884 INFO    MainThread:142846 [wandb_run.py:_redirect():2153] Wrapping output streams.
2024-01-23 09:35:50,884 INFO    MainThread:142846 [wandb_run.py:_redirect():2178] Redirects installed.
2024-01-23 09:35:50,885 INFO    MainThread:142846 [wandb_init.py:init():847] run started, returning control to user process

debug-internal.log

2024-01-23 10:01:49,777 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:01:54,779 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:01:59,780 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:04,782 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:09,784 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:12,206 ERROR   wandb-upload_0:142898 [internal_api.py:upload_file():2765] upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/seeingtimes/SenNet%20+%20HOA%20-%20Hacking%20the%20Human%20Vasculature%20in%203D/3n65dtgd/wandb-metadata.json?Expires=1706060151&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=qNF6DSRTyuibP6XMrDZkAOpMdRS%2BlvjNwFq7Q9te%2Bq4nSt%2FDwv0yA1%2FGgdd5inf5%2FcIE5RR%2Fdb5oI%2FcNs0gPIaEEc8rIFz56Wc%2BfRXhzzAbZ0G7rQaC%2BYYvJQ5nawzIr5K%2BDTEQhn0isi54EJz%2BOVY8XA2BXDEau2z48T3PXcBWQtrI5rhcVjvUkm6wgDQIrRGQBL4xoQUvgkU5PV6OMFlPiFvBGn60d0Nel3V%2FfzTtLyc3i6c7QgmddprHW1qwBE4rayQZtHu3KzWhcISrhKkXXv%2FYjkRGEEcHZQSS9G1nKIUZIujHpxv8kzcOgANucfkyE1dUooFB2TAe5nml5nQ%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2024-01-23 10:02:12,206 ERROR   wandb-upload_0:142898 [internal_api.py:upload_file():2767] upload_file request headers: {'User-Agent': 'python-requests/2.28.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '2652'}
2024-01-23 10:02:12,207 ERROR   wandb-upload_0:142898 [internal_api.py:upload_file():2769] upload_file response body: 
2024-01-23 10:02:12,608 ERROR   wandb-upload_2:142898 [internal_api.py:upload_file():2765] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/132055356/699946003/9d25f23ff28988f102311cbe95cf907e?Expires=1706060155&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=AIuwpr8BSjRxXjW35efWRIfQKsnd8soUIGA28ZhJzxHFiDAMqTer7lUXtuxhnOUhNMe69OcYT3ai9teRcD%2FWVD4drh99O0UNohXELE8XSrCOboWCbjtBUCdv4ZVI3%2F1DofsrjKeP4Y36wJwUh9toQzVQFaF1l6Humawo1ftwp1ScGPNNW%2BMnM22TQTw0IIDt7mZ5HobB3NwpjQd75DsBSKz59N2%2B4Vf81GtvWNckEQfVu41BghpJjLSL%2B4zri0iByp7sStz40%2Fg8LARH3xfZWbqmqP%2BUJkMgCmrq%2FBvpNzjtdvZwvWZjhykHbpa%2FUeq9zvdZUI1fgFU%2FW3eB1LyiGw%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2024-01-23 10:02:12,608 ERROR   wandb-upload_2:142898 [internal_api.py:upload_file():2767] upload_file request headers: {'User-Agent': 'python-requests/2.28.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': 'nSXyP/KJiPECMRy+lc+Qfg==', 'Content-Type': 'application/json', 'Content-Length': '3480'}
2024-01-23 10:02:12,608 ERROR   wandb-upload_2:142898 [internal_api.py:upload_file():2769] upload_file response body: 
2024-01-23 10:02:14,786 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:15,098 ERROR   wandb-upload_1:142898 [internal_api.py:upload_file():2765] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/132055356/699946003/39649cc54c2f1eb89a404c5621898bb6?Expires=1706060155&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=MNc0BHx82iE8flk7Yp7Be7J1W9LlqcoEqUOA1DCqzP0QuLG7w9ixwt4c907VOBzkxiY4RsMWAXm5eJvDZtHbUzLlVH3g4YX33o1AFkTYrnq%2F%2Bh5zKiNeRU7qcAsh5Xz2zms4iutLNswz0DdniFuqjOHhCZXclq%2B3GzwIHMhsxHnTPjJWcROJVsZOjqogXXlrwEyatzW8HCBzkL8wXxZ6WwWh9zk4WgP9Y3TCUzX8QE514s0wJfhKdhUiTlMOsVub8UMtpzvtiSTulJDH5rlpaPmQU2vxfYsWkpiRqZAumkTiTNMugCm3fdBsDCmYBR1dSKjR%2Bsi1pKjDcsV6cA6Zdw%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2024-01-23 10:02:15,098 ERROR   wandb-upload_1:142898 [internal_api.py:upload_file():2767] upload_file request headers: {'User-Agent': 'python-requests/2.28.1', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': 'OWScxUwvHriaQExWIYmLtg==', 'Content-Type': 'text/plain; charset=utf-8', 'Content-Length': '3795'}
2024-01-23 10:02:15,098 ERROR   wandb-upload_1:142898 [internal_api.py:upload_file():2769] upload_file response body: 
2024-01-23 10:02:19,789 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:24,791 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:29,792 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:34,794 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:39,796 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
2024-01-23 10:02:44,798 DEBUG   HandlerThread:142898 [handler.py:handle_request():146] handle_request: keepalive
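
To make a long debug-internal.log easier to read, the `upload_file exception` lines can be grouped by host and error. The following is a minimal sketch, assuming the log lines keep the layout shown above; the regular expression and the helper name `summarize_upload_errors` are my own and are not part of the wandb SDK.

```python
import re
from collections import Counter

# Assumed layout, taken from the ERROR lines pasted above:
# <date> <time>,<ms> ERROR   wandb-upload_N:<pid> [...] upload_file exception <url>: (<error>)
UPLOAD_ERROR = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+\s+ERROR\s+"
    r"wandb-upload_\d+:\d+ .*upload_file exception "
    r"https://(?P<host>[^/]+)/.*?: (?P<error>\(.*\))\s*$"
)


def summarize_upload_errors(lines):
    """Count failed uploads per (host, error) pair; other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = UPLOAD_ERROR.match(line)
        if match:
            counts[(match.group("host"), match.group("error"))] += 1
    return counts
```

Passing an open file works as well, for example `summarize_upload_errors(open(path))`, where `path` points at the debug-internal.log of the run. If every entry shows the same host and the same `ConnectionResetError`, the problem is more likely on the network path than in a particular file.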

hey @seeingtimes - a few questions to help me dig into this:

  • what wandb SDK version are you using?
  • are you running this locally or through wandb.ai? if you could provide a run link, that would be great!
  • do you know the approximate size of the artifact you’re trying to upload?
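
For the version question, the installed package versions can be read from the same Python environment that runs the training script. This is only a sketch using the standard library; the helper name `installed_version` is made up for illustration. For reference, the debug.log above reports SDK version 0.16.2 and the request headers show python-requests/2.28.1.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the two packages that appear in the logs.
print("wandb:", installed_version("wandb"))
print("requests:", installed_version("requests"))
```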

Thank you for your reply. I think my issue was solved simply by rebooting my machine :rofl:
