Hi all,
I ran into a problem while training my model.
The training crashed because of the wandb run.
Could anyone explain why this error happened and how I can avoid it?
Here is the wandb debug-internal.log:
2023-07-30 21:32:45,718 INFO StreamThr :3775545 [internal.py:wandb_internal():86] W&B internal server running at pid: 3775545, started at: 2023-07-30 21:32:45.716165
2023-07-30 21:32:45,721 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status
2023-07-30 21:32:45,725 INFO WriterThread:3775545 [datastore.py:open_for_write():85] open: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/run-42ltpe8i.wandb
2023-07-30 21:32:45,728 DEBUG SenderThread:3775545 [sender.py:send():379] send: header
2023-07-30 21:32:45,800 DEBUG SenderThread:3775545 [sender.py:send():379] send: run
2023-07-30 21:32:45,823 INFO SenderThread:3775545 [sender.py:_maybe_setup_resume():758] checking resume status for None/YOLOX/42ltpe8i
2023-07-30 21:32:46,822 INFO SenderThread:3775545 [dir_watcher.py:__init__():211] watching files in: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files
2023-07-30 21:32:46,822 INFO SenderThread:3775545 [sender.py:_start_run_threads():1122] run started: 42ltpe8i with start time 1690723965.717156
2023-07-30 21:32:46,822 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: summary_record
2023-07-30 21:32:46,824 INFO SenderThread:3775545 [sender.py:_save_file():1376] saving file wandb-summary.json with policy end
2023-07-30 21:32:46,839 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: check_version
2023-07-30 21:32:46,840 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: check_version
2023-07-30 21:32:47,276 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: run_start
2023-07-30 21:32:47,291 DEBUG HandlerThread:3775545 [system_info.py:__init__():31] System info init
2023-07-30 21:32:47,291 DEBUG HandlerThread:3775545 [system_info.py:__init__():46] System info init done
2023-07-30 21:32:47,291 INFO HandlerThread:3775545 [system_monitor.py:start():181] Starting system monitor
2023-07-30 21:32:47,292 INFO SystemMonitor:3775545 [system_monitor.py:_start():145] Starting system asset monitoring threads
2023-07-30 21:32:47,292 INFO HandlerThread:3775545 [system_monitor.py:probe():201] Collecting system info
2023-07-30 21:32:47,292 INFO SystemMonitor:3775545 [interfaces.py:start():190] Started cpu monitoring
2023-07-30 21:32:47,293 INFO SystemMonitor:3775545 [interfaces.py:start():190] Started disk monitoring
2023-07-30 21:32:47,294 INFO SystemMonitor:3775545 [interfaces.py:start():190] Started gpu monitoring
2023-07-30 21:32:47,295 INFO SystemMonitor:3775545 [interfaces.py:start():190] Started memory monitoring
2023-07-30 21:32:47,295 INFO SystemMonitor:3775545 [interfaces.py:start():190] Started network monitoring
2023-07-30 21:32:47,448 DEBUG HandlerThread:3775545 [system_info.py:probe():195] Probing system
2023-07-30 21:32:47,453 DEBUG HandlerThread:3775545 [system_info.py:_probe_git():180] Probing git
2023-07-30 21:32:47,485 DEBUG HandlerThread:3775545 [system_info.py:_probe_git():188] Probing git done
2023-07-30 21:32:47,486 DEBUG HandlerThread:3775545 [system_info.py:probe():240] Probing system done
2023-07-30 21:32:47,486 DEBUG HandlerThread:3775545 [system_monitor.py:probe():210] {'os': 'Linux-5.8.0-43-generic-x86_64-with-glibc2.31', 'python': '3.9.1', 'heartbeatAt': '2023-07-30T13:32:47.449454', 'startedAt': '2023-07-30T13:32:45.684482', 'docker': None, 'cuda': None, 'args': (), 'state': 'running', 'program': '/ai/mnt/code/YOLOX/train.py', 'codePath': 'train.py', 'git': {'remote': , 'commit': 'e73eb9f8b3c0c70493bcc38f4fcf3fbdfaa6da07'}, 'email': 'x, 'root': '/ai/mnt/code/YOLOX', 'host': 'ai', 'username': 'root', 'executable': '/root/miniconda3/bin/python', 'cpu_count': 64, 'cpu_count_logical': 128, 'cpu_freq': {'current': 977.1614218749996, 'min': 800.0, 'max': 3400.0}, 'cpu_freq_per_core': [{'current': 839.44, 'min': 800.0, 'max': 3400.0}, {'current': 799.629, 'min': 800.0, 'max': 3400.0},], 'disk': {'total': 7096.5516357421875, 'used': 371.7781410217285}, 'gpu': 'NVIDIA GeForce RTX 3090', 'gpu_count': 1, 'gpu_devices': [{'name': 'NVIDIA GeForce RTX 3090', 'memory_total': 25769803776}], 'memory': {'total': 48.0}}
2023-07-30 21:32:47,487 INFO HandlerThread:3775545 [system_monitor.py:probe():211] Finished collecting system info
2023-07-30 21:32:47,487 INFO HandlerThread:3775545 [system_monitor.py:probe():214] Publishing system info
2023-07-30 21:32:47,487 DEBUG HandlerThread:3775545 [system_info.py:_save_pip():51] Saving list of pip packages installed into the current environment
2023-07-30 21:32:47,489 DEBUG HandlerThread:3775545 [system_info.py:_save_pip():67] Saving pip packages done
2023-07-30 21:32:47,489 DEBUG HandlerThread:3775545 [system_info.py:_save_conda():74] Saving list of conda packages installed into the current environment
2023-07-30 21:32:47,830 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_created():272] file/dir created: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/requirements.txt
2023-07-30 21:32:47,831 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_created():272] file/dir created: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/wandb-summary.json
2023-07-30 21:32:47,832 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_created():272] file/dir created: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/conda-environment.yaml
2023-07-30 21:32:48,829 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_modified():289] file/dir modified: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/requirements.txt
2023-07-30 21:32:48,830 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_modified():289] file/dir modified: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/conda-environment.yaml
2023-07-30 21:32:52,509 DEBUG HandlerThread:3775545 [system_info.py:_save_conda():86] Saving conda packages done
2023-07-30 21:32:52,514 INFO HandlerThread:3775545 [system_monitor.py:probe():216] Finished publishing system info
2023-07-30 21:32:52,526 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:32:52,527 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
2023-07-30 21:32:52,528 DEBUG SenderThread:3775545 [sender.py:send():379] send: files
2023-07-30 21:32:52,528 INFO SenderThread:3775545 [sender.py:_save_file():1376] saving file wandb-metadata.json with policy now
2023-07-30 21:32:52,541 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: stop_status
2023-07-30 21:32:52,542 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: stop_status
2023-07-30 21:32:52,835 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_created():272] file/dir created: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/wandb-metadata.json
2023-07-30 21:32:52,948 DEBUG SenderThread:3775545 [sender.py:send():379] send: telemetry
2023-07-30 21:32:52,949 DEBUG SenderThread:3775545 [sender.py:send():379] send: config
2023-07-30 21:32:52,950 DEBUG SenderThread:3775545 [sender.py:send():379] send: metric
2023-07-30 21:32:52,951 DEBUG SenderThread:3775545 [sender.py:send():379] send: telemetry
2023-07-30 21:32:52,951 DEBUG SenderThread:3775545 [sender.py:send():379] send: metric
2023-07-30 21:32:52,951 WARNING SenderThread:3775545 [sender.py:send_metric():1327] Seen metric with glob (shouldn't happen)
2023-07-30 21:32:52,952 DEBUG SenderThread:3775545 [sender.py:send():379] send: metric
2023-07-30 21:32:52,952 DEBUG SenderThread:3775545 [sender.py:send():379] send: metric
2023-07-30 21:32:52,952 WARNING SenderThread:3775545 [sender.py:send_metric():1327] Seen metric with glob (shouldn't happen)
2023-07-30 21:32:53,158 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/scottxu-wandb/YOLOX/42ltpe8i/wandb-metadata.json?Expires=1690810372&GoogleAccessId=gorilla-files-url-signer-man%40wandb-
2023-07-30 21:32:57,252 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/scottxu-wandb/YOLOX/42ltpe8i/wandb-metadata.json?Expires=1690810372&GoogleAccessId=gorilla-files-url-signer-man%40wandb-'Connection reset by peer'))
2023-07-30 21:32:57,252 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '14580'}
2023-07-30 21:32:57,253 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:32:57,253 INFO wandb-upload_0:3775545 [retry.py:__call__():172] Retry attempt failed:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/root/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer
wandb.sdk.lib.retry.TransientError: None
2023-07-30 21:33:01,487 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-production.appspot.com/scottxu-wandb/YOLOX/42ltpe8i/wandb-metadata.json?Expires=1690810372&GoogleAccessId=gorilla-files-url-signer-man%40wandb-: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:01,487 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-Length': '14580'}
2023-07-30 21:33:01,487 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:01,955 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:33:06,955 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:33:07,542 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: stop_status
2023-07-30 21:33:07,543 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: stop_status
2023-07-30 21:33:10,363 INFO wandb-upload_0:3775545 [upload_job.py:push():131] Uploaded file /tmp/tmpct3libqbwandb/uwou19xg-wandb-metadata.json
2023-07-30 21:33:12,799 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:33:15,972 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: log_artifact
2023-07-30 21:33:15,974 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: log_artifact
2023-07-30 21:33:17,317 INFO SenderThread:3775545 [sender.py:send_request_log_artifact():1442] logged artifact validation_images - {'id': 'QXJ0aWZhY3Q6NDg2ODcwNDEz', 'digest': 'c2b2d644acc9d8f163b2c555a6207f98', 'state': 'COMMITTED', 'aliases': [{'artifactCollectionName': 'validation_images', 'alias': 'latest'}, {'artifactCollectionName': 'validation_images', 'alias': 'v1'}], 'artifactSequence': {'id': 'QXJ0aWZhY3RDb2xsZWN0aW9uOjc2MzYyMTU4', 'latestArtifact': {'id': 'QXJ0aWZhY3Q6NDg2ODcwNDEz', 'versionIndex': 1}}, 'version': 'v1'}
2023-07-30 21:33:17,871 DEBUG SenderThread:3775545 [sender.py:send():379] send: exit
2023-07-30 21:33:17,872 INFO SenderThread:3775545 [sender.py:send_exit():584] handling exit code: 0
2023-07-30 21:33:17,872 INFO SenderThread:3775545 [sender.py:send_exit():586] handling runtime: 30
2023-07-30 21:33:17,877 INFO SenderThread:3775545 [sender.py:_save_file():1376] saving file wandb-summary.json with policy end
2023-07-30 21:33:17,877 INFO SenderThread:3775545 [sender.py:send_exit():592] send defer
2023-07-30 21:33:17,883 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:17,886 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 0
2023-07-30 21:33:17,887 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:33:18,326 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,327 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 0
2023-07-30 21:33:18,327 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 1
2023-07-30 21:33:18,327 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,327 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 1
2023-07-30 21:33:18,328 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,329 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 1
2023-07-30 21:33:18,329 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 2
2023-07-30 21:33:18,329 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,329 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 2
2023-07-30 21:33:18,329 INFO HandlerThread:3775545 [system_monitor.py:finish():190] Stopping system monitor
2023-07-30 21:33:18,330 DEBUG SystemMonitor:3775545 [system_monitor.py:_start():159] Starting system metrics aggregation loop
2023-07-30 21:33:18,330 DEBUG SystemMonitor:3775545 [system_monitor.py:_start():166] Finished system metrics aggregation loop
2023-07-30 21:33:18,331 INFO HandlerThread:3775545 [interfaces.py:finish():202] Joined cpu monitor
2023-07-30 21:33:18,332 DEBUG SystemMonitor:3775545 [system_monitor.py:_start():170] Publishing last batch of metrics
2023-07-30 21:33:18,332 INFO HandlerThread:3775545 [interfaces.py:finish():202] Joined disk monitor
2023-07-30 21:33:18,356 INFO HandlerThread:3775545 [interfaces.py:finish():202] Joined gpu monitor
2023-07-30 21:33:18,357 INFO HandlerThread:3775545 [interfaces.py:finish():202] Joined memory monitor
2023-07-30 21:33:18,357 INFO HandlerThread:3775545 [interfaces.py:finish():202] Joined network monitor
2023-07-30 21:33:18,358 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,359 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 2
2023-07-30 21:33:18,359 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 3
2023-07-30 21:33:18,359 DEBUG SenderThread:3775545 [sender.py:send():379] send: stats
2023-07-30 21:33:18,359 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,362 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 3
2023-07-30 21:33:18,363 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,363 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 3
2023-07-30 21:33:18,364 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 4
2023-07-30 21:33:18,364 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,364 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 4
2023-07-30 21:33:18,365 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,365 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 4
2023-07-30 21:33:18,365 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 5
2023-07-30 21:33:18,365 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,365 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 5
2023-07-30 21:33:18,366 DEBUG SenderThread:3775545 [sender.py:send():379] send: summary
2023-07-30 21:33:18,367 INFO SenderThread:3775545 [sender.py:_save_file():1376] saving file wandb-summary.json with policy end
2023-07-30 21:33:18,368 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,368 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 5
2023-07-30 21:33:18,368 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 6
2023-07-30 21:33:18,368 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,369 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 6
2023-07-30 21:33:18,369 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,369 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 6
2023-07-30 21:33:18,370 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 7
2023-07-30 21:33:18,370 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: status_report
2023-07-30 21:33:18,370 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,370 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 7
2023-07-30 21:33:18,371 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,371 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 7
2023-07-30 21:33:18,371 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 8
2023-07-30 21:33:18,371 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,371 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 8
2023-07-30 21:33:18,372 DEBUG SenderThread:3775545 [sender.py:send_request():406] send_request: defer
2023-07-30 21:33:18,372 INFO SenderThread:3775545 [sender.py:send_request_defer():608] handle sender defer: 8
2023-07-30 21:33:18,372 INFO SenderThread:3775545 [job_builder.py:build():280] Attempting to build job artifact
2023-07-30 21:33:18,374 INFO SenderThread:3775545 [job_builder.py:_get_source_type():389] is repo sourced job
2023-07-30 21:33:18,376 INFO SenderThread:3775545 [job_builder.py:build():363] adding wandb-job metadata file
2023-07-30 21:33:18,380 INFO SenderThread:3775545 [sender.py:transition_state():612] send defer: 9
2023-07-30 21:33:18,381 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: defer
2023-07-30 21:33:18,381 DEBUG SenderThread:3775545 [sender.py:send():379] send: artifact
2023-07-30 21:33:18,381 INFO HandlerThread:3775545 [handler.py:handle_request_defer():170] handle defer: 9
2023-07-30 21:33:18,871 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: poll_exit
2023-07-30 21:33:18,874 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_modified():289] file/dir modified: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/config.yaml
2023-07-30 21:33:18,874 INFO Thread-12 :3775545 [dir_watcher.py:_on_file_modified():289] file/dir modified: /ai/mnt/code/YOLOX/wandb/run-20230730_213245-42ltpe8i/files/wandb-summary.json
2023-07-30 21:33:19,913 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/76361891/527263450/d2b2332392269358dec15852e007353c?Expires=1690810399&GoogleAccessId=gorilla-files-url-signer-man%40wandb-: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:19,914 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': '0rIzI5Imk1jewVhS4Ac1PA==', 'Content-Type': 'application/json', 'Content-Length': '10080'}
2023-07-30 21:33:19,914 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:20,338 INFO wandb-upload_0:3775545 [upload_job.py:push():89] Uploaded file /root/.local/share/wandb/artifacts/staging/tmpwnmf_iz9
2023-07-30 21:33:21,197 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/76361891/527263450/d2b2332392269358dec15852e007353c?Expires=1690810399&GoogleAccessId=gorilla-files-url-signer-man%40wandb-: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:21,198 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': '0rIzI5Imk1jewVhS4Ac1PA==', 'Content-Type': 'application/json', 'Content-Length': '10080'}
2023-07-30 21:33:21,198 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:23,494 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/76361891/527263450/d2b2332392269358dec15852e007353c?Expires=1690810399&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=X6HJ%2FHtwdMO8K9ht21fWYUr0NtQOY9yxvPcSRDxVaLmB%2BKZOCwqtaAikdVU8q5I%2BGWhRQLhSo1UV6mN4FLlQm2F2a%2BJ06VR5IsjsywA1iWDNMSh2LEl3aNO8edgmr%2FSl4nSKzB4zdTbXvTIQhHEMTWPRnm6miS%2BhfWF0cPYCXegQIjigHPXAI%2Bm3Xp08jlha8CEEJyF9rA80eEOU5abXcnUojNtY9oCaUkxveEKzCD45f7ZBkmzL4na66Zvpi41xfTAuKYADYAL9rrHIf4y%2Bs2hzeBb3SG5cU91brfw7NZylL9kj1%2Frt7eZvR9hUNVGM%2BAy9NaLFln8BYKwvAOXusQ%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:23,495 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': '0rIzI5Imk1jewVhS4Ac1PA==', 'Content-Type': 'application/json', 'Content-Length': '10080'}
2023-07-30 21:33:23,495 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:23,495 INFO wandb-upload_1:3775545 [retry.py:__call__():172] Retry attempt failed:
Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/root/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ConnectionResetError: [Errno 104] Connection reset by peer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/site-packages/requests/adapters.py", line 487, in send
    resp = conn.urlopen(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 787, in urlopen
    retries = retries.increment(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 703, in urlopen
    httplib_response = self._make_request(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 386, in _make_request
    self._validate_conn(conn)
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connectionpool.py", line 1042, in _validate_conn
    conn.connect()
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/root/miniconda3/lib/python3.9/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/root/miniconda3/lib/python3.9/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/root/miniconda3/lib/python3.9/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.9/site-packages/wandb/sdk/internal/internal_api.py", line 2285, in upload_file
    response = self._upload_file_session.put(
  File "/root/miniconda3/lib/python3.9/site-packages/requests/sessions.py", line 647, in put
    return self.request("PUT", url, data=data, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/requests/sessions.py", line 587, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/requests/sessions.py", line 701, in send
    r = adapter.send(request, **kwargs)
  File "/root/miniconda3/lib/python3.9/site-packages/requests/adapters.py", line 502, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:23,873 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
2023-07-30 21:33:28,495 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/76361891/527263450/d2b2332392269358dec15852e007353c?Expires=1690810399&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=X6HJ%2FHtwdMO8K9ht21fWYUr0NtQOY9yxvPcSRDxVaLmB%2BKZOCwqtaAikdVU8q5I%2BGWhRQLhSo1UV6mN4FLlQm2F2a%2BJ06VR5IsjsywA1iWDNMSh2LEl3aNO8edgmr%2FSl4nSKzB4zdTbXvTIQhHEMTWPRnm6miS%2BhfWF0cPYCXegQIjigHPXAI%2Bm3Xp08jlha8CEEJyF9rA80eEOU5abXcnUojNtY9oCaUkxveEKzCD45f7ZBkmzL4na66Zvpi41xfTAuKYADYAL9rrHIf4y%2Bs2hzeBb3SG5cU91brfw7NZylL9kj1%2Frt7eZvR9hUNVGM%2BAy9NaLFln8BYKwvAOXusQ%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:28,495 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': '0rIzI5Imk1jewVhS4Ac1PA==', 'Content-Type': 'application/json', 'Content-Length': '10080'}
2023-07-30 21:33:28,495 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:28,875 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
2023-07-30 21:33:33,878 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
2023-07-30 21:33:38,552 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://storage.googleapis.com/wandb-artifacts-prod/wandb_artifacts/76361891/527263450/d2b2332392269358dec15852e007353c?Expires=1690810399&GoogleAccessId=gorilla-files-url-signer-man%40wandb-production.iam.gserviceaccount.com&Signature=X6HJ%2FHtwdMO8K9ht21fWYUr0NtQOY9yxvPcSRDxVaLmB%2BKZOCwqtaAikdVU8q5I%2BGWhRQLhSo1UV6mN4FLlQm2F2a%2BJ06VR5IsjsywA1iWDNMSh2LEl3aNO8edgmr%2FSl4nSKzB4zdTbXvTIQhHEMTWPRnm6miS%2BhfWF0cPYCXegQIjigHPXAI%2Bm3Xp08jlha8CEEJyF9rA80eEOU5abXcnUojNtY9oCaUkxveEKzCD45f7ZBkmzL4na66Zvpi41xfTAuKYADYAL9rrHIf4y%2Bs2hzeBb3SG5cU91brfw7NZylL9kj1%2Frt7eZvR9hUNVGM%2BAy9NaLFln8BYKwvAOXusQ%3D%3D: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))
2023-07-30 21:33:38,554 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {'User-Agent': 'python-requests/2.29.0', 'Accept-Encoding': 'gzip, deflate, br', 'Accept': '*/*', 'Connection': 'keep-alive', 'Content-MD5': '0rIzI5Imk1jewVhS4Ac1PA==', 'Content-Type': 'application/json', 'Content-Length': '10080'}
2023-07-30 21:33:38,554 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2294] upload_file response body:
2023-07-30 21:33:38,880 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
2023-07-30 21:33:43,882 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive
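Since the log above is long, a small script like the following might make it easier to see which upload threads keep failing and over what time span. This is only a minimal sketch in plain Python and not part of wandb: the function name `summarize_upload_errors` and the regular expression are assumptions based on the line format shown in the log, and the sample lines use placeholder URLs instead of the real signed ones.

```python
import re
from collections import Counter

# Assumed line layout, inferred from the log above:
# "<date> <time>,<ms> <LEVEL> <thread>:<pid> [<file>:<func>():<line>] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<thread>[^:]+?)\s*:(?P<pid>\d+) "
    r"\[(?P<where>[^\]]+)\] "
    r"(?P<msg>.*)$"
)


def summarize_upload_errors(lines):
    """Count 'upload_file exception' ERROR lines per thread.

    Returns (counts, spans), where spans maps each thread name to the
    (first, last) timestamp at which such an error was logged.
    Lines that do not match the assumed layout (e.g. tracebacks) are skipped.
    """
    counts = Counter()
    spans = {}
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["level"] != "ERROR":
            continue
        if not m["msg"].startswith("upload_file exception"):
            continue
        thread = m["thread"]
        counts[thread] += 1
        first, _ = spans.get(thread, (m["ts"], m["ts"]))
        spans[thread] = (first, m["ts"])
    return counts, spans


# Shortened sample lines in the same layout; the URLs are placeholders.
sample_lines = [
    "2023-07-30 21:32:53,158 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2290] upload_file exception https://example.invalid/placeholder",
    "2023-07-30 21:32:57,252 ERROR wandb-upload_0:3775545 [internal_api.py:upload_file():2292] upload_file request headers: {}",
    "2023-07-30 21:33:19,913 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://example.invalid/placeholder",
    "2023-07-30 21:33:21,197 ERROR wandb-upload_1:3775545 [internal_api.py:upload_file():2290] upload_file exception https://example.invalid/placeholder",
    "2023-07-30 21:33:23,873 DEBUG HandlerThread:3775545 [handler.py:handle_request():144] handle_request: keepalive",
    "Traceback (most recent call last):",
]

counts, spans = summarize_upload_errors(sample_lines)
for thread, n in counts.items():
    first, last = spans[thread]
    print(f"{thread}: {n} upload error(s) between {first} and {last}")
```

To run it on the real file, the open file object for debug-internal.log can be passed in place of `sample_lines`, since the function only needs an iterable of lines.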
Thank you!