(Windows 11) `wandb.sweep()` gives ConnectionResetError: [WinError 10054]

Hello, I'm fairly new to wandb and am trying sweeps for the first time. Creating and running a sweep from the web UI, then pasting the agent command into bash, worked fine. Starting the agent with `wandb.agent()` from my script, however, keeps causing problems. Training starts, but I run into two issues:

  1. Every time a new run starts I get the error below. It seems to originate in a thread that wandb creates, so I'm not sure what to do about it. The runs do appear on the sweep page, but none of them contain the data I logged, and each run still shows the "active" dot after the script ends.
  2. I can't figure out which wandb calls to make, and in what order, so that the agent populates wandb.config with its sampled values. I'd like to set some default config values (ones not specified in my sweep_config dict) and have the agent overwrite wandb.config with the values it samples from sweep_config. This seems to be what happens when I run the agent from bash. In the traceback below I print wandb.config right before the model is trained, and it contains only the config dict I passed to wandb.init. The agent does not update or overwrite it.
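A minimal sketch of the call order that, as I understand the wandb sweep docs, produces the behaviour described in point 2. The function passed to `wandb.agent()` starts its own run with `wandb.init(config=defaults)`. The values the agent samples then override the matching default keys, and keys the sweep does not mention keep their defaults. `DEFAULTS`, `SWEEP_CONFIG`, the `test_loss` metric name and `expected_config()` are hypothetical names for illustration. `expected_config()` is only a plain-dict model of that precedence, not a wandb API. wandb is imported inside `train()` so the module can be imported without it.

```python
# Hypothetical defaults: every key the training code reads. Keys that the sweep
# does not list under "parameters" keep these values for every trial.
DEFAULTS = {
    "epochs": 3,
    "batch_size": 16,
    "lr": 1e-3,
    "optimizer": "SGD",
    "dropout": 0.2,
    "rng_seed": 1,
}

# Hypothetical sweep definition: only these keys are sampled by the agent.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "test_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"min": 1e-4, "max": 1e-2},
        "optimizer": {"values": ["SGD", "AdamW"]},
        "dropout": {"min": 0.0, "max": 0.5},
    },
}


def expected_config(defaults, sampled):
    """Plain-dict model of the precedence we expect: sampled sweep values win."""
    merged = dict(defaults)
    merged.update(sampled)
    return merged


def train():
    """Entry point for wandb.agent(): called once per trial, starts its own run."""
    import wandb  # imported here so this module can be imported without wandb

    # Assumption: the agent has already picked this trial's values, and
    # wandb.init() applies them on top of DEFAULTS, so run.config is the merge.
    run = wandb.init(config=DEFAULTS)
    config = run.config  # read hyperparameters from here, never from DEFAULTS
    print("effective config:", dict(config))
    # ... build the model and train with config.lr, config.optimizer, ...
    # ... and call run.log({"test_loss": ...}) once per epoch ...
    run.finish()
```

With this layout the script would call `sweep_id = wandb.sweep(SWEEP_CONFIG, project=...)` followed by `wandb.agent(sweep_id, function=train, count=...)`, with no `wandb.init()` anywhere outside `train()`.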

Below is the full traceback. Thanks in advance for any ideas.

C:\Users\jacks\anaconda3\envs\ml-project\python.exe C:\Users\jacks\ml-project\training_script.py 
Using device: cuda
wandb: Currently logged in as: jacksth22. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.3
wandb: Run data is saved locally in C:\Users\jacks\ml-project\wandb\run-20221004_162225-1aj4c6jt
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run restful-dew-107
wandb:  View project at https://wandb.ai/jacksth22/<project>
wandb:  View run at https://wandb.ai/jacksth22/<project>/runs/1aj4c6jt
Loading data...done (elapsed=1.49s).
Converting data to tensors: 100%|██████████| 500/500 [00:03<00:00, 164.20it/s]
done (elapsed=3.05s).
Create sweep with ID: plq6uobc
Sweep URL: https://wandb.ai/jacksth22/uncategorized/sweeps/plq6uobc
=== Starting sweep agent ===
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: Waiting for W&B process to finish... (success).
wandb:                                                                                
wandb: Synced restful-dew-107: https://wandb.ai/jacksth22/<project>/runs/1aj4c6jt
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20221004_162225-1aj4c6jt\logs
wandb: Agent Starting Run: 1bqu30r5 with config:
wandb: 	batch_size: 16
wandb: 	dense_depth: 2
wandb: 	dense_width: 64
wandb: 	depth: 6
wandb: 	dropout: 0.25778082906860794
wandb: 	epochs: 15
wandb: 	gradient_clipping: 1
wandb: 	heads: 4
wandb: 	lr: 0.0008418633888555167
wandb: 	lr_warmup: 1960
wandb: 	max_seq_len: 64
wandb: 	optimizer: AdamW
wandb: 	use_max_pool: False
Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3:   0%|          | 0/22 [00:00<?, ?it/s]Exception in thread ChkStopThr:
Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 190, in check_status
    status_response = self._interface.communicate_stop_status()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface.py", line 128, in communicate_stop_status
    resp = self._communicate_stop_status(status)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 69, in _communicate_stop_status
    data = super()._communicate_stop_status(status)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 399, in _communicate_stop_status
    resp = self._communicate(req, local=True)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 230, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 58, in _communicate_async
    future = self._router.send_and_receive(rec, local=local)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router.py", line 94, in send_and_receive
    self._send_message(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router_sock.py", line 35, in _send_message
    self._sock_client.send_record_communicate(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 216, in send_record_communicate
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Exception in thread NetStatThr:
Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 172, in check_network_status
    status_response = self._interface.communicate_network_status()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface.py", line 139, in communicate_network_status
    resp = self._communicate_network_status(status)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 82, in _communicate_network_status
    data = super()._communicate_network_status(status)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 409, in _communicate_network_status
    resp = self._communicate(req, local=True)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 230, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 58, in _communicate_async
    future = self._router.send_and_receive(rec, local=local)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router.py", line 94, in send_and_receive
    self._send_message(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router_sock.py", line 35, in _send_message
    self._sock_client.send_record_communicate(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 216, in send_record_communicate
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Training epoch 1/3: 100%|██████████| 22/22 [00:06<00:00,  3.32it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 20.42it/s]
Training epoch 2/3:   0%|          | 0/22 [00:00<?, ?it/s]Train: acc =    9.43% | loss = 260.72%
Test:  acc =   16.67% | loss = 225.52%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00,  5.70it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.73it/s]
Training epoch 3/3:   0%|          | 0/22 [00:00<?, ?it/s]Train: acc =   11.60% | loss = 221.38%
Test:  acc =   13.33% | loss = 218.46%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00,  5.84it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 21.05it/s]
Train: acc =   12.40% | loss = 218.99%
Test:  acc =   13.33% | loss = 235.97%
Exception in thread Thread-14:
Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
    wandb.finish()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
    wandb.finish(exit_code=1)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
wandb: Agent Starting Run: ly4ox06s with config:
wandb: 	batch_size: 16
wandb: 	dense_depth: 4
wandb: 	dense_width: 128
wandb: 	depth: 10
wandb: 	dropout: 0.3449039365016265
wandb: 	epochs: 15
wandb: 	gradient_clipping: 1
wandb: 	heads: 6
wandb: 	lr: 0.0007649760799562746
wandb: 	lr_warmup: 1057
wandb: 	max_seq_len: 64
wandb: 	optimizer: AdamW
wandb: 	use_max_pool: False
Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3: 100%|██████████| 22/22 [00:03<00:00,  5.86it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 21.62it/s]
Train: acc =   10.57% | loss = 247.77%
Test:  acc =   10.00% | loss = 227.83%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00,  5.84it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.09it/s]
Training epoch 3/3:   0%|          | 0/22 [00:00<?, ?it/s]Train: acc =   10.60% | loss = 236.37%
Test:  acc =   10.00% | loss = 231.93%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00,  5.83it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 21.15it/s]
Train: acc =   10.40% | loss = 233.15%
Test:  acc =   13.33% | loss = 230.50%
Exception in thread Thread-15:
Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
    wandb.finish()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
    wandb.finish(exit_code=1)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
wandb: Agent Starting Run: viajyjrk with config:
wandb: 	batch_size: 16
wandb: 	dense_depth: 4
wandb: 	dense_width: 128
wandb: 	depth: 8
wandb: 	dropout: 0.06200113164824134
wandb: 	epochs: 15
wandb: 	gradient_clipping: 1
wandb: 	heads: 6
wandb: 	lr: 0.0006929869499125819
wandb: 	lr_warmup: 2917
wandb: 	max_seq_len: 512
wandb: 	optimizer: SGD
wandb: 	use_max_pool: False
Training epoch 1/3:   0%|          | 0/22 [00:00<?, ?it/s]Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3: 100%|██████████| 22/22 [00:03<00:00,  5.65it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 21.07it/s]
Train: acc =    9.43% | loss = 262.65%
Test:  acc =   10.00% | loss = 221.72%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00,  5.86it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.02it/s]
Train: acc =   11.60% | loss = 236.80%
Test:  acc =   10.00% | loss = 234.05%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00,  5.81it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 20.80it/s]
Train: acc =   11.00% | loss = 226.62%
Test:  acc =    9.33% | loss = 221.71%
Exception in thread Thread-16:
Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
    wandb.finish()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
    wandb.finish(exit_code=1)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
    wandb.run.finish(exit_code=exit_code, quiet=quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
    return func(self, *args, **kwargs)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
    return self._finish(exit_code, quiet)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
    tel.feature.finish = True
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
    self._run._telemetry_callback(self._obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
    self._telemetry_flush()
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
    self._backend.interface._publish_telemetry(self._telemetry_obj)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
    self._publish(rec)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
    self._sock_client.send_record_publish(record)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
    self.send_server_request(server_req)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

Process finished with exit code 0
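One reading of this log, offered as a hypothesis rather than a confirmed cause. Right after `Create sweep with ID`, wandb warns `Calling wandb.login() after wandb.init() has no effect` and then finishes and syncs `restful-dew-107`, a run that existed before the agent started. That suggests `wandb.init()` was called at module level before `wandb.agent()`. The failing `ChkStopThr`/`NetStatThr` threads (`check_status` / `check_network_status` in `wandb_run.py`) and the agent's own `wandb.finish()` calls all die writing to a socket that has already been closed, which would be consistent with state left over from that earlier run. The sketch below defensively finishes any active run before creating the sweep and starting the agent. `launch_sweep` is a hypothetical helper. It takes the wandb module as a parameter only so the call order can be checked with a stub.

```python
def launch_sweep(wb, sweep_config, train, project, count=None):
    """Create a sweep and run an in-process agent, starting from a clean state.

    `wb` is the wandb module; it is a parameter only so the call order can be
    checked with a stub. In a real script pass `wandb` itself.
    """
    if wb.run is not None:
        # A run is already active (for example from a module-level wandb.init()).
        # Finish it explicitly before the sweep and agent are created.
        wb.finish()
    sweep_id = wb.sweep(sweep_config, project=project)
    wb.agent(sweep_id, function=train, count=count)
    return sweep_id
```

In the training script this would be called as `launch_sweep(wandb, sweep_config, train, project="...", count=...)`. If the hypothesis holds, though, the simpler fix is to remove the module-level `wandb.init()` altogether.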

Hi Jackson, this might also be a proxy configuration issue. Are you running this from inside a firewalled corporate network, or are you connected to a VPN?
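To answer the proxy question, one quick way to see which proxies Python itself picks up is `urllib.request.getproxies()`. It reads the `*_proxy` environment variables and, on Windows, falls back to the registry settings when none are set. This is only a stdlib check of the environment the script runs in. It does not show how wandb itself routes its traffic, and `proxy_report` is a hypothetical helper name.

```python
import urllib.request


def proxy_report():
    """Return {scheme: proxy_url} for the proxies Python would use by default."""
    return urllib.request.getproxies()


proxies = proxy_report()
if proxies:
    for scheme, url in sorted(proxies.items()):
        print(f"{scheme}_proxy = {url}")
else:
    print("No proxy configured for this Python process.")
```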

The same thing happened to me, and I'm not on a VPN or behind a firewall. It started after I changed the project name with `wandb.init(project="new_project_name")`. Since then, every time I run my wandb script I get the same error as above.

I see. Can you share the debug logs (debug.log and debug-internal.log) from your wandb run directory when this occurs?
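A small stdlib sketch for collecting those files. It picks the newest `run-*` directory under `wandb/` and lists the `.log` files in its `logs` folder. It assumes the layout shown in the output above (`.\wandb\run-<YYYYMMDD_HHMMSS>-<id>\logs`), so sorting directory names by their timestamp prefix gives chronological order. `find_debug_logs` is a hypothetical helper, and directories with any other prefix are ignored.

```python
from pathlib import Path


def find_debug_logs(wandb_dir="wandb"):
    """Return the .log files of the newest run directory under `wandb_dir`.

    Assumes the layout wandb/run-<YYYYMMDD_HHMMSS>-<id>/logs/. Because the
    timestamp comes first, sorting the names sorts the runs chronologically.
    Directories with any other prefix are ignored.
    """
    runs = sorted(p for p in Path(wandb_dir).glob("run-*") if p.is_dir())
    if not runs:
        return []
    return sorted(p for p in (runs[-1] / "logs").glob("*.log") if p.is_file())


# Print the files to attach, run from the directory that contains wandb/.
for log_file in find_debug_logs():
    print(log_file)
```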

Hi Jackson, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!

I’m receiving the same error repeatedly when I start a sweep.

Here are the log files (a 4.1 KB file hosted on MEGA).

Create sweep with ID: qs048o8w
Sweep URL: https://wandb.ai/maxw/lit-mnist/sweeps/qs048o8w
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: Waiting for W&B process to finish... (success).
wandb: Synced solar-yogurt-63: https://wandb.ai/maxw/lit-mnist/runs/vla9pdno
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20221118_121240-vla9pdno\logs
wandb: Agent Starting Run: 618an4iq with config:
wandb: 	batch_size: 16
wandb: 	dropout: 0.5
wandb: 	epochs: 5
wandb: 	lr: 0.06530954613403434
C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\pytorch_lightning\loops\utilities.py:94: PossibleUserWarning: `max_epochs` was not set. Setting it to 1000 epochs. To train without an epoch limit, set `max_epochs=-1`.
  rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name | Type       | Params
------------------------------------
0 | net  | Sequential | 1.4 M 
------------------------------------
1.4 M     Trainable params
0         Non-trainable params
1.4 M     Total params
5.518     Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\pytorch_lightning\trainer\connectors\data_connector.py:224: PossibleUserWarning: The dataloader, val_dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 12 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Sanity Checking DataLoader 0:   0%|          | 0/2 [00:00<?, ?it/s]Exception in thread ChkStopThr:
Traceback (most recent call last):
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\wandb_run.py", line 200, in check_status
    status_response = self._interface.communicate_stop_status()
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface.py", line 128, in communicate_stop_status
    resp = self._communicate_stop_status(status)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 69, in _communicate_stop_status
    data = super()._communicate_stop_status(status)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 424, in _communicate_stop_status
    resp = self._communicate(req, local=True)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 255, in _communicate
    return self._communicate_async(rec, local=local).get(timeout=timeout)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 58, in _communicate_async
    future = self._router.send_and_receive(rec, local=local)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\router.py", line 94, in send_and_receive
    self._send_message(rec)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\router_sock.py", line 36, in _send_message
    self._sock_client.send_record_communicate(record)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\lib\sock_client.py", line 216, in send_record_communicate
    self.send_server_request(server_req)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
    self._send_message(msg)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Exception in thread NetStatThr:
Traceback (most recent call last):
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\threading.py", line 980, in _bootstrap_inner
    self.run()
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\wandb_run.py", line 182, in check_network_status
    status_response = self._interface.communicate_network_status()
  File "C:\Users\perry\miniconda3\envs\PaperReplicas\lib\site-packages\wandb\sdk\interface\interface.py", line 139, in communicate_network_status
