Server socket closed


ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine

BrokenPipeError: [Errno 32] Broken pipe

I am receiving a very long traceback that ends in one of the two errors above (the first on Windows, the second on Linux). It occurs whenever I try to run any sweep, and it occurs when connecting from various machines and IP addresses.

import wandb

def train():
    # Function body was not included in the original post.
    pass

sweep_configuration = {
    'method': 'grid',
    'name': 'Sweep',
    'metric': {
        'goal': 'maximize',
        'name': 'AUC'
    },
    'parameters': {
        'learn': {'values': [1, 0.1]}
    }
}
sweep_id = wandb.sweep(sweep_configuration)
wandb.agent(sweep_id, function=train, count=1)
wandb: Agent Starting Run: gwiij466 with config:
wandb: 	learn: 1
Exception in thread Thread-9 (_run_job):
Traceback (most recent call last):
  File "\Python\Python310\lib\site-packages\wandb\agents\", line 299, in _run_job
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 3669, in finish, quiet=quiet)
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 368, in wrapper
    return func(self, *args, **kwargs)
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 331, in wrapper
    return func(self, *args, **kwargs)
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 1843, in finish
    return self._finish(exit_code, quiet)
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 1850, in _finish
    with telemetry.context(run=self) as tel:
  File "\Python\Python310\lib\site-packages\wandb\sdk\lib\", line 42, in __exit__
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 689, in _telemetry_callback
  File "\Python\Python310\lib\site-packages\wandb\sdk\", line 700, in _telemetry_flush
  File "\Python\Python310\lib\site-packages\wandb\sdk\interface\", line 101, in _publish_telemetry
  File "\Python\Python310\lib\site-packages\wandb\sdk\interface\", line 51, in _publish
  File "\Python\Python310\lib\site-packages\wandb\sdk\lib\", line 221, in send_record_publish
  File "\Python\Python310\lib\site-packages\wandb\sdk\lib\", line 155, in send_server_request
  File "\Python\Python310\lib\site-packages\wandb\sdk\lib\", line 152, in _send_message
    self._sendall_with_error_handle(header + data)
  File "\Python\Python310\lib\site-packages\wandb\sdk\lib\", line 130, in _sendall_with_error_handle
    sent = self._sock.send(data)
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine

During handling of the above exception, another exception occurred:

ConnectionAbortedError                    Traceback (most recent call last)
File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\backcall\, in callback_prototype.<locals>.adapt.<locals>.adapted(*args, **kwargs)
    102                 kwargs.pop(name)
    103 #            print(args, kwargs, unmatched_pos, cut_positional, unmatched_kw)
--> 104             return callback(*args, **kwargs)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\, in _WandbInit._pause_backend(self)
    416 if self.backend.interface is not None:
    417"pausing backend")  # type: ignore
--> 418     self.backend.interface.publish_pause()

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\interface\, in InterfaceBase.publish_pause(self)
    663 def publish_pause(self) -> None:
    664     pause = pb.PauseRequest()
--> 665     self._publish_pause(pause)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\interface\, in InterfaceShared._publish_pause(self, pause)
    338 def _publish_pause(self, pause: pb.PauseRequest) -> None:
    339     rec = self._make_request(pause=pause)
--> 340     self._publish(rec)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\interface\, in InterfaceSock._publish(self, record, local)
     49 def _publish(self, record: "pb.Record", local: Optional[bool] = None) -> None:
     50     self._assign(record)
---> 51     self._sock_client.send_record_publish(record)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\lib\, in SockClient.send_record_publish(self, record)
    219 server_req = spb.ServerRequest()
    220 server_req.record_publish.CopyFrom(record)
--> 221 self.send_server_request(server_req)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\lib\, in SockClient.send_server_request(self, msg)
    154 def send_server_request(self, msg: Any) -> None:
--> 155     self._send_message(msg)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\lib\, in SockClient._send_message(self, msg)
    150 header = struct.pack("<BI", ord("W"), raw_size)
    151 with self._lock:
--> 152     self._sendall_with_error_handle(header + data)

File ~\AppData\Local\Programs\Python\Python310\lib\site-packages\wandb\sdk\lib\, in SockClient._sendall_with_error_handle(self, data)
    128 start_time = time.monotonic()
    129 try:
--> 130     sent = self._sock.send(data)
    131     # sent equal to 0 indicates a closed socket
    132     if sent == 0:

ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine
wandb: Agent Starting Run: 4maabb7r with config:
wandb: 	learn: 0.1
Exception in thread Thread-10 (_run_job):
### the same error repeats for each subsequent run

There seems to be some minimal communication between the server and the client, since the agent steps through multiple parameter values from sweep_configuration, but no useful data is sent back to the server and wandb.config is always empty in the client.
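
For readers unfamiliar with the two errors in the title: both are what Python raises when a process writes to a socket whose other end has already closed or aborted the connection. The sketch below reproduces that situation with the standard library only. It does not use wandb and does not explain why the peer went away in this case; `try_send` and `demo` are hypothetical helper names introduced for illustration.

```python
import socket


def try_send(sock: socket.socket, data: bytes) -> str:
    """Send data and report the outcome as a string instead of raising."""
    try:
        sent = sock.send(data)
    except (BrokenPipeError, ConnectionAbortedError, ConnectionResetError) as exc:
        # The peer closed or aborted the connection before this write.
        return f"peer gone: {type(exc).__name__}"
    if sent == 0:
        # A send that writes zero bytes is also treated as a closed socket.
        return "peer gone: zero bytes sent"
    return f"sent {sent} bytes"


def demo() -> tuple[str, str]:
    """Write once while the peer is open, then once after it has closed."""
    left, right = socket.socketpair()
    try:
        before = try_send(left, b"ping")
        right.close()  # simulate the other process going away
        after = try_send(left, b"ping")
    finally:
        left.close()
        right.close()
    return before, after
```

Calling `demo()` on Linux should show the first write succeeding and the second reporting a broken pipe. On Windows the exception type may differ, and the first write after the peer closes may still appear to succeed.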

Hi @finnhad, I’m able to run your code, so it may be related to your network environment. Are you running your experiments against the hosted W&B service, or have you set up your own wandb server?

Also, are all of the machines you have tried on the same network?
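
One quick way to compare machines is to check whether each one can open a TCP connection to the server at all. This is only a rough sketch using the standard library. The helper name `can_reach` is hypothetical, and the host `api.wandb.ai` with port 443 in the commented usage line is an assumption for the hosted service; substitute your own server's address if you run one.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Example usage (host and port are assumptions, adjust as needed):
# print(can_reach("api.wandb.ai", 443))
```

A `True` result only shows that the port is reachable from that machine; it does not rule out a proxy or firewall interfering with the traffic afterwards.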

Thank you,

Hi @finnhad, I just wanted to follow up and see whether you are still seeing this issue. If so, could you let us know what your network infrastructure looks like and whether you are logging to the hosted W&B service or to your own server?

Thank you,