Hello, I’m sort of new to wandb. I’m trying sweeps for the first time. I had no problem creating and running a sweep from the web UI and then copy-pasting the command to start an agent from bash. However, starting the agent from Python with `wandb.agent()` doesn’t work properly: it starts training, but I keep running into two problems:
- I get the error below each time a new run starts. It seems to originate from another thread created by wandb, so I’m not sure what to do about it. Also, the runs get logged on the sweep page, but none of them contain the data I’ve logged, and each run still shows the “active” dot even after the script ends.
- I can’t figure out which wandb calls to make, and in what order, to get the agent to populate `wandb.config` with its randomized values. I would like to set some default config values (which are not specified in my `sweep_config` dict), but have the sweep agent update `wandb.config` with the randomized values generated from the `sweep_config` dict (this seems to be what happens when I run from bash). In the traceback below, you can see I print `wandb.config` right before the model is trained, and it simply contains the `config` dict I passed to `wandb.init` (it is not updated or overwritten by the agent).
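To make the expected behavior concrete, here is a plain-Python sketch of the merge I’m hoping for: every default survives, and any key the sweep chooses overwrites the matching default. The `merge_config` helper is hypothetical and only illustrates the intent; it is not how wandb implements this. The values are subsets copied from the printed `wandb.config` and from the agent’s announced config for run 1bqu30r5 in the log below.

```python
# Hypothetical helper illustrating the desired outcome; NOT wandb's implementation.


def merge_config(defaults: dict, sweep_params: dict) -> dict:
    """Return a new dict: all defaults, with sweep-chosen values taking precedence."""
    merged = dict(defaults)      # copy so the defaults dict is not mutated
    merged.update(sweep_params)  # sweep values overwrite defaults with the same key
    return merged


# Subset of the defaults printed as `wandb.config` in the log.
defaults = {
    "epochs": 3,
    "batch_size": 16,
    "test_size": 0.3,
    "lr": 5,
    "optimizer": "SGD",
    "dropout": 0.2,
    "max_sequence_length": 512,
}

# Subset of the values the agent announced for run 1bqu30r5.
sweep_params = {
    "epochs": 15,
    "batch_size": 16,
    "lr": 0.0008418633888555167,
    "optimizer": "AdamW",
    "dropout": 0.25778082906860794,
    "max_seq_len": 64,
}

config = merge_config(defaults, sweep_params)
print(config)
```

One thing this makes visible: a key is only overridden when the names match exactly, and the sweep uses `max_seq_len` while my defaults use `max_sequence_length`, so a merge like this keeps both.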
Below is the full traceback. Thanks in advance for any ideas.
C:\Users\jacks\anaconda3\envs\ml-project\python.exe C:\Users\jacks\ml-project\training_script.py
Using device: cuda
wandb: Currently logged in as: jacksth22. Use `wandb login --relogin` to force relogin
wandb: Tracking run with wandb version 0.13.3
wandb: Run data is saved locally in C:\Users\jacks\ml-project\wandb\run-20221004_162225-1aj4c6jt
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run restful-dew-107
wandb: View project at https://wandb.ai/jacksth22/<project>
wandb: View run at https://wandb.ai/jacksth22/<project>/runs/1aj4c6jt
Loading data...done (elapsed=1.49s).
Converting data to tensors: 100%|██████████| 500/500 [00:03<00:00, 164.20it/s]
done (elapsed=3.05s).
Create sweep with ID: plq6uobc
Sweep URL: https://wandb.ai/jacksth22/uncategorized/sweeps/plq6uobc
=== Starting sweep agent ===
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Synced restful-dew-107: https://wandb.ai/jacksth22/<project>/runs/1aj4c6jt
wandb: Synced 6 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: .\wandb\run-20221004_162225-1aj4c6jt\logs
wandb: Agent Starting Run: 1bqu30r5 with config:
wandb: batch_size: 16
wandb: dense_depth: 2
wandb: dense_width: 64
wandb: depth: 6
wandb: dropout: 0.25778082906860794
wandb: epochs: 15
wandb: gradient_clipping: 1
wandb: heads: 4
wandb: lr: 0.0008418633888555167
wandb: lr_warmup: 1960
wandb: max_seq_len: 64
wandb: optimizer: AdamW
wandb: use_max_pool: False
Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3: 0%| | 0/22 [00:00<?, ?it/s]Exception in thread ChkStopThr:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 190, in check_status
status_response = self._interface.communicate_stop_status()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface.py", line 128, in communicate_stop_status
resp = self._communicate_stop_status(status)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 69, in _communicate_stop_status
data = super()._communicate_stop_status(status)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 399, in _communicate_stop_status
resp = self._communicate(req, local=True)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 230, in _communicate
return self._communicate_async(rec, local=local).get(timeout=timeout)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 58, in _communicate_async
future = self._router.send_and_receive(rec, local=local)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router.py", line 94, in send_and_receive
self._send_message(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router_sock.py", line 35, in _send_message
self._sock_client.send_record_communicate(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 216, in send_record_communicate
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Exception in thread NetStatThr:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 172, in check_network_status
status_response = self._interface.communicate_network_status()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface.py", line 139, in communicate_network_status
resp = self._communicate_network_status(status)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 82, in _communicate_network_status
data = super()._communicate_network_status(status)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 409, in _communicate_network_status
resp = self._communicate(req, local=True)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 230, in _communicate
return self._communicate_async(rec, local=local).get(timeout=timeout)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 58, in _communicate_async
future = self._router.send_and_receive(rec, local=local)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router.py", line 94, in send_and_receive
self._send_message(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\router_sock.py", line 35, in _send_message
self._sock_client.send_record_communicate(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 216, in send_record_communicate
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Training epoch 1/3: 100%|██████████| 22/22 [00:06<00:00, 3.32it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 20.42it/s]
Training epoch 2/3: 0%| | 0/22 [00:00<?, ?it/s]Train: acc = 9.43% | loss = 260.72%
Test: acc = 16.67% | loss = 225.52%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00, 5.70it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.73it/s]
Training epoch 3/3: 0%| | 0/22 [00:00<?, ?it/s]Train: acc = 11.60% | loss = 221.38%
Test: acc = 13.33% | loss = 218.46%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00, 5.84it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 21.05it/s]
Train: acc = 12.40% | loss = 218.99%
Test: acc = 13.33% | loss = 235.97%
Exception in thread Thread-14:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
wandb.finish()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
wandb.run.finish(exit_code=exit_code, quiet=quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
wandb.finish(exit_code=1)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
wandb.run.finish(exit_code=exit_code, quiet=quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
wandb: Agent Starting Run: ly4ox06s with config:
wandb: batch_size: 16
wandb: dense_depth: 4
wandb: dense_width: 128
wandb: depth: 10
wandb: dropout: 0.3449039365016265
wandb: epochs: 15
wandb: gradient_clipping: 1
wandb: heads: 6
wandb: lr: 0.0007649760799562746
wandb: lr_warmup: 1057
wandb: max_seq_len: 64
wandb: optimizer: AdamW
wandb: use_max_pool: False
Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3: 100%|██████████| 22/22 [00:03<00:00, 5.86it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 21.62it/s]
Train: acc = 10.57% | loss = 247.77%
Test: acc = 10.00% | loss = 227.83%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00, 5.84it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.09it/s]
Training epoch 3/3: 0%| | 0/22 [00:00<?, ?it/s]Train: acc = 10.60% | loss = 236.37%
Test: acc = 10.00% | loss = 231.93%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00, 5.83it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 21.15it/s]
Exception in thread Thread-15:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
wandb.finish()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
Train: acc = 10.40% | loss = 233.15%
Test: acc = 13.33% | loss = 230.50%
wandb.run.finish(exit_code=exit_code, quiet=quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
wandb.finish(exit_code=1)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
wandb.run.finish(exit_code=exit_code, quiet=quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
wandb: Agent Starting Run: viajyjrk with config:
wandb: batch_size: 16
wandb: dense_depth: 4
wandb: dense_width: 128
wandb: depth: 8
wandb: dropout: 0.06200113164824134
wandb: epochs: 15
wandb: gradient_clipping: 1
wandb: heads: 6
wandb: lr: 0.0006929869499125819
wandb: lr_warmup: 2917
wandb: max_seq_len: 512
wandb: optimizer: SGD
wandb: use_max_pool: False
Training epoch 1/3: 0%| | 0/22 [00:00<?, ?it/s]Model created with 90,714 parameters.
test: wandb.config: {'epochs': 3, 'batch_size': 16, 'test_size': 0.3, 'lr': 5, 'lr_warmup': 10000, 'optimizer': 'SGD', 'use_max_pool': True, 'embedding_dimension': 24, 'max_sequence_length': 512, 'heads': 8, 'depth': 10, 'rng_seed': 1, 'gradient_clipping': 1.0, 'dense_width': 64, 'dense_depth': 3, 'dropout': 0.2, 'log_dir': 'ml/logs/2022-10-04_16-22-22', 'using_small_dataset': True}
Training epoch 1/3: 100%|██████████| 22/22 [00:03<00:00, 5.65it/s]
Testing epoch 1/3: 100%|██████████| 10/10 [00:00<00:00, 21.07it/s]
Train: acc = 9.43% | loss = 262.65%
Test: acc = 10.00% | loss = 221.72%
Training epoch 2/3: 100%|██████████| 22/22 [00:03<00:00, 5.86it/s]
Testing epoch 2/3: 100%|██████████| 10/10 [00:00<00:00, 21.02it/s]
Train: acc = 11.60% | loss = 236.80%
Test: acc = 10.00% | loss = 234.05%
Training epoch 3/3: 100%|██████████| 22/22 [00:03<00:00, 5.81it/s]
Testing epoch 3/3: 100%|██████████| 10/10 [00:00<00:00, 20.80it/s]
Exception in thread Thread-16:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 299, in _run_job
wandb.finish()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
wandb.run.finish(exit_code=exit_code, quiet=quiet)
Train: acc = 11.00% | loss = 226.62%
Test: acc = 9.33% | loss = 221.71%
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 980, in _bootstrap_inner
self.run()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\threading.py", line 917, in run
self._target(*self._args, **self._kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\agents\pyagent.py", line 303, in _run_job
wandb.finish(exit_code=1)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 3565, in finish
wandb.run.finish(exit_code=exit_code, quiet=quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 282, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 245, in wrapper
return func(self, *args, **kwargs)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1713, in finish
return self._finish(exit_code, quiet)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 1719, in _finish
tel.feature.finish = True
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\telemetry.py", line 40, in __exit__
self._run._telemetry_callback(self._obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 593, in _telemetry_callback
self._telemetry_flush()
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\wandb_run.py", line 604, in _telemetry_flush
self._backend.interface._publish_telemetry(self._telemetry_obj)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_shared.py", line 78, in _publish_telemetry
self._publish(rec)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\interface\interface_sock.py", line 51, in _publish
self._sock_client.send_record_publish(record)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 221, in send_record_publish
self.send_server_request(server_req)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 155, in send_server_request
self._send_message(msg)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 152, in _send_message
self._sendall_with_error_handle(header + data)
File "C:\Users\jacks\anaconda3\envs\ml-project\lib\site-packages\wandb\sdk\lib\sock_client.py", line 130, in _sendall_with_error_handle
sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host
Process finished with exit code 0
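For comparison, this is a minimal sketch of the pattern the docs appear to suggest for `wandb.agent(function=...)`, which I haven’t confirmed works in my setup: `wandb.init()` is called inside the function the agent runs, rather than once at the top of the script. (In the log above, a run, restful-dew-107, is initialized before the sweep is even created, and then the agent warns that `wandb.login()` after `wandb.init()` has no effect.) The project name, sweep ranges, and stand-in training loop are hypothetical placeholders, and the idea that sweep-chosen values override the `config=` defaults is my understanding of the documented behavior, not something verified here.

```python
"""Sketch of the wandb.agent(function=...) pattern with wandb.init() inside the function.

All names and values are hypothetical placeholders, not my actual configuration.
"""

# Defaults for keys the sweep does not search over (hypothetical subset).
DEFAULTS = {"epochs": 3, "batch_size": 16, "test_size": 0.3, "lr": 5, "optimizer": "SGD"}

# Hypothetical sweep definition; the ranges are placeholders.
SWEEP_CONFIG = {
    "method": "random",
    "metric": {"name": "test_loss", "goal": "minimize"},
    "parameters": {
        "lr": {"distribution": "uniform", "min": 1e-4, "max": 1e-3},
        "optimizer": {"values": ["SGD", "AdamW"]},
        "epochs": {"values": [15]},
    },
}


def train():
    # Imported lazily so this module can be imported without wandb installed.
    import wandb

    # Each trial the agent launches starts its own run here; assumption: sweep
    # values override matching keys passed via `config=`.
    with wandb.init(config=DEFAULTS) as run:
        cfg = run.config
        for epoch in range(cfg.epochs):
            test_loss = 1.0 / (epoch + 1)  # stand-in for real training/evaluation
            run.log({"epoch": epoch, "test_loss": test_loss})


def main(project: str = "my-project", count: int = 3):
    import wandb

    # No top-level wandb.init(): the agent creates one run per trial.
    sweep_id = wandb.sweep(SWEEP_CONFIG, project=project)
    wandb.agent(sweep_id, function=train, project=project, count=count)
```

Calling `main()` would create the sweep in the hypothetical `my-project` project and let the agent run three trials, each of which calls `train()`.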