Agent bug? File not found error

Hi, I’m using Kaggle with PyTorch and W&B.

  • Weights and Biases version: 0.12.11
  • Python version: 3.7.12

Description:
When using the attached notebook, I get the following error:
[License-plate-w&b | Kaggle]
wandb: Agent Starting Run: 9uvr1lj3 with config:
wandb: 	batch_size: 64
wandb: 	dropout: 0.2
wandb: 	dropout_lstm: 0.1
wandb: 	epochs: 8
wandb: 	hidden_size: 32
wandb: 	linear_output: 64
wandb: 	models: PlateLUX_2GRU
wandb: 	optimizer: RMSprop
wandb: 	scheduler: ReduceLROnPlateau
wandb: Currently logged in as: wualas (use `wandb login --relogin` to force relogin)
Tracking run with wandb version 0.12.11
Run data is saved locally in /kaggle/working/wandb/run-20220318_082708-9uvr1lj3
Syncing run winter-sweep-1 to Weights & Biases (docs)
Sweep page: https://wandb.ai/wualas/pytorch-sweeps-rejestracje_last/sweeps/7ioy5yu1

Waiting for W&B process to finish... (failed 1). Press Control-C to abort syncing.
Synced winter-sweep-1: https://wandb.ai/wualas/pytorch-sweeps-rejestracje_last/runs/9uvr1lj3
Synced 4 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
Find logs at: ./wandb/run-20220318_082708-9uvr1lj3/logs
Run 9uvr1lj3 errored: FileNotFoundError(2, 'No such file or directory')
wandb: ERROR Run 9uvr1lj3 errored: FileNotFoundError(2, 'No such file or directory')

train_function:

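import numpy as np
import pandas as pd
import wandb
from sklearn import metrics, model_selection, preprocessing

# build_loader, build_network, build_optimizer, build_scheduler, train_fn, eval_fn,
# decode_predictions, remove_duplicates and calculate_EXACTloss come from other notebook cells.
def train(config=None):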
    with wandb.init(config=config):
        config = wandb.config
        df = pd.read_csv('/content/OCRdataset/annotations_CRNN.csv')
        df['filename'] = '/content/OCRdataset/images/' + df['filename'].astype(str)
        image_files = df['filename'].tolist()
        targets_orig = df['label'].tolist()
        targets = [[c for c in x] for x in targets_orig]
        targets_flat = [c for clist in targets for c in clist]

        lbl_enc = preprocessing.LabelEncoder()
        lbl_enc.fit(targets_flat)
        targets_enc = [lbl_enc.transform(x) for x in targets]
        targets_enc = np.array(targets_enc)
        targets_enc = targets_enc + 1

        (
            train_imgs,
            test_imgs,
            train_targets,
            test_targets,
            _,
            test_targets_orig,
        ) = model_selection.train_test_split(
            image_files, targets_enc, targets_orig, test_size=0.1, random_state=42
        )
        num_chars = len(lbl_enc.classes_)
        train_loader, test_loader = build_loader(train_imgs, train_targets, test_imgs, test_targets, config.batch_size)
        model = build_network(config.models, num_chars, config.linear_output, config.hidden_size, config.dropout, config.dropout_lstm)
        optimizer = build_optimizer(model, config.optimizer)
        scheduler = build_scheduler(optimizer, config.scheduler)

        train_loss_tab = []
        test_loss_tab = []
        accuracy_tab = []
        best_test_loss = 100000000000000
        for epoch in range(config.epochs):
            train_loss = train_fn(model, train_loader, optimizer)
            valid_preds, test_loss = eval_fn(model, test_loader)
            valid_captcha_preds = []
            for vp in valid_preds:
                current_preds = decode_predictions(vp, lbl_enc)
                valid_captcha_preds.extend(current_preds)
            combined = list(zip(test_targets_orig, valid_captcha_preds))
            print(combined)
            test_dup_rem = [remove_duplicates(c) for c in test_targets_orig]
            accuracy = metrics.accuracy_score(test_dup_rem, valid_captcha_preds)
            print(
              f"Epoch={epoch}, Train Loss={train_loss}, Test Loss={test_loss} Accuracy={accuracy}"
            )
            exloss = calculate_EXACTloss(combined)
            scheduler.step(test_loss)
            #torch.save(model.state_dict(), "/content/epoch_save/EPOCH_SAVER_CRNN_state_dict3{}.pt".format(epoch))
            # TODO: add saving of every model
            train_loss_tab.append(train_loss)
            test_loss_tab.append(test_loss)
            accuracy_tab.append(accuracy)
            print("saving")
            wandb.log({'epoch': epoch, 'loss_test': test_loss, 'loss_train': train_loss, 'accuracy': accuracy, 'EXACTacc' : exloss})

Is there an error in how I’m using wandb, or is this a bug?
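The agent output above reports only the exception itself (FileNotFoundError(2, 'No such file or directory')), not the file or line that raised it. A minimal sketch of one way to surface that information, assuming a small wrapper named `log_traceback` (a hypothetical helper, not part of wandb) around the training function:

```python
import functools
import traceback


def log_traceback(fn):
    """Print the full traceback of any exception raised by fn, then re-raise it.

    Hypothetical helper (not part of wandb): it only makes the failing
    file and line visible before the error reaches the sweep agent.
    """
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # writes the traceback to stderr
            raise
    return wrapper


# Possible usage with the train function from this post (assumed to be defined):
# wandb.agent(sweep_id, function=log_traceback(train), count=1)
```

The wrapper does not change the outcome of the run; the exception still propagates, so the agent still marks the run as errored.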

Hi Wojtek,

The 'No such file or directory' error appears because run 9uvr1lj3 is not available. The only sweep in your dashboard is 7ioy5yu1. If you look at blocks 41 and 42 in your notebook, the value you pass to wandb.agent and your sweep_id don’t match. This is because calling wandb.agent() there creates a new agent instead of using the sweep ID generated earlier on line 41. Can you state the arguments explicitly in your wandb.agent call, i.e. wandb.agent(sweep_id, function=train, count=1), and see if this helps? If not, can you hard-code 7ioy5yu1 as your sweep ID?
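The suggestion above can be sketched roughly as follows. The parameter names follow the agent output in the original post, but the search method, the metric choice, the value lists and the helper names (`build_sweep_config`, `run_one_sweep_trial`) are assumptions made for illustration, not the contents of the actual notebook:

```python
def build_sweep_config():
    """Illustrative sweep configuration.

    Parameter names come from the agent output in the thread; the method,
    metric and value lists are assumptions.
    """
    return {
        "method": "random",
        "metric": {"name": "loss_test", "goal": "minimize"},
        "parameters": {
            "batch_size": {"values": [16, 64]},
            "dropout": {"values": [0.2]},
            "epochs": {"values": [5, 8]},
            "optimizer": {"values": ["adam", "RMSprop"]},
        },
    }


def run_one_sweep_trial(train, project):
    """Create a sweep and run one trial using the ID that wandb.sweep returned."""
    import wandb  # imported here so the config helper works without wandb installed

    sweep_id = wandb.sweep(build_sweep_config(), project=project)
    # Pass the same sweep_id to the agent, with keyword arguments spelled out.
    wandb.agent(sweep_id, function=train, project=project, count=1)
    return sweep_id
```

Keeping the ID returned by wandb.sweep in a single variable and passing it straight to wandb.agent avoids the two values drifting apart between notebook cells.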

Warmly,
Leslie

Hi Leslie!
Thanks for your response.

It seems that it doesn’t work.

Hi again, I tried opening the project page shown in your image, but it looks like you deleted the project, so I wasn’t able to get any information from there. From the image itself, though, I can see that your sweep page is being created. I have a few questions:

  • Does the sweep ID show up in your personal cloud?
  • If so, can you provide debug logs from your wandb run directory (debug.log and debug-internal.log) so we can see if the config is populating properly?
  • If you remove wandb, does your code run properly?
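
The two debug files mentioned above live in the logs folder of each run directory (for example ./wandb/run-20220318_082708-9uvr1lj3/logs in the first post). A minimal sketch for collecting them, assuming that layout; `find_debug_logs` and `tail` are hypothetical helper names, not wandb functions:

```python
from pathlib import Path


def find_debug_logs(root="."):
    """Return debug.log and debug-internal.log paths under root/wandb/run-*/logs."""
    return sorted(Path(root).glob("wandb/run-*/logs/debug*.log"))


def tail(path, n=20):
    """Return the last n lines of a text file as a list of strings."""
    lines = Path(path).read_text(errors="replace").splitlines()
    return lines[-n:]


if __name__ == "__main__":
    # Print the end of every debug log found below the current directory.
    for log_path in find_debug_logs("."):
        print(f"--- {log_path}")
        print("\n".join(tail(log_path)))
```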

Hi Wojtek,

We wanted to follow up with you regarding your support request as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.

Best,
Weights & Biases

Hi, thanks again for your response.

  1. Yes, I can see it.

Debug log

2022-03-29 21:20:04,247 INFO    Thread-51 :73 [wandb_setup.py:_flush():75] Loading settings from /root/.config/wandb/settings
2022-03-29 21:20:04,247 INFO    Thread-51 :73 [wandb_setup.py:_flush():75] Loading settings from wandb/settings
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_setup.py:_flush():75] Loading settings from environment variables: {'project': 'pytorch-sweeps-License-Plate-last-gpu', 'entity': 'wualas', 'root_dir': '/content', 'run_id': '1a183de4', 'sweep_param_path': '/content/wandb/sweep-mhru0h2m/config-1a183de4.yaml', 'sweep_id': 'mhru0h2m'}
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_setup.py:_flush():75] Inferring run settings from compute environment: {'program': '<python with no main file>'}
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:_log_setup():405] Logging user logs to /content/wandb/run-20220329_212004-1a183de4/logs/debug.log
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:_log_setup():406] Logging internal logs to /content/wandb/run-20220329_212004-1a183de4/logs/debug-internal.log
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:_jupyter_setup():355] configuring jupyter hooks <wandb.sdk.wandb_init._WandbInit object at 0x7fdd03d848d0>
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:init():439] calling init triggers
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:init():443] wandb.init called with sweep_config: {'batch_size': 16, 'dropout': 0.2, 'dropout_lstm': 0.25, 'epochs': 5, 'hidden_size': 64, 'linear_output': 128, 'models': 'PlateLUX_2GRU', 'optimizer': 'adam', 'scheduler': 'ExponentialLR'}
config: {}
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [wandb_init.py:init():492] starting backend
2022-03-29 21:20:04,248 INFO    Thread-51 :73 [backend.py:_multiprocessing_setup():101] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2022-03-29 21:20:04,253 INFO    Thread-51 :73 [backend.py:ensure_launched():219] starting backend process...
2022-03-29 21:20:04,276 INFO    Thread-51 :73 [backend.py:ensure_launched():225] started backend process with pid: 636
2022-03-29 21:20:04,278 INFO    Thread-51 :73 [wandb_init.py:init():501] backend started and connected
2022-03-29 21:20:04,347 INFO    Thread-51 :73 [wandb_run.py:_config_callback():992] config_cb None None {'batch_size': 16, 'dropout': 0.2, 'dropout_lstm': 0.25, 'epochs': 5, 'hidden_size': 64, 'linear_output': 128, 'models': 'PlateLUX_2GRU', 'optimizer': 'adam', 'scheduler': 'ExponentialLR'}
2022-03-29 21:20:04,350 INFO    Thread-51 :73 [wandb_run.py:_label_probe_notebook():947] probe notebook
2022-03-29 21:20:09,374 INFO    Thread-51 :73 [wandb_run.py:_label_probe_notebook():957] Unable to probe notebook: 'NoneType' object has no attribute 'get'
2022-03-29 21:20:09,374 INFO    Thread-51 :73 [wandb_init.py:init():565] updated telemetry
2022-03-29 21:20:09,378 INFO    Thread-51 :73 [wandb_init.py:init():596] communicating run to backend with 30 second timeout
2022-03-29 21:20:09,475 INFO    Thread-51 :73 [wandb_run.py:_on_init():1759] communicating current version
2022-03-29 21:20:09,542 INFO    Thread-51 :73 [wandb_run.py:_on_init():1763] got version response 
2022-03-29 21:20:09,542 INFO    Thread-51 :73 [wandb_init.py:init():625] starting run threads in backend
2022-03-29 21:20:12,495 INFO    Thread-51 :73 [wandb_run.py:_console_start():1733] atexit reg
2022-03-29 21:20:12,495 INFO    Thread-51 :73 [wandb_run.py:_redirect():1606] redirect: SettingsConsole.WRAP
2022-03-29 21:20:12,495 INFO    Thread-51 :73 [wandb_run.py:_redirect():1643] Wrapping output streams.
2022-03-29 21:20:12,497 INFO    Thread-51 :73 [wandb_run.py:_redirect():1667] Redirects installed.
2022-03-29 21:20:12,497 INFO    Thread-51 :73 [wandb_init.py:init():664] run started, returning control to user process
2022-03-29 21:22:09,715 INFO    Thread-51 :73 [wandb_run.py:finish():1539] finishing run wualas/pytorch-sweeps-License-Plate-last-gpu/1a183de4
2022-03-29 21:22:09,715 INFO    Thread-51 :73 [jupyter.py:save_history():429] not saving jupyter history
2022-03-29 21:22:09,716 INFO    Thread-51 :73 [jupyter.py:save_ipynb():374] not saving jupyter notebook
2022-03-29 21:22:09,716 INFO    Thread-51 :73 [wandb_init.py:_jupyter_teardown():337] cleaning up jupyter logic
2022-03-29 21:22:09,716 INFO    Thread-51 :73 [wandb_run.py:_atexit_cleanup():1702] got exitcode: 1
2022-03-29 21:22:09,718 INFO    Thread-51 :73 [wandb_run.py:_restore():1674] restore
2022-03-29 21:22:10,710 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 1
}
pusher_stats {
  uploaded_bytes: 664
  total_bytes: 664
}

2022-03-29 21:22:11,553 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 4
}
pusher_stats {
  uploaded_bytes: 664
  total_bytes: 8575
}

2022-03-29 21:22:11,657 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 664
  total_bytes: 30246
}

2022-03-29 21:22:11,760 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}

2022-03-29 21:22:11,863 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}

2022-03-29 21:22:11,965 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}

2022-03-29 21:22:12,066 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}

2022-03-29 21:22:12,214 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}

2022-03-29 21:22:12,364 INFO    Thread-51 :73 [wandb_run.py:_on_finish():1831] got exit ret: done: true
exit_result {
}
file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 30246
  total_bytes: 30246
}
local_info {
}

2022-03-29 21:22:13,802 INFO    Thread-51 :73 [wandb_run.py:_footer_history_summary_info():2865] rendering history
2022-03-29 21:22:13,803 INFO    Thread-51 :73 [wandb_run.py:_footer_history_summary_info():2894] rendering summary
2022-03-29 21:22:13,805 INFO    Thread-51 :73 [wandb_run.py:_footer_sync_info():2822] logging synced files

Debug internal log

2022-03-29 21:20:05,321 INFO    MainThread:636 [internal.py:wandb_internal():95] W&B internal server running at pid: 636, started at: 2022-03-29 21:20:05.321477
2022-03-29 21:20:09,376 INFO    WriterThread:636 [datastore.py:open_for_write():77] open: /content/wandb/run-20220329_212004-1a183de4/run-1a183de4.wandb
2022-03-29 21:20:09,377 DEBUG   SenderThread:636 [sender.py:send():235] send: header
2022-03-29 21:20:09,379 DEBUG   SenderThread:636 [sender.py:send():235] send: run
2022-03-29 21:20:09,476 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: check_version
2022-03-29 21:20:09,477 INFO    SenderThread:636 [dir_watcher.py:__init__():169] watching files in: /content/wandb/run-20220329_212004-1a183de4/files
2022-03-29 21:20:09,477 INFO    SenderThread:636 [sender.py:_start_run_threads():815] run started: 1a183de4 with start time 1648588804
2022-03-29 21:20:09,477 DEBUG   SenderThread:636 [sender.py:send():235] send: summary
2022-03-29 21:20:09,478 INFO    SenderThread:636 [sender.py:_save_file():947] saving file wandb-summary.json with policy end
2022-03-29 21:20:09,478 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: check_version
2022-03-29 21:20:09,543 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: run_start
2022-03-29 21:20:10,481 INFO    Thread-7  :636 [dir_watcher.py:_on_file_created():217] file/dir created: /content/wandb/run-20220329_212004-1a183de4/files/wandb-summary.json
2022-03-29 21:20:12,462 DEBUG   HandlerThread:636 [meta.py:__init__():37] meta init
2022-03-29 21:20:12,462 DEBUG   HandlerThread:636 [meta.py:__init__():51] meta init done
2022-03-29 21:20:12,463 DEBUG   HandlerThread:636 [meta.py:probe():211] probe
2022-03-29 21:20:12,469 DEBUG   HandlerThread:636 [git.py:repo():33] git repository is invalid
2022-03-29 21:20:12,469 DEBUG   HandlerThread:636 [meta.py:_save_pip():55] save pip
2022-03-29 21:20:12,470 DEBUG   HandlerThread:636 [meta.py:_save_pip():69] save pip done
2022-03-29 21:20:12,470 DEBUG   HandlerThread:636 [meta.py:probe():249] probe done
2022-03-29 21:20:12,476 DEBUG   SenderThread:636 [sender.py:send():235] send: files
2022-03-29 21:20:12,477 INFO    SenderThread:636 [sender.py:_save_file():947] saving file wandb-metadata.json with policy now
2022-03-29 21:20:12,489 INFO    Thread-7  :636 [dir_watcher.py:_on_file_created():217] file/dir created: /content/wandb/run-20220329_212004-1a183de4/files/wandb-metadata.json
2022-03-29 21:20:12,490 INFO    Thread-7  :636 [dir_watcher.py:_on_file_created():217] file/dir created: /content/wandb/run-20220329_212004-1a183de4/files/requirements.txt
2022-03-29 21:20:12,499 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:20:12,500 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:20:12,565 DEBUG   SenderThread:636 [sender.py:send():235] send: telemetry
2022-03-29 21:20:13,187 INFO    Thread-11 :636 [upload_job.py:push():137] Uploaded file /tmp/tmpb74a7agqwandb/16hbjkye-wandb-metadata.json
2022-03-29 21:20:13,490 INFO    Thread-7  :636 [dir_watcher.py:_on_file_created():217] file/dir created: /content/wandb/run-20220329_212004-1a183de4/files/output.log
2022-03-29 21:20:27,567 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:20:27,568 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:20:36,500 INFO    Thread-7  :636 [dir_watcher.py:_on_file_modified():230] file/dir modified: /content/wandb/run-20220329_212004-1a183de4/files/config.yaml
2022-03-29 21:20:40,609 DEBUG   SenderThread:636 [sender.py:send():235] send: stats
2022-03-29 21:20:42,637 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:20:42,637 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:20:58,606 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:20:58,607 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:21:10,767 DEBUG   SenderThread:636 [sender.py:send():235] send: stats
2022-03-29 21:21:13,656 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:21:13,656 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:21:28,721 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:21:28,721 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:21:40,894 DEBUG   SenderThread:636 [sender.py:send():235] send: stats
2022-03-29 21:21:43,774 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:21:43,775 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:21:58,823 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: stop_status
2022-03-29 21:21:58,823 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: stop_status
2022-03-29 21:22:09,716 DEBUG   SenderThread:636 [sender.py:send():235] send: telemetry
2022-03-29 21:22:09,719 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: partial_history
2022-03-29 21:22:09,719 DEBUG   SenderThread:636 [sender.py:send():235] send: telemetry
2022-03-29 21:22:10,541 INFO    Thread-7  :636 [dir_watcher.py:_on_file_modified():230] file/dir modified: /content/wandb/run-20220329_212004-1a183de4/files/output.log
2022-03-29 21:22:10,624 DEBUG   SenderThread:636 [sender.py:send():235] send: exit
2022-03-29 21:22:10,624 INFO    SenderThread:636 [sender.py:send_exit():371] handling exit code: 1
2022-03-29 21:22:10,625 INFO    SenderThread:636 [sender.py:send_exit():373] handling runtime: 121
2022-03-29 21:22:10,625 INFO    SenderThread:636 [sender.py:_save_file():947] saving file wandb-summary.json with policy end
2022-03-29 21:22:10,625 INFO    SenderThread:636 [sender.py:send_exit():379] send defer
2022-03-29 21:22:10,626 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,626 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 0
2022-03-29 21:22:10,626 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,626 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 0
2022-03-29 21:22:10,626 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 1
2022-03-29 21:22:10,627 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,627 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 1
2022-03-29 21:22:10,708 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:10,708 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,708 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 1
2022-03-29 21:22:10,708 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 2
2022-03-29 21:22:10,709 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:10,709 DEBUG   SenderThread:636 [sender.py:send():235] send: stats
2022-03-29 21:22:10,710 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,710 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 2
2022-03-29 21:22:10,710 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,710 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 2
2022-03-29 21:22:10,710 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 3
2022-03-29 21:22:10,710 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,711 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 3
2022-03-29 21:22:10,711 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,711 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 3
2022-03-29 21:22:10,711 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 4
2022-03-29 21:22:10,711 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,711 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 4
2022-03-29 21:22:10,712 DEBUG   SenderThread:636 [sender.py:send():235] send: summary
2022-03-29 21:22:10,712 INFO    SenderThread:636 [sender.py:_save_file():947] saving file wandb-summary.json with policy end
2022-03-29 21:22:10,712 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,713 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 4
2022-03-29 21:22:10,713 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 5
2022-03-29 21:22:10,713 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,713 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 5
2022-03-29 21:22:10,713 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,713 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 5
2022-03-29 21:22:10,796 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 6
2022-03-29 21:22:10,796 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:10,796 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 6
2022-03-29 21:22:10,797 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:10,797 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 6
2022-03-29 21:22:10,797 INFO    SenderThread:636 [dir_watcher.py:finish():283] shutting down directory watcher
2022-03-29 21:22:10,838 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:11,542 INFO    Thread-7  :636 [dir_watcher.py:_on_file_modified():230] file/dir modified: /content/wandb/run-20220329_212004-1a183de4/files/output.log
2022-03-29 21:22:11,542 INFO    SenderThread:636 [dir_watcher.py:_on_file_modified():230] file/dir modified: /content/wandb/run-20220329_212004-1a183de4/files/wandb-summary.json
2022-03-29 21:22:11,543 INFO    SenderThread:636 [dir_watcher.py:_on_file_modified():230] file/dir modified: /content/wandb/run-20220329_212004-1a183de4/files/config.yaml
2022-03-29 21:22:11,543 INFO    SenderThread:636 [dir_watcher.py:finish():313] scan: /content/wandb/run-20220329_212004-1a183de4/files
2022-03-29 21:22:11,543 INFO    SenderThread:636 [dir_watcher.py:finish():327] scan save: /content/wandb/run-20220329_212004-1a183de4/files/wandb-metadata.json wandb-metadata.json
2022-03-29 21:22:11,543 INFO    SenderThread:636 [dir_watcher.py:finish():327] scan save: /content/wandb/run-20220329_212004-1a183de4/files/requirements.txt requirements.txt
2022-03-29 21:22:11,543 INFO    SenderThread:636 [dir_watcher.py:finish():327] scan save: /content/wandb/run-20220329_212004-1a183de4/files/wandb-summary.json wandb-summary.json
2022-03-29 21:22:11,547 INFO    SenderThread:636 [dir_watcher.py:finish():327] scan save: /content/wandb/run-20220329_212004-1a183de4/files/config.yaml config.yaml
2022-03-29 21:22:11,547 INFO    SenderThread:636 [dir_watcher.py:finish():327] scan save: /content/wandb/run-20220329_212004-1a183de4/files/output.log output.log
2022-03-29 21:22:11,551 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 7
2022-03-29 21:22:11,552 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:11,553 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:11,555 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 7
2022-03-29 21:22:11,555 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:11,555 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 7
2022-03-29 21:22:11,555 INFO    SenderThread:636 [file_pusher.py:finish():145] shutting down file pusher
2022-03-29 21:22:11,655 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:11,656 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:11,759 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:11,760 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:11,863 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:11,863 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:11,879 INFO    Thread-14 :636 [upload_job.py:push():137] Uploaded file /content/wandb/run-20220329_212004-1a183de4/files/config.yaml
2022-03-29 21:22:11,880 INFO    Thread-13 :636 [upload_job.py:push():137] Uploaded file /content/wandb/run-20220329_212004-1a183de4/files/wandb-summary.json
2022-03-29 21:22:11,895 INFO    Thread-15 :636 [upload_job.py:push():137] Uploaded file /content/wandb/run-20220329_212004-1a183de4/files/output.log
2022-03-29 21:22:11,902 INFO    Thread-12 :636 [upload_job.py:push():137] Uploaded file /content/wandb/run-20220329_212004-1a183de4/files/requirements.txt
2022-03-29 21:22:11,964 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:11,964 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:12,066 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:12,066 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:12,102 INFO    Thread-6  :636 [sender.py:transition_state():392] send defer: 8
2022-03-29 21:22:12,103 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:12,103 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 8
2022-03-29 21:22:12,103 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:12,103 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 8
2022-03-29 21:22:12,167 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:12,213 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 9
2022-03-29 21:22:12,213 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:12,214 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:12,214 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 9
2022-03-29 21:22:12,214 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:12,214 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 9
2022-03-29 21:22:12,214 INFO    SenderThread:636 [sender.py:transition_state():392] send defer: 10
2022-03-29 21:22:12,215 DEBUG   SenderThread:636 [sender.py:send():235] send: final
2022-03-29 21:22:12,215 DEBUG   SenderThread:636 [sender.py:send():235] send: footer
2022-03-29 21:22:12,216 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: defer
2022-03-29 21:22:12,216 INFO    HandlerThread:636 [handler.py:handle_request_defer():164] handle defer: 10
2022-03-29 21:22:12,216 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: defer
2022-03-29 21:22:12,216 INFO    SenderThread:636 [sender.py:send_request_defer():388] handle sender defer: 10
2022-03-29 21:22:12,314 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: poll_exit
2022-03-29 21:22:12,315 DEBUG   SenderThread:636 [sender.py:send_request():249] send_request: poll_exit
2022-03-29 21:22:12,315 INFO    SenderThread:636 [file_pusher.py:join():150] waiting for file pusher
2022-03-29 21:22:12,466 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: sampled_history
2022-03-29 21:22:12,467 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: get_summary
2022-03-29 21:22:12,468 DEBUG   HandlerThread:636 [handler.py:handle_request():141] handle_request: shutdown
2022-03-29 21:22:12,468 INFO    HandlerThread:636 [handler.py:finish():778] shutting down handler
2022-03-29 21:22:13,216 INFO    WriterThread:636 [datastore.py:close():281] close: /content/wandb/run-20220329_212004-1a183de4/run-1a183de4.wandb
2022-03-29 21:22:13,364 INFO    SenderThread:636 [sender.py:finish():1078] shutting down sender
2022-03-29 21:22:13,364 INFO    SenderThread:636 [file_pusher.py:finish():145] shutting down file pusher
2022-03-29 21:22:13,364 INFO    SenderThread:636 [file_pusher.py:join():150] waiting for file pusher
2022-03-29 21:22:13,367 INFO    MainThread:636 [internal.py:handle_exit():82] Internal process exited

  1. When I don’t use wandb, the model trains properly.

  2. Interestingly, when I run it on Colab the model trains for 1 epoch and then crashes, but on Kaggle it crashes before training even starts.

Colab notebook
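
One thing that may be worth checking, given the Colab/Kaggle difference described above: the train function reads its data from /content/OCRdataset/..., while the Kaggle run in the first post stores its files under /kaggle/working. Whether the /content paths exist on Kaggle is not confirmed anywhere in this thread, so the following is only a sketch (with a hypothetical helper, `missing_paths`) for verifying the paths before starting the agent:

```python
from pathlib import Path


def missing_paths(paths):
    """Return the paths that do not exist on this machine, in input order."""
    return [str(p) for p in paths if not Path(p).exists()]


# Paths taken from the train function in the original post.
REQUIRED = [
    "/content/OCRdataset/annotations_CRNN.csv",
    "/content/OCRdataset/images",
]

if __name__ == "__main__":
    absent = missing_paths(REQUIRED)
    if absent:
        print("Missing before the sweep starts:", absent)
    else:
        print("All dataset paths found.")
```

If the list comes back non-empty on one platform but not the other, that would point to the data location rather than to the sweep agent.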

Hi Wojtek, can you give us your workspace link so we can make sure that all the files for your sweep are there?

Going through your sweeps, I can see that runs 1-4 are working properly; run 5 is still in progress, so it’s not populated yet. It’s good that you got everything up and running now! Do you still need help with this issue, now that the sweeps are able to find the file?

Hi Wojtek,

I’m just checking in again to see whether you still need help with this issue.

Hi Wojtek, since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!