Waiting for W&B process to finish… (success)

Hi, I am logging Keras training runs with wandb, and my sweep process gets stuck with the following message:
wandb: Waiting for W&B process to finish… (success).

I’m opening this issue because all similar ones have been closed and there’s no clear fix.

I’m running my script in a Python environment managed by anaconda3 on Windows 10.

My debug.log file is the following:

2023-10-27 13:54:05,903 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Current SDK version is 0.15.4
2023-10-27 13:54:05,904 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Configure stats pid to 19736
2023-10-27 13:54:05,904 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Loading settings from C:\Users\franz\.config\wandb\settings
2023-10-27 13:54:05,904 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Loading settings from C:\Users\franz\Repos\prediccion_rinde_anii\app\notebooks_franz\wandb\settings
2023-10-27 13:54:05,905 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Loading settings from environment variables: {'root_dir': 'C:\\Users\\franz\\Repos\\prediccion_rinde_anii\\app\\notebooks_franz', 'entity': 'smartway-ia', 'project': 'prediccion_rendimiento_anii', 'run_id': 'n9fzwdds', 'sweep_param_path': 'C:\\Users\\franz\\Repos\\prediccion_rinde_anii\\app\\notebooks_franz\\wandb\\sweep-8mwtkgj6\\config-n9fzwdds.yaml', 'sweep_id': '8mwtkgj6'}
2023-10-27 13:54:05,906 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2023-10-27 13:54:05,906 INFO    Thread-5102:19736 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'app\\notebooks_franz\\sweep_run.py', 'program': 'C:\\Users\\franz\\Repos\\prediccion_rinde_anii\\app\\notebooks_franz\\sweep_run.py'}
2023-10-27 13:54:05,907 INFO    Thread-5102:19736 [wandb_init.py:_log_setup():507] Logging user logs to C:\Users\franz\Repos\prediccion_rinde_anii\app\notebooks_franz\wandb\run-20231027_135405-n9fzwdds\logs\debug.log
2023-10-27 13:54:05,908 INFO    Thread-5102:19736 [wandb_init.py:_log_setup():508] Logging internal logs to C:\Users\franz\Repos\prediccion_rinde_anii\app\notebooks_franz\wandb\run-20231027_135405-n9fzwdds\logs\debug-internal.log
2023-10-27 13:54:05,908 INFO    Thread-5102:19736 [wandb_init.py:init():547] calling init triggers
2023-10-27 13:54:05,908 INFO    Thread-5102:19736 [wandb_init.py:init():554] wandb.init called with sweep_config: {'batch_size': 256, 'constant_layers': ['elevation', 'landform'], 'dropout': 0.1, 'end_of_sequence_offset': 8, 'epochs': 100, 'history_steps': 60, 'learning_rate': 0.01, 'model_name': 'single_bit_multimodal', 'optimizer': 'sgd', 'patience': 30, 'time_series_layers': ['NDWI', 'NDMI', 'NDVI']}
config: {}
2023-10-27 13:54:05,909 INFO    Thread-5102:19736 [wandb_init.py:init():596] starting backend
2023-10-27 13:54:05,909 INFO    Thread-5102:19736 [wandb_init.py:init():600] setting up manager
2023-10-27 13:54:05,927 INFO    Thread-5102:19736 [backend.py:_multiprocessing_setup():106] multiprocessing start_methods=spawn, using: spawn
2023-10-27 13:54:05,944 INFO    Thread-5102:19736 [wandb_init.py:init():606] backend started and connected
2023-10-27 13:54:05,955 INFO    Thread-5102:19736 [wandb_run.py:_config_callback():1283] config_cb None None {'batch_size': 256, 'constant_layers': ['elevation', 'landform'], 'dropout': 0.1, 'end_of_sequence_offset': 8, 'epochs': 100, 'history_steps': 60, 'learning_rate': 0.01, 'model_name': 'single_bit_multimodal', 'optimizer': 'sgd', 'patience': 30, 'time_series_layers': ['NDWI', 'NDMI', 'NDVI']}
2023-10-27 13:54:05,956 INFO    Thread-5102:19736 [wandb_init.py:init():703] updated telemetry
2023-10-27 13:54:06,019 INFO    Thread-5102:19736 [wandb_init.py:init():736] communicating run to backend with 60.0 second timeout
2023-10-27 13:54:06,665 INFO    Thread-5102:19736 [wandb_run.py:_on_init():2176] communicating current version
2023-10-27 13:54:06,787 INFO    Thread-5102:19736 [wandb_run.py:_on_init():2185] got version response upgrade_message: "wandb version 0.15.12 is available!  To upgrade, please run:\n $ pip install wandb --upgrade"

2023-10-27 13:54:06,787 INFO    Thread-5102:19736 [wandb_init.py:init():787] starting run threads in backend
2023-10-27 13:54:07,058 INFO    Thread-5102:19736 [wandb_run.py:_console_start():2155] atexit reg
2023-10-27 13:54:07,059 INFO    Thread-5102:19736 [wandb_run.py:_redirect():2010] redirect: SettingsConsole.WRAP_RAW
2023-10-27 13:54:07,059 INFO    Thread-5102:19736 [wandb_run.py:_redirect():2075] Wrapping output streams.
2023-10-27 13:54:07,059 INFO    Thread-5102:19736 [wandb_run.py:_redirect():2100] Redirects installed.
2023-10-27 13:54:07,060 INFO    Thread-5102:19736 [wandb_init.py:init():828] run started, returning control to user process
2023-10-27 14:00:19,842 INFO    Thread-5102:19736 [wandb_run.py:_finish():1890] finishing run smartway-ia/prediccion_rendimiento_anii/n9fzwdds
2023-10-27 14:00:19,842 INFO    Thread-5102:19736 [wandb_run.py:_atexit_cleanup():2124] got exitcode: 0
2023-10-27 14:00:19,843 INFO    Thread-5102:19736 [wandb_run.py:_restore():2107] restore
2023-10-27 14:00:19,843 INFO    Thread-5102:19736 [wandb_run.py:_restore():2113] restore done

Hi @franzmayr! Apologies that you are seeing this behavior on your side! Could you please also provide the debug-internal.log file?

Could you talk a bit about your experiment setup and the environment it is running in?

Hi @artsiom!
Sadly I deleted that file, but as I still face this issue, I can provide another run with the same problem.
Find both log files in the links below:

Regarding my setup:
Python 3.9.16 on Windows 10
wandb version: ‘0.15.12’
tensorflow version: ‘2.10.1’

I’m running a sweep with a training function implemented in Keras, using the “wandb.keras.WandbCallback()” callback.
This function contains a try/except/finally statement, where “wandb.finish()” runs inside the finally block.
Please let me know if there is any further detail I can provide.
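
For readers following along, the structure described above can be sketched roughly as follows. This is not the original training script: run_trial, the fake_* helpers, and failing_train are hypothetical names used only for illustration, and the init_run / finish_run parameters stand in for wandb.init and wandb.finish so that the sketch runs without wandb or TensorFlow installed.

```python
from typing import Any, Callable, List


def run_trial(
    init_run: Callable[[], Any],
    train: Callable[[Any], None],
    finish_run: Callable[[], None],
) -> bool:
    """Run one sweep trial and always call finish_run, even if training fails.

    Returns True if training completed, False if it raised an exception.
    """
    try:
        run = init_run()   # in the real script this would be wandb.init()
        train(run)         # e.g. model.fit(..., callbacks=[WandbCallback()])
        return True
    except Exception as exc:
        print(f"trial failed: {exc!r}")
        return False
    finally:
        finish_run()       # in the real script this would be wandb.finish()


# Stand-in callables (assumptions, not part of wandb) that record call order.
events: List[str] = []


def fake_init() -> object:
    events.append("init")
    return object()


def fake_finish() -> None:
    events.append("finish")


def failing_train(run: Any) -> None:
    raise RuntimeError("simulated training error")


ok = run_trial(fake_init, lambda run: events.append("train"), fake_finish)
failed = run_trial(fake_init, failing_train, fake_finish)
```

The point of the finally block is that the finish call is reached on both the success path and the failure path. It does not by itself explain the hang, since the message in question is printed while the run is finishing.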

Hi @franzmayr! I’ve taken a peek at your debug-internal.log, and it seems your process hangs after the system fails to find this file:

The system throws FileNotFoundError: [WinError 2] El sistema no puede encontrar el archivo especificado (“The system cannot find the file specified”) and then enters an infinite loop, because the wandb process could not finish properly.

Are you by any chance calling pydev_monkey.py when it is not present on your local computer?
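
One rough way to check this from inside the training script is to list any already-imported modules whose names mention pydev. This is only a diagnostic sketch: loaded_debugger_modules is a hypothetical helper, and matching on the substring "pydev" is an assumption about how the debugger's modules are named, not something wandb provides.

```python
import sys
from typing import Iterable, List, Optional


def loaded_debugger_modules(modules: Optional[Iterable[str]] = None) -> List[str]:
    """Return the sorted names of modules that contain 'pydev'.

    By default this inspects sys.modules, i.e. everything imported so far
    in the current process. Pass an explicit iterable of names to test it.
    """
    names = list(sys.modules.keys()) if modules is None else list(modules)
    return sorted(name for name in names if "pydev" in name.lower())


found = loaded_debugger_modules()
print(found or "no pydev modules loaded")
```

If the list is non-empty when the script is launched by the sweep agent, the run is probably being started under an IDE debugger, and launching it from a plain terminal would be a reasonable next thing to try.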

Hi Franz, wanted to follow up with you regarding this thread!

Hi, since we have not heard back from you, we are going to close this request. If you would like to reopen the conversation, please let us know! Unfortunately, at the moment, we do not receive notifications if a thread reopens on Discourse. So, please feel free to create a new ticket regarding your concern if you’d like to continue the conversation.