Waiting a long time to get a result

Hi, could you please help me solve this problem?

wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ex20249/xai/src/train/wandb/run-20230718_231057-o7ybnjr4
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run colorful-night-2
wandb: ⭐️ View project at Weights & Biases
wandb: 🚀 View run at Weights & Biases
/opt/miniconda3/envs/xai2/lib/python3.7/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(**kwargs)
/opt/miniconda3/envs/xai2/lib/python3.7/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(**kwargs)
4%|███████▎ | 101/2660 [00:00<00:07, 329.70it/s]
(9090, 30, 10) (9090, 6) (1111, 30, 10) (1111, 6)
2023-07-18 23:11:28.811367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1

The code stopped here, and there is no response. Could you let me know how to solve it?
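
(Aside: the DataConversionWarning in the log above is likely unrelated to the hang; scikit-learn is only asking for a 1-D target array. Below is a minimal sketch of the fix the warning suggests, assuming the y being passed is a NumPy column vector; the actual call site is inside the data loaders, which are not shown in this thread.)

import numpy as np

# A column vector of shape (n_samples, 1) triggers the warning;
# ravel() gives the 1-D view of shape (n_samples,) that sklearn expects.
y = np.zeros((9090, 1))
y_1d = y.ravel()
print(y.shape, y_1d.shape)  # (9090, 1) (9090,)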

Here is the debug.log file:
2023-07-18 23:03:50,394 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Current SDK version is 0.15.5
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Configure stats pid to 3851716
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from /home/ex20249/.config/wandb/settings
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from /home/ex20249/xai/src/train/wandb/settings
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'train_model.py', 'program': 'train_model.py'}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_init.py:_log_setup():507] Logging user logs to /home/ex20249/xai/src/train/wandb/run-20230718_230350-mwxuu6tx/logs/debug.log
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_init.py:_log_setup():508] Logging internal logs to /home/ex20249/xai/src/train/wandb/run-20230718_230350-mwxuu6tx/logs/debug-internal.log
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():547] calling init triggers
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():555] wandb.init called with sweep_config: {}
config: {'DATASET': 'walmart', 'NUM_SERIES': 100, 'HISTORY_SIZE': 30, 'TARGET_SIZE': 6, 'STRIDE': 1, 'CONT_FEATURES': [0, 9], 'CAT_FEATURES': [1, 2, 3, 4, 5, 6, 7, 8], 'BATCH_SIZE': 512, 'EPOCHS': 200, 'PATIENCE': 10, 'MODEL': 'lstm', 'NUM_LAYERS': 2, 'NUM_UNITS': 32, 'DROPOUT': 0, 'STANDARDIZE': False}
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():596] starting backend
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():600] setting up manager
2023-07-18 23:03:50,400 INFO MainThread:3851716 [backend.py:_multiprocessing_setup():108] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2023-07-18 23:03:50,401 INFO MainThread:3851716 [wandb_init.py:init():606] backend started and connected
2023-07-18 23:03:50,403 INFO MainThread:3851716 [wandb_init.py:init():705] updated telemetry
2023-07-18 23:03:50,404 INFO MainThread:3851716 [wandb_init.py:init():738] communicating run to backend with 60.0 second timeout
2023-07-18 23:03:50,837 INFO MainThread:3851716 [wandb_run.py:_on_init():2173] communicating current version
2023-07-18 23:03:50,895 INFO MainThread:3851716 [wandb_run.py:_on_init():2182] got version response
2023-07-18 23:03:50,895 INFO MainThread:3851716 [wandb_init.py:init():789] starting run threads in backend
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_console_start():2152] atexit reg
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2007] redirect: SettingsConsole.WRAP_RAW
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2072] Wrapping output streams.
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2097] Redirects installed.
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_init.py:init():830] run started, returning control to user process
2023-07-18 23:04:21,665 INFO MainThread:3851716 [wandb_run.py:_config_callback():1281] config_cb None None {'FEATURES': ['Weekly_Sales', 'Temperature', 'Store', 'Dept', 'Year', 'month', 'weekofmonth', 'day', 'Type', 'Size'], 'CONT_FEATURES': [0, 9], 'CAT_FEATURES': [1, 2, 3, 4, 5, 6, 7, 8]}

By the way, the same code runs fine in Colab.

Hi @yigenannan, could you possibly share some of the code you are using, or even a link to the Colab? If not, I might need a little more information about what you are looking to do.

Thank you,
Nate

Hi Nathank,
Thank you for your reply. This is the code that I ran:

“”" Train the models “”"
import sys

sys.path.append(‘…/’)
import wandb, yaml, os
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from data.loaders import dataset_loader
from models.loaders import model_loader
from models.gbr_model import MultiGradientBoostingRegressorRemoval
from configs.loaders import load_config # File configs
from configs.defaults import Globs, dataset_defaults, model_defaults
from train.helpers import auto_gpu_selection, log_errors
from visualization.vis_preds import visualize_preds
from data.helpers import convert_wandb_config_to_dict
import pickle

Configuration

NOTE: Remove the lines below to save experiment results to W&B

os.environ[‘WANDB_SILENT’]=‘true’

os.environ[‘WANDB_MODE’] = ‘dryrun’

def trainer(conf, wandb_tags=[‘train’]):
# auto_gpu_selection()

wandb.init(project=Globs.PROJECT_NAME, config=conf, tags=wandb_tags, reinit=True) #load_config()
config = wandb.config

dataset_params, dataset = dataset_loader[config.DATASET](convert_wandb_config_to_dict(config))
config.update(dataset_params)
model = model_loader[config.MODEL](config)
print(dataset['train_x'].shape, dataset['train_y'].shape,
    dataset['test_x'].shape, dataset['test_y'].shape)

if isinstance(model, tf.keras.Model):
    print(model.summary())
    callbacks=[ EarlyStopping(patience=config.PATIENCE,
        restore_best_weights=True, monitor='val_loss'),
        wandb.keras.WandbCallback() ]
    model.fit(dataset['train_x'], dataset['train_y'],
        validation_data=(dataset['test_x'], dataset['test_y']),
        batch_size=config.BATCH_SIZE, epochs=config.EPOCHS, callbacks=callbacks)
elif isinstance(model, MultiGradientBoostingRegressorRemoval):
    model.fit(dataset['train_x'], dataset['train_y'],
        validation_data=(dataset['test_x'], dataset['test_y']),
        wandb=wandb)

log_errors(dataset, model, wandb)
visualize_preds(dataset, model, wandb.run.dir)

if name == “main”:
# Train a single model
config = {**dataset_defaults[‘walmart’], **model_defaults[‘lstm’]}
# config.update({‘CONT_FEATURES’:[0], ‘CAT_FEATURES’:[3]})
# config.update({‘SUBJECTS’:[‘559’, ‘570’, ‘575’, ‘588’],})
config.update({‘STANDARDIZE’: False})
trainer(config, wandb_tags=[‘trainv7’])
Is this information enough? I have also tried adding wandb.finish() to solve this problem, but it does not work.
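
For reference, a minimal sketch of how wandb.finish() is typically placed at the end of a run; the try/finally wrapper here is just an illustration of the usual pattern, not the exact change made to the script above:

def trainer(conf, wandb_tags=['train']):
    run = wandb.init(project=Globs.PROJECT_NAME, config=conf, tags=wandb_tags, reinit=True)
    try:
        ...  # dataset loading, model.fit, log_errors, visualize_preds as above
    finally:
        # Close the run even if training raises, so the wandb
        # background process can flush and exit cleanly.
        run.finish()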

Thank you @yigenannan! It's a bit hard to tell if the code is hanging because of wandb. Is there anything that led you to believe wandb was causing the hang? One thing we could try is setting os.environ['WANDB_MODE'] = 'disabled' to prevent wandb code from executing and see if the code is still not completing.
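
For example, a minimal sketch (the project name is a placeholder, and the environment variable has to be set before wandb.init runs):

import os
os.environ['WANDB_MODE'] = 'disabled'  # wandb.init returns a no-op run; nothing is logged or synced

import wandb

run = wandb.init(project='my-project')  # placeholder project name
run.log({'loss': 0.0})                  # safe no-op while disabled
run.finish()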

Hi @yigenannan, were you able to try disabling wandb?

Hi Nathank,
When I remove wandb, it still does not work. I ran the same code in Colab, where it succeeds. I have no idea why. I will use Colab to finalize my work. Thanks again.
