Waiting a long time to get a result

Hi, could you please help me solve this problem?

wandb: Tracking run with wandb version 0.15.5
wandb: Run data is saved locally in /home/ex20249/xai/src/train/wandb/run-20230718_231057-o7ybnjr4
wandb: Run wandb offline to turn off syncing.
wandb: Syncing run colorful-night-2
wandb: ⭐️ View project at Weights & Biases
wandb: 🚀 View run at Weights & Biases
/opt/miniconda3/envs/xai2/lib/python3.7/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(**kwargs)
/opt/miniconda3/envs/xai2/lib/python3.7/site-packages/sklearn/utils/validation.py:72: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
return f(**kwargs)
4%|███████▎ | 101/2660 [00:00<00:07, 329.70it/s]
(9090, 30, 10) (9090, 6) (1111, 30, 10) (1111, 6)
2023-07-18 23:11:28.811367: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1

The code stopped here, and there is no response. Could you let me know how to solve it?
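
(Aside: the DataConversionWarning in the log above is likely unrelated to the hang; scikit-learn is only asking for a 1-D target array. Below is a minimal sketch of the fix the warning suggests, assuming the y being passed is a NumPy column vector; the actual call site is inside the data loaders, which are not shown in this thread.)

import numpy as np

# A column vector of shape (n_samples, 1) triggers the warning;
# ravel() gives the 1-D view of shape (n_samples,) that sklearn expects.
y = np.zeros((9090, 1))
y_1d = y.ravel()
print(y.shape, y_1d.shape)  # (9090, 1) (9090,)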

Here is the debug.log file:
2023-07-18 23:03:50,394 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Current SDK version is 0.15.5
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Configure stats pid to 3851716
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from /home/ex20249/.config/wandb/settings
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from /home/ex20249/xai/src/train/wandb/settings
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Loading settings from environment variables: {}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Applying setup settings: {'_disable_service': False}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_setup.py:_flush():76] Inferring run settings from compute environment: {'program_relpath': 'train_model.py', 'program': 'train_model.py'}
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_init.py:_log_setup():507] Logging user logs to /home/ex20249/xai/src/train/wandb/run-20230718_230350-mwxuu6tx/logs/debug.log
2023-07-18 23:03:50,395 INFO MainThread:3851716 [wandb_init.py:_log_setup():508] Logging internal logs to /home/ex20249/xai/src/train/wandb/run-20230718_230350-mwxuu6tx/logs/debug-internal.log
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():547] calling init triggers
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():555] wandb.init called with sweep_config: {}
config: {'DATASET': 'walmart', 'NUM_SERIES': 100, 'HISTORY_SIZE': 30, 'TARGET_SIZE': 6, 'STRIDE': 1, 'CONT_FEATURES': [0, 9], 'CAT_FEATURES': [1, 2, 3, 4, 5, 6, 7, 8], 'BATCH_SIZE': 512, 'EPOCHS': 200, 'PATIENCE': 10, 'MODEL': 'lstm', 'NUM_LAYERS': 2, 'NUM_UNITS': 32, 'DROPOUT': 0, 'STANDARDIZE': False}
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():596] starting backend
2023-07-18 23:03:50,396 INFO MainThread:3851716 [wandb_init.py:init():600] setting up manager
2023-07-18 23:03:50,400 INFO MainThread:3851716 [backend.py:_multiprocessing_setup():108] multiprocessing start_methods=fork,spawn,forkserver, using: spawn
2023-07-18 23:03:50,401 INFO MainThread:3851716 [wandb_init.py:init():606] backend started and connected
2023-07-18 23:03:50,403 INFO MainThread:3851716 [wandb_init.py:init():705] updated telemetry
2023-07-18 23:03:50,404 INFO MainThread:3851716 [wandb_init.py:init():738] communicating run to backend with 60.0 second timeout
2023-07-18 23:03:50,837 INFO MainThread:3851716 [wandb_run.py:_on_init():2173] communicating current version
2023-07-18 23:03:50,895 INFO MainThread:3851716 [wandb_run.py:_on_init():2182] got version response
2023-07-18 23:03:50,895 INFO MainThread:3851716 [wandb_init.py:init():789] starting run threads in backend
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_console_start():2152] atexit reg
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2007] redirect: SettingsConsole.WRAP_RAW
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2072] Wrapping output streams.
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_run.py:_redirect():2097] Redirects installed.
2023-07-18 23:04:20,922 INFO MainThread:3851716 [wandb_init.py:init():830] run started, returning control to user process
2023-07-18 23:04:21,665 INFO MainThread:3851716 [wandb_run.py:_config_callback():1281] config_cb None None {'FEATURES': ['Weekly_Sales', 'Temperature', 'Store', 'Dept', 'Year', 'month', 'weekofmonth', 'day', 'Type', 'Size'], 'CONT_FEATURES': [0, 9], 'CAT_FEATURES': [1, 2, 3, 4, 5, 6, 7, 8]}

By the way, the same code runs fine in Colab.

Hi @yigenannan, could you possibly share some of the code you are using, or even a link to the Colab? If not, I might need a little more information about what you are looking to do.

Thank you,
Nate

Hi Nathank,
Thank you for your reply. This is the code that I ran:

“”" Train the models “”"
import sys

sys.path.append(‘…/’)
import wandb, yaml, os
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping
from data.loaders import dataset_loader
from models.loaders import model_loader
from models.gbr_model import MultiGradientBoostingRegressorRemoval
from configs.loaders import load_config # File configs
from configs.defaults import Globs, dataset_defaults, model_defaults
from train.helpers import auto_gpu_selection, log_errors
from visualization.vis_preds import visualize_preds
from data.helpers import convert_wandb_config_to_dict
import pickle

Configuration

NOTE: Remove the lines below to save experiment results to W&B

os.environ[‘WANDB_SILENT’]=‘true’

os.environ[‘WANDB_MODE’] = ‘dryrun’

def trainer(conf, wandb_tags=[‘train’]):
# auto_gpu_selection()

wandb.init(project=Globs.PROJECT_NAME, config=conf, tags=wandb_tags, reinit=True) #load_config()
config = wandb.config

dataset_params, dataset = dataset_loader[config.DATASET](convert_wandb_config_to_dict(config))
config.update(dataset_params)
model = model_loader[config.MODEL](config)
print(dataset['train_x'].shape, dataset['train_y'].shape,
    dataset['test_x'].shape, dataset['test_y'].shape)

if isinstance(model, tf.keras.Model):
    print(model.summary())
    callbacks=[ EarlyStopping(patience=config.PATIENCE,
        restore_best_weights=True, monitor='val_loss'),
        wandb.keras.WandbCallback() ]
    model.fit(dataset['train_x'], dataset['train_y'],
        validation_data=(dataset['test_x'], dataset['test_y']),
        batch_size=config.BATCH_SIZE, epochs=config.EPOCHS, callbacks=callbacks)
elif isinstance(model, MultiGradientBoostingRegressorRemoval):
    model.fit(dataset['train_x'], dataset['train_y'],
        validation_data=(dataset['test_x'], dataset['test_y']),
        wandb=wandb)

log_errors(dataset, model, wandb)
visualize_preds(dataset, model, wandb.run.dir)

if name == “main”:
# Train a single model
config = {**dataset_defaults[‘walmart’], **model_defaults[‘lstm’]}
# config.update({‘CONT_FEATURES’:[0], ‘CAT_FEATURES’:[3]})
# config.update({‘SUBJECTS’:[‘559’, ‘570’, ‘575’, ‘588’],})
config.update({‘STANDARDIZE’: False})
trainer(config, wandb_tags=[‘trainv7’])
Is this information enough? I have also tried adding wandb.finish() to solve this problem, but it does not work.
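
For reference, a minimal sketch of how wandb.finish() is typically placed at the end of a run; the try/finally wrapper here is just an illustration of the usual pattern, not the exact change made to the script above:

def trainer(conf, wandb_tags=['train']):
    run = wandb.init(project=Globs.PROJECT_NAME, config=conf, tags=wandb_tags, reinit=True)
    try:
        ...  # dataset loading, model.fit, log_errors, visualize_preds as above
    finally:
        # Close the run even if training raises, so the wandb
        # background process can flush and exit cleanly.
        run.finish()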

Thank you @yigenannan! It's a bit hard to tell if the code is hanging because of wandb. Is there anything that led you to believe wandb was causing the hang? One thing we could try is setting os.environ['WANDB_MODE'] = 'disabled' to prevent wandb code from executing and see if the code is still not completing.
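
For example, a minimal sketch (the project name is a placeholder, and the environment variable has to be set before wandb.init runs):

import os
os.environ['WANDB_MODE'] = 'disabled'  # wandb.init returns a no-op run; nothing is logged or synced

import wandb

run = wandb.init(project='my-project')  # placeholder project name
run.log({'loss': 0.0})                  # safe no-op while disabled
run.finish()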

Hi @yigenannan, were you able to try disabling wandb?

Hi Nathank,
When I remove wandb, it still does not work. I ran the same code in Colab, where it succeeds. I have no idea why. I will use Colab to finalize my work. Thanks again.
