Problem with Sweep; how to use run.finish() and log without error + Question about defined metric

Hi all,

Nice to meet you!

Currently I don’t understand how to use run.finish() and wandb.init() correctly for logging. I keep getting an error whenever wandb.agent moves on to another model configuration. The K-fold split itself runs successfully and I don’t see any errors there; the error appears right after the K-fold loop, when the sweep applies the new model configuration.

Information about my code:
My code is a bit messy. I think there is no way to use K-fold cross-validation from scikit-learn directly. I’ve tried it many times, but my inputs and outputs are (with N = number of datasets):

Input 1: N datasets of 1000 numbers (x-axis)
Input 2: N datasets of 1000 numbers (y-axis)
Output 1: 1 number for each of the N datasets
Output 2: 1 number for each of the N datasets

Input 1 and 2 are concatenated to produce 2 outputs. Let’s say N is 300 and the split is 0.2; then the arrays are:
Output1.shape, Output2.shape, Output1_test.shape, Output2_test.shape, X.shape, Y.shape, X_test.shape, Y_test.shape

In the same order, their shapes are: ((240,), (240,), (60,), (60,), (240, 1000), (240, 1000), (60, 1000), (60, 1000))
With this type of data, I just don’t see how I can define the cross-validation with sklearn…
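What does seem workable is splitting on row indices only, so that both inputs and both outputs are sliced with the same index arrays. Below is a minimal sketch with random placeholder data in the training shapes listed above. `kfold_indices` is a hypothetical helper that mimics a shuffled K-fold split in plain NumPy; it is not scikit-learn's implementation.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed=0):
    """Yield (train_idx, val_idx) pairs for a shuffled K-fold split."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_splits)
    for k in range(n_splits):
        val_idx = folds[k]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, val_idx


# Placeholder data with the training shapes listed above (240 datasets).
rng = np.random.default_rng(42)
X = rng.normal(size=(240, 1000))   # input 1 (x-axis)
Y = rng.normal(size=(240, 1000))   # input 2 (y-axis)
Output1 = rng.normal(size=240)     # first target
Output2 = rng.normal(size=240)     # second target

fold_shapes = []
for train_idx, val_idx in kfold_indices(len(X), n_splits=4):
    # The same index arrays slice every input and every output.
    X_tr, Y_tr = X[train_idx], Y[train_idx]
    X_val, Y_val = X[val_idx], Y[val_idx]
    o1_tr, o1_val = Output1[train_idx], Output1[val_idx]
    o2_tr, o2_val = Output2[train_idx], Output2[val_idx]
    fold_shapes.append((X_tr.shape, X_val.shape, o1_tr.shape, o1_val.shape))

print(fold_shapes[0])
```

With 240 rows and 4 folds, each fold holds out 60 rows and trains on the remaining 180.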

To make cross-validation work, I now save the model’s weights each time it is configured, and then reload those untrained weights at the start of each fold in the for loop. However, I’m not sure how to log my results correctly. This code fails to produce the groups I want; the runs just overwrite each other. Also, as I mentioned, every time the sweep initiates a new model, I get an error:
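To show what I mean by resetting the weights, here is a minimal sketch that does not use Keras at all. It assumes a toy linear model in NumPy as a stand-in for my network; `init_weights` plays the role of the saved model.h5 file, and all names in it are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -2.0, 0.5])

# Stand-in for model.save_weights(...): keep a copy of the untrained weights.
init_weights = rng.normal(size=3)

folds = np.array_split(np.arange(len(X)), 4)
start_weights, val_losses = [], []
for k, val_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
    # Stand-in for model.load_weights(...): every fold starts from the same state.
    w = init_weights.copy()
    start_weights.append(w.copy())
    for _ in range(200):  # plain gradient descent on the mean squared error
        residual = X[train_idx] @ w - y[train_idx]
        grad = 2 * X[train_idx].T @ residual / len(train_idx)
        w -= 0.05 * grad
    val_losses.append(float(np.mean((X[val_idx] @ w - y[val_idx]) ** 2)))

print(val_losses)
```

In my real code, the copy step would correspond, as far as I understand, to calling model.load_weights('model.h5') at the top of each fold.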


wandb: Sweep Agent: Waiting for job.
wandb: Job received.
wandb: While tearing down the service manager. The following error has occured: [WinError 10054] De externe host heeft een verbinding verbroken
wandb: Agent Starting Run: ekc1s4gm with config:
wandb: 	batch_size: 6
wandb: 	dense_units: 63.48472025895866
wandb: 	dense_units2: 81.47931201263756
wandb: 	learning_rate: 0.0007466619646085462
wandb: 	num_layers: 10
wandb: 	optimizer: Adam
wandb: WARNING Ignored wandb.init() arg project when running a sweep.

Exception in thread ChkStopThr: 
Some file information

Exception in thread NetStatThr::
Some file information

Ending in: 

ConnectionResetError: [WinError 10054] De externe host heeft een verbinding verbroken
    sent = self._sock.send(data)
ConnectionResetError: [WinError 10054] De externe host heeft een verbinding verbroken

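When this error occurs, the script just continues; it appears right after the new model configuration is created, and the last occurrence is when the last fold of the last model has been evaluated. (The Dutch message means “An existing connection was forcibly closed by the remote host”.) In addition, this warning appears on every loop iteration: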

wandb: WARNING Ignored wandb.init() arg project when running a sweep.

I think it definitely has something to do with the groups I want certain logs to be in, and therefore also with my run.finish() command. The error does not occur without any wandb.init()! It does occur with just the wandb.init() in the for loop; in that case each sweep run is also overwritten by the next one, so no groups are created either. I’m unsure whether I placed the calls correctly. It sounded logical to me to define run just once and to have the others as plain wandb.init() calls (see my code). I just don’t understand how to use it in this case… I hope you do? How can I group my folds and my validation separately without using wandb.init()? Any recommendations on this would be very welcome!

My code
Here is my code… A little messy, sorry.

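import os
import numpy as np
import pandas as pd
import tensorflow as tf
import tensorflow_addons as tfa
import wandb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Model
from wandb.keras import WandbCallback

def seed_all(seed):
    os.environ['PYTHONHASHSEED'] = str(seed)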
def build_model(config):
    activ = config.activation   
    dense_units = int(config.dense_units)    # the sweep samples floats; cast to an integer layer width
    dense_units2 = int(config.dense_units2)
    num_layers = config.num_layers
    batch_size = config.batch_size
    batch_norm = False
    optimizer = config.optimizer
    learning_rate = config.learning_rate
    x = tf.keras.layers.Concatenate()([input1, input2])
        #input_layer = Input(shape=(len(norm_train_X .columns), len(norm_train_X.iloc[0][0]))
    x = Dense(units=dense_units, activation=activ)(x)
    for _ in range(num_layers):
        x = Dense(units=dense_units, activation=activ)(x)
    x1 = Dense(units=dense_units2, activation=activ)(x)
    # Y1 output is fed directly from the second dense block
    y1_output = Dense(units=1, name='y1_output')(x1)

    third_dense = Dense(units=dense_units2, activation=activ)(x1)

    # Y2 output comes via the third dense layer
    y2_output = Dense(units=1, name='y2_output')(third_dense)
    model = Model(inputs=[input1, input2], outputs=[y1_output, y2_output])
    model.save_weights('model.h5', overwrite = True)
    return model
def train():
    test_loss_sum =np.array([0])
    Hp_loss_sum = np.array([0])
    MDp_loss_sum = np.array([0])
    Hp_rmse_sum = np.array([0])
    MDp_rmse_sum = np.array([0])
    loss_sum = np.array([0])
    loss_sum_tot =0 
    Hp_R2_append = []
    test_loss_sum_tot =0
    hp_append = []
    MDp_append = []
    hp_pred_append = []
    MDp_R2_append = []
    MDp_sum = 0
    Hp_sum = 0
    test_loss_mean = 0 
    hp_pred_append = []
    MDp_pred_append = []
    Hp_loss_sum_tot = 0
    MDp_loss_sum_tot =0
    Hp_rmse_sum_tot =0
    X = np.vstack(np.asarray(norm1.numpy()[:]))
    Y = np.vstack(np.asarray(norm2.numpy()[:]))
    max_trials = 2
    epochs = 100
    test_loss_sum = np.array([0])
    Hp = train_Y_1_t
    MDp = train_Y_2_t
    Hp_test = test_Y_1_t
    MDp_test = test_Y_2_t
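    # Default hyperparameters; the sweep overrides any keys it defines
    hyperparams = dict(
        learning_rate = 0.0001,  # key matches config.learning_rate used below
        optimizer = 'Adam',
        dense_units = 256,
        batch_size = 64,
        epochs = 1,
        dense_units2 = 64,
        activation = 'relu',)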
    cb_reducelr = tf.keras.callbacks.ReduceLROnPlateau(
        monitor = "val_loss",
        mode = 'auto',
        factor = 0.1,
        patience = 20,
        min_delta = 1e-04, #default
        min_lr = 1e-07,
        verbose = 1)

    cb_earlystop = tf.keras.callbacks.EarlyStopping(
        min_delta = 0)
    run = wandb.init(project="custom-charts", config=hyperparams, reinit = True) #Note the Reinit here!
    config = wandb.config
    Wandcalback = WandbCallback(monitor='val_loss')
    model =  build_model(config=config)
    LR = config.learning_rate     # 0.001
    if config.optimizer=='Adam':
        optimizer = tf.keras.optimizers.Adam(learning_rate = LR)
    elif config.optimizer=='RMSprop':
        optimizer = tf.keras.optimizers.RMSprop(learning_rate=LR, rho=0.9, epsilon=1e-08, decay=0.0)
    # Compile the model with one MSE loss and one RMSE metric per output
    model.compile(
        optimizer=optimizer,
        loss={'y1_output': 'mse', 'y2_output': 'mse'},
        metrics={'y1_output': tf.keras.metrics.RootMeanSquaredError(),'y2_output': tf.keras.metrics.RootMeanSquaredError()})
    n_splits = 4
    skf = KFold(n_splits, shuffle = True)
    skf.get_n_splits(X, Y)
    vall_loss = []
    i = 0  # fold counter, used in job_type below
    for train_index, test_index in skf.split(X, Y):
        wandb.init(project="custom-charts", group = "folds_experiment", job_type = "fold{}".format(i)) 
        train_index = train_index.astype(int)
        test_index = test_index.astype(int)
        X = np.array(X)
        Y = np.array(Y)
        Hp = np.array(Hp)
        MDp = np.array(MDp)
        X_train, X_test = X[train_index], X[test_index]
        Y_train, Y_test = Y[train_index], Y[test_index]
        Hp_train, Hp_test = Hp[train_index], Hp[test_index]
        MDp_train, MDp_test = MDp[train_index],MDp[test_index]      
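        # Fit on the training folds only and validate on the held-out fold
        history = model.fit(
            [tf.convert_to_tensor(X_train), tf.convert_to_tensor(Y_train)], [Hp_train, MDp_train],
            validation_data = ([tf.convert_to_tensor(X_test), tf.convert_to_tensor(Y_test)], [Hp_test, MDp_test]),
            epochs = epochs, batch_size = config.batch_size,
            callbacks = [cb_reducelr, cb_earlystop, Wandcalback])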
        loss_sum = pd.DataFrame(history.history)['loss'].iloc[-1]  + loss_sum
        test_loss_sum = pd.DataFrame(history.history)['val_loss'].iloc[-1]  + test_loss_sum
        Hp_loss_sum = pd.DataFrame(history.history)['val_y1_output_loss'].iloc[-1]  + Hp_loss_sum
        MDp_loss_sum = pd.DataFrame(history.history)['val_y2_output_loss'].iloc[-1]  + MDp_loss_sum
        Hp_rmse_sum = pd.DataFrame(history.history)['val_y1_output_root_mean_squared_error'].iloc[-1]  + Hp_rmse_sum
        MDp_rmse_sum = pd.DataFrame(history.history)['val_y2_output_root_mean_squared_error'].iloc[-1]  + MDp_rmse_sum
        loss_sum_tot = pd.DataFrame(history.history)['loss']  + loss_sum_tot
        test_loss_sum_tot = pd.DataFrame(history.history)['val_loss']  + test_loss_sum_tot
        Hp_loss_sum_tot = pd.DataFrame(history.history)['val_y1_output_loss'] +  Hp_loss_sum_tot
        MDp_loss_sum_tot = pd.DataFrame(history.history)['val_y2_output_loss']  + MDp_loss_sum_tot
        Hp_rmse_sum_tot = pd.DataFrame(history.history)['val_y1_output_root_mean_squared_error']  +  Hp_rmse_sum_tot
        MDp_rmse_sum_tot = pd.DataFrame(history.history)['val_y2_output_root_mean_squared_error']  +  MDp_rmse_sum_tot   

        Y_pred = model.predict([tf.convert_to_tensor(X_test), tf.convert_to_tensor(Y_test)])
        # Use a fresh metric object per output so the Hp state does not leak into MDp
        metric_hp = tfa.metrics.r_square.RSquare()
        metric_hp.update_state(Hp_test, Y_pred[0].flatten())
        R_2_Hp = metric_hp.result().numpy()
        metric_mdp = tfa.metrics.r_square.RSquare()
        metric_mdp.update_state(MDp_test, Y_pred[1].flatten())
        R_2_MDp = metric_mdp.result().numpy()
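        MDp_sum = MDp_sum + Y_pred[1]
        Hp_sum = Hp_sum + Y_pred[0]
        # Collect per-fold results for the summary statistics computed after the loop
        Hp_R2_append.append(R_2_Hp)
        MDp_R2_append.append(R_2_MDp)
        hp_append.append(Hp_test)
        MDp_append.append(MDp_test)
        hp_pred_append.append(Y_pred[0].flatten())
        MDp_pred_append.append(Y_pred[1].flatten())
        i = i + 1 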
    test_loss_mean = test_loss_sum/n_splits
    loss_sum_mean = loss_sum/n_splits
    test_loss_sum_mean =  test_loss_sum/n_splits
    Hp_loss_sum_mean = Hp_loss_sum/n_splits
    MDp_loss_sum_mean = MDp_loss_sum/n_splits
    Hp_rmse_sum_mean = Hp_rmse_sum/n_splits
    MDp_rmse_sum_mean = MDp_rmse_sum/n_splits
    test_MDp_R2 = np.mean(MDp_R2_append)
    test_Hp_R2 = np.mean(Hp_R2_append)
    Hp_mean = Hp_sum/n_splits
    MDp_mean = MDp_sum/n_splits    
        # wandb.init(project= "sweep & optimalisation RandomSearch", group="experimentfold{}".format(i), job_type="validation")
    wandb.init(project="custom-charts", group ="folds_experiment", job_type = "validation")
    for val_los in range(len(test_loss_sum_tot)):
        wandb.log({"val_loss_mean" : test_loss_sum_tot[val_los]/n_splits})
    for loss in range(len(loss_sum_tot)):
        wandb.log({"loss_mean": loss_sum_tot[loss]/n_splits})
    for val_MDp_los in range(len(MDp_loss_sum_tot)):
        wandb.log({"val_MDp_loss_mean" : MDp_loss_sum_tot[val_MDp_los]/n_splits})
    for val_hp_los in range(len( Hp_loss_sum_tot)):
        wandb.log({"val_hp_loss_mean": Hp_loss_sum_tot[val_hp_los]/n_splits})
    for val_MDp_rmse in range(len(MDp_rmse_sum_tot)):
        wandb.log({"val_MDp_rmse_mean" : MDp_rmse_sum_tot[val_MDp_rmse]/n_splits})
    for val_hp_rmse in range(len(Hp_rmse_sum_tot)):
        wandb.log({"val_hp_rmse_mean": Hp_rmse_sum_tot[val_hp_rmse]/n_splits})
    hp_append = np.concatenate(hp_append)
    MDp_append = np.concatenate(MDp_append)
    hp_pred_append = np.concatenate(hp_pred_append)
    MDp_pred_append = np.concatenate(MDp_pred_append)
    Hp_score = np.sqrt(mean_squared_error(hp_pred_append,hp_append))
    MDp_score = np.sqrt(mean_squared_error(MDp_pred_append,MDp_append))  
    test_MDp_R2 = np.mean(MDp_R2_append)
    test_Hp_R2 = np.mean(Hp_R2_append)
    wandb.log({"R2_score_hp":Hp_score, "R2_score_MDp":MDp_score, "R2_hp":test_Hp_R2, "R2_MDp":test_MDp_R2})
    Hp = np.asarray(Hp_mean.flatten())
    MDp = np.asarray(MDp_mean.flatten())
    Hp_testt = np.asarray(Hp_test.flatten())
    MDp_testt = np.asarray(MDp_test.flatten())
    fd = pd.DataFrame({"pred": Hp,"actual":Hp_testt})
    table = wandb.Table(dataframe=fd)
    wandb.log({'scatter-plot1': wandb.plot.scatter(table, "pred", "actual")})
    fd2 = pd.DataFrame({"pred": MDp,"actual":MDp_testt})
    table2 = wandb.Table(dataframe=fd2)
    wandb.log({'scatter-plot2': wandb.plot.scatter(table2, "pred", "actual")})
    predictions_h = [s for s in Hp_mean]
    table2 = wandb.Table(data=predictions_h, columns=["h_predictions"])
    wandb.log({'my_histogramM': wandb.plot.histogram(table2, "h_predictions",
    title="Prediction Score Distribution Hubble Parameter")})
        # hist = np.histogram(predictions_h)
        # wandb.log({'Hubble parameter': wandb.plot.histogram(hist)})
    predictions_hh = [ s for s in MDp_mean]
    table3 = wandb.Table(data=predictions_hh, columns=["h_predictions"])
    wandb.log({'my_histogram': wandb.plot.histogram(table3, "h_predictions",
    title="Prediction Score Distribution Mass Density")})
        # hist = np.histogram(predictions_hh)
        # wandb.log({'Mass Density parameter': wandb.plot.histogram(hist)})
sweep_config = {
    'method': 'random',
    'metric': {
        'name': 'test_loss_mean',
        'goal': 'minimize'
    },
    'parameters': {
        'dense_units': {
            'distribution': 'log_uniform_values',
            'min': 32,
            'max': 256
        },
        'learning_rate': {
            'distribution': 'log_uniform_values',
            'min': 0.0000001,
            'max': 0.1
        },
        'dense_units2': {
            'distribution': 'log_uniform_values',
            'min': 32,
            'max': 256
        },
        'batch_size': {
            # Integer batch sizes from 1 to 20
            'values': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20],
        },
        'optimizer': {
            'values': ['Adam', 'RMSprop']
        },
        'num_layers': {
            'values': [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20]
        }
    }
}

sweep_id = wandb.sweep(sweep_config, entity="stijnvdbosch", project="custom-charts")
wandb.agent(sweep_id, function=train, count=2, project="custom-charts")

You can also see I’m updating my own defined metric, called test_loss_mean. I suppose I did that correctly?
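To make this question concrete: as far as I understand, a sweep can only track a metric whose name matches a key passed to wandb.log(). The sketch below checks that without needing wandb. `metric_is_logged` is a hypothetical helper, and the key set lists some of the scalar keys that train() logs.

```python
# Some of the scalar keys that train() passes to wandb.log().
logged_keys = {
    "val_loss_mean", "loss_mean", "val_MDp_loss_mean", "val_hp_loss_mean",
    "val_MDp_rmse_mean", "R2_hp", "R2_MDp",
}

# Only the metric part of the sweep configuration matters here.
sweep_metric = {"name": "test_loss_mean", "goal": "minimize"}


def metric_is_logged(metric, keys):
    """Hypothetical helper: True if the sweep metric name is among the logged keys."""
    return metric["name"] in keys


print(metric_is_logged(sweep_metric, logged_keys))
print(metric_is_logged({"name": "val_loss_mean", "goal": "minimize"}, logged_keys))
```

If that understanding is right, test_loss_mean is computed in train() but never logged under that name, so I would probably need an extra wandb.log({"test_loss_mean": ...}) call.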
If you need any more information to help me, please send me a message and I will reply in a blink. :blush: