Hi all,
I’m integrating W&B into an existing project in which the agent, the model creation, and the environment are implemented as classes. The code structure in the Python file (AIAgent.py) looks like this:
import wandb
import torch
import torch.nn as nn

config = {
    'layer_sizes': [17, 16, 12, 4],
    'batch_minsize': 32,
    'max_memory': 100_000,
    'episodes': 2,
    'epsilon': 1.0,
    'epsilon_decay': 0.998,
    'epsilon_min': 0.01,
    'gamma': 0.9,
    'learning_rate': 0.001,
    'weight_decay': 0,
    'optimizer': 'sgd',
    'activation': 'relu',
    'loss_function': 'mse'
}

class AIAgent:
    def __init__(self):
        self.config = config
        self.pipeline(self.config)

    def pipeline(self, config):
        wandb.init()
        config = wandb.config  # under a sweep, this holds the sampled parameters
        model, criterion, optimizer = self.make(config)
        self.train(model, criterion, optimizer, config)

    def make(self, config):
        model = LinearQNet(config).to(device)
        if config['loss_function'] == 'mse':
            criterion = nn.MSELoss()
        if config['optimizer'] == 'adam':
            optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'],
                                         betas=(0.9, 0.999), eps=1e-08,
                                         weight_decay=config['weight_decay'], amsgrad=False)
        wandb.watch(model, criterion, log='all', log_freq=1)
        summary(model)
        return model, criterion, optimizer

    def train(self, model, criterion, optimizer, config):
        for episode in range(1, config['episodes'] + 1):
            while True:
                # Where the training is performed
                if done:
                    if (episode % 1) == 0:  # log every episode
                        wandb.log({'episode': episode, 'epsilon': epsilon, 'score': score,
                                   'loss': loss_mean, 'reward': reward_mean,
                                   'score_mean': score_mean,
                                   'images': [wandb.Image(img) for img in env_images]},
                                  step=episode)
                    break
            if episode < config['episodes']:
                game.game_reset()
            else:
                wandb.finish()
                break

class LinearQNet(nn.Module):
    def __init__(self, config):
        super(LinearQNet, self).__init__()
        self.config = config
        # Where the NN is configured

if __name__ == '__main__':
    AIAgent.__init__(AIAgent())
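Since the command section of my sweep file uses ${args}, the agent passes each sampled parameter to the script as a command-line flag, and wandb.init() then exposes them through wandb.config. One thing I’m aware of: because wandb.init() is called without arguments, wandb.config stays empty when the script runs outside of a sweep. The pattern I understand from the docs (a sketch only, with the rest of the class unchanged) would be to pass the local defaults in and let the sweep values override them:

def pipeline(self, config):
    # Passing the local dict sets defaults for standalone runs;
    # when launched by a sweep agent, the sampled values take precedence.
    wandb.init(config=config)
    config = wandb.config
    model, criterion, optimizer = self.make(config)
    self.train(model, criterion, optimizer, config)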
I’m currently initializing the sweep configuration via a .yaml file by calling wandb sweep sweep.yaml. The sweep.yaml file looks like this:
program: AIAgent.py
project: evaluation-sweep-1
method: random
metric:
  name: score_mean
  goal: maximize
command:
  - ${env}
  - python3
  - ${program}
  - ${args}
parameters:
  layer_sizes:
    distribution: constant
    value: [17, 16, 512, 4]
  batch_minsize:
    distribution: int_uniform
    max: 1024
    min: 32
  max_memory:
    distribution: constant
    value: 100_000
  episodes:
    distribution: constant
    value: 50
  epsilon:
    distribution: constant
    value: 1.0
  epsilon_decay:
    distribution: constant
    value: 0.995
  epsilon_min:
    distribution: constant
    value: 0.01
  gamma:
    distribution: uniform
    max: 0.99
    min: 0.8
  learning_rate:
    distribution: uniform
    max: 0.1
    min: 0.0001
  weight_decay:
    distribution: constant
    value: 0
  optimizer:
    distribution: categorical
    values: ['sgd', 'adam', 'adamw']
  activation:
    distribution: categorical
    values: ['relu', 'sigmoid', 'tanh', 'leakyrelu']
  loss_function:
    distribution: constant
    value: 'mse'
early_terminate:
  type: hyperband
  min_iter: 5
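For completeness, this is how I launch everything: wandb sweep sweep.yaml prints the sweep ID together with the agent command, which I then run (entity and sweep ID below are placeholders):

wandb sweep sweep.yaml
wandb agent <entity>/evaluation-sweep-1/<sweep-id>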
Besides general feedback on the implementation, I’m a bit dumbfounded by a current bug. The sweep runs start fine and show up in the W&B interface, but every run is performed twice under the same name: only the logging of the first one is displayed, while the second runs ‘silently’ in the environment without any wandb.log updates. Does anybody have an idea what the reason for this might be?
Thanks,
Tobias