Hi all,
I’m integrating W&B into an existing project in which the agent, the model creation, and the environment are implemented as classes. The code structure in the Python file (AIAgent.py) looks like this:
import wandb
import torch
import torch.nn as nn

config = {
    'layer_sizes': [17, 16, 12, 4],
    'batch_minsize': 32,
    'max_memory': 100_000,
    'episodes': 2,
    'epsilon': 1.0,
    'epsilon_decay': 0.998,
    'epsilon_min': 0.01,
    'gamma': 0.9,
    'learning_rate': 0.001,
    'weight_decay': 0,
    'optimizer': 'sgd',
    'activation': 'relu',
    'loss_function': 'mse'
}

class AIAgent:
    def __init__(self):
        self.config = config
        self.pipeline(self.config)

    def pipeline(self, config):
        wandb.init()
        config = wandb.config  # under a sweep, this holds the sampled parameters
        model, criterion, optimizer = self.make(config)
        self.train(model, criterion, optimizer, config)

    def make(self, config):
        model = LinearQNet(config).to(device)
        if config['loss_function'] == 'mse':
            criterion = nn.MSELoss()
        if config['optimizer'] == 'adam':
            optimizer = torch.optim.Adam(model.parameters(), lr=config['learning_rate'],
                                         betas=(0.9, 0.999), eps=1e-08,
                                         weight_decay=config['weight_decay'], amsgrad=False)
        wandb.watch(model, criterion, log='all', log_freq=1)
        summary(model)
        return model, criterion, optimizer

    def train(self, model, criterion, optimizer, config):
        for episode in range(1, config['episodes'] + 1):
            while True:
                # Where the training is performed
                if done:
                    if (episode % 1) == 0:  # log every episode
                        wandb.log({'episode': episode, 'epsilon': epsilon, 'score': score,
                                   'loss': loss_mean, 'reward': reward_mean,
                                   'score_mean': score_mean,
                                   'images': [wandb.Image(img) for img in env_images]},
                                  step=episode)
                    break
            if episode < config['episodes']:
                game.game_reset()
            else:
                wandb.finish()
                break

class LinearQNet(nn.Module):
    def __init__(self, config):
        super(LinearQNet, self).__init__()
        self.config = config
        # Where the NN is configured

if __name__ == '__main__':
    AIAgent.__init__(AIAgent())
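Since the command section of my sweep file uses ${args}, the agent passes each sampled parameter to the script as a command-line flag, and wandb.init() then exposes them through wandb.config. One thing I’m aware of: because wandb.init() is called without arguments, wandb.config stays empty when the script runs outside of a sweep. The pattern I understand from the docs (a sketch only, with the rest of the class unchanged) would be to pass the local defaults in and let the sweep values override them:

def pipeline(self, config):
    # Passing the local dict sets defaults for standalone runs;
    # when launched by a sweep agent, the sampled values take precedence.
    wandb.init(config=config)
    config = wandb.config
    model, criterion, optimizer = self.make(config)
    self.train(model, criterion, optimizer, config)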
I’m currently initializing the sweep configuration via a .yaml file by calling wandb sweep sweep.yaml. The sweep.yaml file looks like this:
program: AIAgent.py
project: evaluation-sweep-1
method: random
metric:
  name: score_mean
  goal: maximize
command:
  - ${env}
  - python3
  - ${program}
  - ${args}
parameters:
  layer_sizes:
    distribution: constant
    value: [17, 16, 512, 4]
  batch_minsize:
    distribution: int_uniform
    max: 1024
    min: 32
  max_memory:
    distribution: constant
    value: 100_000
  episodes:
    distribution: constant
    value: 50
  epsilon:
    distribution: constant
    value: 1.0
  epsilon_decay:
    distribution: constant
    value: 0.995
  epsilon_min:
    distribution: constant
    value: 0.01
  gamma:
    distribution: uniform
    max: 0.99
    min: 0.8
  learning_rate:
    distribution: uniform
    max: 0.1
    min: 0.0001
  weight_decay:
    distribution: constant
    value: 0
  optimizer:
    distribution: categorical
    values: ['sgd', 'adam', 'adamw']
  activation:
    distribution: categorical
    values: ['relu', 'sigmoid', 'tanh', 'leakyrelu']
  loss_function:
    distribution: constant
    value: 'mse'
early_terminate:
  type: hyperband
  min_iter: 5
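For completeness, this is how I launch everything: wandb sweep sweep.yaml prints the sweep ID together with the agent command, which I then run (entity and sweep ID below are placeholders):

wandb sweep sweep.yaml
wandb agent <entity>/evaluation-sweep-1/<sweep-id>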
Besides general feedback on the implementation, I’m a bit dumbfounded by a current bug. The sweep runs start fine and show up in the W&B interface, but every run is performed twice under the same name: only the logging of the first one is displayed, while the second runs ‘silently’ in the environment without any wandb.log updates. Does anybody have an idea what the reason for this might be?
Thanks,
Tobias