I'm using Stable Baselines3 (SB3). To select the activation function, you have to pass it to the SB3 algorithm through policy_kwargs, like this:
model = A2C(
    policy=config['policy'],
    env=env,
    learning_rate=config['learning_rate'],
    gae_lambda=config['gae_lambda'],
    ent_coef=config['ent_coef'],
    tensorboard_log=LOGS_DIR,
    device=config['device'],
    verbose=config['verbose'],
    policy_kwargs=dict(
        net_arch=dict(
            pi=config['policy_nn'],
            vf=config['value_nn']),
        activation_fn=config['activation_fn'],
        optimizer_class=config['optimizer_class'])
)
where:
config['activation_fn'] = th.nn.ReLU
or
config['activation_fn'] = th.nn.Tanh
If I configure the sweep dictionary key for the ‘activation_fn’ as:
'optimizer_fn': {
    'values': [th.nn.ReLU, th.nn.Tanh]
}
Then, when I run the sweeps, wandb does not see those values: the plots do not show which optimizer_fn was used, and the value never changes to the other option.
Is it because the sweep config dictionary only accepts strings for this kind of value?
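If that is the case, one possible workaround is to keep only strings in the sweep config and turn them back into classes inside the training function. Below is a minimal sketch of that idea. It rests on the untested assumption that the sweep values have to be serializable, so class objects such as th.nn.ReLU do not survive. The helper resolve_dotted_path and the dotted-path convention are hypothetical and are not part of wandb or SB3, and the key name 'activation_fn' is only for illustration.

```python
import importlib

# Hypothetical sweep fragment: each class is stored as a dotted-path string,
# which serializes cleanly, instead of as a Python class object.
sweep_parameters = {
    "activation_fn": {
        "values": ["torch.nn.ReLU", "torch.nn.Tanh"]
    }
}


def resolve_dotted_path(path):
    """Turn a string such as 'torch.nn.ReLU' back into the object it names."""
    module_name, _, attr_name = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attr', got {path!r}")
    # Import the module part lazily, then look up the final attribute on it.
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


# Inside the training function, after wandb.init(), the string picked by the
# sweep would be converted before building the model, for example:
#     config['activation_fn'] = resolve_dotted_path(wandb.config['activation_fn'])
```

With this approach the sweep only ever sees plain strings, so the chosen activation should show up by name in the run config, while the model still receives the actual class through policy_kwargs.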