I am logging Stable-Baselines3 experiments to Weights & Biases, but my gradient and parameter histograms stay constant across hundreds of logged timesteps, each comprising over 30,000 individual observations. I have verified via model checkpoints that the model's parameters are in fact evolving over time.
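For what it's worth, this is roughly how I compared checkpoints to confirm the parameters change (a quick sketch rather than my exact code; the PPO class and checkpoint paths are placeholders):

import torch
from stable_baselines3 import PPO

# Load two checkpoints saved at different points during training
# (paths are placeholders for my actual checkpoint files).
model_a = PPO.load("models/example_run/model_early")
model_b = PPO.load("models/example_run/model_late")

# If training is progressing, at least some parameter tensors should differ.
params_b = dict(model_b.policy.named_parameters())
for name, p_a in model_a.policy.named_parameters():
    changed = not torch.allclose(p_a, params_b[name])
    print(f"{name}: {'changed' if changed else 'identical'}")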
I am not sure what is causing this. I have followed the Stable Baselines 3 guide in the Weights & Biases documentation.
This should be the relevant part of my code:
import wandb
from wandb.integration.sb3 import WandbCallback
from stable_baselines3.common.callbacks import CallbackList

# model, model_parameters, num_cpu, run_name, and RewardCallback are
# defined earlier in the script.
config = {
    "total_timesteps": model_parameters["n_steps"] * num_cpu * 200,
    "log_interval": 1,
}
run = wandb.init(
    project="MyProject",
    config=config,  # log the run configuration, as in the W&B SB3 guide
    sync_tensorboard=True,  # auto-upload SB3's TensorBoard metrics
    save_code=True,  # optional
    name=run_name,  # optional
)
wandbCb = WandbCallback(
    gradient_save_freq=1,  # log gradient histograms as often as possible
    model_save_path=f"models/{run_name}",
    model_save_freq=10,
    verbose=2,
)
RewardCb = RewardCallback(eval_freq=model_parameters["n_steps"] * num_cpu)
callbacks = CallbackList([
    wandbCb,
    RewardCb,
])
print("Learning...")
model.learn(
    total_timesteps=config["total_timesteps"],
    log_interval=config["log_interval"],
    progress_bar=True,
    callback=callbacks,
)
run.finish()
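As a cross-check, I have been considering logging parameter histograms myself with a small custom callback, bypassing WandbCallback's gradient logging entirely. A rough sketch (untested; ParamHistogramCallback is just a name I made up):

import wandb
from stable_baselines3.common.callbacks import BaseCallback

class ParamHistogramCallback(BaseCallback):
    # Logs a wandb.Histogram of every policy parameter every log_freq calls.
    def __init__(self, log_freq: int, verbose: int = 0):
        super().__init__(verbose)
        self.log_freq = log_freq

    def _on_step(self) -> bool:
        if self.n_calls % self.log_freq == 0:
            hists = {
                f"params/{name}": wandb.Histogram(p.detach().cpu().numpy())
                for name, p in self.model.policy.named_parameters()
            }
            wandb.log(hists, step=self.num_timesteps)
        return True  # returning False would stop training

Adding an instance of this to the CallbackList above should show whether the logged distributions move even when WandbCallback's histograms do not.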