I tried to retrieve the observations and actions I logged while running RL in gym, but I still get a lot of NaNs even though I switched from run.history() to run.scan_history() (I learned this from the thread "Run.history() returns different values on almost each call - #2 by jaeheelee"). I thought scan_history would return all the logged values. Am I wrong?
Here is an example
Hi @xjygr08, apologies that you are running into this! Could you send me a link to the workspace where you've stored your values, as well as a script snippet showing how you are logging those values to wandb?
This is the workspace: Weights & Biases
This is how I logged the values:
import gymnasium as gym
from gymnasium.wrappers.record_episode_statistics import RecordEpisodeStatistics
import wandb
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

wandb.login()

config = {
    "env_name": "CartPole-v1",
    "deque_size": 1000,
}

def make_env():
    env = gym.make(config["env_name"], render_mode="rgb_array")
    env = RecordEpisodeStatistics(env, config["deque_size"])
    return env

env = DummyVecEnv([make_env])
model = PPO.load("ppo_cartpole")
run = wandb.init(
    project="log-obs-action",
)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    print(f"action={action} obs={obs} rewards={rewards} dones={dones} info={info}")
    wandb.log({"obs": obs[0][0]})
    wandb.log({"action": action[0]})
    wandb.log(info[0])
    if dones[0]:
        break
env.close()
Hi Jinyu,
run.scan_history() does return every single logged value you have. I think I found where your issue is coming from.
Inside of your code you call:
wandb.log({"obs": obs[0][0]})
wandb.log({"action": action[0]})
wandb.log(info[0])
back to back. Every call to wandb.log is treated as a new step in your experiment.
So in this case, for a single iteration of your while True loop, you are taking three steps in wandb, and each step records a different key. That is why your obs and action values are logged 3 steps apart:

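Obs is recorded at steps 0, 3, 6, 9, …
Action is recorded at steps 1, 4, 7, 10, …
Every step therefore holds only one of these keys, and the keys missing at that step show up as NaN in the history.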
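To make the step pattern concrete, here is a small toy model in plain Python. It is only a sketch of the behaviour described above, not wandb's actual implementation: ToyHistory, its log method, and the episode_len key are made up for illustration, and missing values appear as None where a DataFrame would show NaN.

```python
# Toy model of step-indexed logging. This is NOT wandb's implementation;
# ToyHistory and the "episode_len" key are hypothetical.
class ToyHistory:
    def __init__(self):
        self.rows = {}       # step -> dict of keys logged at that step
        self._next_step = 0  # step used when the caller passes none

    def log(self, data, step=None):
        # Without an explicit step, every call lands on a fresh step.
        if step is None:
            step = self._next_step
        self.rows.setdefault(step, {}).update(data)
        self._next_step = step + 1

    def column(self, key):
        # Keys that were not logged at a step come back as None.
        return [self.rows[s].get(key) for s in sorted(self.rows)]


separate = ToyHistory()
combined = ToyHistory()
for t in range(3):
    obs, action = 0.1 * t, t % 2
    # Three calls per iteration, as in the original script.
    separate.log({"obs": obs})
    separate.log({"action": action})
    separate.log({"episode_len": t})
    # One call per iteration.
    combined.log({"obs": obs, "action": action, "episode_len": t})

print(separate.column("action"))
print(combined.column("action"))
```

In this toy model the three-calls-per-iteration version spreads three iterations over nine steps, while the single-call version produces one complete row per iteration.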
To fix this, you can either log all of your values in a single wandb.log call, like this:
wandb.log({"obs": obs[0][0], "action": action[0], "info": info[0]})
or pass the same step explicitly to each wandb.log() call:
counter = <counter that counts your step>
wandb.log({"obs": obs[0][0]}, step=counter)
wandb.log({"action": action[0]}, step=counter)
wandb.log(info[0], step=counter)
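Both suggestions can be combined into one helper that builds a single row per loop iteration and logs it with an explicit step. The sketch below is only an illustration: make_row, log_rollout, and the fake recorder are hypothetical names, the sample transitions are made up, and only plain numeric info entries are kept. The logging function is passed in, so the example runs without a wandb account.

```python
def make_row(obs, action, info):
    """Merge one iteration's values into a single flat dict."""
    row = {"obs": float(obs[0][0]), "action": int(action[0])}
    # Keep only plain numeric info entries; nested values are skipped
    # in this sketch.
    for key, value in info[0].items():
        if isinstance(value, (int, float)):
            row[key] = value
    return row


def log_rollout(transitions, log_fn):
    """Log exactly one row per iteration, each with an explicit step."""
    for step, (obs, action, info) in enumerate(transitions):
        log_fn(make_row(obs, action, info), step=step)


# Made-up transitions shaped like the vectorized env output above.
transitions = [
    ([[0.02, 0.1, 0.0, 0.0]], [1], [{}]),
    ([[0.03, 0.2, 0.0, 0.0]], [0], [{"episode_len": 2}]),
]

recorded = []  # stand-in for the logging backend


def fake_log(row, step=None):
    recorded.append((step, row))


log_rollout(transitions, fake_log)
print(recorded)
```

In a real script, passing wandb.log as log_fn should behave the same way, since wandb.log accepts a step keyword argument.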
Aha, I see, this makes sense. Thank you!
No problem! I will close this ticket out, but you are always welcome to write back in!