Why does run.scan_history() still return lots of NaN values?

I tried to get the observations and actions I logged when running RL in gym, but I still get a lot of NaNs even though I switched from run.history() to run.scan_history() (I learned about this from the thread "Run.history() returns different values on almost each call", reply #2 by jaeheelee). I thought scan_history would return all the logged values. Am I wrong?

Here is an example

Hi @xjygr08, apologies that you are running into this! Could you send me a link to the workspace where you’ve stored your values, as well as a snippet of the script showing how you log those values to wandb?

This is the workspace: Weights & Biases

This is how I logged the values:

import gymnasium as gym
from gymnasium.wrappers.record_episode_statistics import RecordEpisodeStatistics
import wandb

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

wandb.login()
config = {
    "env_name": "CartPole-v1",
    "deque_size": 1000,
}


def make_env():
    env = gym.make(config["env_name"], render_mode="rgb_array")
    env = RecordEpisodeStatistics(env, config["deque_size"])
    return env


env = DummyVecEnv([make_env])

model = PPO.load("ppo_cartpole")
run = wandb.init(
    project="log-obs-action",
)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    print(f"action={action} obs={obs} rewards={rewards} dones={dones} info={info}")
    wandb.log({"obs": obs[0][0]})
    wandb.log({"action": action[0]})
    wandb.log(info[0])

    if dones[0]:
        break
env.close()

Hi Jinyu,

run.scan_history() does return every single logged value you have. I think I found where your issue is coming from.

Inside of your code you call:
wandb.log({"obs": obs[0][0]})
wandb.log({"action": action[0]})
wandb.log(info[0])

back to back. Every call to wandb.log without an explicit step argument is treated as a new step in your experiment.

So for a single iteration of your while True loop, you are taking three steps in wandb, and each step records a different key. Every step therefore holds a value for only one key, and the other keys show up as NaN at that step. That is why your obs and action values are logged 3 steps apart:

obs is recorded at steps 0, 3, 6, 9, …
action is recorded at steps 1, 4, 7, 10, …
the keys from info[0] are recorded at steps 2, 5, 8, 11, …
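The staggering can be reproduced without W&B using a small toy model. This is only a sketch: ToyRun is a hypothetical stand-in, not wandb's implementation, and it assumes (as described above) that every log call without a step argument is assigned the next step number. The key name info_key is a placeholder for whatever info[0] contains.

```python
class ToyRun:
    """Toy stand-in for a run's history (hypothetical, NOT wandb's implementation)."""

    def __init__(self):
        self.rows = {}        # step number -> dict of values logged at that step
        self._next_step = 0

    def log(self, data):
        # Assumption: each call without an explicit step starts a new step.
        self.rows[self._next_step] = dict(data)
        self._next_step += 1

    def column(self, key):
        # One entry per step; None marks "nothing logged for this key here",
        # which is what shows up as NaN once the history becomes a table.
        return [self.rows[step].get(key) for step in sorted(self.rows)]


run = ToyRun()
for i in range(2):                   # two loop iterations
    run.log({"obs": 0.1 * i})        # steps 0 and 3
    run.log({"action": i})           # steps 1 and 4
    run.log({"info_key": True})      # steps 2 and 5

print(run.column("obs"))     # [0.0, None, None, 0.1, None, None]
print(run.column("action"))  # [None, 0, None, None, 1, None]
```

Two loop iterations produce six steps, and each column is empty in two out of every three of them.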

To fix this, you can either log everything in a single wandb.log call, like this:
wandb.log({"obs": obs[0][0], "action": action[0], "info": info[0]})

or pass the same step explicitly to every wandb.log call:

counter = 0  # initialize once, before the while loop

# inside the loop:
wandb.log({"obs": obs[0][0]}, step=counter)
wandb.log({"action": action[0]}, step=counter)
wandb.log(info[0], step=counter)
counter += 1  # advance once per loop iteration
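For a run that was already logged with the staggered pattern, the rows can also be stitched back together after the fact. The sketch below is one possible approach, not an official W&B utility: merge_staggered_rows is a hypothetical helper, and it assumes exactly three wandb.log calls per loop iteration starting at step 0, and that each history row is a dict carrying a "_step" entry. The sample rows and the key name info_key are made up for illustration.

```python
import math


def merge_staggered_rows(rows, calls_per_iteration=3):
    """Merge rows from consecutive steps into one row per loop iteration."""
    merged = {}
    for row in rows:
        # Steps 0-2 belong to iteration 0, steps 3-5 to iteration 1, and so on.
        iteration = row["_step"] // calls_per_iteration
        target = merged.setdefault(iteration, {"iteration": iteration})
        for key, value in row.items():
            if key.startswith("_"):
                continue  # skip bookkeeping fields such as _step
            if value is None or (isinstance(value, float) and math.isnan(value)):
                continue  # skip empty cells
            target[key] = value
    return [merged[i] for i in sorted(merged)]


# Made-up stand-in for the rows of a staggered run.
rows = [
    {"_step": 0, "obs": 0.01},
    {"_step": 1, "action": 1},
    {"_step": 2, "info_key": False},
    {"_step": 3, "obs": 0.02},
    {"_step": 4, "action": 0},
    {"_step": 5, "info_key": False},
]
merged = merge_staggered_rows(rows)
for row in merged:
    print(row)
```

With a real run, rows would come from something like wandb.Api().run("<entity>/<project>/<run_id>").scan_history(), where the path is a placeholder for your own run. Logging correctly in the first place, as described above, is still the cleaner option.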

Aha, I see, this makes sense. Thank you!

No problem! I will close this ticket out, you are always welcome to write back in!
