Why run.scan_history() still returns lots of NaN values

xjygr08 · August 17, 2023, 11:08pm

I am trying to retrieve the observations and actions I logged while running RL in gym, but I still get a lot of NaNs even though I switched from run.history() to run.scan_history() (I learned about this from Run.history() returns different values on almost each call - #2 by jaeheelee). I thought scan_history() would return all the logged values. Am I wrong?

Here is an example

artsiom · August 22, 2023, 7:08pm

Hi @xjygr08, apologies you are running into this! Could you send me a link to the workspace where you’ve stored your values, as well as a script snippet showing how you are logging those values to wandb?

xjygr08 · August 22, 2023, 8:25pm

Weights & Biases This is the workspace.

This is how I logged the values:

import gymnasium as gym
from gymnasium.wrappers.record_episode_statistics import RecordEpisodeStatistics
import wandb

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv

wandb.login()
config = {
    "env_name": "CartPole-v1",
    "deque_size": 1000,
}


def make_env():
    env = gym.make(config["env_name"], render_mode="rgb_array")
    env = RecordEpisodeStatistics(env, config["deque_size"])
    return env


env = DummyVecEnv([make_env])

model = PPO.load("ppo_cartpole")
run = wandb.init(
    project="log-obs-action",
)

obs = env.reset()
while True:
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
    print(f"action={action} obs={obs} rewards={rewards} dones={dones} info={info}")
    wandb.log({"obs": obs[0][0]})
    wandb.log({"action": action[0]})
    wandb.log(info[0])

    if dones[0]:
        break
env.close()

artsiom · August 28, 2023, 7:49pm

Hi Jinyu,

run.scan_history() does return every single value you logged (run.history() returns a sampled subset by default). I think I found where your issue is coming from.

Inside of your code you call:
wandb.log({"obs": obs[0][0]})
wandb.log({"action": action[0]})
wandb.log(info[0])

back to back. Every time you call wandb.log, it is counted as a new step in your experiment.

So for a single iteration of your while True loop, you are taking three steps in wandb, and each step records different keys. That is why your obs and action values are logged 3 steps apart:

Obs is recorded at steps 0, 3, 6, 9…
Action is recorded at steps 1, 4, 7, 10…

At every step where a key was not logged, the history shows NaN for that key.
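
To make the pattern above concrete, here is a minimal sketch in plain Python. ToyHistory is a hypothetical stand-in written for illustration, not wandb's implementation; it only assumes what the reply describes, namely that each log call creates a new step and that keys missing at a step show up as NaN.

```python
# Toy model of a step-indexed history; this is NOT wandb's implementation.
class ToyHistory:
    def __init__(self):
        self.rows = []  # one dict per step, in the order the calls were made

    def log(self, data):
        # Each call creates a new row (a new step), as described above.
        self.rows.append(dict(data))

    def table(self, keys):
        # Keys that were not logged at a step are filled with NaN.
        nan = float("nan")
        return [[row.get(k, nan) for k in keys] for row in self.rows]


history = ToyHistory()
for t in range(2):  # two iterations of the loop
    history.log({"obs": 0.1 * t})
    history.log({"action": t % 2})
    history.log({"reward": 1.0})

# Print one line per step: obs only appears at steps 0 and 3,
# action at steps 1 and 4, reward at steps 2 and 5.
for step, row in enumerate(history.table(["obs", "action", "reward"])):
    print(step, row)
```

Two loop iterations produce six rows, and each row has a value in only one of the three columns.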

To fix this, you can either log everything in a single wandb.log call, like this:
wandb.log({"obs": obs[0][0], "action": action[0], "info": info[0]})

or pass the same step explicitly to each wandb.log() call:

# counter: an integer that you increment once per iteration of your loop
wandb.log({"obs": obs[0][0]}, step=counter)
wandb.log({"action": action[0]}, step=counter)
wandb.log(info[0], step=counter)
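
The effect of sharing a step can be sketched with another toy model. ToyStepHistory is hypothetical and only mimics the idea that values logged at the same step end up in the same row; it does not reproduce wandb's internals.

```python
# Toy model only: mimics merging by step, it is not wandb's implementation.
class ToyStepHistory:
    def __init__(self):
        self.rows = {}  # step -> merged dict of everything logged at that step
        self._next_step = 0

    def log(self, data, step=None):
        if step is None:
            # No explicit step: every call lands on a fresh step.
            step = self._next_step
        self.rows.setdefault(step, {}).update(data)
        self._next_step = max(self._next_step, step + 1)


separate = ToyStepHistory()
shared = ToyStepHistory()
for counter in range(3):
    obs, action = 0.1 * counter, counter % 2
    # Two calls without a step: two sparse rows per iteration.
    separate.log({"obs": obs})
    separate.log({"action": action})
    # The same two calls with a shared step: one complete row per iteration.
    shared.log({"obs": obs}, step=counter)
    shared.log({"action": action}, step=counter)

print(len(separate.rows), len(shared.rows))
```

In this sketch the shared-step history ends up with one row per iteration, each containing both obs and action, so there is nothing left to fill with NaN.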

xjygr08 · August 28, 2023, 10:18pm

Aha, I see, this makes sense. Thank you!

artsiom · August 28, 2023, 10:29pm

No problem! I will close this ticket out, you are always welcome to write back in!

system · October 27, 2023, 10:18pm

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.
