Wandb sweep not working

I have been using wandb sweeps for a long time, but I am getting a sweep bug now that I have not seen before.
This is my YAML file:

    # Program to run
    program: main_minatar.py

    # project name
    project: jax_meta

    # Method
    method: grid

    # metric to optimize
    metric:
      name: average_reward
      goal: maximize

    # Hyperparameters
    parameters:
      ENV_NAME:
        values: ["Breakout-MinAtar", "SpaceInvaders-MinAtar", "Freeway-MinAtar", "Asterix-MinAtar"]
      ACTIVATIONS:
        values: ["11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
                 "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
                 "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
                 "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
                 "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
                 "04010401_+05", "01010404_+02", "04010401_+02"]
      TOTAL_TIMESTEPS:
        values: [1e7, 2e7]
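
For reference, a grid sweep is expected to visit every combination of the listed values once. The sketch below is not part of the original setup; it just enumerates the Cartesian product of the three parameters from the config above to show how many distinct runs the sweep should produce. The variable names are illustrative.

```python
from itertools import product

# Values copied from the sweep config above.
env_names = ["Breakout-MinAtar", "SpaceInvaders-MinAtar", "Freeway-MinAtar", "Asterix-MinAtar"]
activations = [
    "11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
    "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
    "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
    "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
    "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
    "04010401_+05", "01010404_+02", "04010401_+02",
]
total_timesteps = [1e7, 2e7]

# Every (env, activation, timesteps) combination a grid search should cover.
grid = list(product(env_names, activations, total_timesteps))

print(len(grid))                    # 4 * 28 * 2 combinations
print(len(set(grid)) == len(grid))  # True if no combination is listed twice
```

Any combination that shows up more than once in the sweep's run table is therefore a repeat, not something the grid itself asks for.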

For some reason, it keeps running only the first two activation values, “11111111_+00” and “01010101_+00”, over and over.
For example, it is now running TOTAL_TIMESTEPS: 1e7, Freeway-MinAtar, “11111111_+00” for the fourth time.
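
One way to quantify the repeats is to count how often each swept combination appears among the runs. The sketch below is a minimal, hypothetical helper (`find_repeated_configs` is not a W&B function): it takes a list of run config dicts and reports combinations that occur more than once. It assumes the configs have already been collected, for instance from `run.config` of each run returned by the public API (`wandb.Api().sweep(...).runs`); the example data is made up for illustration and is not taken from the actual sweep.

```python
from collections import Counter

# The parameters that the sweep varies.
SWEPT_KEYS = ("ENV_NAME", "ACTIVATIONS", "TOTAL_TIMESTEPS")


def find_repeated_configs(run_configs, keys=SWEPT_KEYS):
    """Return {combination: count} for combinations that appear more than once."""
    counts = Counter(tuple(cfg.get(k) for k in keys) for cfg in run_configs)
    return {combo: n for combo, n in counts.items() if n > 1}


# Hypothetical example data, not taken from the actual sweep.
example_configs = [
    {"ENV_NAME": "Freeway-MinAtar", "ACTIVATIONS": "11111111_+00", "TOTAL_TIMESTEPS": 1e7},
    {"ENV_NAME": "Freeway-MinAtar", "ACTIVATIONS": "11111111_+00", "TOTAL_TIMESTEPS": 1e7},
    {"ENV_NAME": "Freeway-MinAtar", "ACTIVATIONS": "01010101_+00", "TOTAL_TIMESTEPS": 1e7},
]

print(find_repeated_configs(example_configs))
```

For a healthy grid sweep the returned dict should be empty.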

The logging is done here after training an RL agent:

    data = outs["metrics"]["returned_episode_returns"][0].mean(0).mean(-1).reshape(-1)
    chunk_size = 500  # Or 1000, depending on your preference

    # Calculate number of chunks
    num_chunks = len(data) // chunk_size
    time_per_chunk = args.TOTAL_TIMESTEPS / num_chunks

    # Logging
    for i in range(num_chunks + 1):
        start_idx = i * chunk_size
        end_idx = start_idx + chunk_size
        chunk = data[start_idx:end_idx]

        # Compute summary statistics for the chunk
        mean = np.mean(chunk)
        std = np.std(chunk)
        min_val = np.min(chunk)
        max_val = np.max(chunk)

        # Log summary statistics to wandb
        wandb.log({
            "returns_mean": mean,
            "returns_std": std,
            "global_step": i * time_per_chunk
        })
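
A side note on the logging loop above: `range(num_chunks + 1)` produces an empty final chunk whenever `len(data)` is an exact multiple of `chunk_size`, and NumPy's `np.min` raises a `ValueError` on an empty array; `num_chunks` is also 0 (a division by zero) when `data` has fewer than `chunk_size` entries. Whether this has anything to do with the repeated runs is not established in this thread. The sketch below is one possible way to guard against both cases, written in plain Python so it runs without NumPy or W&B. `chunk_summaries` is a hypothetical helper, and it deliberately differs from the original in using ceiling division for the chunk count, which changes `time_per_chunk` slightly.

```python
import math
import statistics


def chunk_summaries(data, chunk_size, total_timesteps):
    """Split data into chunks and return one summary dict per non-empty chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division: a trailing partial chunk is counted, an empty one never is.
    num_chunks = math.ceil(len(data) / chunk_size)
    if num_chunks == 0:
        return []
    time_per_chunk = total_timesteps / num_chunks
    summaries = []
    for i in range(num_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        summaries.append({
            "returns_mean": statistics.fmean(chunk),
            # Population standard deviation, matching np.std's default (ddof=0).
            "returns_std": statistics.pstdev(chunk),
            "global_step": i * time_per_chunk,
        })
    return summaries


# Illustrative data only; in the training script each dict could be passed to wandb.log.
for summary in chunk_summaries([1.0, 2.0, 3.0, 4.0, 5.0], chunk_size=2, total_timesteps=90):
    print(summary)
```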

Does anyone know what could be the problem? (Ubuntu 22.04, Wandb 0.16.4)

Update: It is still running, sweeping only the first two entries of ACTIVATIONS over and over.

Hi @jkooi23, could you possibly send me a link to the sweep and I’ll take a look?

That certainly looks like unexpected behavior since you are using a grid search.

Thank you,
Nate

Hi @nathank,

Thanks for taking the time. I deleted that old sweep but just created a new one to reproduce the problem: it is still running certain activations over and over.

Please let me know if you need more information.

Hi @jkooi23, sorry for the delay on this. I’ve gone ahead and reported this to the engineering team, since your sweep config looks correct and shouldn’t be sweeping over only those first 3 activation values. I’ll follow up once I have an update from the team.

I think I’m having the same issue when running a sweep via W&B. For me, W&B starts the same run multiple times. The repeats are then also logged to the same run on W&B, so the logs are a mix of the previous and the current run. See this screenshot, where the relative wall time suddenly becomes negative.
Screenshot from 2024-04-18 09-47-54
I used W&B 0.16.3.