I have been using wandb sweeps for a long time, but I am getting a sweep bug now that I have not seen before.

This is my yaml file:

    # Program to run
    program: main_minatar.py

    # project name
    project: jax_meta

    # Method
    method: grid

    # metric to optimize
    metric:
      name: average_reward
      goal: maximize

    # Hyperparameters
    parameters:
      ENV_NAME:
        values: ["Breakout-MinAtar", "SpaceInvaders-MinAtar", "Freeway-MinAtar", "Asterix-MinAtar"]
      ACTIVATIONS:
        values: ["11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
                 "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
                 "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
                 "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
                 "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
                 "04010401_+05", "01010404_+02", "04010401_+02"]
      TOTAL_TIMESTEPS:
        values: [1e7, 2e7]
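For reference, here is a rough sketch of how many distinct runs the config above should produce, assuming a grid sweep simply enumerates the Cartesian product of the listed values. It does not call wandb at all; the variable names (`env_names`, `activations`, `total_timesteps`, `grid`) are only for illustration.

```python
from itertools import product

# Values copied from the sweep config above
env_names = ["Breakout-MinAtar", "SpaceInvaders-MinAtar", "Freeway-MinAtar", "Asterix-MinAtar"]
activations = [
    "11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
    "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
    "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
    "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
    "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
    "04010401_+05", "01010404_+02", "04010401_+02",
]
total_timesteps = [1e7, 2e7]

# Assumption: one run per element of the Cartesian product
grid = list(product(env_names, activations, total_timesteps))

print(len(activations))  # 28 activation strings
print(len(grid))         # 4 * 28 * 2 = 224 distinct configurations
```

So I would expect 224 runs in total, each configuration exactly once, rather than repeats of the first two activation values.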

For some reason, the sweep keeps running only the first two ACTIVATIONS values, "11111111_+00" and "01010101_+00", over and over.

For example, it is now running the combination TOTAL_TIMESTEPS: 1e7, ENV_NAME: Freeway-MinAtar, ACTIVATIONS: "11111111_+00" for the fourth time.

The logging is done here, after training an RL agent:

    data = outs["metrics"]["returned_episode_returns"][0].mean(0).mean(-1).reshape(-1)

    chunk_size = 500  # Or 1000, depending on your preference

    # Calculate number of chunks
    num_chunks = len(data) // chunk_size
    time_per_chunk = args.TOTAL_TIMESTEPS / num_chunks

    # Logging
    for i in range(num_chunks + 1):
        start_idx = i * chunk_size
        end_idx = start_idx + chunk_size
        chunk = data[start_idx:end_idx]

        # Compute summary statistics for the chunk
        mean = np.mean(chunk)
        std = np.std(chunk)
        min_val = np.min(chunk)
        max_val = np.max(chunk)

        # Log summary statistics to wandb
        wandb.log({
            "returns_mean": mean,
            "returns_std": std,
            "global_step": i * time_per_chunk
        })
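As a side check on the loop bounds in the logging code, here is a small standalone sketch of which chunk sizes `range(num_chunks + 1)` produces. It uses a plain Python list instead of the real returns array, and `chunk_sizes` is a hypothetical helper that is not part of my training script.

```python
def chunk_sizes(n_items, chunk_size):
    """Return the length of each slice the logging loop would take."""
    data = list(range(n_items))  # stand-in for the real returns array
    num_chunks = n_items // chunk_size
    sizes = []
    for i in range(num_chunks + 1):
        start_idx = i * chunk_size
        end_idx = start_idx + chunk_size
        sizes.append(len(data[start_idx:end_idx]))
    return sizes


# Length not a multiple of chunk_size: the extra iteration picks up the remainder
print(chunk_sizes(1200, 500))

# Length an exact multiple of chunk_size: the extra iteration yields an empty chunk
print(chunk_sizes(1000, 500))
```

If the last chunk is empty, `np.mean` and `np.std` return `nan` for it (and `np.min` / `np.max` raise an error on an empty array), so this might be worth ruling out, although I do not know whether it is related to the repeated sweep runs.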

Does anyone know what could be the problem? (Ubuntu 22.04, Wandb 0.16.4)

Update: The sweep is still running and keeps cycling through only the first two entries of ACTIVATIONS, seemingly indefinitely.