Wandb sweep not working

jkooi23 · March 17, 2024, 2:18pm

I have been using wandb sweeps for a long time, but I am getting a sweep bug now that I have not seen before.
This is my yaml file:

# Program to run
program: main_minatar.py

# project name
project: jax_meta

# Method
method: grid

# metric to optimize
metric:
  name: average_reward
  goal: maximize

# Hyperparameters
parameters:
  ENV_NAME:
    values: ["Breakout-MinAtar", "SpaceInvaders-MinAtar", "Freeway-MinAtar", "Asterix-MinAtar"]
  ACTIVATIONS:
    values: ["11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
             "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
             "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
             "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
             "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
             "04010401_+05", "01010404_+02", "04010401_+02"]
  TOTAL_TIMESTEPS:
    values: [1e7, 2e7]

For some reason, it keeps sweeping the first 2 activation values: “11111111_+00”, “01010101_+00”, all the time.
For example, it is now running TOTAL_TIMESTEPS: 1e7, Freeway-MinAtar, “11111111_+00” for the 4th time already.
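
For reference, a grid sweep over this config should visit each combination exactly once, so no (ENV_NAME, ACTIVATIONS, TOTAL_TIMESTEPS) triple should appear twice. The following is a minimal sketch in plain Python, not wandb's own implementation, that expands the grid and counts repeats in a list of run configs. `expand_grid` and `find_repeats` are hypothetical helper names, and the order in which wandb actually visits the grid is not assumed here.

```python
from collections import Counter
from itertools import product

# Parameter grid copied from the sweep config above.
grid = {
    "ENV_NAME": ["Breakout-MinAtar", "SpaceInvaders-MinAtar",
                 "Freeway-MinAtar", "Asterix-MinAtar"],
    "ACTIVATIONS": [
        "11111111_+00", "01010101_+00", "02020202_+00", "03030303_+00", "04040404_+00",
        "14141414_+00", "04040404_+02", "14141414_+02", "04040404_+05", "14141414_+05",
        "04040404_+10", "14141414_+10", "04040404_-02", "14141414_-02", "01040104_+05",
        "01140114_+05", "01040101_+05", "01010104_+05", "01040104_+02", "01140114_+02",
        "01040101_+02", "01010104_+02", "14141414_+15", "14141414_-10", "01010404_+05",
        "04010401_+05", "01010404_+02", "04010401_+02",
    ],
    "TOTAL_TIMESTEPS": [1e7, 2e7],
}


def expand_grid(grid):
    """Return every combination of parameter values as a list of dicts."""
    names = list(grid)
    return [dict(zip(names, combo)) for combo in product(*(grid[n] for n in names))]


def find_repeats(run_configs):
    """Return {config_key: count} for configs that occur more than once."""
    counts = Counter(tuple(sorted(cfg.items())) for cfg in run_configs)
    return {key: n for key, n in counts.items() if n > 1}


combos = expand_grid(grid)
print(len(combos))  # 4 environments * 28 activations * 2 timestep settings = 224
```

Passing the configs of the runs a sweep has already launched to `find_repeats` would show which combinations were started more than once; for a correct grid search the result should be empty.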

The logging is done here after training a RL agent:

data = outs["metrics"]["returned_episode_returns"][0].mean(0).mean(-1).reshape(-1)
chunk_size = 500  # Or 1000, depending on your preference

# Calculate number of chunks
num_chunks = len(data) // chunk_size
time_per_chunk = args.TOTAL_TIMESTEPS / num_chunks

# Logging
for i in range(num_chunks + 1):
    start_idx = i * chunk_size
    end_idx = start_idx + chunk_size
    chunk = data[start_idx:end_idx]
    # Compute summary statistics for the chunk
    mean = np.mean(chunk)
    std = np.std(chunk)
    min_val = np.min(chunk)
    max_val = np.max(chunk)

    # Log summary statistics to wandb
    wandb.log({
        "returns_mean": mean,
        "returns_std": std,
        "global_step": i * time_per_chunk
    })
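
A side note on the logging loop above, separate from the repeated-run problem: `range(num_chunks + 1)` adds one trailing iteration for the remainder. When `len(data)` is an exact multiple of `chunk_size`, that last slice is empty, and `np.min` raises a `ValueError` on an empty array. A minimal sketch of chunking that never produces an empty chunk is shown below. It uses the standard library's `statistics` module instead of NumPy so that it runs on plain lists, and `chunk_stats` is a hypothetical helper name.

```python
from statistics import fmean, pstdev


def chunk_stats(data, chunk_size):
    """Yield (start_index, mean, std, min, max) for each non-empty chunk.

    Stepping through range(0, len(data), chunk_size) includes a shorter
    final chunk when there is a remainder and never yields an empty one.
    pstdev is the population standard deviation, matching np.std's default.
    """
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        yield start, fmean(chunk), pstdev(chunk), min(chunk), max(chunk)


# Example with synthetic data: 1200 points in chunks of 500
# gives chunks of 500, 500, and 200 points.
example = [float(x) for x in range(1200)]
for start, mean, std, lo, hi in chunk_stats(example, 500):
    print(start, mean, lo, hi)
```

The yielded start index can be scaled to a step value before being passed to `wandb.log`, in the same way the original loop uses `i * time_per_chunk`.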

Does anyone know what could be the problem? (Ubuntu 22.04, Wandb 0.16.4)

Update: It is still running, and it keeps sweeping only the first two entries of ACTIVATIONS over and over.

nathank · March 19, 2024, 9:43pm

Hi @jkooi23, could you possibly send me a link to the sweep and I’ll take a look?

That certainly looks like unexpected behavior since you are using a grid search.

Thank you,
Nate

jkooi23 · March 20, 2024, 7:54am

Hi @nathank,

Thanks for taking the time. I deleted that old sweep but just created a new one to recreate the problems: It is still running certain activations over and over.

Please let me know if you need more information.

nathank · March 28, 2024, 6:15pm

Hi @jkooi23, sorry for the delay on this. I’ve gone ahead and reported this to the engineering team, since your sweep config looks correct and shouldn’t be sweeping over only those first 3 activation values. I’ll follow up once I have an update from the team.

claushofmann · April 18, 2024, 7:54am

I think I’m having the same issue when running a sweep via W&B. For me, W&B starts the same run multiple times. The repeats are then also logged to the same run on W&B, so the logs are a mix of the last and the current run. See this screenshot, where the relative wall time is suddenly negative.
Screenshot from 2024-04-18 09-47-54
I used W&B 0.16.3

mohammadbakir · June 5, 2024, 9:04pm

Hi @claushofmann, I wanted to inform you that this is now fixed and will be released with the next version of our SDK. If you still encounter any issues, please let us know.
