I need to compare four runs in detail and want to plot the training curves in matplotlib, so I'm interested in getting the exact per-step history for my metric `train/avg_log_like`. These are just floating-point numbers, but fetching the complete history for the first run is already taking about 6 min 39 sec. That's way too slow! It's only 394,999 floating-point numbers; that's not a lot.
This is my code:
```python
test = list(comparison[0].scan_history(keys=["train/avg_log_like", "_step"], page_size=500))
```
How can I speed this up?
Edit: it seems to me this is still subsampled? It's just a bunch of floats, that shouldn't be this expensive. Can I download it as a CSV or something?
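(For context, a self-contained version of the snippet above might look like the sketch below; the entity/project/run IDs are placeholders, and `comparison` is assumed to be a list of runs fetched via wandb's public `Api`.)

```python
import wandb

# Fetch the runs to compare via the public API
# (entity/project/run IDs below are placeholders).
api = wandb.Api()
comparison = [
    api.run("my-entity/my-project/run-id-1"),
    api.run("my-entity/my-project/run-id-2"),
    api.run("my-entity/my-project/run-id-3"),
    api.run("my-entity/my-project/run-id-4"),
]

# scan_history pages through the full, unsampled history;
# restricting `keys` avoids transferring every logged metric.
test = list(
    comparison[0].scan_history(
        keys=["train/avg_log_like", "_step"], page_size=500
    )
)
```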
Hi @leander-kurscheidt, thank you for reaching out.
Regarding the metrics still being subsampled: is `train/avg_log_like` logged with every `wandb.log` call? If this is not the case, it would be expected that some values are missing (`scan_history` only retrieves points that are present for all of the specified metrics). How many points are being retrieved in your case?
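For example, a quick way to check whether rows are being filtered out is to compare the counts with and without `_step` in `keys` (a sketch, assuming `run` is the `wandb.apis.public.Run` object in question):

```python
# Rows where *both* requested keys are present.
both = list(run.scan_history(keys=["train/avg_log_like", "_step"]))
print(f"rows with metric and _step: {len(both)}")

# Rows where only the metric itself is present; a larger count here
# means some logged rows were missing "_step" and got filtered above.
metric_only = list(run.scan_history(keys=["train/avg_log_like"]))
print(f"rows with metric only: {len(metric_only)}")
```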
Following our docs on limits, logging more than 100k points for a single metric is not advised, and slow performance is expected in that case.
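As a possible workaround (a sketch, not an official export path): if a downsampled view is acceptable for plotting, `run.history` returns a pandas DataFrame quickly, and either result can be written to CSV. Again, `run` here is assumed to be a `wandb.apis.public.Run` object:

```python
import pandas as pd

# Fast but downsampled: the server returns at most `samples` points.
df = run.history(keys=["train/avg_log_like"], samples=10_000, pandas=True)
df.to_csv("avg_log_like_downsampled.csv", index=False)

# Full history (slow for ~400k points): collect the scan_history rows.
full = pd.DataFrame(run.scan_history(keys=["train/avg_log_like", "_step"]))
full.to_csv("avg_log_like_full.csv", index=False)
```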
I'd be happy to raise a feature request on your behalf to improve performance when retrieving a metric's entire history via the API. Is there anything you'd like me to add to the feature request about your use case or its urgency, on top of what you've already mentioned?
Hey @leander-kurscheidt, since we have not heard back from you, we are going to close this request. I'll go ahead and make the feature request in the meantime; please let us know if you have any questions.