Downloaded CSV does not have all the values (capped to 100k?)

ramil · January 3, 2024, 4:20pm

Hello everyone,

Today I tried to download the CSV representation of a logged metric from one of my runs and found that the number of rows is capped at 100k. The run logged the metric at every step, so I was expecting 300k entries.

Using the Python API also only yields 100k entries.

A few months ago I could access all 300k entries through the API. What has happened in the meantime?

Best,
Ramil

joana-marie · January 5, 2024, 7:59am

Hi @ramil , happy to help.

To investigate further, could you send a link to the run where you are seeing this behavior? You can keep the project private; wandb admins can still access it. You can send it by email to support@wandb.com or post it in this community thread.

Additionally, could you please provide a code snippet showing how you are trying to access all the data through our API?

Thanks.

ramil · January 7, 2024, 1:09pm

Hi @joana-marie ,

thank you for your answer.

You can see this behavior for example for this run: Weights & Biases if you try to download the metrics “step_std”, “step_lambda”, “step_q_mean” under the tab “rollout”. Note that the metrics are called “step_” exactly because they are logged in every step of the environment.

For the API call, I was using the code snippet:

import wandb

api = wandb.Api()
run_ids = {
    '0.8-Agent': 'ramil/Uncertainty_MA/85rp8ana'
}
runs = [api.run(run_id) for run_id in run_ids.values()]
# scan_history iterates over the logged rows of each run
metrics = [run.scan_history(page_size=1000000) for run in runs]
keys = ['_step', 'Charts/episode_step', 'rollout/step_std', 'rollout/step_lambda']
data = [[[row[key] for key in keys] for row in metric] for metric in metrics]

This code worked just a few months ago; now it returns only 100k entries, which seem to be sampled from the full set of entries.
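One rough way to check whether the returned rows are a sample rather than the full history is to look for holes in the returned `_step` values. The sketch below assumes the metric was logged at every consecutive integer step, as in this run; the helper name `count_step_gaps` is made up for illustration and is not part of the wandb API.

```python
def count_step_gaps(steps):
    """Return (rows_returned, steps_missing) for an iterable of integer _step values.

    Assumption: the run logged a row at every consecutive integer step, so any
    hole between the smallest and largest returned step is a row that was dropped.
    """
    ordered = sorted(set(steps))
    if not ordered:
        return 0, 0
    expected = ordered[-1] - ordered[0] + 1
    return len(ordered), expected - len(ordered)
```

With the `data` list from the snippet above, `count_step_gaps(row[0] for row in data[0])` would report the holes for the first run, since `_step` is the first key. A large second number relative to the first suggests the history was downsampled.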

Thank you for your help!
Ramil

joana-marie · January 16, 2024, 6:59am

Hello @ramil, thank you for providing all the information. We will try to reproduce this on our end and get back to you.

mohammadbakir · January 17, 2024, 11:17pm

Hi @ramil, the export limits are hard-coded for backend performance reasons. You will not be able to export the entire run history to a CSV or through run.history via the API.

To access the complete run history, you must download the run's history artifact, which is a parquet file (example for a 500k-step run). You can then use pandas, for example, to read this history.

Extract the artifact from the project above. You can try this out yourself, as it is an open public project:

import wandb
run = wandb.init()
artifact = run.use_artifact('wandb-apac/csv-export-test/run-6knjjh91-history:v0', type='wandb-history')
artifact_dir = artifact.download()

Read it with pandas:

import pandas as pd
df = pd.read_parquet('<path to .parquet file>')
# logic to export to CSV or another format
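As a minimal sketch of the export step, the code below collects every .parquet file under the downloaded artifact directory, concatenates them, and writes a single CSV. The helper names `find_parquet_files` and `history_to_csv` are hypothetical, and the sketch assumes that the artifact may contain more than one parquet file and that `pandas` with a parquet engine such as `pyarrow` is installed.

```python
from pathlib import Path


def find_parquet_files(artifact_dir):
    """Return all .parquet files under artifact_dir, sorted by path."""
    return sorted(Path(artifact_dir).rglob("*.parquet"))


def history_to_csv(artifact_dir, csv_path):
    """Concatenate the history parquet files and write them out as one CSV."""
    files = find_parquet_files(artifact_dir)
    if not files:
        raise FileNotFoundError(f"no .parquet files found under {artifact_dir}")

    # Imported here so that file discovery works without pandas installed.
    import pandas as pd

    df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
    # Assumption: rows carry a _step column; sort by it if it is present.
    if "_step" in df.columns:
        df = df.sort_values("_step")
    df.to_csv(csv_path, index=False)
    return df
```

With the `artifact_dir` returned by `artifact.download()` above, calling `history_to_csv(artifact_dir, "history.csv")` would write the full history to `history.csv` in the current directory.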

Hope this helps, and please let me know if you have any questions.

mohammadbakir · January 22, 2024, 7:44pm

Hi @ramil, since we have not heard back from you, we are going to close this request. If you would like to re-open the conversation, please let us know!
