Downloaded CSV does not have all the values (capped to 100k?)

Hello everyone,

Today I tried to download the CSV representation of a logged metric from one of my runs and found that the number of rows is capped at 100k. The run logged the metric at every step, so I was expecting 300k entries.

Using the Python API also only yields 100k entries.

A few months ago I could access all 300k entries through the API. What has happened in the meantime?

Best,
Ramil

Hi @ramil , happy to help.

To investigate further, could you send a link to the run where you are seeing this behavior? You can keep the project private, since wandb admins can still access it. You can also send it to us by email at support@wandb.com or post it in this community thread.

Additionally, could you please provide a code snippet showing how you are trying to access all the data through our API?

Thanks.

Hi @joana-marie ,

thank you for your answer.

You can see this behavior for example for this run: Weights & Biases if you try to download the metrics “step_std”, “step_lambda”, “step_q_mean” under the tab “rollout”. Note that the metrics are called “step_” exactly because they are logged in every step of the environment.

For the API call, I was using the code snippet:

import wandb

api = wandb.Api()

run_ids = {
    '0.8-Agent': 'ramil/Uncertainty_MA/85rp8ana'
}
runs = [api.run(run_id) for run_id in run_ids.values()]
# scan_history returns an iterator over the logged rows of each run
metrics = [run.scan_history(page_size=1000000) for run in runs]
keys = ['_step', 'Charts/episode_step', 'rollout/step_std', 'rollout/step_lambda']
# for each run, one list of values (in the order of keys) per row
data = [[[row[key] for key in keys] for row in metric] for metric in metrics]

This code was working just a few months ago; now it returns only 100k entries. The entries seem to be sampled from the full set of logged entries.
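One way to check whether the returned rows are sampled is to look for gaps between consecutive `_step` values. The following is only a minimal sketch: the helper name `step_gaps` is hypothetical, and it assumes the metric was logged at every `_step`, so that any gap larger than 1 points to missing rows.

```python
def step_gaps(steps):
    """Return (previous, current) pairs where consecutive steps are not adjacent."""
    ordered = sorted(steps)
    # A difference greater than 1 means at least one step is missing in between.
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > 1]
```

With the snippet above, the `_step` values of the first run would be `[row[0] for row in data[0]]`, since `_step` is the first entry in `keys`. An empty result means no steps are missing between the first and last returned row.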

Thank you for your help!
Ramil

Hello @ramil , thank you for providing all the information. We will try to reproduce this on our end and will get back to you.

Hi @ramil , the export limits are hard-coded for backend performance reasons, so you will not be able to export the entire run history to a CSV or through run.history via the API.

To access the complete run history, you must download the run's history artifact, which is stored as a parquet file (see this example for a 500k-step run). You can then read the history with pandas, for example.

First, download the artifact from the project above. You can try this out yourself, as it is an open, public project:

import wandb
run = wandb.init()
artifact = run.use_artifact('wandb-apac/csv-export-test/run-6knjjh91-history:v0', type='wandb-history')
artifact_dir = artifact.download()

Then read the parquet file with pandas:

import pandas as pd
df = pd.read_parquet('<path to .parquet file>')
# logic to export to CSV or another format
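As a rough sketch of what that export step might look like, the helpers below collect every .parquet file under the downloaded artifact directory and write them out as a single CSV. The function names `find_parquet_files` and `history_to_csv` are hypothetical. The sketch also assumes that the artifact may contain one or more parquet files, that a parquet engine such as pyarrow is installed, and that the history has a `_step` column to sort by.

```python
from pathlib import Path


def find_parquet_files(artifact_dir):
    """Return every .parquet file under artifact_dir, sorted by path."""
    return sorted(Path(artifact_dir).rglob("*.parquet"))


def history_to_csv(artifact_dir, csv_path):
    """Concatenate all parquet files of a history artifact into one CSV file."""
    import pandas as pd  # imported lazily so the path helper needs only the stdlib

    files = find_parquet_files(artifact_dir)
    if not files:
        raise FileNotFoundError(f"no .parquet files found under {artifact_dir}")
    df = pd.concat((pd.read_parquet(f) for f in files), ignore_index=True)
    # Assumption: the history has a "_step" column; sort by it when present.
    if "_step" in df.columns:
        df = df.sort_values("_step")
    df.to_csv(csv_path, index=False)
    return len(df)  # number of rows written
```

For example, `history_to_csv(artifact_dir, 'history.csv')` would use the `artifact_dir` returned by `artifact.download()` above and return the number of rows written, which can be compared against the expected number of logged steps.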

Hope this helps, and please let me know if you have any questions.

Hi @ramil , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!