Overall my code is running quite well, but I am hitting two bottlenecks. One is automation-related and probably best dealt with in a separate post. The bigger one is that the runs themselves complete quickly (about half a second per wandb.run), but the actual upload then takes several seconds to minutes at the "wandb: Waiting for W&B process to finish… (success)" message. Is it possible to direct the sweep or wandb.run to skip uploading after every run and instead upload as a batch at the end, or every n runs? That way, I could run a local analysis script at the end of the actual runs while waiting for the upload to finish. I am also not sure, but perhaps combining the tables across multiple runs before uploading might increase the upload speed? Where would I find the JSON table files locally if I wanted to access them this way?
I am using the command-line interface to run 3 instances of the wandb agent in parallel (which I think is the maximum possible), which perhaps is not the smartest way of handling so many runs.
Thank you for reaching out for support.
Weights & Biases is designed to stream logs in real time, which is why it uploads data after every run. However, you can use `WANDB_MODE=offline` to train offline and sync results later. This mode logs data to a local directory during training, and you can then manually sync your runs to the cloud. Here's how you can do it:

```python
import os

# Set the environment variable before calling wandb.init()
os.environ["WANDB_MODE"] = "offline"

# Your training code here
```
After your training is done, you can sync your runs to the cloud with the following command:

```shell
wandb sync /path/to/wandb/offline-run-20210513_161834-1te2f4ji
```
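If you have many offline runs, you can sync them in one batch once everything has finished. As a minimal sketch (assuming the default layout, where each offline run lives in an `offline-run-*` folder inside your wandb directory), you could collect the run directories and build one `wandb sync` command per run:

```python
import glob
import os


def build_sync_commands(wandb_dir="wandb"):
    """Return one `wandb sync` command (as an argument list) per
    offline run directory found under `wandb_dir`."""
    run_dirs = sorted(glob.glob(os.path.join(wandb_dir, "offline-run-*")))
    return [["wandb", "sync", run_dir] for run_dir in run_dirs]
```

You could then execute each command with `subprocess.run(cmd, check=True)` after your analysis script completes; letting the shell expand a glob (e.g. `wandb sync wandb/offline-run-*`) may also work, depending on your wandb version.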
Regarding your question about combining tables across multiple runs to increase upload speed, it's not clear whether this would actually speed up the process. However, you can find the JSON table files locally in the directory specified by the `WANDB_DIR` environment variable. If `WANDB_DIR` is not set, the default is a `wandb` folder in the directory where you launch your script. You can change it like this:

```python
import os

# Set the environment variable before calling wandb.init()
os.environ["WANDB_DIR"] = os.path.abspath("your/directory")
```
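Once you locate a run directory, each logged table is typically saved under its `files/media/table/` subfolder as a `.table.json` file containing `columns` and `data` keys (the exact path layout may vary between wandb versions). A minimal sketch for reading one of these files back into row dictionaries locally:

```python
import json


def load_table_rows(table_json_path):
    """Load a wandb table JSON file ({"columns": [...], "data": [[...], ...]})
    and return it as a list of {column: value} dicts."""
    with open(table_json_path) as f:
        table = json.load(f)
    columns = table["columns"]
    return [dict(zip(columns, row)) for row in table["data"]]
```

This gives you a plain-Python view of each table, which you could concatenate across runs in your local analysis script before (or instead of) uploading.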
Remember that the `wandb` process is separate from your training process, so it should not block your training. If you're experiencing significant delays, it might be due to network issues or because you're logging a large amount of data. If you're logging less than once a second and less than a few megabytes of data at each step, the effect on your training performance should be negligible.
Can you also provide the following information for me, please:

- Can you explain your experiment a bit more?
- What type of data are you logging?
- Can you provide the `debug-internal.log` files? These files are under the local folder `wandb/run-<date>_<time>-<run-id>/logs` in the same directory where you're running your code.
Hi @bkaplowitz ,
Following up from the support team. I just want to check whether you still need more assistance.
Hi @bkaplowitz , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!