Uploading and syncing Artifacts is very slow under WSL2: MainThread and HandlerThread hang

Hello everyone, hope you can help me with this issue.

I am very new to the W&B interface and Python library. For now, I am trying to incorporate the dataset versioning and experiment tracking features into my training procedure.

In the dataset versioning section of my code, I am logging the raw dataset and the cleaned ones via the wandb.Artifact → artifact.add_file → wandb.log_artifact workflow, as shown in the documentation.

The problem is that a simple upload and sync takes around 10 minutes to finish! The datasets are small (approx. 2 MB) and I don’t have any connection constraints or issues that I’m aware of.
I’m using a Jupyter Notebook in a WSL2 environment (Ubuntu 20.04 distro).
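For scale: if the entire 10-minute wait were spent transferring the ~2 MB of data, the effective rate would be only about 3.4 KiB/s, far below what an ordinary connection delivers. A quick check of that arithmetic, treating 2 MB as 2 MiB (both figures are the approximate ones from the post):

```python
def effective_throughput(total_bytes: int, seconds: float) -> float:
    """Average transfer rate in bytes per second over the whole wait."""
    return total_bytes / seconds


# Approximate figures from the post: ~2 MB of data, ~10 minutes of waiting
size_bytes = 2 * 1024 * 1024
elapsed_s = 10 * 60
rate = effective_throughput(size_bytes, elapsed_s)
print(f"{rate / 1024:.1f} KiB/s")  # prints "3.4 KiB/s"
```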

The cell output shows "Waiting for W&B process to finish... (success)." the whole time, and the upload/sync progress bar stays stuck for the entire wait.

The debug log for the latest run shows this over and over:

2022-09-05 01:29:54.344 INFO    MainThread:10293 [jupyter.py:save_history():447] not saving jupyter history
2022-09-05 01:29:54.344 INFO    MainThread:10293 [jupyter.py:save_ipynb():377] not saving jupyter notebook
2022-09-05 01:29:54.344 INFO    MainThread:10293 [wandb_init.py:_jupyter_teardown():393] cleaning up jupyter logic
2022-09-05 01:29:54.344 INFO    MainThread:10293 [wandb_run.py:_atexit_cleanup():1931] got exitcode: 0
2022-09-05 01:29:54.344 INFO    MainThread:10293 [wandb_run.py:_restore():1914] restore
2022-09-05 01:29:54.345 INFO    MainThread:10293 [wandb_run.py:_restore():1920] restore done
...
2022-09-05 01:29:57.481 INFO    MainThread:10293 [wandb_run.py:_on_finish():2221] got exit ret: file_counts {
  wandb_count: 5
}
pusher_stats {
  uploaded_bytes: 397
  total_bytes: 4431
}
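The `pusher_stats` counters in the last entry show how far the file pusher had got at that moment: 397 of 4431 bytes, roughly 9%. A small standard-library sketch (the helper name `pusher_progress` is hypothetical) that extracts the two counters from a pasted excerpt; the regex relies only on the `key: value` layout shown above, not on any documented log format.

```python
import re

LOG_EXCERPT = """\
pusher_stats {
  uploaded_bytes: 397
  total_bytes: 4431
}
"""


def pusher_progress(text: str) -> float:
    """Return uploaded_bytes / total_bytes from the last pusher_stats block in text."""
    uploaded = re.findall(r"uploaded_bytes:\s*(\d+)", text)
    total = re.findall(r"total_bytes:\s*(\d+)", text)
    if not uploaded or not total:
        raise ValueError("no pusher_stats counters found")
    done, whole = int(uploaded[-1]), int(total[-1])
    # Treat "nothing to upload" as fully done
    return done / whole if whole else 1.0


print(f"{pusher_progress(LOG_EXCERPT):.1%}")  # prints "9.0%"
```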

And the debug-internal log shows the following:

2022-09-05 01:29:57.789 DEBUG   SenderThread:10324 [sender.py:send_request():316] send_request: poll_exit
2022-09-05 01:29:57.892 DEBUG   HandlerThread:10324 [handler.py:handle_request():141] handle_request: poll_exit
2022-09-05 01:29:57.892 DEBUG   SenderThread:10324 [sender.py:send_request():316] send_request: poll_exit
2022-09-05 01:29:57.993 DEBUG   HandlerThread:10324 [handler.py:handle_request():141] handle_request: poll_exit
2022-09-05 01:29:57.994 DEBUG   SenderThread:10324 [sender.py:send_request():316] send_request: poll_exit

At the end of the cell’s execution, the debug-internal log shows the following (sensitive info omitted):

2022-09-05 01:33:09,176 DEBUG   HandlerThread:10324 [handler.py:handle_request():141] handle_request: poll_exit
2022-09-05 01:33:09,176 DEBUG   SenderThread:10324 [sender.py:send_request():316] send_request: poll_exit
2022-09-05 01:33:10,268 INFO    WriterThread:10324 [datastore.py:close():279] close [...]
2022-09-05 01:33:11,177 INFO    SenderThread:10324 [sender.py:finish():1312] shutting down sender
2022-09-05 01:33:11,177 INFO    SenderThread:10324 [file_pusher.py:finish():171] shutting down file pusher
2022-09-05 01:33:11,177 INFO    SenderThread:10324 [file_pusher.py:join():176] waiting for file pusher
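The timestamps in these logs can quantify how long the client sat in its `poll_exit` loop. The sketch below uses hypothetical helpers, not part of wandb: it reads the timestamp at the start of each line, accepting either `.` or `,` before the milliseconds since both appear in the excerpts above, and returns the seconds between the first `poll_exit` line and the first later `shutting down sender` line.

```python
import re
from datetime import datetime

# Matches e.g. "2022-09-05 01:29:57.789" or "2022-09-05 01:33:09,176" at line start
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})[.,](\d{3})")


def parse_ts(line):
    """Return the line's leading timestamp as a datetime, or None if absent."""
    m = TS_RE.match(line)
    if not m:
        return None
    return datetime.strptime(f"{m.group(1)}.{m.group(2)}", "%Y-%m-%d %H:%M:%S.%f")


def poll_exit_duration(lines, start="poll_exit", end="shutting down sender"):
    """Seconds from the first line containing `start` to the next line containing `end`."""
    t_start = None
    for line in lines:
        ts = parse_ts(line)
        if ts is None:
            continue  # skip continuation lines without a timestamp
        if t_start is None and start in line:
            t_start = ts
        elif t_start is not None and end in line:
            return (ts - t_start).total_seconds()
    return None  # markers not found
```

Applied only to the excerpts above, it reports about 193 s (01:29:57.789 to 01:33:11.177); running it over the full debug-internal.log, which wandb typically writes under wandb/latest-run/logs/, would give the real figure.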

I tried setting WANDB_START_METHOD=thread, as suggested in a GitHub issue, but it didn’t reduce the overall time the cell takes to finish. I logged in through the CLI and the cell recognizes my user.

The raw data are in JSON format, and the cleaned data is a Pandas DataFrame of 30-60 rows, where each row contains an array of around 2000 items (temporal analysis).

Is there something I forgot to set up? Is this the typical time taken to upload files, even small ones? Am I missing something in the code workflow?

Any help would be much appreciated!

Regards

Hi @shogunhirei, sorry you are running into this. Just to clarify, a single ~2MB dataset is taking 10 minutes to upload via Artifacts? Also, how many files are within the Artifact?

We do perform a checksum of every file when uploading an Artifact, which can take some time if the Artifact contains a large number of files.

Can you also let me know what version of wandb you are running? WANDB_START_METHOD=thread is ignored in the latest version of wandb, but I don’t think that is the cause of this.

Thank you,
Nate
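Regarding the checksum step mentioned above: a rough way to see how long hashing could take for these particular files is to digest them locally. This is an assumption-laden stand-in, not W&B’s own code: it computes a base64-encoded MD5 per file, which to my understanding is the digest format W&B artifact manifests record, and `md5_b64` / `time_digests` are hypothetical names. For 7 files totalling about 2 MB it should finish in well under a second, consistent with checksumming mainly mattering for artifacts with very many files.

```python
import base64
import hashlib
import time
from pathlib import Path


def md5_b64(path, chunk_size=1 << 20):
    """Base64-encoded MD5 of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return base64.b64encode(h.digest()).decode("ascii")


def time_digests(paths):
    """Digest every file; return (digests keyed by file name, elapsed seconds)."""
    start = time.perf_counter()
    digests = {Path(p).name: md5_b64(p) for p in paths}
    return digests, time.perf_counter() - start
```

For example, `digests, secs = time_digests(sorted(Path("data").glob("*.json")))`, where `data` is a hypothetical directory holding the dataset files.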

Hi @nathank, thank you for the response.

There are 7 files, together about 2 MB, and I added them all to the same artifact. Is that not recommended?
My version of wandb is 0.13.2.

Looking at the logs I posted, it doesn’t seem that any thread is stuck on a particular process…
Is the fact that I’m using a WSL2 environment a problem? Is there some network configuration I should have set up?

I have also set WANDB_START_METHOD through os.environ.

Thanks in advance

@shogunhirei it’s fine to have the 7 files together in a single Artifact. I was more concerned that this might be a dataset with hundreds of small files, but that doesn’t look like the source of the problem.

I think the most likely cause is the WSL2 environment. Could you try uploading your files to Colab and running this there to compare? This will also test whether the issue is isolated to your network.

Thank you,
Nate
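To complement the Colab comparison, one could also time DNS resolution and a TCP connection to the W&B API host from both WSL2 and Colab. A minimal standard-library sketch; it assumes the default cloud endpoint `api.wandb.ai` on port 443 (a self-hosted server would use a different host), measures connection setup only rather than upload bandwidth, and `time_connect` is a hypothetical helper name.

```python
import socket
import time


def time_connect(host, port=443, timeout=10.0):
    """Return (dns_seconds, connect_seconds) for one TCP connection to host:port."""
    t0 = time.perf_counter()
    # Resolve first so DNS latency is measured separately from the TCP handshake
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    t1 = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1
```

For example, run `dns_s, tcp_s = time_connect("api.wandb.ai")` in both environments; a large gap under WSL2 would point towards its network setup rather than towards wandb itself.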

Hi @shogunhirei, I wanted to follow up and see if you were still looking for help with this?

Thank you,
Nate

Hi @nathank, sorry for not responding.

I still haven’t had the time to check on this, but I’ll do it soon. As soon as I can test in the Colab environment, I’ll post the results here!

Ok, thank you for the update!

Hi @nathank! Sorry for the late reply… (again 😅)

I followed your advice and ran the script in the Colab environment, and it was much faster than in my WSL2 environment. Is there something I could do to fix this issue with WSL2?

Is this info enough to open an issue in the wandb repo?

Kind regards,

Hi everyone, does anyone know a solution for this?

Just want to add that I’m experiencing a similar problem.

This topic was automatically closed 60 days after the last reply. New replies are no longer allowed.