I am trying to log my TFRecord files to an artifact, but it does not seem to be working (I get the error: “wandb: Network error (TransientError), entering retry loop.”).
I am providing the code I use below. I am fairly sure it is related to the TFRecord files, since I tried changing the contents of my folders to contain only .csv and .parquet files and it worked nicely. Do you have any idea what could be happening here?
import os
import wandb

# final_path_train is the directory holding the processed training files (defined elsewhere).
with wandb.init(project="----", entity='----', job_type='saving_processed_files') as run:
    train_data_art = wandb.Artifact(
        name='train_data',
        type='train_data'
    )
    files_train = os.listdir(final_path_train)
    # Skip hidden files (names starting with a dot).
    files_train = [x for x in files_train if x[0] != '.']
    for file in files_train:
        file_path = os.path.join(final_path_train, file)
        train_data_art.add_file(file_path, name=file)
    run.log_artifact(train_data_art)
We wanted to follow up with you regarding this issue as we have not heard back from you. Please let us know if we can be of further assistance or if your issue has been resolved.
Sorry for not being prompt. I tried again and got the same error. However, I also tried doing this on a small fraction of the data (also saved as TFRecords) and it went through. So I am guessing this has something to do with the size: the total size of my files is around 10 GB. Do you think that could be the issue?
Do you have any feedback regarding the size limit for uploaded files? We are thinking of upgrading our account, and this issue is really important for us.
Could you please take another look at this issue? I left an additional comment a while ago, and in a couple of days we will face this issue again, so I would love to get to the bottom of it.
I’m sorry for not responding here sooner, and I’m sorry to hear you are still facing this issue. I will definitely assist you here. You said you are seeing an error with a lot of data. Could you share a little more information about the structure of this data and the behavior you see? More specifically:
How many files do you have?
Are you seeing a lot of time delay before these errors show up?
Could you try uploading this same scale of information but using some other file format? (like a set of .txt files)
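If it helps with the first question, the file count and total size can be gathered with a small standard-library helper. This is only a sketch: summarize_directory is a hypothetical name, not part of wandb, and it assumes the files sit directly in one folder, as in the upload loop above.

```python
import os


def summarize_directory(path):
    """Return (file_count, total_bytes) for the non-hidden files directly in path."""
    file_count = 0
    total_bytes = 0
    for name in sorted(os.listdir(path)):
        full_path = os.path.join(path, name)
        # Skip hidden entries (same dot filter as the upload loop) and subdirectories.
        if name.startswith(".") or not os.path.isfile(full_path):
            continue
        file_count += 1
        total_bytes += os.path.getsize(full_path)
    return file_count, total_bytes
```

Calling summarize_directory(final_path_train) and dividing the second value by 1e9 gives the total size in GB, which can be reported alongside the number of files.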
Additionally, the debug.log and debug-internal.log files associated with the run where you are facing this issue would be highly appreciated since it would give us some more visibility into this issue.
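In case the log files are hard to locate, here is a rough sketch for listing them. It assumes the default layout in which each run writes its logs to a logs subfolder inside a per-run directory under the local wandb folder; if your setup uses a different directory, adjust the argument. find_debug_logs is a hypothetical helper, not a wandb function.

```python
import glob
import os


def find_debug_logs(wandb_dir="wandb"):
    """List debug*.log files under <wandb_dir>/<run directory>/logs/."""
    # Assumed layout: one subdirectory per run, each with a "logs" folder.
    pattern = os.path.join(wandb_dir, "*", "logs", "debug*.log")
    return sorted(glob.glob(pattern))
```

Running find_debug_logs() from the directory where the script was launched should print paths for both debug.log and debug-internal.log of each run, if that layout applies.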
When I tried to reproduce the issue today, the artifact was saved without any problems, so I guess the issue can be closed. If we experience the same problem again, I will follow the steps above and supply you with the log files.