Download a table logged in a run locally

Hi all,

I have been trying to download and open a wandb table locally. I have managed to get the corresponding table entry and its ID; however, I cannot find a way to download the table and open it as a CSV, for example.

runs[0].summary['avg_results'].keys()
dict_keys(['_type', 'ncols', 'nrows', 'sha256', 'artifact_path', '_latest_artifact_path', 'path', 'size'])

Above is a snippet of what I have managed to reach. How can I go from this point to get the table file and read it as a CSV?

Hi @mohamedr002, thank you for writing in! You could use the API to download the Table in JSON format, which you could then easily convert to a pandas DataFrame. Please see the code snippet below, and feel free to ask more questions:

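import json
import pandas as pd
import wandb
api = wandb.Api()
run_id = "RUN_ID"  # placeholder: the ID of the run that logged the table
run = api.run(f"ENTITY/PROJECT/{run_id}")
table = run.logged_artifacts()[0]  # first logged artifact only
table_dir = table.download()  # returns the local download directory
table_name = "my_table_name"
table_path = f"{table_dir}/{table_name}.table.json"
with open(table_path) as file:
    json_dict = json.load(file)
df = pd.DataFrame(json_dict["data"], columns=json_dict["columns"])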

Please note that logged_artifacts() is an iterator, and for simplicity I added [0] to return only the first entry as an example. Would this work for you?
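
Since the original question asks for a CSV, here is a minimal sketch of that last step using only the standard library. It assumes the downloaded `.table.json` file has top-level "columns" and "data" keys, as in the snippet above; the helper name `table_json_to_csv` is hypothetical and not part of wandb.

```python
import csv
import json


def table_json_to_csv(table_path, csv_path):
    """Convert a downloaded ``*.table.json`` file to a CSV file.

    Assumption: the JSON holds a "columns" list and a "data" list of rows.
    """
    with open(table_path) as f:
        table = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(table["columns"])  # header row
        writer.writerows(table["data"])  # one CSV row per table row
    return csv_path
```

With the variables from the snippet above, this could be called as `table_json_to_csv(table_path, "my_table_name.csv")`. If pandas is already in use, `df.to_csv("my_table_name.csv", index=False)` should give an equivalent file.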


Hi Thanos,

Thank you so much for your clear response. There is only one issue: when I logged the table, I didn’t log it as an artifact; I just logged it to the workspace:

wandb.log(wandb.table)

Will the solution you provided still work, or does it require logging the table as an artifact?

Hi Thanos,

I tried your script, and it worked, but not directly. The issue, as I mentioned, is that I don’t have the artifact name, so I got the table path directly:

avg_table_path = best_run.summary['avg_results']['path']
with open(avg_table_path) as f:  # close the file once it has been parsed
    avg_table = json.load(f)
avg_df = pd.DataFrame(avg_table['data'], columns=avg_table['columns'])
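
The 'path' entry in the summary is typically relative to the run's stored files, so if the file is not already on disk it may need to be downloaded first. Below is a minimal sketch of that step, assuming `run` comes from the public API (for example `wandb.Api().run(...)`) and that `run.file(...).download(...)` saves the file under `root` at the same relative path. The helper name `load_summary_table` is hypothetical.

```python
import json
import os


def load_summary_table(run, key, root="."):
    """Download the table file referenced by run.summary[key] and parse it.

    Assumption: summary[key]["path"] names a file stored with the run.
    """
    rel_path = run.summary[key]["path"]
    # download() returns an open file handle; close it and reopen below
    run.file(rel_path).download(root=root, replace=True).close()
    with open(os.path.join(root, rel_path)) as f:
        return json.load(f)
```

For the run above, `avg_table = load_summary_table(best_run, "avg_results")` would return the same dictionary of 'columns' and 'data' that is passed to `pd.DataFrame`.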

Hi @mohamedr002, logging a wandb.Table object automatically creates an artifact. You can find it by clicking the Artifacts icon (left panel) in your project’s workspace. Another way to get the table directly would be:

table = run.use_artifact("run-<run-id>-<table_name>:<tag>").get("<table_name>")

Please let me know if that works for you, or if you have any further questions.

Hi @mohamedr002, we both posted at the same time. Is this issue now resolved for you by getting the avg_table_path first? May I also ask whether these logged tables were wandb.Table objects? In that case, logging them would also have created an artifact.


Yes, you are right. I found that the table had already been logged as an artifact, but rather than getting the name and directory separately, I used the ‘path’ element that exists in the table artifact dictionary. I am really thankful for your prompt response. Really appreciated!

Hi @mohamedr002 glad to hear that, thanks a lot for posting your workaround for future reference! I am closing this ticket for now, but please feel free to reach out to us if you have any other questions!


Hey, I am trying to use table = run.use_artifact("run-<run-id>-<table_name>:<tag>").get("<table_name>")
Can you help with how run should be initialised?

What should I pass as arguments here?
run = wandb.Api().artifact()

Hi @satpalsr, there are two ways to access this artifact: either by initialising a run, such as run = wandb.init(), and then using the run.use_artifact method, or by using our public API. In the latter case, you could do the following:

api = wandb.Api()
artifact = api.artifact('entity/project/artifact-name:alias')

I hope this helps!
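
Putting the two replies together, a rough sketch of fetching a logged table through the public API might look like the following. It assumes the artifact follows the `run-<run-id>-<table_name>:<tag>` naming pattern quoted earlier in this thread and that a "latest" alias exists. Both helper names are hypothetical, and the second function needs wandb installed and a logged-in session.

```python
def run_table_artifact_name(entity, project, run_id, table_name, alias="latest"):
    """Build 'entity/project/run-<run_id>-<table_name>:<alias>'."""
    return f"{entity}/{project}/run-{run_id}-{table_name}:{alias}"


def fetch_logged_table(entity, project, run_id, table_name, alias="latest"):
    """Fetch a logged wandb.Table via the public API (no wandb.init needed)."""
    import wandb  # imported lazily so the name helper works without wandb

    api = wandb.Api()
    artifact = api.artifact(
        run_table_artifact_name(entity, project, run_id, table_name, alias)
    )
    return artifact.get(table_name)  # the wandb.Table stored in the artifact
```

The returned table can then be converted with `get_dataframe()`, for example `fetch_logged_table("entity", "project", "RUN_ID", "avg_results").get_dataframe().to_csv("avg_results.csv", index=False)`, where the entity, project and run ID are placeholders.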
