Hi all,
I have been trying to download a W&B table and open it locally. I have managed to find the corresponding table and its id; however, I cannot find a way to download the table and open it as a CSV, for example.
```
runs[0].summary['avg_results'].keys()
dict_keys(['_type', 'ncols', 'nrows', 'sha256', 'artifact_path', '_latest_artifact_path', 'path', 'size'])
```
Above is a snippet of what I have managed to reach. How can I go from this point to get the table file and read it as a CSV?
Hi @mohamedr002 thank you for writing in! You could use the API to download the Table in JSON format, which you could then easily convert to a pandas DataFrame. Please see the code snippet below, and feel free to ask more questions:
import json
import pandas as pd
import wandb

api = wandb.Api()
run = api.run(f"ENTITY/PROJECT/{run_id}")  # run_id: ID of the run that logged the table
table = run.logged_artifacts()[0]  # first logged artifact, as an example
table_dir = table.download()  # returns the local download directory
table_name = "my_table_name"
table_path = f"{table_dir}/{table_name}.table.json"
with open(table_path) as file:
    json_dict = json.load(file)
df = pd.DataFrame(json_dict["data"], columns=json_dict["columns"])
Please note that logged_artifacts() is an iterator, and for simplicity I added [0] to return only the first entry as an example. Would this work for you?
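Since the original question asked for a CSV, here is a minimal sketch of converting the downloaded `.table.json` file using only the standard library. It assumes the file has the same `"columns"` and `"data"` keys used in the snippet above; the function name `table_json_to_csv` and the file paths are hypothetical.

```python
import csv
import json


def table_json_to_csv(table_json_path, csv_path):
    """Write the columns and rows of a downloaded *.table.json file to CSV.

    Returns the number of data rows written.
    """
    with open(table_json_path) as f:
        table = json.load(f)

    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(table["columns"])  # header row
        writer.writerows(table["data"])    # one CSV row per table row
    return len(table["data"])


# Example usage (paths are placeholders):
# table_json_to_csv("artifacts/my_table_name.table.json", "my_table_name.csv")
```

If you already have the DataFrame `df` from the snippet above, `df.to_csv("my_table_name.csv", index=False)` gives the same result. Cells that hold media objects rather than plain values would be written as their string representation.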
Hi Thanos,
Thank you so much for your clear response. There is only one issue: when I logged the table, I didn't log it as an artifact; I just logged it to the workspace:
wandb.log({"avg_results": table})  # table is a wandb.Table
Will the solution you provided still work, or does it require logging the table as an artifact?
Hi Thanos,
I tried your script and it worked, but not directly. As I mentioned, I don't have the artifact name, so I got the table path directly instead:
avg_table_path = best_run.summary['avg_results']['path']
with open(avg_table_path) as f:  # close the file once it has been read
    avg_table = json.load(f)
avg_df = pd.DataFrame(avg_table['data'], columns=avg_table['columns'])
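For anyone following this route on a machine where the table file is not already on disk, here is a hedged sketch that first downloads the file through the public API and then parses it. It assumes the `'path'` entry in the summary is the file's name within the run's files, and it uses `run.file(name).download(...)`, which returns an open file handle. The helper name `load_summary_table` is hypothetical.

```python
import json


def load_summary_table(run, key, root="."):
    """Download the table file referenced by run.summary[key] and parse it.

    `run` is expected to be a public-API run object, e.g. from
    wandb.Api().run("ENTITY/PROJECT/RUN_ID").
    """
    # Assumption: 'path' is the file's name relative to the run's files
    rel_path = run.summary[key]["path"]

    # download() saves the file under `root` and returns an open handle
    handle = run.file(rel_path).download(root=root, replace=True)
    with handle:
        return json.load(handle)


# Example usage (requires wandb and network access):
# import wandb
# best_run = wandb.Api().run("ENTITY/PROJECT/RUN_ID")
# avg_table = load_summary_table(best_run, "avg_results")
```

The returned dict has the same `'data'` and `'columns'` keys used above, so it can be passed to `pd.DataFrame` in the same way.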
Hi @mohamedr002 that’s automatically done when you’re logging wandb.Table objects. You could click on the Artifacts icon (left panel) from your project’s workspace. Another way to get the table directly would be:
table = run.use_artifact("run-<run-id>-<table_name>:<tag>").get("<table_name>")
Please let me know if that works for you, or if you have any further questions.
Hi @mohamedr002 we both posted at the same time. Is this issue now resolved for you by getting the avg_table_path first? May I also ask if these logged tables were wandb.Table objects? In that case, logging them would also create an artifact.
Yes, you are right. I found that the table had already been logged as an artifact, but rather than getting the name and directory separately, I used the ‘path’ element that exists in the table's summary dictionary. I am really thankful for your prompt response. Really appreciated!
Hi @mohamedr002 glad to hear that, thanks a lot for posting your workaround for future reference! I am closing this ticket for now, but please feel free to reach out to us if you have any other questions!
Hey, I am trying to use table = run.use_artifact("run-<run-id>-<table_name>:<tag>").get("<table_name>")
Can you help with how run should be initialised?
What should I pass as arguments here?
run = wandb.Api().artifact()
Hi @satpalsr there are two ways to access this artifact: either by initialising a run, such as run = wandb.init(), and then using the run.use_artifact method, or by using our public API. In the latter case, you could do the following:
api = wandb.Api()
artifact = api.artifact('entity/project/artifact-name:alias')
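Putting the pieces of this thread together, a minimal sketch of the public API route might look like the following. It assumes the table artifact follows the `run-<run-id>-<table_name>` naming mentioned earlier and that the `latest` alias is what you want; the helper names and the entity, project, and run values are hypothetical placeholders.

```python
def table_artifact_name(entity, project, run_id, table_key, alias="latest"):
    """Build the full artifact name for a table logged with wandb.log."""
    return f"{entity}/{project}/run-{run_id}-{table_key}:{alias}"


def fetch_table_dataframe(entity, project, run_id, table_key, alias="latest"):
    """Fetch a logged table via the public API and return it as a DataFrame."""
    # Imported here so the name helper above works without wandb installed
    import wandb

    api = wandb.Api()
    artifact = api.artifact(
        table_artifact_name(entity, project, run_id, table_key, alias)
    )
    table = artifact.get(table_key)  # a wandb.Table object
    return table.get_dataframe()     # needs pandas installed


# Example usage (requires wandb, pandas, and network access):
# df = fetch_table_dataframe("my-team", "my-project", "abc123", "avg_results")
# df.to_csv("avg_results.csv", index=False)
```

No active run is needed with this approach, so it works from a notebook or script that only reads results.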
I hope this helps!