I have been trying to download a wandb Table and open it locally. I have managed to get the corresponding table and its ID; however, I cannot find a way to download the table and open it, for example as a CSV.
```
runs[0].summary['avg_results'].keys()
dict_keys(['_type', 'ncols', 'nrows', 'sha256', 'artifact_path', '_latest_artifact_path', 'path', 'size'])
```
Above is a snippet of what I have managed to reach. How can I go from this point to getting the table file and reading it as a CSV?
Hi @mohamedr002, thank you for writing in! You could use the API to download the Table in JSON format, which you can then easily convert to a pandas DataFrame. Please see the code snippet below, and feel free to ask more questions:
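```python
import json

import pandas as pd
import wandb

run_id = "RUN_ID"  # placeholder: replace with your run ID
api = wandb.Api()
run = api.run(f"ENTITY/PROJECT/{run_id}")

# Take the first artifact logged by the run (see the note below).
table = run.logged_artifacts()[0]
table_dir = table.download()

# A logged table is stored as "<table_name>.table.json" inside the artifact.
table_name = "my_table_name"
table_path = f"{table_dir}/{table_name}.table.json"
with open(table_path) as file:
    json_dict = json.load(file)

df = pd.DataFrame(json_dict["data"], columns=json_dict["columns"])
```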
Please note that logged_artifacts() is an iterator, and for simplicity I added [0] to return only the first entry as an example. Would this work for you?
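Since the original question asked for a CSV, here is a small standard-library sketch that converts a downloaded .table.json file straight to CSV without pandas. It assumes the same "columns"/"data" layout that the DataFrame example reads; table_json_to_csv is a hypothetical helper name, not part of the wandb API.

```python
import csv
import json


def table_json_to_csv(table_json_path, csv_path):
    """Convert a downloaded ``*.table.json`` file to CSV; return the row count.

    Assumption: the JSON object has a "columns" list and a "data" list of rows.
    Non-scalar cells (dicts/lists, e.g. media references) are written as JSON
    text so they stay parseable; None becomes an empty field.
    """
    with open(table_json_path, encoding="utf-8") as f:
        table = json.load(f)

    columns = table["columns"]
    rows = table["data"]

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(columns)
        for row in rows:
            writer.writerow(
                json.dumps(cell) if isinstance(cell, (dict, list)) else cell
                for cell in row
            )
    return len(rows)
```

For example, table_json_to_csv(table_path, "my_table_name.csv") reuses table_path from the snippet above. If pandas is already loaded, df.to_csv("my_table_name.csv", index=False) on the DataFrame does the same job.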
I have tried your script, and it worked, but not directly. The issue, as I mentioned, is that I don't have the artifact name, so I got the table path directly instead.
Hi @mohamedr002, that's done automatically when you log wandb.Table objects. You can find the artifact by clicking the Artifacts icon (left panel) in your project's workspace. Another way to get the table directly would be:
Hi @mohamedr002, we both posted at the same time. Is this issue now resolved for you by getting the avg_table_path first? May I also ask whether these logged tables were wandb.Table objects? In that case, an artifact would also be created.
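If the tables were logged as wandb.Table objects, one way to avoid needing the artifact name is to filter the run's logged artifacts by type instead of taking [0]. This sketch assumes that such artifacts have type "run_table" and names of the form "run-<run_id>-<key>:<version>"; table_artifacts is a hypothetical helper that only relies on each artifact's .type and .name attributes.

```python
def table_artifacts(artifacts, key=None):
    """Return the artifacts that hold logged wandb.Table data.

    Assumptions: table artifacts have type "run_table" and are named
    "run-<run_id>-<key>", optionally followed by ":<version>".
    If `key` is given, only artifacts for that logged key are returned.
    """
    matches = []
    for artifact in artifacts:
        if artifact.type != "run_table":
            continue
        # Drop any version/alias suffix, e.g. "run-abc123-avg_results:v0".
        base_name = artifact.name.split(":")[0]
        if key is None or base_name.endswith(f"-{key}"):
            matches.append(artifact)
    return matches
```

Usage, under the same assumptions: table_artifacts(run.logged_artifacts(), key="avg_results")[0].download().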
Yes, you are right: I found that the table had already been logged as an artifact. But rather than getting the name and directory separately, I used the 'path' element that exists in the table's dictionary. Thank you very much for the prompt response, really appreciated!
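The 'path' workaround described above could look roughly like the following. This is a sketch, not the poster's actual code. It assumes that the 'path' value in the summary dictionary shown at the top of the thread is the table file's location within the run's files, and that the public-API call run.file(name).download(root=..., replace=True) saves that file under root/<name>. download_logged_table is a hypothetical helper name.

```python
import os


def download_logged_table(run, key, root="."):
    """Download the JSON file behind a wandb.Table logged under `key`.

    `run` is a public-API run, e.g. wandb.Api().run("ENTITY/PROJECT/RUN_ID").
    Assumption: run.summary[key] is the table's summary entry and its "path"
    field is the file's path inside the run's files.
    Returns the local path the file is expected to be saved to.
    """
    entry = run.summary[key]
    if not hasattr(entry, "keys") or "path" not in entry.keys():
        raise KeyError(f"summary entry {key!r} has no 'path'; is it a wandb.Table?")
    rel_path = entry["path"]
    # Save the run file under root/<rel_path>, overwriting any stale copy.
    run.file(rel_path).download(root=root, replace=True)
    return os.path.join(root, rel_path)
```

The returned path can then be opened with json.load and turned into a DataFrame exactly as in the earlier answer.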
Hi @mohamedr002, glad to hear that, and thanks a lot for posting your workaround for future reference! I am closing this ticket for now, but please feel free to reach out to us if you have any other questions!
Hi @satpalsr, there are two ways to access this artifact: either initialise a run (e.g. run = wandb.init()) and then use the run.use_artifact method, or use our public API. In the latter case, you could do the following:
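```python
import wandb

api = wandb.Api()
# Replace entity, project, artifact-name and alias with your own values.
artifact = api.artifact('entity/project/artifact-name:alias')
```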
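Once artifact.download() has returned a local directory, the table files can be located without knowing the table name in advance. This sketch assumes each table is stored as a "*.table.json" file with "columns" and "data" keys; load_tables is a hypothetical helper, and files with the same name in different subfolders would overwrite each other in the result.

```python
import json
from pathlib import Path


def load_tables(artifact_dir):
    """Load every ``*.table.json`` file under a downloaded artifact directory.

    Returns {table name: (columns, rows)}, where the table name is the file
    name without the ".table.json" suffix.
    """
    tables = {}
    for path in sorted(Path(artifact_dir).rglob("*.table.json")):
        with path.open(encoding="utf-8") as f:
            table = json.load(f)
        name = path.name[: -len(".table.json")]
        tables[name] = (table["columns"], table["data"])
    return tables
```

For example, load_tables(artifact.download()) returns the columns and rows of each table in the artifact, ready for pandas or csv.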