Loading a saved table into a pandas DataFrame

Hi, I have recently been using wandb a lot in my projects, and it is really helpful.

My issue is that I am trying to access the logged tables as a pandas DataFrame in a program. I checked the documentation and tried the solution mentioned there. When I run it, instead of the table I get a dictionary like this:

{'artifact_path': 'wandb-client-artifact://sx4urflmwtczzq7zf71hsfxkill8hqrnd8o4uirjbcswuuc29f0xxrq6nra7uo2kzsp8jmu4s2g53e7xl3xuyu4lfjiowz9v63r9fbn7d3r8ckmlz5lrkhncuyhr0e46:latest/metrics.table.json', '_latest_artifact_path': 'wandb-client-artifact://sx4urflmwtczzq7zf71hsfxkill8hqrnd8o4uirjbcswuuc29f0xxrq6nra7uo2kzsp8jmu4s2g53e7xl3xuyu4lfjiowz9v63r9fbn7d3r8ckmlz5lrkhncuyhr0e46:latest/metrics.table.json', 'path': 'media/table/metrics_2_491a3e34c6fcf4271cb2.table.json', 'size': 413, '_type': 'table-file', 'ncols': 9, 'nrows': 3, 'sha256': '491a3e34c6fcf4271cb2378f9a33ff5dc8c9cdb8268299b4f96b88151730ecad'}

It would be great if someone could help me convert this into a table so that I can perform aggregations on the results.


Hey Prateek,

You can use this function (it's a modified version of a function from the wandb repo).

import json
from typing import Optional

import pandas as pd
import requests

def get_table_data_from_url(source_url: str, api_key: Optional[str] = None) -> pd.DataFrame:
    response = requests.get(source_url, auth=("api", api_key), stream=True, timeout=5)
    bytes_list = []
    for data in response.iter_content(chunk_size=1024):
        bytes_list.append(data)  # collect each streamed chunk
    final_byte_data = b"".join(bytes_list)
    data_dict = json.loads(final_byte_data.decode("utf-8"))
    # The table JSON keeps the rows under "data" and the header under "columns".
    table_df = pd.DataFrame(data=data_dict["data"], columns=data_dict["columns"])
    return table_df

To get the URL of a specific table file, iterate over run.files() and save the url attribute of each returned file. The file names should tell you which files are relevant.
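As a rough sketch of that lookup, something like the following could work. The helper names find_table_urls and table_urls_for_run are made up for illustration, and the ".table.json" suffix is an assumption based on the 'path' value in the dictionary from the question. The stand-in file objects and example.invalid URLs are placeholders so the filtering can be tried without a network connection.

```python
from types import SimpleNamespace
from typing import Dict, Iterable


def find_table_urls(files: Iterable, suffix: str = ".table.json") -> Dict[str, str]:
    """Map file name -> url for every file whose name ends with `suffix`."""
    return {f.name: f.url for f in files if f.name.endswith(suffix)}


def table_urls_for_run(run_path: str) -> Dict[str, str]:
    """Hypothetical wrapper: `run_path` is assumed to be "entity/project/run_id"."""
    import wandb  # imported lazily so find_table_urls works without wandb installed

    run = wandb.Api().run(run_path)
    return find_table_urls(run.files())


# Offline demonstration with stand-in objects (placeholder names and URLs).
fake_files = [
    SimpleNamespace(name="config.yaml", url="https://example.invalid/config.yaml"),
    SimpleNamespace(
        name="media/table/metrics_2_491a3e34c6fcf4271cb2.table.json",
        url="https://example.invalid/metrics.table.json",
    ),
]
table_urls = find_table_urls(fake_files)
```

Each URL in the resulting dictionary can then be passed, together with your API key, to get_table_data_from_url to get one DataFrame per logged table.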

