How to access raw data from WandB folder

Hi,

I ran my code using WandB in offline mode and now have the WandB folder ready for syncing. Instead of viewing the plots on the WandB webpage, I would like to access the raw data so I can generate the plots elsewhere myself. How can I do that?

Thanks in advance!

Hello @pparv056 ,

To access the raw data from a WandB folder that you’ve used in offline mode, you’ll need to follow a few steps to sync your data to the WandB servers and then access it. Here’s how you can do it:

  1. Sync your offline run to the WandB servers: If you’ve run your experiment in offline mode, you’ll have a local folder named wandb containing all the run data. To sync this data to the WandB servers, use the wandb sync command followed by the path to the specific run folder you wish to sync. For example:
   wandb sync wandb/offline-run-20230917_123456

This command will upload the data to the WandB servers, making it accessible through the WandB web interface.
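
If the wandb folder holds several offline runs, it can help to list them before syncing. The sketch below assumes the run folders follow the `offline-run-*` naming seen in the command above. `find_offline_runs` and `sync_command` are hypothetical helper names for illustration, not part of the wandb package.

```python
from pathlib import Path


def find_offline_runs(wandb_dir="wandb"):
    """Return offline run folders under wandb_dir, sorted by name.

    Assumes the folders are named offline-run-<timestamp>..., as in the
    example command above.
    """
    root = Path(wandb_dir)
    if not root.is_dir():
        return []
    # Keep directories only, so stray files with a similar name are ignored.
    return sorted(p for p in root.glob("offline-run-*") if p.is_dir())


def sync_command(run_dir):
    """Build the `wandb sync` command line for one run folder."""
    return f"wandb sync {Path(run_dir).as_posix()}"


if __name__ == "__main__":
    # Print one command per offline run; run them in a shell to upload.
    for run_dir in find_offline_runs():
        print(sync_command(run_dir))
```

Printing the commands rather than executing them leaves the choice of which runs to upload with you.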

  2. Accessing raw data: Once your data is synced, you can access the raw data through the WandB web interface. However, if you prefer to work with the data locally:
  • You can download the run data as JSON or CSV files from the WandB web interface. Navigate to the specific run page, and you’ll find options to export tables and charts.
  • For more programmatic access, you can use the WandB Public API to fetch the data. Here’s an example of how you might use the API to access data from a specific run:
 import wandb

 api = wandb.Api()
 run = api.run("your_username/project_name/run_id")
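 history = run.history()
 # `history` is a Pandas DataFrame containing the run's metrics over time.
 # By default it is a sample of the logged steps; use run.scan_history()
 # to iterate over every logged row instead.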

This approach allows you to manipulate and visualize the data using your preferred tools and libraries.
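
If you want every logged row rather than a sample, one option is to iterate with `run.scan_history()` and save the result to a CSV file. This is a minimal sketch, assuming the run has already been synced. The run path is the same placeholder as above, and `fetch_full_history` and `write_rows_to_csv` are hypothetical helper names, not wandb functions.

```python
import csv


def fetch_full_history(run_path):
    """Return every logged row of a synced run as a list of dicts."""
    # Imported here so the CSV helper below can be used without wandb installed.
    import wandb

    api = wandb.Api()
    run = api.run(run_path)  # e.g. "your_username/project_name/run_id"
    # scan_history() yields all logged rows, one dict per row.
    return list(run.scan_history())


def write_rows_to_csv(rows, csv_path):
    """Write a list of row dicts to csv_path and return the column names.

    Rows may have different keys (metrics logged at different steps), so the
    header is the union of all keys in first-seen order; gaps are left empty.
    """
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
    return fieldnames
```

For a synced run, `write_rows_to_csv(fetch_full_history("your_username/project_name/run_id"), "history.csv")` would leave a file you can open in any plotting or spreadsheet tool.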

  3. Generating plots elsewhere: With the raw data now accessible either through a downloaded file or via the API, you can use any data visualization library (like Matplotlib, Seaborn, or Plotly in Python) to generate plots. Here’s a simple example using Matplotlib:
   import matplotlib.pyplot as plt

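   # Assuming `history` is the DataFrame from the previous step and that your
   # run logged keys named 'epoch' and 'accuracy'; substitute the keys you
   # actually logged.
   plt.plot(history['epoch'], history['accuracy'])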
   plt.xlabel('Epoch')
   plt.ylabel('Accuracy')
   plt.title('Model Accuracy Over Time')
   plt.show()
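
Metrics logged at different steps leave gaps in the history, so it can be worth dropping incomplete rows before plotting. The sketch below works on a list of row dicts (for example, what `run.scan_history()` yields) and saves the figure to a file. `metric_series` and `plot_metric` are hypothetical helper names, and the key names you pass in are whatever your run logged.

```python
import math


def _is_missing(value):
    """True for None or a float NaN, the two ways a gap can show up."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def metric_series(rows, x_key, y_key):
    """Return (xs, ys) using only the rows where both keys have a value."""
    xs, ys = [], []
    for row in rows:
        x, y = row.get(x_key), row.get(y_key)
        if _is_missing(x) or _is_missing(y):
            continue
        xs.append(x)
        ys.append(y)
    return xs, ys


def plot_metric(rows, x_key, y_key, out_path):
    """Plot y_key against x_key and save the figure to out_path."""
    # Imported here so metric_series can be used without Matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    xs, ys = metric_series(rows, x_key, y_key)
    fig, ax = plt.subplots()
    ax.plot(xs, ys)
    ax.set_xlabel(x_key)
    ax.set_ylabel(y_key)
    fig.savefig(out_path)
    plt.close(fig)
```

Keeping the filtering separate from the plotting means the cleaned series can be handed to Seaborn or Plotly just as easily.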

Remember, the WandB API provides a flexible way to access your data programmatically, which can be particularly useful for custom analysis or visualization workflows.

Let us know if this helps or you have further questions.

Thanks!

Hi @pparv056 , since we have not heard back from you we are going to close this request. If you would like to re-open the conversation, please let us know!