How to interpret evaluation results in Python

joshkimperial · September 3, 2024, 7:28am

I have a dataset of 45k items. I’m asking an LLM to interpret each item, and then I evaluate the results with four metrics that return boolean results. I have this whole pipeline running with the dataset in WandB and execute it with a Weave evaluation.

Once complete, I have a table with the results in WandB. How can I query this back into Python to see on which items the LLM got True for all four metrics?

The docs mention using the feedback class to query results, but this relies on human-annotated feedback.

My question is:

How can I get an evaluation’s results table open in Python?

fmamberti-wandb · September 5, 2024, 10:00am

Hi @joshkimperial, thank you for reaching out to W&B, and it’s great to hear you are implementing Weave for your evaluation!

Would you mind sharing a URL to the project and the table you are trying to query back so we can have a look and provide further guidance? Thanks!

joshkimperial · September 9, 2024, 8:18am

I’ve just reproduced one of the docs examples, producing a Weave evaluation table.

Here is an example call that I’d like to query into Python.

fmamberti-wandb · September 10, 2024, 11:16am

Hi @joshkimperial, you should be able to filter and export calls using either Python or HTTP, as described in our docs here.

Please let me know if you have any further questions on this.
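
As a rough illustration of the filtering step once the calls have been exported into Python, here is a minimal sketch in plain Python. It assumes each exported row has already been flattened into a dictionary with an `item_id` and a `scores` mapping of metric name to boolean. That layout, the metric names, and the helper functions are all hypothetical and not the actual Weave export schema, so adapt the field access to whatever your export returns.

```python
from typing import Any, Iterable

# Hypothetical metric names; replace with the scorer names from your evaluation.
METRICS = ("metric_a", "metric_b", "metric_c", "metric_d")


def passed_all(row: dict[str, Any], metrics: Iterable[str] = METRICS) -> bool:
    """Return True only if every listed metric is exactly True for this row."""
    scores = row.get("scores", {})
    # A missing metric counts as a failure rather than raising an error.
    return all(scores.get(name) is True for name in metrics)


def items_passing_all(rows: Iterable[dict[str, Any]],
                      metrics: Iterable[str] = METRICS) -> list[Any]:
    """Collect the ids of rows where all metrics are True, in input order."""
    metrics = tuple(metrics)
    return [row["item_id"] for row in rows if passed_all(row, metrics)]


# Assumed, simplified rows standing in for exported evaluation calls.
rows = [
    {"item_id": "a", "scores": {"metric_a": True, "metric_b": True,
                                "metric_c": True, "metric_d": True}},
    {"item_id": "b", "scores": {"metric_a": True, "metric_b": False,
                                "metric_c": True, "metric_d": True}},
    {"item_id": "c", "scores": {"metric_a": True, "metric_b": True,
                                "metric_c": True}},  # one metric missing
]

passing = items_passing_all(rows)
print(passing)
```

Treating a missing metric as a failure is a design choice in this sketch; if some of your items legitimately skip a scorer, you may want to handle that case separately.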

fmamberti-wandb · September 12, 2024, 12:40pm

Hi @joshkimperial , I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

joshkimperial · September 12, 2024, 1:33pm

Thanks, yep the docs update that was done here clarified my query.
