I have a dataset of 45k items. I ask an LLM to interpret each item, then evaluate the results with four metrics that each return a boolean. The whole pipeline runs with the dataset in WandB, and I execute it with a Weave evaluation.
Once it completes, I have a table with the results in WandB. How can I query this back into Python to see which items the LLM got True on for all four metrics?
The docs mention using the feedback class to query results, but this relies on human-annotated feedback.
My question is:
How can I open an evaluation's results table in Python?
Hi @joshkimperial, thank you for reaching out to W&B, and it’s great to hear you are using Weave for your evaluation!
Would you mind sharing a URL to the project and the table you are trying to query back so we can have a look and provide further guidance? Thanks!
I’ve just reproduced one of the docs examples, producing a Weave evaluation table.
Here is an example call that I’d like to query into Python.
Hi @joshkimperial, you should be able to filter and export calls using either Python or HTTP, as described in our docs here.
Please let me know if you have any further questions on this.
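Once the calls have been exported, selecting the items that passed every metric is a plain Python filter. The sketch below is only an illustration under assumptions: it takes each exported record to be a dict with a `scores` mapping from metric name to a boolean, and the metric names, the `item_id` field, and the sample rows are all made up. The real shape of an exported call may differ, so adjust the lookups to match what your export returns.

```python
from typing import Any, Iterable, Mapping

# Hypothetical metric names; replace with the names of your own four scorers.
METRICS = ("metric_a", "metric_b", "metric_c", "metric_d")


def passed_all(scores: Mapping[str, Any], metrics: Iterable[str] = METRICS) -> bool:
    """Return True only if every listed metric is present and exactly True."""
    return all(scores.get(name) is True for name in metrics)


def select_all_true(rows: Iterable[Mapping[str, Any]],
                    metrics: Iterable[str] = METRICS) -> list:
    """Keep the rows whose `scores` mapping is True for every metric."""
    metrics = tuple(metrics)  # allow any iterable, reuse it for each row
    return [row for row in rows if passed_all(row.get("scores", {}), metrics)]


# Made-up records standing in for exported evaluation rows (assumed shape).
rows = [
    {"item_id": 1, "scores": {"metric_a": True, "metric_b": True,
                              "metric_c": True, "metric_d": True}},
    {"item_id": 2, "scores": {"metric_a": True, "metric_b": False,
                              "metric_c": True, "metric_d": True}},
    {"item_id": 3, "scores": {"metric_a": True, "metric_b": True,
                              "metric_c": True}},  # metric_d missing
]

print([row["item_id"] for row in select_all_true(rows)])  # prints [1]
```

A missing metric is treated as a failure here rather than skipped, so partially scored items never count as passing all four.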
Hi @joshkimperial, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.
Thanks, yep, the docs update that was done here clarified my query.