I have a task that consists of four prompts, so each piece of text gets evaluated four times, once for each of these questions. Currently, my Weave model returns a list with the results of all four prompts (so a list with four items), and my scorer function checks whether every item in that list matches the corresponding observed value (the first item, i.e. the first prediction, against the first observed value, the second item against the second observed value, and so on). This is helpful for me because I can see which examples are evaluated perfectly overall.
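Simplified, my current setup looks roughly like this (just a sketch, not my real code: `FourPromptModel`, `call_llm`, and the `targets` column are placeholder names, and the scorer's `output` parameter may be called `model_output` depending on the Weave version):

```python
import weave


def call_llm(prompt: str, text: str) -> str:
    # placeholder for the actual LLM call
    return f"answer to {prompt!r} given {text!r}"


class FourPromptModel(weave.Model):
    # the four prompt templates, in a fixed order
    prompts: list[str]

    @weave.op()
    def predict(self, text: str) -> list[str]:
        # run the same text through all four prompts and return the
        # four answers in the same order as the prompts
        return [call_llm(p, text) for p in self.prompts]


@weave.op()
def all_match_scorer(targets: list[str], output: list[str]) -> dict:
    # compare each prediction to its observed value, position by position
    matches = [pred == tgt for pred, tgt in zip(output, targets)]
    return {"per_prompt_match": matches, "all_match": all(matches)}


# wired together with something like:
# evaluation = weave.Evaluation(dataset=examples, scorers=[all_match_scorer])
# asyncio.run(evaluation.evaluate(FourPromptModel(prompts=[...])))
```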
But it is also very clunky, because I never evaluate each prompt separately, only all four together, and that causes a lot of overhead. Can I do both somehow? That is, have a model for each prompt, and then also a model that combines all of them? Or is that combination an antipattern anyway? I'd be interested in learning more about best practices here. Thanks!
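For reference, the split-up version I have in mind would look roughly like this (again just a sketch, reusing the `call_llm` stub from above; `per_prompt_datasets` is hypothetical). The part I don't know how to do cleanly is getting the combined "all four correct" view back out of four separate evaluations:

```python
class SinglePromptModel(weave.Model):
    # one prompt template per model
    prompt: str

    @weave.op()
    def predict(self, text: str) -> str:
        # one prompt, one prediction
        return call_llm(self.prompt, text)


@weave.op()
def exact_match_scorer(target: str, output: str) -> dict:
    # score a single prompt's prediction against its observed value
    return {"match": output == target}


# one evaluation per prompt, e.g.
# for prompt, examples in per_prompt_datasets.items():  # hypothetical mapping
#     evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match_scorer])
#     asyncio.run(evaluation.evaluate(SinglePromptModel(prompt=prompt)))
#
# ...plus, somehow, a combined view that still tells me whether a given
# text was answered correctly for all four prompts.
```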