Nesting models in Weave - Evaluation still possible?

I have a task that consists of four prompts, so one piece of text gets evaluated four times against these different questions. Currently, my Weave model returns a list with the result of each prompt (so a list with four items), and my scorer function checks whether every item in that list matches the corresponding observed value: the first prediction against the first observed value, the second against the second, and so on. This is helpful because I can see which examples are evaluated perfectly overall.
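Roughly, the setup looks like this (a simplified sketch; `run_llm`, the class name, and the `targets` key are placeholders, not my exact code):

```python
import weave


def run_llm(prompt: str, text: str) -> str:
    # stand-in for the actual LLM call
    raise NotImplementedError


class FourPromptModel(weave.Model):
    prompts: list[str]  # one prompt per question

    @weave.op()
    def predict(self, text: str) -> list[str]:
        # run all four prompts against the same text and
        # return the four results as a single list
        return [run_llm(p, text) for p in self.prompts]


@weave.op()
def list_match_scorer(targets: list[str], output: list[str]) -> dict:
    # compare each prediction with its observed value, position by position
    matches = [pred == obs for pred, obs in zip(output, targets)]
    return {"per_prompt": matches, "all_correct": all(matches)}
```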

But it is also quite clunky, because I never evaluate each prompt separately, only all four together, and that causes a lot of overhead. Can I do both somehow? That is, have a model for each prompt, and then also a model that combines all of them? Or is this combination an antipattern anyway? I would be interested in learning more about best practices here. Thanks!

Hi @mirekro! Thank you for sharing!

Would you be willing to share the code you are currently using? It would help us understand the setup better and let the team provide solid examples :+1:

Hi there, I wanted to follow up on this request. Please let us know if we can be of further assistance or if your issue has been resolved.

It’s resolved! I figured out that Weave supports what I wanted out of the box; I was just overthinking it. Everything works as expected.
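For anyone finding this later, here is roughly the pattern I ended up with (an illustrative sketch, not my exact code; the project name, dataset columns, and scorer names are placeholders). A single `weave.Evaluation` with a scorer that returns a dict gives per-key aggregation, so you get per-prompt scores and an overall score from one run, without nesting models:

```python
import asyncio

import weave

weave.init("four-prompt-eval")  # hypothetical project name


@weave.op()
def per_prompt_scorer(targets: list[str], output: list[str]) -> dict:
    # Weave aggregates each dict key separately, so this one scorer
    # yields a per-prompt score plus an overall score in the UI.
    matches = {
        f"prompt_{i + 1}": pred == obs
        for i, (pred, obs) in enumerate(zip(output, targets))
    }
    matches["all_correct"] = all(matches.values())
    return matches


dataset = [
    # `targets` holds the four observed values for each example;
    # scorer parameters are matched to dataset columns by name
    {"text": "some input", "targets": ["a", "b", "c", "d"]},
]

evaluation = weave.Evaluation(dataset=dataset, scorers=[per_prompt_scorer])

# FourPromptModel is the model from the sketch further up the thread
asyncio.run(
    evaluation.evaluate(FourPromptModel(prompts=["...", "...", "...", "..."]))
)
```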