Feature Request: Head-to-head comparisons for Weave evaluations

I really want to switch from LangSmith to Weave, but my team needs a feature like the one LangSmith has, where we can see direct comparisons between different evaluation runs. We would also like to filter for cases where performance gets worse or better.

Does this happen to be in the works?

Hi @joaomendonca_lw, good day, and thank you for reaching out to us! Happy to help you with this.

We have a feature called Run Comparer that shows you which metrics differ across your runs. You can check this link for some guidance, and there's also a live example available. Let me know if this tool helps. Otherwise, we'll be more than happy to raise a feature request for you.

Thanks,
Paulo

Hi @joaomendonca_lw, since we haven't heard back from you, we are going to close this request. If you would like to reopen the conversation, please let us know!