How to replay prompts and evaluate output quality across multiple models using past runs?

Hi, I have used the wandb Tracer to log basic prompt inputs and outputs; each query is "a run" for now. I'd like to replay the prompts with a different model and group the outputs into the same run.

Is this straightforward to do? How should I update a run with a new Tracer result?

Thank you.

Hi
Thanks for your question. This is a use case we’re working hard to support and have some really exciting things in the works to make this easy.

Today, this is a bit involved because your traces are logged within runs. We'll make it easier in the future to save a set of prompts and run them against LLMs to get traces. This will be powered by a new toolkit we're building called Weave.

You can use the wandb.Api to fetch your runs:

import wandb
import json

api = wandb.Api()
runs = api.runs("yudixue/<project name>")
for run in runs:
    # The trace is stored in the run summary as a JSON string
    root_spans = json.loads(run.summary['langchain_trace']['root_span_dumps'])
    # Pull the original prompt text back out of the trace
    prompt_input = root_spans['results'][0]['inputs']['input']

To resume a run, you can call

wandb.init(id=run.id, resume='must')

using the run object from the code above, then log to it as normal.
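
As a rough sketch, assuming you want to store the new model's outputs in a wandb.Table (call_new_model, the table columns, and "<model_name>" are placeholders, not part of the wandb API), replaying each stored prompt and logging the result back to its original run could look like:

import json
import wandb

def call_new_model(prompt):
    # Placeholder: replace with your new model's inference call
    ...

api = wandb.Api()
for past_run in api.runs("yudixue/<project name>"):
    # Recover the original prompt from the logged trace
    root_spans = json.loads(past_run.summary['langchain_trace']['root_span_dumps'])
    prompt = root_spans['results'][0]['inputs']['input']
    output = call_new_model(prompt)

    # Reopen the original run and attach the new model's output to it
    run = wandb.init(project="<project name>", id=past_run.id, resume="must")
    table = wandb.Table(columns=["model", "prompt", "output"],
                        data=[["<model_name>", prompt, output]])
    run.log({"replayed_outputs": table})
    run.finish()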

Although I don't see a problem with resuming, you should also be fine creating a new run and logging your table there instead.

Hope this helps.


Hi @yudixue, I wanted to follow up and see if you had a chance to try out Scott’s suggestion or if there was anything else we could help answer?

Thanks Scott, Weave sounds exciting!

Also, if I want to make the current API work and log a new “root_span_dumps” (let’s call it <model_name>_span_dumps), can I just add it to the run summary, i.e. deserialize and replace it?

You should be able to use the Tracer as normal to log more traces for the resumed run.
You might be better off just logging new runs rather than resuming, though.
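
If you do want to write the extra summary key yourself, a minimal sketch using the public API could look like the following (the <model_name>_span_dumps key and new_trace_dict are placeholders for your own trace data):

import json
import wandb

# Placeholder: the trace produced by replaying the prompt with your new model
new_trace_dict = {"results": []}

api = wandb.Api()
run = api.runs("yudixue/<project name>")[0]
run.summary["<model_name>_span_dumps"] = json.dumps(new_trace_dict)
run.summary.update()  # persist the new summary key to the W&B backend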
