Just deployed a Weave integration to production yesterday and saw it triple my llama-index query latency.
My guess is that it's making calls to W&B synchronously during the query pipeline run, but I can't tell for sure because it was basically a zero-config setup (which was very nice).
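For anyone wanting to reproduce the measurement: a minimal timing harness like the one below can compare query latency with and without the integration enabled. `fake_query` is a stand-in for the real `query_engine.query(...)` call, not anything from llama-index or Weave.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Stand-in for query_engine.query("..."); swap in the real call to measure.
def fake_query(prompt):
    time.sleep(0.05)  # simulate pipeline work
    return f"answer to {prompt!r}"

result, elapsed = timed(fake_query, "what changed?")
```

Running it once before and once after initializing the integration makes the overhead concrete instead of anecdotal.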
Apologies that you are seeing this behavior, and thank you very much for writing in. We are looking into this and will get back to you as soon as we have any updates or follow-up questions.
Digging into the weave source a bit, I’m curious whether the finish_call function is sync or async.
I traced the calls from the llamaindex integration here:
And it appears there is a trace server that executes the HTTP request using the Python requests library here (there are a few TraceServerInterface implementors, so I'm just guessing this is the culprit):
This is definitely making a synchronous POST. I'm not proficient enough in Python to know whether this can be made async; from what I understand, it has to be async all the way down for that to work correctly, and I don't currently use async/await or asyncio anywhere in my Python application, so I'm not sure that would actually help.
Hi @tryrisotto! Thank you very much for digging into this!
I have gone ahead and escalated this behavior to our Weave team. They have found a couple of things that could possibly lead to it; for example, each trace creates new versions of multiple objects for the Weave workflow, which could be causing the slowdown for you.
I will keep you posted as soon as I get any updates on my end as well.