Weave: High latency with LlamaIndex

tryrisotto · August 12, 2024, 4:47pm

Hey folks!

Just deployed a weave integration to production yesterday and saw it triple my llama-index query latency.

My guess is that it is making calls to WandB synchronously during the query pipeline run, but I can’t tell for sure because it was basically a no-config setup (which was very nice).
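One way to check a guess like this is to time the same query with and without tracing and compare the medians. The sketch below is only an illustration: `fake_query` and `fake_query_with_sync_logging` are hypothetical stand-ins (using `time.sleep`) for a real llama-index query and for a query that also makes a blocking logging call. They are not part of weave or llama-index.

```python
import time
from statistics import median


def median_latency(fn, runs=5):
    """Call fn() `runs` times and return the median wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return median(samples)


def fake_query():
    # Hypothetical stand-in for the real query call.
    time.sleep(0.01)


def fake_query_with_sync_logging():
    # Same query, followed by a simulated blocking call to a trace server.
    fake_query()
    time.sleep(0.02)


baseline = median_latency(fake_query)
traced = median_latency(fake_query_with_sync_logging)
print(f"baseline={baseline:.3f}s traced={traced:.3f}s ratio={traced / baseline:.1f}x")
```

In a real application, the two functions would be replaced with the actual query call, run once with tracing enabled and once without.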

Any suggestions for ways to improve this?

Thanks!
Chris

artsiom · August 12, 2024, 8:24pm

Hi @tryrisotto~

Apologies that you are seeing this behavior, and thank you very much for writing in. We are looking into this and will get back to you as soon as we have any updates or follow-up questions.

tryrisotto · August 13, 2024, 5:55pm

Digging into the weave source a bit, I’m curious whether the finish_call function is sync or async.

I traced the calls from the llamaindex integration here:

https://github.com/wandb/weave/blob/b08b65073b3646971b07fa18b4b8c46d7717eab4/weave/integrations/llamaindex/llamaindex.py#L78C24-L78C35

          gc.finish_call(call, None, exception=exception)

And it appears there is a trace server that executes the HTTP request using the python requests library here (there are a few TraceServerInterface implementors, so I’m just guessing this is the culprit):

https://github.com/wandb/weave/blob/b08b65073b3646971b07fa18b4b8c46d7717eab4/weave/trace_server/remote_http_trace_server.py#L184

              before_sleep=_log_retry,
              retry_error_callback=_log_failure,
              reraise=True,
          )
          def _generic_request_executor(
              self,
              url: str,
              req: BaseModel,
              stream: bool = False,
          ) -> requests.Response:
              r = requests.post(
                  self.trace_server_url + url,
                  # `by_alias` is required since we have Mongo-style properties in the
                  # query models that are aliased to conform to start with `$`. Without
                  # this, the model_dump will use the internal property names which are
                  # not valid for the `model_validate` step.
                  data=req.model_dump_json(by_alias=True).encode("utf-8"),
                  auth=self._auth,
                  stream=stream,
              )
              if r.status_code == 500:

This is definitely making a synchronous POST, but I’m not proficient enough in Python to know whether it can be made async. From what I understand, it has to be async all the way down to work correctly, and I do not currently use async/await or asyncio anywhere in my Python application, so I’m not sure this would actually help.
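For what it's worth, a blocking call can generally be moved off the request path without async/await by handing it to a background worker thread. The sketch below shows that general pattern with the standard library's `ThreadPoolExecutor`. It is not how weave is implemented: `blocking_send` and `send_in_background` are hypothetical names, and `time.sleep` stands in for a synchronous `requests.post(...)`.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# A single worker keeps uploads in submission order.
_executor = ThreadPoolExecutor(max_workers=1)


def blocking_send(payload):
    # Hypothetical stand-in for a synchronous HTTP POST.
    time.sleep(0.2)
    return len(payload)


def send_in_background(payload):
    """Queue the blocking send on a worker thread and return immediately."""
    return _executor.submit(blocking_send, payload)


start = time.perf_counter()
future = send_in_background(b"trace data")
elapsed_on_caller = time.perf_counter() - start  # time the caller was blocked

# Waiting on the result here is only for the demo; a real caller would carry on.
result = future.result()
_executor.shutdown(wait=True)
print(f"caller blocked for {elapsed_on_caller:.4f}s, upload returned {result}")
```

The trade-off is that queued work can be lost if the process exits before the worker finishes, so a real implementation would need to flush the queue on shutdown.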

artsiom · August 14, 2024, 4:34pm

Hi @tryrisotto! Thank you very much for digging into this!

I have escalated this behavior to our Weave team. They have found a couple of things that could possibly lead to it. For example, each trace creates new versions of multiple objects for the Weave workflow, which could be causing the slowdown for you.

I will keep you posted as soon as I get any updates on my end as well.

Warmly,
Artsiom

tryrisotto · August 14, 2024, 4:53pm

Amazing, thank you @artsiom ! Standing by…

tryrisotto · January 9, 2025, 9:18pm

@artsiom It’s been a while! Do you know if this has been resolved? I’m curious what changes were made, if any?

mauricio_shopsense · January 16, 2025, 3:03pm

Did they find a solution? This latency is the worst for LLM agents; it takes so long.

jason-arkens17 · May 14, 2025, 8:08pm

WB-20357 has been moved to Merged - This ticket will be closed for now. Thanks so much!
