I am not completely sure how the wandb sweep agent calls the program specified in the sweep configuration, but I want to make sure that I run my sweeps as efficiently as possible. In the top-level code I pull data from the cloud, which I do not want to repeat for every sweep trial. Instead, after the first trial has started, I want the agent to call only the main() function for the remaining trials.
How does it currently work? Is what I am asking possible?
In a way, yes. I know that you can specify a function to run instead of an entire script. However, I still need to download the data once, which is done in the top-level code. I’m wondering if it’s possible to have different subsections of the code that are run only once and then cached.
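If the sweep is started from the command line with wandb agent and a program entry in the sweep configuration, the agent launches the script as a new process for each trial, so the top-level code runs again every time. One way to get "run once, then cached" behavior in that setup is a plain file cache on local disk: skip the download whenever the cache file already exists. This is a swapped-in technique, not a wandb feature. The sketch below assumes the data can be serialized as JSON; fetch_from_cloud and the cache location are hypothetical placeholders for the real download code and path.

```python
import json
import os
import tempfile

# Hypothetical cache location; point this at wherever the real data should live.
CACHE_PATH = os.path.join(tempfile.gettempdir(), "sweep_data_cache.json")


def fetch_from_cloud():
    # Hypothetical placeholder for the real (expensive) cloud download.
    return {"features": [[0.0, 1.0], [1.0, 0.0]], "labels": [1, 0]}


def load_data(cache_path=CACHE_PATH):
    # Cache hit: a previous trial already downloaded the data, so just read it.
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)

    # Cache miss: download once and write the result to disk.
    data = fetch_from_cloud()
    # Write to a unique temp file first, then rename it into place, so a trial
    # running in parallel never reads a half-written cache file.
    fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(cache_path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
    os.replace(tmp_path, cache_path)
    return data
```

Calling load_data() at the top of the script keeps the rest of the script unchanged: only the first trial pays for the download. Two trials that start at exactly the same moment may both download, but neither will read a partial file. Deleting the cache file forces a fresh download if the cloud data changes.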
Hm, in theory, you should be able to do something like this by launching sweeps with the Python SDK. It would look something like the example from the docs.
You could first run the code that downloads and caches the data, and then start the sweep agent from Python, passing it the function (or part of the code) you want to run for each trial.
In the example I sent above, the sweep agent call (at the very bottom of the page) invokes the main function, which runs one trial of the sweep each time it is called. You can add code before the example’s agent call, wandb.agent(sweep_id, function=main, count=4), to download and cache the data needed before the sweep starts.
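A minimal sketch of that idea, assuming the data fits in memory. When wandb.agent is given function=..., it should call that function from within the current Python process rather than re-running the script, so data loaded once before the agent starts can be handed to every trial with functools.partial. load_data, train, launch_sweep, the project name, and the "lr"/"score" names are hypothetical placeholders, not taken from the original script or the docs example.

```python
import functools

# Hypothetical sweep configuration; "lr" and "score" are placeholder names.
SWEEP_CONFIGURATION = {
    "method": "random",
    "metric": {"name": "score", "goal": "maximize"},
    "parameters": {"lr": {"values": [0.001, 0.01, 0.1]}},
}


def load_data():
    # Hypothetical stand-in for the expensive cloud download in the top-level code.
    return list(range(10))


def train(data, config):
    # Hypothetical training step; it only reads one hyperparameter from config.
    return config["lr"] * len(data)


def main(data):
    # Runs a single sweep trial; the agent fills run.config with sampled values.
    import wandb

    with wandb.init() as run:
        score = train(data, run.config)
        run.log({"score": score})


def launch_sweep(project="my-sweep-project", count=4):
    import wandb

    data = load_data()  # executed once, before any trial starts
    sweep_id = wandb.sweep(sweep=SWEEP_CONFIGURATION, project=project)
    # partial() binds the already-loaded data, giving the agent a zero-argument callable.
    wandb.agent(sweep_id, function=functools.partial(main, data), count=count)
```

Calling launch_sweep() at the bottom of the script (for example under an if __name__ == "__main__": guard) downloads the data once and then lets the agent call main() up to count times. The wandb imports sit inside the functions only so the helpers can be imported without wandb installed; a normal top-level import wandb works just as well.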