Async LLM calls using PHP Fibers or Swoole coroutines
For a feature that needs to call the OpenAI API three times in parallel (different prompts, independent results), I want to fan out the calls and wait for all to complete.
What is the right approach in PHP? Each call currently takes 3-4 seconds, so running them sequentially would take 9-12 seconds total.
If you are on Swoole or Hyperf, Hyperf's parallel() helper or Co\run() with multiple coroutines is the cleanest approach. Total wall time becomes max(call1, call2, call3) instead of the sum.
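A minimal sketch of the Swoole variant, assuming the Swoole extension is loaded; callOpenAi() is a hypothetical wrapper around whatever HTTP call you make today:

```php
<?php
// Sketch: fan out three OpenAI calls in Swoole coroutines.
// Assumes ext-swoole; callOpenAi() is a placeholder for your client call.

use Swoole\Coroutine\WaitGroup;
use Swoole\Runtime;

Runtime::enableCoroutine(); // hook curl/streams so blocking I/O yields the coroutine

\Swoole\Coroutine\run(function () {
    $prompts = ['prompt one', 'prompt two', 'prompt three'];
    $results = [];
    $wg = new WaitGroup();

    foreach ($prompts as $i => $prompt) {
        $wg->add();
        go(function () use ($i, $prompt, &$results, $wg) {
            $results[$i] = callOpenAi($prompt); // runs concurrently with the others
            $wg->done();
        });
    }

    $wg->wait(); // total wall time ≈ the slowest call, not the sum
    ksort($results);
    // ... use $results
});
```

Runtime::enableCoroutine() is what makes the otherwise-blocking curl calls cooperative, so you don't have to rewrite the HTTP layer.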
On plain FPM without an async framework, curl_multi_exec handles parallel HTTP requests natively. openai-php/client does not expose concurrent requests directly, but you can drop down to Guzzle and fire the requests yourself, either with promises (GuzzleHttp\Promise\Utils::all() over several postAsync() calls) or with a request pool.
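The curl_multi approach needs nothing beyond ext-curl. A sketch, where $requests maps a key to a curl-options array (for real OpenAI calls you would set CURLOPT_URL, CURLOPT_POST, CURLOPT_POSTFIELDS, and the auth header per entry):

```php
<?php
// Plain-PHP fanout with curl_multi: all transfers run concurrently,
// results come back keyed like the input array.

function fetchAll(array $requests): array
{
    $mh = curl_multi_init();
    $handles = [];
    foreach ($requests as $key => $options) {
        $ch = curl_init();
        // Caller's options win; RETURNTRANSFER is added if absent.
        curl_setopt_array($ch, $options + [CURLOPT_RETURNTRANSFER => true]);
        curl_multi_add_handle($mh, $ch);
        $handles[$key] = $ch;
    }

    // Drive every transfer until none are still running.
    do {
        $status = curl_multi_exec($mh, $running);
        if ($running) {
            curl_multi_select($mh); // sleep until a socket is ready
        }
    } while ($running && $status === CURLM_OK);

    $results = [];
    foreach ($handles as $key => $ch) {
        $results[$key] = curl_multi_getcontent($ch);
        curl_multi_remove_handle($mh, $ch);
        curl_close($ch);
    }
    curl_multi_close($mh);

    return $results;
}
```

Error handling is elided; in practice check curl_errno() per handle before trusting the body.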
The Guzzle Pool abstraction is straightforward: you give it an iterator of requests (or callables returning promises) and it runs them concurrently up to your concurrency limit. It worked well for our 5-call fanout.
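A sketch of the Pool approach, assuming guzzlehttp/guzzle is installed via Composer; the model name and endpoint path are illustrative:

```php
<?php
// Sketch: three concurrent OpenAI chat requests via Guzzle Pool::batch().
require 'vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;

$client  = new Client(['base_uri' => 'https://api.openai.com/']);
$prompts = ['first prompt', 'second prompt', 'third prompt'];

// Generator of PSR-7 requests; Pool consumes it lazily.
$requests = function () use ($prompts) {
    foreach ($prompts as $prompt) {
        yield new Request(
            'POST',
            '/v1/chat/completions',
            [
                'Authorization' => 'Bearer ' . getenv('OPENAI_API_KEY'),
                'Content-Type'  => 'application/json',
            ],
            json_encode([
                'model'    => 'gpt-4o-mini', // illustrative model name
                'messages' => [['role' => 'user', 'content' => $prompt]],
            ])
        );
    }
};

// Runs all requests concurrently (up to 'concurrency') and returns an
// array of Response objects (or exceptions) in input order.
$responses = Pool::batch($client, $requests(), ['concurrency' => 3]);
```

Pool::batch() puts a RequestException in the result slot for any failed call, so check each element's type before reading the body.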
The ReactPHP event-loop approach also works, but it adds dependencies. For a one-off fanout, the Guzzle pool is simpler; for a system that does this constantly, an async framework is the better long-term choice.
With Fibers you can hand-roll a scheduler: create one Fiber per call, resume them round-robin, and have each suspend at the point where it would block on the curl read. But honestly, the Guzzle pool already does this for you.
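For illustration, here is a minimal hand-rolled scheduler (PHP >= 8.1). The demo tasks call Fiber::suspend() where a real implementation would wait on a socket; the scheduler resumes every unfinished fiber in turn until all have terminated:

```php
<?php
// Minimal round-robin Fiber scheduler sketch. Each task suspends where
// it would otherwise block; runAll() keeps resuming until all finish.

function runAll(array $tasks): array
{
    $fibers = array_map(fn (callable $t) => new Fiber($t), $tasks);
    foreach ($fibers as $f) {
        $f->start(); // runs until the first suspend (or completion)
    }

    $results = [];
    while ($fibers) {
        foreach ($fibers as $i => $f) {
            if ($f->isTerminated()) {
                $results[$i] = $f->getReturn();
                unset($fibers[$i]);
            } else {
                $f->resume(); // wake the task; it suspends again if still waiting
            }
        }
    }
    ksort($results);
    return $results;
}

// Demo: two tasks that each suspend once, standing in for an I/O wait.
$order = [];
$results = runAll([
    function () use (&$order) { $order[] = 'a1'; Fiber::suspend(); $order[] = 'a2'; return 'A'; },
    function () use (&$order) { $order[] = 'b1'; Fiber::suspend(); $order[] = 'b2'; return 'B'; },
]);
// $order is ['a1', 'b1', 'a2', 'b2']: the two tasks interleave.
```

Note this busy-loops between resumes; a real scheduler would park on curl_multi_select() instead, which is exactly the machinery the Guzzle pool wraps for you.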