OpenAI PHP SDK streaming responses: implementation notes
Implemented streaming chat completions in a PHP backend using the official openai-php/client package. Sharing some implementation notes because the documentation skips over the PHP-specific parts.
The key is calling createStreamed() instead of create(). It returns a stream you can foreach over; each item is a CreateStreamedResponse holding one delta chunk.
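A minimal sketch of the streamed call, assuming the package is installed via Composer and the key sits in an environment variable (model name and prompt are placeholders):

```php
<?php

require 'vendor/autoload.php';

// Hypothetical setup: key taken from the environment.
$client = OpenAI::client(getenv('OPENAI_API_KEY'));

$stream = $client->chat()->createStreamed([
    'model'    => 'gpt-4o-mini', // placeholder model
    'messages' => [
        ['role' => 'user', 'content' => 'Say hello'],
    ],
]);

// Each iteration yields a CreateStreamedResponse; the text lives in the delta.
foreach ($stream as $response) {
    echo $response->choices[0]->delta->content ?? '';
}
```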
The first gotcha is that PHP's default output buffering holds the whole response until the script ends, which eats the stream. You need to disable the buffers or flush manually after each chunk.
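A sketch of draining the buffers before streaming, assuming a plain PHP-FPM context ($stream is the iterator from createStreamed()):

```php
<?php

// Remove every active output buffer so chunks are not held back.
while (ob_get_level() > 0) {
    ob_end_flush();
}

// zlib compression would re-buffer the output; turn it off before any output.
ini_set('zlib.output_compression', '0');

foreach ($stream as $response) {
    echo $response->choices[0]->delta->content ?? '';
    flush(); // push the chunk through the SAPI to the client immediately
}
```

Note that output_buffering itself is PHP_INI_PERDIR, so if it is enabled in php.ini you cannot switch it off with ini_set() at runtime; the ob_end_flush() loop above is the runtime escape hatch.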
If you are behind Nginx, also set X-Accel-Buffering: no in the response headers. Nginx buffers proxied responses by default and will hold the stream until the connection closes.
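Setting the headers from PHP looks like this (sketch; the content type assumes you are emitting SSE):

```php
<?php

header('Content-Type: text/event-stream');
header('Cache-Control: no-cache');
header('X-Accel-Buffering: no'); // tells Nginx not to buffer this proxied response
```

The server-side equivalent is `proxy_buffering off;` (or `fastcgi_buffering off;` for PHP-FPM) in the relevant Nginx location block; the header has the advantage of being scoped to the one streaming endpoint.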
For Laravel + Octane, the response flushing is different because Swoole manages output buffering. Use response()->stream() with the Octane-aware streaming API instead of raw echo + flush.
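A sketch of such a Laravel route using response()->stream(); the route path, model, and config key are my own assumptions, and exactly how flushing propagates depends on the Octane version, so treat this as a starting point:

```php
<?php

use Illuminate\Http\Request;
use Illuminate\Support\Facades\Route;

Route::post('/chat/stream', function (Request $request) {
    return response()->stream(function () use ($request) {
        // Hypothetical client setup; adjust to your config layout.
        $client = OpenAI::client(config('services.openai.key'));

        $stream = $client->chat()->createStreamed([
            'model'    => 'gpt-4o-mini', // placeholder
            'messages' => [
                ['role' => 'user', 'content' => $request->input('prompt')],
            ],
        ]);

        foreach ($stream as $response) {
            echo $response->choices[0]->delta->content ?? '';
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush(); // signals a chunk boundary; Octane forwards streamed output via Swoole
        }
    }, 200, [
        'Content-Type'      => 'text/event-stream',
        'Cache-Control'     => 'no-cache',
        'X-Accel-Buffering' => 'no',
    ]);
});
```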
Also: set a long enough timeout. PHP's default max_execution_time of 30 seconds will kill a 60-second stream. Either set it to 0 for the streaming endpoint or to a generous value. Under FPM, also raise request_terminate_timeout in the pool config, a wall-clock limit that kills the worker regardless of max_execution_time.
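The relevant knobs, as a sketch (file locations and values are examples, not recommendations):

```ini
; php.ini for the streaming endpoint's pool: 0 disables the limit entirely
max_execution_time = 0

; PHP-FPM pool config (e.g. www.conf): wall-clock limit that terminates the
; worker regardless of max_execution_time
request_terminate_timeout = 300s
```

Calling set_time_limit(0) at the top of the endpoint script is the per-request alternative to changing max_execution_time globally.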
I used Server-Sent Events framing for the stream (a data: prefix per line, a blank line terminating each event), which is nicer for browser clients because they can consume it with EventSource. The openai-php client does not format chunks as SSE, so you need to wrap each delta yourself.
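A sketch of the wrapping, assuming each delta is sent as a JSON payload (sseFrame is a hypothetical helper, not part of the client):

```php
<?php

// Wrap a payload in SSE framing: "data: " per line, blank line terminates the event.
function sseFrame(string $data, ?string $event = null): string
{
    $frame = $event !== null ? "event: {$event}\n" : '';
    foreach (explode("\n", $data) as $line) {
        $frame .= "data: {$line}\n";
    }
    return $frame . "\n";
}

// Inside the streaming loop:
// echo sseFrame(json_encode(['delta' => $chunk]));
// flush();
```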
Error handling mid-stream is tricky. If the OpenAI API returns an error after streaming has started, you have already sent HTTP 200 and cannot change the status code. I catch stream exceptions and send a special error token in the stream that the client handles.
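A sketch of the in-band error pattern, assuming SSE framing; the event name and error token are my own conventions, not part of the client:

```php
<?php

try {
    foreach ($stream as $response) {
        echo 'data: ' . json_encode([
            'delta' => $response->choices[0]->delta->content ?? '',
        ]) . "\n\n";
        flush();
    }
    // Terminal marker, mirroring the "data: [DONE]" line the raw OpenAI SSE API uses.
    echo "data: [DONE]\n\n";
} catch (\Throwable $e) {
    // HTTP 200 is already on the wire; send the error in-band instead.
    echo "event: error\n";
    echo 'data: ' . json_encode(['message' => 'upstream error']) . "\n\n";
} finally {
    flush();
}
```

The browser side attaches an EventSource listener for the error event (or watches for the token) and closes the connection.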