artem_ml · 14 Mar 2025 06:42

Implemented streaming chat completions in a PHP backend using the official openai-php/client package. Sharing some implementation notes because the documentation skips over the PHP-specific parts.

The key is using createStreamed() instead of create(). The stream returns an iterator of CreateStreamedResponse objects.

PHP
$stream = $openai->chat()->createStreamed([
    'model' => 'gpt-4o',
    'messages' => [['role' => 'user', 'content' => $prompt]],
]);

foreach ($stream as $response) {
    // Each chunk carries an incremental delta; content can be absent on
    // the final chunk, hence the null-coalescing fallback.
    $delta = $response->choices[0]->delta->content ?? '';
    echo $delta;
    if (ob_get_level() > 0) {
        ob_flush(); // only flush a buffer if one is actually active
    }
    flush();
}

The catch is that PHP's default output buffering swallows the stream, so the client sees nothing until the request ends. You need to disable buffering or flush manually after each chunk.
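Here is a minimal sketch of the buffering setup I run before entering the loop (the headers and values are my choices, not anything the client library requires):

```php
<?php
// Sketch: prepare a plain PHP endpoint for streaming output.
// Headers are illustrative; adjust Content-Type to your framing.
header('Content-Type: text/plain; charset=utf-8');
header('Cache-Control: no-cache');

// Drain and remove every nested output buffer so echoes go straight out.
while (ob_get_level() > 0) {
    ob_end_flush();
}

// Flush automatically after each output call, so the loop can skip
// explicit ob_flush()/flush() pairs.
ob_implicit_flush(true);
```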

Replies (5)
alex_petrov · 14 Mar 2025 06:47

If you are behind Nginx, also set X-Accel-Buffering: no in the response headers. Nginx buffers proxied responses by default and will hold the stream until the connection closes.
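For reference, a sketch of the headers I would send before the first chunk when Nginx proxies the endpoint (header names are standard; the exact set is a suggestion):

```php
<?php
// Sketch: response headers for streaming behind an Nginx reverse proxy.
header('X-Accel-Buffering: no');   // tell Nginx not to buffer this response
header('Cache-Control: no-cache'); // streams should never be cached
header('Content-Type: text/plain; charset=utf-8');
```

These must go out before any body bytes, or header() will fail with "headers already sent".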

dmitry_kv · 14 Mar 2025 07:03

For Laravel + Octane, the response flushing is different because Swoole manages output buffering. Use response()->stream() with the Octane-aware streaming API instead of raw echo + flush.
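A hypothetical route sketch of what that looks like, assuming the openai-php client is resolvable from the container; response()->stream() is Laravel's standard streamed-response API, and letting it drive output is what keeps Octane/Swoole happy:

```php
<?php
// Hypothetical Laravel route; the /chat/stream path and prompt parameter
// are this example's invention.
Route::get('/chat/stream', function (OpenAI\Client $openai) {
    return response()->stream(function () use ($openai) {
        $stream = $openai->chat()->createStreamed([
            'model'    => 'gpt-4o',
            'messages' => [['role' => 'user', 'content' => request('prompt')]],
        ]);

        foreach ($stream as $response) {
            echo $response->choices[0]->delta->content ?? '';
            if (ob_get_level() > 0) {
                ob_flush();
            }
            flush();
        }
    }, 200, [
        'Content-Type'      => 'text/plain; charset=utf-8',
        'X-Accel-Buffering' => 'no',
    ]);
});
```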

artem_ml · 14 Mar 2025 07:48

Also: set a long enough timeout. Default PHP max_execution_time will kill a 60-second stream. Either set it to 0 for the streaming endpoint or to a generous value. In FPM this is request_terminate_timeout in the pool config.
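Concretely, at the top of the streaming endpoint (the 300-second socket timeout is an arbitrary generous value, pick your own):

```php
<?php
// Sketch: lift the runtime limits for the streaming endpoint only,
// rather than globally in php.ini.
set_time_limit(0); // 0 = no script execution limit for this request

// Keep the socket to the upstream API open long enough for slow streams.
ini_set('default_socket_timeout', '300');

// Note: under PHP-FPM, request_terminate_timeout in the pool config
// overrides all of this and must be raised separately, e.g.:
//   ; in the FPM pool config
//   request_terminate_timeout = 300
```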

katedev · 14 Mar 2025 09:48

I tried using Server-Sent Events format for the stream (data: prefix, double newline) which is nicer for browser clients. The openai-php client does not format it as SSE so you need to wrap the delta yourself.
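A small helper sketch for that wrapping (the function name is mine; the framing rules are from the SSE spec: one "data:" line per payload line, blank line terminates the event):

```php
<?php
// Sketch: wrap a raw delta string in Server-Sent Events framing,
// since the openai-php client yields plain deltas, not SSE events.
function toSseEvent(string $delta): string
{
    $event = '';
    // A multi-line delta needs one "data:" line per line of payload.
    foreach (explode("\n", $delta) as $line) {
        $event .= 'data: ' . $line . "\n";
    }
    return $event . "\n"; // blank line marks the end of the event
}

// Usage inside the streaming loop:
// echo toSseEvent($response->choices[0]->delta->content ?? '');
```

Remember to also send Content-Type: text/event-stream so the browser's EventSource accepts the stream.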

vova · 14 Mar 2025 10:12

Error handling mid-stream is tricky. If the OpenAI API returns an error after streaming has started, you have already sent HTTP 200. I catch stream exceptions and send a special error token in the stream that the client handles.
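A sketch of that pattern; the [ERROR] sentinel and the relayStream() helper are this example's convention, not anything from the client library, and the client-side code has to look for the token:

```php
<?php
// Sketch: relay a stream and convert mid-stream failures into an
// in-band sentinel, since HTTP 200 has already gone out.
function relayStream(iterable $stream): void
{
    try {
        foreach ($stream as $response) {
            echo $response->choices[0]->delta->content ?? '';
            flush();
        }
    } catch (\Throwable $e) {
        // The status line is gone; signal failure in the body instead.
        echo "\n[ERROR] " . $e->getMessage() . "\n";
        flush();
    }
}
```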
