Building a RAG pipeline in PHP without Python dependencies
Most RAG tutorials assume Python and LangChain. I built a working RAG system in pure PHP. Here is the stack:
- OpenAI text-embedding-3-small for embeddings
- Qdrant for vector storage (PHP Qdrant client)
- openai-php/client for generation
The retrieval step does a vector similarity search in Qdrant, retrieves top 5 chunks, and appends them to the system prompt as context.
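The retrieval step can be sketched roughly like this with Guzzle against Qdrant's REST search endpoint. The collection name, URL, and prompt wording are my own placeholders, not the author's actual code:

```php
<?php
// Sketch of the retrieval step: similarity search in Qdrant, top 5 chunks,
// concatenated into the system prompt as context.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

function retrieveContext(Client $http, array $queryVector, int $limit = 5): string
{
    // POST /collections/{name}/points/search is Qdrant's search endpoint.
    $response = $http->post('/collections/docs/points/search', [
        'json' => [
            'vector'       => $queryVector,
            'limit'        => $limit,
            'with_payload' => true, // return the stored chunk text, not just IDs
        ],
    ]);
    $hits = json_decode((string) $response->getBody(), true)['result'];

    // Join the retrieved chunk texts into one context block.
    return implode("\n---\n", array_map(
        fn (array $hit): string => $hit['payload']['text'],
        $hits
    ));
}

$http    = new Client(['base_uri' => 'http://localhost:6333']);
$context = retrieveContext($http, $queryEmbedding);
$system  = "Answer using only the context below.\n\nContext:\n" . $context;
```

From there the `$system` string goes into the chat completion call via openai-php/client.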
Happy to share the embedding + chunking code if anyone is interested.
Interested in the chunking strategy. Fixed-size chunks vs semantic chunking? And how do you handle overlapping chunks to avoid losing context at boundaries?
Using fixed 512-token chunks with a 64-token overlap. Token counting is done with tiktoken-php (a PHP port of tiktoken). Not as sophisticated as semantic chunking, but it works well for technical documentation.
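The windowing logic for fixed-size chunks with overlap is simple enough to show on its own. This sketch operates on an array of token IDs (in the real pipeline they would come from tiktoken-php, which I've left out so the function is self-contained):

```php
<?php
// Fixed-size chunking with overlap over an array of token IDs.
// Each window is $size tokens; consecutive windows share $overlap tokens.
function chunkTokens(array $tokens, int $size = 512, int $overlap = 64): array
{
    $chunks = [];
    $step = $size - $overlap; // advance by size minus overlap
    for ($i = 0; $i < count($tokens); $i += $step) {
        $chunks[] = array_slice($tokens, $i, $size);
        if ($i + $size >= count($tokens)) {
            break; // this window already covers the tail of the document
        }
    }
    return $chunks;
}
```

The decoded chunk text is what gets embedded; the overlap means a sentence straddling a boundary appears intact in at least one chunk.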
The Qdrant PHP client is pretty thin, but the REST API is simple enough that you can hit it with Guzzle directly, without a client library. Saved us a dependency.
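For example, upserting a point is one PUT request. The collection name and payload shape here are illustrative, and the vector is a placeholder for a real embedding:

```php
<?php
// Upserting a point into Qdrant with plain Guzzle, no client library.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$http = new Client(['base_uri' => 'http://localhost:6333']);

$embedding = array_fill(0, 1536, 0.0); // placeholder; text-embedding-3-small returns 1536 floats
$chunkText = 'Example chunk text';

// PUT /collections/{name}/points upserts points by ID.
$http->put('/collections/docs/points', [
    'json' => [
        'points' => [
            [
                'id'      => 1,
                'vector'  => $embedding,
                'payload' => ['text' => $chunkText], // stored alongside the vector
            ],
        ],
    ],
]);
```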
How are you handling embedding costs? text-embedding-3-small is cheap but if you have large document sets the ingestion cost adds up.
Caching embeddings in the DB alongside the content hash. If the content has not changed, skip re-embedding. Incremental ingestion with a last_embedded_at timestamp on each document.
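The content-hash gate itself is a one-liner; the table and column names in the commented PDO sketch are my assumptions, not the author's schema:

```php
<?php
// Re-embed a document only when its content hash differs from the stored one.
function needsEmbedding(string $content, ?string $storedHash): bool
{
    return hash('sha256', $content) !== $storedHash;
}

// During ingestion (PDO sketch, schema names assumed):
// $stored = $pdo->query('SELECT content_hash FROM documents WHERE id = :id')->fetchColumn();
// if (needsEmbedding($text, $stored ?: null)) {
//     $vector = embed($text); // OpenAI call happens only on changed content
//     // ... store $vector, hash('sha256', $text), and set last_embedded_at = NOW()
// }
```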
For PHP apps on Swoole you can parallelize the embedding calls with coroutines. We dropped ingestion time from 40 minutes to 3 minutes by running 20 concurrent embedding requests.
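One way to cap the concurrency at 20 is a Swoole channel used as a semaphore. This is a sketch, not the poster's code; `embedChunk()` stands in for the actual OpenAI call:

```php
<?php
// Parallel embedding calls with Swoole coroutines, at most 20 in flight.
use Swoole\Coroutine;
use Swoole\Coroutine\Channel;

// Placeholder for the real OpenAI embeddings request.
function embedChunk(string $chunk): array { /* call the API here */ return []; }

$chunks = ['chunk one', 'chunk two' /* ... */];

Coroutine\run(function () use ($chunks) {
    $slots = new Channel(20);               // channel capacity bounds concurrency
    $done  = new Channel(count($chunks));

    foreach ($chunks as $i => $chunk) {
        $slots->push(true);                 // blocks once 20 coroutines are in flight
        Coroutine::create(function () use ($i, $chunk, $slots, $done) {
            $done->push([$i, embedChunk($chunk)]);
            $slots->pop();                  // free a slot for the next request
        });
    }

    $vectors = [];
    for ($i = 0; $i < count($chunks); $i++) {
        [$idx, $vec] = $done->pop();
        $vectors[$idx] = $vec;              // restore original chunk order
    }
});
```

Each coroutine yields while its HTTP request is in flight, which is where the 40-minute-to-3-minute drop comes from.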
Does the retrieval quality degrade on long documents? I have seen complaints about coherence when chunks lose document structure context (like which section they came from).
Yes, adding the document title and section header as a prefix to each chunk before embedding improves retrieval a lot. The embedding then captures context from the prefix, not just the chunk text.
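The prefixing step is trivial; the separator format here is just one reasonable choice, not necessarily what the poster uses:

```php
<?php
// Prepend document title and section header to a chunk before embedding,
// so the vector encodes where the chunk came from.
function prefixChunk(string $title, string $section, string $chunk): string
{
    return "{$title} > {$section}\n\n{$chunk}";
}
```

The prefixed string is what you embed; you can still store the bare chunk text in the payload if you want clean context for the prompt.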