Building a RAG pipeline in PHP without Python dependencies
Most RAG tutorials assume Python and LangChain. I built a working RAG system in pure PHP. Here is the stack:
- OpenAI text-embedding-3-small for embeddings
- Qdrant for vector storage (PHP Qdrant client)
- openai-php/client for generation
The retrieval step does a vector similarity search in Qdrant, retrieves top 5 chunks, and appends them to the system prompt as context.
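The retrieval step can be sketched roughly like this with Guzzle against Qdrant's REST search endpoint. The collection name, URL, and prompt wording are my own placeholders, not the author's actual code:

```php
<?php
// Sketch of the retrieval step: similarity search in Qdrant, top 5 chunks,
// concatenated into the system prompt as context.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

function retrieveContext(Client $http, array $queryVector, int $limit = 5): string
{
    // POST /collections/{name}/points/search is Qdrant's search endpoint.
    $response = $http->post('/collections/docs/points/search', [
        'json' => [
            'vector'       => $queryVector,
            'limit'        => $limit,
            'with_payload' => true, // return the stored chunk text, not just IDs
        ],
    ]);
    $hits = json_decode((string) $response->getBody(), true)['result'];

    // Join the retrieved chunk texts into one context block.
    return implode("\n---\n", array_map(
        fn (array $hit): string => $hit['payload']['text'],
        $hits
    ));
}

$http    = new Client(['base_uri' => 'http://localhost:6333']);
$context = retrieveContext($http, $queryEmbedding);
$system  = "Answer using only the context below.\n\nContext:\n" . $context;
```

From there the `$system` string goes into the chat completion call via openai-php/client.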
Happy to share the embedding + chunking code if anyone is interested.
Interested in the chunking strategy. Fixed-size chunks vs semantic chunking? And how do you handle overlapping chunks to avoid losing context at boundaries?
Using fixed 512-token chunks with a 64-token overlap. Token counting is done with tiktoken-php (a PHP port of tiktoken). Not as sophisticated as semantic chunking, but it works well for technical documentation.
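The windowing logic for fixed-size chunks with overlap is simple enough to show on its own. This sketch operates on an array of token IDs (in the real pipeline they would come from tiktoken-php, which I've left out so the function is self-contained):

```php
<?php
// Fixed-size chunking with overlap over an array of token IDs.
// Each window is $size tokens; consecutive windows share $overlap tokens.
function chunkTokens(array $tokens, int $size = 512, int $overlap = 64): array
{
    $chunks = [];
    $step = $size - $overlap; // advance by size minus overlap
    for ($i = 0; $i < count($tokens); $i += $step) {
        $chunks[] = array_slice($tokens, $i, $size);
        if ($i + $size >= count($tokens)) {
            break; // this window already covers the tail of the document
        }
    }
    return $chunks;
}
```

The decoded chunk text is what gets embedded; the overlap means a sentence straddling a boundary appears intact in at least one chunk.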
The Qdrant PHP client is pretty thin, but the REST API is simple enough that you can hit it with Guzzle directly, without a client library. Saved us a dependency.
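For example, upserting a point is one PUT request. The collection name and payload shape here are illustrative, and the vector is a placeholder for a real embedding:

```php
<?php
// Upserting a point into Qdrant with plain Guzzle, no client library.
require 'vendor/autoload.php';

use GuzzleHttp\Client;

$http = new Client(['base_uri' => 'http://localhost:6333']);

$embedding = array_fill(0, 1536, 0.0); // placeholder; text-embedding-3-small returns 1536 floats
$chunkText = 'Example chunk text';

// PUT /collections/{name}/points upserts points by ID.
$http->put('/collections/docs/points', [
    'json' => [
        'points' => [
            [
                'id'      => 1,
                'vector'  => $embedding,
                'payload' => ['text' => $chunkText], // stored alongside the vector
            ],
        ],
    ],
]);
```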
How are you handling embedding costs? text-embedding-3-small is cheap but if you have large document sets the ingestion cost adds up.
Caching embeddings in the DB alongside the content hash. If the content has not changed, skip re-embedding. Incremental ingestion with a last_embedded_at timestamp on each document.
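The content-hash gate itself is a one-liner; the table and column names in the commented PDO sketch are my assumptions, not the author's schema:

```php
<?php
// Re-embed a document only when its content hash differs from the stored one.
function needsEmbedding(string $content, ?string $storedHash): bool
{
    return hash('sha256', $content) !== $storedHash;
}

// During ingestion (PDO sketch, schema names assumed):
// $stored = $pdo->query('SELECT content_hash FROM documents WHERE id = :id')->fetchColumn();
// if (needsEmbedding($text, $stored ?: null)) {
//     $vector = embed($text); // OpenAI call happens only on changed content
//     // ... store $vector, hash('sha256', $text), and set last_embedded_at = NOW()
// }
```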
For PHP apps on Swoole you can parallelize the embedding calls with coroutines. We dropped ingestion time from 40 minutes to 3 minutes by running 20 concurrent embedding requests.
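One way to cap the concurrency at 20 is a Swoole channel used as a semaphore. This is a sketch, not the poster's code; `embedChunk()` stands in for the actual OpenAI call:

```php
<?php
// Parallel embedding calls with Swoole coroutines, at most 20 in flight.
use Swoole\Coroutine;
use Swoole\Coroutine\Channel;

// Placeholder for the real OpenAI embeddings request.
function embedChunk(string $chunk): array { /* call the API here */ return []; }

$chunks = ['chunk one', 'chunk two' /* ... */];

Coroutine\run(function () use ($chunks) {
    $slots = new Channel(20);               // channel capacity bounds concurrency
    $done  = new Channel(count($chunks));

    foreach ($chunks as $i => $chunk) {
        $slots->push(true);                 // blocks once 20 coroutines are in flight
        Coroutine::create(function () use ($i, $chunk, $slots, $done) {
            $done->push([$i, embedChunk($chunk)]);
            $slots->pop();                  // free a slot for the next request
        });
    }

    $vectors = [];
    for ($i = 0; $i < count($chunks); $i++) {
        [$idx, $vec] = $done->pop();
        $vectors[$idx] = $vec;              // restore original chunk order
    }
});
```

Each coroutine yields while its HTTP request is in flight, which is where the 40-minute-to-3-minute drop comes from.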
Does the retrieval quality degrade on long documents? I have seen complaints about coherence when chunks lose document structure context (like which section they came from).
Yes, adding the document title and section header as a prefix to each chunk before embedding improves retrieval a lot. The embedding then captures context from the prefix, not just the chunk text.
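The prefixing step is trivial; the separator format here is just one reasonable choice, not necessarily what the poster uses:

```php
<?php
// Prepend document title and section header to a chunk before embedding,
// so the vector encodes where the chunk came from.
function prefixChunk(string $title, string $section, string $chunk): string
{
    return "{$title} > {$section}\n\n{$chunk}";
}
```

The prefixed string is what you embed; you can still store the bare chunk text in the payload if you want clean context for the prompt.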