Computing cosine similarity for embeddings in pure PHP
If you are working with LLM embeddings and need to compare them, you do not need a vector database for small datasets. Cosine similarity is straightforward to implement in PHP and runs fine for thousands of vectors.
Here is a working implementation you can run directly:
For real embeddings you would store the vectors in a table as JSON blobs and load them into PHP arrays. Up to about 50k vectors this is fast enough for most use cases without any external tooling.
Good implementation. One optimization: if your vectors are pre-normalized (unit length, which most embedding APIs return), you can skip the norm computation entirely because for unit vectors cosine similarity equals the dot product.
The dot product version is about 30% faster on large vectors since it avoids two sqrt() calls.
```php blocks are runnable.