Prompt versioning and testing strategy for PHP applications
We have 15 different prompts used across our application, and they change frequently as we tune them. We have no good process for tracking changes, testing regressions, or rolling back a bad prompt change.
How do teams manage prompt engineering over time?
Store prompts in the DB with a version column, not in code. Each prompt has a name, version, content, and model configuration. The application loads the active version at runtime. Rollback is a DB update.
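A minimal schema sketch for this approach; table and column names are illustrative, adjust types for your database:

```sql
-- Illustrative schema: one row per (prompt name, version)
CREATE TABLE prompts (
    id           INT AUTO_INCREMENT PRIMARY KEY,
    name         VARCHAR(100) NOT NULL,   -- e.g. 'summarize_document'
    version      INT NOT NULL,
    content      TEXT NOT NULL,           -- the prompt template itself
    model_config JSON,                    -- model name, temperature, etc.
    is_active    BOOLEAN NOT NULL DEFAULT FALSE,
    created_at   TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    UNIQUE KEY uniq_name_version (name, version)
);

-- Rollback = flip which version is active
UPDATE prompts SET is_active = FALSE WHERE name = 'summarize_document';
UPDATE prompts SET is_active = TRUE
 WHERE name = 'summarize_document' AND version = 3;
```

The unique key on (name, version) means a "change" is always a new row, never an edit, so every historical version stays queryable.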
For regression testing: collect a test set of (input, expected output) pairs. After changing a prompt, run the test set and compare the new outputs against the previous version's. Score with manual review or LLM-as-judge.
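A rough harness sketch for that loop. `$callLlm` here is a hypothetical wrapper around whatever model client you use, not a real API; the returned rows are what you hand to a reviewer or a judge model:

```php
<?php
// Run every test case through both prompt versions and collect
// the outputs side by side for offline comparison.
// $callLlm is a hypothetical callable: fn(string $prompt, string $input): string
function runRegression(array $testSet, callable $callLlm,
                       string $oldPrompt, string $newPrompt): array {
    $results = [];
    foreach ($testSet as $case) {
        $results[] = [
            'input'    => $case['input'],
            'old'      => $callLlm($oldPrompt, $case['input']),
            'new'      => $callLlm($newPrompt, $case['input']),
            'expected' => $case['expected'],
        ];
    }
    return $results; // review manually or feed to an LLM judge
}
```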
We keep prompts in version-controlled YAML files in the repo. Deployment includes a migrate step that upserts prompt versions into the DB. Best of both worlds: git history and runtime DB lookup.
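A sketch of what one versioned prompt file might look like under this layout; the file path and field names are illustrative, not a standard:

```yaml
# prompts/summarize_document.yaml -- illustrative layout
name: summarize_document
version: 4
model_config:
  model: gpt-4o
  temperature: 0.2
content: |
  Summarize the following document in three bullet points.
  Document: {{document}}
```

The migrate step reads each file and upserts on (name, version), so redeploying is idempotent and a bad version is rolled back by reverting the file in git and redeploying.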
LangSmith and similar tools do prompt versioning as a service. Overkill for 15 prompts, useful if you have hundreds across many models.
Shadow testing: run both the old and new prompts in parallel, log both outputs, and compare offline. It costs double per request during the test period, but gives you real-traffic results before switching.
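A sketch of the request path during a shadow test, assuming hypothetical `$callLlm` and `$logger` callables (neither is a real library API). The user only ever sees the active prompt's output; the candidate's output is logged for the offline comparison:

```php
<?php
// Shadow-test sketch: serve the active prompt's response, but also
// run the candidate prompt and log both outputs for later review.
function handleRequest(string $input, string $activePrompt,
                       string $candidatePrompt,
                       callable $callLlm, callable $logger): string {
    $activeOut = $callLlm($activePrompt, $input);    // user sees this
    $shadowOut = $callLlm($candidatePrompt, $input); // logged only
    $logger([
        'input'  => $input,
        'active' => $activeOut,
        'shadow' => $shadowOut,
    ]);
    return $activeOut;
}
```

In practice you would run the candidate call after the response is sent (or from a queue) so the shadow request doesn't add latency, only cost.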
How do you handle prompts that have PHP variables interpolated? Template syntax in the stored prompt?
Simple Mustache-style placeholders: {{user_name}}, {{document}}. Before sending, do a str_replace pass. Avoid complex templating engines in prompts; the indirection makes them harder to read and test.
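The str_replace pass is a few lines. A minimal sketch (deliberately no escaping, loops, or conditionals, since the whole point is keeping the stored prompt readable):

```php
<?php
// Fill Mustache-style {{placeholders}} with a plain str_replace pass.
function renderPrompt(string $template, array $vars): string {
    foreach ($vars as $key => $value) {
        $template = str_replace('{{' . $key . '}}', $value, $template);
    }
    return $template;
}

echo renderPrompt(
    'Hello {{user_name}}, please review: {{document}}',
    ['user_name' => 'Ana', 'document' => 'Q3 report']
);
// Hello Ana, please review: Q3 report
```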