katedev · 10 Apr 2025 11:42

Building a chat feature where users have multi-turn conversations with an LLM. The challenge is keeping conversation history within the context window without losing important early messages.

How are people handling this in PHP? Interested in both technical strategies and any PHP-specific considerations.

Replies (5)
artem_ml · 10 Apr 2025 12:11

Simplest approach: keep the last N messages where N is tuned so total tokens stay under the limit. Use tiktoken-php or a rough heuristic (chars/4) for token counting. Not perfect but works for most chat use cases.
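A minimal sketch of this last-N-under-budget idea, using the chars/4 heuristic mentioned above (`estimateTokens` and `trimHistory` are illustrative names, not from any library; swap in tiktoken-php for exact counts):

```php
<?php
// Rough token estimate: ~4 characters per token for English text.
function estimateTokens(string $text): int {
    return (int) ceil(mb_strlen($text) / 4);
}

/**
 * Walk backwards from the newest message, keeping messages until the
 * token budget is exhausted. Returns messages in chronological order.
 */
function trimHistory(array $messages, int $budget): array {
    $kept = [];
    foreach (array_reverse($messages) as $msg) {
        $cost = estimateTokens($msg['content']);
        if ($cost > $budget) {
            break; // oldest messages beyond the budget are dropped
        }
        $budget -= $cost;
        array_unshift($kept, $msg);
    }
    return $kept;
}

$history = [
    ['role' => 'user', 'content' => 'First question...'],
    ['role' => 'assistant', 'content' => 'First answer...'],
    ['role' => 'user', 'content' => 'Follow-up question...'],
];
$window = trimHistory($history, 1000);
```

Walking backwards (newest first) means the break always drops the oldest turns, which is usually what you want.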

alex_petrov · 10 Apr 2025 12:45

Better approach: sliding window with the system prompt pinned at the beginning. Always include the first user message and last assistant response. Summarize older turns with a separate LLM call when the window fills up.
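A simplified sketch of the pinned-prompt variant (without the first-message pinning). `summarizeTurns` is a hypothetical placeholder for the separate LLM summarization call, and `estimateTokens` is the chars/4 heuristic again:

```php
<?php
function estimateTokens(string $text): int {
    return (int) ceil(mb_strlen($text) / 4);
}

// Placeholder: in a real app this would be a separate LLM call with a
// "summarize this conversation" prompt.
function summarizeTurns(array $turns): string {
    return 'Summary of ' . count($turns) . ' earlier messages.';
}

function buildContext(string $systemPrompt, array $messages, int $budget): array {
    // The pinned system prompt is charged against the budget first.
    $budget -= estimateTokens($systemPrompt);

    // Walk backwards, keeping a contiguous run of recent turns that fit.
    $recent = [];
    $overflow = [];
    foreach (array_reverse($messages) as $msg) {
        $cost = estimateTokens($msg['content']);
        if ($budget - $cost >= 0 && count($overflow) === 0) {
            $budget -= $cost;
            array_unshift($recent, $msg);
        } else {
            array_unshift($overflow, $msg);
        }
    }

    $context = [['role' => 'system', 'content' => $systemPrompt]];
    if ($overflow) {
        // Older turns collapsed into a single summary message.
        $context[] = ['role' => 'system', 'content' => summarizeTurns($overflow)];
    }
    return array_merge($context, $recent);
}

$msgs = [
    ['role' => 'user', 'content' => str_repeat('a', 400)],
    ['role' => 'user', 'content' => 'hi'],
];
$ctx = buildContext('Be helpful.', $msgs, 50);
```

In production you would cache the summary rather than regenerate it on every request, since each summarization is a billable LLM call.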

sergey_web · 10 Apr 2025 13:39

We store messages with their token count in the DB (estimated at insert time). When building the context, select messages from the end up to the budget. The system prompt token count is subtracted from the budget first.
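A sketch of this DB-backed approach using SQLite via PDO. The table and column names (`messages`, `token_count`) are assumptions for illustration:

```php
<?php
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    token_count INTEGER NOT NULL
)');

// Estimate the token count once, at insert time (chars/4 heuristic here).
function insertMessage(PDO $pdo, string $role, string $content): void {
    $stmt = $pdo->prepare(
        'INSERT INTO messages (role, content, token_count) VALUES (?, ?, ?)'
    );
    $stmt->execute([$role, $content, (int) ceil(mb_strlen($content) / 4)]);
}

// Fetch newest-first, stop at the budget, return in chronological order.
// Subtract the system prompt's token count from $budget before calling.
function fetchContext(PDO $pdo, int $budget): array {
    $stmt = $pdo->query(
        'SELECT role, content, token_count FROM messages ORDER BY id DESC'
    );
    $kept = [];
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        if ($row['token_count'] > $budget) {
            break;
        }
        $budget -= $row['token_count'];
        array_unshift($kept, $row);
    }
    return $kept;
}

insertMessage($pdo, 'user', 'hello there');
insertMessage($pdo, 'assistant', str_repeat('x', 40));
insertMessage($pdo, 'user', 'ok');
$context = fetchContext($pdo, 11);
```

Storing the estimate at insert time means context assembly is a single indexed read with no per-request tokenization.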

vova · 10 Apr 2025 15:14

Large context windows like Claude 3's 200k and GPT-4 Turbo's 128k make this less urgent for most use cases. The bigger problem at scale is cost: you end up either charging users based on conversation length or truncating aggressively.

artem_ml · 10 Apr 2025 17:02

Token counting in PHP: the tiktoken-php library is the most accurate. For OpenAI models it gives exact counts. For rough estimates, (string length / 3.5) is close enough for GPT-4 English text.
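The rough heuristic as a one-liner (`mb_strlen` so multibyte text is counted by characters, not bytes; 3.5 is the divisor suggested above, a rough figure for English, not an exact rate):

```php
<?php
// ~3.5 characters per token for GPT-4 English text (rough estimate only;
// use tiktoken-php when exact counts matter).
function estimateTokens(string $text): int {
    return (int) ceil(mb_strlen($text) / 3.5);
}
```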
