Anyone who has shipped an LLM-powered product runs into the same wall: conversations grow, context windows do not. The first ten turns are fast and pleasant. By turn eighty the prompt is bloated, latency creeps up, costs climb with every message because the entire history is resent on each turn, and eventually the model starts dropping the