Devlog 005, Fighting Prose Gravity

In the previous logs, I documented the birth of the Soul and the World Log — the two-system architecture meant to kill AI amnesia. The hidden state loop was breathing, and the engine was finally managing character data outside the prompt. But as soon as I moved into intensive roleplay testing, I hit a wall that every AI developer eventually faces.

Prose Gravity.

The setup was simple. Aurora reacted to a phone screen, took the phone, locked it, and tossed it onto the couch before moving to the kitchen to heat up food.

In the very next turn, despite the history being right there in the context, the model described her as "still holding their phone like it's a live grenade".

This was my AAAAAAHHHH moment. The engine knew the phone was on the couch. The chat history said she threw it there. But the LLM completely ignored the state in favor of the drama.

I realized a fundamental truth: to an LLM, recent chat is just a transcript, not a binding law. A long, 800-word emotional paragraph about a phone reveal pulls far more of the model's attention than a single sentence at the end of the message stating the phone was put away.

The AI's Narrator was fighting with the Engine's reality.

My initial reaction was to try and track every minor detail, down to the location of every fork and chair. But I quickly realized that was too much shit. It would bloat the state and fight the project's core goal: fewer tokens, best outcome.

The solution wasn't better memory. It was a Hierarchy of Truth.

I needed to tell the AI exactly which parts of the prompt were authoritative and which were just flavor.

I refactored context_compiler.rs to implement a ranked Context Priority Stack. Instead of dumping history into the prompt, the engine now compiles a State Brief where priority is enforced by placement and explicit labeling.

P0 — Latest User Input
The immediate new action. Overrides everything else.

P1 — [LATEST EXCHANGE]
The most immediate anchor for continuity.

P2 — [SCENE STATE]
Authoritative physical facts. What is true right now.

P3 — [CHARACTER / WORLD SNAPSHOT]
The underlying psyche and environment.

P4 — [RECENT CHAT]
Lower-priority excerpts. Tone and dialogue flow only.
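
To make "priority by placement and explicit labeling" concrete, here is a minimal sketch of how such a stack could be compiled. It is not the actual code in context_compiler.rs: the `Priority` enum, the `Section` struct, and `compile_state_brief` are hypothetical names, and the ordering rule (lowest priority first, so the highest-priority section lands closest to the new user message) is an assumption based on the description above. P0 is left out because it travels as the user turn itself.

```rust
/// Sections of the State Brief. The labels mirror the list above;
/// the types and names here are hypothetical, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Priority {
    LatestExchange = 1, // P1
    SceneState = 2,     // P2
    Snapshot = 3,       // P3
    RecentChat = 4,     // P4
}

impl Priority {
    /// The explicit label the model sees in the prompt.
    fn label(self) -> &'static str {
        match self {
            Priority::LatestExchange => "[LATEST EXCHANGE]",
            Priority::SceneState => "[SCENE STATE]",
            Priority::Snapshot => "[CHARACTER / WORLD SNAPSHOT]",
            Priority::RecentChat => "[RECENT CHAT]",
        }
    }
}

struct Section {
    priority: Priority,
    body: String,
}

/// Assumption: lower-priority sections are emitted first, so the
/// highest-priority section ends up at the bottom of the brief,
/// right before the new user message (P0).
fn compile_state_brief(mut sections: Vec<Section>) -> String {
    // Sort descending by priority number: P4 first, P1 last.
    sections.sort_by(|a, b| b.priority.cmp(&a.priority));
    sections
        .iter()
        // Skip empty sections instead of emitting a bare label.
        .filter(|s| !s.body.trim().is_empty())
        .map(|s| format!("{}\n{}", s.priority.label(), s.body.trim()))
        .collect::<Vec<_>>()
        .join("\n\n")
}
```

Whatever order the sections are collected in, the compiled brief always ends with [LATEST EXCHANGE], so the labels carry the ranking and the placement reinforces it.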

While debugging the Priority Stack using the new LLM Payload Inspector, I found a massive technical flaw: the engine was accidentally feeding the model the beginning of long assistant responses instead of the end.

In a 1,200-character response, the setup — Aurora seeing the phone — happens at the start. The final state — Aurora tossing the phone — happens at the end. By feeding the head of the message, I was literally forcing the AI to re-anchor to a state that no longer existed.

I implemented a tail-based excerpt system in three steps.

First, strip the noise. Before processing, the engine strips status blocks, hidden state JSON, and sensory clutter. Second, grab the tail: only the last 700 to 1,200 characters of the response, so the most recent physical actions are what the LLM actually reads. Third, placement: the Latest Exchange goes at the bottom of the system context, right next to the new user message, so it's the last thing the model sees before it writes.
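
The first two steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine's real code: `strip_noise` and `tail_excerpt` are hypothetical names, and the `<state>` / `</state>` marker lines stand in for whatever format the real status blocks and hidden state JSON use, which this log does not show.

```rust
/// Drop engine-only blocks before excerpting.
/// Assumption: such blocks sit between hypothetical `<state>` and
/// `</state>` marker lines; the real format may differ.
fn strip_noise(response: &str) -> String {
    let mut kept = Vec::new();
    let mut in_block = false;
    for line in response.lines() {
        match line.trim() {
            "<state>" => in_block = true,
            "</state>" => in_block = false,
            _ if !in_block => kept.push(line),
            _ => {} // inside a hidden block: discard
        }
    }
    kept.join("\n").trim().to_string()
}

/// Return at most `max_chars` characters from the END of `text`.
/// Cuts on a character boundary, so multi-byte text never panics.
fn tail_excerpt(text: &str, max_chars: usize) -> &str {
    if max_chars == 0 {
        return "";
    }
    // Walk backwards to the first character of the tail.
    match text.char_indices().rev().nth(max_chars - 1) {
        Some((byte_idx, _)) => &text[byte_idx..],
        None => text, // shorter than the budget: keep everything
    }
}
```

Calling `tail_excerpt(&strip_noise(raw), 1200)` keeps the closing actions of a long response and drops the setup at the head. Counting characters rather than bytes matters here, because slicing a `&str` in the middle of a multi-byte character panics.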

Aurora finally stopped resurrecting the phone. She stayed in the kitchen where the state engine said she was.

Continuity isn't just about memory. It's about enforcing a hierarchy of reality.

The Narrator can be creative. The Engine must be the law.

Next: Devlog 006 — "Opening the Black Box." Building the LLM Payload Inspector and the fight against schema drift.