LLMs don't chat — they complete text. The "conversation" you see is constructed from a flat token stream with special tokens marking roles and boundaries.
[Interactive demo: toggle between Chat View and Token Stream View to see the same conversation rendered both ways.]

Token Legend
- Role marker: starts a new speaker turn
- Separator: divides role from content
- End token: closes the current turn
- Tool token: marks tool call/result boundaries
Key Insight
The model has no concept of "chatting." Every conversation is converted into a single text stream with special tokens. The chat interface is just a user-friendly wrapper — underneath, it's ALL just next-token prediction on a flat string.
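The flattening described above can be sketched in a few lines. This is an illustrative example using ChatML-style markers (`<|im_start|>`, `<|im_end|>`); other model families use different special tokens, but the principle is the same: the structured message list becomes one flat string, and the model simply continues it.

```python
# Sketch of a chat template: flatten {role, content} messages into one string.
# The <|im_start|>/<|im_end|> markers follow the ChatML convention; this is an
# illustration, not any specific model's exact template.

def apply_chat_template(messages):
    """Turn a list of {role, content} dicts into a single prompt string."""
    stream = ""
    for msg in messages:
        # Role marker + role name, newline separator, content, end token.
        stream += f"<|im_start|>{msg['role']}\n{msg['content']}<|im_end|>\n"
    # Open an assistant turn so the model "replies" by completing it.
    stream += "<|im_start|>assistant\n"
    return stream

chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2+2?"},
]
print(apply_chat_template(chat))
```

From the model's perspective, generation just continues this string until it emits the end token for the assistant turn; the UI then parses the markers back into chat bubbles.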