Constrained Decoding: How to Get Guaranteed JSON from an LLM (and the Reasoning Tax)
How constrained decoding guarantees valid JSON from an LLM: runnable vLLM and structured-output examples, the latency cost, and the reasoning tax that JSON mode hides.