Prefill and Decode Are Not the Same Job
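Prefill and decode are the two halves of every autoregressive transformer inference call, but they place radically different demands on the GPU: prefill processes the whole prompt in parallel and is typically compute-bound, while decode emits one token at a time and is typically limited by memory bandwidth. We trace why running them on the same hardware leaves throughput on the table, and what production-scale agent serving looks like when you stop pretending they are the same job.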