Latency budgets in algorithmic execution

Total round-trip time, from a market-data event arriving on the wire to a venue acknowledgement landing back, is finite, and it is most usefully reasoned about as a budget rather than as a single number. The budget divides into stages: ingestion, normalisation, decision, serialisation, send, and acknowledgement. Each stage consumes a typical share, follows its own latency distribution, and has its own characteristic failure modes.

Median figures are easy to quote and almost always misleading. The trades that hurt are not the ones executed at the median. They are the ones at the ninety-ninth percentile of the slowest stage on a given day, where a garbage-collection pause, a kernel scheduling decision, or a transient network reordering pushes one component well past its share. Reasoning about the budget with per-stage histograms, with attention to the right tail of each one, reveals problems that the mean and the median collapse into noise.
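A minimal sketch of that comparison, in Python, using invented numbers: the stage names, the microsecond samples, and the helper names `percentile` and `stage_summary` are all assumptions for illustration, not measurements or an established API. It shows how a stage with the lower median can still carry the far worse tail.

```python
import math
from statistics import median


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def stage_summary(samples_by_stage):
    """Summarise each stage by its median, 99th percentile, and worst observation."""
    return {
        stage: {"p50": median(s), "p99": percentile(s, 99), "max": max(s)}
        for stage, s in samples_by_stage.items()
    }


# Hypothetical per-stage latencies in microseconds.
# "decision" is usually faster, but 2% of its samples hit a long pause.
samples = {
    "decision": [20] * 980 + [400] * 20,
    "serialisation": [25] * 1000,
}

summary = stage_summary(samples)
for stage, stats in summary.items():
    print(stage, stats)
```

Ranked by median, the hypothetical decision stage looks like the better of the two. Ranked by its ninety-ninth percentile, it is the stage that breaks the budget.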

The implication for engineering effort is counterintuitive. Compressing the median of an already-fast stage rarely changes outcomes, because the latency that costs money sits in the tail rather than in the central tendency. The work that pays is bounding the worst case: pre-allocating buffers so that hot paths do not allocate, controlling jitter sources, and instrumenting each stage so a regression in any one of them is visible within hours rather than weeks.
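One way to picture per-stage instrumentation is a fixed-size recorder for each stage, checked against that stage's share of the budget. The sketch below is illustrative only: `StageRecorder`, `over_budget`, the stage names, and the nanosecond budgets are all hypothetical. Python still allocates integer objects internally, so this shows the shape of the idea (storage sized once, overwritten in place) rather than a genuinely allocation-free hot path.

```python
from array import array


class StageRecorder:
    """Fixed-capacity ring buffer of latency samples; storage is allocated once, up front."""

    def __init__(self, capacity):
        self._buf = array("q", [0]) * capacity  # signed 64-bit slots
        self._capacity = capacity
        self._count = 0

    def record(self, nanos):
        # Overwrites the oldest slot once the buffer is full; the buffer never grows.
        self._buf[self._count % self._capacity] = nanos
        self._count += 1

    def worst(self):
        # Runs on the reporting path, not the hot path, so the slice copy is acceptable here.
        n = min(self._count, self._capacity)
        return max(self._buf[:n]) if n else 0


def over_budget(recorders, budget_ns):
    """Return the stages whose worst retained sample exceeds their share of the budget."""
    return sorted(
        stage for stage, rec in recorders.items() if rec.worst() > budget_ns[stage]
    )


# Hypothetical per-stage budgets in nanoseconds.
budget_ns = {"decision": 50_000, "send": 30_000}
recorders = {stage: StageRecorder(1024) for stage in budget_ns}

for ns in (12_000, 15_000, 220_000):  # one outlier in the decision stage
    recorders["decision"].record(ns)
recorders["send"].record(9_000)

breaches = over_budget(recorders, budget_ns)
print(breaches)
```

Checking the worst retained sample rather than an average is the design choice that matters: a single outlier is enough to flag the stage, which is what makes a regression visible quickly.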

Speed, in this framing, is a constraint to be respected rather than a goal to be optimised. The goal is reliable behaviour under the worst conditions the market produces. Latency budgets are simply the language in which that goal is written down.
