A computer scientist beat textbook binary search by more than 2x

Binary search is the page-flipping trick everyone learns in their first programming class: to find a word in a sorted list, look at the middle, decide whether your target is in the top or bottom half, and repeat. It has been considered close to optimal since the 1940s. Computer scientist Daniel Lemire, who specializes in fast algorithms, just published benchmarks showing he can beat it by more than 2x on both Apple and Intel chips.

The trick is that modern processors are nothing like the imaginary computers that textbook algorithms were designed for. Today's chips ship with SIMD instructions that compare eight 16-bit values against a target in a single operation and can fetch several values from memory at once instead of waiting in line. Binary search, which inches forward one comparison at a time, leaves all that hardware sitting idle.

Lemire's algorithm, which he calls SIMD Quad, splits the array into 16-element blocks, then narrows down to the correct block by jumping through quarters rather than halves. Once it lands on a block, SIMD checks all 16 entries at once. More instructions overall, but the chip runs them in parallel, so the wall-clock time drops.

He ran 10 million queries for each array size from 2 to 4,096 elements on an Apple M4 and an Intel Emerald Rapids server. SIMD Quad won every single comparison. On Intel with a warm cache, it finished in less than half the time. On Apple with a cold cache, same result. "Standard algorithms," Lemire writes, "were often not designed for computers that have so much parallelism." His source code is on the blog.

Previously: