Machine vision breakthrough: 100,000 objects recognized with a single CPU

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, a prizewinning paper by Google Research scientists, describes a breakthrough in machine vision: a system that can distinguish between a huge number of object classes 20,000 times faster than before.

The so-called convolution operator is one of the key operations in computer vision and, more broadly, in all of signal processing. Unfortunately, it is computationally expensive, so researchers either use it sparingly or employ exotic SIMD hardware such as GPUs and FPGAs to mitigate the cost. We turn things on their head by showing how one can use fast table lookup (a method called hashing) to trade memory for speed, replacing the computationally expensive inner loop of the convolution operator (a sequence of multiplications and additions), which must be executed millions of times, with a single table lookup.
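As a toy illustration of trading memory for speed (not the method in the paper), suppose the input is a binary 1-D signal and each filter has only k taps. Every window can then be packed into a k-bit integer, and each filter's response to all 2**k possible windows can be precomputed once; afterwards the k multiply-adds per window shrink to a single table lookup. The function names and the binary-input assumption are hypothetical, chosen to keep the example exact and checkable.

```python
import random


def direct_responses(signal, filters):
    """Baseline: k multiply-adds per filter per window position.
    (As in most vision code, the filter is applied without flipping.)"""
    k = len(filters[0])
    return [[sum(w * signal[i + j] for j, w in enumerate(f))
             for i in range(len(signal) - k + 1)]
            for f in filters]


def build_tables(filters):
    """Precompute each filter's response to every possible k-bit window (2**k entries each)."""
    k = len(filters[0])
    tables = []
    for f in filters:
        # Bit (k-1-j) of code holds window element j, matching the rolling code below.
        tables.append([sum(w for j, w in enumerate(f) if (code >> (k - 1 - j)) & 1)
                       for code in range(2 ** k)])
    return tables


def lookup_responses(signal, tables, k):
    """Same responses, but one table lookup per filter per position instead of k multiply-adds."""
    mask = (1 << k) - 1
    code = 0
    out = [[] for _ in tables]
    for i, bit in enumerate(signal):
        code = ((code << 1) | bit) & mask  # rolling k-bit code of the current window
        if i >= k - 1:                     # a full window is available from here on
            for responses, table in zip(out, tables):
                responses.append(table[code])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    k = 8
    filters = [[rng.randint(-3, 3) for _ in range(k)] for _ in range(50)]
    signal = [rng.randint(0, 1) for _ in range(1000)]
    tables = build_tables(filters)  # 50 filters x 256 entries, paid for once up front
    assert lookup_responses(signal, tables, k) == direct_responses(signal, filters)
    print("table lookups match direct convolution")
```

The catch is the table size: each filter needs 2**k entries, so this exact scheme only works for very short, coarsely quantized windows. High-dimensional image features would need astronomically large tables, which is one reason approximate hashing schemes are used instead.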

We demonstrate the advantages of our approach by scaling object detection from the current state of the art, which involves several hundred or at most a few thousand object categories, to 100,000 categories, which would amount to more than a million convolutions. Moreover, our demonstration was carried out on a single commodity computer, taking only a few seconds per image. The basic technology is used in several pieces of Google infrastructure and can be applied to problems outside computer vision, such as auditory signal processing.
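The excerpt does not say which hash function makes 100,000 categories tractable. One family that fits the description is locality-sensitive hashing over ordinal codes, and the sketch below assumes winner-take-all (WTA) hashing: each code records which of k randomly chosen feature dimensions is largest, and groups of codes ("bands") serve as hash-table keys. Vectors that share many band keys have similar rank orderings, which this sketch treats as a proxy for a strong filter response, so a window is scored only against the filters its keys retrieve rather than against every filter. All names (`make_subsets`, `wta_codes`, `FilterIndex`) and the band and code sizes are hypothetical; this is an illustration, not the paper's pipeline.

```python
import random
from collections import defaultdict


def make_subsets(dim, num_codes, k, seed=0):
    """Hypothetical helper: num_codes random size-k subsets of the feature dimensions."""
    rng = random.Random(seed)
    return [rng.sample(range(dim), k) for _ in range(num_codes)]


def wta_codes(vec, subsets):
    """Winner-take-all code: for each subset, the position (0..k-1) of its largest entry.
    It depends only on the rank order of the entries, not on their scale."""
    return [max(range(len(idx)), key=lambda p: vec[idx[p]]) for idx in subsets]


class FilterIndex:
    """Toy LSH index: WTA codes are split into bands, and each band is a hash-table key."""

    def __init__(self, subsets, band_size):
        self.subsets = subsets
        self.band_size = band_size
        self.tables = defaultdict(list)  # (band number, band codes) -> filter ids

    def _band_keys(self, vec):
        codes = wta_codes(vec, self.subsets)
        for start in range(0, len(codes), self.band_size):
            yield (start // self.band_size, tuple(codes[start:start + self.band_size]))

    def add(self, filter_id, weights):
        for key in self._band_keys(weights):
            self.tables[key].append(filter_id)

    def query(self, window, top=5):
        """Vote for filters that share band keys with the window; no dot products are computed."""
        votes = defaultdict(int)
        for key in self._band_keys(window):
            for fid in self.tables.get(key, ()):
                votes[fid] += 1
        return sorted(votes.items(), key=lambda kv: -kv[1])[:top]


if __name__ == "__main__":
    rng = random.Random(1)
    dim, n_filters = 64, 1000
    index = FilterIndex(make_subsets(dim, num_codes=32, k=4, seed=2), band_size=4)
    filters = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(n_filters)]
    for fid, weights in enumerate(filters):
        index.add(fid, weights)
    # Same rank order as filter 7 (doubling a float is exact, so no ties are introduced).
    window = [2.0 * v for v in filters[7]]
    print(index.query(window, top=3))  # filter 7 should lead, matching all 8 bands
```

The trade-off is tunable: longer bands make spurious matches rarer but also make near matches less likely to collide, so real systems typically balance band length against the number of tables.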

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine (via /.)

(Image: Clutter, a Creative Commons Attribution Share-Alike (2.0) image from neofob's photostream)

Notable Replies

  1. Fex says:

    The objects were then ranked in order of potential to assist in the inevitable destruction of humanity.

  2. It reminds me of Bill Gosper's old code for Conway's Game of Life from back in the 1980s. It used spatial and temporal hashing, so it took several minutes to do the first 1,000 generations of the F-pentomino but just another second for the next 2^31 generations.

    Hashing, sometimes using K-D trees, was common in image processing in the early 80s and into the 90s. There just wasn't enough processing power, but if you created your tables cleverly you could get pretty good results a lot faster. (Professor Pentland at MIT got our group on the right track with this, way back when.) I suppose you never really have enough processing power.

  3. One step closer to the robot apocalypse. They're walking with perfect balance now, and aiming "pretend" weapons. They've had control of the means of production for decades. The only piece missing is the Anger Core.
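The second reply's point about Gosper's spatial hashing can be illustrated with the base case of HashLife: the next state of the central 2×2 cells of any 4×4 block depends only on those 16 cells, so the result can be cached, keyed by the block's contents, and reused wherever the same pattern recurs. The sketch below memoizes only that one-generation base case with Python's `functools.lru_cache`; the quadtree recursion and temporal hashing that let HashLife leap across huge numbers of generations at once are omitted. The function names are hypothetical, and cells outside the grid are assumed dead.

```python
from functools import lru_cache


def life_rule(alive, neighbors):
    """Conway's rule: birth on exactly 3 neighbors, survival on 2 or 3."""
    return 1 if neighbors == 3 or (alive and neighbors == 2) else 0


@lru_cache(maxsize=None)
def center_after_one_step(block):
    """block: 16-tuple of 0/1 cells in row-major 4x4 order. Returns its 2x2 center one
    generation later. Cached by content, so a repeated pattern costs only a hash lookup."""
    out = []
    for r in (1, 2):
        for c in (1, 2):
            n = sum(block[(r + dr) * 4 + (c + dc)]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc)
            out.append(life_rule(block[r * 4 + c], n))
    return tuple(out)


def step_memoized(grid):
    """One generation, computed as 2x2 output tiles via the cached 4x4 block function."""
    h, w = len(grid), len(grid[0])
    H, W = h + h % 2, w + w % 2  # round up to even sizes so 2x2 tiles cover the grid
    padded = [[0] * (W + 2) for _ in range(H + 2)]  # dead border on every side
    for r in range(h):
        for c in range(w):
            padded[r + 1][c + 1] = grid[r][c]
    nxt = [[0] * W for _ in range(H)]
    for r in range(0, H, 2):
        for c in range(0, W, 2):
            block = tuple(padded[r + i][c + j] for i in range(4) for j in range(4))
            nxt[r][c], nxt[r][c + 1], nxt[r + 1][c], nxt[r + 1][c + 1] = center_after_one_step(block)
    return [row[:w] for row in nxt[:h]]


def step_naive(grid):
    """Reference implementation: count all 8 neighbors of every cell directly."""
    h, w = len(grid), len(grid[0])

    def cell(r, c):
        return grid[r][c] if 0 <= r < h and 0 <= c < w else 0

    return [[life_rule(grid[r][c],
                       sum(cell(r + dr, c + dc)
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1) if dr or dc))
             for c in range(w)]
            for r in range(h)]


if __name__ == "__main__":
    # The F-pentomino from the reply, on a small bounded board.
    board = [[0] * 16 for _ in range(16)]
    for r, c in [(7, 8), (7, 9), (8, 7), (8, 8), (9, 8)]:
        board[r][c] = 1
    a = b = board
    for _ in range(30):
        a, b = step_memoized(a), step_naive(b)
    assert a == b
    print(center_after_one_step.cache_info())
```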
