Remember those old Speak & Spell toys with their distinctly robotic voices? Hardware enthusiast atomic14 just recreated something similar using a microcontroller that costs only 10 cents – the CH32V003, a tiny 8-pin chip with just 16KB of flash memory and 2KB of RAM.

The challenge was significant: this chip has extremely limited storage. A typical 6-second audio clip stored as 16-bit samples at 8 kHz would need about 96KB, six times the chip's entire flash. Even with 8-bit samples it would need 48KB, still three times too large. And that's before accounting for the space needed for the playback code itself.
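The arithmetic behind those figures can be checked with a quick sketch. The 8 kHz sample rate is an assumption inferred from the 96KB figure, and `clip_bytes` is a hypothetical helper name, not something from atomic14's project.

```python
SAMPLE_RATE_HZ = 8000      # assumed: the rate implied by 96KB for 6 s of 16-bit audio
CLIP_SECONDS = 6
FLASH_BYTES = 16 * 1024    # CH32V003 flash size

def clip_bytes(bits_per_sample, seconds=CLIP_SECONDS, rate=SAMPLE_RATE_HZ):
    """Storage needed for a mono clip, in bytes."""
    return seconds * rate * bits_per_sample // 8

for bits in (16, 8, 4, 2):
    size = clip_bytes(bits)
    verdict = "fits" if size < FLASH_BYTES else "too big"
    print(f"{bits:2d}-bit: {size:6d} bytes ({verdict})")
```

Under these assumptions, only the 2-bit case comes in below the 16KB of flash, and it leaves roughly 4KB for code.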

The solution involved testing multiple compression approaches. Standard 4-bit ADPCM halved the 8-bit size again, but at roughly 24KB the clip still didn't fit. The eventual choice was 2-bit ADPCM, an aggressive format that stores just 2 bits per sample and shrinks the 6-second clip to under 12KB. That left room for the playback code as well; the decoder itself takes only about 1.3KB. The quality is noticeably reduced, but the audio remains surprisingly intelligible when played through a small speaker, with PWM output serving as a basic digital-to-analog converter.
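To make the idea of 2 bits per sample concrete, here is a minimal sketch of one possible 2-bit ADPCM scheme. It is not atomic14's actual codec: the bit layout (one sign bit, one magnitude bit), the step-adaptation rule, and the `STEP_MIN`/`STEP_MAX` limits are all assumptions chosen for illustration. Each code nudges a running prediction up or down by a small or large amount, and the step size grows or shrinks depending on which was needed.

```python
import math

STEP_MIN, STEP_MAX = 16, 8192   # hypothetical step-size limits

def _advance(pred, step, code):
    """Apply one 2-bit code to the (prediction, step) state."""
    sign = -1 if code & 0b10 else 1
    large = code & 0b01
    delta = step + (step >> 1) if large else step >> 1   # 1.5 or 0.5 steps
    pred = max(-32768, min(32767, pred + sign * delta))
    # Grow the step after a large move, shrink it after a small one.
    step = step * 2 if large else (step * 3) // 4
    step = max(STEP_MIN, min(STEP_MAX, step))
    return pred, step

def encode(samples):
    """Pack signed 16-bit samples into 2-bit codes, four per byte."""
    pred, step, out = 0, STEP_MIN, bytearray()
    for i, s in enumerate(samples):
        diff = s - pred
        code = (0b10 if diff < 0 else 0) | (0b01 if abs(diff) >= step else 0)
        pred, step = _advance(pred, step, code)   # track what the decoder will see
        if i % 4 == 0:
            out.append(0)
        out[-1] |= code << (2 * (i % 4))
    return bytes(out)

def decode(data, count):
    """Rebuild `count` samples from packed 2-bit codes."""
    pred, step, out = 0, STEP_MIN, []
    for i in range(count):
        code = (data[i // 4] >> (2 * (i % 4))) & 0b11
        pred, step = _advance(pred, step, code)
        out.append(pred)
    return out

# One second of a 440 Hz tone at an assumed 8 kHz sample rate.
tone = [int(8000 * math.sin(2 * math.pi * 440 * n / 8000)) for n in range(8000)]
packed = encode(tone)
print(len(packed))  # 2000 bytes: four samples per byte
```

Because the encoder and decoder share `_advance`, they stay in lockstep without any side information. On the microcontroller only the decode half would be needed, with each reconstructed sample written to a PWM duty-cycle register.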

For longer speech, atomic14 integrated something even more space-efficient: LPC (Linear Predictive Coding) speech synthesis, the same technology used in Texas Instruments chips from the late 1970s. This approach was famously used in the Speak & Spell and various arcade games. Instead of storing actual sound waves, LPC models how the human vocal tract produces speech, allowing entire words and phrases to be stored in just a few hundred bytes each. The result has that characteristic synthetic quality, but it's remarkably compact and functional.
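The source-and-filter idea behind LPC can be sketched in a few lines. This is an illustration only, not the code atomic14 used and not the TI chip's bitstream format: it uses a direct-form all-pole filter for brevity (the classic TI parts are usually described as using a lattice filter with reflection coefficients), and the function name, the single 700 Hz resonance, and the 8 kHz rate are assumptions. A frame is described by a handful of numbers (filter coefficients, gain, pitch) rather than by its samples.

```python
import math
import random

def lpc_synthesize(coeffs, gain, pitch_period, n_samples, seed=0):
    """All-pole synthesis: y[n] = gain*e[n] + sum(coeffs[k] * y[n-1-k]).

    pitch_period > 0 gives a voiced frame (impulse train excitation);
    pitch_period == 0 gives an unvoiced frame (white-noise excitation).
    """
    rng = random.Random(seed)
    history = [0.0] * len(coeffs)   # most recent output first
    out = []
    for n in range(n_samples):
        if pitch_period > 0:
            e = 1.0 if n % pitch_period == 0 else 0.0
        else:
            e = rng.uniform(-1.0, 1.0)
        y = gain * e + sum(a * h for a, h in zip(coeffs, history))
        history = [y] + history[:-1]
        out.append(y)
    return out

# Hypothetical frame: one vocal-tract-like resonance near 700 Hz at 8 kHz.
r, theta = 0.95, 2 * math.pi * 700 / 8000
coeffs = [2 * r * math.cos(theta), -r * r]   # stable pole pair since r < 1
frame = lpc_synthesize(coeffs, gain=1.0, pitch_period=80, n_samples=200)
print(len(frame))  # 200 samples generated from four parameters
```

Here 200 output samples come from two coefficients, a gain, and a pitch period, which is where the compactness comes from. A real LPC voice uses more coefficients per frame and updates them many times per second.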