Lipreading algorithms have all sorts of real-world applications, and LipNet shows great promise at machine lipreading of the constructed sentences in the GRID corpus.
From the paper, LipNet: Sentence-level Lipreading:
Lipreading is the task of decoding text from the movement of a speaker's mouth. Traditional approaches separated the problem into two stages: designing or learning visual features, and prediction. More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. Studies have shown that human lipreading performance increases for longer words (Easton & Basala, 1982), indicating the importance of features capturing temporal context in an ambiguous communication channel. Motivated by this observation, we present LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.
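The connectionist temporal classification (CTC) step named in the abstract is what lets a model emit one label per video frame and still end up with a shorter piece of text. As a rough illustration only, and not the authors' code, here is a minimal sketch of greedy CTC decoding: merge runs of repeated labels, then drop the special "blank" label. The blank symbol `_` and the per-frame strings are made up for the example.

```python
from itertools import groupby

# Hypothetical blank symbol: CTC reserves one extra "no character" label.
BLANK = "_"


def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse per-frame labels into text: merge repeats, then drop blanks."""
    # Step 1: merge each run of identical consecutive labels into one label.
    merged = [label for label, _ in groupby(frame_labels)]
    # Step 2: remove the blank label, leaving only real characters.
    return "".join(label for label in merged if label != blank)


# Made-up per-frame predictions (one character per video frame).
print(ctc_greedy_decode("__bb_ii_n__"))  # bin
# A blank between the two o's keeps the double letter.
print(ctc_greedy_decode("ss_oo_o_nn"))   # soon
# Without that blank, the repeated letter collapses.
print(ctc_greedy_decode("ssooonn"))      # son
```

The blank label is why a variable-length run of frames can map to a word with doubled letters: the model has to predict a blank between the two o's, otherwise they merge into one.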
• LipNet: How easy do you think lipreading is? (YouTube / Yannis Assael)