On Twitter's engineering blog, a fascinating description of how Twitter uses a blend of machine intelligence and Mechanical Turk tasks to figure out, in real time, what is going on in the world:
Before we delve into the details, here's an overview of how the system works.
First, we monitor for which search queries are currently popular.
Behind the scenes: we run a Storm topology that tracks statistics on search queries.
For example, the query [Big Bird] may suddenly see a spike in searches from the US.
As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query.
Behind the scenes: when the Storm topology detects that a query has reached sufficient popularity, it connects to a Thrift API that dispatches the query to Amazon's Mechanical Turk service, and then polls Mechanical Turk for a response.
For example: as soon as we notice "Big Bird" spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.
Finally, after a response from an evaluator is received, we push the information to our backend systems, so that the next time a user searches for a query, our machine learning models will make use of the additional information. For example, suppose our evaluators tell us that [Big Bird] is related to politics; the next time someone performs this search, we know to surface ads by @barackobama or @mittromney, not ads about Dora the Explorer.
Guetzli is Google’s new free/open JPEG compression algorithm, which produces images that are more than a third smaller in terms of byte-size, and the resulting images are consistently rated as more attractive than traditionally compressed JPEGs. It’s something of a web holy grail: much smaller, better-looking files without having to convince people to install a […]
Thread count isn’t like one of those deceiving metrics like camera megapixels or Facebook friends—more threads are always better if you can afford them. If price was no object, we would all be snoozing soundly bundled up in 1.8 kilo-thread sheets every single night. Guess what? Price doesn’t have to be an object with this […]
Maybe it’s entirely because of podcast ads, but drag-and-drop tools like Squarespace have gotten immensely popular in recent years. While it’s definitely a great tool for any non-coders who want to get a small website up and running quickly, managing content with a primarily visual interface can become a pain once you have more than […]
When you can’t wait for the world’s longest meeting to end, the mindless leg bouncing makes your boredom obvious and just annoys everybody else. Everyone knows the TPS reports need the damn cover sheet, but some sadistic colleague keeps forgetting, probably on purpose just to eat into your lunch hour. Enough is enough!While serving a […]