On Twitter's engineering blog, a fascinating description of how Twitter uses a blend of machine intelligence and Mechanical Turk tasks to figure out, in real time, what is going on in the world:
Before we delve into the details, here's an overview of how the system works.
- First, we monitor for which search queries are currently popular.
Behind the scenes: we run a Storm topology that tracks statistics on search queries.
For example, the query [Big Bird] may suddenly see a spike in searches from the US.
- As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query.
Behind the scenes: when the Storm topology detects that a query has reached sufficient popularity, it connects to a Thrift API that dispatches the query to Amazon's Mechanical Turk service, and then polls Mechanical Turk for a response.
For example: as soon as we notice "Big Bird" spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.
- Finally, after a response from an evaluator is received, we push the information to our backend systems, so that the next time a user searches for a query, our machine learning models will make use of the additional information. For example, suppose our evaluators tell us that [Big Bird] is related to politics; the next time someone performs this search, we know to surface ads by @barackobama or @mittromney, not ads about Dora the Explorer.
Improving Twitter search with real-time human computation
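To make the quoted pipeline concrete, here is a minimal Python sketch of its three steps: track query counts, hand spiking queries to an evaluator, and store the answer for later searches. It is not Twitter's implementation. The real system is described as using a Storm topology, a Thrift API and Mechanical Turk, none of which appear here. The names `SpikeDetector`, `annotate_spikes` and `fake_evaluator`, along with the window and threshold values, are assumptions made up for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Optional


class SpikeDetector:
    """Flags a query when its count in the current window far exceeds its recent average."""

    def __init__(self, history: int = 3, min_count: int = 5, ratio: float = 3.0):
        self.min_count = min_count  # assumed: ignore queries with very little traffic
        self.ratio = ratio          # assumed: spike means count >= ratio * past average
        self.current: Dict[str, int] = defaultdict(int)
        # Per-query counts for the last `history` windows.
        self.past: Dict[str, Deque[int]] = defaultdict(lambda: deque(maxlen=history))

    def record(self, query: str) -> None:
        """Count one search for this query in the current window."""
        self.current[query.lower()] += 1

    def close_window(self) -> List[str]:
        """End the current window and return the queries that spiked in it."""
        spiking = []
        for query in set(self.current) | set(self.past):
            count = self.current.get(query, 0)
            past = self.past[query]
            baseline = sum(past) / len(past) if past else 0.0
            # Floor the baseline at 1 so brand-new queries still need real volume.
            if count >= self.min_count and count >= self.ratio * max(baseline, 1.0):
                spiking.append(query)
            past.append(count)
        self.current.clear()
        return sorted(spiking)


def annotate_spikes(
    detector: SpikeDetector,
    ask_evaluator: Callable[[str], Optional[dict]],
    annotations: Dict[str, dict],
) -> Dict[str, dict]:
    """Send each newly spiking query to an evaluator and store any answer."""
    for query in detector.close_window():
        if query in annotations:
            continue  # already judged, no need to ask again
        answer = ask_evaluator(query)
        if answer is not None:
            annotations[query] = answer
    return annotations


def fake_evaluator(query: str) -> Optional[dict]:
    """Hypothetical stand-in for a human judge answering questions about a query."""
    return {"category": "politics"} if query == "big bird" else None


detector = SpikeDetector()
for _ in range(2):
    detector.record("weather")
detector.close_window()            # first window: nothing is popular enough

for _ in range(10):
    detector.record("Big Bird")    # sudden burst of searches
detector.record("weather")
annotations = annotate_spikes(detector, fake_evaluator, {})
print(annotations)  # {'big bird': {'category': 'politics'}}
```

In this sketch the evaluator call is synchronous, whereas the quoted description has the system dispatch a task and then poll for a response. A search backend would then look up `annotations` at query time to decide which Tweets and ads to surface.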