On Twitter's engineering blog, a fascinating description of how Twitter uses a blend of machine intelligence and Mechanical Turk tasks to figure out, in real time, what is going on in the world:
Before we delve into the details, here's an overview of how the system works.
- First, we monitor for which search queries are currently popular.
Behind the scenes: we run a Storm topology that tracks statistics on search queries.
For example, the query [Big Bird] may suddenly see a spike in searches from the US.
- As soon as we discover a new popular search query, we send it to our human evaluators, who are asked a variety of questions about the query.
Behind the scenes: when the Storm topology detects that a query has reached sufficient popularity, it connects to a Thrift API that dispatches the query to Amazon's Mechanical Turk service, and then polls Mechanical Turk for a response.
For example: as soon as we notice "Big Bird" spiking, we may ask judges on Mechanical Turk to categorize the query, or provide other information (e.g., whether there are likely to be interesting pictures of the query, or whether the query is about a person or an event) that helps us serve relevant Tweets and ads.
- Finally, after a response from an evaluator is received, we push the information to our backend systems, so that the next time a user searches for a query, our machine learning models will make use of the additional information. For example, suppose our evaluators tell us that [Big Bird] is related to politics; the next time someone performs this search, we know to surface ads by @barackobama or @mittromney, not ads about Dora the Explorer.
Improving Twitter search with real-time human computation
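The three steps in the quoted overview can be sketched in a few lines of Python. This is only an illustration of the flow, not Twitter's code: the real system runs on a Storm topology, a Thrift API and Mechanical Turk, none of which appear here. `SpikeDetector`, `run_window`, `fake_evaluator` and the thresholds are all hypothetical names and values chosen for the example, and the evaluator is a plain function standing in for the human judges.

```python
from collections import Counter, deque


class SpikeDetector:
    """Hypothetical stand-in for the query-statistics step.

    Counts queries per time window and flags a query when its count in the
    window just closed is well above its average over recent windows.
    """

    def __init__(self, history=3, ratio=3.0, min_count=5):
        self.history = deque(maxlen=history)  # Counters for past windows
        self.current = Counter()              # counts for the open window
        self.ratio = ratio                    # assumed spike multiplier
        self.min_count = min_count            # ignore very rare queries

    def observe(self, query):
        self.current[query.lower()] += 1

    def close_window(self):
        """Close the open window and return the queries that spiked in it."""
        spiking = []
        if self.history:  # no baseline yet during the very first window
            for query, count in self.current.items():
                past = [window[query] for window in self.history]
                baseline = sum(past) / len(past)
                if count >= self.min_count and count > self.ratio * max(baseline, 1.0):
                    spiking.append(query)
        self.history.append(self.current)
        self.current = Counter()
        return sorted(spiking)


def run_window(detector, ask_evaluator, annotations):
    """Send newly spiking queries to an evaluator and store the answers.

    `ask_evaluator` stands in for the dispatch-and-poll round trip to human
    judges; `annotations` stands in for the backend store that later
    searches would consult.
    """
    for query in detector.close_window():
        if query not in annotations:
            annotations[query] = ask_evaluator(query)
    return annotations


def fake_evaluator(query):
    # Canned answer in place of a human judgment, for demonstration only.
    if query == "big bird":
        return {"category": "politics"}
    return {"category": "unknown"}
```

Feeding the detector a few quiet windows and then a burst of "Big Bird" searches leaves a politics annotation for that query in `annotations`, which is the information the quoted post says is pushed to the backend for the next search.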