Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.
The Self-Annotated Reddit Corpus (SARC) is a corpus with 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors and stored in the database along with "user, topic, and conversation context."
Reddit comments from December 2005 have been
made available due to web-scraping 4
; we construct
our dataset as a subset of comments from
2009-2016, comprising the vast majority of comments
and excluding noisy data from earlier years.
For each comment we provide a sarcasm label, author,
the subreddit it appeared in, the comment score as voted on by users, the date of the comment,
and the parent comment or submission.
A Large Self-Annotated Corpus for Sarcasm
[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
(via Marginal Revolution)
More bad news for Google's beleaguered spinoff Jigsaw, whose flagship project is "Perspective," a machine-learning system designed to catch and interdict harassment, hate-speech and other undesirable online speech.
A group of MIT Media Lab researchers have published Radiotalk, a massive corpus of talk radio audio with machine-generated transcriptions, with a total of 240,000 hours' worth of speech, marked up with machine-readable metadata.
Ben Herzog's Cryptographic Attacks: A Guide for the Perplexed from Check Point Research is one of the clearest, most useful guides to how cryptography fails that I've ever read.
Are we done with capsule coffee makers yet? Sure, they’re easy. But they are not so easy on the environment, and it’s debatable whether they actually make a better cup. Luckily, there’s never been a better time to switch back to the good old reliable drip method – especially when drip coffeemakers have quietly been […]
If there’s one thing that stayed consistent through the last decade or so of tech industry turmoil, it’s the love affair between techies and Linux. There’s just a ton you can do with the OS, and its open-source format means you can customize your rig from the ground up. Apparently not content with that level […]
Accidents happen. And when they do, you’re going to want a dash cam for a second pair of eyes. At the minimum, a decent dash cam can save you vast sums of time and money in case of an accident. But a really good dash cam can do a whole lot more. Here are six […]