Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.
The Self-Annotated Reddit Corpus (SARC) is a corpus with 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors and stored in the database along with "user, topic, and conversation context."
Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years. For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.
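To make the per-comment fields listed above concrete, here is a minimal Python sketch of what one record might look like. The class name, field names, helper function, and sample values are all made up for illustration; they are not the schema or data of the released corpus.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SarcComment:
    """One hypothetical corpus record; names are illustrative, not the released schema."""
    text: str
    sarcastic: bool   # the self-annotated sarcasm label
    author: str
    subreddit: str    # the subreddit the comment appeared in
    score: int        # comment score as voted on by users
    created: date     # date of the comment
    parent: str       # parent comment or submission


def in_corpus_window(comment: SarcComment) -> bool:
    """True if the comment falls in the 2009-2016 window described in the excerpt."""
    return 2009 <= comment.created.year <= 2016


# An invented example record, not taken from the dataset.
example = SarcComment(
    text="Oh great, another Monday.",
    sarcastic=True,
    author="example_user",
    subreddit="AskReddit",
    score=12,
    created=date(2014, 3, 3),
    parent="How is everyone's week going?",
)
```

Keeping the parent text on the record reflects the excerpt's point that each remark comes with its conversation context.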
A Large Self-Annotated Corpus for Sarcasm
[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
(via Marginal Revolution)