Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.
The Self-Annotated Reddit Corpus (SARC) contains 1.3 million sarcastic remarks ("10 times more than any previous dataset"), each tagged as sarcastic by the redditor who wrote it and stored along with "user, topic, and conversation context."
Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years. For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.
A Large Self-Annotated Corpus for Sarcasm
[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
(via Marginal Revolution)
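To make the quoted description concrete, here is a minimal Python sketch of one labeled record and of the 2009-2016 filter the authors describe. The names `SarcComment`, `is_sarcastic`, `posted` and `in_study_window` are hypothetical, they do not reflect SARC's actual file format, and the two sample comments are made up for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SarcComment:
    """One labeled comment. Field names are hypothetical, not SARC's file format."""
    text: str
    is_sarcastic: bool  # the self-annotated label (redditors tag sarcasm with "/s")
    author: str
    subreddit: str
    score: int          # comment score as voted on by users
    posted: date        # date of the comment
    parent: str         # parent comment or submission


def in_study_window(c: SarcComment, start_year: int = 2009, end_year: int = 2016) -> bool:
    """Keep only comments from the 2009-2016 window, dropping earlier, noisier years."""
    return start_year <= c.posted.year <= end_year


# Made-up sample records, purely for illustration.
comments = [
    SarcComment("Oh great, another Monday.", True, "user_a", "AskReddit", 42,
                date(2012, 3, 5), "How's your week going?"),
    SarcComment("Thanks for the link.", False, "user_b", "science", 7,
                date(2007, 6, 1), "Here's the paper."),
]

subset = [c for c in comments if in_study_window(c)]
print(len(subset), sum(c.is_sarcastic for c in subset))
```

Only the 2012 comment survives the filter; the 2007 one falls outside the window.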
Are you a PhD with an interest in "the intersection of digital technology and public life, including experts in computer science, sociology, economics, law, political science, public policy, information studies, communication, and other related disciplines?" Princeton's CITP has three open job postings for 10-month residencies starting Sept 1, 2019.
Ganbreeder uses a machine learning technique called Generative Adversarial Networks (GANs) to generate images that seem like photos, at least at first glance.
CIT computer scientist Milan Cvitkovic conducted 46 in-depth interviews with "scientists, engineers, and CEOs" and collated their machine learning research needs into a paper aptly entitled "Some Requests for Machine Learning Research from the East African Tech Scene," which presents an illuminating look into the gaps in the current practice of machine learning, itself […]