Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.
The Self-Annotated Reddit Corpus (SARC) is a corpus with 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors and stored in the database along with "user, topic, and conversation context."
Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years.
For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.
A Large Self-Annotated Corpus for Sarcasm
[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
(via Marginal Revolution)
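For a rough sense of what one entry looks like, the per-comment fields listed in the quoted passage could be modeled along these lines. This is only a sketch: the class name, field names, and sample rows are made up for illustration and are not the corpus's actual schema or data.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Comment:
    # Hypothetical field names; the real corpus may name and store these differently.
    text: str
    sarcastic: bool          # the self-annotated sarcasm label
    author: str
    subreddit: str           # the subreddit the comment appeared in
    score: int               # comment score as voted on by users
    posted: date             # date of the comment
    parent: Optional[str]    # text of the parent comment or submission


def sarcastic_only(comments: List[Comment]) -> List[Comment]:
    """Return only the comments labeled as sarcastic."""
    return [c for c in comments if c.sarcastic]


# Invented sample rows, not taken from the corpus.
sample = [
    Comment("Oh great, another Monday.", True, "user_a", "AskReddit",
            12, date(2015, 3, 2), "How is your week going?"),
    Comment("I prefer tea.", False, "user_b", "AskReddit",
            3, date(2015, 3, 2), "Coffee or tea?"),
]

print([c.author for c in sarcastic_only(sample)])
```

Keeping the parent text next to each comment matters because, as the paper's description suggests, sarcasm often only reads as sarcasm in the context of what it replies to.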