Train your AI with the world's largest data-set of sarcasm, courtesy of redditors' self-tagging

Cory Doctorow 12:23 pm Mon May 1, 2017

Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.

The Self-Annotated Reddit Corpus (SARC) contains 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors themselves and are stored along with "user, topic, and conversation context."

Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years. For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.

A Large Self-Annotated Corpus for Sarcasm

[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]

(via Marginal Revolution)
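
For readers who want a feel for what one entry in such a corpus might look like, here is a minimal sketch in Python built only from the per-comment fields named in the paper excerpt above. The class name, field names, helper function and sample rows are all hypothetical, invented for illustration; they are not the actual SARC file format or real Reddit comments.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class SarcComment:
    """One labeled comment. Field names are hypothetical, not SARC's own schema."""
    text: str                   # the comment body
    sarcastic: bool             # the sarcasm label
    author: str                 # who wrote the comment
    subreddit: str              # the subreddit it appeared in
    score: int                  # comment score as voted on by users
    created: date               # date of the comment
    parent_text: Optional[str]  # parent comment or submission, if available


def sarcastic_fraction(comments: List[SarcComment]) -> float:
    """Return the share of comments labeled sarcastic (0.0 for an empty list)."""
    if not comments:
        return 0.0
    return sum(c.sarcastic for c in comments) / len(comments)


# Invented sample rows, for illustration only.
examples = [
    SarcComment("Oh great, another Monday.", True, "user_a", "AskReddit",
                12, date(2015, 3, 2), "What day is it?"),
    SarcComment("The meeting moved to 3 pm.", False, "user_b", "AskReddit",
                4, date(2015, 3, 2), None),
]

print(sarcastic_fraction(examples))
```

Keeping the parent comment alongside each remark matters because sarcasm is often only recognizable in light of what it is replying to.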
