Redditors' convention of tagging their sarcastic remarks is a dream come true for machine learning researchers hoping to teach computers to recognize and/or generate sarcasm.
The Self-Annotated Reddit Corpus (SARC) contains 1.3 million sarcastic remarks ("10 times more than any previous dataset") that were tagged by redditors and stored along with "user, topic, and conversation context."
Reddit comments from December 2005 have been made available due to web-scraping; we construct our dataset as a subset of comments from 2009-2016, comprising the vast majority of comments and excluding noisy data from earlier years. For each comment we provide a sarcasm label, author, the subreddit it appeared in, the comment score as voted on by users, the date of the comment, and the parent comment or submission.
A Large Self-Annotated Corpus for Sarcasm
[Mikhail Khodak, Nikunj Saunshi and Kiran Vodrahalli/Princeton University]
(via Marginal Revolution)
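To make the excerpt's list of per-comment fields concrete, here is a minimal Python sketch of what one record and its self-annotated label could look like. It is an illustration, not the authors' code: the `Comment` field names and the `label_comment` helper are hypothetical, and it assumes the sarcasm tag is a trailing "/s", the marker redditors commonly append to sarcastic remarks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

# Assumption: sarcastic remarks are tagged with a trailing "/s".
SARCASM_TAG = "/s"


@dataclass
class Comment:
    """One record with the fields the excerpt lists (names are hypothetical)."""
    text: str
    sarcastic: bool        # the self-annotated sarcasm label
    author: str
    subreddit: str
    score: int             # comment score as voted on by users
    created: date
    parent: Optional[str]  # text of the parent comment or submission, if any


def label_comment(raw_text: str) -> Tuple[str, bool]:
    """Return (text with the tag removed, sarcasm label) for a raw comment."""
    stripped = raw_text.rstrip()
    if stripped.endswith(SARCASM_TAG):
        return stripped[: -len(SARCASM_TAG)].rstrip(), True
    return stripped, False


# Example: build one record from a tagged comment (all values are made up).
text, is_sarcastic = label_comment("Oh great, another Monday. /s")
record = Comment(text, is_sarcastic, "example_user", "AskReddit", 12,
                 date(2015, 6, 1), None)
```

A plain suffix check like this is deliberately naive: it would also fire on a comment that merely ends in a URL path such as ".../s", which is one kind of noise a real pipeline would need to filter out.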