Machine-learning algorithm develops heuristics for trustworthy tweets in time of emergency

In "Credibility ranking of tweets during high impact events," a paper published in the ACM's Proceedings of the 1st Workshop on Privacy and Security in Online Social Media, two Indraprastha Institute of Information Technology researchers describe the outcome of a machine-learning experiment designed to discover factors correlated with reliability in tweets during disasters and emergencies:

The number of unique characters present in tweet was positively correlated to credibility, this may be due to the fact that tweets with hashtags, @mentions and URLs contain more unique characters. Such tweets are also more informative and linked, and hence credible. Presence of swear words in tweets indicates that it contains the opinion / reaction of the user and would have less chances of providing information about the event. Tweets that contain information or are reporting facts about the event, are impersonal in nature, as a result we get a negative correlation of presence of pronouns in credible tweets. Low number of happy emoticons [:-), :)] and high number of sad emoticons [:-(, :(] act as strong predictors of credibility. Some of the other important features (p-value < 0.01) were inclusion of a URL in the tweet, number of followers of the user who tweeted and presence of negative emotion words. Inclusion of URL in a tweet showed a strong positive correlation with credibility, as most URLs refer to pictures, videos, resources related to the event or news articles about the event.
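To make those signals concrete, here is a minimal sketch of how the features named in the excerpt might be pulled out of a single tweet. It is not the researchers' code: the function name `extract_features`, the tiny word lists, and the URL pattern are all assumptions made for illustration, and the excerpt does not say which lexicons or learning model the paper actually used. The sketch only extracts features; it does not rank anything.

```python
import re

# Hypothetical stand-in word lists; the excerpt does not give the paper's lexicons.
PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your", "he", "she", "they", "them"}
SWEAR_WORDS = {"damn", "hell", "crap"}

# The emoticons listed in the excerpt.
HAPPY_EMOTICONS = (":-)", ":)")
SAD_EMOTICONS = (":-(", ":(")

# Simplified URL pattern (an assumption, not a full URL grammar).
URL_RE = re.compile(r"https?://\S+")


def extract_features(text, followers):
    """Return the per-tweet signals mentioned in the excerpt as a dict."""
    words = re.findall(r"[a-z']+", text.lower())
    return {
        # Reported as positively correlated with credibility.
        "unique_chars": len(set(text)),
        "has_url": bool(URL_RE.search(text)),
        "followers": followers,
        "sad_emoticons": sum(text.count(e) for e in SAD_EMOTICONS),
        # Reported as negatively correlated with credibility.
        "swear_words": sum(w in SWEAR_WORDS for w in words),
        "pronouns": sum(w in PRONOUNS for w in words),
        "happy_emoticons": sum(text.count(e) for e in HAPPY_EMOTICONS),
    }


if __name__ == "__main__":
    print(extract_features("Bridge closed, photos at http://example.com/a :(", 1200))
```

A real system would feed counts like these into a trained ranking model; the weights would come from labeled data, not from rules of thumb.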

Of course, this is all non-adversarial: no one is trying to trick a filter into mis-assessing a false account as a true one. It's easy to imagine an adversarial tweet-generator that suggests rewrites to deliberately misleading tweets to make them more credible to a filter designed on these lines. This is actually the substance of one of the cleverest science fiction subplots I've read: in Peter Watts's Behemoth, a self-modifying computer virus randomly hits on the strategy of impersonating communications from patient zero in a world-killing pandemic, because all the filters allow these through. It's a premise that's never stopped haunting me: the co-evolution of a human virus and a computer virus.

Credibility Ranking of Tweets during High Impact Events [PDF] (via /.)


  1. Low number of happy emoticons [:-), :)] and high number of sad emoticons [:-(, :(] act as strong predictors of credibility. Some of the other important features (p-value < 0.01) were inclusion of a URL in the tweet, number of followers of the user who tweeted and presence of negative emotion words.

    Is it live now? I wonder how it’s doing with the breaking news about Seattle falling into the sea. :( If any of you have loved ones where Seattle used to be, check out seattlehasfallenintothesea.com for the latest news. :-( This is a sad tragic unhappy thing that makes me angry and hopeless and other negative emotions. (Please RT)

  2. I still don’t “get” twitter (read: I have not figured out who to follow for twitter to be a useful source of information), but these are not bad rules of thumb for assessing reliability of information generally. Credible messages cite evidence and avoid overt emotion.

    This story reminds me of an automatic youtube comment filter I read about (can’t find the link) that would suppress messages which had too much or too little punctuation, or too many capital letters, or too many emoticons, and so forth.

    I’m really surprised that negative emoticons are correlated with credibility; I would have guessed ANY emoticon in a communication would cut the credibility to almost nothing. But I suppose they are talking about disasters in particular, so negative emoticons would be normal (and happy ones a little suspicious). Also twitter as a medium is a little more personal than other news sources. While it would be professionally inappropriate for a journalist to react emotionally in a news article or even a blog post, we expect tweets (both from pros and citizens) to be a little more open. Or at least that’s my take.
