An algorithm to figure out your gender
Twitter claims a 90 percent accuracy rate for the clever techniques it uses to learn the gender of any given user. Glenn Fleishman reports on the company's disconcerting new analytics tools, the research behind them, and how large a pinch of salt they come with.
Twitter opened its analytics platform to every user on August 27, allowing all of us (not just verified users and those with advertising accounts) to track how many people viewed and acted upon our tweets. But the "Followers" section, revealing demographics, provoked the most discussion. Alongside breakdowns of followers' interests and location is a gender bar that splits followers into male and female.
Many women were surprised how many of their followers — whether they had hundreds or thousands — were men. The ratio is often in the 75 to 80 percent range, and it's easy to find thousands of tweets reporting so. Some of the authors work in tech, a male-dominated industry, but others tweet about other subjects. Why such a heavy skew there?
Forget for a moment the problem, in 2014, with offering a simple duality for gender, which brings with it biases and assumptions.
How can Twitter offer this information when it doesn't ask you to indicate a gender when you sign up for an account?
The service analyzes our tweets and uses word choice, proximity, and other factors to make a guess. According to a 2012 post on its Advertising blog, the company relies on multiple signals to assign confidence to a gender selection. For matches with a high confidence level, Twitter tested its results against a global panel of humans and found its approach 90 percent accurate. The post noted, "…where we can’t predict gender reliably, we don’t." (Twitter didn't reply to a request for information for this article.)
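To make the "where we can't predict reliably, we don't" idea concrete, here is a minimal sketch of a classifier that abstains when its confidence is too low. Twitter has not published its signals or its cutoff, so the function name `label_or_abstain`, the 0.9 threshold, and the sample scores are all assumptions made for illustration.

```python
from typing import Optional


def label_or_abstain(p_female: float, threshold: float = 0.9) -> Optional[str]:
    """Return a label only when the (hypothetical) model score is confident enough."""
    if not 0.0 <= p_female <= 1.0:
        raise ValueError("p_female must be a probability between 0 and 1")
    if p_female >= threshold:
        return "female"
    if (1.0 - p_female) >= threshold:
        return "male"
    return None  # not reliable enough, so make no prediction


# Made-up scores standing in for whatever a real model would output.
scores = [0.97, 0.55, 0.08, 0.40]
labels = [label_or_abstain(s) for s in scores]
print(labels)  # ['female', None, 'male', None]
```

The point of the design is that accuracy is only measured on the accounts that receive a label; the accounts that return `None` never count for or against the 90 percent figure.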
As Robin James, a professor of philosophy and women's/gender studies at the University of North Carolina–Chapel Hill, tweeted recently, "The Twitter analytics method of reading gender just shows social identity isn't bodily features anymore, but behavioral patterns."
Twitter's marketing research didn't come out of a vacuum. Researchers have long analyzed cues that arise out of modes of expression to determine personal characteristics that aren't explicitly mentioned or known to a reader. A 2002 book, revised in 2013, Reading, Writing, and Talking Gender in Literacy Learning, has a chapter that identifies and summarizes 42 studies mostly from the 1990s examining marks of gender in student writing, and it's just scratching the surface of the literature.
It's no surprise that such work would be extended when massive corpuses could be analyzed and then checked for accuracy using control cases in which gender was known, as when the writer (on a blog, social network, or other public platform) provided explicit details about themselves. This allows refinement on a scale never before possible.
One researcher, Delip Rao, was the lead author on several papers during his time at the Human Language Technology Center of Excellence at Johns Hopkins University that dealt with algorithmic methods of identification. Many of his co-authored papers talk about "latent attributes," those implicit specific details about people that can be surfaced, including ethnicity and gender.
Some of the "tells" in tweets and other messages are the stereotypical ones that would leap to mind, and we shouldn't be surprised that they test out as valid. In one paper from 2010, the researchers note that "OMG" is used four times as often by women as by men in the dataset of Twitter messages they tested. The phrase "my zipper" has an extremely high predictive value for men, while "my yoga" has the same effect for women. The paper even notes, "People laugh differently on Twitter as well. While women LOL, men tend to LMFAO."
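A figure like "four times as often" boils down to comparing how frequently a token turns up in each group's messages. The sketch below shows that comparison on a tiny invented sample. The tweets, the helper names `usage_rate` and `usage_ratio`, and the per-tweet counting rule are assumptions for illustration, not the researchers' data or method.

```python
from typing import Sequence


def usage_rate(token: str, tweets: Sequence[str]) -> float:
    """Fraction of tweets containing the token as a whole word, ignoring case."""
    if not tweets:
        raise ValueError("need at least one tweet")
    token = token.lower()
    hits = sum(1 for tweet in tweets if token in tweet.lower().split())
    return hits / len(tweets)


def usage_ratio(token: str, group_a: Sequence[str], group_b: Sequence[str]) -> float:
    """How many times more often group_a uses the token than group_b."""
    rate_b = usage_rate(token, group_b)
    if rate_b == 0:
        return float("inf")
    return usage_rate(token, group_a) / rate_b


# Invented toy samples with known (self-reported) gender labels.
women_tweets = ["omg that was great", "OMG no way", "lunch time", "omg yes",
                "so tired", "omg again", "reading now", "hello there"]
men_tweets = ["omg really", "lunch time", "so tired", "reading now",
              "hello there", "game tonight", "new phone", "coffee first"]

ratio = usage_ratio("omg", women_tweets, men_tweets)
print(ratio)  # 4.0
```

A real study would work from millions of messages and combine many such features, weighting each by how strongly it separates the groups.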
The MITRE Corporation released a much-cited study in May 2011 that attempted to predict gender and other factors, and found accuracy ranging from 76 percent using the text of tweets alone to 92 percent using tweets, the account description, screen name, and full name (as provided). A September 2013 paper examined "700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers." While the authors focused mostly on "psychological insights," based in part on conducting personality tests on all the volunteers, they report an accuracy rate of 91.9 percent for gender using "language" alone, and not including other data they gathered.
Rao went to work for Twitter in October 2011, and there's been a lot less research in the field in the two years since, with the 2012 and 2013 papers the only ones widely cited. It's possible Twitter, Facebook, and others recruited other researchers with similar interests, which would explain the paucity of papers, since there's such a huge financial reward associated with precise targeting of attributes.
In its 2012 post, the company said it only asserts gender when it is "reliable," but we don't know what percentage of the time that was; and of the reliable data, Twitter was wrong 10 percent of the time, assigning the incorrect gender. That could have meant an error rate of 20 percent or more. But let's assume this research has only improved, and Twitter now has an extremely high confidence and accuracy, which it measures above 90 percent. But things still don't add up.
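One way to see how a 10 percent error on the "reliable" accounts could sit alongside a much higher overall error is to treat the total as a weighted mix of confident and unconfident guesses. The sketch below does that arithmetic. The 70 percent coverage figure and the 45 percent error on low-confidence accounts are made-up assumptions, since Twitter has published neither number.

```python
def overall_error(coverage: float, error_confident: float, error_uncertain: float) -> float:
    """Blend error rates when every account gets a label, confident or not.

    coverage        -- share of accounts the model is confident about (assumed)
    error_confident -- error rate on those accounts (10 percent per the 2012 post)
    error_uncertain -- error rate on the remaining accounts (assumed)
    """
    for value in (coverage, error_confident, error_uncertain):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all inputs must be between 0 and 1")
    return coverage * error_confident + (1.0 - coverage) * error_uncertain


# Hypothetical inputs: 70% confident accounts, near coin-flip error on the rest.
blended = overall_error(coverage=0.70, error_confident=0.10, error_uncertain=0.45)
print(f"{blended:.1%}")  # 20.5%
```

Under these invented inputs the blended error passes 20 percent even though the confident subset still tests at 90 percent accurate.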
Leaving aside people who don't identify as either strictly or solely male or female, or who reject those gender constructs, there are accounts run by teams and by individuals representing entities or publications; joke accounts; people who specifically crosstweet (using a name, avatar, description, and other factors that don't represent their identified gender); and bots, which may provide a sense of gender but arguably possess none (yet) in a meaningful manner. Twitter may be mostly accurate, but I have to believe the error bar is larger than it maintains. The Pew Research Center's "Social Media Update 2013," released at the end of last year, finds statistically equal numbers of men and women (self-identified) in America using Twitter. The women are there.
My more than 15,000 followers are divided 81/19 male/female, and the numbers for each (which you can see when you hover over each bar) add up to nearly my exact current total. This isn't that odd, I suppose, since I dad tweet, tech tweet, and pun tweet, all of which are likely to find more male peers. Colleague Lisa Oberndorfer, an Austrian tech journalist who did a recent long stint in San Francisco, says she splits 74/26 male/female.
But the results seem stranger for many of my female friends and colleagues, who, even if not in tech, have Twitter reporting 75 to 80 percent of their followers are men. When they examine their list of followers and with whom they interact, they find it implausible.
For instance, my buddy Swoozy Clancey (a nom de Twitter) lists "feminist killjoy" and "kill the kyriarchy" in her bio, but somehow has 75 percent male followers. Another friend, @MaddieSayWhat, a PhD candidate in counselor education and supervision, splits 72/28. However, Maddie offers one plausible explanation: people are more likely to follow those of a gender to which they are attracted, even if they aren't specifically attracted to that person. She notes that following a gender to which one isn't attracted is "less reward centery." That would help explain the ratio for women tweeters, but not necessarily for men.
Some women and men report much more equal ratios. Sarah Werner, the digital media strategist at the Folger Shakespeare Library, says her ratio is 52/48 male/female; she checked Folger Research's account, and it has a mirror: 53/47 female/male. Where people checked, the male and female follower counts typically added up to nearly their total number of followers, as in my case, indicating extremely high confidence on Twitter's part.
One other factor may be the absolute number of people that identified men and women follow. If men follow 30 percent more people than women on average, that could also account for a more general disparity.
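The size of that effect is easy to estimate. The sketch below assumes equal numbers of men and women (as the Pew figures suggest for American users) and a purely hypothetical 30 percent gap in how many accounts each follows. The function name and the per-user follow counts are assumptions for illustration.

```python
def male_share_of_follows(men: int, women: int,
                          follows_per_man: float, follows_per_woman: float) -> float:
    """Share of all follow relationships that originate from men."""
    male_follows = men * follows_per_man
    female_follows = women * follows_per_woman
    total = male_follows + female_follows
    if total == 0:
        raise ValueError("no follow relationships to count")
    return male_follows / total


# Hypothetical: equal populations, men follow 30 percent more accounts on average.
share = male_share_of_follows(men=1000, women=1000,
                              follows_per_man=130, follows_per_woman=100)
print(f"{share:.1%}")  # 56.5%
```

Under these assumptions, a typical account would show roughly a 57/43 male/female split, which would explain part of a skew but not a 75 to 80 percent one.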
One has to ask, after all this scrying of gender, whether it matters a bit. For advertisers who know their products skew to one gender or another, or produce better results with gender-tailored marketing for the same market, sure. For the rest of us, it's hard to say.
It can seem disconcerting when you think you're being listened to by an audience you imagine in one fashion, and it turns out to be another. But there's enough ambiguity in Twitter's numbers, and not enough information revealed, to take some percentage points of accuracy with a grain of salt.