An algorithm to figure out your gender
Twitter claims a 90 percent accuracy rate for the clever techniques it uses to learn the gender of any given user. Glenn Fleishman reports on the company's disconcerting new analytics tools, the research behind them, and how large a pinch of salt they come with.
Twitter opened its analytics platform to every user on August 27, allowing all of us — not just verified users and those with advertising accounts — to track how many people viewed and acted upon our tweets. But the "Followers" section, revealing demographics, provoked the most discussion. Alongside breakdowns in followers' interests and location is a gender bar that splits followers into male and female.
Many women were surprised how many of their followers — whether they had hundreds or thousands — were men. The ratio is often in the 75 to 80 percent range, and it's easy to find thousands of tweets reporting so. Some of the authors work in tech, a male-dominated industry, but others tweet about other subjects. Why such a heavy skew there?
Forget for a moment the problem, in 2014, with offering a simple duality for gender, which brings with it biases and assumptions.
How can Twitter offer this information when it doesn't ask you to indicate a gender when you sign up for an account?
The service analyzes our tweets and uses word choice, proximity, and other factors to make a guess. According to a 2012 post on its Advertising blog, the company relies on multiple signals to assign confidence to a gender selection. For matches with a high confidence level, Twitter tested its results against a global panel of humans and found its approach 90 percent accurate. The post noted, "…where we can’t predict gender reliably, we don’t." (Twitter didn't reply to a request for information for this article.)
As Robin James, a professor of philosophy and women's/gender studies at the University of North Carolina–Chapel Hill, tweeted recently, "The Twitter analytics method of reading gender just shows social identity isn't bodily features anymore, but behavioral patterns."
Twitter's marketing research didn't come out of a vacuum. Researchers have long analyzed cues that arise out of modes of expression to determine personal characteristics that aren't explicitly mentioned or known to a reader. A 2002 book, revised in 2013, Reading, Writing, and Talking Gender in Literacy Learning, has a chapter that identifies and summarizes 42 studies mostly from the 1990s examining marks of gender in student writing, and it's just scratching the surface of the literature.
It's no surprise that such work would be extended when massive corpuses could be analyzed and then checked for accuracy using control cases in which gender was known, as when the writer (on a blog, social network, or other public platform) provided explicit details about themselves. This allows refinement on a scale never before possible.
One researcher, Delip Rao, was the lead author on several papers during his time at the Human Language Technology Center of Excellence at Johns Hopkins University that dealt with algorithmic methods of identification. Many of his co-authored papers talk about "latent attributes," those implicit specific details about people that can be surfaced, including ethnicity and gender.
Some of the "tells" in tweets and other messages are the stereotypical ones that would leap to mind, and we shouldn't be surprised that they test out as valid. In one paper from 2010, the researchers note that "OMG" is used four times as often by women as by men in the dataset of Twitter messages they tested. The phrase "my zipper" has an extremely high predictive value for men, while "my yoga" has the same effect for women. The paper even notes, "People laugh differently on Twitter as well. While women LOL, men tend to LMFAO."
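To see how a single cue like this could feed a guess, here is a minimal sketch in Python. It is not Twitter's method or the researchers'; it assumes a 50/50 prior, treats the paper's four-to-one "OMG" ratio as a likelihood ratio, and uses a hypothetical 0.9 confidence threshold below which the classifier declines to guess. The function names are invented for illustration.

```python
def posterior_female(prior_female, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior_female / (1.0 - prior_female)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


def guess_gender(p_female, threshold=0.9):
    """Return a label only when confident enough; otherwise abstain (None).

    The 0.9 threshold is an assumption, not a published Twitter setting.
    """
    if p_female >= threshold:
        return "female"
    if 1.0 - p_female >= threshold:
        return "male"
    return None


# One "OMG" with a 4:1 ratio, starting from an assumed even prior.
p = posterior_female(0.5, 4.0)
print(round(p, 2), guess_gender(p))
```

On these assumptions a lone "OMG" moves the estimate to 80 percent female, which is still short of the threshold, so the sketch abstains. A real system would need many such signals before it cleared a bar like that.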
The MITRE Corporation released a much-cited study in May 2011 that attempted to predict gender and other factors. It found an accuracy of 76 percent using the text of tweets alone and 92 percent using tweets, the account description, screen name, and full name (as provided). A September 2013 paper examined "700 million words, phrases, and topic instances collected from the Facebook messages of 75,000 volunteers." While the authors focused mostly on "psychological insights," based in part on conducting personality tests on all the volunteers, they report an accuracy rate of 91.9 percent for gender using "language" alone, and not including other data they gathered.
Rao went to work for Twitter in October 2011, and there's been a lot less research in the field in the two years since, with the 2012 and 2013 papers the only ones widely cited. It's possible Twitter, Facebook, and others recruited other researchers with similar interests, which would explain the paucity of papers, since there's such a huge financial reward associated with precise targeting of attributes.
In its 2012 post, the company said it only asserts gender when the prediction is "reliable," but we don't know what percentage of the time that was; and of the reliable data, Twitter was wrong 10 percent of the time, assigning the incorrect gender. That could have meant an error rate of 20 percent or more. Let's assume this research has only improved, and Twitter now has extremely high confidence and accuracy, which it measures above 90 percent. Things still don't add up.
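One way to read this arithmetic is to separate coverage (the share of accounts Twitter is willing to label) from accuracy (how often a label is right). The Python sketch below assumes the 90 percent accuracy figure from the 2012 post; the coverage values are hypothetical, since Twitter has not published them, and `label_breakdown` is a name invented here.

```python
def label_breakdown(coverage, accuracy):
    """Split all accounts into correctly labeled, wrongly labeled, and unlabeled shares."""
    correct = coverage * accuracy
    wrong = coverage * (1.0 - accuracy)
    unlabeled = 1.0 - coverage
    return correct, wrong, unlabeled


# Accuracy of 0.9 is from the 2012 post; the coverage values are assumptions.
for coverage in (1.0, 0.8, 0.6):
    correct, wrong, unlabeled = label_breakdown(coverage, 0.9)
    print(
        f"coverage {coverage:.0%}: correct {correct:.0%}, "
        f"wrong {wrong:.0%}, unlabeled {unlabeled:.0%}"
    )
```

With 80 percent coverage, for instance, 72 percent of all accounts would get the right label, 8 percent the wrong one, and 20 percent none at all.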
Leaving aside people who don't identify as either strictly or solely male or female, or who reject those gender constructs, there are accounts run by teams and by individuals representing entities or publications, joke accounts, people who specifically crosstweet — using a name, avatar, description, and other factors that don't represent their identified gender — and bots, which may provide a sense of gender, but arguably possess none (yet) in a meaningful manner. Twitter may be mostly accurate, but I have to believe the error bar is larger than they maintain. The Pew Research Center's "Social Media Update 2013," released at the end of last year, finds statistically equal numbers of men and women (self-identified) in America using Twitter. The women are there.
My more than 15,000 followers are divided 81/19 male/female, and the numbers for each (which you can see when you hover over each bar) add up to nearly my exact current total. This isn't that odd, I suppose, since I dad tweet, tech tweet, and pun tweet, all of which are likely to find more male peers. Colleague Lisa Oberndorfer, an Austrian tech journalist who did a recent long stint in San Francisco, says she splits 74/26 male/female.
But the results seem stranger for many of my female friends and colleagues, who, even if not in tech, have Twitter reporting 75 to 80 percent of their followers are men. When they examine their list of followers and with whom they interact, they find it implausible.
For instance, my buddy Swoozy Clancey (a nom de Twitter) lists "feminist killjoy" and "kill the kyriarchy" in her bio, but somehow has 75 percent male followers. Another friend, @MaddieSayWhat, a PhD candidate in counselor education and supervision, splits 72/28. However, Maddie offers one plausible explanation: people are more likely to follow those of a gender to which they are attracted, even if they aren't specifically attracted to that person. She notes that following a gender to which one isn't attracted is "less reward centery." That would help explain the ratio for women tweeters, but not necessarily for men.
Some women and men report much more equal ratios. Sarah Werner, the digital media strategist at the Folger Shakespeare Library, says her ratio is 52/48 male/female; she checked Folger Research's account, and it has a mirror: 53/47 female/male. Where people checked, the male and female counts combined typically came to nearly the total number of followers, as in my case, indicating extremely high confidence.
One other factor may be the absolute number of people that identified men and women follow. If men follow 30 percent more people than women on average, that could also account for a more general disparity.
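A quick back-of-the-envelope check of that idea, as a Python sketch. It assumes equal numbers of men and women on the service (in line with the Pew finding), that people follow accounts without regard to gender, and that men follow 30 percent more accounts on average; the population and follow counts are made-up round numbers, and `male_share_of_follows` is a hypothetical helper.

```python
def male_share_of_follows(men, women, follows_per_woman, male_multiplier):
    """Share of all follow relationships that originate from men."""
    male_follows = men * follows_per_woman * male_multiplier
    female_follows = women * follows_per_woman
    return male_follows / (male_follows + female_follows)


# Hypothetical population: equal numbers, men following 30 percent more accounts.
share = male_share_of_follows(
    men=1000, women=1000, follows_per_woman=100, male_multiplier=1.3
)
print(f"{share:.1%}")
```

Under those assumptions men account for about 56.5 percent of follows, so an average account would show a modest male tilt, though nothing like 75 to 80 percent.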
One has to ask, after all this scrying of gender, whether it matters a bit? For advertisers who know their products skew to one gender or another, or produce better results with gender-tailored marketing for the same market, sure. For the rest of us, it's hard to say.
It can seem disconcerting when you think you're being listened to by an audience you imagine in one fashion, and it turns out to be another. But there's enough ambiguity in Twitter's numbers, and not enough information revealed, to take some percentage points of accuracy with a grain of salt.