Twitter Knows I’m Female!!! (But Only Because I Used Those Exclamation Marks)

Any gay lady will tell you that stereotypes are a double-edged sword. On one hand, it’s really frustrating to explain over and over that yes, you’re gay and no, you don’t hate men. On the other foot, I know I’m not the only one who relies on a girl’s swagger or her alternative lifestyle haircut to let me know if she plays for the home team. Some stereotypes exist because there’s often a tiny grain of truth buried deep in them–some pattern that has appeared over time.

Researchers at the Mitre Corporation are taking advantage of these patterns to try to figure out who Twitter users really are. In a paper presented this week at the Conference on Empirical Methods in Natural Language Processing, they showed how an algorithm that they created is capable of determining a user’s age, location, gender, and political affiliation by analyzing their tweets. The algorithm scans a user’s tweets for the presence of sociolinguistic cues–words, characters, and phrases that are more likely to be used by a specific group–to determine the user’s demographic characteristics.
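To make the idea of scanning for cues concrete, here is a minimal toy sketch in Python. It is not the researchers' algorithm: the cue lists, the function names `count_cues` and `guess_gender`, and the simple "whichever side has more hits wins" rule are all assumptions invented for illustration. It also bakes in the same two-category output the program uses, with an "unknown" fallback for ties.

```python
import re

# Hypothetical cue lists, made up for illustration only.
# They are NOT the features reported in the paper.
FEMALE_CUES = [r"omg", r"!{2,}", r":\)", r"lol"]
MALE_CUES = [r"\bmy wife\b", r"\bmy gf\b", r"\bdude\b"]


def count_cues(tweets, patterns):
    """Count how many times any of the regex patterns appear across the tweets."""
    text = " ".join(tweets).lower()
    return sum(len(re.findall(pattern, text)) for pattern in patterns)


def guess_gender(tweets):
    """Return 'female', 'male', or 'unknown' based on which cue list matches more."""
    female_hits = count_cues(tweets, FEMALE_CUES)
    male_hits = count_cues(tweets, MALE_CUES)
    if female_hits == male_hits:
        return "unknown"
    return "female" if female_hits > male_hits else "male"


if __name__ == "__main__":
    print(guess_gender(["omg!!! :)"]))
    print(guess_gender(["Fixing the car with my wife, dude"]))
    print(guess_gender(["Going to the store"]))
```

A real system would presumably learn which cues matter, and how much, from tweets by users whose demographics are already known, rather than from a hand-written list like this one.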

While the researchers believe their technology could be useful for advertisers looking to break into niche markets on Twitter, others have suggested that it could be helpful in uncovering impostors. The Atlantic Wire’s Rebecca Greenfield thinks that this kind of technology could help us avoid another man-pretending-to-be-a-lesbian fiasco. While I like her thinking, I’m not sure if it could stop another Gay Girl in Damascus. The program was written to recognize certain features and classify users based on binary categories (I’m going to save my breath/fingertips here because talking about how problematic a gender binary is would be preaching to the (intelligent, savvy, and attractive) choir). Let’s take a look at some of the classifiers so that you get an idea of what exactly it’s scanning for.

Hilarious? Yes. Easy to trick? You bet. In the words of Rachel’s Real Computational Linguist friend, “to be honest, all someone would have to do to try and fool this thing into thinking its a woman is say ‘omg!!! :)’” It doesn’t take a PhD to figure out that women tend to be more effusive online; in fact, I bet Tom MacMaster or Bill Graber could tell you all about it. Besides, LGBT people tend to experience less pressure to conform to gendered expectations, making it more difficult to pinpoint the gender of a queer user based on key words alone and even tougher to spot a fraud.

Patterns, stereotypes, and formulas only get you so far. The algorithm is based on the idea that men and women use language differently, something that’s been shown over and over in research. The issue here is that while men and women as aggregate groups behave differently, it’s an ecological fallacy to believe that you can accurately gauge the characteristics of an individual man or woman from those group-level patterns.

The results of the experiment don’t do much to back up the algorithm’s accuracy either. The baseline for gender was 55%, meaning that since 55% of Twitter users are women, if the computer simply guessed “female” for every user it would be right 55% of the time. Using the program, the computer was able to determine a user’s gender correctly 72% of the time–only a 17-percentage-point improvement over that baseline. While it’s clear that there are a variety of markets for programs that allow you to ascertain the intent and disposition of otherwise anonymous internet users, it looks like it’ll be a while before they hit the shelves.
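For anyone who wants to check the arithmetic, the gap between 55% and 72% can be read a few different ways. The short Python sketch below uses only the two figures quoted above; the function names are made up for this example, and the "relative" and "error reduction" readings are my own framing rather than numbers from the paper.

```python
def percentage_point_gain(baseline, accuracy):
    """Absolute difference between two accuracies, in percentage points."""
    return (accuracy - baseline) * 100


def relative_gain(baseline, accuracy):
    """Improvement expressed as a fraction of the baseline accuracy."""
    return (accuracy - baseline) / baseline


def error_reduction(baseline, accuracy):
    """Fraction of the baseline's mistakes that the classifier eliminates."""
    return (accuracy - baseline) / (1 - baseline)


if __name__ == "__main__":
    baseline, accuracy = 0.55, 0.72  # figures quoted in the article
    print(round(percentage_point_gain(baseline, accuracy), 1))  # percentage points
    print(round(relative_gain(baseline, accuracy), 3))          # relative to baseline
    print(round(error_reduction(baseline, accuracy), 3))        # share of errors removed
```

However you slice it, the program still gets roughly 28 out of every 100 users wrong.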

Laura is a tiny girl who wishes she were a superhero. She likes talking to her grandma on the phone and making things with her hands. Strengths include an impressive knowledge of Harry Potter, the ability to apply sociology to everything under the sun, and a knack for haggling for groceries in Spanish. Weaknesses: Chick-fil-a, her triceps, girls in glasses, and the subjunctive mood. Follow the vagabond adventures of Laura and her bike on twitter [@laurrrrita].


26 Comments

  1. Democrat? Tofurkey and yoga. Republican? Walmart and trucker. That’s hilarious. I am not impressed that this system uses words like “zipper,” “wife” and “gf” to figure out it’s a man. Why not add “my penis,” “prostate” and “jacking off” while they are at it? Not clever. I think the challenge and the interesting thing is when those self identifiers are missing. I believe women tend to use smileyfaces more and use more expressive punctuation. But I don’t think that’s a secret really. It’s cool research, but I think this is only a starting point. Right now it seems as good at guessing as anyone able to read.

    • from my understanding it does sort of seem to me like they picked which words to look for first and then calculated backwards, which seems to build a bias into the whole thing? since it’s looking for keywords — it seems to work by verifying stereotypes rather than by having compiled new factual information based on research and analysis. like it figures you’re a woman because someone hypothesized that women “OMG!!!!” more, and they found that the numbers somewhat supported that, and then made that the base of the algorithm…. icky.

      • I think they picked Twitter users who they knew were male or female, but the cues that they analyzed were identified by the computer algorithm. It doesn’t sound like they picked the words first and then went looking for bias. It’s more like the computer just confirmed the fact that most men talk one way and most women talk another. The computer algorithm is all about probability – which means that all of us who defy probability (hi!) and don’t fit the gender binary (hi again!) don’t get reflected in their results.

        So it’s not like the basis of the algorithm is biased. The algorithm reflects reality. It’s reality that’s biased. People really do fit themselves into the traditional gender binary.

        Now, if you’ll excuse me, I’m gonna go put on my_shorts, call up my_buddy, get in my_jeep, and go eat some youthful gay feminist sushi while LOL’ing at netflix.

    • In the study they specifically mentioned that they removed body parts and obscene words, although I’m not really sure why. They were probably trying to stay away from a Mickey Avalon-esque situation.

    • I was *just* coming here to say something about that. It’s an insult to Yuengling drinkers everywhere! 😉 <–LOOK AT THAT, I'M SUCH A WOMAN! <—AND THERE IT IS AGAIN! I'M OUT OF WOMANLY CONTROL!

  2. In a language variation course I took a few years ago (under sociolinguistics), a classmate did her course project on Craigslist personal ads. She was disappointed to find that there was NO significant difference between men and women, according to several variables she was looking at. I can’t remember which variables… This was an undergrad course so the results aren’t hardcore.

    I wish I knew I was a homo in university when I had ample opportunities to learn about sexuality + linguistics! Not too late I suppose… Laura, I vote for more language related posts. I dare you to do a study and analyze Autostraddle writers as a baseline, ha ha ha! Ooh, that would be cool, but a lot of work. 😀

    • I could definitely run some CHAT codes and look at some simplified patterns if you’d like? I have nothing better to do with my time 🙂 Maybe I could even use this opportunity to get to know LIWC and look at more in-depth stuff.

      What kind of variables are you interested in?

  3. my useless sociolinguistics minor is sighing again, my projects involving BLOG AS COMMUNITY were SO MUCH COOLER — @c. above, autostraddle wasn’t in my projects but i seriously spent like half a year collecting research on language styles of blog writers & their commenters & how language styles and ‘syntatic performativity’ and ‘transgressive orthography’ fostered community DROOLING ALL OVER MYSELF RITE NAO

    i also got to do a thing on online dating sites where i signed up for a shit ton of weirdo dating sites and stalked people a lot, college was awesome why are these things not relevant in the Real World

  4. OMG LOL BEST MUG!!! I’D PUT ALL MY YOGURT IN IT (smileysmileysmiley)
    (Seriously though I love the mug)

    This article was awesome and smart and linguistic-y and nerdy and pretty much all of the things I like to Autostraddle.

    • And apparently enough Republicans spell “patriot” incorrectly that the program used “patroit” instead – unless there’s some new French word I’ve never heard of that’s become extremely popular among the right wing

  5. did you guys see the part at the end where the guy is hating on people who don’t want to list their gender as male or female?

    “A lot of people will put any kind of garbage in those fields,” points out Rao. “People have things like ‘jellyfish’ or something, and you don’t know what ‘jellyfish’ means, or what gender is ‘jellyfish.'”

    i think that’s the point, dude.

  6. Wait, Republicans don’t have Netflix? Or is it like gay sex, they do have it but don’t talk about it in public?

    Frankly, I think this proposed technology is useless and kind of stupid. Even if it did work, who cares? Why are we wasting time and money coming up with algorithms that separate tweets into gender categories when there’s death and injustice all over the world?

    I think it was a bad idea to read this right after the post about the Norway terrorist. This just seems so frivolous now and it pisses me right off.
