How do you tell male speech from female speech? University researchers David Bamman, Jacob Eisenstein, and Tyler Schnoebelen set out to replicate and re-examine previous Twitter-based research, and found that identifying gender differences in speech is a lot more complicated than earlier studies concluded.
The latest study cites the work of Burger, Henderson, Kim, and Zarrella (2011), who studied 184,000 “authors,” or Twitter users, and identified the gender of the speaker with 75.5 percent accuracy when their algorithm had multiple tweets per author to work with. With a single message per author, accuracy was 67.8 percent. Bamman, Eisenstein, and Schnoebelen were able to best these results, arriving at an impressive 88.8 percent accuracy.
They achieved this rate by approaching the gendered lexicon a little differently than their predecessors, although the study does build on work already conducted in this field.
The researchers used a test group of 14,464 English-speaking authors based in the United States. To make sure these users were in fact English speakers, each author needed to have used at least 50 of the 1,000 most common English words. A total of 9,212,118 tweets were investigated over the course of the study, and to help make sense of this trove of data, the researchers used a predictive machine learning algorithm that makes an educated guess about an author’s gender based on individual words.
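For readers who think in code, the two steps described above (screening authors for English, then guessing gender from individual words) can be sketched roughly as follows. This is only an illustration, not the researchers' implementation: the function names are hypothetical, the 50-word threshold is assumed to mean 50 distinct words, and the scoring is a simple smoothed log-odds word count standing in for whatever model the study actually used.

```python
import math
from collections import Counter


def is_english_author(tokens, common_words, min_distinct=50):
    """Keep an author who used at least `min_distinct` of the common words.

    Assumption: the threshold counts distinct words, not total uses.
    """
    used = {t.lower() for t in tokens}
    return len(used & common_words) >= min_distinct


def train_word_scores(labeled_authors):
    """Learn one score per word from (tokens, label) pairs, label 'F' or 'M'.

    Positive scores lean female, negative scores lean male.
    Add-one smoothing keeps unseen words from producing log(0).
    """
    counts = {"F": Counter(), "M": Counter()}
    for tokens, label in labeled_authors:
        counts[label].update(t.lower() for t in tokens)
    vocab = set(counts["F"]) | set(counts["M"])
    total_f = sum(counts["F"].values()) + len(vocab)
    total_m = sum(counts["M"].values()) + len(vocab)
    return {
        w: math.log((counts["F"][w] + 1) / total_f)
        - math.log((counts["M"][w] + 1) / total_m)
        for w in vocab
    }


def predict(tokens, scores):
    """Sum the word scores; ties and unknown words fall back to 'M'."""
    total = sum(scores.get(t.lower(), 0.0) for t in tokens)
    return "F" if total > 0 else "M"


# Toy data, invented purely for illustration.
training = [
    ("love happy omg".split(), "F"),
    ("game score bro".split(), "M"),
]
scores = train_word_scores(training)
print(predict(["love", "omg"], scores))
print(predict(["bro", "score"], scores))
```

With real data, `common_words` would be a set holding the 1,000 most common English words, and the training pairs would come from authors whose gender is already known.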
What previous research hadn’t touched on is the fact that within each gender there is more than one style of speech, influenced by our social circles. “Individuals with many same-gender friends tend to use language that is strongly associated with their gender,” the study explains. “Individuals with more balanced social networks,” on the other hand, aren’t likely to give in to gender-specific “norms” when it comes to their speech habits.
Based on the study’s results, the researchers compare previous research on gendered speech and reveal the inaccuracy of their predecessors’ findings, which you can check out in the table above. To give you an idea of the differences at a glance (without diving too deeply into the different styles of speech within each gender), the more obvious results show that women tend to use emotional language like “sad, love, glad, sick, proud, happy, scared, annoyed, excited, and jealous.” Emoticons and CMC (computer-mediated communication) terms (lol, omg, brb, for instance) are female markers, “as [are] ellipses, expressive lengthening (e.g., coooooool), exclamation marks, question marks, and backchannel sounds like ah, hmmm, ugh, and grr.”
Clear male markers include words related to swearing, technology, and sports, along with numbers (as in scores).
Terms relating to family are a little more complicated. In the past they’ve been considered a marker of female gender, but this study treats them as a point of contention with “mixed results,” as the researchers explain in the following statement:
“Of the family terms that are gender markers, most are associated with female authors: mom, mommy, moms, mom’s, mama, sister, sisters, sis, daughter, aunt, auntie, grandma, kids, child, children, dad, husband, hubby, hubs. However, wife, wife’s, bro, bruh, bros, and brotha are all male markers.”
To get briefly into how the study picked apart lexical patterns, the researchers identified eight categories of speech (dictionary words, punctuation, non-standard unpronounceable words, non-standard pronounceable words, named entities, numbers, taboo words, and hashtags) and plotted these against individual social groups of authors to find the patterns, giving them a better sense of gendered speech habits than before. If you have a knack for statistics, you might get a better idea of the researchers’ findings from the chart below:
Our speech is ultimately predisposed to certain types of language based on gender, but what steers it is our social ties. The takeaway? Who you hang out with, and the number of guys or girls in your social group (both online and off), will affect how you end up speaking on Twitter.