Crazy good: More nuanced sentiment analysis

7 Feb


The word “crazy” is one of the most flexible in English. It can be an intensifier, as in crazy good/crazy bad, and it can be positive or negative when standing alone: this party is crazy! versus these demands are crazy. There is often a pejorative use associated with mental illness, so it is a sensitive and sometimes offensive word by association. In this post, we are looking at its grammatical context and how that contributes to sentiment.

At the end of the post, you’ll also see a bit about how we automatically detect and remove phishing messages and other spam.

So what do people on social media think is crazy? Mostly events. More specifically: movies, sports, life, women, and Kevin Durant.

The unexpected

The unexpected gets attention—this is a pretty basic truth of cognition and culture.

But we have various kinds of reactions. There’s the unexpected that fills us with excitement and there’s the kind that we reel away from or that we use to socially patrol others and ourselves. In other words, there’s crazy-fun and crazy-unacceptable.

The wrinkle is that even crazy-normatively-objectionable can inspire titillation that we kinda like. Here are other emotional cues that introduce someone saying something is crazy in social media: holy shit, man, lol, damn, ohh shitttt, smh (‘shaking my head’), lmao, omg. Just because we’re communicating an intense reaction doesn’t mean we actually know what we think about it. Pure emotional states are rare.

Crazy is a good example of a word with a complicated social signal. A major motivation for this post is that there’s a lot more to sentiment than positive/negative/neutral.

What does “crazy” mean?

For people concerned about the stigmatization of mental illness, some good news: craziness terms are applied to non-humans 3.47 times more often than to people—mostly to events and situations. (Although this is also the case for pejorative gay, which is also applied to situations—that’s gay—so this may be a limited consolation.)

Having said that, there are 1.66 times as many references to women being crazy as to men (for example, mom, girl, she vs. he, guy, dude). This is a long-standing imbalance; just look at the etymology of hysteria. Back to gender in a minute.

When someone or something is crazy they are unintelligible. In speech, it could be a failure to take listeners’ needs into account by dropping reference cues that are necessary to follow (“Wait, you’re using a pronoun but you haven’t introduced the referent!”). Crazy talk is also where you say things that aren’t socially sanctioned, like a soldier giving an order to his commanding officer or an invisible penguin. Disrupting the established social order can get you labeled crazy.

Sometimes doing the unacceptable is good. Sometimes not. The main movie that people said was crazy was the Lifetime remake of Flowers in the Attic. It features Ellen Burstyn and Heather Graham locking kids in an attic. The story gets messier.

Craziness also indicates that there’s no reasoning possible. In Apache culture, you tend to stay quiet when someone is enraged (hashkee) because they are crazy (bíni’édi̜h): they forget who they are and lose concern for what their actions do. Odds are you weren’t raised Apache, but this will still sound familiar.

The connection between (un)reasonability and gender is doing a lot of work. A lot of the sexist messages give no real content other than This bitch is crazy. A fair number of writers specify that they love this fact but more are probably using it as a critique. Most instances don’t have enough context surrounding them to actually let us tell what the authors meant. They may not know themselves.

The majority of women who are labeled as crazy are left relatively anonymous. The men labeled as crazy are more specific: Kevin Durant, John Tortorella and Dynamo the Magician, in particular. During the January time period I grabbed this data from, Durant scored 54 points against the Golden State Warriors, a career high for a guy who is a phenomenal basketball player even without that game. His scoring was exceptional (unexpected), which led lots of people to say Durant is crazy. The ways we describe situations and actions can pretty easily sneak into the way we describe people.

Scrubbing spam, singing sexy

The first worry of a data scientist is “garbage in, garbage out”. Hence the importance of data janitor functions. One step I took was to restrict this analysis to users whose follower and propensity-to-be-retweeted counts were within two standard deviations of the average. That reflects an assumption that people outside of that range deserve a different kind of analysis. At one end, they are new users and spammers; at the other end, hyperpopular celebrities and news outlets.
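Here is a minimal sketch of that kind of filter, for readers who want to try it. The field names (followers, retweet_rate) and the toy numbers are made up for illustration, and the post doesn't say whether the cut was applied to raw or transformed counts, so treat this as one plausible reading rather than the exact procedure used.

```python
from statistics import mean, pstdev


def within_k_sd(values, k=2.0):
    """Flag each value that lies within k standard deviations of the mean."""
    mu = mean(values)
    sd = pstdev(values)  # population standard deviation
    return [abs(v - mu) <= k * sd for v in values]


def filter_users(users, fields=("followers", "retweet_rate"), k=2.0):
    """Keep users who are within k SDs of the mean on every listed field.

    `users` is a list of dicts; the field names are hypothetical.
    """
    keep = [True] * len(users)
    for field in fields:
        flags = within_k_sd([u[field] for u in users], k)
        keep = [a and b for a, b in zip(keep, flags)]
    return [u for u, ok in zip(users, keep) if ok]


# Toy data: nine ordinary accounts and one hyperpopular one.
rates = [4, 5, 6, 5, 4, 5, 6, 5, 4, 6]
users = [{"followers": 100, "retweet_rate": r} for r in rates[:9]]
users.append({"followers": 1_000_000, "retweet_rate": rates[9]})

kept = filter_users(users)
```

With counts as skewed as follower numbers, a cut like this mostly trims the celebrity end, so in practice you may want to look at the distribution before settling on a threshold.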

But dropping the extreme ends doesn’t eliminate a specific kind of spam: phishing. These are the messages that are sent out by normal users when their accounts have been compromised. For example, there were over 98,000 messages of the form, “@phishingvictimfriend haha your blog is crazy http://evilurl”.

In our system, we do unsupervised clustering in order to group together the messages that are most alike. Those 98k crazy-blog spam messages have a variety of users addressed with the @ and a multiplicity of URLs, but they all end up in the same group. In this case, there are about 4 major spam clusters worth removing from consideration (so they aren’t reported above).
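To make the idea concrete, here is a rough sketch of how templated spam can fall into a single group. It is not the clustering used in our system: it swaps in a simple greedy grouping by Jaccard similarity over word sets, after replacing @mentions and URLs with placeholders. The function names, the 0.7 threshold, and the example messages are all assumptions for illustration.

```python
import re

MENTION = re.compile(r"@\w+")
URL = re.compile(r"https?://\S+")


def normalize(text):
    """Replace URLs and @mentions with placeholders and return a token set."""
    text = URL.sub("<url>", text)
    text = MENTION.sub("<user>", text)
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Share of tokens two sets have in common (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def greedy_cluster(messages, threshold=0.7):
    """Put each message in the first cluster whose seed message is similar enough."""
    clusters = []  # list of (seed token set, member messages)
    for msg in messages:
        tokens = normalize(msg)
        for seed, members in clusters:
            if jaccard(seed, tokens) >= threshold:
                members.append(msg)
                break
        else:
            clusters.append((tokens, [msg]))
    return [members for _, members in clusters]


messages = [
    "@alice haha your blog is crazy http://evil.example/a",
    "@bob haha your blog is crazy http://evil.example/b",
    "Durant is crazy tonight",
    "@carol haha your blog is crazy http://other.example/zzz",
]
groups = greedy_cluster(messages)
```

Once the addressee and the link are masked, the three phishing messages are identical and land together, while the Durant message starts its own group.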

Automatic clustering gives you a sense of the data you’re dealing with, something that word clouds are fairly pathetic at. Clustering techniques allow you to get exemplars, the most representative messages for each cluster. This is where you can also see if there are problem zones like phishing and where you can see which messages are enjoying the widest circulation.
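One simple way to pick an exemplar is to take the message that is, on average, most similar to everything else in its cluster (a medoid). The sketch below assumes whitespace tokenization and Jaccard similarity; both are stand-ins for illustration, not a description of the production system.

```python
def tokens(text):
    """Lowercased whitespace tokens as a set."""
    return frozenset(text.lower().split())


def jaccard(a, b):
    """Share of tokens two sets have in common (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def exemplar(cluster):
    """Return the message with the highest average similarity to the others."""
    sets = [tokens(m) for m in cluster]

    def avg_similarity(i):
        others = [jaccard(sets[i], s) for j, s in enumerate(sets) if j != i]
        return sum(others) / len(others) if others else 1.0

    best = max(range(len(cluster)), key=avg_similarity)
    return cluster[best]


cluster = [
    "this party is crazy",
    "this party is crazy fun",
    "this party is so crazy fun tonight",
]
representative = exemplar(cluster)
```

In this toy cluster the middle message wins because it overlaps heavily with both the shorter and the longer variant.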

A huge number of the bitches is crazy posts are referencing a lyric from Lil Durk’s Bang Bros, which was released last October (I’d include a link to the YouTube video but it’s really dull and not worth watching). Song lyrics stick around for a while: folks are still tweeting out “This shit is bananas, B-A-N-A-N-A-S”, which comes from Gwen Stefani’s 2005 Hollaback Girl. That’s a song where she tells some dude to meet her at the bleachers for a fight. Watch it, Lil Durk.

Analyses of social media often need to decide what to do with song lyrics. The right way to do it is to ask the client what they want. For example, Granger Smith’s country song has the lyrics, “I wanna love you on a Silverado bench seat”. When people tweet that, should it count as positive sentiment? Neutral? A case could be made for either.

Sexuality is an important component for many brands. In our work, we’ve found that a huge part of how cars and trucks are evaluated in social media has to do with their sexiness or cuteness, including the sexiness/cuteness of drivers. So brand managers for Chevy and competitors probably do want to keep track of this sort of thing. (Product aside: we offer automatic detection of sexy-cute and lots of other dimensions like intent-to-buy that are more fine-grained than traditional sentiment analysis; check out the second row of our product page.)

To be honest, being able to decide this is kind of a luxury. This is not the kind of feature sentiment analysis tools typically have. They’ll just automatically count that lyric about the Silverado as positive (mostly because of the word love), but they’ll also count completely irrelevant Silverados like “I have to go to silverado tomorrow I might kill myself ??¿”. That’s negative about Silverado, California, not the Chevy Silverado. Drop us a line if you’re interested in learning more.

– Tyler Schnoebelen (@TSchnoebelen)


