Archive | August, 2015

Don’t mention museums! Tips for couchsurfers and sentiment analysers

31 Aug

I had the great pleasure of hosting a webinar with Vita Markman and Chris Potts. Vita joined us from LinkedIn where she is an engineer handling all sorts of natural language processing (NLP) tasks. Chris joined us from Stanford, where he is an associate professor of linguistics and director of the Center for the Study of Language and Information (CSLI).

One of the problems that sentiment analysis runs into is shared with any other classification problem: what’s in and what’s out for each category? Chris had examples like:

Many consider the masterpiece bewildering, boring, slow-moving, or annoying

In this case, something is called a masterpiece, but it’s also reportedly much-maligned. Depending on what you’re doing with sentiment analysis, you may want to deal with reported information differently than someone talking about their direct experience. It’s a lot harder to get people to agree on how to categorize emotions when they’re embedded in something like an I heard that you feared that he sensed that she thought that they said that everyone absolutely loved it.

Classification requires consistency

When Vita and Chris talk about experimental design, this is an important part–defining categories so that humans are consistent is a crucial step for getting machines to be able to automatically classify something. That’s true whether you’re classifying social media in terms of sentiment or extracting person names from Korean product reviews.

Vita gave the example of a former colleague wanting to crowdsource emotionally-charged language–but they couldn’t define what that meant. Machines can learn patterns automatically from large sets of data, but they have to learn from something. Unless you (and your team) can give exemplars and consistently label the categories you care about, it’s hard to get other people or machines to do the classification correctly.
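One way to check whether people are labeling consistently is to have two annotators label the same items and measure how often they agree beyond what chance would produce. The webinar summary doesn't name a specific measure, so the sketch below uses Cohen's kappa as an assumption; the function name, label names and sample data are all invented for illustration.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators (Cohen's kappa)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Share of items on which the two annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each annotator's label frequencies.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical labels from two annotators for the same six posts.
annotator_1 = ["pos", "neg", "neg", "pos", "neutral", "pos"]
annotator_2 = ["pos", "neg", "pos", "pos", "neutral", "neg"]
kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))
```

A value near 1 suggests the category definitions are clear enough for people to apply consistently; a value near 0 suggests the agreement is about what chance alone would give, which is a sign the categories need sharper definitions before anyone trains a model on the labels.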

The extra wrinkle in analyzing automatic classifications is that correlations sometimes behave in ways we don’t expect. As Chris says about trying to measure team effectiveness through politeness and sentiment, “productive teamwork might be possible only if people feel empowered to express frustration, which will be read as negativity correlating with a desirable team outcome.” This is the case with speed-dating, too, in which saying something negative about each other correlates to a positive speed-dating experience.

Training on your data is better than training on someone else’s

Another aspect we talked about in the webinar had to do with appreciating domain-specificity. It’s often a bad idea to treat a model built on one set of data as something generic that can be applied to any other kind of data. Consider Couchsurfing.com. Chris analyzed which words went with people who were identified by their hosts as good surfers and which words went with those who weren’t. What hosts really wanted were people who engaged with them and weren’t using the couch as merely a landing pad. As Vita said after he showed the results in the webinar, “I have never seen museum in a negative context before…[it] reinforces how domain-specific and how context- and people-specific sentiment words can be.”

Bringing in context is also how you know what to do with something like You’re terrible!

If everyone is smiling and laughing, there’s a pretty good chance that’s positive even though on the face of it telling someone they are terrible should be negative. This is also how Chris addresses how to think of sarcasm–there’s a nice layout of this in the webinar, walking through what bits of context you could lean on to get the sentiment right for Yeah, great idea.

We also talk a bit about politeness, power, reputation, and emotion. Near and dear to my own heart is the idea of positioning. In the webinar, we discussed work on social balance/social status. Understanding how to impute social relationships from words and other features helps you understand how to interpret something potentially ambiguous like You’re one crazy {expletive}!

Easy-to-implement practicalities

We also talked practicalities, like Vita’s helpful suggestion about how you find key phrases that are meaningful, rather than just popular. Let’s say you’re looking for bigrams and trigrams that matter. If you just use frequency, you’ll end up with lots of prepositional phrases like of your department or non-topical things like good morning. She shows how to drop those so that you can focus on things like jobs on LinkedIn or talent solutions.
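To make the idea concrete, here is a minimal sketch of one possible filter. It is not necessarily the method Vita showed in the webinar. It assumes that a phrase starting or ending with a function word is probably a fragment, and that greetings can be listed by hand. The word lists, the `key_phrases` function and the sample messages are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny lists; a real project would need larger ones.
FUNCTION_WORDS = {"a", "an", "the", "of", "on", "in", "at", "to", "for", "with",
                  "from", "about", "and", "or", "i", "our", "your"}
NON_TOPICAL = {("good", "morning"), ("good", "afternoon"), ("thank", "you")}

def ngrams(tokens, n):
    """Yield every run of n consecutive tokens as a tuple."""
    return zip(*(tokens[i:] for i in range(n)))

def key_phrases(texts, sizes=(2, 3), min_count=2):
    """Count bigrams/trigrams, skipping fragments and non-topical phrases."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        for n in sizes:
            for gram in ngrams(tokens, n):
                # Phrases that start or end with a function word are fragments.
                if gram[0] in FUNCTION_WORDS or gram[-1] in FUNCTION_WORDS:
                    continue
                if gram in NON_TOPICAL:
                    continue
                counts[gram] += 1
    return [(" ".join(g), c) for g, c in counts.most_common() if c >= min_count]

# Made-up messages for illustration only.
texts = [
    "Good morning, I saw new jobs on LinkedIn from the head of your department",
    "Good morning! Our talent solutions team posts jobs on LinkedIn",
    "Ask the head of your department about talent solutions",
]
phrases = key_phrases(texts)
print(phrases)
```

Checking only the first and last word lets a phrase like jobs on LinkedIn survive even though it contains a preposition, while of your department is dropped because it starts with one.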

We also chat a bit about cleaning up the data, which is always important. An additional point from Vita here: people often remove “stop words” because they can get in the way of seeing trends. Stop words are little, frequent words like of, may and the. One of the most important things to consider, says Vita, is negation. Negations like not and never are often removed but that can give you a very inaccurate reading about what’s going on.

Vita has mentioned these examples:

  • rarely arrived on time
  • cd arrived without case
  • no issues with delivery. arrived promptly
  • no delivery. issues with shipping.

If you don’t know about rarely or without, you won’t understand what’s going on in the first two examples. And if you don’t understand the “scope” of no in the other two examples, your system won’t understand that (3) is reassuring to a company while (4) may suggest a big problem.
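The effect of dropping negations is easy to see in a small sketch. The stop list below is hypothetical and deliberately over-eager (it includes rarely and without to mirror the examples above), and `remove_stop_words` is an illustrative helper, not a function from any particular library.

```python
import re

# Hypothetical stop list that, like many real ones, sweeps up negations too.
STOP_WORDS = {"a", "an", "the", "of", "on", "with", "and", "may",
              "no", "not", "never", "without", "rarely"}
# Words we choose to protect because they flip or weaken the meaning.
NEGATIONS = {"no", "not", "never", "without", "rarely"}

def remove_stop_words(text, keep_negations=True):
    """Lowercase, tokenize, and drop stop words, optionally sparing negations."""
    tokens = re.findall(r"[a-z]+", text.lower())
    drop = STOP_WORDS - NEGATIONS if keep_negations else STOP_WORDS
    return [t for t in tokens if t not in drop]

examples = [
    "rarely arrived on time",
    "cd arrived without case",
    "no issues with delivery. arrived promptly",
    "no delivery. issues with shipping.",
]
for text in examples:
    naive = remove_stop_words(text, keep_negations=False)
    safer = remove_stop_words(text, keep_negations=True)
    print(naive, "vs", safer)
```

With the naive setting, the first example collapses to arrived and time, which reads like praise. Keeping negations fixes that, but it does not settle scope: examples (3) and (4) both still contain no, issues and delivery, so telling them apart takes word order or sentence boundaries, not just a better stop list.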

Go watch the webinar to get even more ideas and contact us at info@idibon.com if you’d like to hear how we help with consistent, context-specific, easy and actionable insights.

Emoji use: Who, where, how

20 Aug

Emoji are on the rise. People on their smartphones and on social media use emoji to add a visual key to their message. Today, emoji are being used in advertising, in the courtroom, and even in recent political campaigns. To learn more about how emoji are being used in the business world, you can check out the blog post and video here.

There were 722 emoji when the Unicode 6.0 character set was released in 2010, and one hundred more have been, or will be, added. So it’s not surprising that not all emoji are used equally. What are the most frequently used emoji? Are some emoji used and interpreted differently across different cultures and groups of people? And do people really use emoji to communicate strong emotions or are they more of a whimsical addition to a text message?

Check out this video to learn about the who, where, and how of emoji use around the world!