Last week was the Social Media & Web Analytics Innovation Summit here in San Francisco. I thought I’d take this opportunity to put together a glossary of terms that were used around the conference and share the slides I presented (I recommend downloading them so that you can read the notes; it’s a fairly visual presentation). The presentation is basically about stuff you can do that’s more interesting than sentiment analysis. The third case study is about routing text messages for the UN; it was recently featured in this article from The Guardian in Nigeria: Boosting community development, disease control with SMS-based platform.
Btw, here’s the presentation abstract:
Sentiment analysis is a rudimentary classification of messages into buckets like positive, negative, and neutral. On its own, sentiment analysis rarely answers key business questions, though it is automatic and scalable.
Now what would you do with unlimited human analysts? You’d ask them to classify messages into categories that enable you to take action. Machine learning models with humans-in-the-loop can power sophisticated classification.
This talk walks through case studies that demonstrate the value of categorizations beyond sentiment: detecting the toxicity/supportiveness of Reddit communities, understanding the effectiveness of Always’ #likeagirl campaign, and routing text messages to UNICEF.
There was a mix of technical and not-so-technical work being presented at the conference. The phrases below tended to get thrown around without a lot of definition—so I thought I’d give a quick sketch and some resources to learn more about them.
Quick conference glossary in not-quite alphabetical order
Here are some of the key phrases that came up in the conference. Feel free to add nuance in the comments (or ask for more definitions).
Data science: One of the conference’s most popular job titles was “data scientist”. Data scientists are, basically, folks who wrestle with data to turn up trends that organizations can act on. That’s a pretty loose definition. See also our post on The dirty hands of data scientists, which features results from a survey of data scientists about what THEY think they’re up to.
Machine learning (supervised): You want to teach a computer to do something. Say you want it to categorize items: you give it a bunch of items that go into Bucket A and a bunch of items that go into Bucket B (etc). Then you say, “pay attention to these qualities (features!)”. In natural language processing, one key thing you want to pay attention to is words. Now the computer can say, well, these words seem to go with Bucket A and these other words go with Bucket B, and other words seem to be evenly split. If it sees a new document with a lot of Bucket A words, it’ll put it in there. There are a lot of fancy things that can happen at this point, but those are the basics. See also last week’s Text by the Bay recap and our earlier Machine learning for medicine for more examples.
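The bucket example above can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is available; the tiny training set and labels are invented:

```python
# A toy supervised text classifier: word counts as features,
# Naive Bayes to learn which words go with which bucket.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Bucket A: complaints, Bucket B: praise (invented examples)
train_texts = [
    "this phone is terrible and slow",
    "awful battery, broke in a week",
    "i love this phone, great screen",
    "fantastic battery and fast shipping",
]
train_labels = ["A", "A", "B", "B"]

# The vectorizer turns each document into word-count features;
# the classifier learns which words lean toward which bucket.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# A new document with lots of Bucket B words lands in Bucket B.
print(model.predict(["great screen and fast"])[0])
```

With real data you’d have thousands of labeled examples rather than four, but the shape of the workflow is the same.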
Machine learning (unsupervised): You want a computer to teach you something. Say you have a lot of text and want to know what the common clusters of themes people are talking about in that text. Check out our definition for topic modeling below, that’s one prominent example.
Feature reduction: First off, what’s a feature? In the world of machine learning, a feature is a thing you can extract and use for prediction/classification. For example, if you’re doing sentiment analysis, you know that if you see “I love Dell” you’d want the computer to keep track of “love” and use it as an indicator of positive sentiment. So you can think of “love” as a feature. (More generally, you can think of words and phrases as features, and perhaps “time of day”, etc.) You might also have some linguistic parsing that abstracts “I love” to “1st person expressing desire”, which would also cover “I like”, “we adore”, etc. You might also want combinations of features, but combinations can easily run into the billions, and it is tough for machine learning algorithms to make accurate predictions over billions of features. So you’d like to use only the most promising ones. That is, you figure out what makes something a worthwhile (significant) feature and use only those for classification. That’s feature reduction.
Deep learning: Machine learning that uses layered (neural network) algorithms to find complex correlations between features. That is, instead of having each feature feed directly into classifications/predictions, the model automatically comes up with abstractions and combinations of the features you’ve selected.
Natural language processing (NLP): Contemporary NLP is based on machine learning, but specializes and develops it for language (whether speech or text). If you want to learn more about NLP, check out An NLP tutorial with Roger Ebert, NLP for all languages, The top 10 NLP conferences, and 9 top computational linguistics citations.
Named entity recognition (NER): Within a document, you may want to automatically identify and extract things like person names, organization names, and place names. Those are the standard NER categories, but you can also extract other things, like parts of cars. Take a look at Improving privacy with language technologies, 16 places that aren’t anywhere, and Naming camps and corporations.
N-gram: NLP-shorthand for sequences of words (phrases). A simple but effective feature in machine learning applied to text. A unigram is one word, a bigram is two words in a row, a trigram is three words in a row. “N-gram” (or “ngram”) is the general term, because sometimes you want to capture even longer sequences, say 10 words. See also Sexing up “6-gram”.
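Extracting n-grams from a token list takes one line of Python; the example sentence is invented:

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "i love this phone".split()
print(ngrams(tokens, 2))  # bigrams
# [('i', 'love'), ('love', 'this'), ('this', 'phone')]
```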
Novelty detection: There are lots of things you can predict beyond sentiment (see my presentation above). One of the things you could do is figure out, when an article comes in, whether it’s substantially different from everything you’ve already seen. If Samsung acquires Sony, that’s immediately huge news, but you may not care to read all of the articles that come after the first few–maybe you only want the novel ones, those that really add something new to the discourse. Novelty detection works to find “new stuff”. In contemporary systems, novelty detection often uses a form of unsupervised machine learning.
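One simple intuition for this: call a document “novel” when its word overlap with everything seen so far stays below a threshold. Real systems use much richer representations; the threshold and headlines in this pure-Python sketch are invented:

```python
def jaccard(a, b):
    """Word-set overlap between two documents, from 0 to 1."""
    a, b = set(a.split()), set(b.split())
    return len(a & b) / len(a | b)

def is_novel(doc, seen, threshold=0.5):
    """Novel if no previously seen document overlaps too much."""
    return all(jaccard(doc, old) < threshold for old in seen)

seen = ["samsung acquires sony in record deal"]
print(is_novel("samsung acquires sony in record deal today", seen))  # near-duplicate
print(is_novel("un routes text messages in nigeria", seen))          # genuinely new
```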
Polarity: In sentiment analysis, polarity refers to the two poles of negative and positive. Another term for this is valence.
Syntactic parsing: If you’re only using unigrams (single words) then you are essentially saying that the order of words doesn’t matter. Obviously it does. Syntax is the study of the regularities that let us put sentences together so that they make sense to us. Parsing helps find the meaningful components in a sentence. For example, in the previous sentence, “the meaningful components” is a syntactic unit in a way that “components in a” is not, even though both of them are three-word sequences. If you ask a computer to parse a sentence, it’s usually because you hope that it can find stronger patterns and/or because you’d like the output to be a little more readable (if I told you that “components in a” was an important phrase you wouldn’t know what to do with it).
Text classification: Sometimes, like with Named Entity Recognition, you want to pull things out of a document. Other times you want to classify a document into various categories. Text classification is just a way of saying that you want a kind of natural language processing solution that tells you how to categorize documents.
Text mining: Text mining and text analytics are about extracting information out of textual data. So they are really just other ways of saying “natural language processing”.
TF-IDF: This is a handy statistic that helps you compute whether a word or phrase really is significant. The “TF” is for “term frequency”–how often does a word occur in a document? Frequency is a crucial measure, but we do want to deal with the fact that we have lots of words that are VERY frequent (like the) and therefore tell us very little. That’s where the IDF comes in–“inverse document frequency”. This is a way of saying that if a word occurs in every document it isn’t very important after all. TF-IDF is often used as a feature in supervised and unsupervised machine learning.
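You can compute it by hand in a few lines. This uses one common variant of the formula (real toolkits differ in smoothing details), and the toy corpus is invented:

```python
import math

docs = [
    "the phone is great",
    "the battery is awful",
    "great battery great phone",
]

def tf_idf(term, doc, corpus):
    """TF-IDF of a term in one document of a corpus."""
    words = doc.split()
    tf = words.count(term) / len(words)                     # term frequency
    n_containing = sum(term in d.split() for d in corpus)   # document frequency
    idf = math.log(len(corpus) / n_containing)              # inverse doc frequency
    return tf * idf

# "the" appears in 2 of 3 documents, so its IDF drags its score down;
# "awful" appears in only 1 document, so it scores much higher there.
print(round(tf_idf("the", docs[1], docs), 3))    # 0.101
print(round(tf_idf("awful", docs[1], docs), 3))  # 0.275
```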
Topic modeling/clustering: Text classification and named entity recognition both use training information–you have humans label some training data so that the computer can learn significant patterns. In topic modeling, you ask the computer to figure out on its own what documents are like each other. Take a look at Entrepreneurs and empresarios: trends in English, French, and Spanish, 8 Olympic trends in the Russian blogosphere, #BlackLivesMatter: How events change conversations, and The rise and fall of a dream.
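To make “figure it out on its own” concrete, here is a deliberately crude, pure-Python clustering sketch that groups documents sharing words, with no labels provided. Real topic models (e.g. LDA) learn probabilistic topics instead; the greedy grouping and headlines here are invented, just to show the unsupervised flavor:

```python
def cluster(docs, min_shared=2):
    """Greedily group documents that share at least min_shared words."""
    clusters = []
    for doc in docs:
        words = set(doc.split())
        for c in clusters:
            if len(words & c["words"]) >= min_shared:
                c["docs"].append(doc)   # enough overlap: join this cluster
                c["words"] |= words
                break
        else:
            clusters.append({"docs": [doc], "words": set(words)})
    return [c["docs"] for c in clusters]

docs = [
    "olympic athletes win gold medals",
    "russian athletes win olympic gold",
    "startup raises funding round",
    "french startup closes funding round",
]
print(cluster(docs))  # two groups: an Olympics theme and a startup theme
```

No one told the algorithm there were two themes; it discovered that from word overlap alone, which is the core idea behind topic modeling.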