Archive | January, 2015

The rise and fall of a dream

20 Jan


Martin Luther King, Jr. Day was yesterday, but even here in the U.S.A. not many people know his speeches beyond a few famous quotes.

We live in a world of words. Words that represent us, words that motivate us, words that are themselves actions (I now pronounce thee…), and MLK was a master of expression in every one of these. In this post we use computational methods to analyze the major speeches of Martin Luther King, Jr., showing how we can find insights and trends that organize his great body of work.

For our data, we looked at 22 major speeches and sermons that are anthologized as The Landmark Speeches of Martin Luther King, Jr. and The Great Sermons of Reverend Martin Luther King, Jr.

Imagine a spreadsheet with well-behaved rows and columns: that’s structured data. Language data isn’t so orderly. Especially in speeches, themes rise and disappear only to re-emerge later. There are patterns to this, but they are not as clean as what comes out of, say, a heart monitor.

Part of the structure that is present in these speeches is marked by audience responses, by where Dr. King paused, and by where he placed paragraph breaks when he wrote them. So we’ll treat the speeches as if they are made up of lots of smaller segments. We end up with 988 segments for the 22 speeches. Note also that we’re going to leave in audience responses as they are transcribed, as well as various Bible quotes and hymns. We treat these as an important part of what is going on, not noise.
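To make the segmentation idea concrete, here is a minimal sketch in Python. It assumes a plain-text transcript where paragraphs are separated by blank lines and audience responses appear in parentheses or square brackets. The function name split_segments and the splitting rules are my own illustration, not the actual pipeline behind the 988 segments. Responses stay attached to the segment they close, since we treat them as signal rather than noise.

```python
import re

# Assumption: audience responses are transcribed as "(Yes)" or "[applause]".
# Any other parenthetical would also be treated as a response by this sketch.
RESPONSE = re.compile(r"(\([^)]*\)|\[[^\]]*\])")

def split_segments(transcript: str) -> list[str]:
    """Split on blank lines, then close a segment after each audience response."""
    segments = []
    for paragraph in re.split(r"\n\s*\n", transcript):
        current = ""
        # The capturing group in RESPONSE makes split() keep the responses.
        for part in RESPONSE.split(paragraph):
            current += part
            if RESPONSE.fullmatch(part):
                segments.append(current.strip())
                current = ""
        if current.strip():
            segments.append(current.strip())
    return segments

sample = (
    "We will stick together. [applause] We will work together. [applause]\n\n"
    "Let us think of these things. (Yes)"
)
for segment in split_segments(sample):
    print(segment)
```

A real transcript would need a more careful rule for parentheticals that are not responses, such as a date or a citation.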

We used Idibon’s trend and topic discovery features to analyze the data. We can set the number of trends and topics that we’d like to uncover to any level of granularity. Here, we kept it simple and looked for the 22 most emergent trends over the 22 speeches, allowing multiple topics per speech, and even multiple topics per segment. First a giant infographic, then some more details and some important quotes.

Topic and theme detection help us find major themes in Dr. Martin Luther King Jr.'s speeches

While the words do not give the whole picture of a topic, we can find some interesting topics by looking at the words most closely associated with each one. For example, here are some of the keywords associated with Topic 1. Importantly, these aren’t just the most frequent words in this topic; they are the most frequent and distinctive: freedom, applause, day, long, men, justice, yeah, dream, history, god, stand, children, people, struggle, violence, dignity, white, death, nation, faith.
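One way to picture “frequent and distinctive” is a score that multiplies how often a word occurs in a topic by how much more likely it is there than elsewhere. The sketch below is an assumption for illustration only: the function distinctive_words, the add-one smoothing, and the toy tokens are all mine, and this is not necessarily the scoring used to produce the keywords above.

```python
import math
from collections import Counter

def distinctive_words(topic_tokens, other_tokens, top_n=5):
    """Rank words by count-in-topic times log-ratio against the other text."""
    topic, other = Counter(topic_tokens), Counter(other_tokens)
    vocab_size = len(set(topic) | set(other))
    n_topic, n_other = len(topic_tokens), len(other_tokens)
    scores = {}
    for word, count in topic.items():
        # Add-one smoothing so words unseen elsewhere don't divide by zero.
        p_topic = (count + 1) / (n_topic + vocab_size)
        p_other = (other[word] + 1) / (n_other + vocab_size)
        scores[word] = count * math.log(p_topic / p_other)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy data, made up for the example.
topic_tokens = "freedom freedom freedom dream dream the the the the".split()
other_tokens = "the the the the the lord lord amen".split()
print(distinctive_words(topic_tokens, other_tokens, top_n=3))
```

In the toy data, “the” is the most frequent word in the topic but ranks last, because it is just as common everywhere else.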

By contrast, Topic 2 is best described by these words: god, life, sir, i’m, don’t, yeah, make, morning, things, lord, preach, heart, that’s, time, called, amen, good, people, live, great. As you can see, there are three words that these two topics have in common: god, yeah, and people. In topic modeling, each word and each document can be in multiple topics. That’s really closer to how things work in the real world: it’s very rare for something to be only one thing. The world–and our language–are multifaceted and intersectional.
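The overlap is easy to check for yourself. This small sketch takes the two keyword lists as given in this post (reading “long men” and “stand children” as separate words, which is my assumption) and intersects them as sets.

```python
topic_1 = [
    "freedom", "applause", "day", "long", "men", "justice", "yeah", "dream",
    "history", "god", "stand", "children", "people", "struggle", "violence",
    "dignity", "white", "death", "nation", "faith",
]
topic_2 = [
    "god", "life", "sir", "i'm", "don't", "yeah", "make", "morning", "things",
    "lord", "preach", "heart", "that's", "time", "called", "amen", "good",
    "people", "live", "great",
]

# Words that appear among the top keywords of both topics.
shared = sorted(set(topic_1) & set(topic_2))
print(shared)  # ['god', 'people', 'yeah']
```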

The reason I call Topic 1 the “I Have a Dream topic” is that 56% of I Have a Dream paragraphs actually get classified as being part of Topic 1.
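A figure like that is simply the share of a speech’s segments whose assigned topics include the topic in question. Here is a minimal sketch, assuming each segment carries a set of topic labels (a segment can have several, or none). The function topic_share and the toy labels are hypothetical, not the actual classifications.

```python
def topic_share(segment_topics, topic):
    """Fraction of segments whose topic set includes `topic`."""
    if not segment_topics:
        return 0.0
    hits = sum(1 for topics in segment_topics if topic in topics)
    return hits / len(segment_topics)

# Toy labels for five segments of one speech, made up for the example.
speech = [{1}, {1, 2}, {2}, {1}, set()]
print(f"{topic_share(speech, 1):.0%} of segments are in Topic 1")
```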

Topic 1 also describes a lot of the speech from The Great March on Detroit–which makes sense since that was an earlier version of the I Have a Dream speech. And Topic 1 describes large sections of the speech where he accepts the Nobel Peace Prize, Eulogy for the Martyred Children, and Dr. King’s early speech at the Holt Street Baptist Church. I want to pause for a minute and reflect on the fact that there had to be speeches called Eulogy for the Martyred Children and Give Us the Ballot. One of the questions we’ll be pursuing in an upcoming blog post is how these themes have–and haven’t–changed in #BlackLivesMatter and #ICantBreathe. But back to Dr. King.

Topic modeling identifies the parts of speeches that are most emblematic of each topic/theme. I’ll close with examples of Topic 1, the “I Have a Dream topic”. I’ve bolded the words that are the most characteristic of Topic 1 that also appear in these paragraphs.

With this faith, we will be able to hew out of the mountain of despair a stone of hope. With this faith, we will be able to transform the jangling discords of our nation into a beautiful symphony of brotherhood. With this faith, we will be able to work together, to pray together, to struggle together, to go to jail together, to stand up for freedom together, knowing that we will be free one day. (I Have a Dream, August 28, 1963)

I have a dream this afternoon (I have a dream) that there will be a day that we will no longer face the atrocities that Emmett Till had to face or Medgar Evers had to face, that all men can live with dignity. (Speech at the Great March on Detroit, June 23, 1963)

And so I stand here to say this afternoon to all assembled here, that in spite of the darkness of this hour (Yeah Well), we must not despair. (Yeah, Well) We must not become bitter (Yeah, That’s right), nor must we harbor the desire to retaliate with violence. No, we must not lose faith in our white brothers. (Yeah, Yes) Somehow we must believe that the most misguided among them can learn to respect the dignity and the worth of all human personality. (Eulogy for the Martyred Children, September 18, 1963)

And as we stand and sit here this evening and as we prepare ourselves for what lies ahead, let us go out with the grim and bold determination that we are going to stick together. [applause] We are going to work together. [applause] Right here in Montgomery, when the history books are written in the future (Yes), somebody will have to say, “There lived a race of people (Well), a black people (Yes sir), ‘fleecy locks and black complexion’ (Yes), a people who had the moral courage to stand up for their rights. [applause] And thereby they injected a new meaning into the veins of history and of civilization.” And we’re going to do that. God grant that we will do it before it is too late. (Oh yeah) As we proceed with our program, let us think of these things. (Yes) [applause] (MIA Mass Meeting at Holt Street Baptist Church, December 5, 1955)

Reading these examples, you see that one of Dr. King’s themes is change and transformation, and these are a set of ideas and words that describe one version of that powerful and important theme in his speeches and sermons. The question that the infographic above asks is: what happens to this theme after 1964? What about it is complete, abandoned, and replaced? How is transformation transformed?

– Tyler Schnoebelen (@TSchnoebelen)


Quantifying the Word of the Year

9 Jan


The oldest running “Word of the Year” is the American Dialect Society’s. And it’s getting chosen tonight (Friday, Jan 9th, 5:30pm PST). Folks will be live tweeting it so follow the #woty14 hashtag for the play-by-play.

There are a few ways to determine a Word of the Year. One is that you can go for a big theme from the news–this is what Dictionary.com did in choosing exposure.

Another way to do it is to look at words that have increased a lot in the last year. For Merriam-Webster, this was online searches for culture. For the Oxford Dictionaries, it was vape.

But there are other quantitative methods for finding words-on-the-rise. Here are two of my favorites.

Jack Grieve proposes that an “emerging word” is one that starts out rare in some time period and quickly rises in relative frequency. (His blog articles are very readable; you might also check out his presentation from yesterday.)

From Jack Grieve

You definitely will want to go check out his work, if only to update yourself on how fuckboy, m, hbd, fw, ft, gmfu, sm, squad, asf are getting used.
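To get a feel for the “starts rare, rises quickly” idea, here is a deliberately simple sketch. It is not Jack Grieve’s actual method: the function emerging_words, the two thresholds, and the counts are all my own assumptions for illustration. A word qualifies if its relative frequency in the first period is at or below a rarity cutoff and its relative frequency in the last period is at least a chosen multiple of that cutoff.

```python
def emerging_words(counts, totals, rare_cutoff=2e-6, growth=10.0):
    """Return words that start rare and end well above the rarity cutoff.

    counts: {word: [occurrences in each period, oldest first]}
    totals: [total tokens in each period]
    """
    hits = []
    for word, series in counts.items():
        rel = [c / t for c, t in zip(series, totals)]  # relative frequencies
        starts_rare = rel[0] <= rare_cutoff
        ends_common = rel[-1] >= growth * rare_cutoff
        if starts_rare and ends_common:
            hits.append((rel[-1], word))
    # Most frequent in the final period first.
    return [word for _, word in sorted(hits, reverse=True)]

# Made-up counts per million tokens, for illustration only.
totals = [1_000_000, 1_000_000]
counts = {"bae": [1, 50], "the": [50_000, 50_100], "vape": [0, 5]}
print(emerging_words(counts, totals))
```

Working with relative frequency rather than raw counts matters because the total volume of text usually changes from one period to the next.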

Another approach is to collect all the times people explicitly say they love or hate a new word. That’s what @hugovk did with various bots collecting sentences on Twitter all year; check out the work here.

The most discussed words from this standpoint are bae, thot, and no.
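A rough sketch of that kind of collection might look like the following. The exact phrasing the bots searched for is an assumption on my part (here, “love the word X” and “hate the word X”), and the function discussed_words and the example sentences are made up.

```python
import re
from collections import Counter

# Assumed phrasing: "love/hate the word X", with X optionally in quotes.
PATTERN = re.compile(
    r"\b(?:love|hate)\s+the\s+word\s+[\"'“‘]?([\w-]+)", re.IGNORECASE
)

def discussed_words(texts):
    """Count how often each word is the target of 'love/hate the word ...'."""
    counts = Counter()
    for text in texts:
        for word in PATTERN.findall(text):
            counts[word.lower()] += 1
    return counts

# Invented example sentences, not real tweets.
tweets = [
    "I hate the word bae so much",
    "honestly I LOVE the word 'bae'",
    'I love the word "thot"',
    "no opinion here",
]
print(discussed_words(tweets).most_common())
```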

What’s a word?

Dirty secret: it’s actually hard for linguists to define what a “word” is. You’re happy that I will is two words in English, but how many words is I’ll? What about ill in text speech? What about Chris Brown’s–what do we do with that possessive s?

Last year, the word of the year was Because X–a really great innovation where you can say something like because reasons. (Here’s an analysis of how because x is getting used.)

Gretchen McCulloch proposes that really the word of the year should be an emoticon or an emoji. That’s certainly near and dear to my heart from our work on emoji recently (here and here) and the fact that a big part of my dissertation was on emoticons.

If you look at what’s popular on Urban Dictionary searches across the last half of 2014 compared to the last half of 2013, you see a number of things come up that may also be worth a Jack Grieve-style analysis, things like basic bitch, bye Felicia, facebook mommy, friends with possibilities, high Q, will advise, and maybe my favorite, next level bullshit.

Finally, take a look at the OTHER categories in the Word of the Year competition, like Most Useful (vote for even as in I can’t even!) and Most Creative (like columbusing).

– Tyler Schnoebelen (@TSchnoebelen)