Word clouds are pretty. Here’s what one looks like across presidential and vice-presidential debates from the first Kennedy-Nixon to the third Obama-Romney.
Frequency is kind of like the old grey mare of corpus linguistics–you don’t want to ride it too hard. Let’s try trotting out something just one notch more sophisticated. In this post, I’m going to try to answer Joshua Benton’s Twitter question from last night: how have strength and weakness been used in presidential debates?
In terms of absolute frequency, George W. Bush uses words related to strength more than any other presidential or vice-presidential candidate–here I’m combining strong/stronger/strongest/strength/strengthen/strengthened. GW has 76 uses. The next closest is Jimmy Carter with 69 uses.
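If you want to try this kind of count yourself, here is a minimal sketch in Python. It assumes the transcripts have already been split into (speaker, text) pairs; that layout, the function name, and the toy sentences are mine for illustration, not the actual corpus. It only matches the six forms listed above.

```python
import re
from collections import Counter

# The word family named in the post; extend the set if you want more forms.
STRENGTH_FORMS = {
    "strong", "stronger", "strongest",
    "strength", "strengthen", "strengthened",
}

def count_family(turns, forms=STRENGTH_FORMS):
    """Count tokens belonging to `forms`, per speaker.

    `turns` is an iterable of (speaker, text) pairs. This layout is an
    assumption about how the transcripts were prepared.
    """
    counts = Counter()
    for speaker, text in turns:
        # Crude tokenizer: lowercase, keep runs of letters and apostrophes.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts[speaker] += sum(1 for tok in tokens if tok in forms)
    return counts

# Made-up toy turns, NOT real debate text.
toy_turns = [
    ("A", "We need a strong economy and a stronger military."),
    ("B", "Strength matters, but so does restraint."),
    ("A", "I will strengthen our alliances."),
]
family_counts = count_family(toy_turns)
print(family_counts.most_common())
```

A real run would need the same care with speaker labels and transcription quirks that any hand-built corpus needs.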
A different way to look at the data is to say: let’s add up all of GW’s words and see if that 76 is a lot compared to everyone else. I couldn’t find a real corpus of debate transcripts, so I made one myself using raw data from here and here. I cleaned it up as much as I could overnight.
When we look at word counts for all the candidates, moderators, questioners, etc., we see that GW has 7.07% of all the words spoken. There are 752 uses of strong/strength/etc. So if everything were random, we’d expect him to use these words 7.07% × 752 ≈ 53 times. So he *is* using them a lot (about 1.4 times what we’d expect). But there are folks who use them even more. The biggest users (by “observed/expected”) are:
- Mondale: 61 uses (14 expected)
- Kennedy: 62 uses (24 expected)
- Dukakis: 43 uses (19 expected)
- Carter: 69 uses (32 expected)
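The “observed/expected” arithmetic is simple enough to check by hand, but here it is as a short Python sketch. The function names are mine. The inputs are the figures quoted above, and since the expected counts in the list are rounded, the ratios computed from them are approximate.

```python
def expected_count(share_of_words, total_uses):
    """Uses we'd expect if the word family were spread evenly over all words."""
    return share_of_words * total_uses

def observed_over_expected(observed, share_of_words, total_uses):
    """Ratio above 1 means heavier use than chance; below 1 means lighter."""
    return observed / expected_count(share_of_words, total_uses)

# George W. Bush: 7.07% of all words, 752 total uses, 76 observed.
gw_expected = expected_count(0.0707, 752)
gw_ratio = observed_over_expected(76, 0.0707, 752)
print(f"GW: expected about {gw_expected:.0f}, ratio about {gw_ratio:.2f}")

# The four biggest users, as (observed, rounded expected) from the list above.
listed = {
    "Mondale": (61, 14),
    "Kennedy": (62, 24),
    "Dukakis": (43, 19),
    "Carter": (69, 32),
}
ratios = {name: obs / exp for name, (obs, exp) in listed.items()}
ranking = sorted(ratios, key=ratios.get, reverse=True)
for name in ranking:
    print(f"{name}: about {ratios[name]:.1f}x expected")
```

Sorting by the ratio rather than the raw count is what pushes Mondale to the top even though Carter has more uses in absolute terms.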
The big avoiders–who use fewer than half what we’d expect given their overall word count–are (in order of most constrained to least constrained):
- Lehrer (he’s been part of 12 debates, so he’s got a lot of words–I’m only including people who have at least 1% of all the words in the corpus)
You might be wondering about the role of weak/weakness/etc. Well, there are far fewer uses of any of these words (only 75 total for everyone). For what it’s worth, these are the main users: Kemp, Ryan, Carter, Romney, Obama, and Perot. But I’d be careful with this since the counts are so low for all of them (Carter has the most, with 14 tokens–Ryan only uses it 3 times…so do you really want to read that much into his use?).
What about over time? Using the same kind of “observed/expected” logic but for years rather than speakers, we find that the big boom years of strength/etc were 1960, 1976, and 1984. The current year has about 60% of what we’d expect if everything were distributed at random (i.e., “observed/expected”). This has been a pretty big year for weak/etc, though (18 uses in reality instead of the 9 we’d expect–but again, much smaller counts).
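The same logic works for any grouping, whether speakers or years: take each group’s share of all words, multiply by the total number of hits, and compare with what the group actually produced. Here is a hedged sketch; the function name and the toy numbers are made up for illustration and are not taken from the debate corpus.

```python
def oe_by_group(rows):
    """Return {group: observed / expected} for each group.

    `rows` is an iterable of (group, total_words, hits). The expected count
    for a group is its share of all words times the total number of hits.
    Assumes every group has at least one word and there is at least one hit.
    """
    rows = list(rows)
    all_words = sum(words for _, words, _ in rows)
    all_hits = sum(hits for _, _, hits in rows)
    result = {}
    for group, words, hits in rows:
        expected = all_hits * words / all_words
        result[group] = hits / expected
    return result

# Made-up toy counts: (label, total words, hits for the word family).
toy_years = [
    ("year_a", 1000, 30),
    ("year_b", 1000, 10),
    ("year_c", 2000, 20),
]
oe = oe_by_group(toy_years)
for label, ratio in oe.items():
    print(f"{label}: {ratio:.2f}")
```

On this scale, the weak/etc figure above (18 observed against 9 expected) comes out at 2.0, and the strength/etc figure for the current year at about 0.6.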
Now for a few other odds and ends. First, word counts:
- Obama has been in six debates. Mostly he has the same number of words as his rival.
- He had about 1.9% more words than McCain across their three debates.
- He had a 2.6% deficit relative to Romney.
- The first Obama-Romney debate came out at 7005 words (Obama) to 7742 (Romney). Actually, Obama had–proportionately–fewer words in the third debate: 7493 vs. Romney’s 8553. (So I wouldn’t jump to any “fewer words correspond to lousy performance” conclusions.)
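To see the “proportionately fewer” point, divide Obama’s word count by Romney’s in each debate. A quick sketch using the four numbers quoted above (the helper name is mine):

```python
def word_ratio(speaker_words, rival_words):
    """Speaker's word count as a fraction of the rival's."""
    return speaker_words / rival_words

first_debate = word_ratio(7005, 7742)   # Obama vs. Romney, first debate
third_debate = word_ratio(7493, 8553)   # Obama vs. Romney, third debate

print(f"first debate: {first_debate:.3f}")
print(f"third debate: {third_debate:.3f}")
```

The ratio is lower in the third debate than in the first, which is the comparison the bullet above is making.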
On particular other words:
- As you might have guessed, McCain and Biden loved talking about friend/friends/etc. So did Kemp, Palin, Lieberman, Romney, and George W.
- The big users of leader/leadership/etc have been Dukakis, Ferraro, Mondale, Bentsen, Palin.
- I was fascinated by how much Bill Clinton changed in his DNC speech this year (the Atlantic has a great visualization), so I expected him to be a big user of “now,”. But he’s not the main one: Perot, Obama, Edwards, Kerry, Bentsen, and Gore are.
- Everybody loves freedom, but especially Kennedy, Nixon, Palin, Cheney, George W., and Ryan. Liberty gets much less love (its proponents are George W., Lieberman, Ryan, Mondale, Clinton, and Dole).
- In the debates, Romney has painted a dire picture, yet insisted he is an optimist. Optimist/optimistic/etc are used most by Bush Sr., Bush Jr. (a family value?!), Kemp, Dukakis, Reagan, and Romney.
- I wanted to give you the great uh’ers, but it looks like the transcribers weren’t consistent in transcribing it. But if you’re curious, Ford has 358 uh’s and Carter has 507 (Carter does have more words overall, so just comparing them, you’d say that Ford was more uh-prone).
So there you go. Properly, I should be reading all of the examples and giving some interpretations. I’m going to leave that to you all. What do you see in digging through the data (or your memory banks)?