How are you carving up the world and gathering it back together? Or more specifically, when’s the last time that you said both x and y—and what were your x and y? Our dividing and lumping lines tell us a lot about how we see a situation.
A construction like both x and y is also handy for computers. Like you, computers are constantly encountering words and phrases they’ve never encountered before. But they are a lot more clunky than you are when it comes to filling in the gaps. It’s not easy to learn which parts of context to use to build up understanding. In terms of specific natural language processing applications, the both x and y construction helps do Named Entity Recognition (NER) for infrequent/new items. And it is a good testing ground for sentiment analysis: how do you best describe text that expresses both positive and negative sentiment?
With both x and y, x and y are generally the same type of thing. In other words, if you only know x, odds are that y is, at the very least, the same part of speech. For example, consider Twitter’s most popular both x and y’s for American English tweeps:
- both good and bad
- both men and women
- both Twitter and Facebook
- both love and hate
- both on and off
- both you and I
- both males and females
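If you want to pull pairs like these out of your own pile of text, here is a minimal sketch. It assumes single-word x and y, and the pattern, the `count_pairs` helper, and the sample sentences are all made up for illustration; this is not the pipeline behind the list above.

```python
import re
from collections import Counter

# Matches "both X and Y" where X and Y are single tokens.
# The optional "@" lets Twitter handles through; "'" keeps contractions whole.
BOTH_X_AND_Y = re.compile(r"\bboth\s+(@?[\w']+)\s+and\s+(@?[\w']+)", re.IGNORECASE)

def count_pairs(texts):
    """Count lowercased (x, y) pairs across an iterable of strings."""
    counts = Counter()
    for text in texts:
        for x, y in BOTH_X_AND_Y.findall(text):
            counts[(x.lower(), y.lower())] += 1
    return counts

# Invented sample sentences, not real tweets.
sample = [
    "I both love and hate Mondays",
    "Both men and women replied",
    "both LOVE and HATE this song",
]
pair_counts = count_pairs(sample)
top_pairs = pair_counts.most_common(2)
```

Because the pattern only grabs one token on each side, a multi-word y gets cut off after its first word, so treat the counts as a rough first pass.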
Notice the diversity here—we’ve got adjectives, named entities, common nouns, prepositions, pronouns. If you go enter these in Twitter’s search box, you can also see that in general the construction is taking two things that are traditionally seen as separate (if not oppositional) and saying something is true for not just one of these categories but the other one, too. In the case of both good and bad, most of the examples have to do with acceptance of everything in one’s life and/or the fact that there are a lot of experiences that are ambiguous. This kind of sentiment is common in uses of both love and hate—that is, that there are things that contain extremes and the tweep is ambivalent, which is not to say wishy-washy (etymologically, ambivalent is ‘in two ways’ + ‘strong’). [Twitter search: “both love and hate” | Twitter search: “both good and bad”]
We can confidently predict the part of speech for 2,237 of the both x and y examples. For these, 64.0% of the data has x and y as the same part of speech (provided we treat pronouns like I/me/you as belonging to the same class as @twittername and proper nouns). This number would go up if we allowed x and/or y to be more than single words (strictly speaking, both China and the would be a proper noun and a determiner, but obviously this quadrigram is usually a fragment of both China and the United States, which is really “both [noun phrase] and [noun phrase]”).
Twitter is famous for its 140-character limit. What happens when you look at both x and y across books? In the Google Books data for works in American English published 1900-2000, the top pairs are:
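To make that calculation concrete, here is a rough sketch of how a same-part-of-speech share could be computed. Everything in it is an assumption for illustration: the tiny `TOY_TAGS` lexicon stands in for a real part-of-speech tagger, the tag names are arbitrary, and the example pairs are invented rather than drawn from the Twitter data.

```python
# Hypothetical toy lexicon; a real analysis would use a POS tagger.
TOY_TAGS = {
    "good": "ADJ", "bad": "ADJ",
    "men": "NOUN", "women": "NOUN",
    "on": "ADP", "off": "ADP",
    "you": "PRON", "i": "PRON", "me": "PRON",
    "twitter": "PROPN", "facebook": "PROPN", "china": "PROPN",
    "the": "DET",
}

# Pronouns, proper nouns, and @handles are lumped into one class.
NAME_LIKE = {"PRON", "PROPN"}

def word_class(word):
    """Return a coarse class for a word, or None if we cannot tag it."""
    if word.startswith("@"):
        return "NAME"
    tag = TOY_TAGS.get(word.lower())
    return "NAME" if tag in NAME_LIKE else tag

def same_class_share(pairs):
    """Share of pairs with matching classes, among pairs where both are tagged."""
    tagged = [(word_class(x), word_class(y)) for x, y in pairs]
    known = [(a, b) for a, b in tagged if a is not None and b is not None]
    if not known:
        return None
    return sum(a == b for a, b in known) / len(known)

# Invented pairs: two matches, one mismatch, one with an untaggable word.
pairs = [("good", "bad"), ("you", "@TSchnoebelen"), ("China", "the"), ("men", "zzz")]
share = same_class_share(pairs)
```

Pairs with a word we cannot tag are dropped from the denominator, which mirrors restricting the count to examples where the part of speech can be predicted confidently.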
- both men and women
- both before and after
- both male and female
- both positive and negative
- both public and private
- both boys and girls
- both internal and external
There’s a lot of gender stuff going on. Next week, we’ll look at these more seriously, considering how various pairs have changed over the last one hundred years. So think of the last bullet list and the next graph as a teaser.
Note: there is a general principle of “bulky things go at the end”, so if you have a one-syllable word and a two-syllable word, you tend to put the one-syllable word first. More on this and how it affects couple namings, too. (For example, David and Juliet > Juliet and David, Chris and Colin > Colin and Chris, etc.) Let me know if I’ve both under and over teased you.
– Tyler Schnoebelen (@TSchnoebelen)