Cristian Danescu-Niculescu-Mizil, Timothy Hawes, and colleagues have released some more corpora that are worth playing with.
- The Wikipedia Talk Page Conversations Corpus: 125,000 conversations involving about 30,000 editors. Metadata such as editor’s status, time of status change and gender is included.
- Supreme Court Dialogs Corpus: oral arguments making up 51,498 utterances (50,389 conversational exchanges); 204 cases with 11 justices and 311 other participants (lawyers, for example). You get case outcome, justice vote, gender annotation, etc.
These corpora support really interesting work about how accommodation and power go together.
One of my interests is how the word little gets used in power relationships (see also my dissertation). I took a quick look through the Supreme Court corpus. (Very, very quick, so I draw no conclusions here; I just report some numbers.)
Justices Breyer and Ginsburg both talk a lot during oral arguments (their speech represents 15.51% and 11.37%, respectively, of all the justices’ speech in the corpus). There are 232 turns involving the word “little”. That’s not quite as many as I’d like for making really strong claims. But these two justices use the word at very different rates: Breyer uses it 71 times, nearly twice as often as we’d expect if little were distributed across the justices based on their total number of turns. Ginsburg, on the other hand, uses it only 10 times, which is 38% of what we might have expected.
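For anyone who wants to check the arithmetic, here is a rough sketch of the observed-versus-expected comparison. It assumes that the expected count for a justice is simply their share of the justices' speech multiplied by the 232 "little" turns; that is my reading of the numbers above, not a full description of how the counts were produced. The function and variable names are made up for illustration.

```python
# Figures quoted in the post; the way they are combined is an assumption.
TOTAL_LITTLE_TURNS = 232

justices = {
    "Breyer": {"share": 0.1551, "observed": 71},
    "Ginsburg": {"share": 0.1137, "observed": 10},
}


def observed_over_expected(share, observed, total):
    """Ratio of observed uses to the uses expected from the speech share."""
    expected = share * total
    return observed / expected


ratios = {
    name: observed_over_expected(d["share"], d["observed"], TOTAL_LITTLE_TURNS)
    for name, d in justices.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} times the expected count")
# Breyer: 1.97 times the expected count
# Ginsburg: 0.38 times the expected count
```

A ratio near 1 would mean a justice uses the word about as often as their share of the talk predicts.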
I’m not going to offer any analysis here, but I do want to give some examples of the work that little does: it is sometimes dismissive, sometimes hedging (it seems especially likely to hedge states that the justices are claiming for themselves). Here are a few examples from Breyer:
- “they have a little paragraph of explanation”
- “now, that’s a little tough”
- “you put a little thing in the corner”
- “I’m a little nervous about it”
Other justices talk about being a little puzzled, confused, etc. But notice that Ginsburg never pairs little with any kind of mental state. The very closest she comes is something pretty far off (“I” is not in the utterance): “It’s — given that we’re dealing with sophisticated judges, the same panel in both episodes, it’s a little hard to — to see where the due process violation is.”
Btw, the string “I mean,” is used 2,020 times. Justices and non-justices speak roughly the same amount of time, but the justices are the ones using “I mean,” much more (in terms of utterances that contain the phrase, justices use it 1.4 times as often as we’d expect them to, and 2.6 times as often as the non-justices who talk). Wanna know who is the biggest “I mean,”’er? Have a guess?
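If you want to try this kind of tally yourself, here is a minimal sketch of one way to do it. It assumes you have already pulled the corpus into (group, text) pairs; the function name, the group labels, and the example rows are all hypothetical, not the corpus's actual format or contents. It also takes each group's share of utterances as the baseline for "expected", which is only one possible choice (share of speaking time would be another).

```python
from collections import Counter


def phrase_rate_ratio(utterances, phrase="I mean,"):
    """Return observed/expected use of `phrase` for each speaker group.

    `utterances` is an iterable of (group, text) pairs. The expected share
    for a group is its share of all utterances (an assumed baseline).
    """
    totals = Counter()  # utterances per group
    hits = Counter()    # utterances per group that contain the phrase
    for group, text in utterances:
        totals[group] += 1
        if phrase in text:
            hits[group] += 1

    all_total = sum(totals.values())
    all_hits = sum(hits.values())
    if all_hits == 0:
        return {}

    return {
        group: (hits[group] / all_hits) / (totals[group] / all_total)
        for group in totals
    }


# Invented rows, purely to show the shape of the input.
toy_rows = [
    ("justice", "I mean, that's the question."),
    ("justice", "I mean, is it?"),
    ("justice", "Go on."),
    ("other", "I mean, yes."),
    ("other", "Yes, Your Honor."),
    ("other", "That is correct."),
]

toy_ratios = phrase_rate_ratio(toy_rows)
for group, ratio in toy_ratios.items():
    print(f"{group}: {ratio:.2f} times the expected share")
```

Dividing one group's ratio by the other's gives the "times as often as the other group" figure.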