People have different styles of communicating, and we interpret these styles situationally (“he’s upset/flirting/pretending to be objective”), broadly (“he’s rich/poor/straight/gay/born in Detroit”), and/or as identity markers (“it’s cuz he’s a total bro”).
It occurred to me that a great place to look at style is in the Canterbury Tales because Chaucer is doing so much styling of his characters–making not just their stories but their poetics different from one another. This post gives me a chance to link you to some Middle English corpora, talk a bit about part-of-speech (POS) tagging, and to tell you which pilgrim in the Canterbury Tales is the most like Sarah Palin (okay, yes, in one particular dimension).
Some background
A few weeks ago, Eric Acton gave a presentation at NWAV about work he and Chris Potts had done on affective demonstratives and Sarah Palin. For example:
And Secretary Rice, having recently met with leaders on one side or the other there, also, still in these waning days of the Bush administration, trying to forge that peace, and that needs to be done, and that will be top of an agenda item, also, under a McCain-Palin administration.
If you look at Palin’s speech, she really has this/that/these/those all over the place. Everyone can and does use demonstratives to position not just physical objects (“hand me ‘that’ cup over there”, “‘this’ is my son”) but to take up stances towards things that draw them closer or push them farther away from the speaker and/or their audience. But some people, like Sarah Palin, use this device A LOT. We could say that it’s part of her style–part of what makes her, her.
There’s lots of interesting stuff to be said about these affective demonstratives and I recommend that you check out:
- Lakoff, Robin. 1974. Remarks on this and that. In CLS 10, 345-356. Chicago: Chicago Linguistic Society. (Okay the first thing I list I haven’t actually been able to find…Chicago and/or Berkeley linguists, help!)
- Mark Liberman’s various posts on the Language Log: here and here (and maybe here; Chris Potts also has a follow-up on the LL here).
- Davis, Christopher and Christopher Potts. 2010. Affective demonstratives and the division of pragmatic labor. In Maria Aloni, Harald Bastiaanse, Tikitu de Jager, and Katrin Schulz, eds., Logic, Language, and Meaning: 17th Amsterdam Colloquium Revised Selected Papers, 42-52. Berlin: Springer.
- Potts, Christopher and Florian Schwarz. 2010. Affective ‘this’. Linguistic Issues in Language Technology 3(5):1-30.
- And we should hound Chris and Eric to post their NWAV talk, too.
- If you want to know what sociolinguists are doing with style, btw, I have some resources on my website.
Corpora for Middle English
I have a secret plan to write up something about Old English corpora and then about Shakespeare and beyond, but for now, let’s stick to the late 10th century to the early 15th century corpora.
Part of speech tagging
If we want to investigate “this”, we’re in pretty good shape just using good old grepping (or Ctrl-F).
But “that” is tricky, because our hypothesis doesn’t really hold that the complementizer “that” is doing affective stuff (complementizer = the “that” in “I pity the fool that falls in love with you”).
For questions like this, you can use any of a number of POS taggers that are free and fairly easy to use. My tagger of choice is the Stanford POS Tagger. A few quick notes about it:
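To make the “grepping” point concrete, here is a minimal sketch of a whole-word count in Python. The function name `count_words` and the sample text are just illustrations built from the examples in this post, and real Middle English spelling variation would need more care than this. Notice that the count for “that” lumps the demonstrative and the complementizer together, which is exactly the problem.

```python
import re
from collections import Counter

def count_words(text, words):
    """Count whole-word, case-insensitive occurrences of each word in `words`."""
    # Lowercase everything and keep only runs of letters as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return {w: counts[w.lower()] for w in words}

# Sample sentences borrowed from the examples in this post.
sample = (
    "This is my son. Hand me that cup over there. "
    "I pity the fool that falls in love with you."
)

print(count_words(sample, ["this", "that"]))  # {'this': 1, 'that': 2}
```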
- The highest accuracy comes with using “bidirectional-distsim-wsj-0-18.tagger” but it’s really really slow. Don’t use it. The “left3words-wsj-0-18.tagger” is almost as accurate and A LOT faster. I’m not kidding about this.
- Once you’ve installed it, the basic command to get it going is as follows. Note that the “-mx300m” is about memory, so you may need to up the “300” part.
{call java at the start: use the full path to your java if it isn’t on your path} -mx300m -classpath {directory stuff}/postagger/pos_tagger_dir/stanford-postagger.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model {directory stuff}/postagger/pos_tagger_dir/models/left3words-wsj-0-18.tagger -textFile {input file to tag} > {output file name}
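Once you have tagged output, separating the different kinds of “that” is mostly a matter of tallying tags. Here is a rough sketch, assuming the output is whitespace-separated tokens of the form word_TAG (the separator may differ across tagger versions, so it is a parameter here) and that the tags are Penn Treebank tags, where a demonstrative “that” is typically DT, a complementizer IN, and a relativizer WDT. The function name and the sample string are hypothetical; the sample is hand-written, not real tagger output.

```python
from collections import Counter

def count_that_by_tag(tagged_text, separator="_"):
    """Tally the POS tags assigned to the word 'that' in tagged text."""
    tags = Counter()
    for item in tagged_text.split():
        # Split on the LAST separator so words containing it still parse.
        word, sep, tag = item.rpartition(separator)
        if sep and word.lower() == "that":
            tags[tag] += 1
    return tags

# Hand-written illustration in word_TAG form (an assumption, not tagger output).
sample = (
    "Hand_VB me_PRP that_DT cup_NN ._. "
    "I_PRP pity_VBP the_DT fool_NN that_WDT falls_VBZ ._."
)

print(dict(count_that_by_tag(sample)))
```

With counts like these, you would keep the DT tokens as candidate demonstratives and set the rest aside.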
But of course tagging Middle English poetry using a tagger that was trained on 20th century articles from the Wall Street Journal is not such a great idea. Actually, I was surprised at how well it did, but nevertheless: uff.
So yaaaaay, WordHoard to the rescue.
WordHoard corpora + tool
Now, you’ll have to download WordHoard and figure out how to use their interface, but I found it pretty easy to do stuff like find collocates, multiword expressions, compare texts (Wife of Bath vs. Knight; Hamlet vs. Twelfth Night, etc.). It also has lexicons for the various corpora it supports.
Those corpora are not just English–you also have Ancient Greek. From the website:
- Early Greek Epic. This corpus includes Homer, Hesiod, and the Homeric Hymns in the original Greek, with English and/or German translations for all texts but Shield of Herakles.
- Chaucer. We have all the works of Chaucer, including all of The Canterbury Tales.
- Spenser. We have all of the poetical works of Spenser, including The Faerie Queene.
- Shakespeare. We have all the works of Shakespeare, including all of his plays and poems.
These are POS tagged, which is going to let us distinguish between complementizer-that and demonstrative-that (for example).
Collocations (first steps in analysis)
One of the most fun things to do with corpora is to look at collocations–what words are showing up with what other words more than they should be by chance? “Black and white” is a collocation, “salt and pepper”, too. You’ll also find that “force”, “tool”, and “group” collocate with “powerful” (but not “strong”), while “support”, “ties”, and “relationships” are “strong” (but not “powerful”).
So the first question I have is, what gets collocated with “this” and “that”? To figure this out, I ask WordHoard to give me collocates that appear 1, 2, or 3 words after “the”, “this”, and “that (d)”. There are lots of ways of calculating strength of relationships, but for this mini-project I’m going to go for exploration more than proof. So what I do is drop out words that collocate strongly with “the” as well as “this” and/or “that”. What I care about are the words that seem to be more strongly associated with demonstratives than with the plain-ole determiner “the”. The thinking here is counterfactual: when someone uses this/that, they COULD have used “the” instead (and that is generally more common).
First, let’s look at “this”. I’m going to report things that occur with “this” 1.5 or more times more often than they occur with “the”. Note that there are 3.2 times as many “the” tokens as “this” tokens, so we’re talking about words that are REALLY combining with “this” a lot more than if the distributions were just random.
- You’ll immediately notice that adjectives are showing up:
  - jolly, sorry, noble, little, wide, worthy, innocent, woeful, fresh
- And other affectively oriented words:
- Lots of social roles:
  - maid, merchant, duke, yeoman, summoner, maiden, dame, earl, monk, carpenter, marquis, messenger
- And really interestingly, proper names, too, which goes with the narrative style of “this {social role}”, I think:
  - Damyan, Absolon, Griselda, Cambuskan, Nicholas, Palamon, Alla, Arcite, Melibee, John, Phebus, Emily
- Other words that might get your attention:
  - world, matter, case, canon, answer, tale, sentence, present, need, creature, conclusion, marriage, treasure
Each of these is worth tracking down and doing close readings of the particular lines. But in general, it’s looking like Chaucer will be a rich place for tracking down affective demonstratives.
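If you want to try the same kind of filtering outside WordHoard, the idea can be sketched in a few lines of Python. To be clear, this is not WordHoard’s method or API. It is a toy version under my own assumptions: the text is already a list of lowercase tokens, “collocate” means any word 1 to 3 positions after the node word, and the comparison is on raw counts (a word is kept when its count after “this” is at least 1.5 times its count after “the”). The function names and the sample sentence are made up for illustration.

```python
from collections import Counter

def collocates_after(tokens, node, window=3):
    """Count words appearing 1..window positions after each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1 : i + 1 + window])
    return counts

def favored_by(tokens, node, baseline="the", window=3, min_ratio=1.5):
    """Words whose count after `node` is at least min_ratio times their count after `baseline`."""
    after_node = collocates_after(tokens, node, window)
    after_base = collocates_after(tokens, baseline, window)
    # A Counter returns 0 for unseen words, so words never seen after the
    # baseline always pass the threshold.
    return sorted(w for w, n in after_node.items() if n >= min_ratio * after_base[w])

# Made-up toy sentence, just to exercise the functions.
tokens = "this noble duke met the duke and this noble knight saw the old knight".split()

print(favored_by(tokens, "this"))
```

On a toy sample this small, stray words slip through along with the interesting ones, which is one reason the close readings matter.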
What about “that”? There are fewer of these that stand out (because there’s a lot more “this” than “that” among demonstratives). But words for “that” include:
I think you’ll intuitively get why “that one” and “that ilk” occur so much more than “the/this one” and “the/this ilk”. There’s some sort of othering/distancing going on.
The Google Ngram viewer is pretty good for tracing developments back to 1800 (it has older data but the guys who work on it don’t think it gets reliable til 1800-ish). We can see that “ilk” really does have a “thatness” to this day, while “the one” is more popular in general (just not in the Canterbury Tales). Again, these graphs are NOT for Chaucer’s time but show what’s happening more recently:
Okay, so how are the demonstratives distributed?
To answer the question, “which pilgrim uses demonstratives the most?” I took demonstratives and looked at how the various tellers in the Canterbury Tales use them. We can establish a base rate of demonstrative use across Chaucer (all his works or just the Canterbury Tales). We can do this relative to all words or relative to total-tokens-of-the-plus-that-plus-this. (A token is just an occurrence of a word.)
For example, whether you’re looking at all of Chaucer or just the Canterbury Tales, that makes up about 5% of all the the/this/that’s. For any given pilgrim, we can look at their tale and count up the the/that/this. If 5% of the total are that, then nothing special is happening. Take the Knight: he has 933 the/this/that’s. So we’d guess that he’s going to have 0.05*933=47 uses of that. In fact, he has 54. Not a big difference. (By the way, I’ll put in a table with all the counts for all the pilgrims if you’re interested. Make the request in the comments below.)
Now, lots of folks like the folksiness of Sarah Palin (other people call it pseudo-folksiness). If you’re familiar with the Canterbury Tales you may immediately wonder about the earthy Wife of Bath. By demonstrative use, the Wife of Bath is decidedly NOT the Sarah Palin of the Canterbury Tales. The Wife has 120 the/this/that, but only 4 of these are that’s (we would’ve guessed about 6). She uses this a little bit more than expected but not by much.
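Here is that arithmetic as a small Python sketch, in case you want to run it on your own counts. The function names are mine, and the only numbers used are the Knight’s figures and the roughly 5% base rate given above. The ratio of observed to expected is just a convenient way to eyeball over- or underuse, not a significance test.

```python
def expected_count(total_determiners, base_rate):
    """Expected number of tokens given a total and a base rate."""
    return total_determiners * base_rate

def observed_to_expected(observed, total_determiners, base_rate):
    """Ratio above 1 means overuse, below 1 means underuse."""
    return observed / expected_count(total_determiners, base_rate)

# Figures for the Knight, as reported in this post.
knight_total = 933      # all the/this/that tokens in his tale
knight_that = 54        # observed uses of "that"
base_rate_that = 0.05   # "that" is about 5% of the/this/that overall

expected = expected_count(knight_total, base_rate_that)
ratio = observed_to_expected(knight_that, knight_total, base_rate_that)

print(round(expected))   # 47
print(round(ratio, 2))   # 1.16
```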
I would like to stress that she DOES use affective demonstratives–it’s hard not to. Here are some examples. The point is that she doesn’t use them all that often.
That gentil text kan I wel understonde (she’s talking about God telling people to multiply, line 29)
That oon thou shalt forgo, maugree thyne yen (something like “give it up, damn your eyes”, line 315)
That oon for love, that oother was for hate (line 749, oon=one and oother is just a much much better way to spell ‘other’)
Okay, so the Knight uses affective demonstratives at a normal rate, the Wife of Bath underuses them. WHO IS THE SARAH PALIN OF THE CANTERBURY TALES? There are two contenders when we just go by word counts and stats. The first contender is the Shipman. But looking at the data qualitatively shows that most of his that’s and this’s are pretty non-affective. By contrast…the Pardoner uses this and that all over the place (over twice as many that’s as we’d expect, for example, and about 1.25 times as many this’s). For example:
Withinne that develes temple in cursed wise (470)
Were dryven for that vice, it is no drede (507)
He hath a thousand slayn this pestilence (679)
And we wol sleen this false traytour Deeth (699)
Now this is the part where I will let other people draw comparisons between the Pardoner and Sarah Palin. I will not. I will, however, refresh your memory that the story he tells is about drunk guys trying to kill Death and is generally seen as morally proper…though the Pardoner himself is seen as corrupt/corrupted/corrupting. But back to the first hand, the Pardoner really is one of the most fascinating characters in the Tales. Love her or hate her, that Sarah Palin is pretty fascinating, too.