Archive | March, 2012

Geography stuff for dialectology

14 Mar

InfoChimps have a geography API that might help you plot people against locations:

Brice Russ at OSU has been doing Twitter dialect stuff with the Data Science Toolkit, but says InfoChimps looks more user-friendly. (That was an initial look, though, so hit him up at @KilroyWasHere on Twitter to learn more.)

Making a corpus from YouTube: dialects in North America

11 Mar

Here is a link to Rick Aschmann’s amazing collection of speech clips from Canadian and American speakers on YouTube using the Atlas of North American English as a starting point:

Aschmann’s work is a great example of how to use YouTube–you should also be aware that YouTube allows users to subtitle and to caption clips, which means that you can potentially find words and expressions in particular languages AND/OR translations into other languages.

You may want to get all the YouTube stuff into wave forms that you can analyze. Here are my instructions for how to get YouTube clips into Praat. Basically, you’ll capture the video and then convert the video into audio. Note that YouTube does involve compression, so it’s not the same as a lossless recording. That may or may not be important depending upon the phenomena you’re studying:
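If you’d rather script the video-to-audio step, here is a minimal sketch in Python. It is separate from my linked instructions and rests on a few assumptions: the clip is already saved on your machine, the ffmpeg command-line tool is installed, and mono 44.1 kHz 16-bit WAV is what you want. The function names and file names are hypothetical.

```python
import subprocess


def build_ffmpeg_command(video_path, wav_path, sample_rate=44100):
    """Build an ffmpeg command that extracts mono 16-bit WAV audio from a video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video (already downloaded; path is hypothetical)
        "-vn",                    # drop the video stream
        "-ac", "1",               # mix down to a single channel
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-acodec", "pcm_s16le",   # uncompressed 16-bit PCM, i.e. a plain WAV file
        wav_path,
    ]


def video_to_wav(video_path, wav_path, sample_rate=44100):
    """Run the conversion; raises CalledProcessError if ffmpeg reports a failure."""
    subprocess.run(build_ffmpeg_command(video_path, wav_path, sample_rate), check=True)


# Example (assumes ffmpeg is on your PATH and clip.mp4 exists):
# video_to_wav("clip.mp4", "clip.wav")
```

Converting to WAV does not undo YouTube’s compression; it only gives you a file that Praat opens directly.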



Hwaet! Old English corpora and a quick look at my favorite word in Beowulf

11 Mar

More often than I should admit, when people talk about wh-words, I hear a sharp rush-clack of hwæt! That’s where what comes from but it’s not just the inversion of w and h. The Anglo-Saxons seemed to use it in some ways that we don’t often hold on to today. I’ll be focused (briefly) on its discourse functions where it’s like well/so/still/hey you.

Here’s how this post works: (1) Old English corpora you can use, (2) a quick survey of hwæt in Beowulf.

If you’re interested in Beowulf, check out Epstein (2011) on distal demonstratives as marking importance, topic continuity, and chapter boundaries. In terms of not-modern-English + affective demonstratives, see my post on “Who is the Sarah Palin of the Canterbury Tales”.

And if you’re interested in modern what, see my post exploring the what a __! construction. (If what interests you, also check out the Oxford English Dictionary, though the entry is looooong.)

Old English corpora

The main one to know about is “YCOE”, which is the York-Toronto-Helsinki Parsed Corpus of Old English Prose. (If you’re here at Stanford, we have it ready for easy access online; you just need to show me that you’ve been granted access by the administrators. Here’s where you apply for access.)

The folks at Toronto actually put together a dictionary and a packaging of *every* Old English text we know to exist (there are about 3,000).

And check out this link for projects out of Helsinki, including parsed OE poetry and links to stuff on Middle English, too. The goal is to have parsed, diachronic info for every stage of English.

You’ll sometimes encounter the Brooklyn corpus of OE (or rather the “Brooklyn-Geneva-Amsterdam-Helsinki Parsed Corpus of Old English”), but its creators actually just recommend YCOE.

Hwæt in Beowulf

Let me first say that some of the oldest uses of hwæt are in subordinate clauses, where it isn’t necessarily used like an interrogative (for example, the modern English I couldn’t understand what he felt or it was what she said imply only the most indirect of questions).

In fact, in Beowulf, there’s not much of an interrogative feel at all, with perhaps one exception, which is really a rhetorical question: “Hwæt syndon ge searohæbbendra, byrnum werede?” (~‘What kind of men are you…’). I’ll show all the subordinate clause-type examples at the bottom of this post, but I’d like to turn to something I find more interesting.

Of the 15 uses of hwæt in Beowulf, six are very discourse marker-y. That’s pretty high considering our modern-day uses. In fact, here’s how the long poem starts:

Hwæt! We Gardena         in geardagum,

þeodcyninga,         þrym gefrunon,

hu ða æþelingas         ellen fremedon.

Seamus Heaney translates this as:

So. The Spear-Danes in days gone by

and the kings who ruled them had courage and greatness.

We have heard of those princes’ heroic campaigns.

Or in John Porter’s more literal translation:

What! We Spear-Danes’ in yore-days,

tribe-kings’      glory heard,

how the leaders            courage accomplished.

While Porter goes for the literal “What!” here, most of the time both he and Heaney opt for something more like well, so, or still. Note that there is either some kind of naturalness associated with hwæt or it does some fancy rhetorical footwork. Three instances of hwæt occur in dialog:

From line 530:

“Hwæt! þu worn fela,         wine min Unferð,

beore druncen         ymb Brecan spræce,

sægdest from his siðe…”

Heaney: “Well, friend Unferth, you have had your say…”

Porter: “What! you very much, friend my Unferth…”

From 1652:

“Hwæt! we þe þas sælac,         sunu Healfdenes,

leod Scyldinga,         lustum brohton

tires to tacne,         þe þu her to locast…”

Heaney: “So, son of Halfdane…”

Porter: “Well, we you these sea-loots, son of Halfdane”

And line 2248:

“Heald þu nu, hruse,         nu hæleð ne moston,

eorla æhte!         Hwæt, hyt ær on ðe

gode begeaton…”

Heaney: “And heroes can no more; it was mined from you first…”

Porter: “Earls’ possessions! Well, it formerly from you…”

Two more examples—from line 942:

þurh drihtnes miht         dæd gefremede

ðe we ealle         ær ne meahton

snyttrum besyrwan.         Hwæt, þæt secgan mæg

efne swa hwylc mægþa         swa ðone magan cende

æfter gumcynnum,         gyf heo gyt lyfað,

Heaney: who brought forth this flower of manhood

Porter: by schemes contrive. What! that may say

(In this case, Heaney has opted not to translate hwæt at all.)

And line 1774:

Hwæt, me þæs on eþle         edwenden cwom,

gyrn æfter gomene,         seoþðan Grendel wearð,

ealdgewinna,         ingenga min;

ic þære socne         singales wæg

modceare micle.         þæs sig metode þanc,

Heaney: Still, what happened was a hard reversal

Porter: Well, me for that in homeland setback came

These last ones are interesting because they stand in such contrast to the shouted exclamation that begins the poem–these are more reflective. Probably what we want to say is that hwæt is doing some sort of topic-shift marking and/or attention direction.

Here, for completeness, are the rest of the hwæt’s:

Line 173:

modes brecða.         Monig oft gesæt

rice to rune;         ræd eahtedon

hwæt swiðferhðum         selest wære

wið færgryrum         to gefremmanne.

Heaney: plotting how best the bold defenders

Porter: what for bold-hearts best would be

Line 233:

fyrdsearu fuslicu;         hine fyrwyt bræc

modgehygdum,         hwæt þa men wæron.

Gewat him þa to waroðe         wicge ridan

Heaney: he had to find out who and what the arrivals were

Porter: in mind-thoughts what these men were

Line 237 (cited above). I think this example is intriguing; note that it’s also in dialog:

þegn Hroðgares,         þrymmum cwehte

mægenwudu mundum,         meþelwordum frægn:

Hwæt syndon ge         searohæbbendra,

byrnum werede,         þe þus brontne ceol

ofer lagustræte         lædan cwomon,

Heaney: “What kind of men are you who arrive…”

Porter: “What are you armour-wearers”

Line 474:

Sorh is me to secganne         on sefan minum

gumena ængum         hwæt me Grendel hafað

hynðo on Heorote         mid his heteþancum,

færniða gefremed.         Is min fletwerod,

Heaney: with all the grief Grendel has caused

Porter: to man any what me Grendel has

Line 880:

þonne he swulces hwæt         secgan wolde,

eam his nefan,         swa hie a wæron

Heaney: the urge to speak of them: always they had been

Porter: when he such matters say would,

Line 1186:

uncran eaferan,         gif he þæt eal gemon,

hwæt wit to willan         ond to worðmyndum

umborwesendum ær         arna gefremedon.”

Heaney: the favour and respect he found in his childhood

Porter: what we for his happiness and for his honour

Line 1476:

snottra fengel,         nu ic eom siðes fus,

goldwine gumena,         hwæt wit geo spræcon,

gif ic æt þearfe         þinre scolde

Heaney: what we said earlier: that you, son of Halfdane

Porter: gold-friend of man, what we earlier said

Line 3010:

ond þone gebringan,         þe us beagas geaf,

on adfære.         Ne scel anes hwæt

meltan mid þam modigan,         ac þær is maðma hord,

Heaney: on the funeral road. His royal pyre

Porter: on pyre-journey. not shall mere pittance

Line 3068:

Swa wæs Biowulfe,         þa he biorges weard

sohte, searoniðas;         seolfa ne cuðe

þurh hwæt his worulde gedal         weorðan sceolde.

Heaney: of how his departure from the world would happen.

Porter: through what his world’s departing caused should be
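If you want to try this kind of survey on another text, the sorting above can be roughly approximated in a few lines of Python. This is only a sketch, and it leans on one big assumption: that an editor’s punctuation right after hwæt (“!” or “,”) is a usable stand-in for the discourse-marker reading. That punctuation is modern and editorial, so treat the output as a list of candidates to read through, not as an analysis. The function name and the (line number, text) input format are made up for illustration.

```python
import re

# hwæt as a whole word, any capitalization, capturing "!" or "," if it follows directly.
HWAET = re.compile(r"\bhwæt\b([!,]?)", re.IGNORECASE)


def classify_hwaet(lines):
    """Return (line_number, label) for every hwæt in (line_number, text) pairs.

    A hit directly followed by "!" or "," is labeled "discourse?";
    every other hit is labeled "other".
    """
    hits = []
    for number, text in lines:
        for match in HWAET.finditer(text):
            label = "discourse?" if match.group(1) else "other"
            hits.append((number, label))
    return hits


# A few of the lines quoted in this post, with the line labels used above.
sample = [
    (1, "Hwæt! We Gardena in geardagum,"),
    (233, "modgehygdum, hwæt þa men wæron."),
    (237, "Hwæt syndon ge searohæbbendra,"),
    (942, "snyttrum besyrwan. Hwæt, þæt secgan mæg"),
]

for number, label in classify_hwaet(sample):
    print(number, label)
```

On these four lines, the opening and the line 942 example come out as candidates and the other two don’t, which matches the split I made by hand.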

Prosodically annotated corpora

8 Mar

Here’s a summary of corpora to check out if you’re interested in prosody. It’s really English-heavy. Send me ideas for non-English sources that are annotated!

For ToBI marked stuff:

Other annotation systems:

  • You might check out the Santa Barbara Corpus, which is free now and is a great source for prosody research since it’s naturalistic and has a lot of different kinds of people talking in a lot of different situations. I’m not sure if anyone has ever annotated it with ToBI, but the transcripts themselves have a host of prosodic cues.
  • The London-Lund Corpus has a lot of prosodic annotation, too.
  • The Hong Kong Corpus of Spoken English is naturalistic in that it’s all from real-life stuff (interviews, presentations, etc). You can get a flavor of it here but to get all the prosodic information, you need to get the book, here. It uses David Brazil’s Discourse Intonation system (prominence, tone, key, termination).
  • There’s also the Aix-MARSEC database, which is five hours of spoken British English with phonemes, syllables, syllable constituents, rhythm units, stress feet, words, and intonation units all marked up. (Get the data here, ready for Praat.)
  • The Wellington Corpus of Spoken New Zealand English has New Zealand English with emphatic stress marked.
  • The IViE corpus is labeled prosodically, too.

More of a stretch is the Audiovisual Database of Spoken American English. I don’t think most of you interested in prosody will care about this corpus, but I include it just in case.

Finally, in the universe of emotion and prosody, you can try out:

(See my previous posts on emotion here and here for other resources–note that the two above are both “acted”.)