Prosodically annotated corpora

8 Mar

Here’s a summary of corpora to check out if you’re interested in prosody. It’s really English-heavy. Send me ideas for non-English sources that are annotated!

For ToBI-marked stuff:

Other annotation systems:

  • You might check out the Santa Barbara Corpus, which is now free and is a great source for prosody research. It's naturalistic and has a lot of different kinds of people talking in a lot of different situations. I'm not sure whether anyone has ever annotated it with ToBI, but the transcripts themselves mark a host of prosodic cues.
  • The London-Lund Corpus has a lot of prosodic annotation, too.
  • The Hong Kong Corpus of Spoken English is naturalistic in that it's all drawn from real-life speech (interviews, presentations, etc.). You can get a flavor of it here, but to get all the prosodic information, you need the book, here. It uses David Brazil's Discourse Intonation system (prominence, tone, key, termination).
  • There’s also the Aix-MARSEC database, which is five hours of spoken British English with phonemes, syllables, syllable constituents, rhythm units, stress feet, words, and intonation units all marked up. (Get the data here, ready for Praat.)
  • The Wellington Corpus of Spoken New Zealand English has emphatic stress marked.
  • The IViE corpus is labeled prosodically, too.

More of a stretch is the Audiovisual Database of Spoken American English. I don’t think most of you interested in prosody will care about this corpus, but I include it just in case.

Finally, in the universe of emotion and prosody, you can try out:

(See my previous posts on emotion here and here for other resources; note that the two above are both "acted".)



One Response to “Prosodically annotated corpora”


  1. The Mary Jane of Tomorrow | Emily Short's Interactive Storytelling - June 5, 2016

    […] Procedural limericks (requires rhyme training but a lack of poetry): these were tricky to do. Limericks require both rhyme and meter, and I turned to Twitter to suggest some word corpora with metrical information. Two sources proved especially useful: a corpus that lists words according to foot type (all the anapests you can eat!) and Allison Parrish's scripts for getting rhyme and meter matches out of the CMU pronunciation dictionary. I was also recommended this source of prosodically annotated corpora. […]
