Here’s a summary of corpora to check out if you’re interested in prosody. The list is really English-heavy, so send me ideas for annotated non-English sources!
For ToBI-marked stuff:
- The Boston University Radio Speech Corpus will get you professional radio announcers reading the news. About 3.5 hours of the data is marked up with prosodic information (ToBI). One nice thing is that it has inter-rater reliability information on the prosodic annotations (see Hasegawa-Johnson et al., 2005 for more about that and an example of research using the corpus).
- There’s also ToBI annotation for 75 Switchboard conversations in the NXT edition: http://groups.inf.ed.ac.uk/switchboard/
Other annotation systems:
- The Santa Barbara Corpus is free now and is a great source for prosody research, since it’s naturalistic and has a lot of different kinds of people talking in a lot of different situations. I’m not sure whether anyone has ever annotated it with ToBI, but the transcripts themselves mark a host of prosodic cues.
- The London-Lund Corpus has a lot of prosodic annotation, too.
- The Hong Kong Corpus of Spoken English is naturalistic in that it’s all from real-life settings (interviews, presentations, etc.). You can get a flavor of it here, but to get all the prosodic information you need to get the book, here. It uses David Brazil’s Discourse Intonation system (prominence, tone, key, termination).
- There’s also the Aix-MARSEC database, which is five hours of spoken British English with phonemes, syllables, syllable constituents, rhythm units, stress feet, words, and intonation units all marked up. (Get the data here, ready for Praat.)
- The Wellington Corpus of Spoken New Zealand English has emphatic stress marked.
- The IViE corpus is labeled prosodically, too.
More of a stretch is the Audiovisual Database of Spoken American English. I don’t think most of you interested in prosody will care about this corpus, but I include it just in case.
Finally, in the universe of emotion and prosody, you can try out: