Originally written with Rob Munro for Idibon.com, thanks, Rob!
“Id” “ee” “bon”
Pronunciation matters. But to no one so much as Steve Wilhite, it seems. The inventor of the GIF graphics format accepted a lifetime achievement award at yesterday’s Webbys by flashing this on the screen:
The mapping between sounds and symbols is famously complex and controversial.
To be fair to Wilhite, a purely empirical approach to guessing the name would have led language technologists like us to get it wrong. It would also have made it seem more complicated. From a quick lookup of the CMU Pronouncing Dictionary you’ll see there are not two but six pronunciations for words starting with “g”:
- /g/: the hard “g” like glide or galore (n=4,956)
- /jh/: the soft “j” like gelatin or Gemini (n=686)
- /zh/: fricatives like genre or Giselle (n=32)
- /n/: sort of skip the “g” in favor of an /n/ like gnarly and gnash (n=28)
- /hh/: softer still, like Geraldo, Geraldi (n=2)
- /k/: the unvoiced alternate to “g”, as in Ghadafi (n=1)
So if you were just running the odds, then you’d bet g-if was over seven times more likely than jh-if (86.9% vs. 12.0%). If we just look at words that start with “gi”, it is closer but still favors g-if (58.4% vs. 38.4%), with jh-if mostly coming from Italian names (Giacomo, Giovanni) and giraffes.
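The overall percentages follow directly from the counts in the list. As a rough sketch (not the script we actually ran), the tally could be reproduced along these lines, assuming the dictionary's plain-text layout of one `WORD PH1 PH2 ...` entry per line. The function names and the tiny sample entries are made up for illustration.

```python
from collections import Counter

def first_phoneme_counts(lines, prefix="G"):
    """Count the first phoneme of every entry whose spelling starts with prefix."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line or line.startswith(";;;"):  # skip blanks and comment lines
            continue
        word, *phones = line.split()
        if word.upper().startswith(prefix.upper()) and phones:
            # Drop the stress digit attached to vowels (e.g. "IH1" -> "IH").
            counts[phones[0].rstrip("012")] += 1
    return counts

def shares(counts):
    """Turn raw counts into percentages rounded to one decimal place."""
    total = sum(counts.values())
    return {phone: round(100 * n / total, 1) for phone, n in counts.items()}

# Illustrative sample in CMU-dict style (a real run would read the full file).
sample = [
    ";;; comment line",
    "GLIDE  G L AY1 D",
    "GALORE  G AH0 L AO1 R",
    "GEMINI  JH EH1 M AH0 N AY2",
    "GNASH  N AE1 SH",
]
sample_counts = first_phoneme_counts(sample)

# The counts reported in the list above.
reported = Counter({"G": 4956, "JH": 686, "ZH": 32, "N": 28, "HH": 2, "K": 1})
reported_shares = shares(reported)
print(reported_shares["G"], reported_shares["JH"])
```

Passing `prefix="GI"` to the same function would give the narrower “gi” comparison, provided the full dictionary file is used as input.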
And that was just word-initial “g”.
Inside words, it gets more complex (switching to the International Phonetic Alphabet):
- ŋ: sing, complaining, English
- f: tough, enough, roughneck
- aɪ: height, alight, align
- ɔ: daughter, afterthought, naught
- eɪ: featherweight, campaign, sleigh
- aʊ: bough, drought, plough
- ju: Hugh, impugn
- oʊ: although, cologne, furlough
And there are a few real outliers:
We pulled these last six out of the corpus easily enough, but in the interest of time (mainly our own) we’ll leave the phonological analysis to other people. We also skipped some ough variations, because we couldn’t do it better than Dr. Seuss:
Why sound-symbol mismatch is not all bad
If nothing else, these examples should have shown why speech recognition is a non-trivial task. Full respect to the people who have made technologies like Siri a reality—it builds on decades of work.
However, the lack of a one-to-one mapping between sounds and symbols is not always a bad thing. In some contexts, it’s unambiguous (you wouldn’t turn good into ‘jood’ or ugly into ‘ujly’ if you had a simple dictionary and/or knew the context).
In other cases, the mismatch is downright helpful. The English plural is typically the /z/ sound (‘dogz’, ‘tablez’, ‘carz’) and only /s/ in certain contexts (‘cats’, ‘lamps’, ‘bikes’). By standardizing the spelling, even when the pronunciation changes, it’s easier for both humans and machines to understand the written text.
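A toy version of that plural rule, as a sketch: pick the plural sound from the final sound of the noun. The sound classes below and the extra “iz” case after hissing sounds (as in “buses”) go beyond the examples above and are our simplification; the ARPAbet-style symbols and the function name `plural_sound` are illustrative, not part of any library.

```python
# ARPAbet-style symbols; these sets are a simplification for illustration.
SIBILANTS = {"S", "Z", "SH", "ZH", "CH", "JH"}
VOICELESS = {"P", "T", "K", "F", "TH"}

def plural_sound(final_phone):
    """Return the sound that the written plural "-s" takes after final_phone."""
    if final_phone in SIBILANTS:
        return "IH Z"   # buses, dishes
    if final_phone in VOICELESS:
        return "S"      # cats, lamps, bikes
    return "Z"          # dogs, tables, cars (voiced consonants and vowels)

for word, final in [("dog", "G"), ("table", "L"), ("car", "R"),
                    ("cat", "T"), ("lamp", "P"), ("bike", "K")]:
    print(word, "->", plural_sound(final))
```

Whichever branch fires, the spelling stays the same, which is exactly the point: the reader sees one plural marker even though the speaker produces several.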
It’s the same for verbs. When you add ‘-ing’ to ‘sigh’, you pronounce it ‘sigh-ying’, adding the glide ‘y’ because English prefers to avoid adjacent vowels (the spelling also looks like yet another ‘g’ pronunciation). That doesn’t really give us any more information about the meaning of the word, so it’s simpler to write and read in a way that happens to be more consistent than the way we actually speak.
– Tyler Schnoebelen and Rob Munro, your #1 g’s