Stephanie Shih has been doing really fun work on what makes a name (first and last), using a corpus of Facebook names. This lets her track recent trends: the Social Security Administration releases first-name data every year, but it doesn't release first+last name combinations until 100 years after the birth certificates come in.
Amaç Herdagdelen has compiled 1990 Census data and combined it with the Social Security Administration's statistics on popular baby names for every year from 1960 to 2010:
- Data, code: https://github.com/amacinho/Name-Gender-Guesser
- Orwant and Daly's older Perl module, Text::GenderFromName, supports fuzzy matching (phonetic similarity of names): http://search.cpan.org/~edaly/Text-GenderFromName-0.32/GenderFromName.pm
- Paper: http://clic.cimec.unitn.it/amac/twitter_ngram/Herdagdelen2012-RTC-draft.pdf
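Neither resource's internals are described here, so what follows is only a rough sketch of the general idea, in Python. It turns per-name counts by sex into a probability, and when a name isn't in the table, it pools the counts of names that sound alike. American Soundex serves as the phonetic key purely as a stand-in. I don't know which phonetic method the Perl module actually uses, and this is not the code in Herdagdelen's repository. The function names (`soundex`, `build_index`, `p_female`) and the toy counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# American Soundex digit classes; vowels, y, h and w get no digit.
_SOUNDEX = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _SOUNDEX[_ch] = _digit


def soundex(name: str) -> str:
    """Return the 4-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]  # ASCII letters only
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h/w do not separate letters that share a digit
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        prev = digit  # a vowel resets prev, so a repeated digit after it counts again
    return code.ljust(4, "0")


# HYPOTHETICAL table shape: lowercase name -> (female count, male count)
Counts = Dict[str, Tuple[int, int]]


def build_index(counts: Counts) -> Dict[str, List[str]]:
    """Group known names by Soundex code for the fuzzy fallback."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in counts:
        index[soundex(name)].append(name)
    return index


def p_female(name: str, counts: Counts, index: Dict[str, List[str]]) -> Optional[float]:
    """Share of people with this name recorded as female, or None if unknown.

    Exact matches use their own counts; otherwise the counts of every known
    name with the same Soundex code are pooled.
    """
    key = name.strip().lower()
    if key in counts:
        female, male = counts[key]
    else:
        female = male = 0
        for other in index.get(soundex(key), []):
            f, m = counts[other]
            female += f
            male += m
    total = female + male
    return female / total if total else None


# HYPOTHETICAL toy counts for illustration, not real SSA or Census figures.
toy = {"john": (10, 990), "jon": (5, 195), "mary": (980, 20), "marie": (490, 10)}
idx = build_index(toy)
print(p_female("Mary", toy, idx))  # exact match: 0.98
print(p_female("Jhon", toy, idx))  # pools John + Jon (both J500): 0.0125
```

Pooling by sound-alike code is a blunt fallback: it rescues misspellings like "Jhon", but Soundex also lumps together names with very different gender profiles, so a real tool would want a better phonetic key or a confidence threshold.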