The Belmont Stakes (the third leg of the Triple Crown) is happening this weekend. Check out this blog post on horse names:
Does anyone have an even more enormous corpus of horse names? I grabbed and analyzed the top 4 finishers for the last 137 years of the Kentucky Derby and showed the diversity of naming schemes.
Luo, Osborne and Wang make the following data set available:
They crawled 30 million English-language tweets and then had 7 people use a search engine to call up results. Each query returned 100 tweets, and the annotators had to classify each of the 100 as either (a) opinionated about the query or (b) not opinionated.
There were 50 queries resulting in 5,000 annotated tweets.
Read their paper here:
Opinion Retrieval in Twitter: http://homepages.inf.ed.ac.uk/miles/papers/icwsm12.pdf
The CMU Pronouncing Dictionary offers us the chance to see how many ways there are to pronounce “g” in English. Should it be hard-g GIF or soft-g JIF? (There are 8+ pronunciations of “g”!)
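If you want to poke at this yourself, here is a minimal sketch of one way to start: tally the first phoneme of every word that begins with "g". The sample lines are hand-typed in the dictionary's plain-text format (WORD followed by ARPAbet phonemes) and are an assumption for illustration, so check them against the real file; the function name first_phonemes is made up for this example. This only looks at word-initial "g", so it will not find every pronunciation; catching the "g" in the middle or at the end of a word needs a letter-to-phoneme alignment step that is not shown here.

```python
from collections import Counter

# Hand-typed sample lines in the CMUdict plain-text format (WORD  PH1 PH2 ...).
# Illustrative assumption only: verify against the real dictionary file.
SAMPLE = """\
GO  G OW1
GEM  JH EH1 M
GNOME  N OW1 M
GENRE  ZH AA1 N R AH0
GIVE  G IH1 V
"""


def first_phonemes(lines, letter):
    """Count the first phoneme of every word that starts with `letter`."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and CMUdict comment lines (they start with ';;;').
        if not line or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        if phones and word.upper().startswith(letter.upper()):
            # Drop stress digits (0, 1, 2) that CMUdict attaches to vowels.
            counts[phones[0].rstrip("012")] += 1
    return counts


counts = first_phonemes(SAMPLE.splitlines(), "g")
for phoneme, n in counts.most_common():
    print(phoneme, n)
```

To run it on the full dictionary, pass the lines of the downloaded file instead of SAMPLE.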
One of the things you might want to use crowdsourcing for is to annotate or create corpora.
You can read about crowdsourcing techniques in linguistics in this paper:
Using crowdsourcing for linguistic research by Tyler Schnoebelen and Victor Kuperman
Or see a bunch of different linguistic research projects that used crowdsourcing (presented at the Linguistic Society of America’s annual conference):
LSA 2011 presentation
And you can read an analysis of using crowdsourcing to help assess damage from Hurricane Sandy here:
In honor of the NBA Draft Lottery, some facts about the vagaries of three synonymous-looking terms: basketball, hoops, and bball.
Over at the Idibon blog, a couple of posts talk about how languages do stuff.
First, some of our favorite things about indigenous languages of the US and Canada:
(The origin of “powwow” and Havasupai pronoun fun, Cherokee verbs and more.)
And using a corpus of movie subtitles, an analysis of a single line from film noir in French, Hungarian, and Turkish.
Over on the Idibon blog, an analysis of 122 years of pop song hits, focusing on love (and the loss of love from song titles in recent years, ack!).