Horse name corpus for Belmont Stakes

6 Jun

The Belmont Stakes (the third leg of the Triple Crown) is happening this weekend. Check out this blog post on horse names:

http://idibon.com/back-the-right-horse-name/

Does anyone have an even more enormous corpus of horse names? I grabbed and analyzed the top 4 finishers for the last 137 years of the Kentucky Derby and showed the diversity of naming schemes.

Categorizing horse names for the last 137 years of the Kentucky Derby: see http://idibon.com/back-the-right-horse-name/

Opinionated tweets

28 May

Luo, Osborne and Wang make the following data set available:

https://sourceforge.net/projects/ortwitter/

They crawled 30 million English-language tweets and then had 7 people use a search engine to call up results. Each set of results showed 100 tweets, and the annotators had to classify each of the 100 as either (a) opinionated about the query or (b) not opinionated.

There were 50 queries resulting in 5,000 annotated tweets.
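To make the arithmetic and the labeling scheme concrete, here is a minimal Python sketch. The tuple layout, the query names, and the tweet IDs are invented for illustration; this is not the actual file format of the data set, so check the download before relying on any of it.

```python
from collections import defaultdict

# Numbers taken from the description above.
QUERIES = 50
TWEETS_PER_QUERY = 100
total = QUERIES * TWEETS_PER_QUERY  # 50 * 100 annotated tweets

# Hypothetical judgments: (query, tweet_id, is_opinionated).
# These rows are made up; they are not from the real data set.
judgments = [
    ("q1", "t1", True),
    ("q1", "t2", False),
    ("q2", "t3", True),
    ("q2", "t4", True),
]

def opinionated_share(rows):
    """Return the fraction of tweets judged opinionated, per query."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for query, _tweet_id, is_opinionated in rows:
        totals[query] += 1
        positives[query] += int(is_opinionated)
    return {q: positives[q] / totals[q] for q in totals}

shares = opinionated_share(judgments)
```

With real judgments loaded into the same shape, `shares` would give a quick per-query view of how opinionated the retrieved tweets are.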

Read their paper here:

Opinion Retrieval in Twitter: http://homepages.inf.ed.ac.uk/miles/papers/icwsm12.pdf

GIF pronunciations and the CMU Pronouncing Dictionary

23 May

The CMU Pronouncing Dictionary offers us the chance to see how many ways there are to pronounce “g” in English. Should it be hard-g GIF or soft-g JIF? (There are 8+ pronunciations of “g”!)

http://idibon.com/gif-and-ways-to-say-g/
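If you want to poke at the dictionary yourself, here is a rough Python sketch that tallies the first sound of words beginning with "g". It is not the method used in the blog post. The sample lines are typed by hand in the dictionary's "WORD  PHONEME PHONEME ..." layout rather than read from the real file, and the function names are my own. Looking only at the first phoneme is a crude proxy: it misses "g" in the middle or at the end of words, and for a word like "gnome" it records the sound of the following letter.

```python
from collections import Counter, defaultdict

# Hand-typed sample lines in the CMUdict layout (an assumption for this
# sketch; swap in the contents of the real dictionary file).
SAMPLE = """\
;;; comment lines in the dictionary start with three semicolons
GIFT  G IH1 F T
GIN  JH IH1 N
GIRAFFE  JH ER0 AE1 F
GO  G OW1
GENRE  ZH AA1 N R AH0
GNOME  N OW1 M
"""

def parse_cmudict(text):
    """Map each word to a list of pronunciations (lists of phonemes)."""
    entries = defaultdict(list)
    for line in text.splitlines():
        if not line.strip() or line.startswith(";;;"):
            continue
        word, *phones = line.split()
        # Alternate pronunciations are written WORD(1), WORD(2), ...
        word = word.split("(")[0].upper()
        # Drop the stress digits (0, 1, 2) attached to vowels.
        entries[word].append([p.rstrip("012") for p in phones])
    return dict(entries)

def first_phone_counts(entries, letter="G"):
    """Count the first phoneme of every pronunciation of letter-initial words."""
    counts = Counter()
    for word, prons in entries.items():
        if word.startswith(letter):
            for phones in prons:
                counts[phones[0]] += 1
    return counts

entries = parse_cmudict(SAMPLE)
counts = first_phone_counts(entries)
```

On this tiny sample, hard-g (`G`) and soft-g (`JH`) come out tied, with `ZH` and a silent "g" each showing up once. Run it over the full dictionary to see the real distribution.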

Crowdsourcing and corpus studies

23 May

One of the things you might want to use crowdsourcing for is to annotate or create corpora.

You can read about crowdsourcing techniques in linguistics in this paper:

Using crowdsourcing for linguistic research by Tyler Schnoebelen and Victor Kuperman

Or see a bunch of different linguistic research projects that used crowdsourcing (presented at the Linguistic Society of America’s annual conference):

LSA 2011 presentation

And you can read an analysis of using crowdsourcing to help assess damage from Hurricane Sandy here:

http://idibon.com/crowdsourced-hurricane-sandy-response/

Corpus linguistics and the NBA playoffs

21 May

In honor of the NBA Draft Lottery, some facts about the vagaries of three synonymous-looking terms: basketball, hoops, and bball.

http://idibon.com/bball-and-hoops-when-do-synonyms-matter/

Are basketball, bball, and hoops really synonyms? From http://idibon.com/bball-and-hoops-when-do-synonyms-matter/

Discovering linguistic diversity

20 May

Over at the Idibon blog, there are a couple of posts that talk about how languages do stuff.

First, some of our favorite things about indigenous languages of the US and Canada:

http://idibon.com/powwow-5-facts-about-native-american-languages/

(The origin of “powwow”, Havasupai pronoun fun, Cherokee verbs, and more.)

Second, using a corpus of movie subtitles, an analysis of a single line from the film noir The Maltese Falcon in French, Hungarian, and Turkish:

http://idibon.com/the-multilingual-falcon/

The Maltese Falcon, analyzed in French, Hungarian, and Turkish at http://idibon.com/the-multilingual-falcon/

Top pop songs corpus

18 May

Over on the Idibon blog, an analysis of 122 years of pop song hits, focusing on love (and the loss of “love” from song titles in recent years, ack!):

http://idibon.com/weve-lost-that-lovin-feelin/

"Love" in song titles: see http://idibon.com/weve-lost-that-lovin-feelin/
