You’ve got a text, now get easy frequency and collocation information

21 Feb

You can find my intro to the Corpus of Contemporary American English here, but there’s also a related site that lets you paste in a chunk of text and then tells you all about it.

Here’s what it does:

* It highlights medium- and low-frequency words (and creates lists of these words that you can use offline)

* You can see how “academic” the text is

* You can click on a word and get its overall frequency, its frequency by genre (spoken/fiction/magazine/newspaper/academic), its top collocates (nearby words), synonyms, and related words.

* At the phrase level, you can highlight a phrase and it’ll show you related phrases from COCA. In the example Mark Davies gives, highlighting “potent argument” shows you “strong/persuasive/convincing argument”, all of which are more common.
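To make the word-level features above a bit more concrete, here is a minimal Python sketch of the kind of counting involved. It is not how the site itself works: the reference rank list `ref_ranks`, the band cutoffs (500 and 3000), and the ±4-word collocate window are all hypothetical choices for illustration, and a real tool compares your text against a large reference corpus like COCA rather than a toy dictionary.

```python
import re
from collections import Counter


def tokenize(text):
    # Rough tokenizer: lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def per_million(count, total_tokens):
    # Normalize a raw count so corpora or genres of different sizes are comparable.
    return count / total_tokens * 1_000_000


def frequency_band(word, ref_ranks, high_cutoff=500, mid_cutoff=3000):
    # ref_ranks is a HYPOTHETICAL word -> frequency-rank mapping (1 = most common);
    # the cutoffs are illustrative, not the site's actual bands.
    rank = ref_ranks.get(word)
    if rank is None or rank > mid_cutoff:
        return "low"  # unknown or rare words fall in the low band
    return "high" if rank <= high_cutoff else "medium"


def collocates(tokens, node, window=4):
    # Count every word within `window` tokens on either side of each hit of `node`.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                counts[tokens[j]] += 1
    return counts


if __name__ == "__main__":
    tokens = tokenize("A strong argument beats a weak argument, and a strong case beats both.")
    print(Counter(tokens).most_common(3))
    print(collocates(tokens, "argument", window=1))
    ranks = {"a": 5, "strong": 900}  # toy ranks, made up for the demo
    print({w: frequency_band(w, ranks) for w in ("a", "strong", "potent")})
```

On a real text, `collocates(tokens, "argument")` lists the words that turn up near “argument” most often, and normalizing with `per_million` is what lets frequencies from differently sized genres be compared side by side.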


4 Responses to “You’ve got a text, now get easy frequency and collocation information”

  1. Rohit March 10, 2012 at 11:34 pm #

    Is there an API I can use to query these corpora and get results from my code?

    • Tyler March 11, 2012 at 4:57 pm #

      Hey Rohit,

      Not that I know of, but maybe send a note to Mark Davies to ask? If you’re working with APIs anyway, though, you may just want to use Python’s Natural Language Toolkit (NLTK) to build your own collocation information; it makes that really easy.


      • Rohit March 11, 2012 at 11:56 pm #

        Hi Tyler,

        Thanks for the reply. I will contact Mark Davies about the API. I am using NLTK too, but I also want to get collocates from the Corpus of Contemporary American English to use alongside my own database of collocates.

      • Tyler March 12, 2012 at 5:56 pm #

        Ah, a bunch of the COCA data is available for download, including collocate information. Maybe that’ll help?
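Tyler’s NLTK suggestion, in rough outline: NLTK’s `nltk.collocations` module provides `BigramCollocationFinder.from_words()` and `BigramAssocMeasures.pmi` for scoring collocations. The sketch below does the same basic computation in plain Python (no NLTK required) so the idea is visible: it scores adjacent word pairs by pointwise mutual information (PMI). Restricting to adjacent pairs, using PMI rather than another association measure, and the `min_count` threshold are assumptions for illustration; NLTK’s finder also supports wider windows (`window_size`) and other measures such as `likelihood_ratio`.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=2):
    """Rank adjacent word pairs by pointwise mutual information (PMI).

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))), estimated from raw counts
    with the token count n as the denominator throughout, so it reduces
    to log2(c_xy * n / (c_x * c_y)).
    """
    n = len(tokens)
    word_counts = Counter(tokens)
    pair_counts = Counter(zip(tokens, tokens[1:]))
    scores = {}
    for (x, y), c_xy in pair_counts.items():
        # PMI overrates rare pairs, so pairs seen fewer than min_count times are skipped.
        if c_xy < min_count:
            continue
        scores[(x, y)] = math.log2(c_xy * n / (word_counts[x] * word_counts[y]))
    # Most strongly associated pairs first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toks = "new york is big and new york is busy and it is fun".split()
    for pair, score in bigram_pmi(toks, min_count=2):
        print(pair, round(score, 2))
```

On the toy sentence, (“new”, “york”) outscores (“york”, “is”) because “is” is more frequent overall; on a real corpus you would want a much higher `min_count`.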
