Build your own corpus (well, for now)

1 Apr

BootCaT is meant to help folks build up their own corpora from the Internet. However, it relies on the Bing API and may not be able to do so for much longer, so it may go down temporarily. Go get your corpus started now!

Note that Google also has an API that you can use (but it limits you to 100 queries a day), as do Blekko and EntireWeb.

You may also want to check out this O’Reilly summary of data sources.
