This blog is meant to help linguists do research using corpora and quantitative methods. (If you’re curious about the stuff I do myself, check out my website: http://www.stanford.edu/~tylers).
About
Recent Posts
- Paul Ryan dislikes Trump almost as much as Cruz does: On (not) naming names at the conventions
- Failed vs. fighting: the linguistic differences between speeches at the RNC and the DNC conventions
- Which new emoji will be the most popular?
- Artificial intelligence in the press and in history!
- Poetry v. not-poetry
Archives
- August 2016
- June 2016
- April 2016
- February 2016
- November 2015
- October 2015
- September 2015
- August 2015
- July 2015
- May 2015
- April 2015
- March 2015
- February 2015
- January 2015
- July 2014
- April 2014
- March 2014
- February 2014
- January 2014
- December 2013
- November 2013
- October 2013
- August 2013
- July 2013
- June 2013
- May 2013
- April 2013
- January 2013
- November 2012
- October 2012
- September 2012
- August 2012
- April 2012
- March 2012
- February 2012
- January 2012
- December 2011
- November 2011
- October 2011
Some favorites
- Intro to corpus linguistics
Here’s my presentation to Stanford undergrads about corpus linguistics. You’ll find it full of examples and resources. And even some findings. http://www.stanford.edu/~tylers/notes/presentations/IntroductionToCorpusLinguistics.pptx
- Chat room corpus
Went hunting around for some chat room corpora today. I thought I’d find tons and tons; really, I just turned up one resource. But it’s a big one: over 30 billion words across 47,860 English-language newsgroups from Oct 2005 to Jan 2011. Posts that are not in English are pulled out and the people […]
- African language corpora
There are over two thousand African languages, spoken (in situ) by 15% of the world’s population. In density of linguistic diversity, Africa is rivaled only by New Guinea (which, to be honest, probably exceeds it). And yet it is the Electronic Dark Continent. The LRE Map will give you 663 corpora/computational tools for English. But (almost) […]
- COCA: What a fantastic source of data!
Intro: 425 million words from 1990-2011. I believe that one of the best resources out there for linguists (or anyone interested in language) is the Corpus of Contemporary American English (COCA). Mark Davies has assembled a bunch of corpora and built an easy-to-use interface so you can make sophisticated queries on vast amounts […]
- What were the cultural keywords when you were born?
Raymond Williams published a fascinating (and often-cited) book called Keywords (first in the 70s, then updated in the 80s). It’s full of really interesting stuff (my notes are here). But Williams’ words were just sort of the ones he saw flying around and took an interest in. This post gives you something a little more […]