July | 2015 | Corpus linguistics

Alex Korbonits turns a computer into James Joyce: Deep Learning from text (and images)

Last week we had the pleasure of welcoming Alex Korbonits to speak at Idibon about Deep Learning. Practically speaking, Alex gave us the low-down on the different tools that people are using to do Deep Learning. Inspirationally speaking, he showed us how computers imagine the Seattle skyline and how they would write if you taught them only James Joyce.

So let’s start with the ooooh and end with the hooooww.

Deep Learning, like other forms of machine learning, is about finding patterns in data. The “depth” of Deep Learning is that it involves a bunch of layers that feed into each other: each layer extracts higher-level features than the one before it, until the final layer, where a decision is made.
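To make the "layers feeding into each other" idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the layer sizes, the random (untrained) weights, and the helper names `dense`, `relu`, `make_layer`, and `forward` are all assumptions made up for this example, not anything from Alex's talk.

```python
import random

def relu(values):
    # Keep positive responses, zero out the rest.
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    # One layer: every output is a weighted sum of all inputs plus a bias.
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def make_layer(n_in, n_out, rng):
    # Hypothetical random weights; a real network would learn these from data.
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(inputs, layers):
    # Feed each layer's output into the next one and remember every stage.
    activations = [inputs]
    current = inputs
    for i, (weights, biases) in enumerate(layers):
        current = dense(current, weights, biases)
        if i < len(layers) - 1:
            current = relu(current)  # hidden layers only
        activations.append(current)
    return activations

rng = random.Random(0)
sizes = [8, 6, 4, 2]  # 8 input values, two hidden layers, 2 output scores
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in zip(sizes, sizes[1:])]
pixels = [rng.random() for _ in range(sizes[0])]

activations = forward(pixels, layers)
scores = activations[-1]
# The "decision" at the final layer: pick the class with the highest score.
decision = max(range(len(scores)), key=scores.__getitem__)
print([len(a) for a in activations], decision)
```

Each entry in `activations` is what one layer "sees"; in a trained network the later entries would correspond to the higher-level features.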

One of the ways to figure out what’s happening in all those layers is to ask the computer to exaggerate what it’s finding at a given layer. That’s how you get stuff like the doge below—because Google’s training information contains so many dogs and faces, it “sees” dogs and eyes all over the place. (For more information check out Google’s blog here.)

Source: http://bit.ly/1DJoX4t
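The "exaggeration" described above can be sketched as gradient ascent on the input: nudge the image in whatever direction makes a chosen layer respond more strongly, and repeat. The toy below is an assumption-heavy stand-in, not Google's code: it uses a single hypothetical layer of random filters, a flat list of numbers in place of an image, and made-up helper names (`layer_response`, `exaggerate`).

```python
import random

def layer_response(image, filters):
    # ReLU of the dot product between each filter and the image.
    return [max(0.0, sum(f * p for f, p in zip(filt, image)))
            for filt in filters]

def exaggerate(image, filters, steps=20, step_size=0.1):
    # Gradient ascent on the input: increase the layer's total response.
    image = list(image)
    for _ in range(steps):
        responses = layer_response(image, filters)
        grad = [0.0] * len(image)
        for filt, response in zip(filters, responses):
            if response > 0:  # only active filters contribute a gradient
                for j, f in enumerate(filt):
                    grad[j] += f
        image = [p + step_size * g for p, g in zip(image, grad)]
    return image

rng = random.Random(0)
filters = [[rng.uniform(-1, 1) for _ in range(16)] for _ in range(5)]
before = [rng.random() for _ in range(16)]
after = exaggerate(before, filters)

total_before = sum(layer_response(before, filters))
total_after = sum(layer_response(after, filters))
print(total_before, total_after)
```

With a trained image network in place of the random filters, the patterns that grow in the input are the ones the layer has learned to detect, which is where all the dogs and eyes come from.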

On the literature front, Alex loves James Joyce, so he wanted to see what Deep Learning would do if he gave it Ulysses and said, “Write me something.” You can read about how he did this (so you can do it yourself…I am) on his blog here. You’ll need to read this as poetry if you’re going to enjoy it.

Bloom works. Quick! Pollyman. An a lot it was seeming, mide, says, up and the rare borns at
Leopolters! Cilleynan’s face. Childs hell my milk by their
doubt in thy last, unhall sit attracted with source
The door of Kildan
and the followed their stowabout over that of three constant
trousantly Vinisis Henry Doysed and let up to a man with hands in surresses afraid quarts to here over
someware as cup to a whie yellow accept thicks answer to me.

As Alex notes, all that he gave the computer was Ulysses. It didn’t know English at all, yet it’s able to make up words that are fairly English-like and it even gets some grammar right-ish–notice the prepositions.
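To get a feel for how a model that knows nothing but characters can still produce English-like words, here is a much simpler stand-in. To be clear, this is not the neural network Alex trained in Torch: it is a character-level n-gram (Markov) model that just records which character tends to follow each three-character context. The `sample` text is a made-up placeholder (swap in your own text file), and `train` and `generate` are hypothetical helper names.

```python
import random
from collections import defaultdict

def train(text, order=3):
    # Map each run of `order` characters to the characters seen right after it.
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length, order, rng):
    # Repeatedly sample a next character given the last `order` characters.
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training: stop early
            break
        out += rng.choice(choices)
    return out

# Placeholder training text; the model knows nothing beyond these characters.
sample = (
    "the man with the hat went over the water and the woman with the hands "
    "went up the street and then the man answered them there "
)
order = 3
model = train(sample, order)
output = generate(model, sample[:order], 60, order, random.Random(1))
print(output)
```

A recurrent neural network does something far more flexible than this lookup table, but the setup is the same: characters in, one predicted character out, repeat.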

Meanwhile, Andrej Karpathy feeds in Tolstoy and gets:

Pierre aking his soul came to the packs and drove up his father-in-law women.

(Now to synthesize! Read Gretchen McCulloch’s Grammar of Doge.)

Time to get practical. Here are the Deep Learning tools that Alex reviewed for us:

Torch: This is what Alex actually used for his James Joyce project; one of its benefits is a large number of packages so you don’t have to start from zero. Additionally, it is increasingly the tool of choice used for doing deep learning research.

Caffe: Like Torch, Caffe has a large amount of existing work that you can build on, and it is one of the easier tools to use. One of its main strengths is its Model Zoo, where many reference models are already built and pretrained, so if you’re chomping at the bit, you don’t have to wait weeks to train a larger model such as AlexNet.

Theano: This is probably the most sophisticated tool out there, but people tend to find it pretty complicated. However, lots of popular Python projects are being built on top of it and used in places such as Kaggle competitions (check out Keras, Pylearn2, and Lasagne).

GraphLab Create: From a company called Dato, this is also one of the easier tools to use if you’re just getting started. Given a dataset, their toolkit will pick a “sane default” network topology so that you don’t have to build one from scratch.

Since layers are a big part of Deep Learning, we’ll conclude with a picture of Alex presenting to us followed by what two different layers are seeing in the image—the first layer is seeing contours of shapes, while the second is hallucinating all kinds of different more abstract shapes.


Emoji: Why brands should pay attention

21 Jul

Ever since emoji were included in Apple’s iOS in 2011, their use has surged around the world. Today, emoji are not only used in text messages, social media, and email correspondence, but can also be found in literature, ad campaigns, and even the courtroom.

How are brands using emoji today? What are the dangers of omitting emoji when doing text analytics? And, most importantly, why should brands care about these little images in the first place?

Watch this short video to learn a bit about emoji, text analytics, and why brands have been and should be paying attention.

Want to learn more? Check out these blog posts on the grammar of emoji and on the eMomji phenomenon!


Some favorites

Intro to corpus linguistics

Here’s my presentation to Stanford undergrads about corpus linguistics. You’ll find it full of examples and resources. And even some findings. http://www.stanford.edu/~tylers/notes/presentations/IntroductionToCorpusLinguistics.pptx

Chat room corpus

Went hunting around for some chat room corpora today–I thought I’d find tons and tons but really just turned up one resource. But it’s a big one: over 30 billion words across 47,860 English language news groups from Oct 2005 to Jan 2011. Posts that are not in English are pulled out and the people […]

African language corpora

There are over two thousand African languages, spoken (in situ) by 15% of the world’s population. In density of linguistic diversity it is rivaled only by New Guinea (which probably exceeds it to be honest). And yet it is the Electronic Dark Continent. The LRE Map will give you 663 corpora/computational tools on English. But (almost) […]

COCA: What a fantastic source of data!

Intro 425 million words from 1990-2011. I believe that one of the best resources out there for linguists (or anyone interested in language) is the Corpus of Contemporary American English (COCA). Mark Davies has put together a bunch of corpora and put together an easy-to-use interface so you can make sophisticated queries on vast amounts […]

What were the cultural keywords when you were born?

Raymond Williams published a fascinating (and often-cited) book called Keywords (first in the 70s, then an update in the 80s). It’s full of really interesting stuff (my notes are here). But Williams’ words were just sort of the ones he saw flying around and took an interest in. This post gives you something a little more […]
