Poetry v. not-poetry

5 Jun

I’ve been training an artificial intelligence system to write poetry, and this morning I got interested in which little pieces of syntax and semantics preoccupy poets compared with writers of other kinds of text. So I took a heap of poetry and a heap of not-poetry, pulled out the bigrams (two-word phrases) and did some statistics to see what distinguishes poetic writing from non-poetic writing.

Poets are preoccupied by these phrases:

  • Metaphor:
    • like a
    • like the
  • Nature:
    • the sky
    • the sun
    • the wind
    • the moon
    • the earth
    • the dark
    • the river
    • the sea
    • the snow
    • a stone
    • the water
    • the air
  • Self and others:
    • said geryon (this is because there’s a fair amount of Anne Carson in the data and she has a whole book about a character named Geryon)
    • the world 
    • my mother
    • the dead
    • my heart
    • of your
  • Space/time/prepositional phrases:
    • the present
    • in the
    • on my
    • in your
    • in its
    • under the
    • from the
    • in a

Meanwhile, they steer clear of the following phrases, which seem to be better for writing letters, fiction, essays or other kinds of non-fiction. The point of this comparison set was not so much to compare poetry to any particular other genre, but to collect a variety of “not-poetry” to see what poets tend not to use:

  • she had
  • he had
  • had been
  • she was
  • she said
  • it was
  • did not
  • going to
  • that he
  • that she
  • i had
  • there was
  • to her
  • seemed to
  • to do
  • he was
  • (okay, I’m going to stop there)

In other words, poets–or at least these poets–don’t tend to write much in the past tense. They do, however, orient things spatially, as evidenced by all those prepositional phrases. Not surprisingly, the poetry is a lot more personal, with many more I/my/we/you/each other. And of course “like a” and “like the” are prominent, since it’s hard to resist a metaphor.

Notice that there are a lot of definite articles in the poetry. I think that’s mostly in service of talking about nature and poetical things, though there are twice as many “the” phrases in the poetry camp as in the non-poetry camp. Those that are non-poetic and reasonably frequent in both lists are “the most”, “the hospital”, “the baby”, “the american”, “the time”, “the fact”, and “the country”.

If you’re curious about some major phrases where there is no difference, let me give you a sample of those: “and a”, “against the”, “as it”, and “you do” are all examples of phrases that are used about equally in poetic and non-poetic writing.

Methods and data

I took 75,678 lines of poetry (537,711 words) and compared them to 15,000 paragraphs of fiction and non-fiction (534,723 words). If you’re curious about methodology, you can read more about it here and here.
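
For readers who want to try something similar, here is a minimal sketch of one way to compare bigrams across two heaps of text. It is not necessarily the method used for the lists above: the tokenizer, the function names (`bigrams`, `count_bigrams`, `log_odds`), and the choice of a smoothed log-odds ratio as the statistic are all assumptions made for illustration. Each line or paragraph is treated separately, so no bigram spans a line break.

```python
import math
import re
from collections import Counter


def bigrams(text):
    """Lowercase the text, keep runs of letters/apostrophes, pair adjacent words."""
    words = re.findall(r"[a-z'’]+", text.lower())
    return list(zip(words, words[1:]))


def count_bigrams(docs):
    """Count bigrams over an iterable of lines or paragraphs."""
    counts = Counter()
    for doc in docs:
        counts.update(bigrams(doc))
    return counts


def log_odds(poetry_counts, other_counts, smoothing=0.5):
    """Smoothed log-odds ratio for every bigram seen in either heap.

    Positive scores lean toward poetry, negative toward not-poetry,
    and scores near zero mean roughly even usage.
    The statistic and the smoothing value are illustrative assumptions.
    """
    n_poetry = sum(poetry_counts.values())
    n_other = sum(other_counts.values())
    scores = {}
    for bg in set(poetry_counts) | set(other_counts):
        p = poetry_counts[bg]
        o = other_counts[bg]
        odds_poetry = (p + smoothing) / (n_poetry - p + smoothing)
        odds_other = (o + smoothing) / (n_other - o + smoothing)
        scores[bg] = math.log(odds_poetry) - math.log(odds_other)
    return scores
```

Sorting the resulting dictionary by score puts the most poetry-leaning bigrams at one end and the most not-poetry-leaning at the other. With heaps this size you would probably also want a minimum-count cutoff, since rare bigrams get noisy scores.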

The poetry sample is 37 texts from 35 authors. By word count, the top authors in the data here are Lorine Niedecker (12.8%), Wisława Szymborska (8.7%), Jane Shore (6.1%), and Anne Carson (5.5%). Szymborska wrote in Polish but here I’ve included her in English–so you may want to say I’ve included her and/or the translators, Stanislaw Baranczak and Clare Cavanagh.

The non-poetry is randomly sampled “lines” (paragraphs) from 41 texts by 32 authors. The biggest amounts come from Joan Didion (12%), Virginia Woolf (7.7%), Penelope Fitzgerald (6.7%), Rainbow Rowell (5.3%), and Louisa May Alcott (5.2%).

What about women who aren’t white?

The authors in the data are all white women. You will be shocked–shocked!–to hear that it’s harder to get collections of poetry by non-white poets who are women. I currently only have eight poetry collections that fit. You’ll grant me that it would be strange to include Phillis Wheatley, who was writing in the 1700s, with a bunch of much more modern writers. But if you think I should go ahead and add in people like Claudia Rankine, Nikki Giovanni, and Maya Angelou, I’m certainly open to that critique.

There’s a lot more data available for non-white novelists who are women, so that’s probably the next step. EXCEPT that one wants to be careful about what they’re lumping together. So I’m unlikely to compare novelists in terms of race unless I have A LOT more data.

That said, diving into the phrases that preoccupy, say, Toni Morrison or Octavia E. Butler compared to other people (or each other!) has some appeal. If you have particular interests or suggestions, I’d be very happy to get them.
