Archive | January, 2012

Daddy’s the little tiger: how parents and kids talk about size

30 Jan


There are two sections depending on your interest–(i) a how-to for using a corpus of parent-child interactions (bottom) and (ii) a case study looking at how parents and their children use “little” (up first).

Little big men

How does little actually get used by little kids? The best data source for this is CHILDES (MacWhinney, 2000), which contains a variety of interactions between children and their parents. In the subset of American English that I investigate, we can see that children use far fewer tokens of little than we would’ve expected at chance. What we’re going to see is that little is actually doing a tremendous amount of positioning for both parents and kids.   

Speaker  Observed  Expected  O/E
Children  2,217  3,327 0.666
Parents  12,371  11,261 1.099
Table 1: Uses of little among American English speakers in the CHILDES corpus are significantly different by age/social role (p = 2.0154E-106 by chi-squared test).
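
If you want to check numbers like these yourself, here is a minimal Python sketch that recomputes the O/E ratios and a chi-squared statistic from the rounded counts in Table 1. It assumes a plain goodness-of-fit test with one degree of freedom, and the function names are just illustrative, so treat it as a sanity check rather than the exact procedure behind the table.

```python
import math

# Observed and expected token counts of "little", copied from Table 1.
counts = {
    "Children": (2217, 3327),
    "Parents": (12371, 11261),
}

def observed_over_expected(observed, expected):
    """Ratio above 1 means overuse, below 1 means underuse."""
    return observed / expected

def chi_squared_statistic(cells):
    """Goodness-of-fit statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in cells)

def p_value_one_df(statistic):
    """Upper-tail chi-squared probability; valid only for 1 degree of freedom."""
    return math.erfc(math.sqrt(statistic / 2))

ratios = {who: observed_over_expected(o, e) for who, (o, e) in counts.items()}
statistic = chi_squared_statistic(counts.values())
p_value = p_value_one_df(statistic)

for who, ratio in ratios.items():
    print(f"{who}: O/E = {ratio:.3f}")
print(f"chi-squared = {statistic:.1f}, p = {p_value:.3g}")
```

Because the expected counts in the table are rounded to whole tokens, the p-value this prints will not match the reported one digit for digit, though it should land in the same neighborhood.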

Looking at this data, one might just ask whether kids avoid little because it’s hard for them to say. In Figure 1, a simpler word like big does seem to be acquired and used earlier than little, but little clearly is used by most age groups except for some of the youngest. [Addendum 1/31/12: Adriana Weisleder helpfully suggests using an online tool with norms from Dale and Fenson (1996), which reports when parents say kids are using these words. We see that big is used by over 50% of children by 21 months, while little takes until 25 months to clear the 50% mark.]

Figure 1: Kids’ percentage of big/little (tokens of each word divided by total tokens) per 3-month age group. The numbers in this chapter come from CLAN searches across 4,676 American English transcripts, but the data for this particular graph comes from the very handy online search tool provided by Baath (2010).

Let’s look at parents and children in terms of gender since gender is so prominent in parent-child interactions.

Speaker  Observed  Expected  O/E
Mothers  13,781  12,307 1.120
Fathers  2,319  2,241 1.035
Boys  2,267  3,352 0.676
Girls  1,789  2,256 0.793
Table 2: Uses of little by parent and child gender. Significantly different (p = 1.46E-135 by chi-squared test).

It appears that the extreme cells are mothers using little and boys avoiding it. But once we restrict ourselves to one-on-one interactions, we can see that this effect is probably dominated by mothers using little, especially when talking to their daughters, and that boys’ avoidance of little is even greater with their fathers than with their mothers.

Speaker-to-addressee  Observed  Expected  O/E
Mothers-to-boys  4,313  4,158 1.037
Fathers-to-boys  1,516  1,381 1.098
Mothers-to-girls  6,312  5,441 1.160
Fathers-to-girls  230  281 0.819
Girls-to-mothers  1,221  1,533 0.796
Girls-to-fathers  4  3 1.482
Boys-to-mothers  875  1,526 0.573
Boys-to-fathers  117  265 0.441
Table 3: Differences in little use in CHILDES by parent/child gender.

Let’s take a look at some of the mothers talking to their daughters. When Lily is pretty young—a year and a few months, her mother is using little a lot in naming items. It’s also an easy go-to word:

138      *MOT: is that a little bag for mommy ?

139      *MOT: that’s a little bag for mommy !

140      *MOT: mommy’s little bag .

141      *MOT: wee !

By the time Lily is over 2 and a half, she’s talking herself. Her mother is still talking about the little Buddha, your little guy, your little cell phone, the little house, your little picture, that little lizard, that little bug. She talks about it being a little hard to get down, and while Lily is drawing, she asks her to add a little more orange and a little bit more red (twice). In this one conversation her mother uses little 46 times. Lily only uses it twice. The first time is to talk about it being a little bit sunny. Lily’s second use is more affective in nature, and in fact it’s prompted by her mother recalling an incident and offering an emotional interpretation, one that Lily rejects and replaces. (Her mother then reframes it again to reiterate the minimization of the emotion.)

1621    *MOT: ow did you hurt yourself?

1622    *MOT: &aw we forgot to tell Daddy that you slipped and fell on a wet floor

1623                today.

1624    *CHI:   yeah.

1626    *MOT: yeah that was a little sad huh ?

1627    *CHI:   that was a little scary.

1629    *MOT: that was just a little bit scary but you’re okay right?

Little is not a common way for Lily and her mom to talk about sadness, but from this age on, little and scary are linked—in particular, they are linked to minimize fear (up to this point, Lily’s mother has been talking about scary things without calling them little):

1288    *MOT: that’s the ghost of his father, of king Mufasa.

1289    *CHI:   (be)cause he’s scary.

1291    *MOT: yeah, well, scary, but he’s still nice.

1292    *MOT: he looks a little scary but he’s still very nice.

This kind of emotional understanding is also happening in book reading—here line 306 is part of a story read when Lily is close to turning three. Lily’s mother is asking for comprehension and attention, but she’s also adding in little to minimize even a fictional character’s fear.

306      *MOT: I think it’s scary said Franklin .

307      *MOT: is Franklin a lil [: little] scared?

308      *CHI:   yeah.

Later in that same conversation, it functions in a more direct reframing of Lily’s feelings (notice in these last examples how the informal lil can serve to further emphasize a kind of casualness to the object-of-fear).

1692    *CHI:   what is that?

1694    *MOT: um that’s a big snake.

1695    *CHI:   he’s scary.

1697    *MOT: he’s a lil [: little] bit scary.

But parents are not always doing reframing of feelings. Ross’s dad is one of the bigger users of little and while he does talk to Ross about scary and sad things, he never uses these terms with little or anything similar. He uses little in quite a different way. Let’s take a look at an example between Ross and his father, keeping in mind the broader generalization that boys avoid little in interactions with their dads.

1401    *FAT:  is Daddy a little tiger?

1404    *CHI:   a little tiger.

When you read through the CHILDES transcripts, you get a sense of how much prompting the parents do, as in this example. What I haven’t shown you is that this discussion of little tigers begins several turns earlier: Ross’s father is asking him the question in 1401 because Ross has already made some other claims. The notion of size is still introduced by Ross’s father, but Ross subverts his father’s plan. Ross is not a little tiger:

1386    *FAT:  are you a little tiger?

1389    *FAT:  what are you?

1392    *CHI:   &li [//] a big tiger.

1395    *FAT:  you a big tiger?

1398    *CHI:   yeah.

Part of what makes this delightful, of course, is the unpredictability of the positioning in that moment. Tigers are kind of big relative to anyone, especially a young boy. There’s no reason to treat Ross’s father’s speech in 1386 as anything other than affectionate, but Ross rejects the position because it entails a size that he won’t sign on for. The surprise of Ross standing up and insisting on being a big tiger gets his father asking for a repetition and playing with Ross about his own size.

Here’s Ross and his father two months later, talking about what Ross’s mom will get at the store. Ross wants a Spiderman shirt and says he’ll buy it himself. His father talks about what Ross’s little brother might get.

174      *FAT:  Marky would like a little one too.

178      *FAT:  could Marky get a little one.

181      *CHI:   little one.

184      *FAT:  mommy’s gonna buy a little one for Marky.

189      *CHI:   and a big one.

192      *CHI:   Spiderman is big yeah (.) yeah.

195      *FAT:  yeah a big one for you.

198      *CHI:   and a little one.

201      *FAT:  a little one for who?

204      *CHI:   little one for Mark.

207      *FAT:  and +…

210      *CHI:   big one for me.

Parents seem to really like these interactions that have to do with claims of being big. Certainly the parents prompt the size conversations and the repetitions of the replies that the children give.

Part of what leads the kids to avoid little is the imperative to mature. This is felt by boys and girls, although it seems to come across more clearly in the boys’ speech. Let’s look at big for a comparison. It becomes clear that the strongest effects are among the males—the fathers are the most constrained from using big while the boys are disproportionately likely to use it.

Speaker  Observed  Expected  O/E
Mothers  7,099  7,828  0.907
Fathers  1,229  1,425  0.862
Boys  2,832  2,132  1.328
Girls  1,660  1,435  1.157
Table 4: The use of big in CHILDES.

We know that children acquire the word little from their parents. But we also see that they are quite sensitive to how it’s used. It’s used in part to show the tiny-ness of the world, but it’s also used to help reframe children’s inner life and it’s part of how the children are themselves understood by the parents. So much so that it is delightful to watch them grow. In fact, asking children to be a big boy/girl is probably part of the understanding children have about who they are, who they aren’t, and who they could become. Little is used on children to shape their worlds, their feelings, and them. But we see children resist this. We also see how gender plays a role in discussions of size—and metaphorical extensions, as well. Here the people with the power—the adults—use little a lot, but that doesn’t always mean they win out. Children have their own alternate discourse of size, which they make known to us through their use and disuse.


The chief resource for parent-child interactions is CHILDES (MacWhinney, 2000), which you can browse online at:

Though the standard way to do this is to download the data and then download and use the CLAN program.

CHILDES will get you videos and transcripts (with phonetic annotations) of thousands of parent-child sessions. The bulk of the data is in English, but researchers have also made available data in Irish, Welsh, Cantonese, Mandarin, Indonesian, Japanese, Korean, Thai, Afrikaans, Danish, Dutch, German, Norwegian, Swedish, Estonian, Basque, Farsi/Persian, Greek, Hebrew, Hungarian, Sesotho, Tamil, Turkish, Catalan, French, Italian, Portuguese, Romanian, Spanish, Croatian, Polish, Russian, and Slovenian.

If you’re looking for “tried and true” longitudinal American English corpora, Marisa Tice says try Adam, Eve, and Sarah from the Brown folder, Nina from Suppes, and Peter from Bloom.

Whether you do it in your browser or through your CLAN download, you’re still going to need to know CLAN commands. Here are two examples that I have found illustrative. The CLAN manual has a lot of good stuff and a Google search will turn up even more.

When I went looking for examples of “little”, I used the kwal command:

kwal +slittle -w2 +w2 +t*CHI *.cha

This searches for all instances of children saying “little” in whatever folder I’m viewing. And it shows me the two lines before and after.

If I wanted to get mothers, I’d use +t*MOT, and for fathers, +t*FAT. You can also just leave that off to get matches from everyone.
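
If you'd rather poke at a transcript outside of CLAN, the same idea can be roughly sketched in Python. This is not CLAN and does not handle the full CHAT format (continuation lines, dependent tiers, and so on); the function names, the `speaker` and `window` parameters, and the simplified one-utterance-per-line layout are all assumptions made for illustration. The sample lines are retyped from the Ross excerpt above.

```python
import re

def is_target(line, speaker):
    """True if the line is a main-tier utterance by the wanted speaker.
    With speaker=None, any main-tier line (starting with '*') counts."""
    if speaker is None:
        return line.startswith("*")
    return line.startswith(speaker + ":")

def keyword_in_context(lines, keyword, speaker="*CHI", window=2):
    """Return (line index, context lines) for each matching utterance,
    with up to `window` lines of context on either side."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
    hits = []
    for i, line in enumerate(lines):
        if is_target(line, speaker) and pattern.search(line):
            start = max(0, i - window)
            end = min(len(lines), i + window + 1)
            hits.append((i, lines[start:end]))
    return hits

# Simplified, hypothetical layout: one utterance per line, speaker code first.
transcript = [
    "*FAT: are you a little tiger?",
    "*FAT: what are you?",
    "*CHI: &li [//] a big tiger.",
    "*FAT: you a big tiger?",
    "*CHI: yeah.",
    "*FAT: is Daddy a little tiger?",
    "*CHI: a little tiger.",
]

child_hits = keyword_in_context(transcript, "little", speaker="*CHI")
father_hits = keyword_in_context(transcript, "little", speaker="*FAT")
everyone_hits = keyword_in_context(transcript, "little", speaker=None)

for index, context in child_hits:
    print(f"match at line {index}:")
    for row in context:
        print("   ", row)
```

Passing `speaker=None` plays the role of leaving the +t option off, so you get matches from everyone.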

If you’re trying to get word or morpheme frequency, you can get both types and tokens by typing:

freq @ +t*CHI +t%mor +r5 +u > mor_freq.cex

That’s going to give you a file (“mor_freq.cex”) that lists each word the child says, broken down into morphemes. The +r5 means it only includes actual utterances and no bracketed interpretations (such as the “[: little]” in “lil [: little]”), and the +u combines search results from multiple .cha files into one file. Note that it’s the +t%mor that restricts it to the “morpheme tier”; at the top of each *.cha file you can see which tiers it has. There’s the %sit tier for situations, %eng for English translations, %int tier for intonation, etc. (More examples of tiers and CLAN commands here.)
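
To make the type/token distinction concrete, here is a small Python sketch that counts whole words for one speaker. It is only loosely in the spirit of the freq command: it does not read the %mor tier or split words into morphemes, it simply drops bracketed annotations so that the spoken form (lil) is counted rather than the interpretation (little). The function names and the simplified transcript layout are assumptions for illustration.

```python
import re
from collections import Counter

def spoken_words(utterance):
    """Tokenize one utterance, keeping spoken forms and dropping
    bracketed annotations such as "[: little]" and "&"-marked fragments."""
    text = re.sub(r"\[[^\]]*\]", " ", utterance)   # remove [ ... ] annotations
    text = re.sub(r"&\S+", " ", text)              # remove fragments like &li
    text = text.replace("(", "").replace(")", "")  # "(be)cause" -> "because"
    return re.findall(r"[a-z']+", text.lower())

def word_frequencies(lines, speaker="*CHI"):
    """Count word tokens on the given speaker's main tier."""
    counts = Counter()
    for line in lines:
        if line.startswith(speaker + ":"):
            utterance = line.split(":", 1)[1]
            counts.update(spoken_words(utterance))
    return counts

# Simplified, hypothetical layout, retyped from the snake excerpt above.
sample = [
    "*CHI: what is that?",
    "*MOT: um that's a big snake.",
    "*CHI: he's scary.",
    "*MOT: he's a lil [: little] bit scary.",
]

mother_counts = word_frequencies(sample, speaker="*MOT")
tokens = sum(mother_counts.values())  # every occurrence counts
types = len(mother_counts)            # each distinct word counts once
print(f"tokens = {tokens}, types = {types}")
print(mother_counts.most_common(3))
```

Tokens are all the words that get said; types are the distinct words among them, so a repeated word like "a" adds to the token count but not to the type count.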

If you just want words from children, though, your best bet is to use the web interface for ChildFREQ. It provides you gender-splittable info for the kids in terms of either their age or their “mean length of utterance”. For lexical inquiries, it’s great.

Now, chi-squared tests are easy to perform when the cells are big. The normal rule of thumb is that if an “expected” cell has a value of “5 or fewer”, you really ought to do Fisher’s exact test. That used to be a pain but now it’s pretty easy. I like this tool provided by Microsoft Research (run it online or download it).
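
If you'd like to see what Fisher's exact test is doing under the hood, here is a minimal Python sketch for a 2x2 table using only the standard library. It is not the tool mentioned above; the function name is made up, the two-sided rule used (add up every table that is no more likely than the observed one) is one common convention, and the counts in the example are invented for illustration, not taken from CHILDES.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denominator = comb(total, col1)

    def probability(x):
        # Probability that the top-left cell equals x, given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)    # smallest feasible top-left value
    highest = min(row1, col1)       # largest feasible top-left value
    observed = probability(a)
    tolerance = 1e-9                # guards against floating-point ties
    return sum(
        p
        for p in (probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + tolerance)
    )

# Hypothetical counts, invented for illustration only (not CHILDES data).
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(f"two-sided p = {p_value:.4f}")
```

For this tiny table the sum works out to 34/70, which shows why small cells need an exact test: with so few observations, even a lopsided-looking table is nowhere near significant.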

[Addendum 1/31/2012: I mention this website with the Dale and Fenson (1996) norms (thanks again, Adriana!). It has data for both English and Spanish. In the English, you get 384 words reported by parents with kids age 8-16 months (“infants”) and 653 words from age 16-30 months (“toddlers”). These are also broken into classes like “toys” and “furniture/rooms” and “articles”. Descriptive words like little and big are used relatively late compared to, say, body parts. But well before things like time-related words or helping verbs.]

Sentiment corpus

25 Jan

Found this and thought I’d pass it along to folks interested in sentiment/opinion/emotion research:

If you’re at an academic institution, they’ll give you access to a variety of things tagged with sentiments. What you’ll get is a tag that is the average from three human beings. That’s not really a lot, though that is typical in the field right now. Each has a 1-5 positive strength score and a separate 1-5 negative strength score.

  • BBC News forum posts: 2,594,745 comments from selected BBC News forums and > 1,000 human classified sentiment strengths.
  • Digg post comments: 1,646,153 comments on Digg posts (typically highlighting news or technology stories) and > 1,000 human classified sentiment strengths.
  • MySpace (social network site) comments: six sets of systematic samples (3 for the US and 3 for the UK) of all comments exchanged between pairs of friends (about 350 pairs for each UK sample and about 3,500 pairs for each US sample) from a total of >100,000 members and > 1,000 human classified sentiment strengths.