Linguists had a big conference in Boston this past weekend and they got together to vote “#hashtag” as the 2012 Word of the Year. (My own Twitter pleas for “Honey Boo Boo” as WOTY went unheeded.)
This post is a quick summary of what went on on Twitter at the big conferences for the Linguistic Society of America (LSA) and the Modern Language Association (MLA). At the bottom of the post, I also show how to grab this kind of data using the twitteR package for R.
For the data, I restrict myself to everything that was marked #lsa2013 or #mla13. (I’m doing a sleight of hand here: the WOTY was part of the American Dialect Society, which runs its conference alongside the LSA.)
From Jan 3 to Jan 8, there were 872 tweets with #lsa2013 (I’m including RTs in this number; drop anything with any “RT” and you’re down to 584 tweets). After removing common words, here’s the word cloud.
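If you want to do the with-and-without-RTs count yourself, here is a minimal sketch in base R. The `tweets` vector is made-up sample text, not the conference data, and matching “RT” as a whole word is my assumption about what “anything with any RT” should mean.

```r
# Hypothetical sample tweets (not the real conference data)
tweets <- c(
  "Off to the plenary #lsa2013",
  "RT @someone: Off to the plenary #lsa2013",
  "Nice point (RT @other: vowels!) #lsa2013"
)

# Flag anything containing "RT" as a whole word, wherever it appears
is_rt <- grepl("\\bRT\\b", tweets, perl = TRUE)

n_all      <- length(tweets)  # RTs included
n_original <- sum(!is_rt)     # RTs dropped
```

This is deliberately crude: it also throws out tweets that merely quote a retweet, which matches the “drop anything” rule described above.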
Since linguists liked hashtags for WOTY, how many did they use? Holding aside “#lsa2013” (since it is defined to be part of every tweet I’m looking at), there were 75 different hashtags, used a total of 305 times. The most popular, as you can see in a stripped form in the word cloud, were #ads2013, #woty12, #ads, and #woty2012 (most of these were people tagging tweets about the WOTY vote with multiple tags). Also fairly popular: #mla13, the tag for the Modern Language Association’s annual meeting, which was happening at the same time, also in Boston.
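Counting hashtags like this can be sketched in a few lines of base R. The sample tweets are invented, and the regular expression is my assumption about what counts as a hashtag (a “#” followed by letters, digits, or underscores).

```r
# Hypothetical sample tweets (not the real conference data)
tweets <- c(
  "Voting now! #lsa2013 #woty12 #ads2013",
  "Great plenary on variation #lsa2013",
  "RT @someone: hashtag wins #lsa2013 #WOTY12"
)

# Pull out every hashtag, lowercased so #WOTY12 and #woty12 count together
tags <- tolower(unlist(regmatches(tweets, gregexpr("#[[:alnum:]_]+", tweets))))

# Hold aside the tag that defines the data set
tags <- tags[tags != "#lsa2013"]

hashtag_counts <- sort(table(tags), decreasing = TRUE)
n_types  <- length(hashtag_counts)  # different hashtags
n_tokens <- sum(hashtag_counts)     # total uses
```

On the real data, `n_types` and `n_tokens` would be the two numbers reported above, give or take differences in how the hashtags get matched.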
Linguists did tweet about findings in presentations they were giving/watching, but not really all that many and there isn’t really enough activity in any particular topic area/sub-discipline to puff anything in the word cloud up. Plenaries do get the most live tweeting, as you’d expect. If “variation” or “rickford” catch your eye, see my summary of NWAV 41 from November, which is the big sociolinguistics conference in the US.
Our colleagues in the MLA were a bit more reporterly and critiquey. That is, qualitatively, I think they had a lot of really interesting conversations and observations. The LSA tweets were more independent of each other.
The MLA folks are also a lot more garrulous–their conference was longer but they’d still win if you restricted to even just a day of data. Here, I’m reporting data from the first 1,580 tweets per day from Jan-3 to today (why this restriction? Check out the mini-tutorial below). There were 5,664 #mla13 tweets–3,774 if we remove retweets marked with “RT”…note that this means that both conferences were similarly retweety–about 33% of tweets at both conferences involved retweeting.
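The “about 33%” figure is just arithmetic on the counts reported in this post; here it is spelled out in R so you can check it.

```r
# Counts reported in this post: all tweets, and tweets left after
# dropping anything marked with "RT"
total    <- c(lsa = 872, mla = 5664)
original <- c(lsa = 584, mla = 3774)

# Share of tweets that involved retweeting, in percent
rt_share <- round(100 * (total - original) / total, 1)
print(rt_share)
```

Both shares come out between 33% and 33.5%.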
Notice that the MLA folks had a strong convention to mark the session they were in (that’s the big #s112, etc). They had 17 sessions hashtagged more than 50 times. Overall, they had 472 different hashtags, used a total of 4,870 times.
In corpus linguistics it’s useful to distinguish “types” (like individual words) from “tokens” (uses of those types). The ratio of hashtag TYPES to tweets is similar for LSA and MLA people, but the MLA folks are using theirs a lot more. (Again, I think this has to do with the fact that the MLA folks were consistently labeling their sessions and doing more conversational stuff than the linguists.)
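To make the type/token comparison concrete, here is the arithmetic in R using the counts from this post. One caveat: the LSA counts hold aside #lsa2013, and I’m assuming the MLA counts treat #mla13 the same way.

```r
# Figures reported in this post
tweets <- c(lsa = 872, mla = 5664)
types  <- c(lsa = 75,  mla = 472)   # different hashtags
tokens <- c(lsa = 305, mla = 4870)  # total hashtag uses

types_per_tweet  <- round(types / tweets, 3)
tokens_per_tweet <- round(tokens / tweets, 2)
uses_per_type    <- round(tokens / types, 1)

print(rbind(types_per_tweet, tokens_per_tweet, uses_per_type))
```

The types-per-tweet ratio lands between 0.08 and 0.09 for both groups, while tokens per tweet is more than twice as high for the MLA.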
The biggest other hashtags for the MLA folks were #altac (the alternative academic movement), #nerduendos (sexual innuendos by nerds) and #elit (electronic literature, like these love letters). On the LSA side, I do wish that @dsbigham’s “Boston is Burning” hashtag had caught on: #sweatervestrealness.
Btw, even though badges to one conference got people into the other, there really were only 14 tweets that had both tags. Maybe there were people going back and forth, but they weren’t tweeting about it. #MissedOpportunities
Finally, a little bit about the “who”. 153 different people used the #lsa2013 tag. The most prolific was @sociolx, who in real life is David Bowie, but the linguist-who-lives-in-Alaska-and-presented-about-how-young-Alaskans-have-vowels-like-coastal-Californians, though obviously he is often forced to live in the shadow of the-happy-66th-birthday-singer-actor-Goblin-king. Bowie had over 3 times as many tweets as the next person.
Here are all the linguists with 20+ tweets (go follow them if you haven’t already).
Over in the MLA, there were 1,024 different people using the #mla13 tag. Here are the ones that had 50+; go follow them, too.
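Per-person counts like these fall out of a simple table once the tweets are in a data frame. This is a rough sketch on invented data; the `screenName` column name is an assumption about how your data frame is laid out.

```r
# Hypothetical stand-in for the data frame of tweets
tweets <- data.frame(
  screenName = c("user_a", "user_b", "user_a", "user_c", "user_a", "user_b"),
  stringsAsFactors = FALSE
)

# Tweets per person, most prolific first
per_user <- sort(table(tweets$screenName), decreasing = TRUE)

n_users  <- length(per_user)         # different people using the tag
prolific <- per_user[per_user >= 2]  # the cutoff would be 20 or 50 on the real data
```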
How to do this yourself
Initially, I just grabbed the LSA tweets by going to Twitter’s web search, searching for “All” (not Top) and then copying and pasting. That’s a kind of dumb way to do things but I didn’t mind. But that doesn’t work for all the bazillions of MLA tweets. So I used the twitteR package for R.
- Open R
- Install packages that you need (twitteR, ROAuth, digest, bitops, RCurl, rjson)
- Load the packages mentioned above, e.g., “library(twitteR)”
- Now, you can use OAuth to get some fancier capabilities. That’s going to involve going to Twitter and registering as a Developer. I ran into a dumb certificate problem, so I ended up not doing that. That’s why for the MLA I don’t have “all” the data. But since this is just meant to be a quick project, I decided to let it be.
- Now we start the actual searching through the Twitter API.
mlaJan2to3 <- searchTwitter("#mla13", n=1580, since="2013-01-02", until="2013-01-03")
Do this for each date (the LSA data is small enough that you don’t have to specify an “until” date and you can still get everything).
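If typing out one search per day gets tedious, the since/until pairs can be generated instead. This is a sketch, not what I actually ran: the date range is an assumption, and `fetch_day` is a hypothetical placeholder so the example runs without touching the Twitter API.

```r
# One entry per day: each search runs from `since` up to `until`
days  <- seq(as.Date("2013-01-02"), as.Date("2013-01-07"), by = "day")
since <- format(days)
until <- format(days + 1)

# Hypothetical helper: swap the body for a real call once twitteR is loaded, e.g.
#   searchTwitter("#mla13", n = 1580, since = s, until = u)
fetch_day <- function(s, u) paste("would search #mla13 from", s, "to", u)

# Map() walks the two vectors in parallel and returns a list, one entry per day
daily <- Map(fetch_day, since, until)
names(daily) <- since
```

With the real search call in place, `daily` would hold one set of results per day, named by its start date.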
Now you want to turn this into a “data frame” so it’s easier to deal with.
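One way to do the conversion, sketched under assumptions: twitteR ships a helper, `twListToDF()`, for exactly this. Since that needs live search results, the runnable part below shows the same row-stacking idea in base R on a hypothetical stand-in list.

```r
# Hypothetical stand-in for search results: one small list per tweet
results <- list(
  list(text = "Session was great #mla13 #s112", screenName = "user_a"),
  list(text = "RT @user_a: Session was great #mla13", screenName = "user_b")
)

# Turn each element into a one-row data frame, then stack the rows
mla1 <- do.call(rbind, lapply(results, function(x) {
  as.data.frame(x, stringsAsFactors = FALSE)
}))

# With real twitteR results, the package helper would be:
#   mla1 <- twListToDF(mlaJan2to3)
```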
And do this for each one.
Now combine them
mla<-rbind(mla1, mla2, mla3, mla4, mla5)
And write it as a table. If you like Excel, it’ll be easiest if you use some sort of separator like “|”, which I do here. Regardless, there will be some clean-up you’re going to have to do.
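A minimal sketch of the export step, using base R’s `write.table()`. The `mla` data frame here is a tiny invented stand-in, and the file path is a placeholder in a temp directory.

```r
# Hypothetical stand-in for the combined data frame
mla <- data.frame(
  screenName = c("user_a", "user_b"),
  text = c("Loved this panel, really #mla13", "So many sessions #mla13 #s112"),
  stringsAsFactors = FALSE
)

# Placeholder path; point this at wherever you want the file
out_file <- file.path(tempdir(), "mla13_tweets.txt")

# "|" rarely shows up in tweets, so it splits columns more safely than a comma
write.table(mla, file = out_file, sep = "|", row.names = FALSE, quote = FALSE)

# Read it back to check the round trip
check <- read.table(out_file, sep = "|", header = TRUE, quote = "",
                    comment.char = "", stringsAsFactors = FALSE)
```

Real tweets can contain line breaks or even a stray “|”, which is where the clean-up comes in.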
(Personally, I’d create one file for all the MLA stuff and one file for all the LSA stuff.)