Mining my messages

Hit and Miss #126

Somehow it’s February already??

This morning, a friend and I did a site visit for a course on Indigenous histories and public discourse in Ottawa. (We went to the building formerly known as Langevin Block, facing Parliament Hill. It was renamed in 2017, with coverage from historians and the media alike, on the basis of a murky historical justification. Happy to give you more background if ever you’re curious.) It was snowy and our feet got cold, so now I’m comfortably ensconced in my living room, listening to Riding with the King by B.B. King and Eric Clapton.

Earlier this week, I saw some threads going around on Twitter about exporting Facebook archives.

Facebook’s had an export feature for a few years, but the experience and output seem to have improved considerably in the last year or two. (We probably have the EU and GDPR to thank for that.) You go to a settings page, choose what you want to download, and Facebook notifies you when your download is ready.

And whew, is it ever a download.

Downloading everything Facebook has about me (all posts, messages, photos, settings, and so on), with photos at medium quality, worked out to a 1.98 GB zip. Unzipping bumped that up to 2.19 GB. Of those 2.19 GB, 2.15 GB (98%) was from my messages (including photos, videos, and attachments shared in message threads)—which confirms my longstanding relationship with Facebook as “the place where I can message most people I know”.

My messages contain much of my life’s documentation over the last decade. Friendships and relationships alike are documented there, so many of their ups and downs, countless memories sitting inert, captured in text and photo. (There are entire friendships I’d almost forgotten about, making up far more of my message archive than they do my active memory.) Of course, this is just what’s documented online. But there are also hints within those messages of life off the screens, of get-togethers and dates and working sessions and more.

There’s huge potential for analysis with a corpus like this. My message archive counts almost 600,000 messages with over 500 different people, and more than 4.5 million words exchanged. For example, I can tell you that 45% of those messages (containing 47% of those words) are from me. My most-used word was “I” (used 65,000+ times), or, if we remove the most common English words, “just” (16,000+).
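For the curious, here is a minimal sketch of how counts like these could be computed in Python. It is an illustration under assumptions, not a record of how the numbers above were produced: it assumes each message is a dict with a `sender_name` and an optional `content` field, and the sample messages, the names, and the tiny stopword list are all made up.

```python
import re
from collections import Counter

# Hypothetical sample data; the field names are an assumption about
# the shape of an exported message archive.
messages = [
    {"sender_name": "Lucas", "content": "I just got home"},
    {"sender_name": "Sam", "content": "Nice, I am just leaving"},
    {"sender_name": "Lucas", "content": "See you soon, I think"},
    {"sender_name": "Sam"},  # e.g. a photo with no text
]

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"i", "am", "you", "the", "a", "and", "to"}


def words(text):
    """Split text into lowercase words (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def summarize(messages, me):
    """Return my share of messages and words, plus the most used words."""
    counts = Counter()
    my_messages = 0
    my_words = 0
    for message in messages:
        found = words(message.get("content", ""))
        counts.update(found)
        if message["sender_name"] == me:
            my_messages += 1
            my_words += len(found)
    # Same counts, with the common words removed.
    filtered = Counter({w: c for w, c in counts.items() if w not in STOPWORDS})
    return {
        "message_share": my_messages / len(messages),
        "word_share": my_words / sum(counts.values()),
        "top_word": counts.most_common(1)[0][0],
        "top_word_filtered": filtered.most_common(1)[0][0],
    }


stats = summarize(messages, "Lucas")
```

Which word comes out on top once the common ones are removed depends entirely on the stopword list you choose, so that second figure is worth reading with the list in mind.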

Thursday night, I poked around my most active group chat: five people, almost 80,000 messages. Looking just at the timestamp and sender of each message, I could quantify trends that we felt or knew intuitively, like who was most active at which hours, or who responds most quickly. (I “own” 6 and 7 a.m. with that group, and am middle of the pack on responsiveness.)
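A rough sketch of that kind of metadata-only analysis, using nothing but sender and timestamp, might look like the following. Everything specific here is an assumption for illustration: the sample chat and names are invented, timestamps are taken to be milliseconds since the Unix epoch, hours are read in UTC rather than local time, and a "response" is defined as any message that follows a message from a different sender.

```python
from collections import Counter, defaultdict
from datetime import datetime, timezone
from statistics import median


def ts(hour, minute):
    """Build a millisecond timestamp on an arbitrary sample day (UTC)."""
    moment = datetime(2020, 2, 1, hour, minute, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


# Hypothetical sample chat: (sender, timestamp in ms). No message content.
chat = [
    ("Lucas", ts(6, 0)),
    ("Sam", ts(6, 10)),
    ("Lucas", ts(6, 12)),
    ("Lucas", ts(7, 5)),
    ("Alex", ts(9, 0)),
    ("Sam", ts(9, 1)),
    ("Alex", ts(9, 4)),
]


def owners_by_hour(chat):
    """For each hour of the day, find the sender with the most messages."""
    per_hour = defaultdict(Counter)
    for sender, t in chat:
        hour = datetime.fromtimestamp(t / 1000, tz=timezone.utc).hour
        per_hour[hour][sender] += 1
    return {h: c.most_common(1)[0][0] for h, c in sorted(per_hour.items())}


def median_response_seconds(chat):
    """Median gap, per sender, between someone else's message and their reply."""
    ordered = sorted(chat, key=lambda m: m[1])
    gaps = defaultdict(list)
    for (prev_sender, prev_t), (sender, t) in zip(ordered, ordered[1:]):
        if sender != prev_sender:  # skip people following up on themselves
            gaps[sender].append((t - prev_t) / 1000)
    return {sender: median(g) for sender, g in gaps.items()}


owners = owners_by_hour(chat)
response = median_response_seconds(chat)
```

The definition of a response does a lot of work here: an overnight gap counts as one very slow reply, which is one reason the sketch uses a median instead of a mean.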

Working with these data brought a few thoughts to mind:

  • I finally felt the power of metadata, something that was big in the news in the Snowden NSA years, or more recently, with the NYTimes’ pieces on location data. From just a few fields, not looking at message content, I could infer plenty.
  • There was also an element of “being observed”: as I chatted with the group that night (discussing my findings, naturally), I was conscious of how my messages would affect stats: how many messages was I sending, what words was I using, how quickly was I responding, and so on. But that was just tied to how I had set up the dataset for analysis—if someone else were to study it, they might approach it with different methods, and my messages would have different effects.
  • This leads directly to my last point: these are historical records, not created to document my life, but doing so as a natural side effect. Then I got to wondering about what I might want to document in messages (or elsewhere) to more intentionally create an archive for the future. (For me, for others, who knows.)

And then from all these points I reflected on the documents or artefacts we use to reconstruct history—and how flawed or incomplete they can be. Facebook doesn’t hold all my digital conversations, but it holds many of them—for some periods of my life, or with some groups of friends, the coverage is very high.

In the last few years, I feel my messaging sites have fragmented considerably. In addition to Facebook, there’s WhatsApp, Twitter, Instagram, Slack, and more. But there’s always been fragmentation: Skype, Google chat, SMS, and others used to figure much more prominently among my platforms. I haven’t downloaded all those archives, though the auto-archivist part of me feels the need to.

The one last feeling I had was how strange it was to go through these messages as if they were just my history. Because they’re not: these messages are the record of many shared histories, those of me and my many interlocutors. Should I have access to what all those people shared with me over the years? Is it “my” data to walk away with? I don’t have great answers to these questions, beyond noting that the answer may not be an automatic yes. (And let’s not even get into all this information sitting in corporate hands.)

I’ll leave you here for now. I’m curious: what would you like to know if you could poke through your entire message archive? What questions do you have about your messages? What are your thoughts or reactions about some of these themes?

All the best for the week ahead.

Lucas