eDiscovery Daily Blog

Twitter Might “Bug” You if You Want to Retrieve Archive Data – eDiscovery Best Practices


Thanks to the Google Alerts I set up to send me new stories related to eDiscovery, I found an interesting blog post from an attorney that appears to shed light on an archive bug within Twitter, one that could affect anyone who needs to retrieve Twitter archive data for eDiscovery purposes.

In his latest blog post (Twitter Bug Makes Tweet Archives Unreliable For eDiscovery), Erik J. Heels starts by noting that his very first tweet, from October 30, 2008, is no longer active on his site; the earliest tweet shown there is dated September 5, 2010.  All attempts to locate the missing tweets were unsuccessful, and customer service was no help.

After some digging, Erik found out that, on October 13, 2010, Twitter announced the “new Twitter” which included a new user interface.  As part of that revamping, Twitter changed the format for its status URLs (Tweets) so that the sequential number at the end of each Tweet (the Tweet ID) changed length, eventually doubling from nine digits in 2008 to 18 digits today (which supports up to one quintillion tweets).

Then, on December 12, 2012, Twitter announced that users could export an archive of their tweets (via the account’s Settings page).  The result is a nine-field comma-separated values (CSV) file that includes, among other information, the tweet ID.  So, now you could retrieve your old tweets that were no longer actively stored, right?  Not necessarily.

As Erik found out, when he exported the file and went to retrieve old tweets, some of his tweet IDs actually pointed to other people’s tweets!  You can see the details of his tests in his blog post or via his 60-second YouTube video here.

Obviously, if you need to retrieve archived tweets for eDiscovery purposes, that’s not a good thing.

As it happens, CloudNine Discovery has had its own Twitter account (@Cloud9Discovery) since August of 2009 (we were known as Trial Solutions back then), so I decided to run my own test and exported our archive.  Here’s what I found:

  • Our very first tweet was August 17, 2009 (Trial Solutions Ranks 2,830 on the Exclusive 2009 Inc. 5000).  It retrieved correctly.  The bit.ly link within the tweet isn’t hyperlinked, but if you copy and paste it into the browser address, it correctly retrieves (obviously, this will only be the case if the referenced web page still exists).
  • In fact, all of the early tweets that I tested retrieved with no problem.
  • Then I encountered the first tweet that didn’t retrieve correctly: our December 1, 2010 tweet (for one of our blog posts, titled eDiscovery Project Management:  Effectively Manage Your Staff) with a tweet ID of 9924696560635900 (Full URL: https://twitter.com/Cloud9Discovery/status/9924696560635900).  It didn’t point to a different person’s tweet; it simply said “Sorry, that page doesn’t exist!”
  • From that point on, none of the tweets that I tried would retrieve; all gave me the “page doesn’t exist” message.
  • I even tried to retrieve our tweet of yesterday’s blog post (Defendant Ordered to Produce Archived Emails Even Though Plaintiff Failed to Produce Theirs – eDiscovery Case Law) via the tweet ID provided in the archive file (535098008715538000).  Even that one wouldn’t retrieve.  Wait a minute, what’s going on here?

I then retrieved our tweet of yesterday’s blog post by clicking on it directly within Twitter.  Here is the URL to the tweet with the ID at the end: https://twitter.com/Cloud9Discovery/status/535098008715538434.

See the problem?  The tweet IDs don’t match.
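The mismatch can also be checked mechanically.  Here is a minimal Python sketch, using only the two IDs quoted above; the helper names tweet_id_from_url and shared_leading_digits are my own for illustration and are not part of any Twitter tool.  The IDs are kept as strings on purpose, so that nothing in the comparison can alter their digits.

```python
from os.path import commonprefix
from urllib.parse import urlparse


def tweet_id_from_url(status_url: str) -> str:
    """Return the last path segment of a status URL (hypothetical helper)."""
    last_segment = urlparse(status_url).path.rstrip("/").split("/")[-1]
    if not last_segment.isdigit():
        raise ValueError(f"no numeric tweet ID at the end of {status_url!r}")
    return last_segment


def shared_leading_digits(id_a: str, id_b: str) -> int:
    """Count how many leading digits two IDs have in common."""
    return len(commonprefix([id_a, id_b]))


# ID taken from the live tweet's URL versus the ID listed in the archive file.
live_id = tweet_id_from_url(
    "https://twitter.com/Cloud9Discovery/status/535098008715538434"
)
archive_id = "535098008715538000"

print(live_id == archive_id)                       # False
print(shared_leading_digits(live_id, archive_id))  # 15
```

The two IDs agree on their first fifteen digits and differ only in the last three.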

I ultimately determined that all of the tweet IDs provided in the archive file from December 1, 2010 onward end with two or more zeroes, and that all of those from November 5, 2010 onward end with at least one zero.  When I started testing the IDs that end with a single zero, I re-created Erik’s problem:

Our tweet on November 29, 2010 titled eDiscovery Trends: Sanctions at an All-Time High (tweet ID: 9199940811100160) actually retrieves a tweet by @aalyce titled @StylesFever snog liam marry louis and niall can take me out 😉 hehe.  Whoops!

It appears that the archive file provided by Twitter drops all digits of the tweet ID after the fifteenth digit and replaces them with zeroes, so each affected ID points to either an invalid or an incorrect page, rendering the export effectively useless.  Hopefully, Twitter can fix it – we’ll see.  In the meantime, don’t rely on it and be prepared to address the issue if your case needs to retrieve archived data from Twitter.  Thanks, Erik, for the heads up!
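To illustrate the pattern, here is a rough Python sketch that assumes the behavior is exactly as I observed it: the first fifteen digits are kept and every later digit becomes a zero.  The function names and the fifteen-digit constant are my own assumptions based on these tests, not anything Twitter has documented.

```python
# Assumption from the tests above: only the first 15 digits survive the export.
SIGNIFICANT_DIGITS = 15


def truncate_like_archive(tweet_id: str) -> str:
    """Keep the first 15 digits and pad the rest with zeroes (hypothetical)."""
    if not tweet_id.isdigit():
        raise ValueError(f"not a numeric tweet ID: {tweet_id!r}")
    kept = tweet_id[:SIGNIFICANT_DIGITS]
    return kept + "0" * (len(tweet_id) - len(kept))


def looks_truncated(archive_id: str) -> bool:
    """Flag IDs longer than 15 digits whose extra digits are all zeroes."""
    tail = archive_id[SIGNIFICANT_DIGITS:]
    return bool(tail) and set(tail) == {"0"}


# The real ID of yesterday's tweet turns into the ID found in the archive file.
print(truncate_like_archive("535098008715538434"))  # 535098008715538000

# IDs from the archive file are flagged; the ID from the live URL is not.
print(looks_truncated("535098008715538000"))  # True
print(looks_truncated("9199940811100160"))    # True
print(looks_truncated("535098008715538434"))  # False
```

A flag from looks_truncated is only a hint: a genuine ID can end in zero by chance, and IDs of fifteen digits or fewer are never flagged.  If the tweet is still visible in Twitter, the ID in its URL is the one to trust.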

So, what do you think?  Have you ever had to preserve or produce data from Twitter in litigation?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.