JSON is not a document, it is data… and lots of it!
By: Trent Livingston
Modern eDiscovery deals with much more than just documents. By one widely cited estimate, in 2020 each person created about 1.7 MB of data every second, and much of that data was likely stored in a database. Applications such as Facebook, Twitter, and Slack store copious amounts of data, from tweets and wall posts to chats.
Many of these artifacts are stored with complementary data that can include file links, reactions (such as “likes”), and even geolocation information. Accessing this data for a discovery request often requires an export or archive process from the originating application, and that is where JSON enters the picture.
JSON stands for “JavaScript Object Notation”, but you don’t need to know JavaScript (or any code, for that matter) to work with it! If you’ve ever handled discovery involving any of the applications mentioned above, you’ve likely come across a few JSON files. In a nutshell, think of JSON as a relational spreadsheet, where the values in one column of one tab are defined by data in another tab, or even in another spreadsheet.
For example, you might have a column called “Address” in one tab whose cells contain numeric “id” values that point to a second tab of the same spreadsheet. In that second tab, each address is broken down into its parts, such as “street”, “zip”, and “country”. Every row that shares a given “id” belongs to the matching “Address” entry in the first tab. Simply put, this is structured data, and JSON data is no different. However, JSON can hold data structures ranging from simple to highly complex.
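To make the spreadsheet analogy concrete, here is a minimal sketch using Python’s standard json module. The export is entirely hypothetical: messages refer to their senders only by a numeric “user” id, and a separate users list maps each id to a name, just as the “Address” column points to another tab. Each user’s address is also nested directly as an object, which is the other common way JSON keeps related values together. The field names and the resolve_users helper are invented for illustration; every platform defines its own structure.

```python
import json

# Hypothetical export: each message names its sender only by numeric id,
# much like the "Address" column pointing to rows in another tab.
messages_json = """
[
  {"user": 101, "text": "Can you send the contract?"},
  {"user": 102, "text": "Sent it this morning."}
]
"""

# Hypothetical lookup table: ids map to names, and each address is
# nested as an object inside the user record.
users_json = """
[
  {"id": 101, "name": "Alice", "address": {"street": "1 Main St", "zip": "10001", "country": "US"}},
  {"id": 102, "name": "Bob", "address": {"street": "9 Elm Ave", "zip": "94105", "country": "US"}}
]
"""


def resolve_users(messages, users):
    """Replace each message's numeric user id with that user's name."""
    by_id = {u["id"]: u for u in users}  # index the users by their "id"
    resolved = []
    for m in messages:
        user = by_id.get(m["user"])
        resolved.append({
            # Keep unmatched ids visible instead of silently dropping them.
            "from": user["name"] if user else f"unknown ({m['user']})",
            "text": m["text"],
        })
    return resolved


for row in resolve_users(json.loads(messages_json), json.loads(users_json)):
    print(f'{row["from"]}: {row["text"]}')
```

Without the lookup step, a reviewer would see only “101” and “102” beside each message, which is exactly the kind of lost relationship a generic formatter produces.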
The problem with JSON is that, while there are multiple JSON viewers and formatters online, they do not understand the data structures defined within a given file. Each platform defines these structures differently: the vehicle (JSON) may be the same, but the structure usually differs from application to application, and even between versions of the same application. As a result, a generic formatting tool often renders the data in an unexpected format, and the relationships between data points are lost or jumbled. (By the way, you should never use a “free” online tool to format potentially confidential or privileged information.)
Therefore, it is important to work with someone who understands JSON as an eDiscovery data source. A single JSON file just a few megabytes in size may represent hundreds, if not thousands, of messages, along with numerous links to files and key data relevant to your investigation or litigation. A JSON file can contain any number of nested values of different types, including:
- Strings: a sequence of zero or more Unicode characters, such as a username, the text of a message, or emojis (which may appear as Unicode escape sequences).
- Numbers: a numeric value that can represent a date (for example, a Unix timestamp counting seconds since January 1, 1970), a quantity, an id, or potentially a true/false flag stored as 1 or 0 (JSON also has native true and false values).
- Objects: a collection of name/value pairs that together carry meaning as a whole, such as the longitude, latitude, elevation, speed, and direction that make up a device’s location.
- Arrays: an ordered list of values, each of which can be referenced by its position (index) in the list, such as the color choices for a car on a website or a set of canned responses in a chat application. The values are often of the same type, though JSON does not require it.
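The value types above can be seen side by side in a small sketch. The chat record below is hypothetical, with field names invented for illustration, and it is parsed with Python’s standard json module so you can see which Python type each JSON value becomes. It also shows why numbers need context: in this made-up record, “ts” is assumed to be a Unix timestamp and “edited” a 0/1 flag.

```python
import json
from datetime import datetime, timezone

# Hypothetical chat record; the field names are invented for illustration.
# A raw string keeps the \uXXXX escapes intact for the JSON parser, which
# combines the surrogate pair into a single emoji character.
record = json.loads(r"""
{
  "user": "jdoe",
  "text": "On my way \ud83d\ude97",
  "ts": 1609459200,
  "edited": 0,
  "pinned": false,
  "location": {"latitude": 40.7128, "longitude": -74.0060, "speed_mph": 31.5},
  "reactions": ["like", "laugh"]
}
""")

# Show which Python type each JSON value became (str, int, bool, dict, list).
for name, value in record.items():
    print(f"{name:<10} {type(value).__name__}")

# Numbers need context: in this record "ts" is assumed to be seconds since
# 1970-01-01 UTC, and "edited" is assumed to be a 0/1 flag, not a quantity.
print(datetime.fromtimestamp(record["ts"], tz=timezone.utc).isoformat())
print(bool(record["edited"]))

# Objects are accessed by name; arrays by zero-based index.
print(record["location"]["latitude"], record["reactions"][0])
```

Nothing in the file itself says that 1609459200 is a date or that 0 means “not edited”; that meaning comes from knowing how the source application defines its fields.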
The thing to remember is that a JSON file is usually one big “object”, which is the parent of all the name/value pairs beneath it (JSON also allows the top level to be an array instead). Within this object, you can have more objects that contain numbers, strings, arrays, and even more objects. Confused yet?
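One way to see how deep that nesting goes is to walk the parsed structure and list every leaf value with its full path. The sketch below uses only Python’s standard library; the flatten helper and the sample document are hypothetical, and the “$”-prefixed path notation is borrowed loosely from JSONPath. Flattening shows where every value lives, but it does not by itself restore the platform-specific meaning of those values.

```python
import json


def flatten(value, path="$"):
    """Yield (path, leaf value) pairs for every leaf in a parsed JSON value.

    Objects contribute ".name" to the path and arrays contribute "[index]".
    Empty objects and empty arrays have no leaves, so they yield nothing.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Strings, numbers, true/false, and null are leaves.
        yield path, value


# Hypothetical export: one object containing an array of message objects,
# each of which may contain its own array of file objects.
doc = json.loads(
    '{"channel": "general", "messages": '
    '[{"user": "jdoe", "text": "hi", "files": [{"name": "a.pdf"}]}]}'
)

for p, v in flatten(doc):
    print(p, "=", v)
```

For this sample, the output lists four paths, ending with `$.messages[0].files[0].name = a.pdf`, which shows a file link sitting three levels below the top-level object.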
Not to worry! It is understandable that all this JSON data can quickly become a source of frustration! The question that remains is, “How do I make sense of all of this structured data and present it for review in a reasonably usable format?”
Here are some tips:
- Make sure you do not overlook JSON as part of your electronic discovery protocol
- Leverage an experienced team to help you understand the JSON output
- Document the source application whenever possible (some JSON exports, such as those from Slack, include access keys that can expire or be terminated at any time)
- Preserve the JSON file as you would any other evidentiary data source
- Document the chain of custody for the JSON file (the originating application and its version, who conducted the export, and any temporary access keys along with their expiration dates)
- Treat each JSON file and associated content as a potential source of PII, confidential, and/or privileged information given the breadth of data that each may contain
- Work with a team and a product that can parse, ingest, and subsequently present JSON data in a usable format for review and production
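For the preservation and chain-of-custody tips, a common practice (assumed here, not prescribed by the tips above) is to record a cryptographic hash of each exported file alongside the details you document, so you can later show the file has not changed. The sketch below uses Python’s standard hashlib, datetime, and pathlib modules; the custody_record helper and its field names are hypothetical, and a real protocol would follow your own evidence-handling procedures.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large exports need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def custody_record(path, source_app, app_version, exported_by):
    """Build a simple (hypothetical) chain-of-custody entry for a JSON export."""
    p = Path(path)
    return {
        "file": p.name,
        "size_bytes": p.stat().st_size,
        "sha256": sha256_of(p),  # fingerprint of the file as received
        "source_application": source_app,
        "application_version": app_version,
        "exported_by": exported_by,
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }
```

Re-running sha256_of on the preserved copy at any later point should produce the same hex digest; a different value means the file has been altered.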
While no off-the-shelf solution handles every JSON file in existence, CloudNine ESI Analyst is a platform designed for the multitude of data types that can be extracted from just about any JSON file. Many of these data types can be easily mapped to a data type construct within our SaaS application that allows for presentation, review, and production in a reasonably usable format.
Contact us today for a demonstration and further detail!