Identification

internal software infrastructure

Optimizing Your Infrastructure for LAW & Explore eDiscovery

By: Joshua Tucker

It’s safe to say Microsoft isn’t going out of business anytime soon. Last year alone they grew 18 percent, reaching 168 billion dollars*. They are continuously making updates to their software, improving their products and functionality, and purchasing emerging software. They want to empower every person and organization on the planet to achieve more*, but the power you obtain from the software is up to you. Microsoft does not know your intended purpose or use of their software; all they can do is provide the software and the bare-bones requirements to make it run.

CloudNine software is no different. Let’s take a deep dive into your infrastructure and how you can optimize it with the CloudNine on-premise processing platforms.

We see that several of our clients run their environments with the most minimal recommended resources. Just like Microsoft can’t know how large your SQL server needs to be, we don’t know the level of demand your client’s data is putting on your workstation. What we DO know is that the number of files per case is growing, the complexity of files is growing, and resources are sparse.

We will cover the areas where you can vastly improve the efficiency with which you use your CloudNine software.

Your Local Area Network

Let’s use the common “business triangles” as a frame of reference. Examples would be “people, technology, and process” or “team, leadership, and mission”, or, my favorite, “price, speed, and quality”. The more balanced your business triangle, the better. Put too much or too little emphasis on one side and that balance will start to wane.

The eDiscovery version of the business triangle is called the ‘Local Area Network’. The first side of this ‘Local Area Network’ is the hardware or the backbone of your infrastructure. The second side would be the software, or the muscle needed to use that backbone. The third side is your network file server or the brain’s storage area, which will hold all the knowledge that our software is going to discover for you. And finally, the three sides are then connected, like sinew, with your local network speed.

You want to find the sweet spot that balances cost, throughput demands, speed to review, and hardware budget. Let us go ahead and call this the “Goldilocks Zone”.

Real-life case study: About 8 years ago, we were working with a client that had a few virtual machines and a few physical machines. The virtual machines were 4 core and 8GB of RAM. The physical machines were 8 core and 16GB of RAM.  IT wanted to get rid of the physical machines, but there was resistance to letting them go because they were able to process so much faster than the virtual machines. We conducted some testing to find the Goldilocks Zone between the amount of data being processed, the expected speed, and the cost. We created a few virtual machines with 4, 8, and 12 cores and ran tests to determine the correct core count for our company. We determined that an 8-core box with 16GB of RAM was able to process data much faster than a 4-core box with only 8GB of RAM.

After we completed optimizing the processing machines, we ventured forth into the other areas of our infrastructure.

Next, we reached out to our SQL team to see what would happen if we added more RAM and more SQL cores. We saw the same result. As we added more resources, we found that we were able to increase the speed of LAW’s communication with SQL. Faster communication equals faster reads and writes, which equated to a faster processing speed. During this testing we also found that the more SQL cores we had, the more we could horizontally spread out the processing tasks on our LAW machines (i.e., we could have more machines writing to the same database).

Note: Today, I have a simple equation to determine the correct size of SQL:  Take the total number of read/write instances that can be communicating or interacting with SQL. Divide that number by three. The resulting number is the SQL cores needed. For RAM, take the same number of instances and multiply it by four.
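
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a CloudNine utility: the function and field names are made up for the example, rounding the core count up to a whole core is an assumption, and the RAM figure is assumed to be in gigabytes because the note does not state a unit.

```python
import math
from typing import NamedTuple


class SqlSizing(NamedTuple):
    cores: int
    ram_gb: int  # assumption: the note's RAM figure is read as gigabytes


def size_sql_server(instances: int) -> SqlSizing:
    """Apply the rule of thumb: cores = instances / 3, RAM = instances * 4."""
    if instances < 0:
        raise ValueError("instances must not be negative")
    cores = math.ceil(instances / 3)  # assumption: round up to a whole core
    ram_gb = instances * 4
    return SqlSizing(cores, ram_gb)


if __name__ == "__main__":
    sizing = size_sql_server(12)
    print(f"12 instances -> {sizing.cores} cores, {sizing.ram_gb} GB RAM")
```

Twelve read/write instances would come out at 4 cores and 48 GB under these assumptions. Treat the result as a starting point for testing in your own environment rather than a guarantee.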

After we completed this environment review, we had larger machines, faster read/write capability, and more machines to process on each matter. The Goldilocks Zone for SQL ensures that you have the right number of SQL cores and the right amount of RAM for the number of instances that read from and write to SQL.

(For LAW workstations, 8 cores and 16 GB of RAM is highly suggested. For Explore workstations, it is 8 cores and 32 GB of RAM.)

Note: Your LAN does not have to be local to your office, but SQL, the LAW database folder structure and the workstations all need to be in close proximity to each other. The closer the better.

Software and Upgrades

Let’s go back to our Microsoft analogy. Microsoft keeps improving their product and each version of the operating system has the potential of changing the location or how certain files work. It is imperative that the operating system that is installed on your workstations is supported by the version of the product that you are going to use. If it isn’t, the software could act in a way that is completely unexpected – or worse.

The data we process can be a threat to our organization (and this does go for everyone!) and the best way to protect yourself is to be up to date on patches and antivirus software. I highly suggest that you first patch in a test environment, testing each part of the tool and making sure that the patching will not interfere with your work. The more current the patches you have tested and deployed, the more secure your data, and your clients’ data, will be.

One thing I like about the right test environment is that once your testing is done, you can make an image and deploy that image to the rest of your workstations. It is fast and efficient.

How your processing engine gets metadata to you matters. For instance, there are engines, like LAW, that will expand the files and harvest all the metadata. This type of processing is slower in getting the data in review, but much faster in the final export. There are also engines, like CloudNine Explore, that will hold off on expanding the data but harvest all the text and metadata extremely quickly. This workflow is great for ECA purposes.

How deep these tools dig into your data is also important. You never want a privileged document produced because your processing engine did not discover it. Find out if your engine is collecting all the natives, text, and metadata that you need for these legal matters, and then come up with a workflow that will accentuate the strengths of your tool.

Having an Investment in your File Storage

The price of data storage has been coming down for years, which is great news considering that discoverable data keeps growing and will continue to grow at an astounding pace. It is estimated that, this past year, each person on the planet created 1.7 megabytes of information each second. Every matter’s data size has increased, and with it the pressure on speed to review. All of this must run efficiently, all of it must be backed up, and all of it must be in your disaster recovery plans.

Network speed matters. It ties your infrastructure together. If the processing machine can’t talk to the SQL machines quickly, or to the network storage efficiently, then it won’t perform at top speed, no matter how many cores you have. Network speed should be considered not only for the processing department, but for your whole company. We highly suggest a gigabit network, and if you are a firm or legal service provider, you might want to be looking at a 10-gigabit network.

Even with a gigabit network, your workstations, SQL server, and file server need to be local to each other. Having one data center or a central location helps keep those resources working more effectively, getting you a higher return on investment on your machines.

Pro tip! There is a quick and easy way to test your network speed without having to contact IT. Find a photo that is close to 1 MB and put it in the source location. Log into one of your workstations, open a window to that source location, and drag that image to your desktop. Then, drag it back. Both moves should feel instantaneous to you. If either move takes more than one second, then your network speed needs to be improved.
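
If you would rather put a number on that drag-and-drop test, here is a minimal Python sketch of the same idea. The function names and the one-second threshold constant are illustrative choices for this example, and the demo at the bottom uses two temporary folders as stand-ins so it runs anywhere; to test your real network you would point it at a file on your actual source location and a folder on your workstation.

```python
import shutil
import tempfile
import time
from pathlib import Path

SLOW_THRESHOLD_SECONDS = 1.0  # the rule of thumb above for a file of about 1 MB


def timed_copy(src: Path, dst: Path) -> float:
    """Copy src to dst and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    shutil.copyfile(src, dst)
    return time.perf_counter() - start


def round_trip(sample: Path, local_dir: Path) -> dict:
    """Pull the sample from its source location, then push it back."""
    local_copy = local_dir / sample.name
    returned = sample.with_name("roundtrip_" + sample.name)
    pull = timed_copy(sample, local_copy)
    push = timed_copy(local_copy, returned)
    returned.unlink()  # remove the temporary copy from the source location
    return {"pull_seconds": pull, "push_seconds": push}


def is_slow(timings: dict) -> bool:
    """True if either direction took longer than the threshold."""
    return max(timings.values()) > SLOW_THRESHOLD_SECONDS


if __name__ == "__main__":
    # Stand-in folders: replace with your network share and a local folder.
    with tempfile.TemporaryDirectory() as share, tempfile.TemporaryDirectory() as desktop:
        sample = Path(share) / "sample.jpg"
        sample.write_bytes(b"\0" * 1_000_000)  # roughly 1 MB of filler data
        timings = round_trip(sample, Path(desktop))
        print(timings, "needs attention" if is_slow(timings) else "ok")
```

Keep in mind that operating system caching can make a repeated copy look faster than the network really is, so a fresh file gives a more honest reading.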

RECAP

It is our responsibility to figure out what we need to get full capacity out of outside tools. To run CloudNine’s LAW, we need workstations that have at least 8 cores and 16 GB of RAM. For CloudNine Explore workstations, we need 8 cores and 32 GB of RAM, plus a SQL environment that is sized to the number of instances interacting with it.

Ensure that your software matches up with the recommended versions for your processing engine. If you are working with an operating system that isn’t on the supported list for that processing engine, you could get unexpected results, or worse. Line up the programs, test before you deploy, and stay up to date.

Know where your data is stored and the speed at which your systems talk to each other. Keep your environment in close proximity.

All in all, in order to get the top speed and performance out of CloudNine’s tools (or any other third-party software you purchase), you must invest in the right resources.

Keep working towards your “Goldilocks Zone” – the sweet spot between speed, price, and quality.

If you are interested in having a CloudNine expert analyze your environment and provide recommendations for efficiencies, please contact us for a free Health Check.

 

*https://www.statista.com/statistics/267805/microsofts-global-revenue-since-2002/

* https://www.priceintelligently.com/blog/subscription-revenue-adobe-gopro-microsoft-gillette

* https://www.comparably.com/companies/microsoft/mission

* https://docs.microsoft.com/en-us/sql/sql-server/install/hardware-and-software-requirements-for-installing-sql-server-2019?view=sql-server-ver15

 

Have you considered the implications of time zones when it comes to your litigation needs?

by: Trent Livingston, Chief Technology Officer

Most of today’s legal technology platforms require that a time zone be selected at the time of ingestion of data. Or, in the case of forensic software, the time stamp is displayed with a time zone offset based upon the device’s time zone setting. However, when conducting a review, the de facto time zone setting for your litigation is often determined ahead of time, often based upon subjective information. This is likely the region in which the primary custodian resides. Once that time zone is selected, everything is adjusted to that time zone. It is “set in stone” so to speak. In some cases, this is fine, but in others, it can complicate things, especially if you want to alter your time zone mid-review.

Let’s start by understanding time zones, which immediately begs the question, “how many time zones are there in the world?” After all, it can’t be that many, right? Well, don’t start up your time machine just yet! To summarize a Quora answer (https://www.quora.com/How-many-timezones-do-we-have-in-the-world) we arrive at the following confusing mess.

Spanning our globe, there are a total of 41 different time zones. Given the number of time zones, “shifting time” (so to speak) can be of the utmost importance when examining evidentiary data.

If everything is set to Eastern Standard Time but does not properly account for time zone changes, a software application could alter time stamps inconsistently, and consistency is what really matters! What happens if two of the parties to a matter are in New York while two of the parties are in Arizona? Arizona does not observe Daylight Saving Time. This could result in a set of timestamps being thrown off by an hour spanning approximately five months of the data set (based upon Daylight Saving Time rules). Communication responses that may have happened within minutes now seemingly occur an hour later (or earlier, depending on how you look at it). Forensic records could fall out of sync with other evidentiary data and communications or, worse yet, sworn testimony. The key is to ensure consistency to avoid confusion.
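
To see the New York and Arizona problem in concrete terms, here is a small Python sketch using the standard library’s `zoneinfo` module (Python 3.9 or later, with time zone data available on the machine). The helper name `hours_apart` and the sample dates are chosen purely for illustration.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

NEW_YORK = ZoneInfo("America/New_York")  # observes Daylight Saving Time
PHOENIX = ZoneInfo("America/Phoenix")    # Arizona stays on standard time


def hours_apart(moment: datetime) -> float:
    """How many hours New York is ahead of Phoenix at a given moment."""
    ny_offset = moment.astimezone(NEW_YORK).utcoffset()
    az_offset = moment.astimezone(PHOENIX).utcoffset()
    return (ny_offset - az_offset).total_seconds() / 3600


if __name__ == "__main__":
    winter = datetime(2021, 1, 15, 17, 0, tzinfo=timezone.utc)
    summer = datetime(2021, 7, 15, 17, 0, tzinfo=timezone.utc)
    print("January gap:", hours_apart(winter), "hours")
    print("July gap:", hours_apart(summer), "hours")
```

The gap between the two locations is two hours in January and three hours in July, so a tool that applies one fixed shift to every record will be off by an hour for part of the year.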

CloudNine’s ESI Analyst (ESIA) normalizes everything to Coordinated Universal Time (UTC) upon ingestion, leveraging the original time zone or offset. By doing this, ESIA can display time stamps in the time zone of the project manager’s choosing (either set at the project level or by the specific user’s account time zone setting). This allows the time stamp display of any evidence to be changed at any time to the desired time zone across an entire project, giving a dynamic view of time stamps. The time zone can be changed not only during a review, but also at export. All original metadata is stored and available during export, so the adjusted time stamp can be leveraged for timelines while the original time stamp and time zone settings are preserved for evidentiary purposes.
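
The general pattern described here (normalize to UTC on the way in, keep the original value, convert only for display) can be sketched in Python as follows. This is not ESIA’s implementation; the class and function names are hypothetical, and the example assumes the incoming time stamp is an ISO 8601 string that carries its own offset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone, tzinfo


@dataclass(frozen=True)
class NormalizedStamp:
    original_text: str  # preserved exactly as collected
    utc: datetime       # normalized value used for sorting and display


def ingest(original_text: str) -> NormalizedStamp:
    """Parse a time stamp that carries an offset and normalize it to UTC."""
    parsed = datetime.fromisoformat(original_text)
    if parsed.tzinfo is None:
        raise ValueError("time stamp needs an offset to be normalized")
    return NormalizedStamp(original_text, parsed.astimezone(timezone.utc))


def display(stamp: NormalizedStamp, zone: tzinfo) -> str:
    """Render the stored UTC value in whatever zone the reviewer chooses."""
    return stamp.utc.astimezone(zone).isoformat()


if __name__ == "__main__":
    stamp = ingest("2021-07-15T09:30:00-07:00")
    eastern_daylight = timezone(timedelta(hours=-4))  # fixed offset for the demo
    print("original:", stamp.original_text)
    print("utc:     ", stamp.utc.isoformat())
    print("display: ", display(stamp, eastern_daylight))
```

Because the original text is never overwritten, the display zone can change as often as the review requires while the collected value remains available for evidentiary purposes.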

When performing analysis of disparate data sets, this methodology allows users to adjust data to see time stamps relative to a particular party involved in that specific investigation. For example, an investigation may involve multiple parties that are all located in different time zones. Additionally, these users may be traveling to different countries. Adjusting everything to Eastern Time may show text messages arriving and being responded to in the late hours of the day, not accounting for the fact that perhaps the user was abroad and was actually responding during normal business hours.

While seemingly innocuous, it can make a big difference in how a jury perceives the action of the party, depending on the nature of the investigation.

As they say… “timing is everything!” especially when it comes to digital evidence in today’s modern era.

Now, where did I leave my keys to my DeLorean?

Learn more about CloudNine ESI Analyst and its ability to deduplicate, search, filter, and adjust time zones across all data types at once here.

Four Times Self-Collection Went Wrong

Per FRCP Rule 26(g), attorneys must sign discovery requests, responses, and objections. To the best of the attorney’s knowledge, the signature certifies three factors: 1) the document is compliant with existing rules and regulations; 2) it has no improper purpose such as slowing litigation; 3) it is not unreasonably burdensome to the producing party. This may become an issue if your client opts for self-collection. If counsel does not oversee or supervise the collection process, they have violated the rule and will be sanctioned accordingly. [1]

During self-collection, custodians are responsible for identifying and gathering potentially relevant ESI on their own. When conducted carefully, self-collection may be adequate and cost-effective for small cases. However, there are several risks involved. The client may lose valuable metadata if their collection is done incorrectly. Additionally, they may purposely or accidentally omit incriminating evidence. Overall, if the self-collection process is not defensible and well-documented, the evidence will be rejected, and sanctions will follow. [2]

Self-Collection Cases and Sanctions

  • EEOC v. M1 5100 Corp. is a well-known age discrimination case in which two of the defendant’s employees collected ESI without any counsel supervision. Counsel signed the discovery response despite their hands-off approach. The plaintiffs moved to compel after counsel admitted to their negligence and the defendants produced less evidence than expected. Judge Matthewman granted the defendants a second chance but required both parties to collaborate in a robust meet and confer. The court also issued sanctions and advised counsel to seek the assistance of an ESI vendor. [3]
  • Over a year after the case ended, Green v. Blitz reopened once the court discovered that the defendant destroyed and omitted relevant email evidence. Only one employee oversaw the collection process, and he described himself in court as “computer illiterate.” After confirming the relevance of the missing emails, the court imposed civil contempt sanctions worth $250,000. The defendants also faced a $500,000 purging sanction unless they provided a copy of the order to all litigants who filed against them within the past two years. As the final sanction, Blitz USA was ordered to file a copy of the order when filing any lawsuit within the next five years. [4]
  • Nat’l Day Laborer Org. v. U.S. Immigration and Customs Enforcement Agency involved various government agencies who lacked a uniform collection plan. The agencies also failed to properly document their differing collection processes. Consequently, the agencies were sanctioned for relying too heavily on self-collection. They were also reprimanded for their undocumented and uncoordinated efforts.
  • In Suntrust Mortgage Inc. v. AIG United Guaranty Corp., the defendant chose not to seek the help of any forensic experts or ESI vendors. One employee was in charge of the identification and collection process. By copying and pasting different emails together, the employee tampered with the evidence before production. The fabrication resulted in court-issued financial sanctions. [2]

Avoid self-collection pitfalls by utilizing CloudNine’s Collection Manager, a breakthrough extraction solution for Office 365 emails and OneDrive files. To learn more information or request a demo, visit: https://cloudnine.com/ediscovery-software/cloudnine-collection-manager/

 

[1] Gretchen E. Moore, “The Perils of Self-Collection of Electronically Stored Information,” The National Law Review, April 28, 2021.

[2] FindLaw Attorney Writers, “Self-Collection: The Good, the Bad, and the Ugly,” FindLaw, June 20, 2016.

[3] Kelly Twigger, “Beware of the Perils of Allowing Self-Collection,” eDiscovery Assistant, July 9, 2020.

[4] Peter Vogel, “Another Trap is Sprung: The Danger of Self-Collection,” Foley & Lardner LLP, June 20, 2011.

When Litigation Hits, The First 7 to 10 Days is Critical: eDiscovery Throwback Thursdays

Here’s our latest blog post in our Throwback Thursdays series, where we are revisiting some of the eDiscovery best practice posts we have covered over the years and discussing whether any of those recommended best practices have changed since we originally covered them.

This post was originally published on June 28, 2012, when eDiscovery Daily was less than two years old.  This post has already been revisited a couple of times since and has been referenced in a handful of webcasts as well.  It’s still good advice today.  Enjoy!

When a case is filed (or even before, if litigation is anticipated then), several activities must be completed within a short period of time (often as soon as the first seven to ten days after filing) to enable you to assess the scope of the case, where the key electronically stored information (ESI) is located and whether to proceed with the case or attempt to settle with opposing counsel.  Here are several of the key early activities that can assist in deciding whether to litigate or settle the case.

Activities:

  • Create List of Key Employees Most Likely to have Documents Relevant to the Litigation: To estimate the scope of the case, it’s important to begin to prepare the list of key employees that may have potentially responsive data. Information such as name, title, e-mail address, phone number, office location and where information for each is stored on the network is important to be able to proceed quickly when issuing hold notices and collecting their data.
  • Issue Litigation Hold Notice and Track Results: The duty to preserve begins when you anticipate litigation; however, if litigation could not be anticipated prior to the filing of the case, it is certainly clear once the case is filed that the duty to preserve has begun. Hold notices must be issued ASAP to all parties that may have potentially responsive data.  Once the hold is issued, you need to track and follow up to ensure compliance.  Here are a couple of recent posts regarding issuing hold notices and tracking responses.
  • Interview Key Employees: As quickly as possible, interview key employees to identify potential locations of responsive data in their possession as well as other individuals they can identify that may also have responsive data so that those individuals can receive the hold notice and be interviewed.
  • Interview Key Department Representatives: Certain departments, such as IT, Records or Human Resources, may have specific data responsive to the case. They should also have certain processes in place for regular destruction of “expired” data, so it’s important to interview them to identify potentially responsive sources of data and stop routine destruction of data subject to litigation hold.
  • Inventory Sources and Volume of Potentially Relevant Documents: Potentially responsive data can be located in a variety of sources, including: shared servers, e-mail servers, employee workstations, employee home computers, employee mobile devices (including bring your own device (BYOD) devices), portable storage media (including CDs, DVDs and portable hard drives), active paper files, archived paper files and third-party sources (consultants and contractors, including cloud storage providers). Hopefully, the organization already has created a data map before litigation to identify the location of sources of information to facilitate that process.  It’s important to get a high-level sense of the total population to begin to estimate the effort required for discovery.
  • Plan Data Collection Methodology: Determining how each source of data is to be collected also affects the cost of the litigation. Are you using internal resources, outside counsel or a litigation support vendor?  Will the data be collected via an automated collection system or manually?  Will employees “self-collect” any of their own data?  Answers to these questions will impact the scope and cost of not only the collection effort, but the entire discovery effort.
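
For teams that track these early activities in a script or spreadsheet export, the key employee list and hold notice follow-up might be modeled along the lines of the Python sketch below. The field names mirror the information listed in the first activity; the class, the function, and the sample people are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Custodian:
    name: str
    title: str
    email: str
    phone: str
    office: str
    data_locations: List[str] = field(default_factory=list)
    hold_sent: Optional[date] = None          # when the hold notice went out
    hold_acknowledged: Optional[date] = None  # when the custodian confirmed it


def needs_follow_up(custodians: List[Custodian]) -> List[str]:
    """Names of custodians with no hold notice sent or no acknowledgment yet."""
    return [
        c.name
        for c in custodians
        if c.hold_sent is None or c.hold_acknowledged is None
    ]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    roster = [
        Custodian("A. Rivera", "Controller", "arivera@example.com", "555-0100",
                  "Houston", ["mail server", "finance share"],
                  hold_sent=date(2021, 3, 1), hold_acknowledged=date(2021, 3, 2)),
        Custodian("B. Chen", "IT Manager", "bchen@example.com", "555-0101",
                  "Houston", ["laptop"], hold_sent=date(2021, 3, 1)),
    ]
    print("Follow up with:", needs_follow_up(roster))
```

Running the follow-up check regularly during those first seven to ten days gives you a quick list of who still needs a reminder.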

These activities can result in creating an inventory of potentially responsive information and help in estimating discovery costs (especially when compared to past cases at the same stage) that will help in determining whether to proceed to litigate the case or attempt to settle with the other side.

So, what do you think?  How quickly do you decide whether to litigate or settle?  Please share any comments you might have or if you’d like to know more about a particular topic.

Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.