Optimizing Your Infrastructure for LAW & Explore eDiscovery
By: Joshua Tucker
It's safe to say Microsoft isn't going out of business anytime soon. Last year alone, they grew 18 percent, reaching 168 billion dollars in revenue*. They are continuously updating their software, improving their products and functionality, and acquiring emerging software. They want to empower every person and organization on the planet to achieve more*, but the power you get from the software is up to you. Microsoft does not know your intended purpose or use of their software; all they can do is provide the software and the bare-bones requirements to make it run.
CloudNine software is no different. Let’s take a deep dive into your infrastructure and how you can optimize it with the CloudNine on-premise processing platforms.
We see that many of our clients run their environments with the minimum recommended resources. Just like Microsoft can't know how large your SQL server needs to be, we don't know the level of demand your client's data is putting on your workstations. What we DO know is that the number of files per case is growing, the complexity of files is growing, and resources are scarce.
We will cover the areas where you can make vast improvements in how efficiently you use your CloudNine software.
Your Local Area Network
Let's use the common "business triangles" as a frame of reference. Examples would be "people, technology, and process," or "team, leadership, and mission," or, my favorite, "price, speed, and quality." The more balanced your business triangle, the better. Too much or too little emphasis on one side, and that balance will start to wane.
The eDiscovery version of the business triangle is the 'Local Area Network'. The first side of this triangle is the hardware, the backbone of your infrastructure. The second side is the software, the muscle that puts that backbone to work. The third side is your network file server, the brain's storage area, which holds all the knowledge our software is going to discover for you. And finally, the three sides are connected, like sinew, by your local network speed.
You want to find the sweet spot that balances cost, throughput demands, speed to review, and hardware budget. Let us go ahead and call this the “Goldilocks Zone”.
Real-life case study: About 8 years ago, we were working with a client that had a few virtual machines and a few physical machines. The virtual machines had 4 cores and 8 GB of RAM; the physical machines had 8 cores and 16 GB of RAM. IT wanted to get rid of the physical machines, but there was resistance to letting them go because they processed so much faster than the virtual machines. We conducted some testing to find the Goldilocks Zone between the amount of data being processed, the expected speed, and the cost. We created a few virtual machines with 4, 8, and 12 cores and ran tests to determine the correct core count for our company. We determined that an 8-core box with 16 GB of RAM processed data much faster than a 4-core box with only 8 GB of RAM.
After we completed optimizing the processing machines, we ventured forth into the other areas of our infrastructure.
Next, we reached out to our SQL team to see what would happen if we added more RAM and more SQL cores. We saw the same pattern: as we added resources, the speed of LAW's communication with SQL increased. Faster communication equals faster read/write, which equated to faster processing speed. During this testing we also found that the more SQL cores we had, the more we could horizontally spread out the processing tasks across our LAW machines (i.e., we could have more machines writing to the same database).
Note: Today, I have a simple equation to determine the correct size of SQL: take the total number of read/write instances that will be communicating or interacting with SQL and divide that number by three; the result is the number of SQL cores needed. For RAM (in gigabytes), take the same number of instances and multiply it by four.
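To make that concrete, here is a minimal sketch of that rule of thumb in Python. The function name, the rounding up to whole cores, and the gigabyte units are my assumptions, not part of any CloudNine tooling:

    import math

    def sql_sizing(rw_instances: int):
        """Estimate SQL cores and RAM (GB) from the number of read/write
        instances talking to SQL, per the rule of thumb above.
        (Hypothetical helper; rounding and GB units are assumptions.)"""
        cores = math.ceil(rw_instances / 3)  # one SQL core per three instances
        ram_gb = rw_instances * 4            # 4 GB of RAM per instance
        return cores, ram_gb

    # Example: 12 LAW/Explore workstations writing to the same SQL Server
    cores, ram_gb = sql_sizing(12)
    print(f"{cores} SQL cores, {ram_gb} GB RAM")  # -> 4 SQL cores, 48 GB RAM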
After we completed this environment review, we had larger machines, faster read/write capability, and more machines to process on each matter. The Goldilocks Zone for SQL ensures that you have the right number of SQL cores and the right amount of RAM for the number of instances doing read/write work with SQL.
(For LAW workstations, we highly suggest 8 cores and 16 GB of RAM. For Explore workstations, 8 cores and 32 GB of RAM.)
Note: Your LAN does not have to be local to your office, but SQL, the LAW database folder structure, and the workstations all need to be in close proximity to each other. The closer the better.
Software and Upgrades
Let's go back to our Microsoft analogy. Microsoft keeps improving their product, and each version of the operating system has the potential to change where certain files live or how they work. It is imperative that the operating system installed on your workstations is supported by the version of the product you are going to use. If it isn't, the software could act in a way that is completely unexpected – or worse.
The data we process can be a threat to our organization (and this goes for everyone!), and the best way to protect yourself is to be up to date on patches and antivirus software. I highly suggest that you patch in a test environment first, testing each part of the tool and making sure that the patching will not interfere with your work. The more up to date you can stay while still testing, the more secure your data, and your clients' data, will be.
One thing I like about the right test environment is that once your testing is done, you can make an image and deploy that image to the rest of your workstations. It is fast and efficient.
How your processing engine gets metadata to you matters. For instance, there are engines, like LAW, that expand the files and harvest all the metadata up front. This type of processing is slower getting the data into review, but much faster on the final export. There are also engines, like CloudNine Explore, that hold off on expanding the data but harvest all the text and metadata extremely quickly. This workflow is great for ECA purposes.
How deep these tools dig into your data is also important. You never want a privileged document produced because your processing engine did not discover it. Find out if your engine is collecting all the natives, text, and metadata that you need for these legal matters, and then come up with a workflow that accentuates the strengths of your tool.
Investing in Your File Storage
The price of data storage has been coming down for years, which is great news considering that discoverable data keeps growing and will continue to grow at an astounding pace. It is estimated that this past year, each person on the planet created 1.7 megabytes of information every second. Every matter's data size has increased, and with it, the demand for speed to review. All of this must run efficiently, all of it must be backed up, and all of it must be in your disaster recovery plans.
Network speed matters. It ties your infrastructure together. If the processing machine can't talk to the SQL machines quickly, or to the network storage efficiently, it won't perform at top speed, no matter how many cores you have. Network speed should be considered not only for the processing department, but for your whole company. We highly suggest a gigabit network (roughly 125 megabytes per second of theoretical throughput), and if you are a firm or legal service provider, you might want to be looking at a 10-gigabit network.
Even with a gigabit network, your workstations, SQL server, and file server need to be local to each other. Having one data center or a central location helps keep those resources working effectively, getting you a higher return on investment on your machines.
Pro tip! There is a quick and easy way to test your network speed without having to contact IT. Find a photo that is around 1 MB and put it in the source location. Log into one of your workstations, open a window to that source location, and drag that image to your desktop. Then, drag it back. Both moves should feel instantaneous; on a healthy gigabit network, 1 MB transfers in well under a tenth of a second. If either move takes more than one second, your network speed needs to be improved.
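If you would rather measure than eyeball it, here is a minimal Python sketch of the same test. The file paths are hypothetical placeholders; substitute your own file server share and desktop:

    import shutil
    import time

    # Hypothetical paths: point these at your source share and your desktop.
    SRC = r"\\fileserver\sources\test-photo.jpg"  # ~1 MB file on the file server
    DST = r"C:\Users\me\Desktop\test-photo.jpg"

    start = time.perf_counter()
    shutil.copyfile(SRC, DST)  # pull the file down from the share
    pull = time.perf_counter() - start

    start = time.perf_counter()
    shutil.copyfile(DST, SRC)  # push it back up
    push = time.perf_counter() - start

    print(f"pull: {pull:.3f} s, push: {push:.3f} s")
    # Either copy taking more than a second suggests a network problem.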
RECAP
It is our responsibility to figure out what we need to get full capacity out of outside tools. To run CloudNine's LAW, we need workstations that have at least 8 cores and 16 GB of RAM. For CloudNine Explore workstations, we need 8 cores and 32 GB of RAM, plus a SQL environment sized to the number of instances that are interacting with it.
Ensure that your software matches up with the recommended versions for your processing engine. If you are on, or are working with, an operating system that isn't on that processing engine's supported list, you could get unexpected results – or worse, bad data. Line up the programs, test before you deploy, and stay up to date.
Know where your data is stored and the speed at which your systems talk to each other. Keep your environment in close proximity.
All in all, in order to get the top speed and performance out of CloudNine's tools (or any third-party software you purchase), you must invest in the right resources.
Keep working towards your “Goldilocks Zone” – the sweet spot between speed, price, and quality.
If you are interested in having a CloudNine expert analyze your environment and provide recommendations for efficiencies, please contact us for a free Health Check.
*https://www.statista.com/statistics/267805/microsofts-global-revenue-since-2002/
*https://www.priceintelligently.com/blog/subscription-revenue-adobe-gopro-microsoft-gillette
* https://www.comparably.com/companies/microsoft/mission