Top Tip Archives

Optimizing Your Infrastructure for LAW & Explore eDiscovery

June 29, 2022

By: Joshua Tucker

It’s safe to say Microsoft isn’t going out of business anytime soon. Last year alone they grew 18 percent, reaching 168 billion dollars*. They are continuously updating their software, improving their products and functionality, and purchasing emerging software. They want to empower every person and organization on the planet to achieve more*, but the power you obtain from the software is up to you. Microsoft does not know your intended purpose or use of their software; all they can do is provide the software and the bare-bones requirements to make it run.

CloudNine software is no different. Let’s take a deep dive into your infrastructure and how you can optimize it with the CloudNine on-premises processing platforms.

We see that several of our clients run their environments with the minimum recommended resources. Just like Microsoft can’t know how large your SQL server needs to be, we don’t know the level of demand your clients’ data is putting on your workstations. What we DO know is that the number of files per case is growing, the complexity of files is growing, and resources are scarce.

We will cover the areas where you can make vast improvements in how efficiently you use your CloudNine software.

Your Local Area Network

Let’s use the common “business triangles” as a frame of reference. Examples would be “people, technology, and process” or “team, leadership, and mission”, or, my favorite, “price, speed, and quality”. The more balanced your business triangle, the better. Put too much or too little emphasis on one side and that balance will start to wane.

The eDiscovery version of the business triangle is called the ‘Local Area Network’. The first side of this ‘Local Area Network’ is the hardware or the backbone of your infrastructure. The second side would be the software, or the muscle needed to use that backbone. The third side is your network file server or the brain’s storage area, which will hold all the knowledge that our software is going to discover for you. And finally, the three sides are then connected, like sinew, with your local network speed.

You want to find the sweet spot that balances cost, throughput demands, speed to review, and hardware budget. Let us go ahead and call this the “Goldilocks Zone”.

Real-life case study: About 8 years ago, we were working with a client that had a few virtual machines and a few physical machines. The virtual machines had 4 cores and 8GB of RAM. The physical machines had 8 cores and 16GB of RAM. IT wanted to get rid of the physical machines, but there was resistance to letting them go because they processed so much faster than the virtual machines. We conducted some testing to find the Goldilocks Zone between the amount of data being processed, the expected speed, and the cost. We created virtual machines with 4, 8, and 12 cores and ran tests to determine the correct core count for our company. We determined that an 8-core box with 16GB of RAM processed data much faster than a 4-core box with only 8GB of RAM.

After we completed optimizing the processing machines, we ventured forth into the other areas of our infrastructure.

Next, we reached out to our SQL team to see what would happen if we added more RAM and more SQL cores. We saw the same result. As we added resources, LAW’s communication with SQL got faster. Faster communication meant faster reads and writes, which translated into faster processing. During this testing, we also found that the more SQL cores we had, the more we could spread the processing tasks horizontally across our LAW machines (i.e., we could have more machines writing to the same database).

Note: Today, I use a simple equation to size SQL: Take the total number of read/write instances that can communicate or interact with SQL. Divide that number by three; the result is the number of SQL cores needed. For RAM, multiply the same number of instances by four.
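If you’d like to run the numbers, here is a minimal Python sketch of that rule of thumb. The note doesn’t say how to round or what unit the RAM figure is in, so rounding the core count up and treating the RAM result as gigabytes are my assumptions, as is the helper name `sql_sizing`.

```python
import math


def sql_sizing(read_write_instances: int) -> tuple[int, int]:
    """Rule of thumb: cores = instances / 3, RAM = instances * 4.

    Rounding cores up and reading RAM as GB are assumptions, not part
    of the original note.
    """
    if read_write_instances < 1:
        raise ValueError("need at least one read/write instance")
    cores = math.ceil(read_write_instances / 3)
    ram_gb = read_write_instances * 4
    return cores, ram_gb


# Hypothetical environment: 10 LAW workstations plus 2 other services
# that read from and write to the same SQL instance.
cores, ram_gb = sql_sizing(12)
print(f"SQL cores: {cores}, RAM: {ram_gb} GB")  # SQL cores: 4, RAM: 48 GB
```

Counting “instances” is the judgment call here: in this sketch each workstation or service that reads and writes to SQL counts as one.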

After we completed this environment review, we had larger machines, faster read/write capability, and more machines to process each matter. The Goldilocks Zone for SQL ensures that you have the right number of SQL cores and the right amount of RAM for the number of instances doing read/write work with SQL.

(For LAW workstations, we highly suggest 8 cores and 16GB of RAM. For Explore workstations, 8 cores and 32GB of RAM.)

Note: Your LAN does not have to be local to your office, but SQL, the LAW database folder structure and the workstations all need to be in close proximity to each other. The closer the better.

Software and Upgrades

Let’s go back to our Microsoft analogy. Microsoft keeps improving their product and each version of the operating system has the potential of changing the location or how certain files work. It is imperative that the operating system that is installed on your workstations is supported by the version of the product that you are going to use. If it isn’t, the software could act in a way that is completely unexpected – or worse.

The data we process can be a threat to our organization (and this goes for everyone!), and the best way to protect yourself is to stay up to date on patches and antivirus software. I highly suggest that you first patch in a test environment, testing each part of the tool to make sure the patch will not interfere with your work. The more current the patches you test and deploy, the more secure your data, and your clients’ data, will be.

One thing I like about the right test environment is that once your testing is done, you can make an image and deploy that image to the rest of your workstations. It is fast and efficient.

How your processing engine gets metadata to you matters. For instance, there are engines, like LAW, that will expand the files and harvest all the metadata. This type of processing is slower in getting the data in review, but much faster in the final export. There are also engines, like CloudNine Explore, that will hold off on expanding the data but harvest all the text and metadata extremely quickly. This workflow is great for ECA purposes.

How deep these tools dig into your data is also important. You never want a privileged document produced because your processing engine did not discover it. Find out whether your engine is collecting all the natives, text, and metadata that you need for these legal matters, and then come up with a workflow that accentuates the strengths of your tool.

Having an Investment in your File Storage

The price of data storage has been coming down for years, which is great news considering that discoverable data keeps growing and will continue to grow at an astounding pace. It is estimated that, this past year, each person on the planet created 1.7 megabytes of information every second. Every matter’s data size has increased, and with it the demand for speed to review. All of this must run efficiently, all of it must be backed up, and all of it must be in your disaster recovery plans.

Network speed matters. It ties your infrastructure together. If the processing machine can’t talk to the SQL machines quickly, or to the network storage efficiently, then it won’t perform at top speed, no matter how many cores you have. Network speed should be considered not only for the processing department, but for your whole company. We highly suggest a gigabit network, and if you are a firm or legal service provider, you might want to look at a 10-gigabit network.
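To see why the jump from gigabit to 10-gigabit matters, here’s a back-of-the-envelope Python sketch of transfer time. The `efficiency` factor (how much of the nominal link speed you actually get) is an assumed value, not a measurement, and real throughput also depends on disks, SQL, and everything else sharing the wire.

```python
def transfer_seconds(size_gb: float, link_gbps: float, efficiency: float = 0.7) -> float:
    """Estimate seconds to move size_gb gigabytes over a link_gbps link.

    efficiency is an assumed fraction of the nominal rate actually achieved
    after protocol overhead and contention; 0.7 is a guess, not a benchmark.
    """
    bits = size_gb * 8 * 1000**3                  # decimal gigabytes -> bits
    usable_bits_per_second = link_gbps * 1000**3 * efficiency
    return bits / usable_bits_per_second


for gbps in (1, 10):
    minutes = transfer_seconds(100, gbps) / 60
    print(f"100 GB over {gbps} Gb/s: about {minutes:.0f} minutes")
```

Under those assumptions, moving 100GB takes roughly 19 minutes on a gigabit link and about 2 minutes on a 10-gigabit link, every time data moves between your workstations, SQL, and file server.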

Even with a gigabit network, your workstations, SQL server, and file server need to be local to each other. Having one data center or a central location helps keep those resources working more effectively, getting you a higher return on investment on your machines.

Pro tip! There is a quick and easy way to test your network speed without having to contact IT. Find a photo that is close to 1MB and put it in the source location. Log into one of your workstations, open a window to that source location, and drag that image to your desktop. Then, drag it back. Both moves should feel instantaneous. If either move takes more than one second, then your network speed needs to be improved.
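If you’d rather put a number on that drag-and-drop test, here’s a minimal Python sketch that copies a file from the source location to a local folder and back, timing each leg. The share path and file name are hypothetical placeholders, the one-second threshold comes from the tip, and keep in mind that Windows file caching can make a repeated run look faster than the network really is.

```python
import shutil
import time
from pathlib import Path


def time_round_trip(source_file: Path, local_dir: Path) -> tuple[float, float]:
    """Copy source_file to local_dir and back again, timing each leg in seconds."""
    local_copy = local_dir / source_file.name
    start = time.perf_counter()
    shutil.copy2(source_file, local_copy)   # source location -> workstation
    down = time.perf_counter() - start

    returned = source_file.with_name("roundtrip_" + source_file.name)
    start = time.perf_counter()
    shutil.copy2(local_copy, returned)      # workstation -> source location
    up = time.perf_counter() - start

    # Clean up both test copies; the original photo stays where it was.
    local_copy.unlink()
    returned.unlink()
    return down, up


# Hypothetical share path: point this at your own ~1MB test photo.
share_photo = Path(r"\\fileserver\source\test_photo.jpg")
if share_photo.exists():
    legs = ("to desktop", "back to share")
    for label, secs in zip(legs, time_round_trip(share_photo, Path.home() / "Desktop")):
        status = "OK" if secs < 1.0 else "too slow, look at the network"
        print(f"{label}: {secs:.2f} s ({status})")
```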

RECAP

It is our responsibility to figure out what we need to get full capacity out of outside tools. To run CloudNine’s LAW, we need workstations with at least 8 cores and 16GB of RAM. For CloudNine Explore workstations, we need 8 cores and 32GB of RAM, plus a SQL environment that scales with the number of instances interacting with it.

Ensure that your software matches the recommended versions for your processing engine. If you are running an operating system that isn’t on that processing engine’s supported list, you could get unexpected results, or worse, bad data. Line up the programs, test before you deploy, and stay up to date.

Know where your data is stored and the speed at which your systems talk to each other. Keep your environment in close proximity.

All in all, to get top speed and performance out of CloudNine’s tools (or any third-party software you purchase), you must invest in the right resources.

Keep working towards your “Goldilocks Zone” – the sweet spot between speed, price, and quality.

If you are interested in having a CloudNine expert analyze your environment and provide recommendations for efficiencies, please contact us for a free Health Check.

* https://www.statista.com/statistics/267805/microsofts-global-revenue-since-2002/

* https://www.priceintelligently.com/blog/subscription-revenue-adobe-gopro-microsoft-gillette

* https://www.comparably.com/companies/microsoft/mission

* https://docs.microsoft.com/en-us/sql/sql-server/install/hardware-and-software-requirements-for-installing-sql-server-2019?view=sql-server-ver15

For a Positive Outlook to Discovering Emails, You Need a Closed Outlook: eDiscovery Best Practices

January 28, 2016

Does that statement seem confusing? Let me explain.

Let’s call this a “tip of the day”. As you may know, at CloudNine (shameless plug warning!), we have an automated processing capability for enabling clients to load and process their own data – they can use this capability to load their data into our review platform or they can even process data for loading into their own preferred review platform if they want. So, we can still help you even if you already use Relativity or a number of other popular platforms.

Regardless of that fact, most of our users are using the processing capability to process emails, usually from Outlook Personal Storage Table (PST) files. Let’s face it, despite increased volumes of social media and other types of electronically stored information, emails are still predominant in eDiscovery. And, for those users, we get one issue more than any other when it comes to processing those Outlook emails:

They still have Outlook open with the PST file opened when they attempt to upload that PST file or when they try to create a ZIP file containing the Outlook PST.

The resulting ZIP file (created either by the user or by our client application, if the data is not already contained in an archive file) will almost invariably be corrupted or empty. Either way, this results in a failure during processing of the loaded data, because that data is simply corrupt.

So, my tip of the day is this: Before attempting to create a ZIP (or RAR or other type of archive) of a PST file (or before you upload it to a platform like CloudNine for processing), make sure that Outlook is closed or at least that the PST file is closed within Outlook. For a positive outlook to discovering emails, you need a closed Outlook.

Does that make sense now? :o)
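For those who script their uploads, here’s a minimal Python sketch that zips a PST and then checks the result before it goes anywhere. To be clear, it can’t tell whether Outlook has the file open; it only catches the symptom described above (an empty or corrupted archive). The `zip_and_verify` name and its checks are my own illustration, not part of CloudNine’s client application.

```python
import zipfile
from pathlib import Path


def zip_and_verify(pst_path: Path, zip_path: Path) -> None:
    """Zip a PST, then confirm the archive is readable and matches the source.

    Raises ValueError if the PST is empty or the archive looks corrupt.
    """
    source_size = pst_path.stat().st_size
    if source_size == 0:
        raise ValueError(f"{pst_path} is empty; is it still open in Outlook?")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(pst_path, arcname=pst_path.name)

    with zipfile.ZipFile(zip_path) as zf:
        bad_member = zf.testzip()  # name of first member failing its CRC, or None
        if bad_member is not None:
            raise ValueError(f"corrupt member in archive: {bad_member}")
        # A size mismatch suggests the PST changed while it was being zipped.
        if zf.getinfo(pst_path.name).file_size != source_size:
            raise ValueError("archived size does not match the source PST")
```

Even if this check passes, the tip still stands: close Outlook (or at least the PST) first.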

So, what do you think? Is email still the predominant source of discoverable ESI in your organization? Please share any comments you might have or if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.

Sometimes You May Need to Turn to 34-Year-Old Technology to Get the Job Done: eDiscovery Best Practices

April 20, 2015

If you’ve worked with computers for over three decades like I have, you remember some of the old ways we used computers to support litigation. Our colleague, Jane Gennarelli, covered some of those in her recent “Throwback Thursdays” series (here are the links to last year’s 12-part series: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12). But a 34-year-old software application can still be useful today.

Amy Bowser-Rollins’ excellent blog Litigation Support Guru is currently running a “Fast Tip Friday” series with videos containing fast tips (and tricks) for handling various litigation support tasks. Last Friday’s post was titled Fast Tip Friday – Using DOS to Create File Listing.

“DOS” you say? Surely, you don’t mean venerable, old MS-DOS, which was originally introduced by Microsoft in 1981? Is that thing even still around?

Yes, it is. As Amy demonstrates, even though we’re in the GUI age of Windows software, you can still get to DOS when you need to, and it can still be useful for generating file listings.

In the example that Amy walks through, she uses the DOS “dir” command (short for directory; in Windows, directories are represented as folders in Windows Explorer) to generate a sample file listing. She uses the parameters “/s” (to include all subdirectories within the current directory), “/b” (to use the “bare” format with no heading information) and “> filelisting.txt” (to write the results to a text file). She then demonstrates how you can load the resulting text file into Excel to work with your file listing.

There are parameters to show hidden or system files and to sort the files by any one of several sort options. You can also select specific files or types of files (e.g., all Excel files as “dir *.xlsx”).
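If you’d like the same kind of listing from a script, here’s a rough Python equivalent of “dir *.xlsx /s /b > filelisting.txt”. It’s an approximation, not a DOS clone: it lists files only (the real “dir /s /b” also lists subfolder names), sorts the results alphabetically, matches patterns case-insensitively the way DOS does, and includes hidden and system files, which a plain “dir” leaves out. The function names are hypothetical.

```python
import fnmatch
import os
from pathlib import Path


def bare_listing(root: Path, pattern: str = "*") -> list[str]:
    """Full paths of files under root (recursively) whose names match pattern."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case both sides so matching is case-insensitive, as in DOS.
            if fnmatch.fnmatch(name.lower(), pattern.lower()):
                matches.append(str(Path(dirpath) / name))
    return sorted(matches)


def write_listing(paths: list[str], out_file: Path) -> None:
    """Write one path per line, like redirecting dir output to a text file."""
    out_file.write_text("".join(p + "\n" for p in paths), encoding="utf-8")
```

For example, `write_listing(bare_listing(Path(r"C:\Matter01"), "*.xlsx"), Path("filelisting.txt"))` would list every Excel file under a hypothetical C:\Matter01 folder, ready to open in Excel just as Amy shows.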

File listings of directories in DOS can be useful for everything from an inventory of files to be processed to a control listing of files to be produced for a Quality Control check.

I have used DOS regularly to generate listings during the discovery process. In one project several years ago, I performed various searches on the corporation’s enterprise-wide document management repositories and downloaded the responsive files, then used DOS to generate control listings of each responsive set for verification and statistical analysis. Despite the fact that MS-DOS is 34 years old, it can still be useful in discovery.
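For that kind of verification, a Quality Control check can be as simple as comparing two listings. Here’s a minimal Python sketch that loads two bare-format listings, strips a root folder you specify so the paths line up, ignores case (as Windows does), and reports what’s missing from or unexpected in the second set. The function names, folder paths, and file names are hypothetical.

```python
from pathlib import Path


def load_listing(listing_file: Path, root: str = "") -> set[str]:
    """Read a bare listing (one path per line), stripping root and ignoring case."""
    entries = set()
    for line in listing_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line:
            continue
        if root and line.lower().startswith(root.lower()):
            line = line[len(root):]
        entries.add(line.lstrip("\\/").lower())
    return entries


def compare_listings(expected: set[str], actual: set[str]) -> dict[str, list[str]]:
    """Files expected but not found, and files found but not expected."""
    return {
        "missing": sorted(expected - actual),
        "unexpected": sorted(actual - expected),
    }
```

For example, `compare_listings(load_listing(Path("responsive.txt"), r"C:\Review\Set1"), load_listing(Path("produced.txt"), r"D:\Prod"))` would flag any responsive file that didn’t make it into the production, and anything produced that wasn’t on the responsive list.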

Thanks, Amy, for the terrific “fast tip”!

So, what do you think? Do you use DOS to generate file listings for discovery, or any other purposes? Please share any comments you might have or if you’d like to know more about a particular topic.

Another Instance Where Word is Not So Smart – eDiscovery Best Practices

May 15, 2014

Way back within the first couple of months after this blog was launched, we discussed those stupid “smart quotes” in Microsoft® Word, where Word, by default, automatically changes straight quotation marks ( ' or " ) to curly quotes as you type. There’s another way in which Word isn’t so smart, unless you know the workaround, which I just learned this week.

A couple of days ago, an unusual error was reported by one of the users of our review platform, OnDemand®. She was putting text into a field in her database and when she went back to that same database record, the text was altered a bit, to say the least. Here is what she was seeing (I’ll substitute a common typing sentence for her client proprietary text):

<span style=”font-size:11.0pt;line-height:115%; font-family:"Calibri","sans-serif";mso-ascii-theme-font:minor-latin;mso-fareast-font-family: Calibri;mso-fareast-theme-font:minor-latin;mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman";mso-bidi-theme-font:minor-bidi; mso-ansi-language:EN-US;mso-fareast-language:EN-US;mso-bidi-language:AR-SA”>The quick brown fox jumped over the  lazy dog.

What a mess! Did you spot the sentence “The quick brown fox jumped over the lazy dog.” in there? Wasn’t easy, was it? Important text was bolded in red, so I simulated that by putting the last two words bolded in red as well.

It turns out that she was copying text from a Word document and pasting it into the Web form for the database field. It would look fine when she pasted it, but when she exited the database and logged back in (and returned to the specific record where she entered the text), the web form displayed all of the formatting that went with the text that she had copied. As often as people copy text from Word documents, I’m surprised the issue hasn’t come up before.

What to do? Copying the text to a plain text editor (like Notepad or TextPad) first works, because the editor strips all of the formatting from the text. Copying the text from the text editor and then pasting it into the field gives you the text without the formatting. It’s a two-step process that I’ve used for years to copy text out of Word sans the formatting.

However, I learned a one-step approach from one of our OnDemand developers that I didn’t know about before. Instead of using Ctrl+V to paste text (after using Ctrl+C to copy it to the clipboard), use Ctrl+Shift+V to paste the text. You’ll get the pasted text without formatting and avoid the mess you see above. Thanks, Chris Maden!
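For the technically curious, the mess in her field was the HTML markup that came along with the copied Word text. Here’s a minimal Python sketch, using the standard library’s `html.parser`, that keeps only the text content of a pasted fragment, which is essentially what pasting as plain text gives you. It’s an illustration, not how OnDemand handles the field, and it’s meant for simple fragments: text inside a `<style>` or `<script>` block would be kept as text.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content; tags, attributes, and comments are dropped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # turn &quot; etc. into characters
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_formatting(fragment: str) -> str:
    """Return the plain text of an HTML fragment with whitespace collapsed."""
    parser = _TextOnly()
    parser.feed(fragment)
    parser.close()
    return " ".join("".join(parser.parts).split())


pasted = ('<span style="font-family:&quot;Calibri&quot;">The quick brown fox '
          'jumped over the <b style="color:red">lazy dog</b>.</span>')
print(strip_formatting(pasted))  # The quick brown fox jumped over the lazy dog.
```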

So, what do you think? Do you have issues copying text from Word files? Please share any comments you might have or if you’d like to know more about a particular topic.


Word’s Stupid “Smart Quotes” – Best of eDiscovery Best Practices

July 5, 2013

Even those of us at eDiscoveryDaily have to take an occasional vacation day; however, instead of “going dark” for today, we thought we would republish a post from the early days of the blog (when we didn’t have many readers yet). So, chances are, you haven’t seen this post yet! Enjoy!

I have run into this issue more times than I can count.

A client sends me a Microsoft® Word document with a list of search terms that they want to use to cull a set of data for review. I copy the terms into the search tool and then, all hell breaks loose!! Either:

The search indicates there is a syntax error

The search returns some obviously odd results

And, then, I remember…

It’s those stupid Word “smart quotes”. Starting with Office 2003, Microsoft Word, by default, automatically changes straight quotation marks ( ' or " ) to curly quotes as you type. This is fine for displaying a document in Word, but when you copy that text into a format or program that doesn’t handle smart quotes (such as HTML or a plain text editor), the quotes can show up as garbage characters, because they are not ASCII characters and get misread under the wrong character encoding. So:

“smart quotes”

will look like this…

âsmart quotesâ

As you can imagine, that doesn’t look so “smart” when you feed it into a search tool and you get odd results (if the search even runs). So, you’ll need to address those to make sure that the quotes are handled correctly when searching for phrases with your search tool.
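If you’re wondering where the “â” comes from, it’s an encoding mismatch. Here’s a small Python sketch, assuming the text was saved as UTF-8 and then read as Latin-1 (one common mismatch; a program that reads it as Windows-1252 shows slightly different garbage). Each curly quote becomes “â” plus two invisible control characters, which is why you see “âsmart quotesâ”.

```python
def misread_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then decode it (wrongly) as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


garbled = misread_as_latin1("“smart quotes”")
print(repr(garbled))  # 'â\x80\x9csmart quotesâ\x80\x9d'

# \x80, \x9c and \x9d are non-printing control codes, so what you
# actually see on screen is only the printable part:
visible = "".join(ch for ch in garbled if ch.isprintable())
print(visible)  # âsmart quotesâ

# Reversing the mistake recovers the original text.
print(garbled.encode("latin-1").decode("utf-8"))  # “smart quotes”
```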

To disable the automatic changing of quotes to Microsoft Word smart quotes in Office 2007: Click the Microsoft Office icon button at the top left of Word, and then click the Word Options button to open options for Word. Click Proofing along the side of the pop-up window, then click AutoCorrect Options. Click the AutoFormat As You Type tab and, under Replace as you type, uncheck the "Straight quotes" with “smart quotes” check box. Then, click OK.

Often, however, the file you’ve received already has smart quotes in it. If you’re going to use the terms in that file, you’ll need to copy them to a text editor first; Notepad or WordPad (if WordPad is in plain text document mode) should be fine. Highlight the beginning quote and copy it to the clipboard (Ctrl+C), then press Ctrl+H to open the Find and Replace dialog, put your cursor in the Find box and press Ctrl+V to paste it in. Type the " character on the keyboard into the Replace box, then press Replace All to replace all beginning smart quotes with straight ones. Repeat the process for the ending smart quotes. You’ll also have to do this for any single quotes, double-hyphens, or fraction characters (e.g., Word converts “1/2” to “½”) that impact your terms.
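If you handle search term lists often, those find-and-replace steps can be scripted. Here’s a minimal Python sketch that maps Word’s AutoFormat characters back to plain keyboard characters in one pass. The quote mappings follow the manual steps directly; mapping an em dash back to a double hyphen, an en dash to a single hyphen, and the fraction characters to “1/2”-style text are my assumptions about what your search tool expects, so adjust the table to your tool’s syntax.

```python
# Word AutoFormat characters mapped back to plain keyboard characters.
# The dash and fraction entries are assumptions; adjust them for your search tool.
WORD_TO_PLAIN = {
    "\u201c": '"',    # left double smart quote
    "\u201d": '"',    # right double smart quote
    "\u2018": "'",    # left single smart quote
    "\u2019": "'",    # right single smart quote / apostrophe
    "\u2014": "--",   # em dash (Word's replacement for a typed double hyphen)
    "\u2013": "-",    # en dash
    "\u00bd": "1/2",  # one-half fraction character
    "\u00bc": "1/4",  # one-quarter fraction character
    "\u00be": "3/4",  # three-quarters fraction character
}
_TABLE = str.maketrans(WORD_TO_PLAIN)


def straighten(text: str) -> str:
    """Replace smart quotes, dashes, and fraction characters in one pass."""
    return text.translate(_TABLE)


print(straighten("“brown fox” AND ‘lazy dog’"))  # "brown fox" AND 'lazy dog'
```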

So, what do you think? Have you ever run into issues with Word smart quotes or other auto formatting options? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Tips: Word’s Stupid “Smart Quotes”

November 24, 2010

I have run into this issue more times than I can count.

The search indicates there is a syntax error

The search returns some obviously odd results

And, then, I remember…

“smart quotes” aren’t very smart

will look like this…

âsmart quotesâ arenât very smart

To disable the automatic changing of quotes to Microsoft Word smart quotes: For Office 2007, click the Microsoft Office icon button at the top left of Word, and then click the Word Options button to open options for Word. Click Proofing along the side of the pop-up window, then click AutoCorrect Options. Click the AutoFormat As You Type tab and, under Replace as you type, uncheck the "Straight quotes" with “smart quotes” check box. Then, click OK.

To replace Microsoft Word smart quotes already in a file: Often, however, the file you’ve received already has smart quotes in it. If you’re going to use the terms in that file, you’ll need to copy them to a text editor first; Notepad or WordPad (if WordPad is in plain text document mode) should be fine. Highlight the beginning quote and copy it to the clipboard (Ctrl+C), then press Ctrl+H to open the Find and Replace dialog, put your cursor in the Find box and press Ctrl+V to paste it in. Type the " character on the keyboard into the Replace box, then press Replace All to replace all beginning smart quotes with straight ones. Repeat the process for the ending smart quotes. You’ll also have to do this for any single quotes, double-hyphens, fraction characters (e.g., Word converts “1/2” to “½”), etc. that impact your terms.

So, what do you think? Have you ever run into issues with Word smart quotes or other Word auto formatting options? Please share any comments you might have or if you’d like to know more about a particular topic.

From all of us at Trial Solutions…Have a Happy Thanksgiving!!

eDiscovery Searching 101: Sites for Common Misspellings

September 23, 2010

Yesterday, we talked about the importance of including misspellings when searching for relevant ESI, to broaden the search and retrieve potentially responsive files that might otherwise be missed, and about the use of “fuzzy searching” (with a product like FirstPass™, powered by Venio FPR™, that supports this capability) to identify variations as potential misspellings within the collection. Another way to identify misspellings is to use a resource that tracks the most typical misspellings for common words.

Examples of Sites

At Dumbtionary.com, you can check words against a list of over 10,000 misspelled words. Simply type the correct word into the search box with a “plus” before it (e.g., “+management”) to get the common misspellings for that word. You can also search for misspelled names and places.

Wikipedia has a list of common misspellings as well. It breaks the list down by starting letter, as well as variations on 0-9 (e.g., “3pm” or “3 pm”). You can go to the starting letter you want to search, then do a “find” on the page (by pressing Ctrl+F) and type in the string to search.

Wrongspelled.com and Spellgood.net are two other examples of sites for searching for common misspellings. Not all sites have the same misspellings, so it’s good to check multiple sites to compile a comprehensive list. Each site lets you search for your terms and identify common misspellings for each, enabling you to broaden your search to include those variations, and most of these sites are updated regularly with new common misspellings.
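Once you’ve gathered variants from several sites, you can merge them into a single search. Here’s a minimal Python sketch that drops duplicates (ignoring case) and joins the variants with OR. The OR operator and the double quotes around multi-word phrases are assumptions about your search tool’s syntax, and the `build_or_query` name is hypothetical.

```python
def build_or_query(term: str, *site_lists: list[str]) -> str:
    """Merge a term and its misspellings from several sources into one OR query."""
    seen: set[str] = set()
    variants: list[str] = []
    candidates = [term] + [word for words in site_lists for word in words]
    for word in candidates:
        word = word.strip()
        key = word.lower()
        if word and key not in seen:  # skip blanks and case-insensitive repeats
            seen.add(key)
            # Quote multi-word phrases so they are searched as a phrase.
            variants.append(f'"{word}"' if " " in word else word)
    return " OR ".join(variants)


# Hypothetical lists gathered from two different sites.
print(build_or_query("management", ["managment"], ["mangement", "Managment"]))
# management OR managment OR mangement
```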

Using fuzzy searching or sites with typical misspellings for your terms is one method of ensuring a more diligent eDiscovery search process by retrieving additional “hits” that might otherwise be missed. Over the weeks to come, we’ll talk about others.

In the meantime, what do you think? Are you aware of other sites to find common misspellings? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Searching 101: It's a Mistake to Ignore the Mistakes

September 22, 2010

How many times have you received an email sent to “All Employees” like this? “I am pleased to announce that Joe Smith has been promoted to the position of Operations Manger.”

Do you cringe when you see an email like that? I do. I cringe even more when the email comes from me, which happens more often than I’d like to admit.

Of course, we all make mistakes. And, forgetting that fact can be costly when searching for, or requesting, relevant documents in eDiscovery. For example, if you’re searching for e-mails that relate to management decisions, can you be certain that “management” is spelled perfectly throughout the collection? Unlikely. It could be spelled “managment” or “mangement” and you would miss those potentially critical emails without an effective plan to look for them.

Finding Misspellings Using Fuzzy Searching

How do you find them if you don’t know how they might be misspelled? Use a search tool like FirstPass™, powered by Venio FPR™, that supports “fuzzy” searching, a mechanism for finding alternate words that are close in spelling to the word you’re looking for (usually one or two characters off). FirstPass will display all of the words in the collection that are close to the word you’re looking for, so if you’re looking for someone named “Brian”, you can find variations such as “Bryan” or even “brain” that could be relevant. Then, simply select the variations you wish to include in the search. Fuzzy searching is the best way to broaden your search to include potential misspellings, and FirstPass provides a terrific capability to select possible misspellings to review additional potential “hits” in your collection.
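Tools implement fuzzy matching in different ways, and this isn’t a description of FirstPass’s internals. One common measure of “one or two characters off” is Levenshtein edit distance: the number of single-character insertions, deletions, or substitutions needed to turn one word into another. Here’s a minimal Python sketch of that approach; the function names and the default threshold of 2 are my assumptions. Note that under this measure “Brian” to “brain” counts as two edits, because swapping adjacent letters takes two substitutions.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum single-character insertions, deletions, or substitutions to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if the letters match)
            ))
        prev = curr
    return prev[-1]


def fuzzy_matches(term: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Words in vocabulary within max_distance edits of term, closest first."""
    t = term.lower()
    scored = {(levenshtein(t, word.lower()), word) for word in vocabulary}
    return [word for distance, word in sorted(scored) if distance <= max_distance]


vocabulary = ["Bryan", "brain", "Bruno", "Brian"]
print(fuzzy_matches("Brian", vocabulary))  # ['Brian', 'Bryan', 'brain']
```

In practice the vocabulary would be the list of distinct words indexed in your collection, which is what lets a tool show you only the variations that actually occur.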

The most popular TV series all use “cliffhangers” to keep the audience hooked, so tomorrow, I’ll talk about sites available to identify common misspellings for terms as another way to broaden searches to include mistakes. 🙂

In the meantime, what do you think? Do you have any real-world examples of how fuzzy searching has aided in eDiscovery search and retrieval? Please share any comments you might have or if you’d like to know more about a particular topic.
