eDiscovery Daily Blog
You May Soon Be Told to “Go Jump in a Lake” for Your ESI: eDiscovery Trends
A data lake, that is. So, what is it and why should you care? Let’s take a look.
Leave it to Rob Robinson and his excellent Complex Discovery blog to provide links to several useful articles to help better understand data lakes and the potential they have to impact the business world (which, in turn, impacts the eDiscovery world). Here’s one example:
In this article in BizTech (Data Lakes Prove Key to Modern Data Platforms, written by Jennifer Zaino), the author defines data lakes as “stor[ing] data of any type in its raw form, much as a real lake provides a habitat where all types of creatures can live together.
A data lake is an architecture for storing high-volume, high-velocity, high-variety, as-is data in a centralized repository for Big Data and real-time analytics. And the technology is an attention-getter: The global data lakes market is expected to grow at a rate of 28 percent between 2017 and 2023.
Companies can pull in vast amounts of data — structured, semistructured and unstructured — in real time into a data lake, from anywhere. Data can be ingested from Internet of Things sensors, clickstream activity on a website, log files, social media feeds, videos and online transaction processing (OLTP) systems, for instance. There are no constraints on where the data hails from, but it’s a good idea to use metadata tagging to add some level of organization to what’s ingested, so that relevant data can be surfaced for queries and analysis.”
“To ensure that a lake doesn’t become a swamp, it’s very helpful to provide a catalog that makes data visible and accessible to the business, as well as to IT and data management professionals,” says Doug Henschen, vice president and principal analyst at Constellation Research.
The author also advises not to confuse data lakes (which store raw data) with data warehouses (which store current and historical data in an organized fashion).
Data warehouses are best for analyzing structured data quickly and with great accuracy and transparency for managerial or regulatory purposes. Meanwhile, data lakes are primed for experimentation, explains Kelle O’Neal, founder and CEO of management consulting firm First San Francisco Partners.
With a data lake, businesses can quickly load a variety of data types from multiple sources and engage in ad hoc analysis. Or, a data team could leverage machine learning in a data lake to find “a needle in a haystack,” O’Neal says.
Data warehouses follow a “schema on write” approach, which entails defining a schema for data before being able to write it to the database. Online analytical processing (OLAP) technology can be used to analyze and evaluate data in a warehouse, enabling fast responses to complex analytical queries.
Data lakes take a “schema on read” approach, where the data is structured and transformed only when it is ready to be used. For this reason, it’s a snap to bring in new data sources, and users don’t have to know in advance the questions they want to answer. With lakes, “different types of analytics on your data — like SQL queries, Big Data analytics, full-text search, real-time analytics and machine learning — can be used to uncover insights,” according to Amazon. Moreover, data lakes are capable of real-time actions based on algorithm-driven analytics.
Businesses may use both data lakes and data warehouses. The decision about which to use turns on “understanding and optimizing what the different solutions do best,” O’Neal says.
Want to know more – a lot more – about data lakes? Check out Rob’s post here with links to several other articles as well.
So, what do you think? Has your organization learned to “fish” from data lakes yet? Please share any comments you might have or if you’d like to know more about a particular topic.
Sponsor: This blog is sponsored by CloudNine, which is a data and legal discovery technology company with proven expertise in simplifying and automating the discovery of data for audits, investigations, and litigation. Used by legal and business customers worldwide including more than 50 of the top 250 Am Law firms and many of the world’s leading corporations, CloudNine’s eDiscovery automation software and services help customers gain insight and intelligence on electronic data.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine. eDiscovery Daily is made available by CloudNine solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscovery Daily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.