eDiscovery Best Practices: Database Discovery Pop Quiz ANSWERS

January 10, 2011

So, how did you do? Did you know all the answers from Friday’s post – without “googling” them? 😉

Here are the answers – enjoy!

What is a “Primary Key”? The primary key of a relational table uniquely identifies each record in the table. It can be a normal attribute that you expect to be unique (e.g., Social Security Number); however, it’s usually best to be a sequential ID generated by the Database Management System (DBMS).

What is an “Inner Join” and how does it differ from an “Outer Join”? An inner join is the most common join operation used in applications, creating a new result table by combining column values of two tables. An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record in one of the tables – even if no other matching record exists. Sometimes, there is a reason to keep all of the records in one table in your result, such as a list of all employees, whether or not they participate in the company’s benefits program.

What is “Normalization”? Normalization is the process of organizing data to minimize redundancy of that data. Normalization involves organizing a database into multiple tables and defining relationships between the tables.

How does a “View” differ from a “Table”? A view is a virtual table that consists of columns from one or more tables. Though it is similar to a table, it is a query stored as an object.

What does “BLOB” stand for? A Binary Large OBject (BLOB) is a collection of binary data stored as a single entity in a database management system. BLOBs are typically images or other multimedia objects, though sometimes binary executable code is stored as a blob. So, if you’re not including databases in your discovery collection process, you could also be missing documents stored as BLOBs. BTW, if you didn’t click on the link next to the BLOB question in Friday’s blog, it takes you to the amusing trailer for the 1958 movie, The Blob, starring a young Steve McQueen (so early in his career, he was billed as “Steven McQueen”).

What is the different between a “flat file” and a “relational” database? A flat file database is a database designed around a single table, like a spreadsheet. The flat file design puts all database information in one table, or list, with fields to represent all parameters. A flat file is prone to considerable duplicate data, as each value is repeated for each item. A relational database, on the other hand, incorporates multiple tables with methods (such as normalization and inner and outer joins, defined above) to store data efficiently and minimize duplication.

What is a “Trigger”? A trigger is a procedure which is automatically executed in response to certain events in a database and is typically used for keeping the integrity of the information in the database. For example, when a new record (for a new employee) is added to the employees table, a trigger might create new records in the taxes, vacations, and salaries tables.

What is “Rollback”? A rollback is the undoing of partly completed database changes when a database transaction is determined to have failed, thus returning the database to its previous state before the transaction began. Rollbacks help ensure database integrity by enabling the database to be restored to a clean copy after erroneous operations are performed or database server crashes occur.

What is “Referential Integrity”? Referential integrity ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity ensures that a record is not added to the table that contains the foreign key unless there is a corresponding record in the linked table. Many databases use cascading updates and cascading deletes to ensure that changes made to the linked table are reflected in the primary table.

Why is a “Cartesian Product” in SQL almost always a bad thing? A Cartesian Product occurs in SQL when a join condition (via a WHERE clause in a SQL statement) is omitted, causing all combinations of records from two or more tables to be displayed. For example, when you go to the Department of Motor Vehicles (DMV) to pay your vehicle registration, they use a database with an Owners and a Vehicles table joined together to determine for which vehicle(s) you need to pay taxes. Without that join condition, you would have a Cartesian Product and every vehicle in the state would show up as registered to you – that’s a lot of taxes to pay!

If you didn’t know the answers to most of these questions, you’re not alone. But, to effectively provide the information within a database responsive to an eDiscovery request, knowledge of databases at this level is often necessary to collect and produce the appropriate information. As Craig Ball noted in his Law.com article Ubiquitous Databases, “Get the geeks together, and get out of their way”. Hey, I resemble that remark!

So, what do you think? Did you learn anything? Please share any comments you might have or if you’d like to know more about a particular topic.

eDiscovery Daily Blog