The sheer volume of data is exploding. To put that in perspective, just one puny gigabyte of storage on one lowly flash drive can hold the equivalent of up to 100,000 printed pages. The data generated worldwide already amounts to more printed pages than anyone could comprehend.
Most of that data is created in electronic form. Worse yet, a large percentage of any organization’s data is likely to be redundant, outdated or trivial, with no relevant value. For attorneys involved in litigation, the data explosion makes analytics an absolute necessity rather than a luxury. With analytics, lawyers can take fewer actions and still get more results.
Suppose you are not using methods and technology at the front end to learn about your data. Add the statistics suggesting that most corporate data is either riddled with ROT (redundant, outdated or trivial content) or not structured in a way that allows for precise recall. You are then poorly positioned to control related expenses, set a budget, make strategic decisions and defend one of your most valuable corporate assets: your data.
Three common ‘buckets’ of discovery analytics
Attorneys can consider three common types, or buckets, of discovery analytics, either alone or in combination:
- Structured analytics. These tools group documents by similarities in their text and organization, putting a basic structure around unknown data and providing information about it. Grouping email threads and identifying near-duplicates are examples of structured analytics. Think of them as a basic portal to the information, providing statistics about unorganized data and a way to organize it.
- Concept analytics. These tools expand beyond typical keyword searches to retrieve related items, helping attorneys find the meaning of content within a dataset. Concept clustering, concept searching and keyword expansion are examples of concept analytics. Such analytics are helpful in positioning a legal case, understanding potentially relevant material and determining a more accurate budget that minimizes costs while keeping a case defensible.
- Predictive analytics. These tools start with an attorney reviewing a subset of documents and coding which ones are needed and which are not. The computer is then trained on those decisions, usually over several iterations, to mimic them and statistically “predict” how the rest of the documents should be coded. Such analytics are especially useful with very large data sets, where putting attorneys’ eyes on every document would be cost-prohibitive.
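For readers who want to see the train-then-predict loop in motion, here is a minimal Python sketch. It is not how any particular review platform works. It assumes a simple Naive Bayes text classifier, a whitespace tokenizer, and an "uncertainty sampling" step that, in each round, sends the attorney the documents the model is least sure about. `NaiveBayesCoder`, `predictive_coding` and the `review` callback are hypothetical names used only for illustration, and the eight toy documents are made up.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (an assumption; real platforms do far more).
    return text.lower().split()


class NaiveBayesCoder:
    """Hypothetical multinomial Naive Bayes with add-one smoothing.

    Labels are True (relevant) or False (not relevant).
    """

    def fit(self, docs, labels):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        self.totals = {lbl: sum(c.values()) for lbl, c in self.word_counts.items()}
        return self

    def prob_relevant(self, doc):
        n_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores = {}
        for label in (True, False):
            # Smoothed class prior plus smoothed per-word likelihoods, in log space.
            score = math.log((self.doc_counts[label] + 1) / (n_docs + 2))
            for word in tokenize(doc):
                score += math.log((self.word_counts[label][word] + 1) / (self.totals[label] + v))
            log_scores[label] = score
        # Turn the two log scores into a probability without overflow.
        top = max(log_scores.values())
        rel = math.exp(log_scores[True] - top)
        irr = math.exp(log_scores[False] - top)
        return rel / (rel + irr)


def predictive_coding(documents, review, seed_ids, rounds=3, batch_size=2):
    """review(i) is a hypothetical stand-in for an attorney's call on document i."""
    coded = {i: review(i) for i in seed_ids}
    for _ in range(rounds):
        model = NaiveBayesCoder().fit([documents[i] for i in coded], list(coded.values()))
        uncoded = [i for i in range(len(documents)) if i not in coded]
        if not uncoded:
            break
        # Send the attorney the documents the model is least sure about.
        uncoded.sort(key=lambda i: abs(model.prob_relevant(documents[i]) - 0.5))
        for i in uncoded[:batch_size]:
            coded[i] = review(i)
    # The final model codes everything the attorney never looked at.
    model = NaiveBayesCoder().fit([documents[i] for i in coded], list(coded.values()))
    predicted = {i: model.prob_relevant(documents[i]) >= 0.5
                 for i in range(len(documents)) if i not in coded}
    return coded, predicted


# Toy usage: a made-up "truth" table plays the role of the attorney.
docs = [
    "merger pricing agreement with acme",
    "lunch order for friday team",
    "acme merger due diligence memo",
    "fantasy football league standings",
    "pricing terms for the acme deal",
    "office holiday party lunch",
    "acme deal board approval",
    "friday football pool",
]
truth = {0: True, 1: False, 2: True, 3: False, 4: True, 5: False, 6: True, 7: False}
coded, predicted = predictive_coding(docs, lambda i: truth[i], seed_ids=[0, 1], rounds=1, batch_size=1)
print(coded)      # documents the "attorney" coded by hand
print(predicted)  # documents the model coded
```

Here the attorney codes three of the eight documents and the model codes the remaining five. Commercial predictive-coding tools typically use stronger models and add statistical validation, such as sampling to estimate recall. The basic shape is the same, though: human decisions on a subset, repeated training, and machine coding of the remainder.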
Using such analytics, attorneys can obtain dashboard-like overviews that show the number of documents that are originals, duplicates or near-duplicates; the top terms in the documents; and the top senders and recipients. Analytics can also let attorneys zoom in or out on data-map views of the results, just as Google Earth users can zoom in on a specific location or zoom out to see a broader, less detailed context.
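A rough version of such a dashboard can be built from a few counters. The Python sketch below illustrates the idea under stated assumptions and does not describe any vendor’s product. Exact duplicates are detected by hashing the message body. Near-duplicates are detected by Jaccard similarity of three-word shingles against a hypothetical 0.5 threshold. Each email is a dict with hypothetical `sender`, `recipients` and `body` keys. The `STOPWORDS` list and the sample emails are also made up for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical minimal stopword list so "top terms" are not all filler words.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to"}


def shingles(text, k=3):
    """Set of k-word sequences; texts shorter than k become a single shingle."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def overview(emails, near_dup_threshold=0.5, top_n=3):
    seen_hashes = set()
    original_shingles = []
    counts = Counter()
    for email in emails:
        digest = hashlib.sha256(email["body"].encode("utf-8")).hexdigest()
        if digest in seen_hashes:
            counts["duplicates"] += 1        # byte-for-byte identical body
            continue
        seen_hashes.add(digest)
        sh = shingles(email["body"])
        if any(jaccard(sh, o) >= near_dup_threshold for o in original_shingles):
            counts["near_duplicates"] += 1   # very similar to an earlier original
        else:
            counts["originals"] += 1
            original_shingles.append(sh)
    # Term, sender and recipient counts cover every email, duplicates included.
    terms = Counter(w for e in emails for w in e["body"].lower().split() if w not in STOPWORDS)
    senders = Counter(e["sender"] for e in emails)
    recipients = Counter(r for e in emails for r in e["recipients"])
    return {
        "counts": dict(counts),
        "top_terms": terms.most_common(top_n),
        "top_senders": senders.most_common(top_n),
        "top_recipients": recipients.most_common(top_n),
    }


emails = [
    {"sender": "ann@example.com", "recipients": ["bob@example.com"],
     "body": "please review the acme merger pricing terms today"},
    {"sender": "ann@example.com", "recipients": ["bob@example.com"],
     "body": "please review the acme merger pricing terms today"},
    {"sender": "bob@example.com", "recipients": ["ann@example.com", "cal@example.com"],
     "body": "please review the acme merger pricing terms tomorrow"},
    {"sender": "cal@example.com", "recipients": ["bob@example.com"],
     "body": "lunch order for the friday team meeting"},
]
summary = overview(emails)
print(summary["counts"])
```

The second email is an exact duplicate of the first, and the third differs by one word, so it is counted as a near-duplicate. A real platform would also normalize text, consider attachments and metadata, and use scalable similarity search (MinHash, for example) rather than comparing each document against every original. The pairwise loop here is practical only for small sets.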
The bottom line: the brute-force approach of putting eyes on every document is no longer feasible. The cost savings in discovery and the improved defensibility that analytics deliver make them a necessity, not a luxury, today.