Predictive Coding: Powerful, Pervasive, Court Sanctioned and Here to Stay

By admin | July 9, 2013

by Alex Lubarsky, LL.M., Esq. | Director, TERIS

Two major concerns with e-Discovery are the costs and time involved. Because there’s often a staggering amount of electronic information to sort through, determining what’s relevant and what isn’t can be cumbersome. Predictive coding is a cutting-edge, time-saving and, best of all, cost-effective method for narrowing down electronic information to find the material that’s relevant.

A common misconception is that Predictive Coding is meant to supplant document review. However, and somewhat paradoxically, Predictive Coding begins and ends with review. There’s even review in-between. In fact, what makes Predictive Coding different is that the bulk of the review is accomplished by technology that amplifies and extrapolates upon expert human review decisions against a much larger data set than was initially reviewed.

A quick rundown of the process: To start, an attorney or expert with extensive subject-matter expertise reviews a small subset of the entire data corpus that needs to be reviewed. During this initial review phase, the expert identifies and classifies relevant documents. The resulting relevant documents are bucketed into a group often referred to as a “seed set.” The seed set is then fed back into the Predictive Coding engine, which uses an algorithm to “learn” why each document is relevant. The engine then crawls the remainder of the document set and scores each document according to how closely it matches the relevancy of the seed set documents.
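To make the seed-set step concrete, here is a minimal Python sketch. It is not any vendor's actual engine. It assumes a toy notion of relevance: each document is scored by the cosine similarity between its word counts and the combined word counts of the seed set. The function names (tokenize, build_profile, score_corpus) and the sample documents are hypothetical, and commercial Predictive Coding tools rely on more sophisticated algorithms.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_profile(seed_docs):
    """Combine word counts from every seed-set document into one profile."""
    profile = Counter()
    for doc in seed_docs:
        profile.update(tokenize(doc))
    return profile


def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def score_corpus(seed_docs, corpus):
    """Score each unreviewed document against the seed profile, best match first."""
    profile = build_profile(seed_docs)
    scored = [(cosine(profile, Counter(tokenize(doc))), doc) for doc in corpus]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical seed set: documents the expert coded as relevant.
seed_set = [
    "merger pricing agreement with competitor",
    "pricing agreement draft attached",
]

# Hypothetical remainder of the document set.
remaining = [
    "lunch menu for friday",
    "revised pricing agreement for the merger",
    "parking garage closed monday",
]

ranked = score_corpus(seed_set, remaining)
for score, doc in ranked:
    print(f"{score:.2f}  {doc}")
```

In this toy run, the document that shares vocabulary with the seed set rises to the top, while documents with no overlap score zero.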

Once this first round of classification is complete, the expert review team scores how well the machine learned what makes a document relevant. After all, even machines aren’t infallible. This evaluation is accomplished in an iterative fashion: the expert reviewers are fed a sample of classified documents and determine whether the machine scored them properly. If the level of accuracy is not acceptable (that is, if the expert reviewers are overturning machine-learned decisions more than a predetermined percentage of the time, say 5%), then more scoring of random documents is needed.
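The accuracy check described above comes down to simple arithmetic: count how often the expert disagrees with the machine on a sample, then compare that share with the agreed threshold. The sketch below assumes each document carries a simple relevant/not-relevant label and uses the 5% figure from the example; the function names and the sample data are hypothetical.

```python
import random


def overturn_rate(machine_labels, expert_labels):
    """Share of sampled documents where the expert overturned the machine's call."""
    overturned = sum(
        1
        for doc_id, expert_call in expert_labels.items()
        if machine_labels[doc_id] != expert_call
    )
    return overturned / len(expert_labels)


def needs_more_training(machine_labels, expert_labels, threshold=0.05):
    """True if the overturn rate exceeds the predetermined threshold."""
    return overturn_rate(machine_labels, expert_labels) > threshold


# Hypothetical machine calls for 100 documents (True means "relevant").
machine_labels = {doc_id: doc_id % 2 == 0 for doc_id in range(100)}

# Draw a random sample of 40 documents for the expert to check.
rng = random.Random(42)
sample_ids = rng.sample(sorted(machine_labels), 40)

# Assume the expert agrees with the machine except on 4 sampled documents.
expert_labels = {doc_id: machine_labels[doc_id] for doc_id in sample_ids}
for doc_id in sample_ids[:4]:
    expert_labels[doc_id] = not machine_labels[doc_id]

print(f"Overturn rate: {overturn_rate(machine_labels, expert_labels):.0%}")
print("More training needed:", needs_more_training(machine_labels, expert_labels))
```

Here the expert overturns 4 of 40 sampled decisions, a 10% rate, which is above the 5% threshold, so another round of scoring would be called for.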

The whole process, from sample set review to machine classification to evaluation, is repeated until the machine has learned what it needs to learn about relevance. In this way, Predictive Coding allows for what normal linear search and review does not: self-iterative refinement. This refinement process is repeated until the accuracy of the tool’s categorization reaches an acceptable level. Upon reaching an acceptable percentage of accuracy, the algorithm assigns predictive categorization to all documents in the review set. The loop is complete.
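The full loop can be sketched as a short Python function that repeats review, training, classification, and evaluation until the overturn rate is acceptable. Everything below the loop is a deliberately simple stand-in: documents are just numbers, the "machine" learns a single cutoff, and the expert checks a fixed sample so the run is reproducible (a real workflow would typically sample at random). All names and values are assumptions made for illustration.

```python
def run_review_loop(review_batch, train, classify, evaluate,
                    threshold=0.05, max_rounds=10):
    """Repeat review -> train -> classify -> evaluate until accuracy is acceptable."""
    labeled = {}
    for round_no in range(1, max_rounds + 1):
        labeled.update(review_batch(round_no))  # expert codes more documents
        model = train(labeled)                  # machine learns from all coded documents
        predictions = classify(model)           # machine codes the whole review set
        rate = evaluate(predictions)            # expert spot-checks a sample
        if rate <= threshold:
            return predictions, round_no, rate
    raise RuntimeError("accuracy target not reached within max_rounds")


# --- Toy stand-ins (all hypothetical) ---
DOC_IDS = range(1000)
HIDDEN_CUTOFF = 700  # the expert's notion of relevance: doc id >= 700


def expert_says_relevant(doc_id):
    return doc_id >= HIDDEN_CUTOFF


def review_batch(round_no):
    """Each round the expert codes a finer-grained slice of the collection."""
    stride = max(1, 256 >> (round_no - 1))  # 256, 128, 64, ...
    return {d: expert_says_relevant(d) for d in range(0, 1000, stride)}


def train(labeled):
    """Learn a cutoff halfway between the known irrelevant and relevant documents."""
    highest_irrelevant = max(d for d, relevant in labeled.items() if not relevant)
    lowest_relevant = min(d for d, relevant in labeled.items() if relevant)
    return (highest_irrelevant + lowest_relevant) / 2


def classify(cutoff):
    """Predict relevance for every document in the review set."""
    return {d: d >= cutoff for d in DOC_IDS}


def spot_check(predictions):
    """Overturn rate on a fixed sample of 100 documents."""
    sample = range(5, 1000, 10)
    overturned = sum(1 for d in sample if predictions[d] != expert_says_relevant(d))
    return overturned / len(sample)


predictions, rounds, rate = run_review_loop(review_batch, train, classify, spot_check)
print(f"Stopped after round {rounds} with overturn rate {rate:.0%}")
```

In this toy run the first round misses the 5% target (6 of 100 sampled calls are overturned), so the expert codes more documents and the second round passes. The point of the design is that the stopping rule, not a fixed number of passes, decides when the review set gets its final predictive categorization.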

This all sounds good, but is Predictive Coding accurate, and is it effective? According to Forbes magazine, Predictive Coding can be used to reduce review between five and 20 percent.

Predictive coding is also surprisingly accurate. In an opinion issued in Monique Da Silva Moore v. Publicis Groupe, 11 Civ. 1279 (ALC) (AJP); 2012 U.S. Dist. LEXIS 23350 (S.D.N.Y. Feb. 24, 2012), Judge Andrew Peck wrote that predictive coding “works better than most of the alternatives, if not all of the alternatives.” It’s important to note that Judge Andrew Carter upheld Judge Peck’s ruling in Da Silva Moore, with the appellate court stating that Judge Carter’s decision was well reasoned. Da Silva Moore v. Publicis Groupe SA, No. 11 Civ. 1279 (ALC) (AJP) (S.D.N.Y. Nov. 8, 2012).

What does this mean for the law firm or corporation looking to save money on review costs? Simply put, Predictive Coding is a valid process for identifying relevant documents in a streamlined, accurate, and cost-effective manner. To highlight but one use case for Predictive Coding, consider regulatory enforcement proceedings. Because of the relatively short timelines involved when preparing for an administrative hearing, Predictive Coding is a truly compelling method for quickly and accurately bringing relevant documents to bear, especially when faced with a daunting amount of data.
