ESI: Why a Keyword Search Isn't Always the Best Option
Those of us who use Google or Yahoo every day to find the best cameras, learn how to remove stains from clothing, or see what tourist attractions there are in Iowa usually come up with satisfactory results. We get pages of relevant sites on the subject we’re curious about and, presto, the search engine tells us what we need to know.
But…if you’re a lawyer preparing for eDiscovery and your case involves voluminous stacks of electronically stored information (ESI), you may not find search engines to be much of an ally. Not when, say, you get THREE MILLION HITS, many of which are what legal circles politely call non-responsive documents (and, less politely but perhaps more accurately, total crap).
So what can you do? Grossman and Sweeney, in their article “What Lawyers Need to Know About Search Tools,” advise you to use a crane instead of a spade.
Crane Concept
By crane, they mean search tools other than simple keyword search that can return a higher rate of responsive documents. Instead of keywords, these tools rely on concepts. The argument is that keyword or Boolean searches depend on individual words, which different researchers may use differently (or misspell), while concept search models can retrieve more of the relevant documents.
To illustrate the difference: if you’re looking for dog food, you’d naturally type “dog food,” “where to buy dog food,” or “healthy dog food.” A keyword search would then miss a document that only ever mentions “kibble.” Concept searches, however, cast a wider net because they also take in synonyms, antonyms and other related words.
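For readers who like to see the mechanics, here is a minimal Python sketch of that difference. It is only a crude stand-in for a concept search model: it expands the query with a hypothetical hand-built table of related terms (`RELATED_TERMS`) and runs both searches over a few invented documents.

```python
import re

# Hypothetical toy collection; document IDs and text are invented for illustration.
DOCS = {
    "DOC-001": "Where to buy healthy dog food near the office",
    "DOC-002": "Invoice for two bags of premium kibble",
    "DOC-003": "Quarterly canine nutrition budget review",
    "DOC-004": "Minutes of the parking committee meeting",
}

# Hypothetical hand-built list of related terms standing in for a concept model.
RELATED_TERMS = {
    "dog food": ["kibble", "canine nutrition", "pet food"],
}


def phrase_in(phrase: str, text: str) -> bool:
    """True if the phrase appears in the text as whole words, ignoring case."""
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return re.search(pattern, text.lower()) is not None


def keyword_search(query: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of documents containing the literal query phrase."""
    return [doc_id for doc_id, text in docs.items() if phrase_in(query, text)]


def expanded_search(query: str, docs: dict[str, str]) -> list[str]:
    """Return IDs of documents containing the query or any related term."""
    terms = [query] + RELATED_TERMS.get(query, [])
    return [
        doc_id
        for doc_id, text in docs.items()
        if any(phrase_in(term, text) for term in terms)
    ]


if __name__ == "__main__":
    print(keyword_search("dog food", DOCS))   # ['DOC-001']
    print(expanded_search("dog food", DOCS))  # ['DOC-001', 'DOC-002', 'DOC-003']
```

The literal keyword search finds only DOC-001, while the expanded query also picks up the “kibble” and “canine nutrition” documents, which is the wider net described above.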
Concept search models include:
- Taxonomies and ontologies – Grossman and Sweeney describe these models as “thesauruses on steroids.” They organize words into classes and sub-classes with hierarchical relationships.
- Mathematical and statistical models – rather than matching individual words, these models rely on complex mathematical and statistical analysis of the documents.
- Latent semantic indexing – this model relies on a mathematical technique closely related to principal component analysis (singular value decomposition). It groups words and combinations of words with similar meanings, something akin to a thesaurus and dictionary rolled into one.
- Machine learning tools – the authors cite the use of “seed sets” of documents, which reviewers code for relevance so the tool can learn from them. The tool then ranks the remaining documents by relevance, tagging each one as “most likely” or “least likely” to be relevant.
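The seed-set idea can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not any product’s or the authors’ method: it uses a multinomial naive Bayes classifier (one common machine learning technique, chosen here for brevity), a hypothetical four-document seed set coded by a reviewer, and three invented unreviewed documents.

```python
import math
import re
from collections import Counter

# Hypothetical seed set: documents a reviewer has coded as relevant (True) or not (False).
SEED_SET = [
    ("pricing agreement with competitor on widget prices", True),
    ("call competitor to fix widget prices next quarter", True),
    ("office holiday party catering menu", False),
    ("parking garage access badge renewal", False),
]

# Hypothetical unreviewed documents to be ranked.
UNREVIEWED = {
    "DOC-101": "draft agreement on widget prices with competitor",
    "DOC-102": "holiday party menu and catering invoice",
    "DOC-103": "badge renewal reminder for parking",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(seed_set):
    """Count words per label and documents per label (multinomial naive Bayes)."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, is_relevant in seed_set:
        word_counts[is_relevant].update(tokenize(text))
        doc_counts[is_relevant] += 1
    if doc_counts[True] == 0 or doc_counts[False] == 0:
        raise ValueError("seed set needs both relevant and non-relevant examples")
    vocab = set(word_counts[True]) | set(word_counts[False])
    return word_counts, doc_counts, vocab


def log_prob(tokens, label, model) -> float:
    """Log prior plus add-one-smoothed log likelihood of the tokens under one label."""
    word_counts, doc_counts, vocab = model
    total_words = sum(word_counts[label].values())
    score = math.log(doc_counts[label] / sum(doc_counts.values()))
    for token in tokens:
        score += math.log((word_counts[label][token] + 1) / (total_words + len(vocab)))
    return score


def rank_documents(docs, model):
    """Rank documents by log-odds of relevance, highest first, and tag each one."""
    _, _, vocab = model
    results = []
    for doc_id, text in docs.items():
        # Ignore words never seen in the seed set; they carry no evidence either way.
        tokens = [t for t in tokenize(text) if t in vocab]
        log_odds = log_prob(tokens, True, model) - log_prob(tokens, False, model)
        tag = "most likely" if log_odds > 0 else "least likely"
        results.append((doc_id, log_odds, tag))
    return sorted(results, key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    model = train(SEED_SET)
    for doc_id, log_odds, tag in rank_documents(UNREVIEWED, model):
        print(f"{doc_id}  {log_odds:+.2f}  {tag}")
```

Run as a script, it ranks DOC-101 first and tags it “most likely,” while the two unrelated documents score below zero and are tagged “least likely.” In practice the seed set would be far larger.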
There’s a caveat, though: concept search models may not work as well on small volumes of data. And they may not help where the body of ESI being searched does not contain the documents that matter to a case in the first place.
Refining ESI Searches
So what’s the best option? The authors point to common sense: use the tools and methods that match your objectives. Consider combining keyword search with one of the concept search models, or combining a concept search model with information derived from metadata (e.g., date, custodian or recipient).
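Here is a minimal Python sketch of layering metadata filters on top of a keyword search. The `Document` fields, custodian names and dates are hypothetical, and the plain substring match stands in for whatever keyword or concept search a review platform actually provides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Document:
    doc_id: str
    custodian: str
    recipients: list[str]
    sent: date
    text: str


# Hypothetical collection with invented metadata.
COLLECTION = [
    Document("DOC-201", "jsmith", ["alee"], date(2009, 3, 2), "Revised widget pricing for Q2"),
    Document("DOC-202", "jsmith", ["bkim"], date(2011, 7, 15), "Widget pricing archive"),
    Document("DOC-203", "mjones", ["alee"], date(2009, 4, 9), "Widget pricing follow-up"),
    Document("DOC-204", "jsmith", ["alee"], date(2009, 5, 20), "Lunch order"),
]


def search(docs, keywords, start=None, end=None, custodians=None, recipient=None):
    """Keep documents containing any keyword that also pass every metadata filter given."""
    hits = []
    for doc in docs:
        text = doc.text.lower()
        # Keyword step: simple case-insensitive substring match.
        if not any(k.lower() in text for k in keywords):
            continue
        # Metadata steps: each filter applies only if the caller supplied it.
        if start is not None and doc.sent < start:
            continue
        if end is not None and doc.sent > end:
            continue
        if custodians is not None and doc.custodian not in custodians:
            continue
        if recipient is not None and recipient not in doc.recipients:
            continue
        hits.append(doc.doc_id)
    return hits


if __name__ == "__main__":
    print(search(COLLECTION, ["pricing"]))  # ['DOC-201', 'DOC-202', 'DOC-203']
    print(search(COLLECTION, ["pricing"], start=date(2009, 1, 1),
                 end=date(2009, 12, 31), custodians={"jsmith"}))  # ['DOC-201']
```

Only DOC-201 survives the combined query: DOC-202 falls outside the date range, DOC-203 belongs to a different custodian, and DOC-204 never mentions pricing.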
Grossman and Sweeney also mention the TREC Legal Track (TREC is the Text REtrieval Conference), which studies methods for more efficient eDiscovery searches and conducts workshops on information retrieval for eDiscovery.
Whatever search strategy lawyers decide on, they shouldn’t hesitate to call on outside expertise, the authors say. They add, “There is simply no substitute for careful planning, informed legal judgment, and appropriate quality control, especially when timelines and budgets are tight and stakes are high.”