Keyword Searching & Text Indexing

The document review phase of a case can seem like a near-impossible task. What looks like an endless pit of data is made even more complex by the sheer volume of words, phrases, characters, and other searchable content that could turn out to be evidence. Although it may seem complicated, the basis of search methodology is relatively straightforward once understood. Once common terms such as Boolean operators, wildcards, and full-text search are familiar, it becomes easier to gain a grasp of, and control over, your data.

Here are a few common terms, followed by an educational resource, to help build a base knowledge of the search and index terms used in the eDiscovery, digital forensics, and review hosting industries.

Boolean Searching – A method of searching that combines keywords with connecting words, known as Boolean operators, to build a query.

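Boolean Operators – The five common Boolean operators are: AND | OR | NOT | “Quotation Marks” | (Parentheses). AND, OR, and NOT combine or exclude terms; quotation marks match an exact phrase, and parentheses group terms to control the order of evaluation.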
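The operators above can be illustrated with a minimal Python sketch. It is not how any particular review platform evaluates queries; the function names, the sample documents, and the query (contract OR agreement) AND "wire transfer" NOT draft are all hypothetical.

```python
import re

def tokenize(text):
    # Lowercase the text and split it into alphanumeric words.
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(tokens, phrase):
    # True if the words of `phrase` appear consecutively in `tokens`.
    wanted = tokenize(phrase)
    n = len(wanted)
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

def matches(text, all_of=(), any_of=(), none_of=(), phrase=None):
    tokens = tokenize(text)
    words = set(tokens)
    if not all(term in words for term in all_of):             # AND
        return False
    if any_of and not any(term in words for term in any_of):  # OR
        return False
    if any(term in words for term in none_of):                # NOT
        return False
    if phrase is not None and not contains_phrase(tokens, phrase):  # "quoted phrase"
        return False
    return True

documents = {
    "DOC-001": "Signed contract attached; wire transfer scheduled for Friday.",
    "DOC-002": "Draft agreement, wire transfer details to follow.",
    "DOC-003": "Lunch on Friday?",
}

# (contract OR agreement) AND "wire transfer" NOT draft
hits = [
    doc_id
    for doc_id, text in documents.items()
    if matches(text, any_of=("contract", "agreement"),
               phrase="wire transfer", none_of=("draft",))
]
print(hits)  # ['DOC-001']
```

The keyword arguments stand in for the grouping that parentheses provide in a written query; a real search tool would parse the query string itself.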

Wildcard – Wildcards are used alongside Boolean operators and act as placeholders that can represent multiple values. A common wildcard in index search tools is the asterisk (*), which typically matches any run of letters or numbers. Wildcards can also expand a keyword to catch alternative spellings.
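As a rough illustration of wildcard expansion, the sketch below uses Python's standard fnmatch module, in which * matches any run of characters. Wildcard syntax differs between search tools, so treat this as an assumption rather than the behavior of any specific index; the helper name and sample text are made up for the example.

```python
import fnmatch
import re

def wildcard_hits(text, pattern):
    # Return the distinct words in `text` that match a wildcard pattern.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return sorted({t for t in tokens if fnmatch.fnmatchcase(t, pattern.lower())})

text = ("The organisation will negotiate; our organization "
        "negotiated the negotiation terms.")

# A trailing wildcard catches different endings of the same keyword.
endings = wildcard_hits(text, "negotiat*")
print(endings)    # ['negotiate', 'negotiated', 'negotiation']

# A wildcard in the middle catches alternative spellings.
spellings = wildcard_hits(text, "organi*ation")
print(spellings)  # ['organisation', 'organization']
```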

Stemming / Truncation – Expanding a search by gathering the variations of a keyword that derive from the same base root, or stem. This is commonly used within Boolean and other search methods to capture multiple variations of a word or phrase.
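To make the idea of a shared stem concrete, here is a deliberately naive suffix-stripping sketch in Python. It is not the stemming algorithm used by any real search engine (those are considerably more careful); the suffix list, the minimum stem length, and the vocabulary are assumptions chosen for the example.

```python
# Suffixes to strip, checked in this order (illustrative only).
SUFFIXES = ("ing", "ed", "es", "s")

def naive_stem(word, min_stem=4):
    # Strip the first matching suffix, keeping at least `min_stem` characters.
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word

def expand(keyword, vocabulary):
    # Return every word in `vocabulary` that shares a stem with `keyword`.
    target = naive_stem(keyword)
    return sorted(w for w in vocabulary if naive_stem(w) == target)

vocabulary = {"review", "reviews", "reviewed", "reviewing", "revenue", "preview"}

variants = expand("reviewing", vocabulary)
print(variants)  # ['review', 'reviewed', 'reviewing', 'reviews']
```

Note that "preview" and "revenue" are left out even though they look similar, because they do not reduce to the same stem.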

SQL Server – SQL Server is a relational database management system developed and maintained by Microsoft. (SQL itself, Structured Query Language, is the language used to query it.) It primarily serves as a back-end database that stores and retrieves data requested by outside applications. Within SQL Server is a database engine that connects commands and queries with stored database files, tables, pages, indexes, and more.

Full Text Search – Full-text search works by running full-text queries against character-based data, using text indexes to locate matches. A query can range from a single word to a combination of multiple forms of a phrase.

Text Index – Text indexes are the backbone of a search. The query uses the text index to locate matches in the larger database. Because the text index essentially acts as a key to the data, an outdated index can lead to missed hits.
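A text index is often built as an inverted index: a map from each word to the documents that contain it. The Python sketch below is a simplified assumption of that structure, not the internals of SQL Server or any review platform, and it also shows how a document added after the index was built is missed until the index is rebuilt. All names and sample documents are hypothetical.

```python
import re
from collections import defaultdict

def build_index(documents):
    # Map each lowercase word to the set of document IDs containing it.
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index[token].add(doc_id)
    return index

def search(index, term):
    # Look the term up in the index instead of scanning every document.
    return sorted(index.get(term.lower(), set()))

documents = {
    "DOC-001": "Invoice approved by finance",
    "DOC-002": "Finance flagged the invoice",
}
index = build_index(documents)

# A new document arrives after the index was built.
documents["DOC-003"] = "Revised invoice sent"

stale_hits = search(index, "invoice")
fresh_hits = search(build_index(documents), "invoice")
print(stale_hits)  # ['DOC-001', 'DOC-002']
print(fresh_hits)  # ['DOC-001', 'DOC-002', 'DOC-003']
```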

Saved Searches – The method of reusing effective, pre-mapped queries as a repeatable process across multiple efforts. Saving searches is a scalable way to apply a standard set of search queries across multiple custodians.
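One way to picture a saved search is as a named query that is stored once and run unchanged against each custodian's documents. The Python sketch below assumes a very simple query format (a list of terms that must all appear); the search names, custodians, and documents are invented for illustration.

```python
import re

# Saved searches: name -> terms that must ALL appear in a document.
SAVED_SEARCHES = {
    "payment terms": ["invoice", "payment"],
    "scheduling": ["meeting"],
}

def run_saved_search(name, custodian_docs):
    # Apply one saved search to every custodian; return hits per custodian.
    required = SAVED_SEARCHES[name]
    hits = {}
    for custodian, docs in custodian_docs.items():
        matched = []
        for doc_id, text in docs.items():
            words = set(re.findall(r"[a-z0-9]+", text.lower()))
            if all(term in words for term in required):
                matched.append(doc_id)
        hits[custodian] = sorted(matched)
    return hits

custodians = {
    "alice": {
        "A-1": "Invoice attached, payment due in 30 days",
        "A-2": "See you Monday",
    },
    "bob": {
        "B-1": "Payment received",
        "B-2": "Invoice and payment schedule",
    },
}

results = run_saved_search("payment terms", custodians)
print(results)  # {'alice': ['A-1'], 'bob': ['B-2']}
```

Because the query lives in one place, every custodian is searched with exactly the same terms, which is what makes the process repeatable.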

For a more in-depth look at how these terms are used in practice, check out the EDRM’s Search Methodologies, a great resource for going deeper into search terms and eDiscovery best practices.

