Improving Ephesoft Performance and Maintainability When Many Document Types Share the Same Fields

A common situation with Ephesoft projects is taking documents for an entire department of a business (HR, Financial Aid, Insurance Correspondence, etc.), classifying them, extracting identifying information, and then matching each document to a record in a database. With projects like these, all the document types typically have the same fields and use the same database for fuzzy and non-fuzzy lookups.

The drawbacks arise from duplicated configuration. The same Fuzzy DB, validation, and export configuration is repeated for every document type, which makes any configuration change a time-consuming annoyance. The Fuzzy DB indexes are also duplicated for each document type, which can take up serious drive space. Another drawback is that there is no way to tell Ephesoft that field X on document type A is the same as field X on document type B, so if a user changes the document type on the Validation screen, all field values are wiped out.

Luckily there is a way to work around these headaches to improve both the admin and operator experiences.

Create a Generic Document Type

Start by creating a generic document type that contains all the fields from the other document types as well as a Document Type field. The document type field will be a List type and contain all the document types in the batch class minus the generic document type (the list can be created manually or via script).
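If you build the list by script rather than by hand, the core of it is just filtering the generic type out of the batch class's document type names. The following is a minimal sketch in plain Java; the generic type name "Generic", the semicolon separator, and the class and method names are all assumptions made for illustration, and how you obtain the type names or store the result depends on your Ephesoft setup.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DocTypeListBuilder {
    // Hypothetical name of the generic document type; use your own.
    static final String GENERIC_TYPE = "Generic";

    /**
     * Builds the option string for the Document Type list field:
     * every document type except the generic one, de-duplicated and sorted.
     * The separator is an assumption; use whatever your list field expects.
     */
    static String buildListValues(List<String> allTypes, String separator) {
        return allTypes.stream()
                .filter(type -> !GENERIC_TYPE.equals(type))
                .distinct()
                .sorted()
                .collect(Collectors.joining(separator));
    }

    public static void main(String[] args) {
        List<String> types = List.of("W-4", "Generic", "I-9", "Offer Letter");
        // prints "I-9;Offer Letter;W-4"
        System.out.println(buildListValues(types, ";"));
    }
}
```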

Inside the generic document type, define your Fuzzy DB, Format Conversion, validation, export, and any other shared configuration that applies to all document types. Going forward you will only have to edit the configuration in this document type instead of in every document type in the Batch Class. If not all of your document types use the same data for the database lookup, create a generic document type for each database connection.

Once the generic document type is configured, you can delete the Fuzzy DB configuration in all the other document types so that you aren't wasting resources rebuilding the same index for every document type. Other configuration, such as Validation or Format Conversion, can either be left as-is (it simply won't be used) or removed to prevent confusion in the future.

Add Code to a Script

You'll then want to add code to a script (usually ScriptExtraction) that, for each document in the batch, changes the document type to the generic document type and sets the Document Type field to the original document type (or its description). The position of this script in the workflow is important: it must run after KV Extraction and any other plugins that rely on configuration unique to each document type, but before any plugins configured in the generic document type. Typically this means running it after KV Extraction, so that the Format Conversion, Fuzzy DB, and Automated Regex Validation plugins use the configuration inside the generic document type.
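The swap itself can be sketched as follows. This is not a drop-in Ephesoft script: it uses the JDK's built-in DOM API so it runs on its own, and the logic would need to be adapted to the script interface and XML library your Ephesoft version provides. The element names (Document, Type, DocumentLevelFields, DocumentLevelField, Name, Value) are assumptions about the batch XML layout, and "Generic" and "Document Type" are hypothetical names for the generic type and its list field.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class GenericTypeSwitcher {
    static final String GENERIC_TYPE = "Generic";          // hypothetical name
    static final String DOC_TYPE_FIELD = "Document Type";  // hypothetical name

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first direct child element with the given name, or null. */
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    /** Sets the named document-level field, creating it if it is missing. */
    static void setField(Document batch, Element doc, String name, String value) {
        Element fields = firstChild(doc, "DocumentLevelFields");
        if (fields == null) {
            fields = batch.createElement("DocumentLevelFields");
            doc.appendChild(fields);
        }
        Element target = null;
        for (Node n = fields.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element nameEl = firstChild((Element) n, "Name");
            if (nameEl != null && name.equals(nameEl.getTextContent())) {
                target = (Element) n;
                break;
            }
        }
        if (target == null) {
            target = batch.createElement("DocumentLevelField");
            Element nameEl = batch.createElement("Name");
            nameEl.setTextContent(name);
            target.appendChild(nameEl);
            fields.appendChild(target);
        }
        Element valueEl = firstChild(target, "Value");
        if (valueEl == null) {
            valueEl = batch.createElement("Value");
            target.appendChild(valueEl);
        }
        valueEl.setTextContent(value);
    }

    /** Retypes every document to the generic type, remembering the original type in a field. */
    static void switchToGeneric(Document batch) {
        NodeList docs = batch.getElementsByTagName("Document");
        for (int i = 0; i < docs.getLength(); i++) {
            Element doc = (Element) docs.item(i);
            Element type = firstChild(doc, "Type");
            if (type == null) continue;
            String original = type.getTextContent().trim();
            if (GENERIC_TYPE.equals(original)) continue; // already switched
            setField(batch, doc, DOC_TYPE_FIELD, original);
            type.setTextContent(GENERIC_TYPE);
        }
    }

    public static void main(String[] args) throws Exception {
        Document batch = parse("<Batch><Documents><Document><Identifier>DOC1</Identifier>"
                + "<Type>W-4</Type><DocumentLevelFields/></Document></Documents></Batch>");
        switchToGeneric(batch);
        System.out.println(batch.getElementsByTagName("Type").item(0).getTextContent());
    }
}
```

Skipping documents that are already generic makes the method safe to run twice, which helps if a batch is restarted at the extraction step.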

Depending on the type of export you are doing, you may also need to update the ScriptExport to set the document type back in the XML. Otherwise, if you are doing something custom, you can just map the document type field's value to the document type in the customer's CMS.
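Setting the document type back at export time is the reverse of the swap done at extraction. The sketch below uses the JDK's DOM and XPath APIs so it runs standalone; it is an illustration under assumptions, not the ScriptExport implementation. The XML paths (/Batch/Documents/Document, Type, DocumentLevelFields/DocumentLevelField with Name and Value children) and the names "Generic" and "Document Type" are assumed, so check them against your own batch XML.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DocTypeRestorer {
    static final String GENERIC_TYPE = "Generic";          // hypothetical name
    static final String DOC_TYPE_FIELD = "Document Type";  // hypothetical name

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Copies the Document Type field value back into Type for generic documents. */
    static void restoreOriginalTypes(Document batch) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList docs = (NodeList) xpath.evaluate(
                "/Batch/Documents/Document", batch, XPathConstants.NODESET);
        String fieldPath = "DocumentLevelFields/DocumentLevelField[Name='"
                + DOC_TYPE_FIELD + "']/Value";
        for (int i = 0; i < docs.getLength(); i++) {
            Node doc = docs.item(i);
            Node type = (Node) xpath.evaluate("Type", doc, XPathConstants.NODE);
            if (type == null || !GENERIC_TYPE.equals(type.getTextContent().trim())) {
                continue; // only generic documents need restoring
            }
            String original = xpath.evaluate(fieldPath, doc).trim();
            if (original.isEmpty()) {
                continue; // nothing to restore; leave the generic type in place
            }
            type.setTextContent(original);
        }
    }

    public static void main(String[] args) throws Exception {
        Document batch = parse("<Batch><Documents><Document><Type>Generic</Type>"
                + "<DocumentLevelFields><DocumentLevelField><Name>Document Type</Name>"
                + "<Value>I-9</Value></DocumentLevelField></DocumentLevelFields>"
                + "</Document></Documents></Batch>");
        restoreOriginalTypes(batch);
        System.out.println(batch.getElementsByTagName("Type").item(0).getTextContent());
    }
}
```

Because the value is read from the field rather than from what classification originally assigned, any correction an operator makes on the Validation screen carries through to the export.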

Train the Users

It can be a little confusing for users that they use the normal document type drop-down to change document types on the Review screen, but a different document type drop-down on the Validation screen. Make sure your users understand that changing the default document type drop-down on the Validation screen will wipe out the field data, whereas the custom Document Type field lets them change the document type from the Validation screen without losing any data.

Jordan Hotmann, Developer