Intelligent Data Extraction (OCR)

Convert unstructured text into extractable and searchable data in an instant

In today’s world, where instant data access, business intelligence, security, and efficiency are critical to success, many companies are realizing that valuable data is trapped in their documents. These documents may be paper, email or standard electronic office files. The data they contain must be manually read, tracked, routed, processed and reported on. In fact, more than 80% of information is trapped in unstructured content, which means that less than 20% of data is structured and can be easily searched and retrieved from relational databases.

If your company faces any of the following hurdles due to heavy document volumes, our Smart Capture solution can help

01/

Slow, manual processes

02/

High costs of ink and paper for printing

03/

Need for storage space for physical files

04/

Time-consuming efforts to manually sort and file documents

05/

Manual use of barcodes, document preparation, and other labor-intensive processes

06/

Scanned files that can't be searched for specific data

07/

Invoice data that can't be easily matched to POs or receiving documents

08/

Difficulty predicting trends or analyzing customer data

09/

Risk of human error

Document capture technology is not new, but the industry has advanced with innovative tools and functionality that allow businesses to do much more than simply scan documents. Automation now enables businesses to classify, learn from and extract meaning from their documents. Through automation, we can leverage and organize all data, both structured and unstructured.

The bottom line is that smart document capture technology is important not only for gaining efficiencies and reducing operating costs, but also because classification and data extraction can lead to better business processes.

Smart Capture Workflow Overview

1. Ingestion

There are multiple ways to capture data: scanners, multi-function peripherals (MFPs), UNC folders (network folders), fax, email, content services or document repositories, mobile devices, or an outsourced business process outsourcing (BPO) provider.

2. Image Processing

Documents and images are normalized, cleaned up and rotated in preparation for classification. The system applies despeckle and deskew filters to improve image quality. The resulting document can then be identified, and the data can be easily extracted.
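As a rough illustration of what a despeckle filter does, the sketch below removes isolated ink pixels from a tiny black-and-white image. It is a minimal example under stated assumptions, not the filter used by Smart Capture: the function name `despeckle`, the list-of-lists image format and the "no inked neighbours" rule are all hypothetical choices made for this illustration.

```python
def despeckle(image):
    """Remove isolated ink pixels from a binary image (1 = ink, 0 = background).

    Assumption for this sketch: a pixel counts as a speck when none of its
    up-to-8 surrounding pixels is inked.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]  # copy so the input is left untouched
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count inked pixels in the 3x3 window around (y, x), excluding itself.
            neighbours = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if neighbours == 0:
                cleaned[y][x] = 0
    return cleaned


page = [
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],  # a lone speck
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],  # part of a character stroke
]
cleaned_page = despeckle(page)
```

In this example the lone pixel in the second row is cleared, while the 2x2 block that stands in for a character stroke is kept.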

3. Classification

This is where the system determines what type of document it has ingested, using Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and/or Optical Mark Recognition (OMR). This step determines whether a document is, for example, an invoice, patient record, loan file, or tax record. An advanced document capture system needs only one or two samples to “learn” to classify documents; Shamrock Solutions accomplishes this via patented, supervised machine learning algorithms. The system uses a variety of technologies to classify the data: search content, images, bar codes and one document merging. If the system has low confidence in any document it attempts to classify, the process can call upon a human operator for confirmation.
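The hand-off to a human operator can be pictured with a small sketch. It assumes the OCR text is already available and uses a simple keyword score as a stand-in for the classifier; it does not represent the supervised machine learning described above. The names `KEYWORDS` and `classify`, the keyword sets and the 0.5 confidence cut-off are all hypothetical.

```python
# Hypothetical keyword sets per document type (illustration only).
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "remit"},
    "tax record": {"tax", "taxable", "filing", "deduction"},
}


def classify(text, min_confidence=0.5):
    """Return (label, confidence, needs_review) for already-recognized text."""
    words = set(text.lower().split())
    # Score each type by the share of its keywords found in the text.
    scores = {
        label: len(words & keywords) / len(keywords)
        for label, keywords in KEYWORDS.items()
    }
    label = max(scores, key=scores.get)
    confidence = scores[label]
    # Low-confidence results are flagged for a human operator to confirm.
    return label, confidence, confidence < min_confidence


confident = classify("Invoice 1042 amount due 30 days")
uncertain = classify("meeting notes about tax")
```

Here the first text matches three of the four invoice keywords and is accepted, while the second matches only one tax keyword and is flagged for confirmation.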

4. Extraction

This is the process of identifying metadata within the documents. Metadata is a set of data that describes and gives information about other data. In the case of documents, metadata can be used to organize, find and/or feed documents into another type of business system. The system is configured to extract the data a company needs, based on business rules, using database lookups and fuzzy logic.
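To make the combination of database lookups and fuzzy logic concrete, the sketch below matches a vendor name that OCR has misread against a list of known vendors. It is an illustration under stated assumptions: the vendor list stands in for a real database table, `match_vendor` is a hypothetical helper, and the 0.8 similarity cut-off is an arbitrary choice. The matching itself uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib

# Stand-in for a vendor table in a line-of-business database (hypothetical data).
KNOWN_VENDORS = [
    "Acme Industrial Supply",
    "Northwind Traders",
    "Globex Corporation",
]


def match_vendor(ocr_text, vendors=KNOWN_VENDORS, cutoff=0.8):
    """Return the closest known vendor name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(ocr_text, vendors, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# OCR commonly confuses "I" with "l" and "l" with "1".
vendor = match_vendor("Acme lndustrial Supp1y")
unknown = match_vendor("Zzzz Qqqq")
```

The misread name still resolves to "Acme Industrial Supply", while text that resembles no known vendor returns None and can be left for manual entry.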

5. Validation

If there are any documents that fall below pre-set tolerance levels, they are highlighted for human review. For example, this can happen when there are smudges, spills, blurry characters or possibly missing fields. The system alerts you to these documents for manual verification and correction.
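A tolerance check of this kind can be sketched in a few lines. The example assumes each extracted field carries a confidence score between 0 and 1; the function name `fields_needing_review`, the field names and the 0.90 tolerance are hypothetical values chosen for illustration.

```python
def fields_needing_review(fields, tolerance=0.90):
    """Return the names of fields that are missing or fall below the tolerance.

    `fields` maps a field name to a (value, confidence) pair.
    """
    flagged = []
    for name, (value, confidence) in fields.items():
        if not value or confidence < tolerance:
            flagged.append(name)
    return flagged


extracted = {
    "invoice_number": ("INV-1042", 0.99),
    "total": ("1,2?0.00", 0.62),  # smudged digit lowered the confidence
    "po_number": ("", 0.0),       # field missing from the document
}
to_review = fields_needing_review(extracted)
```

Only the smudged total and the missing PO number are routed to a person; the invoice number passes through untouched.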

6. Export & Deliver

Once all documents have been validated, the documents and data are moved to a repository or other line-of-business system. The exported documents and data can be stored on a local server or in cloud-based storage, such as Alfresco, Box or SAP.

Shamrock Solutions Whitepapers

Intelligent Data Extraction (OCR) Solution Overview

Shamrock Smart Capture

eBook

Mailroom Automation

Solution Guide

Multitenant Architecture

Best Practices

Accounting Solutions

Guide

Let's talk about your data extraction needs.