Intelligent Data Extraction (OCR)
Convert unstructured text into extractable and searchable data in an instant
In today’s world, where instant data access, business intelligence, security, and efficiency are critical to success, many companies are realizing that valuable data is trapped in their documents. These documents could be paper, email or standard electronic office documents. The data they contain must be manually read, tracked, routed, processed and reported upon. In fact, more than 80% of information is trapped in unstructured content. This means that less than 20% of data is structured and can be easily searched and retrieved from relational databases.
If your company faces any of the following hurdles due to heavy document volumes, our Smart Capture solution can help
Slow, manual processes
High costs of ink and paper for printing
Need for physical storage space for physical files
Time-consuming efforts to manually sort and file documents
Manual use of barcodes, document preparation, and other labor-intensive processes
Files that can be scanned but not searched for specific data
Invoice data that can't be easily matched to POs or receiving documents
Difficulty predicting trends or analyzing customer data
Risk of human error
Document capture technology is not new, but the industry has advanced with innovative tools and functionality that allow businesses to do much more than simply scan documents. Now, automation enables businesses to classify, learn and extract meaning from their documents. Through automation, we can leverage and organize all data, both structured and unstructured.
The bottom line is that smart document capture technology does more than improve efficiency and reduce operating costs. Through classification and data extraction, it can also lead to better business processes.
Smart Capture Workflow Overview
1. Capture
There are multiple ways to capture data: scanners, multi-function peripherals (MFPs), UNC folders (network folders), fax, email, content services or document repositories, mobile devices, or an outsourced business process organization (BPO).
2. Image Processing
Documents and images are normalized, cleaned up and rotated in preparation for classification. The system applies despeckle and deskew filters to improve image quality. The resulting document can then be identified, and the data can be easily extracted.
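As a rough illustration of the despeckle step described above, here is a minimal Python sketch. It is not the product's actual filter. It assumes a binary image held as a list of rows (1 = ink, 0 = background) and treats any dark pixel with no dark neighbours as a speck. The function name despeckle is hypothetical.

```python
def despeckle(image):
    """Remove isolated dark pixels (specks) from a binary image.

    `image` is a list of rows; 1 = ink, 0 = background (an assumed format).
    A dark pixel is kept only if at least one of its 8 neighbours is dark.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # copy so the input is not modified
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            has_dark_neighbour = any(
                image[ny][nx] == 1
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_dark_neighbour:
                cleaned[y][x] = 0  # isolated pixel: treat as scanner noise
    return cleaned


page = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],  # a lone speck
    [0, 0, 0, 0],
    [0, 0, 1, 1],  # part of a real stroke
]
cleaned_page = despeckle(page)
```

Production systems typically work on grayscale scans and combine filters like this with deskewing, which estimates the page's rotation angle and corrects it.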
3. Classification
This is where the system determines what type of document it has ingested, using Optical Character Recognition (OCR), Intelligent Character Recognition (ICR) and/or Optical Mark Recognition (OMR). This step determines whether a document is, for example, an invoice, patient record, loan file, or tax record. An advanced document capture system needs only one or two samples to “learn” to classify documents; Shamrock Solutions accomplishes this via patented, supervised machine learning algorithms. The system uses a variety of technologies to classify the data: search content, images, bar codes and one document merging. If the system has low confidence in any document it attempts to classify, the process can call upon a human operator for confirmation.
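The confidence check in this step can be pictured with a small Python sketch. It is only an illustration and does not represent the patented machine learning approach mentioned above. A simple keyword-overlap score stands in for the classifier, and the keyword sets, the classify function and the 0.5 threshold are all assumptions made for the example.

```python
# Hypothetical keyword sets; a real system would learn these from samples.
KEYWORDS = {
    "invoice": {"invoice", "amount", "due", "total"},
    "tax record": {"tax", "return", "deduction", "filing"},
}


def classify(text, threshold=0.5):
    """Return (document_type, confidence, needs_review) for OCR'd text.

    Confidence is the share of a type's keywords found in the text.
    Anything scoring below `threshold` is routed to a human operator.
    """
    words = set(text.lower().split())
    scores = {
        doc_type: len(words & keywords) / len(keywords)
        for doc_type, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    confidence = scores[best]
    return best, confidence, confidence < threshold


confident = classify("Invoice total amount due 100")
uncertain = classify("tax memo")
```

Here the first document matches every invoice keyword and is classified automatically, while the second matches only one tax keyword and is flagged for confirmation.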
4. Extraction
This is the process of identifying metadata within the documents. Metadata is data that describes and gives information about other data. In the case of documents, metadata can be used to organize, find and/or feed documents into another type of business system. The system is set up to extract the data a company needs, based on its business rules, using database lookups and fuzzy logic.
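To show how a database lookup and fuzzy logic can work together, the following Python sketch matches an OCR'd vendor name against a list of known vendors using the standard library's difflib module. The vendor names, the match_vendor function and the 0.8 cutoff are invented for illustration; an actual deployment would query its own vendor database and apply its own matching rules.

```python
import difflib

# Hypothetical vendor master list, standing in for a database table.
KNOWN_VENDORS = ["Acme Supply Co", "Northwind Traders", "Globex Corporation"]


def match_vendor(ocr_text, vendors=KNOWN_VENDORS, cutoff=0.8):
    """Return the closest known vendor name, or None if nothing is close.

    `cutoff` is the minimum similarity ratio (0 to 1) accepted as a match.
    """
    matches = difflib.get_close_matches(ocr_text, vendors, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# OCR misread the letter "l" as the digit "1".
corrected = match_vendor("Acme Supp1y Co")
unknown = match_vendor("Zzzz")
```

The misread name still resolves to the correct vendor record, while text that resembles no known vendor returns None and can be left for manual entry.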
5. Validation
Any documents that fall below pre-set tolerance levels are highlighted for human review. This can happen, for example, when there are smudges, spills, blurry characters or missing fields. The system alerts you to these documents for manual verification and correction.
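A tolerance check of this kind might look like the Python sketch below. It assumes each extracted field arrives as a (value, confidence) pair and that required fields are known in advance. The field names, the fields_needing_review function and the 0.9 tolerance are hypothetical.

```python
def fields_needing_review(fields, required, tolerance=0.9):
    """List (field_name, reason) pairs that a person should verify.

    `fields` maps a field name to a (value, confidence) pair.
    A field is flagged if it is required but absent, or if its
    confidence falls below `tolerance`.
    """
    flagged = []
    for name in required:
        if name not in fields:
            flagged.append((name, "missing"))
    for name, (_value, confidence) in fields.items():
        if confidence < tolerance:
            flagged.append((name, "low confidence"))
    return flagged


extracted = {
    "invoice_number": ("INV-001", 0.98),
    "total": ("1O0.00", 0.62),  # smudged: letter "O" read in place of a zero
}
review_queue = fields_needing_review(
    extracted, required=["invoice_number", "total", "po_number"]
)
```

In this example the PO number is missing and the total was read with low confidence, so both are queued for manual verification while the invoice number passes through.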
6. Export & Deliver
Once all documents have been validated, the documents and data are moved to a repository or other line-of-business system. The exported documents and data can be stored on a local server or in cloud-based storage such as Alfresco, Box or SAP.