See How Cognitive Capture is Disrupting Legacy Document Processing
When you think about artificial intelligence and data integration, what's the first thing that comes to mind?
Maybe deep learning neural nets crunching away at big data sets?
Or aggregating data from dozens or hundreds of repositories and streaming it into analytics or business intelligence platforms? Or maybe predictive healthcare and extending human life?
I'm willing to bet you didn't think about document data integration. It's a challenge that the enterprise has always struggled with. Until now, solving the problem of integrating unstructured data from documents has been a very painful experience.
But that has changed. Intelligent document processing platforms now offer a streamlined approach that produces real results. The process isn't easy or for the faint of heart, but the technology has finally arrived.
But why now? What's the difference between long-standing capture tools and this new breed of cognitive document processing?
5 Core Innovations of Cognitive Document Processing That are Disrupting Work:
- New computer vision techniques help machines more than ever
- Visualized machine learning simplifies things
- Several enhancements maximizing OCR
- Fuzzy regular expression = easier data discovery
- Classification engines help machines understand data in new ways
For the sake of this article, I'm going to call a few different categories of things documents:
- Microfilm / microfiche
- Scanned paper of any kind
- Electronic documents (word processor, email, PDF, etc.)
- Text files
Many industries rely on documents for important workflows:
- In healthcare, large text files communicate explanations of benefits, and email contains retrospective denials. Then there are claims, and the list goes on and on.
- In manufacturing and supply chain, there's an endless number of logistics documents.
- Legal runs on paper, and so does the oilfield.
- Education processes transcripts at a dizzying rate.
- Financial institutions process everything from mortgage and loan documents to personal checks.
- And you know that Government loves its forms...
Organizations in nearly every industry are saturated with paper and store millions of archived records, and all of that data is a gold mine.
Tech Finally Catches Up to Paper
But why the renewed focus on documents? The reason is that technology has finally caught up. For decades, data contained on paper has been extremely difficult to integrate.
While it's true that tools have existed for setting up rigid templates that "know" where certain data is on a document, their use is extremely limited.
In the real world, these templates have caused a lot of suffering because of how fragile they are. If a word or number is just outside of where the template is looking, another template must be created to find it. This is hardly scalable.
Cognitive Document Processing: The 5 Critical Features
1. New Computer Vision Techniques Help Machines More than Ever
Computer vision (CV) is the technology responsible for making scanned documents machine-readable. A human easily reads past the non-text artifacts on a document, but those same artifacts cause many problems for machines.
Humans understand that a hole punch is not a word, and stamps, lines, barcodes, and images are all just there to support the intent of the document.
But these non-text elements cause big problems for optical character recognition.
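To make that kind of cleanup concrete, here is a minimal sketch that erases connected blobs of ink too large to be a character (a hole punch, for example) from a tiny black-and-white image. The grid-of-0s-and-1s representation, the function name `remove_large_blobs`, and the `max_pixels` threshold are all assumptions chosen for illustration. Real platforms use far more sophisticated techniques.

```python
from collections import deque

def remove_large_blobs(image, max_pixels):
    """Return a copy of a binary image (lists of 0/1) with every
    4-connected blob of ink larger than max_pixels erased."""
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if image[y][x] == 0 or seen[y][x]:
                continue
            # Flood-fill to collect one connected blob of ink.
            blob, queue = [], deque([(y, x)])
            seen[y][x] = True
            while queue:
                cy, cx = queue.popleft()
                blob.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Assumption: anything bigger than max_pixels is not text.
            if len(blob) > max_pixels:
                for by, bx in blob:
                    cleaned[by][bx] = 0
    return cleaned

# A 3x3 "hole punch" on the left and a small 2-pixel stroke on the right.
page = [
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 1, 1, 0, 0, 0],
]
cleaned = remove_large_blobs(page, max_pixels=4)
```

In this toy page, the large blob is erased and the small stroke survives, so an OCR engine would only see the part that could plausibly be text.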
Improved OCR Accuracy and Handwriting Recognition
OCR is only as good as the document image it runs on. Modern analytics and business intelligence platforms (and neural nets) all require very accurate (and labeled) data. Traditional OCR's low accuracy doesn't produce acceptable data. This is one of the reasons quality cognitive document processing has been difficult to achieve.
New CV algorithms paired with advanced hardware acceleration enable near-100% OCR accuracy using both new and traditional OCR engines.
And handwriting? New advances in computer vision now enable robust handwriting recognition that streams even more information from documents.
2. Visualized Machine Learning Simplifies Things
A new approach to machine learning and classification sheds light on the complicated algorithms doing the heavy lifting. Solutions using this tech provide a user interface that reveals trained data in a way that is easy to understand. This visualization framework makes otherwise hidden algorithms understandable to the people who work with them.
The design philosophy behind this approach is that subject matter experts understand their data better than anyone else. As a result, showing them how the A.I. is operating is both easier and achieves better results than a "dark" machine learning model.
This kind of transparency is based on the belief that a subject matter expert will always be able to make better decisions on data than "hidden" A.I.
3. Several Enhancements Maximizing OCR
As previously mentioned, traditional OCR engines need help for maximum performance. Several key OCR innovations are at the core of intelligent document processing platforms:
Iterative OCR is a technique that captures text missed by an OCR engine after processing text on a document. As the name suggests, OCR is run multiple times.
The key innovation is that accurately recognized text is automatically removed from the document image before performing additional OCR passes. Fewer distractions make processing the remaining text easier.
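The loop behind iterative OCR can be sketched in a few lines. This is only an illustration under stated assumptions: the "image" is a dictionary of regions rather than pixels, `toy_engine` is a hypothetical stand-in for a real OCR engine (it simply pretends that clutter lowers confidence), and the names `iterative_ocr`, `min_confidence`, and `max_passes` are mine.

```python
def iterative_ocr(image, ocr_engine, min_confidence=0.9, max_passes=3):
    """Run OCR repeatedly, removing confidently recognized regions
    from the image before each new pass.

    image: dict of region id -> region content (stand-in for pixels).
    ocr_engine: hypothetical callable(image) returning a list of
                (region_id, text, confidence) tuples.
    """
    remaining = dict(image)
    accepted = {}
    for _ in range(max_passes):
        if not remaining:
            break
        results = ocr_engine(remaining)
        confident = [(rid, text) for rid, text, conf in results
                     if conf >= min_confidence]
        if not confident:
            break  # another pass would see the same image, so stop
        for rid, text in confident:
            accepted[rid] = text
            del remaining[rid]  # "erase" recognized text from the image
    return accepted, remaining

def toy_engine(regions):
    # Assumption for the demo: every extra region on the page costs
    # each reading 0.03 confidence.
    penalty = 0.03 * (len(regions) - 1)
    return [(rid, text, base - penalty)
            for rid, (text, base) in regions.items()]

scan = {
    "title": ("INVOICE", 0.99),
    "total": ("$120.00", 0.95),
    "note": ("paid", 0.92),
}
accepted, remaining = iterative_ocr(scan, toy_engine)
one_pass, _ = iterative_ocr(scan, toy_engine, max_passes=1)
```

With the toy engine, a single pass accepts only the title. Three passes pick up all three regions, because each pass has less clutter to deal with.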
Cellular validation is a technique designed to deal with the challenges caused by text split into columns, arranged in offset patterns, or by differing font types and sizes.
The key innovation is that the document image is split into appropriate grids to allow the OCR engine to process each section independently.
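Here is a minimal sketch of the splitting step, assuming a uniform grid (a real system would choose the grid to fit the layout). The page is modeled as rows of characters instead of pixels, and `split_into_cells`, `ocr_by_cell`, and the lambda standing in for an OCR engine are hypothetical names for illustration.

```python
def split_into_cells(image, rows, cols):
    """Split a 2-D image (a list of equal-length rows) into
    rows x cols tiles, returned as ((row, col), tile) pairs."""
    height, width = len(image), len(image[0])
    cells = []
    for r in range(rows):
        top, bottom = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            tile = [line[left:right] for line in image[top:bottom]]
            cells.append(((r, c), tile))
    return cells

def ocr_by_cell(image, rows, cols, ocr_engine):
    """Run the (hypothetical) OCR engine on each tile independently."""
    return {pos: ocr_engine(tile)
            for pos, tile in split_into_cells(image, rows, cols)}

# Two columns of text side by side; characters stand in for pixels.
page = ["abcdWXYZ",
        "efghSTUV"]
by_cell = ocr_by_cell(page, rows=1, cols=2,
                      ocr_engine=lambda tile: " ".join(tile))
```

Each column is read on its own, so the left column comes back as one unit and the right column as another instead of being interleaved line by line.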
Bound Region Detection
Bound region detection enables OCR to focus on just the text within "boxes." Because traditional OCR engines read a document from top to bottom, and left to right, the text in tables is recognized, but out of sequence.
This innovation provides technology with a deep understanding of document structure, and how text inside a box is related to "normal" text on the page.
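To show why knowing the boxes matters, the sketch below groups recognized words by the box that contains them and restores reading order inside each box. It assumes the box coordinates have already been detected, which is the hard part and is not shown. The function name `group_by_box` and the sample data are illustrative only.

```python
def group_by_box(words, boxes):
    """Group words by containing box and sort each group into
    reading order (top to bottom, then left to right).

    words: list of (text, x, y) positions.
    boxes: dict of name -> (left, top, right, bottom), assumed to
           come from an earlier detection step.
    Words outside every box are grouped under the key None.
    """
    grouped = {name: [] for name in boxes}
    grouped[None] = []
    for text, x, y in words:
        owner = None
        for name, (left, top, right, bottom) in boxes.items():
            if left <= x < right and top <= y < bottom:
                owner = name
                break
        grouped[owner].append((y, x, text))
    return {name: [text for _, _, text in sorted(items)]
            for name, items in grouped.items()}

# A plain top-to-bottom, left-to-right sweep reads these as
# "Acme Zenith Corp Ltd Thanks", mixing the two boxes together.
words = [("Acme", 10, 10), ("Zenith", 110, 10),
         ("Corp", 10, 30), ("Ltd", 110, 30),
         ("Thanks", 10, 70)]
boxes = {"bill_to": (0, 0, 100, 50), "ship_to": (100, 0, 200, 50)}
regions = group_by_box(words, boxes)
```

After grouping, each box holds its own text in sequence, and the closing line stays with the "normal" text on the page.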
Layered OCR is a technique designed to process documents with multiple font types, including handwriting. Some types of documents, like checks, have been difficult to process.
Layered OCR is an innovative approach because it is designed to run multiple, specific OCR engines until the desired accuracy is achieved.
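In code, the idea looks roughly like the sketch below. The engines here are hypothetical stand-ins that return canned results, and `layered_ocr`, `target_confidence`, and `make_engine` are names I made up for the example.

```python
def layered_ocr(image, engines, target_confidence=0.9):
    """Try each engine in order and stop once one is confident enough.

    engines: ordered list of (name, callable(image) -> (text, confidence)).
    Returns (text, confidence, engine_name) for the best reading seen.
    """
    best = ("", 0.0, None)
    for name, engine in engines:
        text, confidence = engine(image)
        if confidence > best[1]:
            best = (text, confidence, name)
        if confidence >= target_confidence:
            break  # desired accuracy reached, skip remaining engines
    return best

calls = []  # records which engines actually ran

def make_engine(name, text, confidence):
    # Hypothetical engine that always returns the same reading.
    def engine(image):
        calls.append(name)
        return text, confidence
    return engine

engines = [
    ("printed", make_engine("printed", "1O0.0O", 0.55)),
    ("handwriting", make_engine("handwriting", "100.00", 0.93)),
    ("micr", make_engine("micr", "100.00", 0.99)),
]
result = layered_ocr("check-amount-field", engines)
```

In this run the printed-text engine struggles with a handwritten amount, the handwriting engine clears the threshold, and the third engine never has to run.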
What happens with OCR results that are less than ideal? OCR synthesis is an innovation that reprocesses OCR results that have a low accuracy confidence score.
Because a confidence rating is assigned to each individual character, groups of characters with low confidence are automatically identified and OCR'd again.
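A small sketch of that idea follows. It finds runs of low-confidence characters and asks for a second reading of just those runs. The `reocr` callable is a hypothetical stand-in for a second OCR pass, and the 0.8 threshold and function names are assumptions for illustration.

```python
def low_confidence_runs(chars, threshold=0.8):
    """chars: list of (character, confidence).
    Returns (start, end) index ranges, end exclusive, covering each
    consecutive run of characters below the threshold."""
    runs, start = [], None
    for i, (_, conf) in enumerate(chars):
        if conf < threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(chars)))
    return runs

def resynthesize(chars, reocr, threshold=0.8):
    """Re-read each low-confidence run and keep the new reading only
    if its average confidence is higher than the old one.

    reocr: hypothetical callable(start, end) -> list of (char, confidence).
    """
    result = list(chars)
    # Work right to left so a replacement of a different length
    # cannot shift the indices of runs still to be processed.
    for start, end in reversed(low_confidence_runs(chars, threshold)):
        replacement = reocr(start, end)
        if not replacement:
            continue
        old_avg = sum(c for _, c in chars[start:end]) / (end - start)
        new_avg = sum(c for _, c in replacement) / len(replacement)
        if new_avg > old_avg:
            result[start:end] = replacement
    return result

first_pass = [("T", 0.99), ("0", 0.50), ("t", 0.60), ("a", 0.95), ("l", 0.97)]
runs = low_confidence_runs(first_pass)
fixed = resynthesize(first_pass, lambda s, e: [("o", 0.90), ("t", 0.92)])
fixed_text = "".join(ch for ch, _ in fixed)
```

Only the doubtful middle of the word is re-read, so the characters the engine was already sure about are left alone.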
4. Fuzzy Regular Expression = Easier Data Discovery
Regular expressions (RegEx) date back to the 1950s and have been used to process text since the 1960s.
Modern data science tools have enabled a new kind of RegEx that allows for less literal character matches. In fact, Fuzzy RegEx enables true machine reading by providing a more organic understanding of text.
The way this innovation works is by "fuzzy matching" results to lexicons and external data sources by using weighted accuracy thresholds. Machines now return results that are "close to" what a user is searching for and that is extremely valuable in discovering data.
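As a rough illustration of matching against a lexicon with a threshold, the sketch below uses the similarity ratio from Python's standard `difflib` module. This is a stand-in for true fuzzy regular expressions, not the same technique, and it uses one plain cutoff rather than weighted thresholds. The function name `fuzzy_lookup` and the sample lexicon are assumptions.

```python
from difflib import SequenceMatcher

def fuzzy_lookup(token, lexicon, threshold=0.8):
    """Return (best_match, score) when the closest lexicon entry is
    at least `threshold` similar to the token, else (None, score)."""
    best, best_score = None, 0.0
    for entry in lexicon:
        # ratio() is 1.0 for identical strings and 0.0 for no overlap.
        score = SequenceMatcher(None, token.lower(), entry.lower()).ratio()
        if score > best_score:
            best, best_score = entry, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score

lexicon = ["Invoice", "Purchase Order", "Remittance"]

# OCR often confuses "I" and "l"; a literal match would miss this.
match, score = fuzzy_lookup("lnvoice", lexicon)
no_match, _ = fuzzy_lookup("Receipt", lexicon)
```

Here "lnvoice" is close enough to "Invoice" to count as a hit, while "Receipt" is not close to anything in the lexicon and is rejected. Raising or lowering `threshold` trades missed matches against false ones.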
5. Classification Engines Help Machines Understand in New Ways
Automating document classification is a critical step for accurate data integration.
In many real-world scenarios, documents are not always stored in the proper sequence, or manually separated by type. Humans have no problem looking at a document and understanding the context of the information.
But if we expect a machine to read and integrate data from documents, creating an understanding of the intent of the document is necessary.
Classification engines use machine learning or rules-based logic to recognize and assign a document type to a page, or a group of pages in a document. Here are three types of classification techniques:
Natural language processing looks at the text of the whole document to interpret context.
The classification engine uses key words or features that identify a document, like a title, section heading, or any specific data element.
Computer vision analyzes the visual structure of a document without using OCR to determine document type.
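The keyword-based technique is the simplest of the three to sketch. The example below scores a page against a handful of keywords per document type and picks the type with the most hits. The rule set, the function name `classify_page`, and the "unknown" fallback are all assumptions for illustration. A production engine would be trained or tuned on real documents.

```python
def classify_page(text, rules, default="unknown"):
    """Assign a document type based on keyword hits.

    rules: dict of document type -> list of identifying keywords.
    Returns the type with the most keywords found in the text, or
    `default` when no keyword matches at all.
    """
    lowered = text.lower()
    scores = {doc_type: sum(kw.lower() in lowered for kw in keywords)
              for doc_type, keywords in rules.items()}
    if not scores:
        return default
    best_type = max(scores, key=scores.get)
    return best_type if scores[best_type] > 0 else default

rules = {
    "invoice": ["invoice number", "amount due"],
    "transcript": ["course", "grade", "credits"],
}

invoice_type = classify_page("INVOICE NUMBER 1042\nAmount Due: $120.00", rules)
transcript_type = classify_page("Course: Biology  Grade: A  Credits: 4", rules)
letter_type = classify_page("Dear customer, thank you.", rules)
```

Pages that match no rule fall through to "unknown", which is a natural point to route them to a person for review.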
Advances in machine learning enable users to train cognitive document processing systems in a visual interface to see exactly how the machine is learning. This makes classifying new document types or troubleshooting problems extremely easy.
So what are organizations doing with these new cognitive processing innovations?
Among the many improvements, there are:
- Massive disruption in healthcare
- The paper-based digital oilfield is seeing massive efficiency gains
- Financial institutions now have rapid document processing solutions
- Government innovations are unlocking historical data
- and the list goes on and on.
Unlocking data trapped in documents is disrupting traditional business through incredible efficiency gains and deep operational insight.
Ready to start your document data integration journey?
This article was updated 12/2/2020.