How to Take OCR (Optical Character Recognition) to the Next Level

by Brad Blood | February 3, 2020

OCR - Optical Character Recognition - has been around a long time.

So long, in fact, that it has become mired in confusion and mismatched expectations. Most organizations still use a lot of paper and PDF documents. While some industries are particularly drawn to paper, humanity as a whole is a long way off from eradicating the exchange of information through documents and forms.

Enterprise OCR provided by “OCR engines” such as Tesseract, ABBYY, OmniPage, AnyDoc, Transym, Azure, Google, and others is only a small part of what organizations really need.


I talk to people every day who are looking for better OCR results. What they’re really looking for is a better way to get accurate information from data trapped in documents.

Discover the Most Accurate OCR

But that is only possible with two things:

  1. Dealing with OCR killers
  2. Machine reading

Imperfect Documents Kill OCR

Take a skewed document image, for example.

Independent research and our own observations show you’ll get something like 40% recognition accuracy. In reality you’ll probably get less, which is pretty much worthless. Errors in dates, numbers, amounts, names, and similar fields make the data difficult to trust.
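To make the skew problem concrete, here is a minimal sketch of one widely used way to detect skew before OCR: a projection-profile search. It tries a range of candidate angles, rotates the ink pixels back by each one, and keeps the angle that makes the rows of text line up most sharply. This is an illustration only, not the method of any particular product. The function names, the synthetic test page, and the 0.5-degree search step are all assumptions made for the example.

```python
import math

def make_skewed_page(width=200, height=120, angle_deg=5.0, line_gap=12):
    """Build a synthetic page: a set of (x, y) ink pixels forming
    text-like horizontal lines, rotated by angle_deg around the centre."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = width / 2, height / 2
    pixels = set()
    for y0 in range(10, height - 10, line_gap):
        for x0 in range(10, width - 10):
            dx, dy = x0 - cx, y0 - cy
            pixels.add((round(cx + dx * cos_t - dy * sin_t),
                        round(cy + dx * sin_t + dy * cos_t)))
    return pixels

def profile_score(pixels, angle_deg, centre):
    """Rotate every ink pixel back by angle_deg and measure how sharply
    the ink concentrates into a few rows (sum of squared row counts)."""
    cx, cy = centre
    theta = math.radians(-angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rows = {}
    for x, y in pixels:
        row = round(cy + (x - cx) * sin_t + (y - cy) * cos_t)
        rows[row] = rows.get(row, 0) + 1
    return sum(count * count for count in rows.values())

def estimate_skew(pixels, centre, max_angle=10.0, step=0.5):
    """Return the candidate angle (in degrees) with the sharpest profile."""
    n = int(max_angle / step)
    candidates = [i * step for i in range(-n, n + 1)]
    return max(candidates, key=lambda a: profile_score(pixels, a, centre))

page = make_skewed_page(angle_deg=5.0)
angle = estimate_skew(page, centre=(100, 60))
print(f"estimated skew: {angle:.1f} degrees")
```

Once the angle is known, the page can be rotated back by that amount before it is handed to an OCR engine. Real scans are noisier than this synthetic page, so a production system would likely need a finer search and some cleanup first.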

Document skewing isn’t the only OCR killer. Poor scan quality, hole punches, pictures, data in tables, mismatched font types, and text that spans pages all wreak havoc on OCR. And these are just some of the complications. If the solution is manual human review, that’s just too time-consuming and expensive for projects with hundreds of thousands or millions of pages.

Machine Reading

Machine reading is what OCR was supposed to be when it first went mainstream. It’s innovative because it's the result of powerful data science and image processing tools that strike back at the OCR killers.

Machine reading solutions achieve OCR that is more than twice as accurate, along with intelligent integration of the information contained within documents. It’s the heart of modern data integration from physical and electronic documents.
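Accuracy claims like these are usually checked by comparing OCR output against a hand-keyed "ground truth" transcription. The sketch below shows one common way to do that: character accuracy based on edit distance (the number of single-character insertions, deletions, and substitutions needed to turn one string into the other). The function names and the sample strings are assumptions for illustration, not figures from any real benchmark.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def character_accuracy(truth: str, ocr_text: str) -> float:
    """Share of ground-truth characters recovered, clamped to 0.0..1.0."""
    if not truth:
        return 1.0 if not ocr_text else 0.0
    errors = edit_distance(truth, ocr_text)
    return max(0.0, 1.0 - errors / len(truth))

# Hypothetical example: a date in which OCR read two zeros as the letter O.
truth = "2020-02-03"
ocr_text = "2020-O2-O3"
print(f"character accuracy: {character_accuracy(truth, ocr_text):.0%}")
```

Note that even a high character accuracy can hide damaging errors: two wrong characters in a date or an amount are enough to make that field unusable, which is why field-level checks are often tracked alongside a score like this one.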

By adding powerful document image pre-processing and new data science innovations, organizations are doing more with the data locked in documents than they ever thought possible.

Intelligent document processing platforms enable almost-magical machine reading by applying image processing steps to each page before OCR runs.
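As a rough sketch of what "image processing before OCR" can look like, the example below cleans a tiny grayscale page in two steps and only then hands it to an OCR engine. It uses Otsu's method to pick a black/white threshold and a simple despeckle pass to drop isolated noise pixels. These are generic, well-known techniques chosen for illustration, not a description of any specific platform. The `recognize` argument is a hypothetical stand-in for whatever OCR engine you use, and all function names are assumptions.

```python
def otsu_threshold(gray):
    """Pick the 0..255 threshold that best separates dark ink from light
    paper by maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        diff = sum_dark / w_dark - (sum_all - sum_dark) / w_light
        var = w_dark * w_light * diff * diff
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def despeckle(binary):
    """Clear ink pixels that have no ink among their 8 neighbours."""
    h, w = len(binary), len(binary[0])
    out = [row[:] for row in binary]
    for y in range(h):
        for x in range(w):
            if not binary[y][x]:
                continue
            has_neighbour = any(
                binary[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
                if (ny, nx) != (y, x)
            )
            if not has_neighbour:
                out[y][x] = 0
    return out

def preprocess(gray):
    """Binarize (1 = ink, 0 = paper), then remove isolated specks."""
    t = otsu_threshold(gray)
    binary = [[1 if value <= t else 0 for value in row] for row in gray]
    return despeckle(binary)

def read_page(gray, recognize):
    """Clean the page first, then pass it to the OCR engine.
    `recognize` is a hypothetical callable wrapping a real engine."""
    return recognize(preprocess(gray))

# Tiny hypothetical scan: two dark strokes (30) on light paper (220),
# plus one stray dark speck (40) at the right edge.
P = 220
gray = [
    [P, P, P, P, P, P, P],
    [P, 30, 30, 30, P, P, P],
    [P, P, P, P, P, P, 40],
    [P, 30, 30, 30, P, P, P],
    [P, P, P, P, P, P, P],
]
for row in preprocess(gray):
    print("".join("#" if px else "." for px in row))
```

The design point is the ordering: the engine never sees the raw scan, only the cleaned image. Other steps, such as deskewing, would slot into `preprocess` in the same way.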

Feed traditional OCR engines properly formatted documents, and almost-magical results are possible. See for yourself in this short demonstration using OCR to classify documents:

 




