Optical Character Recognition (OCR) is a widely used technology for creating text from images. OCR has typically been used for business documents, but its original use was to help the blind read typewritten pages.
Today, OCR is built into most smartphones and integrated almost seamlessly into many everyday products.
But many people experience slow OCR when capturing data from business documents. Why is OCR so slow? What are practical, easy things you can do to get faster OCR? Get the answers to your questions below:
In the business world, OCR is still largely used to pull text out of important documents to help automate business processes. Manual data entry processes can be drastically reduced by using OCR.
But OCR alone is rarely the answer. That's because an OCR'd page needs extra work before it can really help automate a process.
We like to say OCR is the Lowest Common Denominator of a system, the LCD. That's where document automation starts, by getting text from an image.
OCR is one of the original machine learning algorithms. Because machine learning is very compute-heavy, even to this day, OCR will take 100% of the CPU thread it is running on.
So making OCR "faster" can be a bit of a challenge. But there are certain things you can do on any system to increase OCR's throughput.
Here's a quick list, but it's best to reach out to your vendor to find out how best to increase the speed for the system you have:
OCR is a process of transforming data. The more data you have to transform, the longer the process will take.
Larger images, color images, and high-resolution images (resolution is measured in "DPI," or dots per inch) take longer to OCR than smaller, less dense images. But I'm sure you see the trade-off here. More data allows for better OCR.
(Not all OCR systems can handle color or multi-bit greyscale images. Check with your vendor to verify their recommended image settings.)
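To get a rough sense of how much data each image setting produces, the arithmetic can be sketched as below. This is a back-of-the-envelope estimate of uncompressed pixel data only. The page size (US Letter, 8.5 x 11 inches), the function name `raw_image_bytes`, and the example settings are assumptions chosen for illustration, and real files will usually be smaller because of compression.

```python
def raw_image_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Estimate the uncompressed size of a scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    # Round up to whole bytes.
    return (total_bits + 7) // 8


# Illustrative settings for a US Letter page (assumed, not vendor recommendations).
color_300 = raw_image_bytes(8.5, 11, 300, 24)   # 24-bit color
grey_300 = raw_image_bytes(8.5, 11, 300, 8)     # 8-bit greyscale
bitonal_200 = raw_image_bytes(8.5, 11, 200, 1)  # 1-bit black and white

for label, size in [("300 DPI color", color_300),
                    ("300 DPI greyscale", grey_300),
                    ("200 DPI black and white", bitonal_200)]:
    print(f"{label}: {size / 1_000_000:.2f} MB uncompressed")
```

Under these assumptions, the 300 DPI color page carries 54 times as much raw pixel data as the 200 DPI black and white page, which is why image settings have such a large effect on OCR time.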
But one way to speed up OCR is to decrease the image resolution. Again, there is a trade-off; but on clean documents, you may be able to drop down to 200 DPI black and white and still get good OCR results.
Your mileage will vary, and you'll need to compare the outcomes between the current resolution and a lower one, but this will make your OCR system process more images in the same amount of time because it's working on less data.
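One way to make that comparison concrete is to score each OCR output against a hand-verified transcript of the same page. The sketch below uses Python's standard `difflib` module for a rough character-level similarity score. The function name `ocr_similarity` and the sample strings are made up for illustration; in practice the two OCR outputs would come from your own engine run at each resolution.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference: str, ocr_text: str) -> float:
    """Return a rough 0.0-1.0 similarity between a verified transcript and OCR output."""
    # Collapse whitespace so line-wrapping differences are not counted as errors.
    ref = " ".join(reference.split())
    hyp = " ".join(ocr_text.split())
    return SequenceMatcher(None, ref, hyp, autojunk=False).ratio()


# Hypothetical example data: a verified transcript and two OCR results.
reference = "Invoice 1001 Total due: $250.00"
ocr_at_300_dpi = "Invoice 1001 Total due: $250.00"
ocr_at_200_dpi = "Invoice 1O01 Total due: $25O.00"  # zeros misread as the letter O

score_300 = ocr_similarity(reference, ocr_at_300_dpi)
score_200 = ocr_similarity(reference, ocr_at_200_dpi)
print(f"300 DPI: {score_300:.3f}  200 DPI: {score_200:.3f}")
```

If the lower-resolution score stays close to the higher-resolution one across a representative sample of your documents, the speed gain is likely worth it. A simple similarity ratio is only a proxy, so spot-check the fields that matter most to your process.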
Another way to increase OCR speeds is to make a system that can process more data in the same amount of time. Faster hardware will drastically speed up OCR processing.
Think about it this way: that image file has to be transferred from the hard drive it's sitting on, into the memory of the server. So the faster that can be done, the faster OCR will run.
Some systems will benefit from more processors. For instance, Grooper is architected to take advantage of any resource that is available to it. It has an unlimited scaling ability. We have Grooper users running more than 1,500 images per minute on a bare metal data center system.
Many older OCR systems, by contrast, were created when compute was expensive. So they were designed to limit resource consumption rather than grab it all.
So talk to your vendor and see what you need to do to add more CPUs to the system. Most systems will charge you extra for more CPUs. But if you need the speed, you aren't left with much choice.
Grooper licensing, however, limits your volume but not the number of processors. This means you can add processors until you meet your throughput goals.
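Because each OCR task tends to saturate one CPU thread, a first-pass capacity estimate is simple arithmetic. The sketch below assumes throughput scales linearly with the number of workers, which is an idealization (disk, memory, and licensing limits will reduce it). The function names and the 2-seconds-per-page figure are made-up examples rather than measured values.

```python
import math


def pages_per_minute(workers: int, seconds_per_page: float) -> float:
    """Ideal throughput if every worker OCRs pages back to back."""
    return workers * 60.0 / seconds_per_page


def workers_needed(target_pages_per_minute: float, seconds_per_page: float) -> int:
    """Smallest number of workers that meets the target under linear scaling."""
    return math.ceil(target_pages_per_minute * seconds_per_page / 60.0)


# Assumed example: one page takes 2 seconds on a single CPU thread.
print(pages_per_minute(workers=8, seconds_per_page=2.0))
print(workers_needed(target_pages_per_minute=1500, seconds_per_page=2.0))
```

Measure your own seconds-per-page figure on real documents before sizing hardware, since it changes with image settings and document type.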
Test your system on bare metal vs. virtual machines. Because VMs are completely software-based these days, data can move across the system faster in a lot of cases.
You probably already have virtual machine licenses just sitting around. Or in the case of Hyper-V, there's no additional licensing.
OCR engines have a lot of settings, and most systems also offer pre-processing options for image cleanup. You can tune these settings to get better speed.
Grooper has some built-in profiles that allow you to test quickly between:
Most systems will have some sort of mechanism to lower the aggressiveness of the OCR algorithm and get you a little more speed.
But again, you will be trading some OCR accuracy for that extra speed.
Your OCR system should have an easy way to switch between OCR engines for you to test. Grooper includes two OCR engines out of the box, and both can be tuned to increase speed and/or accuracy.
(As a side note, speed is also very dependent upon the types of documents you are trying to process. Dense documents with photos or graphics will be slower to OCR than semi-structured forms like Invoices that typically don't have a lot of text. You'll need to remember that OCR throughput will vary with the document type being processed.)
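When testing engines or profiles against each other, it helps to time them on the same set of pages. The sketch below is a minimal, generic timing harness and not part of any vendor's API. `measure_throughput`, `fast_profile`, and `accurate_profile` are hypothetical names, and the two "engines" are stand-ins that only sleep so the example runs anywhere.

```python
import time
from typing import Callable, Iterable


def measure_throughput(ocr_page: Callable[[bytes], str], pages: Iterable[bytes]) -> float:
    """Run ocr_page over every page and return pages per minute."""
    count = 0
    start = time.perf_counter()
    for page in pages:
        ocr_page(page)
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0 or elapsed <= 0:
        return 0.0
    return count * 60.0 / elapsed


# Stand-in "engines": replace these with calls into your real OCR engine
# or profile. The sleep times are arbitrary placeholders.
def fast_profile(page: bytes) -> str:
    time.sleep(0.001)
    return "text"


def accurate_profile(page: bytes) -> str:
    time.sleep(0.005)
    return "text"


sample_pages = [b"page-image-bytes"] * 20  # use representative documents in practice

for name, engine in [("fast", fast_profile), ("accurate", accurate_profile)]:
    print(f"{name}: {measure_throughput(engine, sample_pages):.0f} pages/minute")
```

Run the harness separately for each document type you process, since dense pages and sparse forms can produce very different numbers.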
There is one last way to upgrade your OCR to get more speed. That's to upgrade your system to Grooper. Grooper was built to succeed where other intelligent document processing or OCR systems fail.
We have a simplified pricing structure that won't require you to have an advanced degree to understand. We use Grooper internally in our Document Migration Center and have processed more than 100 million document images through it.
Our customers have done even more than that on an annual basis.
So reach out today to find out more!
Or download our Cheat Sheet to get more tips on how to improve your OCR results: