Optical Character Recognition (OCR) is a widely used technology for creating text from images. Typically, OCR technology has been used for business documents, but its original use was to help the blind read typewritten pages.
Today, OCR is built into most smartphones and integrated almost seamlessly into many everyday products.
But many people experience slow OCR when capturing data from business documents. Why is OCR so slow? What are practical, easy things you can do to get faster OCR? Get the answers to your questions below:
Table of Contents:
- How OCR is Different for the Business World
- Why is OCR So Slow?
- 8 Ways to Speed Up OCR:
How OCR is Different for the Business World
In the business world, OCR is still largely used to pull text out of important documents to help automate business processes. Manual data entry processes can be drastically reduced by using OCR.
But OCR alone is rarely the answer. That's because an OCR'd page needs extra work before it can really help automate a process.
We like to say OCR is the Lowest Common Denominator of a system, the LCD. That's where document automation starts, by getting text from an image.
Why is OCR So Slow?
OCR is one of the earliest applications of machine learning. And because machine learning is very compute-heavy, even to this day, an OCR job will take 100% of the CPU thread it is running on.
So making OCR "faster" can be a bit of a challenge. But there are certain things you can do on any system to increase OCR's throughput.
Here's a quick list, but it's best to reach out to your vendor to find out how best to increase the speed for the system you have:
8 Ways to Speed Up OCR:
#1: Decrease Image Resolution
OCR is a process of transforming data, so the more data you have to transform, the longer the process takes.
If you have larger images, color images, or high-resolution images (resolution is measured in DPI, or dots per inch), they will take longer to OCR than smaller, less dense images. But I'm sure you see the trade-off here: more data allows for better OCR.
(Not all OCR systems can handle color or multi-bit greyscale images. Check with your vendor to verify their recommended image settings.)
But one way to speed up OCR is to decrease the image resolution. Again, there is a trade-off; but on clean documents, you may be able to drop down to 200 DPI black and white and still get good OCR results.
Your mileage will vary, so compare the results at your current resolution against a lower one. If accuracy holds up, your OCR system will process more images in the same amount of time because it's working on less data.
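To see how much resolution and bit depth change the amount of data involved, here is a rough back-of-the-envelope calculation in Python. It assumes a US Letter page (8.5 x 11 inches) and uncompressed pixel data. Both are illustrative assumptions, not figures from any particular OCR system, and real engines will not necessarily slow down in direct proportion to data size.

```python
def raw_image_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Uncompressed size of a scanned page in bytes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return total_bits // 8

# Assumed page size: US Letter, 8.5 x 11 inches.
color_300 = raw_image_bytes(8.5, 11, 300, 24)  # 24-bit color, 300 DPI
bw_300 = raw_image_bytes(8.5, 11, 300, 1)      # 1-bit black and white, 300 DPI
bw_200 = raw_image_bytes(8.5, 11, 200, 1)      # 1-bit black and white, 200 DPI

print(f"300 DPI color:           {color_300:,} bytes")
print(f"300 DPI black and white: {bw_300:,} bytes")
print(f"200 DPI black and white: {bw_200:,} bytes")
print(f"300 -> 200 DPI reduces pixel data by {bw_300 / bw_200:.2f}x")
```

Under these assumptions, dropping from 300 to 200 DPI cuts the pixel data by 2.25x, and moving from 24-bit color to 1-bit black and white cuts it by 24x.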
#2: Get Faster Hardware
Another way to increase OCR speeds is to make a system that can process more data in the same amount of time. Faster hardware will drastically speed up OCR processing.
But it's not just CPU speed. You need to take advantage of all the technology upgrades that are available to you. More and faster RAM and NVMe SSDs will make a huge difference.
Think about it this way: that image file has to be transferred from the drive it's sitting on into the memory of the server. So the faster that can be done, the faster OCR will run.
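If you want a rough idea of how long it takes to move an image from storage into memory on your own hardware, a small timing sketch like the one below can help. The function names are made up for illustration, and the measurement only covers reading the file, not decoding or OCR.

```python
import time
from pathlib import Path

def time_file_read(path):
    """Return (bytes_read, seconds) for loading one file fully into memory."""
    start = time.perf_counter()
    data = Path(path).read_bytes()
    elapsed = time.perf_counter() - start
    return len(data), elapsed

def read_throughput_mb_s(num_bytes, seconds):
    """Convert a measurement into megabytes per second."""
    if seconds <= 0:
        return float("inf")
    return num_bytes / (1024 * 1024) / seconds
```

Run `time_file_read` on a sample of your own scanned images and compare drives. Keep in mind that the operating system caches recently read files, so the first read of a file is usually the representative one.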
#3: Get More Processors: Modern vs Legacy Software
Some systems will benefit from more processors. For instance, Grooper is architected to take advantage of any resource that is available to it. It has an unlimited scaling ability. We have Grooper users running more than 1,500 images per minute on a bare metal data center system.
But we designed Grooper this way because other systems on the market do not easily scale. A lot of the legacy capture systems were designed during the archiving stage of document capture's evolution. Those legacy systems include:
Compute was expensive back when those systems were created. So they were designed to limit resource consumption rather than grab it all.
So talk to your vendor and see what you need to do to add more CPUs to the system. Most systems will charge you extra for more CPUs. But if you need the speed, you aren't left with much choice.
Grooper licensing, by contrast, limits your volume but not the number of processors. This means you can add processors until you reach your throughput goals.
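For a rough sense of how many processors a throughput target implies, here is a simple capacity-planning sketch. It assumes each OCR job keeps one CPU thread fully busy and that throughput scales linearly with the number of workers. That is an idealized assumption, and the 2 seconds per page figure is made up for illustration, so measure your own system before buying hardware.

```python
import math

def pages_per_minute(workers, seconds_per_page):
    """Idealized throughput if every worker runs one OCR job at a time."""
    return workers * 60.0 / seconds_per_page

def workers_needed(target_pages_per_minute, seconds_per_page):
    """Smallest number of workers that reaches the target under linear scaling."""
    return math.ceil(target_pages_per_minute * seconds_per_page / 60.0)

# Assumed: one page takes 2 seconds of OCR time on one CPU thread.
print(pages_per_minute(8, 2.0))    # throughput of an 8-worker system
print(workers_needed(1500, 2.0))   # workers for 1,500 pages per minute
```

In practice, disk, memory, and licensing limits mean real systems scale less than linearly, so treat the result as a lower bound on the workers you need.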
#4: Test OCR Software Speed on a Virtual Machine
Test your system on bare metal vs. virtual machines. Because VMs are entirely software-defined these days, data can move through the system faster in a lot of cases.
I have had systems where bare metal was slower than the same system on a VMware machine. It's worth a try because it's something you can do that's practically free.
You probably already have virtual machine licenses just sitting around. Or in the case of Hyper-V, there's no additional licensing.
#5: Tune Up Your OCR Engine
OCR engines have a lot of options, and most also offer pre-processing options for image cleanup. You can tune your OCR engine's settings to get better speed.
Everything with OCR is a trade-off between speed and accuracy. So if your OCR accuracy is where you want it, then you can start to back off on your accuracy settings and see where that gets you.
Grooper has some built-in profiles that allow you to test quickly between:
- A robust, highly accurate setting
- A blended one, and
- A fast one
Most systems will have some sort of mechanism to lower the aggressiveness of the OCR algorithm and get you a little more speed.
But again, you will trade more OCR speed for less OCR accuracy.
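One way to make that trade-off concrete is to time each setting against a small set of pages whose correct text you already know. The sketch below is one possible harness, not a feature of any particular product. `ocr_fn` is a hypothetical callable that wraps whatever engine or profile you are testing, and the accuracy score is a simple text-similarity ratio from Python's `difflib`, not a formal character error rate.

```python
import time
from difflib import SequenceMatcher

def character_accuracy(ocr_text, truth_text):
    """Similarity between OCR output and known-correct text (0.0 to 1.0)."""
    return SequenceMatcher(None, ocr_text, truth_text).ratio()

def benchmark(ocr_fn, pages, truths):
    """Time ocr_fn over the pages and score its output against ground truth.

    ocr_fn is a hypothetical callable: page -> recognized text.
    """
    start = time.perf_counter()
    outputs = [ocr_fn(page) for page in pages]
    elapsed = time.perf_counter() - start
    scores = [character_accuracy(o, t) for o, t in zip(outputs, truths)]
    return {
        "seconds_per_page": elapsed / len(pages),
        "mean_accuracy": sum(scores) / len(scores),
    }
```

Run `benchmark` once per profile on the same pages, then pick the fastest setting whose accuracy is still acceptable for your documents. The same harness works for comparing image resolutions or different OCR engines.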
#6: Experiment With a New OCR Engine
Many OCR scanner software systems come with more than one OCR engine. If you can test your system with a different OCR engine, you may be able to find an engine that's faster for the documents you're working with.
Your OCR system should have an easy way to switch between OCR engines for you to test. Grooper includes two OCR engines out of the box, and both can be tuned to increase speed and/or accuracy.
#7: Check Your Software License
Some systems will be limited by licensing. In these situations, your vendor has to sell you a new license to increase your throughput. There are still a few vendors who intentionally throttle their OCR throughput.
(As a side note, speed is also very dependent upon the types of documents you are trying to process. Dense documents with photos or graphics will be slower to OCR than semi-structured forms like invoices, which typically don't have a lot of text. Remember that OCR throughput will vary with the document type being processed.)
#8: Upgrade to Grooper OCR
There is one last way to upgrade your OCR to get more speed. That's to upgrade your system to Grooper. Grooper was built to succeed where other intelligent document processing or OCR systems fail.
We have a simplified pricing structure that won't require you to have an advanced degree to understand. We use Grooper internally in our Document Migration Center and have processed more than 100 million document images through it.
Our customers have done even more than that on an annual basis.