How to Maximize Results on Microfilm and Microfiche Digitization Projects
Recent advances in technology bring new opportunities for digitizing microfilm, microfiche, and aperture cards.
If you've looked at microfilm, microfiche, or aperture cards, you know that image quality varies, and that imperfections from the original conversion and scratches from use are common.
A human reader can usually read past these defects. But imagine the trouble a computer has with them when microfilm is digitized for optical character recognition (OCR) or data extraction!
This guide covers modern approaches to digitizing microfilm.
First, A Software Decision:
Native Scanner Software vs. Stand-Alone Software - Which is better for digitizing microfilm?
Native Scanner Software
The software packaged with scanner hardware is an easy first choice, and it makes sense on very small projects. For bigger projects there are limitations:
- Cost: When scanner software requires human interaction, projects cannot scale without adding more scanners and scan operators
- Speed: Typical scanner software is slow because it runs on a single computer connected to the scanner, or on the scanner itself
Dedicated Software for Digitizing Microfilm
A stand-alone film / fiche software solution has the power to tackle very large or very complex digitization and conversion projects. For bigger projects there are distinct advantages:
- Cost: Do more work with fewer scan operators because of highly automated conversion steps
- Speed: Use as many computers or servers as needed to get the job done
Learn how the U.S. Nuclear Regulatory Commission digitized over 20 million microfiche and aperture cards in less than a year.
6 Features Make Digitizing Microfilm More Powerful
- Fast handling of raw scanner output. On large projects you need to save time at every step, so choose a microfilm conversion solution that performs well with raw scanner output while the scanner operates in its fastest mode.
- Automated separation of scanned media into individual cards or frames.
- Computer vision to locate and extract individual documents.
- Natural language processing to analyze the content of the documents on microfilm.
- The ability to quickly and natively correct the warping effect caused when the original documents were photographed.
- Automated scratch removal and repair on text and pictures.
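The last feature in the list, scratch removal, can be illustrated with a very small sketch. It assumes a grayscale image stored as a list of pixel rows and a scratch that is a single pixel wide and runs vertically; the function name and the `window` parameter are hypothetical, and this is not the method used by any particular product. A horizontal median filter replaces each pixel with the median of its neighbors in the same row, so a one-pixel line disappears while wider strokes survive.

```python
from statistics import median


def remove_thin_vertical_scratches(image, window=3):
    """Apply a horizontal median filter to every row of a grayscale image.

    `image` is a list of rows, each a list of 0-255 pixel values.
    Assumption: scratches are narrower than half the window width.
    """
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number >= 3")
    half = window // 2
    repaired = []
    for row in image:
        last = len(row) - 1
        new_row = []
        for x in range(len(row)):
            # Clamp indices at the edges so the window always has `window` pixels.
            neighbors = [row[min(max(i, 0), last)] for i in range(x - half, x + half + 1)]
            new_row.append(median(neighbors))
        repaired.append(new_row)
    return repaired


# A bright frame with a one-pixel dark scratch in column 2.
scratched = [[200, 200, 0, 200, 200] for _ in range(3)]
repaired = remove_thin_vertical_scratches(scratched)

# A two-pixel-wide dark stroke (for example, part of a character) is left alone.
stroke = [[200, 0, 0, 200, 200]]
kept = remove_thin_vertical_scratches(stroke)
```

The same filter would also erase one-pixel-wide vertical strokes in text, which is one reason production software tends to rely on more selective repair methods.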
6 Steps for Efficiently Digitizing Microfilm
A solution that supports each of these steps gives your project a much better chance of success.
- Scan - This is a user-attended activity: the physical operation of the scanner hardware.
- Sort - This is an automated activity that runs on several computers or servers. It sorts the raw tiles and organizes them into subfolders by strip.
This is the point in the process where you'll want to assemble a low-resolution preview of the physical media.
- Detect Frames - At this critical point in the process, computer vision is used to find document frames and to flag any strips where documents are not confidently identified.
Flagged strips will be reviewed by an operator in the next step. This is also the step where inconsistent gutters and lines are discovered. If target sheets are available, use them at this point to help make document identification decisions.
- Verify - This is the second user-attended activity, and it should be used only for flagged strips. Operators save an immense amount of time if they review only the problems.
In fact, for large projects, this step should be performed by multiple operators at the same time.
- Clip Frames - This is another automated step, which clips each frame from the master image. Clipping these images gives the software the opportunity to run image processing algorithms that produce high-fidelity digital copies.
- Image Processing - One of the most crucial steps is cleaning up digital images to permanently or temporarily enhance the records. Here's where scratches, warping, lines, and boxes are dealt with.
Temporary fixes are usually geared towards performing better OCR. Permanent fixes are made to ensure human-readability of the digitized microfilm.
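To make the Detect Frames, Verify, and Clip Frames steps more concrete, here is a minimal sketch that uses a simple brightness profile in place of a full computer vision model. It assumes a grayscale strip stored as a list of pixel rows, with bright frames separated by dark gutters; negative film would need the comparison reversed. All function names, thresholds, and tolerances are hypothetical.

```python
def column_means(strip):
    """Mean brightness of every pixel column in a grayscale strip."""
    height = len(strip)
    return [sum(row[x] for row in strip) / height for x in range(len(strip[0]))]


def detect_frames(strip, gutter_threshold=40, min_width=3):
    """Return (start, end) column ranges of frames, with end exclusive.

    Assumption: columns darker than `gutter_threshold` are gutters.
    """
    frames, start = [], None
    # A trailing 0 acts as a sentinel gutter that closes a frame at the edge.
    for x, mean in enumerate(column_means(strip) + [0]):
        is_content = mean > gutter_threshold
        if is_content and start is None:
            start = x
        elif not is_content and start is not None:
            if x - start >= min_width:
                frames.append((start, x))
            start = None
    return frames


def needs_review(frames, tolerance=0.25):
    """Flag a strip when no frames were found or frame widths are inconsistent."""
    if not frames:
        return True
    widths = sorted(end - start for start, end in frames)
    typical = widths[len(widths) // 2]
    return any(abs(width - typical) > tolerance * typical for width in widths)


def clip_frames(strip, frames):
    """Cut each detected frame out of the master strip image."""
    return [[row[start:end] for row in strip] for start, end in frames]


# A strip with two evenly sized frames and dark gutters between them.
row = [0] * 2 + [220] * 5 + [0] * 2 + [220] * 5 + [0] * 2
strip = [row[:] for _ in range(4)]
frames = detect_frames(strip)
clips = clip_frames(strip, frames)

# A strip where the second frame is much wider, so it goes to an operator.
ragged_row = [0] * 2 + [220] * 5 + [0] * 2 + [220] * 12
ragged_frames = detect_frames([ragged_row[:] for _ in range(4)])
```

Only strips for which `needs_review` returns `True` would be queued for an operator, which is the point of keeping the Verify step limited to flagged strips.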
How to Intelligently Fix Original Indexing Mistakes
There will be times when the original documents were not converted in the correct order, or were incorrectly indexed. To avoid carrying those errors into the digital copy, it is critical to compare the data on the film to an outside data source. This enables automated, intelligent classification and separation of the digitized microfilm.
Integrating microfilm conversion software with external databases or lexicons provides highly accurate data that has been validated by known good data sources.
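As a rough illustration of validating extracted data against a known-good source, the sketch below matches an OCR'd index value against a small lexicon using Python's standard `difflib` module. The function name, the sample names, and the 0.8 cutoff are assumptions made for this example; a real project would load its lexicon from its own database.

```python
from difflib import get_close_matches


def validate_index_value(ocr_value, known_values, cutoff=0.8):
    """Return the closest known-good value, or None if nothing is close enough.

    A None result means the record should be routed to a person for review.
    """
    # Compare in upper case, but return the value exactly as the lexicon spells it.
    lookup = {value.upper(): value for value in known_values}
    matches = get_close_matches(ocr_value.strip().upper(), list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


# Hypothetical lexicon of surnames from an external database.
known_surnames = ["Anderson", "Johnson", "Martinez"]

corrected = validate_index_value("J0HNSON", known_surnames)   # OCR read "O" as "0"
unmatched = validate_index_value("XQ-4471", known_surnames)   # nothing similar on file
```

Raising the cutoff makes matching stricter and sends more records to manual review; lowering it accepts more OCR noise at the risk of false matches.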
As you can see, those who are digitizing microfilm with intelligent software are gaining new efficiencies and creating new value from archived physical records.