Automating mill test reporting is a daunting task because the reports contain a diverse set of a given raw material’s chemical and physical properties. They certify a material and are designed to demonstrate its quality and its compliance with industry standards like ANSI and ASME. Mill test report processing is performed by trained professionals and is a very time-consuming but necessary process.
Two things make automating mill test report (MTR) processing difficult: the vast number of fields on the reports, and the fact that the layout of those fields differs from one manufacturer to the next. Adding to the difficulty, many of the reports have poor image quality because they’ve been printed, hand-signed, and scanned.
Here’s a sample MTR for a steel plate from the Paul Mueller Company, a stainless steel manufacturer:
As you can see, there are a lot of fields of data!
- Data and general product info
- Product description
- Physical properties
- Chemical analysis
- Mechanical properties
The crucial thing to note is that once a raw product is shipped to a manufacturer, it is then their responsibility to track and maintain an awareness of the information on the mill test report. For example, in the steel industry this is done through a heat code, or heat number stamped or written on the metal itself.
MTRs Are Vital for Quality Assurance and Compliance
MTRs are so important that they must be kept on file for up to three years after a product has been used to create something. Clearly, this critical data should be kept in a database for tracking and reporting.
And this is where automating the ingestion of MTR data is a huge time-saver for companies that use and process a large number of raw products from many different vendors. To see why automation has been so difficult, perform a Google search for mill test report examples and you’ll see that there’s no standardization for how the reports are structured.
6 Steps to Automate Mill Test Report Processing
- Digitization / image processing
- Optical character recognition
- Machine learning / training
- Build data models
- Human data review
- Data integration
Some MTRs are emailed in PDF format as part of the shipment notification for ordered products. However, in many cases only physical copies are available, so they must obviously be scanned. The first step is taking the digital copy of the mill test report and using image processing software to remove all non-text elements.
This means digitally removing all the lines and non-text objects. Humans need this kind of document structure to gain an understanding of the data, but for software to “read” the MTR, these lines and objects only get in the way. And if the MTR has been faxed, dragged around a dirty shop floor, and then scanned, it’s going to need some digital cleanup!
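To make the idea of line removal concrete, here is a toy sketch in plain Python. It assumes the page has already been converted to a grid of 0 (background) and 1 (ink) pixels, and it simply blanks any horizontal or vertical run of ink that is too long to be part of a character. The function name `remove_long_runs` and the `min_run` parameter are made up for illustration; real image processing software uses more robust techniques than this.

```python
def remove_long_runs(image, min_run=5):
    """Blank horizontal and vertical ink runs that are at least min_run long.

    image is a list of rows; each row is a list of 0 (background) or 1 (ink).
    Returns a new grid and leaves the input untouched.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    to_clear = set()

    def scan(length, get_pixel, to_coord):
        # Walk one row or column and record every run of ink that is long
        # enough to be a ruling line rather than part of a character.
        start = None
        for i in range(length + 1):
            ink = i < length and get_pixel(i) == 1
            if ink and start is None:
                start = i
            elif not ink and start is not None:
                if i - start >= min_run:
                    to_clear.update(to_coord(j) for j in range(start, i))
                start = None

    for y in range(height):
        scan(width, lambda x, y=y: image[y][x], lambda x, y=y: (y, x))
    for x in range(width):
        scan(height, lambda y, x=x: image[y][x], lambda y, x=x: (y, x))

    return [
        [0 if (y, x) in to_clear else image[y][x] for x in range(width)]
        for y in range(height)
    ]
```

Short strokes survive because they never reach `min_run`; the right threshold depends on scan resolution and font size, so treat the default as a placeholder.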
The second step is to perform optical character recognition (OCR) on the MTR to recognize all the text characters on the page. Modern OCR tools are advanced enough to run multiple OCR “engines” on the document until a desired level of accuracy is achieved. Even if 100% accuracy isn’t obtained, the remaining errors will be corrected in later steps.
The third step is to use a supervised machine learning algorithm to train the software to recognize that a document is a mill test report. This is important in case additional documents are attached to the MTR. You may or may not want to collect this additional data.
This training may sound daunting, but it really isn’t. Training is as simple as putting a test batch of MTRs into the system and telling it which pages are the MTR. Training might require a dozen examples, but not hundreds or thousands. Once the system learns what an MTR is, you can test it on a much larger set of documents to verify that it is classifying them correctly.
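Running several engines until the result is good enough might look roughly like the sketch below. Everything here is an assumption for illustration: each "engine" is modeled as a plain function that takes the page image and returns the recognized text with a confidence between 0 and 1, and the two stub engines stand in for real OCR products.

```python
from typing import Callable, List, Tuple

# Hypothetical engine interface: page bytes in, (text, confidence) out.
OcrEngine = Callable[[bytes], Tuple[str, float]]


def ocr_with_fallback(
    page: bytes, engines: List[OcrEngine], min_confidence: float = 0.95
) -> Tuple[str, float]:
    """Try each engine in order and stop once one is confident enough.

    Returns the best (text, confidence) pair seen, even if no engine
    reached min_confidence.
    """
    best_text, best_confidence = "", 0.0
    for engine in engines:
        text, confidence = engine(page)
        if confidence > best_confidence:
            best_text, best_confidence = text, confidence
        if best_confidence >= min_confidence:
            break
    return best_text, best_confidence


# Stub engines used only to demonstrate the control flow.
def fast_engine(page: bytes) -> Tuple[str, float]:
    return "HEAT NO 4B213", 0.80


def careful_engine(page: bytes) -> Tuple[str, float]:
    return "HEAT NO. 48213", 0.97


text, confidence = ocr_with_fallback(b"<scanned page>", [fast_engine, careful_engine])
```

If no engine reaches the threshold, the best attempt is still returned so that the uncertain parts can be flagged for human review later.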
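As a rough illustration of training from a handful of labeled pages, here is a minimal word-frequency classifier in plain Python. It is a deliberately simple stand-in, not the algorithm any particular product uses, and the labels `"mtr"` and `"other"` and the sample texts are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Build per-label word counts from (page_text, label) pairs."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(tokenize(text))
    return model


def classify(model, text):
    """Pick the label whose training vocabulary best matches the page."""
    words = tokenize(text)

    def score(label):
        total = sum(model[label].values())
        # Sum of relative frequencies; unseen words contribute zero.
        return sum(model[label][word] / total for word in words)

    return max(model, key=score)


training_pages = [
    ("Mill Test Report heat number chemical analysis tensile yield", "mtr"),
    ("Certified mill test report heat no carbon manganese", "mtr"),
    ("Packing list quantity pallets ship date carrier", "other"),
    ("Invoice amount due payment terms remit", "other"),
]
model = train(training_pages)
```

After training, `classify(model, page_text)` can be run over a larger batch of pages to check how often the predicted label matches what a person would say.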
After training has been completed, the fourth step is to build data collection models. Building these models requires an expert who has a deep understanding of what data is on a mill test report, what data needs to be collected, and the different ways information is referenced. For example, Heat #, Heat No., and Heat Number all mean the same thing.
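One small piece of such a model, handling the different ways a heat number is labeled, can be sketched with a regular expression. The pattern below covers only the three label variants mentioned above and assumes the value is a single run of letters and digits; real reports will need more variants and looser value formats.

```python
import re

# Matches "Heat #", "Heat No", "Heat No." and "Heat Number", followed by an
# optional colon or dash and then the value. The lookahead stops "No" from
# matching the start of a longer word such as "Notes".
HEAT_NUMBER = re.compile(
    r"\bheat\s*(?:#|(?:no\.?|number)(?![a-z]))\s*[:\-]?\s*([A-Z0-9]+)",
    re.IGNORECASE,
)


def extract_heat_number(text):
    """Return the heat number found in the text, or None if there is none."""
    match = HEAT_NUMBER.search(text)
    return match.group(1).upper() if match else None
```

For example, `extract_heat_number("Heat No. 48213")` and `extract_heat_number("HEAT NUMBER: 48213")` both return `"48213"`, while a line such as `"Heat treatment: annealed"` returns `None`.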
A data collection model must be built for every important piece of information you need to collect from the MTR. If you build all the models you think you need, and later discover that adding a new one would be useful, you can always build the model and re-process your MTRs to extract just that one new data element.
The fifth step is one of the most important: human data review and correction. Remember I mentioned earlier that there may be some characters not read with 100% accuracy by the OCR engines? Here’s where the system is programmed to flag any word or number it isn’t 100% sure about. In a data review screen, the mill test report will be displayed in a visual format so that what the system “read” can be compared with the actual document.
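The flagging logic itself can be very small. The sketch below assumes the OCR step produced a list of (word, confidence) pairs with confidence between 0 and 1; that data shape and the function name are assumptions made for illustration.

```python
def words_needing_review(ocr_words, threshold=1.0):
    """Return (position, word, confidence) for every word below the threshold."""
    return [
        (position, word, confidence)
        for position, (word, confidence) in enumerate(ocr_words)
        if confidence < threshold
    ]


# Example OCR output: the heat number and the grade were read with some doubt.
ocr_words = [("Heat", 1.0), ("No.", 1.0), ("4B213", 0.62), ("304L", 0.91)]
flagged = words_needing_review(ocr_words)
```

With the default threshold of 1.0, anything short of full confidence is sent to the review screen; lowering the threshold trades reviewer time against the risk of missed errors.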
A critical part of human review is to set up the system to automatically search through known information, like Purchase Order Number, Material Grade, and Order Requirements. By “looking up” this information in your database and comparing it to what was found by the automated processing software, you add an additional layer of quality assurance. So, if your purchase order was for a particular material grade but the MTR lists something different, this will also be flagged for human review.
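A lookup comparison like this could be sketched as follows. The field names (`po_number`, `material_grade`) and the dictionary shapes are hypothetical; in practice the expected values would come from your purchasing or ERP database.

```python
def normalize(value):
    """Ignore case and surrounding whitespace when comparing values."""
    return str(value).strip().upper()


def find_mismatches(extracted, expected):
    """Compare extracted MTR fields with the values on record.

    Returns {field: (extracted_value, expected_value)} for every field that
    is missing from the MTR data or differs from the record.
    """
    mismatches = {}
    for field, expected_value in expected.items():
        found = extracted.get(field)
        if found is None or normalize(found) != normalize(expected_value):
            mismatches[field] = (found, expected_value)
    return mismatches


purchase_order = {"po_number": "PO-1001", "material_grade": "304L"}
mtr_fields = {"po_number": "po-1001 ", "material_grade": "316L"}
problems = find_mismatches(mtr_fields, purchase_order)
```

Here the purchase order number matches after normalization, but the material grade does not, so only that field would be flagged for human review.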
The sixth and final step of the process is to integrate the MTR data and the digital copy of the document(s) with your existing quality software or reporting tool.
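What integration looks like depends entirely on the target system. As a stand-in, the sketch below stores the reviewed fields and a pointer to the scanned document in a SQLite table using Python's built-in `sqlite3` module. The table layout, column names, and file path are invented for the example.

```python
import sqlite3


def save_mtr(conn, record, document_path):
    """Store reviewed MTR fields plus the location of the scanned document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mtr ("
        "heat_number TEXT PRIMARY KEY, "
        "material_grade TEXT, "
        "document_path TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO mtr VALUES (?, ?, ?)",
        (record["heat_number"], record["material_grade"], document_path),
    )
    conn.commit()


def find_by_heat_number(conn, heat_number):
    """Return (material_grade, document_path) for a heat number, or None."""
    return conn.execute(
        "SELECT material_grade, document_path FROM mtr WHERE heat_number = ?",
        (heat_number,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
save_mtr(conn, {"heat_number": "48213", "material_grade": "304L"}, "scans/48213.pdf")
```

Keying the table on the heat number mirrors how the material is traced on the shop floor, so a stamped part can be looked up directly with `find_by_heat_number(conn, "48213")`.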