Automating mill test reporting is a daunting task because the reports contain a diverse set of a given raw material’s chemical and physical properties.
They certify a material and are designed to demonstrate quality and compliance with standards from organizations like ANSI and ASME. Material test report processing is performed by trained professionals, and it is a very time-consuming but necessary process.
What makes automating material test report (MTR) processing difficult?
- The vast number of fields contained on the reports
- The fact that the layout of the fields is different for each manufacturer
- Poor image quality because they’ve been printed, hand-signed, and scanned
- They often contain multiple languages since international distribution of these commodities is so common
Here’s a sample MTR for a steel plate from the Paul Mueller Company, a stainless steel manufacturer:
As you can see, there are a lot of fields of data!
- Data and general product info
- Product description
- Physical properties
- Chemical analysis
- Mechanical properties
The crucial thing to note is that once a raw product is shipped to a manufacturer, the manufacturer is responsible for tracking the information on the test report. In the steel industry, for example, this is done through a heat code, or heat number, stamped or written on the metal itself.
MTRs Are Vital for Quality Assurance and Compliance
MTRs are so important that ASME certification requires that they be kept on file for up to three years after a product has been used to create something. Clearly, this critical data should be kept in a database for tracking and reporting.
This is where automating the ingestion of MTR data is a huge time-saver for companies that use and process a large number of raw products from many different vendors. To see why automation has been so difficult, perform a Google search for mill test report examples and you’ll see that there’s no standardization for how the reports are structured.
6 Steps to Automate Mill Test Report Processing
- Digitization / image processing
- Optical character recognition
- Machine learning / training
- Build data models
- Human data review
- Data integration
Some MTRs are emailed in PDF format as part of the shipment notification for ordered products. However, in many cases only physical copies are available, so they must be scanned first.
Step 1: Digitization / Image Processing
The first step is taking the digital copy of the test report and using image processing software to remove all non-text elements.
This means digitally removing all the lines and non-text objects. Humans need this kind of document structure to gain an understanding of the data, but for software to “read” the MTR, these lines and objects only get in the way. And if the MTR has been faxed and dragged around a dirty shop floor, and then scanned – it’s going to need some digital cleanup!
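To make the idea of line removal concrete, here’s a minimal sketch in plain Python. It assumes the scanned page has already been converted to a black-and-white grid (1 for ink, 0 for background), and it treats any long, straight run of ink as a table rule rather than text. The function name `remove_long_runs` and the `min_run` threshold are made up for illustration; a real pipeline would more likely rely on an image processing library and would need tuning for scan resolution.

```python
def remove_long_runs(image, min_run=8):
    """Erase horizontal and vertical ink runs of at least min_run pixels.

    image is a list of rows; 1 means ink, 0 means background.
    Long straight runs are assumed to be ruling lines, not text strokes.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    cleaned = [row[:] for row in image]  # work on a copy

    def erase_runs(coords):
        run = []
        # A trailing None acts as a sentinel so the last run is flushed too.
        for cell in coords + [None]:
            if cell is not None and image[cell[0]][cell[1]]:
                run.append(cell)
                continue
            if len(run) >= min_run:
                for r, c in run:
                    cleaned[r][c] = 0
            run = []

    for r in range(height):  # scan every row for horizontal lines
        erase_runs([(r, c) for c in range(width)])
    for c in range(width):   # scan every column for vertical lines
        erase_runs([(r, c) for r in range(height)])
    return cleaned


# Tiny made-up page: one horizontal rule above a few short text strokes.
page = [
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 1, 0, 0],
]
cleaned = remove_long_runs(page, min_run=8)
```

In this example the top rule is erased while the short strokes underneath are left alone, which is the behavior you want before handing the page to OCR.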
Step 2: Optical Character Recognition
Next, perform optical character recognition (OCR) on the MTR to recognize all languages and text characters on the page. Modern OCR tools are advanced enough to run multiple OCR “engines” on the document until a desired level of accuracy is achieved. Even if 100% accuracy isn’t obtained, the remaining errors will be corrected in later steps.
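The “run engines until accuracy is good enough” idea can be sketched in a few lines. This is only an illustration: the two engines below are hypothetical stand-ins that return hard-coded text and a confidence score, the heat number is made up, and `ocr_with_fallback` and `target_confidence` are names invented for this example. Real engines report confidence in their own ways.

```python
def ocr_with_fallback(page_image, engines, target_confidence=0.95):
    """Try each engine in order; stop once one is confident enough.

    Each engine is a callable taking the image and returning
    (text, confidence), with confidence between 0.0 and 1.0.
    Returns the best result seen so far.
    """
    best_text, best_confidence = "", 0.0
    for engine in engines:
        text, confidence = engine(page_image)
        if confidence > best_confidence:
            best_text, best_confidence = text, confidence
        if best_confidence >= target_confidence:
            break  # good enough, skip the remaining engines
    return best_text, best_confidence


# Hypothetical stand-ins for real OCR engines.
def fast_engine(image):
    return ("Heat N0. 48213", 0.81)   # misreads the letter "o" as a zero


def careful_engine(image):
    return ("Heat No. 48213", 0.97)


text, confidence = ocr_with_fallback(b"", [fast_engine, careful_engine])
```

Ordering the engines from fastest to slowest means most pages never reach the expensive engine, and anything still below the target is left for human review later.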
Step 3: Machine Learning / Training
The third step is to use a supervised machine learning algorithm to train software to recognize that a document is a mill test report. This is important in case additional documents are attached to the MTR. You may or may not want to collect this additional data.
This training may sound daunting, but it really isn’t. Training is as simple as putting a test batch of MTRs into the system and telling it which pages are the MTR. Training might require a dozen examples, but not hundreds or thousands. Once the system learns what an MTR is, you can test it on a much larger set of documents to verify it is correctly classifying the documents.
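To show how little is needed to get started, here’s a minimal sketch of training on a handful of labeled pages. The article doesn’t say which algorithm a given product uses, so this example swaps in a tiny naive Bayes text classifier as one possible choice. The training texts, the labels "mtr" and "other", and the function names are all made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """examples: list of (page_text, label) pairs."""
    word_counts, label_counts = {}, Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(model, text):
    word_counts, label_counts, vocabulary = model
    total_pages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_pages)
        for word in tokenize(text):
            if word in vocabulary:
                # Add-one smoothing so unseen words don't zero out a label.
                score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)


# Made-up training pages; a real batch would be a dozen or so scanned MTRs.
examples = [
    ("Mill Test Report heat number chemical analysis tensile yield", "mtr"),
    ("Certified material test report heat no carbon manganese elongation", "mtr"),
    ("Packing list quantity pallets ship to carrier weight", "other"),
    ("Invoice amount due payment terms remit to", "other"),
]
model = train(examples)
label = classify(model, "Material test report: heat number, chemical analysis, yield strength")
other_label = classify(model, "Packing list with carrier and pallets")
```

After training, you would run `classify` over a much larger set of pages and spot-check the results, just as described above.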
Step 4: Build Data Models
After training has been completed, the next step is to build data collection models. Building these models requires an expert who has a deep understanding of what data is on a mill test report, what data needs to be collected, and the different ways information is referenced. For example, Heat #, Heat No., and Heat Number all mean the same thing.
A data collection model must be built for every important piece of information you need to collect from the MTR. If you build all the models you think you need, and later discover that adding a new one would be useful, you can always build the model and re-process your MTRs to extract just that one new data element.
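As a rough illustration, a data collection model for the heat number could be as simple as a pattern that accepts all three label variants. This is a sketch under assumptions: the field names, the regular expressions, and the sample values are invented for this example, and real MTR layouts will need more forgiving patterns.

```python
import re

# One pattern per field. Each accepts "#", "No", "No." or "Number" after the label.
FIELD_PATTERNS = {
    "heat_number": re.compile(
        r"heat\s*(?:#|no\.?|number)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field_name: value} for every pattern that matches the text."""
    found = {}
    for name, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            found[name] = match.group(1)
    return found


samples = ["Heat # A7731", "HEAT NO. 48213", "Heat Number: 48213-B"]
results = [extract_fields(sample) for sample in samples]

# Adding a model later: re-run extraction with only the new pattern.
po_only = {
    "po_number": re.compile(
        r"p\.?o\.?\s*(?:#|no\.?|number)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
}
po_result = extract_fields("PO No. 77120  Heat # A7731", po_only)
```

Keeping each field as its own entry is what makes re-processing cheap: you can pass just the new pattern and leave everything already collected untouched.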
Step 5: Human Data Review
This step is one of the most important: human data review and correction. Remember I mentioned earlier that there may be some characters not read with 100% accuracy by the OCR engines? Here’s where the system is programmed to flag any word or number it isn’t 100% sure about.
In a data review screen, the test report will be displayed in a visual format so that what the system “read” can be compared with the actual document.
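Here’s a minimal sketch of the flagging rule, assuming the OCR step hands back each word together with a confidence score between 0.0 and 1.0. The function name, the data shape, and the sample values are assumptions made for this example.

```python
def flag_for_review(words, threshold=1.0):
    """Return (position, text, confidence) for every word below threshold.

    words is a list of (text, confidence) pairs in reading order.
    """
    return [
        (position, text, confidence)
        for position, (text, confidence) in enumerate(words)
        if confidence < threshold
    ]


# Made-up OCR output: the last word contains a lowercase "l" misread for "1".
words = [("Heat", 1.0), ("No.", 1.0), ("482l3", 0.62)]
flagged = flag_for_review(words)
```

The position of each flagged word is kept so a review screen can highlight the matching spot on the scanned page.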
A critical part of human review is to set up the system to automatically search through known information, like Purchase Order Number, Material Grade, and Order Requirements.
By “looking up” this information in your database and comparing it to what was found by the automated processing software, you add an additional layer of quality assurance. So, if your purchase order was for a particular material grade but the MTR lists something different, this will also be flagged for human review.
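The lookup check can be sketched as a simple comparison between what was extracted and what your records say was ordered. The field names and values below are hypothetical, and in practice the order record would come from your purchasing database rather than a hard-coded dictionary.

```python
def compare_with_order(extracted, order_record):
    """List every field where the MTR disagrees with the order record."""
    mismatches = []
    for field, expected in order_record.items():
        found = extracted.get(field)
        # Ignore case and surrounding whitespace; treat a missing value as a mismatch.
        if found is None or found.strip().upper() != expected.strip().upper():
            mismatches.append({"field": field, "expected": expected, "found": found})
    return mismatches


# Made-up example: the order was for grade 304L, but the MTR says 316L.
order_record = {"purchase_order_number": "77120", "material_grade": "304L"}
extracted = {
    "purchase_order_number": "77120",
    "material_grade": "316L",
    "heat_number": "A7731",
}
mismatches = compare_with_order(extracted, order_record)
```

Anything returned here would be routed to the same review screen as the low-confidence OCR words.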
Step 6: Data Integration
The final step of the process is to integrate the MTR data and the digital copy of the document(s) with your existing quality software or reporting tool.
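What the integration looks like depends entirely on your quality software, so treat the following only as a sketch. It uses an in-memory SQLite database as a stand-in for the real target system, and the table name, column names, file path, and values are all invented for this example.

```python
import sqlite3


def save_mtr(conn, heat_number, fields, document_path):
    """Store each extracted field next to a pointer to the scanned document."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mtr_fields "
        "(heat_number TEXT, field TEXT, value TEXT, document_path TEXT)"
    )
    conn.executemany(
        "INSERT INTO mtr_fields VALUES (?, ?, ?, ?)",
        [(heat_number, name, value, document_path) for name, value in fields.items()],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for your quality system
save_mtr(
    conn,
    heat_number="A7731",
    fields={"material_grade": "304L", "purchase_order_number": "77120"},
    document_path="scans/A7731.pdf",
)
rows = conn.execute(
    "SELECT field, value FROM mtr_fields WHERE heat_number = ? ORDER BY field",
    ("A7731",),
).fetchall()
```

Keying the rows on the heat number mirrors how the material is tracked on the shop floor, so a stamped heat code leads straight back to its test data and the original document.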