Integrating data from pipeline documents is a challenging task. With deadlines approaching and steep fines for non-compliance, operators are turning to artificial intelligence for help.
Why is A.I. necessary?
There are over 300,000 miles of natural gas pipelines in the U.S., and a significant portion of them were constructed before the safety regulations mandated in the 1970s. Much, if not all, of the information related to these pipelines is on paper.
The alternative to processing paperwork with A.I. would be manually entering the required pipeline attributes. With more than 27 fields of data spread across numerous construction reports, this would be a tedious chore. But more than that, if you've just acquired a pipeline, you don't have the luxury of time to get all the information entered.
Because integrity management reporting must include all available information about the entire pipeline, many different types of records in as many different formats must be processed.
A.I.-based document processing is the only way to quickly and accurately process and integrate the diversity of pipeline information.
How A.I. Helps with Pipeline Integrity Reporting
Integrity management is all about increasing safety with pipeline data. For operators with historical pipeline documentation that hasn't been integrated into management systems, A.I. plays a crucial role in automating the process through several important steps:
Making Text Machine-Readable
Most pipeline documents have been around for a long time, so they have their fair share of damage and blemishes.
To enable accurate optical character recognition (OCR), scanned documents must first go through image processing. One of the major obstacles to OCR is non-text artifacts such as speckles, hole punches, lines, pictures, warped document images, and even handwriting.
Intelligent software identifies and removes everything that isn't text before OCR, which markedly improves the accuracy of text recognition.
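To make the cleanup step concrete, here is a minimal Python sketch of one small part of it: erasing speckles from a black-and-white scan. It assumes the page has already been converted to a grid of 1s (ink) and 0s (background). The function name `remove_speckles` and the `min_size` cutoff are illustrative choices, not taken from any particular product, and real systems also deal with lines, hole punches, and warping.

```python
from collections import deque

def remove_speckles(image, min_size=3):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    connected ink regions smaller than min_size pixels are erased."""
    rows, cols = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or seen[r][c]:
                continue
            # Flood-fill (4-connected) to collect one region of ink pixels.
            region, queue = [], deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Tiny regions are treated as speckles and erased.
            if len(region) < min_size:
                for y, x in region:
                    cleaned[y][x] = 0
    return cleaned
```

The cutoff is a trade-off: set it too high and small punctuation marks such as periods and decimal points can be erased along with the noise.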
Machine reading is a bit different from OCR. Traditional OCR "engines" analyze the pixels on a scanned page and decide which character each shape represents. This works OK, but accuracy can easily be 50% or worse. It's difficult for OCR to know whether a zero should be the letter 'O' or whether the number one is really the letter 'l.' There are seemingly endless examples of this.
Pipeline data is a matter of human safety, so accuracy is vitally important.
Machine reading differs in that it uses intelligence, outside data sources (called lexicons), and multiple OCR engines to consistently extract data with over 90% accuracy.
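One way to picture how a lexicon and several OCR engines can work together is the Python sketch below. It is an illustration under stated assumptions, not a description of how any specific machine reading product works: the `CONFUSABLE` table, the function names, and the sample lexicon are all hypothetical. Each engine's reading is expanded into look-alike spellings (zero versus 'O', one versus 'l'), and the lexicon term supported by the most engines wins.

```python
from collections import Counter
from itertools import product

# Hypothetical table of characters that OCR engines commonly mix up.
CONFUSABLE = {
    "0": "0O", "O": "O0",
    "1": "1lI", "l": "l1I", "I": "I1l",
    "5": "5S", "S": "S5",
}

def variants(token, limit=256):
    """List spellings of token with look-alike characters swapped."""
    choices = [CONFUSABLE.get(ch, ch) for ch in token]
    out = []
    for combo in product(*choices):
        out.append("".join(combo))
        if len(out) >= limit:  # keep long tokens from exploding
            break
    return out

def read_token(engine_outputs, lexicon):
    """Pick the lexicon term backed by the most engines; if no reading
    matches the lexicon, fall back to a plain majority vote."""
    votes = Counter()
    for text in engine_outputs:
        # Each engine votes once for every lexicon term it could mean.
        for match in {v for v in variants(text) if v in lexicon}:
            votes[match] += 1
    if votes:
        return votes.most_common(1)[0][0]
    return Counter(engine_outputs).most_common(1)[0][0]
```

For example, if three engines read a pipe grade as "X5Z", "XS2", and "X52" and the lexicon contains "X52", two of the three readings resolve to "X52" and that value is returned.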
Machine reading engines break up text into logical groups. For construction documents and blueprints where text is grouped into boxes or tables of information, machine reading will identify these groups and process them individually. This is a major advancement that helps create machine understanding of what the group of data represents.
After accurate data is extracted from documents, it needs to be labeled. Simply turning a document of information into alphabet soup isn't very helpful.
Intelligent software looks at both the field labels on the document and the actual information itself to make judgment calls on what the data represents. For example, you'll need to know the difference between pipe diameter and wall thickness, or between maximum operating pressure and temperature. Because A.I. isn't perfect, accuracy thresholds are created to flag questionable results for human review.
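The threshold idea can be sketched in a few lines of Python. This assumes each extracted value comes with a confidence score between 0 and 1; the `ExtractedField` class, the `triage` function, and the 0.90 default are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str          # label assigned to the value, e.g. "wall thickness"
    value: str         # text read from the document
    confidence: float  # assumed score from 0.0 (no trust) to 1.0 (certain)

def triage(fields, threshold=0.90):
    """Split fields into those accepted automatically and those that
    fall below the threshold and should go to a human reviewer."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review
```

In practice the threshold would likely differ by field, since a wrong maximum operating pressure carries more risk than a misspelled contractor name.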
In some cases, field labels on a document are either not available, or not very useful. In these cases, A.I. offers unique approaches like machine learning to label data based on surrounding data, or lexicons of information to compare data against.
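As a rough illustration of the lexicon approach, the Python sketch below labels a value by checking which list of known terms it belongs to. The `LEXICONS` contents and the `label_value` function are assumptions made for this example; a production lexicon would be far larger, and a machine learning model would also weigh the surrounding data.

```python
# Hypothetical lexicons: each label maps to terms known to belong to it.
LEXICONS = {
    "pipe_grade": {"X42", "X52", "X60", "X65", "X70"},
    "seam_type": {"ERW", "DSAW", "SEAMLESS"},
}

def label_value(raw):
    """Return the label whose lexicon contains the value, or None when
    no lexicon recognizes it."""
    token = raw.strip().upper()
    for label, terms in LEXICONS.items():
        if token in terms:
            return label
    return None
```

A value that matches no lexicon returns None, which is a natural signal to route it to human review rather than guess.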
Operators who trust A.I. need visibility into how results are obtained. Because intelligent software isn't perfect, it should offer a visual representation that explains results. This ensures the system can be adjusted for greater accuracy. Software that processes data without A.I. (or without transparency) is not robust enough to handle the diverse needs of the digital oil field.
Transparency also includes providing a digital copy of the document that shows what pipeline data was extracted. This helps in cases where human review is necessary to verify a result.
Not every operator works from the same information systems or has the same analytics tools. Data integration is more than pushing data to a CSV file. Look for a tool that provides not just a data output file, but the ability to integrate according to existing master data models. This kind of tight integration attaches greater meaning to data. Intelligent software that integrates data into virtually any pipeline data management application or reporting application (even if it's an Excel spreadsheet) will help you maximize the use of data.
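To show what integrating against a master data model can mean in practice, here is a small Python sketch. The column names, the `FIELD_MAP` table, and the `to_master_record` function are hypothetical; an operator's actual data model would define its own columns, units, and types.

```python
# Hypothetical mapping from labels found on documents to the columns
# and value types of an operator's master data model.
FIELD_MAP = {
    "pipe diameter": ("outside_diameter_in", float),
    "wall thickness": ("wall_thickness_in", float),
    "pipe grade": ("grade", str),
}

def to_master_record(extracted):
    """Translate extracted label/value pairs into master-model columns.
    Labels with no mapping are returned separately instead of dropped."""
    record, unmapped = {}, {}
    for label, value in extracted.items():
        key = label.strip().lower()
        if key in FIELD_MAP:
            column, convert = FIELD_MAP[key]
            # convert raises ValueError on values it cannot parse.
            record[column] = convert(value)
        else:
            unmapped[label] = value
    return record, unmapped
```

Unlike a flat CSV export, a mapping like this gives each value a defined column and type in the target system, which is what lets downstream reporting tools use it directly.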
Pipeline regulations don't have to cut into productivity or profitability. And since time isn't on your side, choose an intelligent software platform to automate pipeline records processing.