Our current Federal Administration has set a course for achieving transformational change through modernizing electronic government. This paper outlines how to become compliant.
What's in this Guide
Background and outlook - why government agencies are moving to electronic records. There are two important presidential mandates addressing the requirement for change.
Challenges to overcome - why paper is a problem. Paper records cost money, hurt cross-agency collaboration, and add unnecessary risk. But moving to digital records has its own challenges.
Solution and opportunity overview - what moving to electronic records looks like, and the impact to both agencies and the public.
Deadlines - there are upcoming deadlines, starting December 2019. I'll guide you through what's required.
Required metadata - metadata is required on all electronic records. There are 15 required fields, and I describe each one.
How to get started - tips on how to approach the transition to electronic records, the skills you need, and how to bring key personnel together to ensure success.
Technology overview - what's involved in moving to electronic records. I'll give an overview of everything from microform scanners and software choices to image processing, OCR, and storage options.
Vendor selection - how to find a good fit with potential vendors, and the importance of an accurate requirements list.
Additional resources - I've prepared a detailed list of websites to review for recommended products, software, vendors, and professional organizations.
Background and Outlook
In March of 2018, the President issued a President’s Management Agenda[i] (PMA) establishing a sweeping vision to focus on improving the following six key areas: our country’s mission, customer service, stewardship, information technology, data transparency, and the workforce of the 21st century.
Three months later, in June 2018, a proposal was made to transition federal agency business processes and recordkeeping to a fully electronic environment, and to end the National Archives and Records Administration’s (NARA) acceptance of paper records by December 31, 2022[ii].
This will improve agency efficiency, effectiveness, and responsiveness to citizens by converting paper-based processes to electronic workflows, expanding online services, and enhancing management of government records, data, and information.
The Federal Government spends hundreds of millions of taxpayer dollars and thousands of hours creating, using, searching, and storing paper records. Relying on storage facilities, and on staff to handle electronic requests for paper records, forces citizens to conduct business in a system where records can only be retrieved manually, and at great expense.
Cost-effective solutions must be considered by every federal agency to transition to an electronic environment in support of this new reform plan.
All federal agencies are directed to comply with all records management laws and regulations. More specifically, agencies must:
- Ensure that all federal records are created, retained, and managed in electronic formats with appropriate metadata; and
- Consistent with records management laws and regulations, develop plans to close agency-operated storage facilities for paper and other analog records, and transfer those records to Federal Records Centers operated by NARA or commercial storage facilities.
Federal agencies spend billions of dollars supporting paper-based records practices. Far too many government services still depend on paper. This use of paper causes increased costs for managing the movement, processing, and storage of huge volumes of paper records. Efforts to solve this problem have been inconsistent and ineffective agency-wide.
Antiquated, insecure technology presents real risks for poorly digitized documents. And with outdated duties being performed using outdated skillsets, it’s reasonable to conclude that the countless hours spent documenting and recording information are at risk.
Silos of information and recordkeeping hurt cross-agency collaboration. The resulting fragmentation of citizen services creates unnecessary delays and excessive costs related to the delivery of basic services.
Another challenge is navigating the vendors, software, and technology available in today’s marketplace. With intelligent software solutions on the rise alongside proven legacy solutions, the choice isn’t an easy one.
Federal agencies must adopt a comprehensive, lifecycle approach[iii] to records management. This includes stopping all paper processes to every extent possible and properly digitizing existing paper records.
The Electronic Records Mandate assigns oversight of records management to NARA. This safeguards the authenticity of archival records, ensures compliance, and maintains record-keeping practices. Furthermore, the addition of strict metadata requirements ensures the plan for master data management stays intact.
This push for electronic records also creates a more nimble and effective approach to the technologies and skillsets needed to ensure the federal workforce meets current and future needs.
Ultimately, adopting electronic records management will improve citizen services by preserving public access to records, and creating cost-saving efficiencies and processes. It will become easier for the public to connect with the federal government to apply for and receive needed services.
By digitizing records today, agencies also win. Decreasing records storage space opens up more real estate for workers and an opportunity to cut costs. Additionally, streamlining workflows which previously required accessing or creating paper records reduces stress and limits costly errors.
The Deadline for the Acceptance of Paper Records
December 31, 2022, marks the end of NARA’s acceptance of paper records. This forces agencies to direct attention and resources to maintaining compliance. While many agencies, such as USCIS and the National Records Center, have centralized millions of paper records into a single facility, there is still a demand for digitizing these records with the required metadata that enables search, retrieval, and integration with software applications.
By December 2019, Federal agencies will manage all permanent electronic records in an electronic format.
By December 2022, Federal agencies will manage all permanent records in an electronic format and with appropriate metadata.
By December 2022, Federal agencies will manage all temporary records in an electronic format or store them in commercial records storage facilities.
Beginning January 2023, all legal transfers of permanent records must be in electronic format, to the fullest extent possible, regardless of whether the records were originally created in electronic formats. All analog records must be digitized before transfer to NARA.
What is the Required Metadata?
Federal agencies are responsible for managing electronic records in accordance with NARA statutes and regulations[iv], and the Federal Records Act[v]. Metadata are the elements of information that answer the critical questions regarding a record: essentially, the who, what, where, when, and why.
Metadata provide administrative, descriptive, and technical information regarding the structure and content of the record. Additionally, metadata elements provide contextual information that explains how the records were created, used, managed, and maintained, and how they relate to other records[vi]. There are currently 15 required metadata fields[vii].
Required Metadata Terms and Descriptions
The following is the minimal list of metadata terms necessary for describing permanent electronic records.
File Name - The complete name of the computer file including its extension (if present).
Record ID - The unique identifier assigned by an agency or a records management system.
Title - A name given to the record. Typically, a Title will be a name by which the record is formally known.
Description - A narrative description of the content of the record, including abstracts for document-like objects or content descriptions for audio or video records.
Creator - The agent primarily responsible for the creation of the record. Examples of a Creator include a person, an organization, or a service. Typically, the name of a Creator should be used to indicate the entity.
Creation Date - The date that the file met the definition of a Federal record. If a file (such as a case file, database, or spreadsheet) holds multiple records created at different times, then this element should note the date the file was originally created, and the span of dates should be recorded in the element Coverage. Date may be used to express temporal information at any level of granularity. Best practice is to use an encoding scheme, such as the W3CDTF profile of ISO 8601[viii].
Security Classification - The classification allocated to the record indicating its official security status. The purpose of this qualifier is to facilitate proper and appropriate management of sensitive or security classified records.
Previous Security Classification - The classification allocated to the record indicating its official security status prior to its current status. Many official documents have their security classification reduced over time. The ability to search on current and previous markings allows a user to locate records that have changed their classification.
Access Rights - Legal or other rights an individual has to access the record or that regulate the agency's right to release or provide access to the record. This repeatable field contains information documenting any restrictions related to information access that apply to the record. These are described in the General Restrictions section at 36 CFR 1235.20, and listed at 1256.40 through 1256.62[ix].
Usage Rights - Information about copyright and trademark rights held in and over the record. Typically, rights information includes a statement about various property rights associated with the record, including intellectual property rights. Agencies should indicate any licenses obtained in the creation of the record.
Rights Holder - A person or organization owning or managing intellectual property rights relating to the record.
Coverage (Spatial) - The geographic extent or scope of the content of the record. Spatial topic and spatial applicability may be a named place or a location specified by its geographic coordinates. A jurisdiction may be a named administrative entity or a geographic place to which the record applies. Where appropriate, named places can be used in preference to numeric identifiers such as sets of coordinates.
Coverage (Temporal) - The temporal extent or scope of the content of the record. Temporal topic is a date or date range.
Relation (Has Part) - A related record that is either physically or logically required in order to form a complete record. Best practice is to identify the related record by means of a string conforming to a formal identification system.
Relation (Is Part Of) - A related record in which the described record is physically or logically included. Recommended best practice is to identify the related record by means of a string conforming to a formal identification system.
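To make the list above more concrete, here is a minimal Python sketch of how a record's metadata might be checked before it is stored. It is only an illustration: the snake_case keys are my own labels for the fifteen elements described above, treating every element as mandatory on every record is an assumption, and the date pattern checks only the shape of a W3CDTF value, not whether the calendar date exists.

```python
import re

# Illustrative labels for the fifteen elements described above (assumed names).
FIELDS = (
    "file_name", "record_id", "title", "description", "creator",
    "creation_date", "security_classification",
    "previous_security_classification", "access_rights", "usage_rights",
    "rights_holder", "coverage_spatial", "coverage_temporal",
    "has_part", "is_part_of",
)

# Shape of a W3CDTF value: YYYY, YYYY-MM, YYYY-MM-DD, or a full date
# followed by a time and a time zone designator (Z or +hh:mm / -hh:mm).
W3CDTF = re.compile(
    r"^\d{4}"
    r"(-(0[1-9]|1[0-2])"
    r"(-(0[1-9]|[12]\d|3[01])"
    r"(T([01]\d|2[0-3]):[0-5]\d(:[0-5]\d(\.\d+)?)?"
    r"(Z|[+-]([01]\d|2[0-3]):[0-5]\d))?"
    r")?)?$"
)


def check_record(record):
    """Return a list of problems found; an empty list means none were found."""
    problems = [f"missing: {name}" for name in FIELDS if not record.get(name)]
    date = record.get("creation_date")
    if date and not W3CDTF.match(date):
        problems.append(f"creation_date is not W3CDTF: {date!r}")
    return problems
```

A check like this could run at ingest time, so that records with gaps are routed to a person for review rather than silently stored.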
Where to Start?
Begin exploring electronic records requirements, and related problems / concerns now. Don’t wait for a vendor to tell you they have the all-encompassing solution! Successful agency-wide transition requires a great deal of knowledge, judgement, and wisdom on your part. It’s most likely that a combination of technologies and strategies will best support your agency and processes.
Learn the basics of how automated records systems are created and how they function. When you have a more detailed understanding of data and information management principles and techniques, you’ll better understand how to extract the required metadata and integrate more accurate information.
Your goal is to learn to speak the language of data and be able to perform basic records data modeling. It sounds complicated, but experienced vendors have the expertise to guide you in what you need to learn.
Identify key data and information management players within your agency. Who manages your central databases? Who’s involved with solving data and information management issues? Who is responsible for risk assessment, management, and audits? Begin forming partnerships with these individuals in order to connect the dots between these functions. They may not all coordinate today, and your success will depend on a program that leverages the expertise of each of these areas.
Key Technologies for Success
Successful electronic records conversion and management requires automating the collection of metadata. This is best achieved with the following core technologies:
High-Quality, High-Speed Scanners for Paper and Microform
There’s nothing better than having a high-speed scanner that produces excellent digital copies of records.
When owning a scanning center doesn’t make sense for your agency, look for vendors who will not only create digital images of your records, but also perform the metadata extraction and data integration with your systems. Whether you have microfilm, microfiche, or books of physical records, conversion centers with intelligent processing technology are becoming more and more cost-effective compared to in-house scanning.
Image processing enhances images displayed to users throughout record workflows, removes image artifacts known to interfere with optical character recognition (OCR), provides crisp record versions for permanent archival, and analyzes page structure to assist with automations in downstream processes. Image processing applies to all forms of physical records, not just paper.
Image processing includes features like:
Dithering and other halftone patterns are a direct result of legacy document imaging platforms poorly converting color images to black and white. These artifacts must be eliminated to prevent massive errors in OCR results, particularly with punctuation like periods and commas.
Borders have historically been very tricky to remove when the black region doesn’t extend all the way to the edge of a page. These need to be recognized and cleanly removed.
Images often contain artifacts that make the record difficult to view. Image processing temporarily removes non-text items such as specks, pictures, and hole punches from a page to facilitate better OCR.
Line Detection and Removal
Lines are used all throughout standardized forms, table structures, and pages with “fill-in-the-blank” comb boxes to provide visual cues that increase legibility for readers. These lines, particularly the short, vertical ones, are commonly picked up by OCR engines as characters. Line detection features erase these lines.
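As a rough illustration of the line detection idea, the Python sketch below erases long straight runs of dark pixels from a tiny black-and-white image. It is a simplified sketch, not how any particular product works: the image is assumed to be a list of rows containing 0 (white) and 1 (ink), the `remove_lines` name and `min_len` threshold are my own, and real tools also repair characters that a line passes through.

```python
def remove_lines(image, min_len):
    """Return a copy of a binary image with long straight runs of ink erased.

    image   -- list of rows, each a list of 0 (white) or 1 (ink)
    min_len -- runs at least this many pixels long are treated as lines
    """
    height = len(image)
    width = len(image[0]) if image else 0
    erase = [[False] * width for _ in range(height)]

    # Mark horizontal runs of ink that are long enough to be a line.
    for y in range(height):
        x = 0
        while x < width:
            if image[y][x]:
                start = x
                while x < width and image[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark vertical runs the same way, column by column.
    for x in range(width):
        y = 0
        while y < height:
            if image[y][x]:
                start = y
                while y < height and image[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the cleaned copy; the input image is left untouched.
    return [
        [0 if erase[y][x] else image[y][x] for x in range(width)]
        for y in range(height)
    ]
```

Choosing `min_len` is the judgment call: too small and letter strokes are erased, too large and short comb-box dividers survive to confuse the OCR engine.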
Optical Character Recognition
Optical character recognition (OCR) is the electronic conversion of typed or printed text into machine-readable text. If you are familiar with OCR, you may think converting paper to digital text is obvious and does not need to be listed. Although OCR technology has been around for a very long time, recent advances make acquiring accurate metadata much easier.
Great OCR starts by removing everything on a page that isn’t text. This is done through image processing. Humans easily recognize characters on a page, and even words that are misspelled aren’t a problem. But when machines read records, the margin for error is much smaller.
For the most accurate OCR, look for these features:
This is a technique designed to capture text that the OCR engines miss the first time around. By digitally dropping out portions of the page where text is easily recognized, OCR can be run again with fewer distractions on the page. This makes subsequent OCR passes more accurate.
When text is printed in columns on a page, or when font sizes differ between columns, or the text is offset between columns, this presents a unique challenge for OCR engines. By identifying cells, or segments of data, and processing them independently of the other cells, cellular validation ensures high accuracy.
Layered OCR is similar to iterative OCR, except that it uses multiple OCR engines. This is to overcome the problem of multiple font types. It’s easy for humans to read different fonts, but software needs help in this area. The layered OCR technique drops out accurate text, and intelligently chooses another type of OCR engine to read the remaining text. Multiple OCR engines can be cycled through, including those that read handwriting.
OCR technology that uses tools like K-Means Clustering, text removal, and text correction provides extremely accurate OCR, which is necessary for automating metadata extraction. Spelling errors occur for many reasons, so planning to automate the correction saves time.
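The drop-out idea behind iterative and layered OCR can be sketched in a few lines of Python. This is a conceptual sketch under my own assumptions, not a vendor's implementation: an "engine" is any function that takes a page region and returns text plus a confidence score between 0 and 1, the 0.9 threshold is arbitrary, and the engines in the usage note are fakes that look answers up in a dictionary.

```python
def layered_ocr(regions, engines, threshold=0.9):
    """Read regions with a sequence of engines, dropping out confident results.

    regions   -- list of region identifiers (stand-ins for parts of a page)
    engines   -- list of callables: region -> (text, confidence in 0..1)
    threshold -- minimum confidence needed to accept an engine's reading
    Returns (accepted, remaining): accepted maps region -> text, and
    remaining lists the regions no engine could read confidently.
    """
    accepted = {}
    remaining = list(regions)
    for engine in engines:
        still_unread = []
        for region in remaining:
            text, confidence = engine(region)
            if confidence >= threshold:
                accepted[region] = text  # drop this region out of later passes
            else:
                still_unread.append(region)
        remaining = still_unread
        if not remaining:
            break
    return accepted, remaining


def make_fake_engine(results):
    """Build a pretend engine from a dict of region -> (text, confidence)."""
    def engine(region):
        return results.get(region, ("", 0.0))
    return engine
```

For example, a pretend print engine built with `make_fake_engine({"header": ("Application Form", 0.98), "signature": ("J?hn", 0.40)})` would settle the header, leaving the signature for a second, handwriting-oriented engine. Anything still unread after every engine has run is a natural candidate for human review.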
Document Management System
If you don’t already have a document management system, this will be the place to store your electronic records. In order to ensure maximum accessibility to electronic records, choose a document management system which provides both integration with existing (or future) software applications and capabilities for managing required metadata.
A search for 'enterprise content management systems' will reveal top technologies available today (I've also listed some in the links below). More advanced systems include workflow management, collaboration tools, e-signature capability, email integration, and more advanced records searching.
Solutions exist for both cloud and on-premises electronic records storage, and there are solid arguments for choosing either option. For most agencies, a blend of cloud and on-premises storage is likely the best option. The key is to consider scalability and long-term costs.
Classification is critical for extracting accurate metadata. Classification is a method of organizing and grouping records using machine learning models that are trained to recognize different records types. This may sound complicated, but today’s technology makes training machines (software) easy, and doesn’t require a data scientist.
Classification is performed using many different methods such as natural language processing, text-based rules, and by the visual layout of records. Now, even photographs and other media types can be classified.
Automating classification is absolutely critical. Without an understanding of what type of document the record is, data extraction software won’t know what information to expect on the record in order to create accurate metadata.
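Of the classification methods mentioned above, text-based rules are the simplest to picture. The Python sketch below scores a record's text against keyword lists and picks the best match. Everything in it is hypothetical: the record types, the keywords, and the two-keyword minimum are invented for illustration, and a production system would more likely rely on trained models.

```python
import re

# Hypothetical record types and keywords, for illustration only.
RULES = {
    "invoice": {"invoice", "amount", "due", "payment"},
    "personnel_action": {"employee", "appointment", "grade", "position"},
    "correspondence": {"dear", "sincerely", "regards"},
}


def classify(text, rules=RULES, min_hits=2):
    """Return the record type whose keywords best match the text.

    Falls back to "unclassified" when no type reaches min_hits keywords,
    so uncertain records can be routed to a person instead of guessed.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    scores = {label: len(words & keywords) for label, keywords in rules.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else "unclassified"
```

Once a record has a type, the extraction step knows which fields to look for, which is why classification comes before metadata extraction in the workflow.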
After all records information has been processed, it needs to be added back to the record as metadata, and the data in the record needs to be integrated into a document management system for easy retrieval and archival with NARA. Data integration systems will migrate records between systems, generate the text files needed to search and discover the record, and use external databases to verify information found on the record.
Additionally, data integration tools create human-readable records. This includes creating PDF versions of records which are intended to be viewed on-screen or printed, if needed. The previously mentioned image processing techniques are an important part of creating a permanent document image, optimized for human viewing.
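To show what "adding information back to the record" might look like in practice, here is a small Python sketch that bundles a record's metadata with its searchable text and notes whether one value passed verification. It rests on assumptions of mine: the sidecar JSON layout is invented, and `known_creators` is a plain set standing in for whatever external database an agency would actually query.

```python
import json


def build_sidecar(metadata, ocr_text, known_creators):
    """Bundle metadata and searchable text into a JSON string.

    metadata       -- dict of metadata values extracted for the record
    ocr_text       -- full text produced by OCR, kept for search and discovery
    known_creators -- set of creator names; a stand-in for an external database
    """
    creator = metadata.get("creator", "")
    payload = {
        "metadata": metadata,
        "full_text": ocr_text,
        # Record whether the extracted creator matched the reference source.
        "verified": {"creator": creator in known_creators},
    }
    return json.dumps(payload, indent=2, sort_keys=True)
```

A file like this could travel alongside the PDF version of the record, giving the document management system both the human-readable image and the machine-readable data.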
There are multiple technologies available for every stage of the electronic records journey. Vendor selection is an important consideration. A key component of establishing high-quality vendor relationships is the creation of a requirements list.
A thorough requirements list is like turn-by-turn directions to get to a new destination. Just like a single street rarely takes you to your destination, no single vendor will meet each requirement. Clearly establishing needs upfront ensures long-term success.
The very best way to ensure success with any vendor is through the help of a certified business analyst (BA). A BA has the experience to help guide the creation of a thorough requirements list and will foresee and mitigate future roadblocks and compatibility problems.
Many vendors provide an analyst as part of their services, and organizations like IIBA provide BA resources and referrals for additional help.
Getting Started Resources
The following links will help jump start your research into the core technologies you need for a successful electronic records project.
High-speed document scanners:
Image Processing, OCR, classification, and metadata:
Document management systems:
- Alfresco https://www.alfresco.com/
- ApplicationXtender https://www.opentext.com/products-and-solutions/partners-and-alliances/extend-your-software/applicationxtender
- BOX https://www.box.com/content-management/enterprise
- IBM https://www.ibm.com/cloud/automation-software/enterprise-content-management
- SkySync https://www.skysync.com/
Cloud Data Storage
- AWS https://aws.amazon.com/what-is-cloud-object-storage/
- Azure https://azure.microsoft.com/en-us/services/storage/
- Nutanix https://www.nutanix.com/solutions/private-cloud
On-Premise Data Storage
- Dell EMC https://www.dellemc.com/en-us/storage/ecs/index.htm?PID=dc-ecs-elastic-cloud-storage
- Hitachi https://www.hitachivantara.com/en-us/products/storage.html
- IBM https://www.ibm.com/it-infrastructure/storage
- Bamboo Solutions https://bamboosolutions.com/
- BIS https://www.bisok.com/
- IIG https://iig.technology/
- Intellective https://www.intellective.com/
- Sirius Solutions https://www.sirsol.com/
- SkySync https://www.skysync.com/
[iv] 36 CFR Chapter 12, Subchapter B; https://www.archives.gov/about/regulations/regulations.html