14 Truths About Document AI - Everything You Need to Get Started

by Chris Dearner | November 6, 2020

What is document AI, and how does it work? This primer on all things artificial intelligence was written by:

Dr. Chris Dearner,
Grooper Product Manager

14 Truths About How Document AI Works to Get Data:

  1. What is artificial intelligence?
  2. What is machine learning?
  3. What is supervised machine learning?
  4. What is unsupervised machine learning?
  5. What is broad vs. narrow AI?
  6. What is natural language processing?
  7. What is deep learning?
  8. What is a neural network?
  9. How do neural nets work?
  10. What is TensorFlow?
  11. What is a GAN?
  12. What is TFIDF?
  13. What is computer vision?
  14. The brutal truth about AI

Two Sides of Artificial Intelligence

AI is a loose term, but it mostly means using computers to solve problems that are generally understood to require “intelligent” judgment, such as recognizing faces, driving cars, classifying documents, or making medical decisions.

Many, if not most, AI tasks in the real world involve either:

  1. Classification of inputs into categories or
  2. Generation of content similar to a training set.

When you read about AI being used to, for example, screen resumes, predict the risk of recidivism, or diagnose cancer, it falls under the former category.

If you see an AI that generates human-like text, it’s the latter category.

Learn More About Intelligent Document Processing

Either kind of task may involve machine learning; it may not. People who talk about AI may also be thinking of neural networks or other sophisticated machine learning systems that solve problems without much human guidance; more on this later.

Machine Learning is a way to build systems that display artificial intelligence (read: can solve hard problems).

Generally speaking, it is a strategy that builds systems that:

  1. Are not specifically programmed to perform their task until they are given training data
  2. Get better (to a point) at performing their task when exposed to more data or more inputs

Machine Learning in Document AI

Machine learning algorithms are generally divided into two types: supervised and unsupervised.

Intelligent document processing (IDP), classification, separation, and extraction systems all incorporate Machine Learning algorithms.

Generally, when people are concerned with ML, chances are they’re really concerned with how much work they’re going to have to do to get Grooper to process their data. More on that below, following the section on Unsupervised ML.

How Supervised Machine Learning Applies to Document AI

This is how the central machine learning algorithm in document AI works: the Grooper Architect determines a discrete set of categories, and gives the algorithm an initial set of properly labeled training samples so it can begin making predictions.
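
To make that concrete, here is a minimal sketch of supervised classification in Python. It is not Grooper’s implementation: the category names, the sample text, and the `train` and `predict` helpers are all made up for illustration. The point is only the shape of the process, in which a person picks the categories and labels a few samples, and the algorithm counts which words show up under which label.

```python
from collections import Counter, defaultdict


def train(labeled_samples):
    """Count how often each word appears under each human-assigned label."""
    counts = defaultdict(Counter)
    for text, label in labeled_samples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap most with the new text."""
    words = text.lower().split()
    scores = {
        label: sum(word_counts[w] for w in words)
        for label, word_counts in counts.items()
    }
    return max(scores, key=scores.get)


# Hypothetical categories and samples, chosen and labeled by a person up front.
training = [
    ("invoice number total amount due", "Invoice"),
    ("amount due upon receipt of invoice", "Invoice"),
    ("patient name date of service diagnosis", "Medical Record"),
    ("patient history and diagnosis notes", "Medical Record"),
]

model = train(training)
prediction = predict(model, "total amount due for this invoice")
print(prediction)
```

Nothing here works until the labeled samples arrive, and the labels themselves come from a human who understands the documents.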

Grooper is designed this way because human-supervised ML works better, especially for the sorts of complicated problems that are our bread and butter.

Why Unsupervised Machine Learning Does Not Apply to Document AI

Grooper document AI does not use unsupervised machine learning, because it:

  1. Requires very large data sets to work at all
  2. Doesn’t work very well

Generally, if people ask about unsupervised machine learning, they’re getting at one of two different things:
  1. Will the AI get better at classifying (or extracting) on its own?
  2. Will AI save me from having to understand my documents or data?

The answer to both of these questions is no.

Document AI doesn’t get better on its own, because human design of intelligent document processing systems achieves better outcomes than unsupervised ML in almost every case. 

Some Other General AI Terms

Weak (or Narrow) AI vs. Broad (or Strong, or General) AI is a pretty easy distinction:

  • Narrow AI solves one or a small number of related problems
  • Broad AI solves a wide range of them

Narrow AI exists in a number of different domains. But Broad AI?

Broad AI doesn’t exist. Broad AI will never exist. Broad AI was 10-20 years away in 1980. Broad AI was 10-20 years away in 2000. Broad AI remains 10-20 years away in 2020.

See the pattern? More about this later.

NLP's Relationship With Document AI

Natural Language Processing really just refers to using computers to process human language. Most Grooper tools are NLP tools: extractors, regular expressions, OCR, and TFIDF all involve the processing of natural language.

Porter stemming is a form of NLP we use, and Grooper also integrates with Azure Translate to provide translation.

Other advanced NLP techniques, such as sentiment analysis, part-of-speech tagging, named entity tagging, etc. are theoretically possible with Grooper but not included out-of-the-box.

Why Deep Learning Technology Isn't Great with Getting Document Data

Generally, deep learning is synonymous with neural-network-based AI. Deep learning models typically involve multiple layers, each of which extracts different “features” from the data.

But unless you’re talking to an AI researcher, people generally just mean big, complicated AI systems (think Watson or TensorFlow) when they talk about “deep learning” as a concept.

They may be thinking of neural networks specifically, either with or without a great understanding of what these are. More on neural networks in the next section.

Grooper doesn't use deep learning, because it doesn’t work very well for the sorts of problems we solve.

How is Document AI Intelligent?

Keep in mind that since “AI” is an incredibly general term – it can mean as little as “doing difficult things with computers” – it’s impossible to mention all of the methods and algorithms that might be considered AI.

Luckily for us, when people talk about AI, they are usually thinking of a relatively small number of things.

The (Sort of) Bleeding Edge: Neural Networks

Neural nets are important not only because they are the most important type of AI in development today, but because the philosophy underpinning them is the core philosophy of most AI research.

What I’d like to point to is two things:

  1. Neural networks aim to generate human-like decisions by imitating a highly simplified model of how we currently understand the brain to be structured
  2. The way in which neural networks match “inputs” (a picture of a cat) to “outputs” (the label “cat”) is unpredictable and, in a fundamental way, unintelligible (that is – we can’t understand it)

What the definition above includes is that they’re able to make these decisions without any prior knowledge of cats; what it omits is that the decision process of neural networks, once developed, may bear little or no relationship to human decisioning.

Which gets us to those “philosophical underpinnings” I mentioned earlier – using neural nets to do AI (solve complicated problems) assumes – or asserts – that by imitating a simplified model of the brain we’ll get not only as-good-as-human judgments, but judgments that are ultimately better than the ones humans would make, for reasons we ultimately won’t understand.

Remember, it’s not the whiskers and the fur that let a neural network know it’s looking at a picture of a cat. What is it? Who knows... I’ll leave aside the inherent contradiction here (that a simplified model of the brain will generate better results than a human can) and leave it at that for now. We’ll talk more about the limitations of neural-net-based document AI later.

Neural networks have a mechanism to alter the weights of the connections between their nodes based on learning. The complexity – and power – of neural networks comes from the number of nodes and layers, but the ways in which the hidden layer makes decisions are difficult to inspect – hence the name.
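
If “altering the weights based on learning” sounds abstract, the sketch below shows the idea at its very smallest: a single artificial neuron (a perceptron) learning the logical AND function. This is a textbook toy, not a real neural network and not anything Grooper uses; there is no hidden layer, and the function names, integer weights, and learning rate are assumptions chosen to keep the arithmetic easy to follow.

```python
def step(x):
    """Fire (1) when the weighted sum reaches zero, otherwise stay off (0)."""
    return 1 if x >= 0 else 0


def predict(weights, bias, inputs):
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_neuron(samples, epochs=20, learning_rate=1):
    """Nudge each weight in the direction that reduces the error on each sample."""
    weights = [0, 0]
    bias = 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            # A wrong answer shifts the weights; a right answer leaves them alone.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Labeled examples of logical AND: output is 1 only when both inputs are 1.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_neuron(and_samples)
outputs = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(weights, bias, outputs)
```

With one neuron and three numbers you can still read off why it decides what it decides. Stack thousands of these into layers and that stops being true, which is the inspection problem described above.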

TensorFlow is Google’s open-source library for building neural networks. TensorFlow makes it relatively easy for someone with a basic understanding of Python to build and train neural networks on home-desktop grade hardware.

It is used widely in both business and academic applications. You’re unlikely to hear anyone reference this directly (you might!) but it’s an important technology to be aware of.

GANs (generative adversarial networks) are essentially two neural networks pitted against each other. They can be trained to generate strikingly realistic images – if you’ve seen pictures of AI-generated faces, those were produced by a GAN.

They are very, very cool, but (as far as I’m aware) their practical application is still an open question. Again, you’re unlikely to run into someone asking about GANs in the wild, but they’re another important technology to be aware of.

Document AI Without Neural Nets

So, now that we’ve talked about neural nets (and, by association, deep learning), we can talk about the AI/ML algorithms that Grooper uses, and why these tend to work better for us. We’ll get into the limitations of neural-net based AI and unsupervised ML in the next section, too.

Why TF-IDF Works Quite Well in Documents

We’re going to spend a minute talking about TFIDF, because it’s the core AI (and only ML) algorithm that Grooper uses. TFIDF stands for “term frequency/inverse document frequency,” and classifies documents (or “collections”) by comparing how frequently words are seen on the target document types versus how frequently they occur in the sample set as a whole.

Without getting too far into the weeds, TFIDF works by identifying words (or inputs) that are unique (or more common) in a particular type of document compared to the document set overall. It’s a deceptively simple way of classifying documents, and it generally does so in a similar way to how humans do it: by looking at the individual words on the document.

Now, it’s worth noting that TFIDF doesn’t “read” or “understand” the words (virtually no ML algorithms do), and it’s not sensitive to where in the documents the words occur. It’s just counting words (or features, if your extractor isn’t feeding it words).
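
For the curious, here is a rough sketch of the counting involved, using the textbook TFIDF formula rather than Grooper’s actual implementation. Everything in it is an assumption made for illustration: the `tfidf_weights` and `classify` helpers, the two document types, and the choice to pool each type’s training samples before counting. A word’s weight for a type is how often it appears in that type, multiplied by the log of how rare it is across types.

```python
import math
from collections import Counter


def tfidf_weights(samples_by_type):
    """Return {doc_type: {word: weight}} from {doc_type: [sample texts]}."""
    # Term counts per document type, pooling that type's training samples.
    counts = {
        doc_type: Counter(" ".join(texts).lower().split())
        for doc_type, texts in samples_by_type.items()
    }
    n_types = len(counts)
    # Number of document types in which each word appears at least once.
    type_freq = Counter(word for c in counts.values() for word in c)
    weights = {}
    for doc_type, c in counts.items():
        total = sum(c.values())
        weights[doc_type] = {
            # term frequency * inverse document frequency
            word: (n / total) * math.log(n_types / type_freq[word])
            for word, n in c.items()
        }
    return weights


def classify(weights, text):
    """Score a new document against each type by summing matching word weights."""
    words = text.lower().split()
    scores = {
        doc_type: sum(w.get(word, 0.0) for word in words)
        for doc_type, w in weights.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical training samples for two made-up document types.
samples = {
    "Invoice": ["invoice total amount due date", "invoice number and amount due"],
    "Lease": ["lease agreement between lessor and lessee", "lease term and start date"],
}

weights = tfidf_weights(samples)
label, scores = classify(weights, "amount due on this invoice")

# The weights are plain numbers, so the strongest features can be listed directly.
top = sorted(weights["Invoice"].items(), key=lambda kv: kv[1], reverse=True)[:3]
print(label, top)
```

Words that appear in every type (“date” and “and” here) end up with a weight of zero, so they contribute nothing to the decision. The algorithm never reads anything; it just counts.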

Do you want the algorithm to look at words? Two-word pairs? Names of mammals? The presence of dates or names? The words “phantom” and “empire?”

Good. Great. Fantastic, even. You can write extractors to feed that into the algorithm. This is crucial, because it lets a human determine which types of features are most likely to matter for a given document – and there are some types of documents (think a Mineral Ownership Report) where the content of the words won’t ever tell you what the document is.
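
As a loose illustration of what “writing extractors to feed the algorithm” can look like, here is a Python sketch with a few small feature extractors. These are not Grooper extractors; the function names, the `<HAS_DATE>` token, and the date pattern are all hypothetical. Each extractor turns text into a list of features, and a person decides which ones to switch on.

```python
import re

# Hypothetical pattern for dates like 01/15/2020; real documents need more care.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def word_features(text):
    """Individual lowercase words."""
    return re.findall(r"[a-z]+", text.lower())


def word_pair_features(text):
    """Adjacent two-word pairs, so 'phantom empire' counts as one feature."""
    words = word_features(text)
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def date_presence_features(text):
    """A single marker feature when the text contains any date at all."""
    return ["<HAS_DATE>"] if DATE_PATTERN.search(text) else []


def extract_features(text, extractors):
    """Run the extractors a human selected and pool their output."""
    features = []
    for extractor in extractors:
        features.extend(extractor(text))
    return features


features = extract_features(
    "Phantom Empire lease signed 01/15/2020",
    [word_pair_features, date_presence_features],
)
print(features)
```

Whatever comes out of `extract_features` is what the classifier gets to count, so swapping extractors changes what the algorithm can notice without touching the algorithm itself.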

Our TFIDF implementation also lets architects inspect the feature weightings, which lets them directly and completely understand how it’s making decisions. This is not possible using neural nets! It is also generally not available in other systems, even those implementing TFIDF or similar algorithms.

TFIDF also requires a relatively small number of training samples (generally fewer than 100, although not always so) to reach optimum decision making, which makes it much quicker to train than most machine learning algorithms.

How TF-IDF Works in Grooper Document AI

TFIDF can be used in extraction, separation, and classification within Grooper. In separation and classification, it lets you work with unstructured documents.

In extraction, it lets you pick out the correct instance of a value type on a page (e.g. tell date of birth from date of service) by looking at features surrounding the detected value.
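
Here is a small Python sketch of that idea, under some loud assumptions: the context words and their weights are hand-written stand-ins for what a trained model would learn, only the few words before each date are considered, and none of the names below come from Grooper.

```python
import re

DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

# Hypothetical feature weights; a trained system would learn these from samples.
CONTEXT_WEIGHTS = {
    "date_of_birth": {"birth": 2.0, "dob": 2.0, "born": 1.5},
    "date_of_service": {"service": 2.0, "visit": 1.5, "seen": 1.0},
}


def label_dates(text, window=3):
    """Label each date by scoring the words that appear just before it."""
    tokens = text.split()
    results = []
    for i, token in enumerate(tokens):
        match = DATE_PATTERN.search(token)
        if not match:
            continue
        # Up to `window` preceding tokens, lowercased and stripped of punctuation.
        context = [re.sub(r"\W", "", t).lower() for t in tokens[max(0, i - window):i]]
        scores = {
            label: sum(weights.get(word, 0.0) for word in context)
            for label, weights in CONTEXT_WEIGHTS.items()
        }
        best = max(scores, key=scores.get)
        # Leave the date unlabeled when no context word matched at all.
        results.append((match.group(), best if scores[best] > 0 else None))
    return results


page = "Patient DOB: 04/12/1985 Date of service: 11/06/2020"
labeled = label_dates(page)
print(labeled)
```

Both values look identical to a date pattern; only the neighboring words tell them apart.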

If people ask about ML in Grooper, ours:

  • Is faster
  • Is easier to train
  • Provides more insight and
  • Gives more control over the classification process

This requires more touch (initially) than unsupervised ML, but it gets much better results over a wide range of problems.

#13 - What is Computer Vision, and How Does It Find Data in Documents?

Grooper also implements a number of other algorithms – generally around image processing – that could be called AI algorithms.

These include our line and OMR box recognition, our blob detection (used in both blob removal and deskew), our ability to remove, e.g., combs from lines, and our periodicity detection.

Grooper uses a number of Computer Vision algorithms (many developed in-house) that provide best-in-class ability to clean up documents for OCR and detect non-character-based semantic (information-containing) elements on the document such as lines, boxes, etc.
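
To give a feel for this kind of image processing, the sketch below does a very basic form of blob detection: it groups touching dark pixels into connected components and erases the ones that are too small to matter. This is a generic textbook technique on a toy grid, assumed here as a stand-in; it says nothing about how Grooper’s in-house algorithms actually work, and the function names and the `min_pixels` threshold are made up.

```python
def find_blobs(image):
    """Return blobs as lists of (row, col) dark pixels connected up, down, left, or right."""
    rows, cols = len(image), len(image[0])
    seen = set()
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1 or (r, c) in seen:
                continue
            # Flood-fill outward from this unvisited dark pixel.
            stack, blob = [(r, c)], []
            seen.add((r, c))
            while stack:
                y, x = stack.pop()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and image[ny][nx] == 1 and (ny, nx) not in seen:
                        seen.add((ny, nx))
                        stack.append((ny, nx))
            blobs.append(blob)
    return blobs


def remove_small_blobs(image, min_pixels):
    """Return a copy of the image with blobs smaller than min_pixels erased."""
    cleaned = [row[:] for row in image]
    for blob in find_blobs(image):
        if len(blob) < min_pixels:
            for y, x in blob:
                cleaned[y][x] = 0
    return cleaned


# Toy binary page: 1 is a dark pixel. Row 0 is a line; the rest is speckle.
page = [
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
]

blobs = find_blobs(page)
cleaned = remove_small_blobs(page, min_pixels=3)
print(len(blobs), cleaned)
```

The line survives and the specks disappear, which is the sort of cleanup that gives OCR a better image to work with.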

#14 - The Brutal Truth About AI

Here’s a fairly representative paragraph about AI (emphasis mine):

Artificial intelligence has been conquering hard problems at a relentless pace lately.

In the past few years, an especially effective kind of artificial intelligence known as a neural network has equaled or even surpassed human beings at tasks like discovering new drugs, finding the best candidates for a job, and even driving a car.

Neural nets, whose architecture copies that of the human brain, can now—usually—tell good writing from bad, and—usually—tell you with great precision what objects are in a photograph.

Such nets are used more and more with each passing month in ubiquitous jobs like Google searches, Amazon recommendations, Facebook news feeds, and spam filtering—and in critical missions like military security, finance, scientific research, and those cars that drive themselves better than a person could.

This was written in 2015 (you can find the article here), and the claims it makes are fairly typical of what people tend to say about AI.

They are also all incorrect.

AI Demos Set Very High Expectations

I tried to track down the source of these claims, and generally couldn’t – which is also representative of claims people make about AI.

Big, bold, elusive, and almost universally somewhere from an overstatement to an outright falsehood.

This might not sound quite right to you, though – maybe you’ve seen Azure translate at work, or even Azure image recognition. Maybe you’ve seen other demonstrations of AI that look cool and powerful, and make it seem like there’s very little AI can’t do.

If you’re in sales or marketing, or just a generally skeptical person, you can probably guess what’s going on here: AI demonstrations generally engage in a lot of expectation shaping, and generally have very sturdy guardrails set up just out of sight.

Will Document AI Ever Have General Intelligence?

If I — for example — show you an AI that can correctly categorize an in-focus, good quality picture of:

  1. A dog
  2. A cat and
  3. A house

Does that really prove that it can “tell you what objects are in a photograph?”

If I can build an AI-based program to win Jeopardy, or to beat human grandmasters at chess, does that mean it’s going to be good at other things?

Here’s the central trick that discourse around AI plays on us: by claiming to be based on the structure of the brain, the implication is that AI-based “intelligence” should, will, or can work like human intelligence.

Remember where the author says that neural networks “copy” the structure of the human brain? Ken Jennings is a pretty smart guy, so he’s good at a lot of things that aren’t Jeopardy. My co-worker Randall can identify dogs in pictures, so he can also identify a lot of other things.

But AI doesn’t – and won’t ever – work like that. There is absolutely zero evidence that a trained neural network – or any other AI system – has or can have anything like generalized intelligence. Absolutely none.

The only reason people expect that behavior from AI is that they:

  1. Don’t understand the technology
  2. Don’t understand the assumptions (what I previously called “philosophical underpinnings”) behind the technology
  3. Have a very, very impoverished understanding of human cognition

Expecting neural networks to be able to solve problems they’ve never seen before because you can train them to do lots of different things is a little bit like thinking that, because you can build a lot of different jigs to hold a router, it will at some point be able to mill things on its own.

I promise you – it will not.

Going Forward, What Document AI Tech Will Help Us With Our Problems?

What this means for us – and for our industry – is that AI and ML won’t save us. AI won’t save us from having to understand our own documents, AI won’t save us from having to solve hard problems, and AI won’t turn documents into data.

AI-based tools can be incredibly helpful, but they’re only ever going to be part of a full solution. Intelligent document processing isn’t a narrow problem, and it’s never going to be. That’s why AI won’t save you.

So, having said all that, I am going to say one other thing – document AI is, in fact, the future.

Remember how we defined AI as “using computers to do something hard?” That won’t stop happening.

But it won’t happen in the way people expect it — or with the tools people now think are revolutionary.
