There's been a lot of press about OpenAI's ChatGPT engine. You may have seen our press release last week about our integration. Since that time and today, OpenAI has released ChatGPT as an API.
We've been testing the technology at BIS to see whether it has commercial uses since its early research release in November of 2022. TL:DR - Yes, yes, it does.
Bring the transformative power of OpenAI's breakthroughs to your organization's document processing and workflows.
See how to leverage GPT's progressive language processing abilities to get far more accurate data and drastically reduce manual labor and streamline document-intensive workflows. Watch the video, or learn more:
Ok, then, what is a GPT engine, you ask? Great question. In fact, let's ask ChatGPT that exact question:
“A Generative Pre-trained Transformer (GPT) is a type of neural network architecture that can ‘understand’ and generate human language. It’s trained on a large collection of text to ‘understand’ how language works.
"It can be used for tasks like language translation, text generation, and text summarization. This type of program uses a specific architecture called a transformer that allows it to handle text of different lengths.”
The quotes around “understand” were inserted by me. Because it’s not true — it doesn’t “understand” language - more on that later.
And I know what you’re thinking now - “What’s a transformer?” No, it’s not a robot. At least not a big metal one that turns into a yellow Camaro.
A transformer is a newer type of neural network architecture introduced in 2017 by Google (I bet that has something to do with why Google’s scrambling to have an answer to ChatGPT).
A Transformer’s specialty is text processing:
In other words, Supervised Machine Learning. ChatGPT has been trained on a huge language data set (called a Large Language Model (LLM)), with a general cutoff of 2021.
So anything after that date will not be in the model’s data. But the key to ChatGPT’s results is that it has been trained on 175 billion parameters - 10 times more than the last GPT 3 model. Oh yeah, there are four other models available through OpenAI’s APIs.
So, these AI engines can answer questions and generate text based on requests.
And I’ll admit, they’re impressive. With ChatGPT, I have:
And they have all worked 100%.
But there are a few massive caveats to all this power, however:
I recently tested ChatGPT with a purchase order. The PO is a PDF, so I had to copy and paste the text in, as ChatGPT only accepts text.
How did it fare? Well, the formatting wasn't great. Some of the fields were smashed together.
But ChatGPT gave perfect responses to simple queries like:
ChatGPT even created a very well-formed tabular response to the query “Give me the relevant data in a table.” And when I told it: “Translate this invoice into Spanish,” it translated everything into perfect Español.
Check it out below:
This is a nice way of saying it can (and will) create false data. That’s because it’s a generative engine. That means it generates text that is logically and semantically similar to real text. But it does not “understand.”
And OpenAI’s description using the “understanding” nomenclature is a huge problem. “Understanding” language structure does not mean understanding the meaning of words or phrases.
However, in the GPT's API form, you can turn this propensity for B.S. down so it won’t get creative. But it’s something to watch out for if you use one of these engines in a production environment.
And while everyone is having fun playing with the ChatGPT website (yours truly included), the API for the new ChatGPT engine was just released today.
So we'll need to test to see how much better the new API is than the ones we've been testing with. I feel good about it. There are four other engines that have varying strengths and use cases, and we've done all our API testing with those to date with stunning results.
For us, ChatGPT is just another way to analyze text. I haven't seen anything better than what Grooper can do.
But it could give us the ability to get initial results faster in Grooper.
How?
Well, if I can write a simple natural language query in ChatGPT such as "What is the invoice number?" instead of setting up regular expression extractors in Grooper, it's possible that we could set up a content model faster and also take a good step toward making complex document processing much more simple for your everyday staff member (not just for technicians).
And there are some new features and functionality that Grooper will have access to that are difficult to get without an interface like ChatGPT. For instance, ChatGPT can validate zipcodes and cities without the need for a postal service database lookup.
How about quick in-line translations? Grooper has an integration to Azure Translate as well, but this is a lot easier to set up in ChatGPT; simply typing "give me the results in Spanish" will do it. It can also easily format responses into different outputs like JSON, XML, or even a specifically-formatted CSV file. All with a simple natural language prompt.
That's where the black box fails. These engines need at least 100 samples to start improving their models. Grooper needs only one sample configured correctly. And the cost model for the GPT engines is more expensive when you fine-tune time.
But there is enormous potential in these new technologies to help get the best data to your business systems quickly and easily.
So there's no need to ask, "who’s better — Grooper or ChatGPT?" because we've built it into the product.
We hope our clients find revolutionary ways to augment their document extraction and analysis.
(It's important to note that it's currently impossible to run state-of-the-art GPT3X engines on-premises. They have to live in the cloud. OpenAI has theirs available in Azure, and Google has theirs available in its cloud.
So for the near future, the only way those engines will be able to be leveraged is by sending your data to the cloud, and that's not possible for some organizations.)
But making improvements in document and data processing is what we do at BIS.
This is what innovation looks like.
When breakthrough technology becomes available, we evaluate it and see if it can improve Grooper. We’ve been doing this since Grooper's inception, which is keeping us far ahead of our competition.
Grooper 2023 was released at the end of March 2023. So get in touch with us if you’d like to take it for a spin and see how much better data extraction can be.