Grooper's Natural Language Processing capabilities, powered by TF-IDF machine learning, and ESP Auto Document Classification and Separation both solve problems once thought impossible in document processing.
In the past, the cost and effort of organizing and maintaining a clean data set from unstructured documents were seen as too high. Grooper gives you the tools to do this efficiently.
This presentation walks you through the proper approach in Grooper, gives an overview of TF-IDF classification, and explains why it is a powerful algorithm. It focuses on theory and is a good primer for building a solid understanding before you work in Grooper.
Documents discussed for TF-IDF classification:
- Oil and gas lease packets
- Contracts
- Royalty assignments
- Letters
- Invoices
- Mineral ownership reports
Grooper features:
- N-Grams
- Stop words
- Porter stemming
- Named entity tokens
- Field labels
- Sequencing hints
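
To make the theory concrete, here is a minimal Python sketch of TF-IDF classification using two of the features listed above, stop words and N-grams. It is an illustration only and is not Grooper's implementation: the function names, the tiny stop-word list, the sample training text, and the smoothed IDF formula are all assumptions chosen for the example, and Porter stemming is left out to keep it short.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "for", "is", "in", "this", "by"}


def tokenize(text, max_n=2):
    """Lowercase, drop stop words, and emit word N-grams up to max_n."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    grams = []
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.append(" ".join(words[i:i + n]))
    return grams


def weigh(counts, idf):
    """Turn raw term counts into an L2-normalized TF-IDF vector."""
    vec = {t: c * idf[t] for t, c in counts.items() if t in idf}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def train(labeled_docs):
    """Pool the samples of each class, then compute IDF across classes."""
    class_counts = {}
    for label, text in labeled_docs:
        class_counts.setdefault(label, Counter()).update(tokenize(text))
    n_classes = len(class_counts)
    df = Counter()
    for counts in class_counts.values():
        df.update(counts.keys())
    # Smoothed IDF (one common variant): terms shared by every class score lowest.
    idf = {t: math.log((1 + n_classes) / (1 + d)) + 1 for t, d in df.items()}
    vectors = {label: weigh(counts, idf) for label, counts in class_counts.items()}
    return idf, vectors


def classify(text, idf, vectors):
    """Return the label whose TF-IDF vector has the highest cosine similarity."""
    vec = weigh(Counter(tokenize(text)), idf)
    scores = {
        label: sum(w * cv.get(t, 0.0) for t, w in vec.items())
        for label, cv in vectors.items()
    }
    return max(scores, key=scores.get)


# Made-up training samples, one per document type.
SAMPLES = [
    ("invoice", "Invoice number 1001. Amount due upon receipt. Remit payment to the address below."),
    ("lease", "Oil and gas lease. Lessor grants to lessee the leased premises for a primary term."),
    ("letter", "Dear sir, thank you for your letter. Sincerely yours."),
]

idf, vectors = train(SAMPLES)
print(classify("Please remit payment for the amount due on this invoice", idf, vectors))
```

Each document type becomes a weighted vector of its words and phrases, and a new document is assigned to the type it most resembles. Terms that appear in every type receive the lowest weight, which is why distinctive phrases such as "primary term" or "amount due" drive the decision.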