Classification of documents having written language (unstructured text) requires huge manual effort.
Recent advances in the field of machine learning (ML) and natural language processing (NLP) allow for very precise and fully automated classification of documents based on their textual content.
The prerequisite for such an automation is the training data containing examples that represent correctly classified documents to allow a machine to use this history data to correctly classify a document.