Why Deep Learning is the Only Solution for Document Automation

Intelligent Document Processing (IDP) is an AI solution that provides end-to-end document processing to accelerate and optimize existing operations. IDP, or document automation, has become increasingly crucial for financial institutions, insurance companies, legal services, and even supply chain management, since it saves both time and cost by reducing the need for manual data extraction while also lowering the risk of human error that comes hand in hand with repetitive, low-value tasks.

No such thing as an "unstructured" document 

The process of extracting information from unstructured documents and transforming it into actionable insights has essentially defined entire industries, from logistics to finance and insurance, for a century. Insurance companies, for instance, are often faced with the task of processing policy documents that can run to hundreds of pages of specific information on the insured. Manually extracting, sorting, and entering data from ‘unstructured’ documents has been immensely time- and resource-consuming while remaining prone to human error.

While the conundrum of automating unstructured documents has spurred the development of AI solutions, the term remains something of a misnomer. An ‘unstructured’ document is, in fact, very much structured; at least, structured in a way that a human reader can analyze the data it contains. After all, such documents were designed to be scanned and sorted by human eyes for decades. They are only ‘unstructured’ from the point of view of a computer.

Man & Machine: A short history of deep learning's mission to overcome Polanyi's paradox

“Self-ignorance,” as the Hungarian philosopher Michael Polanyi observed, is an integral part of daily human activity. Human beings rely on tacit knowledge to perform tasks they would otherwise be unable to describe. How can you tell a dog apart from a cat when both species fit the same description: domesticated animals with four legs, a tail, and a head? Or why is it that you cannot verbally describe someone’s face when asked, yet instantly recognize it in a photograph? The human answer would be “I just… know.”

The biggest challenge in AI thus far has been replicating the human ability to perform tasks or recognize objects without being able to assign them specific descriptors. Computers expect an exact set of instructions, something tacit knowledge is ill suited to provide. This makes tasks that demand common sense, adaptability, and flexibility – intuitive forms of knowledge for humans – very difficult to automate.
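To see why tacit knowledge resists explicit instructions, consider a deliberately naive sketch of the dog-vs-cat problem from above (the rules and animal features below are made up purely for illustration): any hand-written rule ends up encoding brittle special cases rather than understanding.

```python
# A deliberately naive rule-based classifier for the dog-vs-cat problem.
# Every descriptor we can name applies to both species, so the "rules"
# degenerate into special cases (all features here are hypothetical).

def classify(animal: dict) -> str:
    # Both species have four legs and a tail, so those features carry no signal.
    if animal["legs"] == 4 and animal["has_tail"]:
        # We are forced into brittle special-casing almost immediately.
        if animal.get("barks"):
            return "dog"
        if animal.get("retractable_claws"):
            return "cat"
        if animal.get("weight_kg", 0) > 25:
            return "dog"   # ...except large cats and tiny dogs both exist
    return "unknown"       # the honest answer for most real inputs

print(classify({"legs": 4, "has_tail": True, "barks": True}))               # dog
print(classify({"legs": 4, "has_tail": True, "retractable_claws": True}))   # cat
print(classify({"legs": 4, "has_tail": True}))                              # unknown: a silent dog?
```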

The deep learning approach to training AI models provides a solution to Polanyi’s paradox by broadly mimicking human learning patterns. Since human tacit knowledge is likely the result of evolutionary pressure and a lifetime’s worth of exposure, deep learning does away with manual feature engineering in favor of biologically inspired artificial neural networks, which allow models to be trained by feeding them massive datasets.
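As a minimal sketch of what “learning from data instead of rules” looks like in practice (assuming PyTorch and toy random data; no real documents are involved), a small neural network learns its own internal features from labeled examples:

```python
# A minimal neural-network training loop (PyTorch, toy random data):
# no hand-engineered features, only examples and a differentiable model.
import torch
from torch import nn

torch.manual_seed(0)
X = torch.randn(512, 64)          # 512 toy "documents", 64 raw features each
y = torch.randint(0, 2, (512,))   # toy binary labels (e.g., invoice vs. not)

model = nn.Sequential(            # the network learns its own representation
    nn.Linear(64, 32), nn.ReLU(), # ...in this hidden layer,
    nn.Linear(32, 2),             # ...instead of us writing feature rules
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):            # more (real) data, not more rules, is what improves this
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```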

Applying deep learning to document automation

Deep learning’s ability to mimic human reasoning enables it to digest documents designed for human eyes in virtually the same way a human would. While humans can learn from far fewer examples, deep learning models have the advantage of being able to ingest vastly more data, so it is largely a matter of data volume before AI not only replicates human performance on these tasks but surpasses it many times over in accuracy and speed.
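As an illustration of the kind of pipeline this enables (the checkpoint below is a public Hugging Face model used purely as an example, not Cognaize’s technology, and the sample text is invented), a pre-trained model can sort raw document text into business categories without any task-specific rules:

```python
# Zero-shot classification of raw document text (illustrative only; the
# public facebook/bart-large-mnli checkpoint stands in for a production model).
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = (
    "This Agreement is entered into between the Lender and the Borrower, "
    "who agrees to repay the principal amount with interest..."
)
labels = ["loan agreement", "insurance policy", "invoice", "annual report"]

result = classifier(text, candidate_labels=labels)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.2f}")
```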

From evolution to adaptation

Despite a generation’s worth of attempts to extract rules that replicate tacit knowledge, humans benefit from an evolutionary advantage that AI lacks. We know with some degree of certainty that virtually all innate human skills, such as reasoning, sensorimotor, and perception skills, are the product of ‘biological machinery’ that has been continuously optimized and refined through natural selection. Mutations that facilitated survival were more likely to be passed down to succeeding generations until they became virtually instinctual in anatomically modern humans.

Rather than storing vast amounts of learned information in our brains, much as a computer would, humans have learned to predict likely outcomes by filling in gaps based on the bits of knowledge we already possess. AI, on the other hand, bases the accuracy of its predictions on the size of the training datasets it ingests.

As the roboticist Hans Moravec pointed out, humans have essentially been pre-trained before birth, through thousands of previous generations, to turn intuitive and abstract thought into automatic action. It is only in the past decade that application-specific hardware has matured to the point where computing power is sufficient to process enough data to handle perception and sensory skills with the level of finesse that humans display. Noted AI researcher Andrew Ng famously predicted in 2017 that “almost anything a typical human can do with less than one second of mental thought, we can probably now or in the near future automate using AI.”

Of course, deep learning still learns at varying paces. Counterintuitively, AI has a much easier time excelling at tasks humans find mentally challenging, like defeating grandmasters at chess, than at basic perception tasks a human toddler can handle. Moravec’s paradox is perhaps most humorously exemplified by the infamous “Chihuahua or blueberry muffin” image sets. Naturally, the internet collectively breathed a sigh of relief upon discovering that a future as depicted in Cameron’s Terminator universe could easily be averted, given current AI’s difficulty differentiating between labradoodle puppies and fried chicken.

How Cognaize solves the "difficult parts"

While deep learning’s effectiveness as a solution depends largely on the sheer volume of training data it ingests, not every deep learning model is created equal. Cognaize has successfully extended AI’s reach to applications previously considered difficult, such as table detection and page-structure detection, enabling clients to integrate these capabilities into their core processes and concentrate on the specific parts that matter to them.
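To give a rough sense of what table detection involves, here is a sketch using a published research model (the public microsoft/table-transformer-detection checkpoint; it is shown purely as an illustration and is unrelated to Cognaize’s proprietary models, and the “page.png” input is a hypothetical scanned page):

```python
# Detecting tables on a scanned page with a public research model
# (Table Transformer); "page.png" is a hypothetical input image.
import torch
from PIL import Image
from transformers import AutoImageProcessor, TableTransformerForObjectDetection

image = Image.open("page.png").convert("RGB")

processor = AutoImageProcessor.from_pretrained("microsoft/table-transformer-detection")
model = TableTransformerForObjectDetection.from_pretrained("microsoft/table-transformer-detection")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Convert raw model outputs into bounding boxes in the page's pixel space.
target_sizes = torch.tensor([image.size[::-1]])  # (height, width)
detections = processor.post_process_object_detection(
    outputs, threshold=0.7, target_sizes=target_sizes
)[0]

for score, label, box in zip(detections["scores"], detections["labels"], detections["boxes"]):
    print(model.config.id2label[label.item()], f"{score:.2f}", [round(v, 1) for v in box.tolist()])
```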

Cognaize makes this possible by concentrating on a single industry: the financial services sector. This focus allows us to accumulate not only more data than most, but also more targeted data for analysis. The predictable result is unparalleled accuracy on precisely these difficult tasks.

Unlike other solutions, Cognaize’s business-centric AI platform incorporates pre-trained, easily reusable base models to deliver extremely rapid results with virtually no errors. These are scalable by design and adaptable to any use case, with document automation being an obvious application.
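In general terms, reusing a pre-trained base model means fine-tuning it on a narrow task rather than training from scratch. The sketch below assumes a public distilbert-base-uncased checkpoint and invented document classes; it illustrates the pattern, not Cognaize’s actual pipeline:

```python
# A minimal fine-tuning sketch: a pre-trained base model is adapted to a
# new document-classification task (model name and labels are stand-ins).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL = "distilbert-base-uncased"  # public base checkpoint, assumed here
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=3)

texts = ["Loan agreement between ...", "Quarterly balance sheet ...", "Insurance policy ..."]
labels = torch.tensor([0, 1, 2])  # toy document classes

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # a few passes over the toy batch; real training uses far more data
    optimizer.zero_grad()
    out = model(**batch, labels=labels)  # the pre-trained body is reused, only adapted
    out.loss.backward()
    optimizer.step()
```

Because only the task-specific head starts from scratch while the pre-trained body is reused, this kind of adaptation typically needs far less data and compute than training a model end to end, which is what makes base models reusable across use cases.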