What AI-Generated Fingers Can Teach You About Training Your Own Models

 

Generative AI is everywhere, and it is easy to see why. This powerful technology has unleashed creativity, producing extraordinary creations such as an ecstatic Friedrich Nietzsche dancing or a caricature of Immanuel Kant with a sneaky grin.

[Images: AI-generated Friedrich Nietzsche dancing; AI-generated Immanuel Kant smiling]


Rather than endeavoring to decipher Kant's intricate ideas, we could indulge in the delightful spectacle of Elvis Presley imparting philosophical wisdom, an experience that is both effortless and endlessly entertaining.

[Image: AI-generated Elvis Presley teaching philosophy]


While these pictures are beautiful, you might have noticed that the fingers are creepy. Here are two more images that are beautiful overall, but whose fingers look like they belong in a horror movie.

[Images: woman playing chess, generated by DALL-E 2]


DALL-E 2 generates stunning images, and ChatGPT produces impressively persuasive text. Nonetheless, it's essential to remember that AI lacks a genuine understanding of the world. It doesn't possess a concept of a person or of how fingers move. Artists, in contrast, maintain a mental model of how fingers work. The human brain likely constructs multiple models of each object or concept, built from various inputs such as different sensory regions or entirely different senses. The consensus of these models is what we perceive as reality. It is as if our brain were thousands of little brains, referred to as cortical columns, working at once.
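To make the consensus idea concrete, here is a loose analogy in code: many noisy "columns" each classify the same object independently, and the majority vote is far more reliable than any single column. This is only an illustration of the voting intuition, with made-up names, not an implementation of cortical columns or the Thousand Brains Theory.

```python
# Loose analogy: many noisy "little brains" vote, and the consensus is the percept.
import random
from collections import Counter

class Column:
    """A tiny 'model' that classifies an input through its own noisy view."""
    def __init__(self, noise: float):
        self.noise = noise

    def classify(self, true_label: str, labels: list[str]) -> str:
        # With probability `noise`, this column misperceives the object.
        if random.random() < self.noise:
            return random.choice(labels)
        return true_label

def consensus(columns: list[Column], true_label: str, labels: list[str]) -> str:
    votes = Counter(col.classify(true_label, labels) for col in columns)
    return votes.most_common(1)[0][0]

labels = ["hand", "face", "chess piece"]
columns = [Column(noise=0.3) for _ in range(1000)]   # thousands of noisy columns
print(consensus(columns, "hand", labels))            # almost always "hand"
```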

This picture of many models working in parallel differs from traditional neural networks, whose design reflects the assumption that the neocortex processes input from the sensory organs hierarchically. Sensory input moves from one region to the next, with cells at each level responding to larger areas and more complex features of the sensory data. The assumption is that complete recognition of an object can only occur at a sufficiently high layer of the hierarchy, where cells have access to the entire sensory input.
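As an illustration, here is a minimal sketch of that hierarchical assumption in PyTorch. The layer sizes are arbitrary choices for demonstration, not DALL-E 2's actual architecture: each stage pools over a larger area of the input, and only the final stage effectively sees the whole image.

```python
# A minimal hierarchy: receptive fields grow layer by layer,
# and only the top of the stack has access to the entire input.
import torch
import torch.nn as nn

hierarchy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # local edges
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # parts
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # larger shapes
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(64, 10),   # only this final stage summarizes the whole image
)

x = torch.randn(1, 3, 224, 224)   # a dummy image
print(hierarchy(x).shape)         # torch.Size([1, 10])
```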

The immense complexity of generative AI demands numerous layers, adding up to roughly 3.5 billion parameters that must be trained in the case of DALL-E 2. Training these parameters effectively requires a vast amount of data, and in that data hands are naturally underrepresented for two reasons: first, hands are versatile and adopt far more positions than the face; second, hands are less captivating than faces, so far fewer training images focus on them.

Moreover, we exhibit a significant bias towards hands. Hands provide vital sensory input and are our primary means of engaging with the world around us. The sensorimotor homunculus offers a vivid illustration of their importance: it depicts the human body with each part scaled to the number of sensory and motor neurons the brain dedicates to it. Because these numbers vary widely across body parts, the result is a strikingly distorted figure, most notably its enormous hands.

[Image: homunculus, generated by DALL-E 2]


Yet this bias is not factored into generative AI models. They assign the same importance to the accuracy of fingers as to the correctness of a chess piece, a key discrepancy between human perception and AI-generated content.
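To illustrate the point, here is a hedged sketch of the difference: a standard pixel-wise reconstruction loss treats every region of the image equally, whereas a human-like bias would up-weight the regions we care about most, such as hands. The hand mask and the weighting factor below are hypothetical and are not part of how DALL-E 2 is actually trained.

```python
# Uniform loss vs. a hypothetical hand-weighted loss.
import torch
import torch.nn.functional as F

generated = torch.rand(1, 3, 64, 64)   # stand-in for a generated image
target = torch.rand(1, 3, 64, 64)      # stand-in for the training image

# Standard objective: every pixel matters equally.
uniform_loss = F.mse_loss(generated, target)

# Hypothetical human-like bias: up-weight the pixels belonging to hands.
hand_mask = torch.zeros(1, 1, 64, 64)
hand_mask[..., 40:60, 10:30] = 1.0      # pretend this region contains a hand
weights = 1.0 + 4.0 * hand_mask         # errors on hands count five times as much
weighted_loss = (weights * (generated - target) ** 2).mean()

print(uniform_loss.item(), weighted_loss.item())
```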

What does that mean for your company?

I believe the key to enhancing artificial intelligence lies in teaching it to construct models of concepts the way humans do. To do this, we need a better understanding of how the human brain works, an endeavor that has proved elusive despite recent advances. Consequently, I see artificial intelligence as a challenge rooted in neuroscience rather than as a computer-science problem that can be solved simply by adding more data and computational power.

We have already pushed both data and computational power to staggering levels, which underscores the need for a different approach.

The brain is a complex organ that we are only beginning to understand. Cutting-edge technologies like functional MRI offer a non-invasive, in-depth look at some aspects of its inner workings, yet it has taken decades to build on Mountcastle's research and arrive at the Thousand Brains Theory, the most recent perspective on intelligence. It will take many more years, if not decades, to make further advances in neuroscience and then translate them into artificial intelligence.

In the meantime, we can - and perhaps must - harness this remarkable technology by training it on our domain-specific data. Instead of building solely on excessively large models that demand immense computational power and bring along a host of issues, such as hallucinations, we should apply this technology to our particular challenges and train it on our unique datasets. By doing so, we pave the way for truly effective AI applications and unlock the potential of this technology across our diverse fields.
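As a rough sketch of what that can look like in practice, the following fine-tunes a small pretrained language model on a domain-specific dataset using the Hugging Face libraries. The model name, the file our_domain_data.csv, and the column names text and label are placeholders for your own choices, not a prescription from this article.

```python
# Fine-tuning a pretrained model on our own domain data (placeholders throughout).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "distilbert-base-uncased"   # any suitable pretrained starting point
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=2)

# Our unique, domain-specific dataset (hypothetical CSV with `text` and `label` columns).
data = load_dataset("csv", data_files={"train": "our_domain_data.csv"})
data = data.map(lambda rows: tokenizer(rows["text"], truncation=True, padding="max_length"),
                batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="domain_model", num_train_epochs=3,
                           per_device_train_batch_size=16),
    train_dataset=data["train"],
)
trainer.train()   # the result is a model specialized for our domain, not a general one
```

The same pattern applies beyond text: start from a pretrained foundation and let your proprietary data provide the differentiation.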

Bloomberg recognizes the need to develop domain-specific models from scratch and recently announced a language model for the financial industry, built for internal use. Even one of the largest investors in OpenAI sees that need: Microsoft recently announced a dedicated language model for the life sciences. Given AI's current and foreseeable state, this approach offers the most promising path to outstanding results. By catering to specific domains and industries, AI models can deliver more accurate, targeted, and effective outcomes, truly harnessing the power of AI for specialized purposes.

In addition to yielding the best results, developing domain-specific models is also the only viable way to compete long-term with AI giants like OpenAI or Google. Over the past two decades, the traditional notion of data as a company's primary asset shifted toward analytical capabilities. The prevailing view of highly successful companies was that data is freely available, while the ability to analyze it quickly and inexpensively is the key differentiator. Google is the prime example: a company that began with no data but exceptional analytical prowess managed to dominate an entire industry.

With the advent of AI, these same companies have altered their strategies. While openly sharing their analytical methodologies through published research and code, they withhold the training data, recognizing its true value. By using their AI models as a service, we inadvertently contribute to their growing repository of assets, making it increasingly difficult to compete as their AI systems become powerful enough to drive analytical costs toward zero while maintaining high accuracy. To remain competitive, we must invest in developing our own domain-specific models by leveraging the unique data at our disposal.

 

"How fingers generated by AI, teach you to train your own AI models" article is featured in Forbes.