NextFin News - Google’s flagship artificial intelligence model, Gemini, can draft complex legal briefs, write functional code, and analyze massive datasets, yet it frequently fails at a task taught to five-year-olds: spelling its own creator's name. According to a report by TechCrunch on May 27, 2026, users continue to document instances where Gemini, along with rival models from OpenAI and Anthropic, struggles to count the letters in simple words or reverse strings of text. This persistent blind spot highlights a fundamental disconnect between how humans perceive language and how machines process it, raising questions about the limits of current deep learning architectures.
Andrej Karpathy, a prominent artificial intelligence researcher and co-founder of OpenAI, has long maintained a cautious, technically rigorous stance on the capabilities of large language models, frequently warning that their apparent fluency masks deep structural flaws. Writing in his technical essays and lectures, Karpathy has identified tokenization—the process of breaking text into chunks before feeding it to a neural network—as one of the most problematic and poorly understood components of modern AI. In his view, tokenization is not merely an engineering quirk but a foundational limitation that distorts a model's understanding of spelling, arithmetic, and basic string manipulation.
To understand why Gemini cannot reliably spell "Google," one must look at how the model "sees" text. Unlike humans, who can inspect a word letter by letter when they need to, large language models process information in tokens. A token can be a single character, a syllable, or an entire word. For instance, the word "Google" might be processed as a single token, "Google," or split into two tokens like "Go" and "ogle." Because the model receives only the numerical IDs of these tokens, it has no inherent awareness of the individual letters that make them up. When asked to count the number of "o"s in "Google" or spell the word backward, the model must rely on statistical associations learned during training rather than direct visual or structural inspection. It is akin to asking a person to count the letters in a word written in a shorthand script they can only read phonetically.
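The point about token IDs can be made concrete with a toy example. The sketch below is illustrative only: the vocabulary, the ID numbers, and the `toy_tokenize` helper are invented for this article and do not correspond to Gemini's or any other real model's tokenizer, which learns a far larger vocabulary from data.

```python
# Toy illustration: vocabulary entries and IDs are made up for this sketch.
WHOLE_WORD_VOCAB = {"Google": 1001}
SPLIT_VOCAB = {"Go": 17, "ogle": 942}


def toy_tokenize(text, vocab):
    """Greedy longest-match tokenization against a fixed vocabulary."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest remaining piece first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in vocab:
                ids.append(vocab[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token covers {text[pos:]!r}")
    return ids


print(toy_tokenize("Google", WHOLE_WORD_VOCAB))  # [1001]
print(toy_tokenize("Google", SPLIT_VOCAB))       # [17, 942]

# The model is handed only the integers above. Nothing in 1001, or in
# 17 and 942, says that the word contains two "o"s; that fact is visible
# only to code that still has the characters themselves.
print("Google".count("o"))  # 2
```

In both cases the letters are gone by the time the input reaches the network, which is why letter-level questions have to be answered from learned associations rather than by inspection.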
Karpathy’s emphasis on tokenization as a critical bottleneck is widely shared among deep learning engineers, but it does not represent an absolute consensus regarding the future of AI development. Some researchers at Google DeepMind and other leading labs view tokenization as a temporary engineering hurdle rather than a permanent barrier to artificial general intelligence. They argue that the issue can be mitigated through reinforcement learning from human feedback or by integrating external tools, such as Python interpreters, that allow the model to verify its spelling and math programmatically. This school of thought suggests that a model does not need to understand individual letters to exhibit high-level reasoning, just as a human driver does not need to understand the internal combustion engine to navigate a highway.
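The tool-based mitigation amounts to handing the question to ordinary code instead of answering from memory. The following is a minimal sketch of what a model might send to a Python interpreter; the helper names are hypothetical, and no particular vendor's tool-calling interface is implied.

```python
def count_letter(word, letter):
    """Count case-insensitive occurrences of one letter in a word."""
    return word.lower().count(letter.lower())


def reverse_text(text):
    """Return the characters of the text in reverse order."""
    return text[::-1]


print(count_letter("Google", "o"))  # 2
print(reverse_text("Google"))       # elgooG
```

Because the interpreter operates on characters rather than tokens, the answer is computed rather than recalled, although the model still has to recognize when a question calls for the tool.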
Furthermore, alternative architectures are emerging that bypass subword tokenization altogether. Researchers have experimented with byte-level models, such as Google's own ByT5, which process text as raw UTF-8 bytes rather than as subword tokens. While these models are less prone to spelling errors, they require significantly more computation because every word expands into several bytes and the input sequence grows accordingly. Another approach involves multimodal models that process text as raw pixels, essentially "reading" the letters visually much as a human would. However, these methods remain computationally expensive and have yet to match the efficiency and scale of token-based models.
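The sequence-length cost is easy to see in a rough comparison. This sketch uses whitespace-separated words as a crude stand-in for tokens, which is an assumption made for illustration (real subword tokenizers typically emit somewhat more tokens than words), and compares that count with the number of UTF-8 bytes a byte-level model would consume. The `sequence_lengths` helper is hypothetical.

```python
def sequence_lengths(text):
    """Return (word_count, byte_count) for a piece of text."""
    word_count = len(text.split())          # crude proxy for token count
    byte_count = len(text.encode("utf-8"))  # input length for a byte-level model
    return word_count, byte_count


sample = "Gemini cannot reliably spell Google"
words, byte_len = sequence_lengths(sample)
print(words, byte_len)  # 5 35
```

Here the byte-level input is seven times longer than the word-level one, and in standard transformer models the cost of self-attention grows roughly with the square of the sequence length, so the gap in compute is larger than the gap in length.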
For tech giants like Alphabet, these spelling errors are more than an embarrassment; they represent a practical challenge for enterprise adoption. Businesses looking to deploy AI for precise data extraction, document parsing, or automated coding require absolute accuracy at the character level. A model that cannot reliably distinguish between "affect" and "effect" or count characters in a serial number poses integration risks. While Google has implemented post-processing filters and heuristic patches to correct these obvious blunders before they reach the user, these solutions are superficial fixes that do not address the underlying architectural blind spot.
The persistence of these spelling errors serves as a reminder of the vast gulf between machine pattern recognition and human comprehension. As long as the industry relies on token-based architectures to maximize computational efficiency, Gemini and its peers will continue to operate with a fundamental sensory deficit. The machine that can write a sonnet in seconds remains, at its core, blind to the very letters that form it.
