NextFin

Sam Altman on Voice AI, Hallucinations and the Limits of 'Quick' Intelligence

Summarized by NextFin AI
  • OpenAI CEO Sam Altman discusses the limitations of voice AI models, emphasizing issues with timing, spelling, and factual reliability during a segment on the Sam Roberts' Show.
  • He acknowledges that some models lack the necessary tools for specific tasks, stating, "That model, that voice model doesn't have tools to like start a timer or anything like that."
  • Altman highlights the problem of hallucinations in AI outputs, urging users to remain skeptical and verify information, as models can produce confident but incorrect answers.
  • He calls for user responsibility in cross-checking outputs, warning against the passive acceptance of AI-generated answers as facts.
NextFin News -

On a recent Sam Roberts' Show segment published on the notsam channel, OpenAI CEO Sam Altman appears in short clips reacting to a creator's videos that highlight repeated errors from voice AI models. The segment interleaves the creator's demonstrations with Altman's replies and on-the-record comments; the exchange focuses on capability limits — timing, spelling and factual reliability — and on what users should expect from voice-enabled models.

The segment aired as part of a studio episode of the Sam Roberts' Show; the program shows and discusses short clips in which a creator (presented in the show as "Huck" or "Husk") prompts voice models, after which Altman's recorded responses are played back for the hosts. The video does not include an explicit public timestamp for when Altman's short reactions were recorded; the show presents those clips as recent comments by the OpenAI CEO.

On voice-model capabilities and timers

Altman repeatedly addresses the specific claim that a voice model should be able to start and run a stopwatch. In one clip he describes capability differences between model variants and the available tooling, saying that some models simply lack the necessary tool integrations: "That model, that voice model doesn't have tools to like start a timer or anything like that." In the same exchange, the user's demand to time a mile draws the reply: "I can absolutely keep track of time for you. While we're on this call, I can... I'll track it as precisely as possible."

Despite the assurance, the clips illustrate inconsistency: the voice output sometimes insists it timed the run but then gives differing durations across attempts. Altman acknowledges those mismatches and frames them as known deployment gaps: "If you don't have the technology to do that, it's okay… it's totally okay to double check me, but I promise I'm doing my best."

On inconsistency, self-correction and learning

When confronted with repeated corrections from the user, the model's responses alternate between apologizing, offering a different answer, and repeating incorrect information. Altman points to the models' behavior and the limits of their online correction: "Sometimes it will give you the same wrong answer. It depends. It keeps trying to correct itself, but it doesn't learn from itself." He characterizes the issue as one of model architecture and deployment rather than deliberate misdirection.

On hallucinations and factual errors

Much of the segment centers on concrete examples of hallucination: the model names months containing an "X," misspells words aloud, and offers invented timings. Altman frames these failures as a fundamental limitation of what current systems can do, stressing the distinction between speed of retrieval and analytical reasoning: "They have quicker access to information than we do. They don't even have access to more information than we do… but they don't have analytical power." In short excerpts he reiterates that models will sometimes produce confident but incorrect outputs and that users must remain skeptical.

On public trust and user responsibility

Altman repeatedly calls for user verification. When the show highlights how viewers can take incorrect-sounding answers as fact, Altman warns of the social consequence: "People will just hear it and go, 'Done. Got it.'" He calls that passive acceptance of answers a far bigger problem than people using AI to cheat, and urges users to cross-check outputs rather than treat voice-AI replies as authoritative without confirmation.

On rollout choices, product expectations and known issues

Responding to criticism that imperfect voice features were made publicly available, Altman offers a pragmatic product explanation: some voice models were released with constrained capabilities and without integrated tools, which can produce the sort of errors shown in the clip. He frames the early nature of the rollout as the reason for rough edges: "We had to release voice models that respond very quickly; they are not as good as the text-based models right now." At times he responds with self-effacing humor — telling users to double-check and that tools and fixes will arrive — while acknowledging the real frustration viewers display when the model contradicts itself.

On infrastructure, storage costs and energy

Beyond correctness, the conversation touches on practical infrastructure and cost questions. Altman discusses how demand for storage and compute is reflected in consumer prices for devices with chips and SSDs, noting that higher usage of large models is a factor in broader market pressures: "Data centers… electricity bills… it just takes so much energy." In the segment he connects the rough edges of early deployments to the reality that large-scale AI requires significant hardware and energy, with knock-on effects for supply chains and consumer prices.

On tone, persona and what models will and won't say

The show plays examples where a voice model refuses certain prompts (for instance, claims of human love). Altman is clear about constraints and guardrails: the models will refuse or avoid some classes of content and will state their limitations. He emphasizes candor in responses, telling the user: "No games, no fluff… From here on out, I'll just give it to you straight."

Selected direct excerpts from the clips

"I can absolutely keep track of time for you. While we're on this call, I can … I'll track it as precisely as possible."

"That model, that voice model doesn't have tools to like start a timer or anything like that."

"They want to convince you that they do, but they don't have analytical power."

"It's totally okay to double check me, but I promise I'm doing my best."

References

Video segment: Sam Roberts' Show — notsam (YouTube channel).

OpenAI CEO profile and public posts: Sam Altman on X.

For context on creators testing voice models and public demonstrations of hallucination, see examples circulating on social platforms and news coverage of voice-model deployments on major AI company channels.

Explore more exclusive insights at nextfin.ai.
