In a recent Sam Roberts' Show segment published on the notsam YouTube channel, OpenAI CEO Sam Altman appears in short clips reacting to a creator's videos that highlight repeated errors from voice AI models. The segment interleaves the creator's demonstrations with Altman's replies and on-the-record comments; the exchange focuses on capability limits (timing, spelling and factual reliability) and on what users should expect from voice-enabled models.
The exchange took place during a studio episode of the Sam Roberts' Show. The program plays and discusses short clips in which a creator (presented in the show as "Huck" or "Husk") prompts voice models; Altman's recorded responses are then played back for the hosts. The video does not include an explicit public timestamp for when Altman's reactions were recorded, and the show presents those clips as recent comments by the OpenAI CEO.
On voice-model capabilities and timers
Altman repeatedly addresses the specific claim that a voice model should be able to start and run a stopwatch. In one clip he describes capability differences between model variants and the available tooling, saying that some models simply lack the necessary tool integrations: "That model, that voice model doesn't have tools to like start a timer or anything like that."
In the same interchange, the voice model answers the user's demand to time a mile, telling the runner: "I can absolutely keep track of time for you. While we're on this call, I can... I'll track it as precisely as possible."
Despite the assurance, the clips illustrate the inconsistency: the voice output insists it timed the run but then gives differing durations across attempts. Altman acknowledges those mismatches and frames them as known deployment gaps; the clip's exchange captures the problem: "If you don't have the technology to do that, it's okay… it's totally okay to double-check me, but I promise I'm doing my best."
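To make the capability gap Altman describes concrete: a voice model without tool integrations can only generate text that sounds like timing, whereas a deployment with a registered timer tool can delegate to real clock code. The sketch below is a minimal, hypothetical illustration of that pattern under assumed names; the start_timer and stop_timer tools, the registry, and the dispatch shape are illustrative assumptions, not OpenAI's actual implementation.

```python
import time

# Hypothetical timer state keyed by timer name. Uses a monotonic clock,
# which is immune to wall-clock adjustments while timing.
_timers: dict[str, float] = {}

def start_timer(name: str) -> str:
    """Record a start time for the named timer."""
    _timers[name] = time.monotonic()
    return f"timer '{name}' started"

def stop_timer(name: str) -> str:
    """Return the elapsed time for a previously started timer."""
    if name not in _timers:
        return f"no timer named '{name}'"
    elapsed = time.monotonic() - _timers.pop(name)
    return f"timer '{name}': {elapsed:.2f} s elapsed"

# A model with tool access would emit a structured call that the host
# application dispatches to real code like the functions above; a model
# without tools can only *talk about* timing, as in the clips.
TOOLS = {"start_timer": start_timer, "stop_timer": stop_timer}

def dispatch(tool_call: dict) -> str:
    """Route a structured tool call (assumed shape) to its handler."""
    return TOOLS[tool_call["name"]](**tool_call["arguments"])

if __name__ == "__main__":
    print(dispatch({"name": "start_timer", "arguments": {"name": "mile"}}))
    time.sleep(1.0)  # stand-in for the run being timed
    print(dispatch({"name": "stop_timer", "arguments": {"name": "mile"}}))
```

The point of the sketch is only that elapsed time comes from an external clock the application controls, not from the model's generated text, which is why a variant without this wiring produces differing durations across attempts.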
On inconsistency, self-correction and learning
When confronted with repeated corrections from the user, the model's responses alternate between apologizing, offering a different answer, and repeating incorrect information. Altman points to the models' behavior and the limits of their on-the-fly correction: "Sometimes it will give you the same wrong answer. It depends. It keeps trying to correct itself, but it doesn't learn from itself."
He characterizes the issue as one of model architecture and deployment rather than deliberate misdirection.
On hallucinations and factual errors
Much of the segment centers on concrete examples of hallucination: the model names months containing an "X," misspells words aloud, and offers invented timings. Altman frames these failures as a fundamental limitation of what current systems can do, stressing the distinction between speed of retrieval and analytical reasoning: "They have quicker access to information than we do. They don't even have access to more information than we do… but they don't have analytical power."
In short excerpts he reiterates that models will sometimes produce confident but incorrect outputs and that users must remain skeptical.
On public trust and user responsibility
Altman repeatedly calls for user verification. When the show highlights how viewers can take plausible-sounding but incorrect answers as fact, Altman warns of the social consequence: "People will just hear it and go, 'Done. Got it.' That's far more of a problem than people using it to cheat; it's the passive acceptance of answers."
He urges users to cross-check outputs and not to treat voice-AI replies as authoritative without confirmation.
On rollout choices, product expectations and known issues
Responding to criticism that imperfect voice features were made publicly available, Altman offers a pragmatic product explanation: some voice models were released with constrained capabilities and without integrated tools, which can produce the sort of errors shown in the clips. He frames the rough edges as a consequence of an early-stage rollout: "We had to release voice models that respond very quickly; they are not as good as the text-based models right now."
At times he responds with self-effacing humor, telling users to double-check and promising that tools and fixes will arrive, while acknowledging the real frustration viewers display when the model contradicts itself.
On infrastructure, storage costs and energy
Beyond correctness, the conversation touches on practical infrastructure and cost questions. Altman discusses how demand for storage and compute feeds into consumer prices for devices built around chips and SSDs, noting that heavy usage of large models is a factor in broader market pressures: "Data centers… electricity bills… it just takes so much energy."
In the segment he connects the rough edges of early deployments to the reality that large-scale AI requires significant hardware and energy, with knock-on effects for supply chains and consumer prices.
On tone, persona and what models will and won't say
The show plays examples in which a voice model refuses certain prompts (for instance, it declines to claim human love). Altman is clear about constraints and guardrails: the models will refuse or avoid some classes of content and will state their limitations. In one clip the model itself promises candor, telling the user: "No games, no fluff… From here on out, I'll just give it to you straight."
Selected direct excerpts from the clips
"I can absolutely keep track of time for you. While we're on this call, I can … I'll track it as precisely as possible."
"That model, that voice model doesn't have tools to like start a timer or anything like that."
"They want to convince you that they do, but they don't have analytical power."
"It's totally okay to double-check me, but I promise I'm doing my best."
References
Video segment: Sam Roberts' Show, notsam (YouTube channel).
OpenAI CEO profile and public posts: Sam Altman on X.
For context on creators testing voice models and public demonstrations of hallucination, see examples circulating on social platforms and news coverage of voice-model deployments by major AI companies.

