
OpenAI appears to be building the next significant upgrade to ChatGPT's voice capabilities. A new audio model, tentatively named GPT-Bidi-1, has surfaced across multiple tech publications following code references and user sightings on web and mobile clients. If the reports hold, this would mark a fundamental shift in how voice assistants handle conversation, moving away from the rigid back-and-forth structure that has defined the category for years.
GPT-Bidi-1 is a next-generation audio model being tested within ChatGPT. Its existence has emerged through code references, UI sightings, and limited early user tests. OpenAI has made no official announcement.
Unlike the current Advanced Voice Mode, GPT-Bidi-1 is reportedly built on a bidirectional, full-duplex audio architecture. This means ChatGPT would speak and listen at the same time rather than waiting for the user to finish before responding.
The model is also reportedly better at handling interruptions. According to Android Authority and TestingCatalog, it is designed to absorb interruptions mid-response, issue short acknowledgements without cutting the user off, and hold longer conversational context than the current voice stack allows.
UI elements linked to GPT-Bidi-1 suggest selectable tiers labeled High, Medium, and Instant. Whether these reflect model size, processing speed, or decoding tradeoffs remains unclear without official OpenAI documentation.
Some users in app previews have spotted GPT-Bidi-1 in a model-selector UI alongside the existing Advanced Voice Mode, with the voice bubble reportedly changing colour when the new mode is active.
The move is widely seen as OpenAI's attempt to close the gap between ChatGPT's advanced text capabilities and its comparatively limited audio experience. A genuine bidirectional audio model would materially change how voice-first products and AI assistants are designed and built.
GPT-Bidi-1 signals that OpenAI is serious about making ChatGPT's voice mode feel less like a voice assistant and more like a real conversation.
(With inputs from yMedia)