[AI Frontier] OpenAI Reportedly Preparing to Launch New Bidirectional Voice Model GPT-Bidi-1
Published in Latest AI News, Jun 17, 2026

OpenAI is reportedly preparing to launch a next-generation bidirectional audio model called "GPT-Bidi-1," aimed at significantly upgrading ChatGPT's voice mode. The model's core advance is its bidirectional architecture, which removes the turn-by-turn ("simplex") limitation of earlier AI voice interaction. The system can listen and speak at the same time, so it can catch user interruptions and interjections in real time and adjust what it is saying on the fly without stuttering or freezing, which should make live voice conversations feel considerably more natural.

Judging from current development traces, OpenAI has already laid the foundational code for launching the model on both web and mobile platforms. In terms of product form, the new feature is expected to coexist with the existing Advanced Voice Mode once it ships, letting users switch to a "Bidi (latest)" mode at will. In addition, following the tiering already used for text models, the model introduces three intelligence and speed tiers on the voice side for the first time: High, Medium, and Instant. Users can trade depth of interaction against response speed depending on the task.

This iteration is not just an upgrade in sound quality or tone; it also fills a key gap in OpenAI's multimodal strategy. OpenAI's text models had previously advanced to the GPT-5.5 generation, with stronger reasoning capabilities, while its voice models lagged behind, leaving a gap in the multimodal experience. The release of GPT-Bidi-1 would not only close this gap in reasoning capability but also signal OpenAI's strategic ambition to treat voice as a core entry point for the next generation of AI. It would also lay a crucial technical foundation for the company's subsequent push into audio-first hardware devices and enterprise-level voice support tools.