OpenAI's New Models Enhance AI Transcription and Image Generation Capabilities
OpenAI has updated its Realtime API with three new model snapshots. The updates target transcription accuracy, speech synthesis, and function calling.
The gpt-4o-mini-transcribe snapshot notably reduces hallucinations, while gpt-4o-mini-tts cuts word error rates in text-to-speech tasks by 35 percent. The gpt-realtime-mini model improves instruction adherence by 22 percent.
OpenAI has also released GPT Image 1.5, an image generation model that interprets prompts more accurately and creates images faster. The update is meant to strengthen OpenAI's position against Google, whose Gemini 2.5 Flash Native Audio model has likewise improved at handling voice tasks and following user instructions.
Both companies continue to push the boundaries of AI technology, striving for superior performance and user experience.
The press radar on this topic:
Google's updated Gemini 2.5 Flash Native Audio handles complex voice tasks better
OpenAI's new ChatGPT image model matches Google's Nano Banana Pro on complex prompts
OpenAI releases new models for its Realtime API