2025-12-17 04:40:08
Artificial Intelligence
Technology

OpenAI's New Models Enhance AI Transcription and Image Generation Capabilities

OpenAI has made significant strides in enhancing its Realtime API with the introduction of three new model snapshots. These updates focus on improving transcription accuracy, speech synthesis, and function calling efficiency.

The gpt-4o-mini-transcribe variant notably reduces hallucinations, while the gpt-4o-mini-tts achieves a remarkable 35 percent reduction in word error rates for text-to-speech tasks. Additionally, the gpt-realtime-mini model increases instruction adherence by 22 percent.

OpenAI's advancements extend to its image generation capabilities, with the release of GPT Image 1.5, which enhances prompt interpretation and speeds up image creation. This update aims to bolster OpenAI's competitive edge against Google’s Gemini model, which has also seen improvements in voice task handling and user instruction compliance.

Both companies continue to push the boundaries of AI technology, striving for superior performance and user experience.

TechCrunch
16. Dezember 2025 um 18:22

OpenAI continues on its ‘code red’ warpath with new image generation model

OpenAI has released a new version of ChatGPT Images called GPT Image 1.5, which promises improved instruction-following, more precise editing, and faster image generation speeds. This comes as part of OpenAI's efforts to regain its position as the AI leader after Google's Gemini model took market share. The update also includes new features like a dedicated entry point for images in the ChatGPT sidebar and improved search query displays with visuals. OpenAI aims to enhance the user experience..
THE DECODER
16. Dezember 2025 um 20:45

Google's updated Gemini 2.5 Flash Native Audio handles complex voice tasks better

Google has updated its Gemini 2.5 Flash Native Audio to improve voice assistant capabilities. The new version handles complex workflows better, follows user instructions more precisely, and conducts natural conversations with improved accuracy. The update boosts compliance with developer instructions from 84 to 90 percent and outperforms OpenAI's gpt-realtime in the ComplexFuncBench benchmark with a score of 71.5 percent.
THE DECODER
16. Dezember 2025 um 20:11

OpenAI's new ChatGPT image model matches Google's Nano Banana Pro on complex prompts

OpenAI's new GPT-Image 1.5 model has been released, offering several major improvements over its predecessor, including more accurate prompt interpretation, better detail preservation, and significantly faster image generation times. The model can generate images up to four times faster than before and handles complex prompts with ease. It also excels in photo editing, virtual try-ons, and style transformations, outperforming Google's Nano Banana Pro in some areas.
THE DECODER
16. Dezember 2025 um 17:30

OpenAI releases new models for its Realtime API

OpenAI has updated its Realtime API with three new model snapshots designed to improve transcription, speech synthesis, and function calling. According to developers, the gpt-4o-mini-transcribe variant significantly reduces hallucinations. For text-to-speech tasks, gpt-4o-mini-tts cuts the word error rate by 35 percent. The gpt-realtime-mini model, which targets voice assistants, follows instructions 22 percent more accurately and improves function calling by 13 percent. OpenAI also explicitly..
CW

Account

Waiting list for the personalized area


Welcome!

InfoBud.news

infobud.news is an AI-driven news aggregator that simplifies global news, offering customizable feeds in all languages for tailored insights into tech, finance, politics, and more. It provides precise, relevant news updates, overcoming conventional search tool limitations. Due to the diversity of news sources, it provides precise and relevant news updates, focusing entirely on the facts without influencing opinion. Read moreExpand

Your World, Tailored News: Navigate The News Jungle With AI-Powered Precision!